{"id":607,"library":"apache-beam","title":"Apache Beam SDK for Python","description":"Apache Beam is an open-source, unified programming model for defining and executing data processing pipelines for both batch and streaming data. It offers language-specific SDKs, including Python, to construct pipelines that can run on various distributed processing backends such as Apache Flink, Apache Spark, and Google Cloud Dataflow. The library maintains an active development pace with minor releases approximately every 6 weeks, and its current version is 2.71.0.","status":"active","version":"2.71.0","language":"python","source_language":"en","source_url":"https://github.com/apache/beam","tags":["data-processing","batch","streaming","etl","dataflow","flink","spark","pipeline"],"install":[{"cmd":"pip install apache-beam","lang":"bash","label":"Basic installation"},{"cmd":"pip install 'apache-beam[gcp]'","lang":"bash","label":"With Google Cloud Platform (Dataflow) support"}],"dependencies":[{"reason":"Apache Beam 2.71.0 requires Python 3.10 or later.","package":"python","optional":false},{"reason":"Includes dependencies for Google Cloud Dataflow Runner, GCS IO, BigQuery IO, etc.","package":"apache-beam[gcp]","optional":true},{"reason":"Includes dependencies for interactive pipeline development (e.g., in notebooks).","package":"apache-beam[interactive]","optional":true},{"reason":"Includes dependencies for TFRecord I/O operations.","package":"apache-beam[tfrecord]","optional":true}],"imports":[{"note":"Standard alias for the Apache Beam SDK.","symbol":"beam","correct":"import apache_beam as beam"},{"symbol":"ReadFromText","correct":"from apache_beam.io import ReadFromText"},{"symbol":"WriteToText","correct":"from apache_beam.io import WriteToText"},{"symbol":"PipelineOptions","correct":"from apache_beam.options.pipeline_options import PipelineOptions"}],"quickstart":{"code":"import re\nimport argparse\nimport apache_beam as beam\nfrom apache_beam.io import ReadFromText, WriteToText\nfrom apache_beam.options.pipeline_options import PipelineOptions\n\ndef main(argv=None):\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        '--input',\n        dest='input',\n        default='gs://dataflow-samples/shakespeare/kinglear.txt',\n        help='Input file to process.')\n    parser.add_argument(\n        '--output',\n        dest='output',\n        default='output.txt',\n        help='Output file to write results to.')\n    known_args, pipeline_args = parser.parse_known_args(argv)\n\n    pipeline_options = PipelineOptions(pipeline_args)\n\n    with beam.Pipeline(options=pipeline_options) as p:\n        # Read the text file into a PCollection\n        lines = p | ReadFromText(known_args.input)\n\n        # Count the occurrences of each word\n        counts = (\n            lines\n            | 'Split' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\\']+', x))\n            | 'PairWithOne' >> beam.Map(lambda x: (x, 1))\n            | 'GroupAndSum' >> beam.CombinePerKey(sum)\n        )\n\n        # Format the counts into strings\n        output = counts | 'Format' >> beam.Map(\n            lambda word_count: '%s: %s' % (word_count[0], word_count[1]))\n\n        # Write the output\n        output | WriteToText(known_args.output)\n\nif __name__ == '__main__':\n    print(\"Running Beam WordCount pipeline locally...\")\n    main()\n    print(\"Pipeline finished. Check 'output.txt' for results.\")\n","lang":"python","description":"This classic WordCount example demonstrates basic Apache Beam concepts: reading data from a source (local file or GCS), applying transformations like splitting, mapping, and combining, and writing the results to an output file. Run it locally using the DirectRunner."},"warnings":[{"fix":"Review your pipeline's dependencies and install Apache Beam with the appropriate extras, e.g., `pip install 'apache-beam[gcp,interactive]'`.","message":"As of Apache Beam 2.70.0, many Python dependencies were split into 'extras'. If your pipeline previously relied on these implicitly, you may need to explicitly install them using `pip install apache-beam[gcp,interactive,yaml,redis,hadoop,tfrecord,...]` to ensure all necessary components are present.","severity":"breaking","affected_versions":">=2.70.0"},{"fix":"Upgrade your Python environment to 3.10 or a later supported version (e.g., Python 3.10, 3.11, 3.12). The PyPI package requires >=3.10 for 2.71.0.","message":"Support for Python 3.9 was removed in Apache Beam 2.70.0 after Python 3.9 reached its End-of-Life in October 2025. Ensure your development and runtime environments use Python 3.10 or newer.","severity":"deprecated","affected_versions":">=2.70.0"},{"fix":"If using `pickle_library=dill`, add `dill==0.3.1.1` to your `requirements.txt` or ensure it's installed in your custom container.","message":"The `dill` library is no longer a required, default dependency as of Apache Beam 2.71.0. If your pipeline explicitly uses the `pickle_library=dill` pipeline option, you must manually ensure `dill==0.3.1.1` is installed in both your submission and runtime environments.","severity":"gotcha","affected_versions":">=2.71.0"},{"fix":"Refactor code to use Beam's recommended patterns for sharing data, such as side inputs or stateful `DoFn`s, which are designed for distributed execution.","message":"Apache Beam's distributed nature makes global state problematic. Avoid using global variables to share data across elements. Instead, leverage Beam's side inputs for read-only shared data or stateful processing primitives (`StateSpec`, `TimerSpec`) for mutable, per-key state.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Define pipeline dependencies carefully using `requirements.txt` (or `setup.py` for packages). Consider using custom container images to control the exact environment and pre-install dependencies, ensuring reproducibility and avoiding conflicts.","message":"Managing Python pipeline dependencies can lead to 'Diamond Dependency' problems, especially when bundling `apache-beam` with other libraries or using many optional extras. Incompatible versions of shared dependencies can cause runtime errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your environment has the necessary build tools installed. For Alpine Linux, run `apk add build-base python3-dev`.","message":"Installing Apache Beam from source requires system-level build tools (like GCC and Python development headers). This is particularly relevant in minimal environments such as Alpine Linux, where these tools are not pre-installed.","severity":"breaking","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-05-12T16:39:43.615Z","next_check":"2026-06-26T00:00:00.000Z","problems":[{"fix":"Install Apache Beam using pip: `pip install apache-beam`.","cause":"This error occurs when the Apache Beam library is not installed in the Python environment.","error":"ModuleNotFoundError: No module named 'apache_beam'"},{"fix":"Ensure that Apache Beam is correctly installed and up to date: `pip install --upgrade apache-beam`.","cause":"This error arises due to an internal import issue within the Apache Beam library, often related to version mismatches or installation problems.","error":"ImportError: cannot import name 'coder_impl'"},{"fix":"Reduce the number of files processed in a single pipeline or split the processing into smaller batches.","cause":"This error occurs when attempting to process a very large number of files in a single pipeline, exceeding the system's allowable limit.","error":"Total size of the BoundedSource objects returned by BoundedSource.split() operation is larger than the allowable limit"},{"fix":"Implement error handling mechanisms to skip or log bad records, such as using try-except blocks around file reading operations.","cause":"This error happens when the pipeline encounters a malformed or corrupted file during processing.","error":"OSError: Invalid data stream"},{"fix":"Ensure that all functions and variables are properly defined and imported before they are used in the code.","cause":"This error occurs when a function or variable is referenced before it has been defined or imported.","error":"NameError: name 'parse_into_dict' is not defined"}],"ecosystem":"pypi","meta_description":null,"install_score":45,"install_tag":"draft","quickstart_score":0,"quickstart_tag":"stale","pypi_latest":"2.73.0","install_checks":{"last_tested":"2026-05-12","tag":"draft","tag_description":"notable install failures or slow imports","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":0.1,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":39.8,"import_time_s":4.13,"mem_mb":70.3,"disk_size":"712M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":17.2,"import_time_s":2.25,"mem_mb":50.1,"disk_size":"448M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":3.94,"mem_mb":69.9,"disk_size":"705M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.97,"mem_mb":49.9,"disk_size":"446M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":39.5,"import_time_s":5.26,"mem_mb":76.9,"disk_size":"789M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":16.9,"import_time_s":3.28,"mem_mb":54.1,"disk_size":"487M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.51,"mem_mb":76.6,"disk_size":"771M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.85,"mem_mb":54,"disk_size":"474M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":35.8,"import_time_s":6.11,"mem_mb":75.9,"disk_size":"775M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":15.9,"import_time_s":3.6,"mem_mb":52.2,"disk_size":"481M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":9.62,"mem_mb":78.5,"disk_size":"748M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":3.53,"mem_mb":51.9,"disk_size":"459M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":34.6,"import_time_s":5.85,"mem_mb":76.5,"disk_size":"766M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":15.9,"import_time_s":3.55,"mem_mb":53.7,"disk_size":"480M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":10.5,"mem_mb":78.5,"disk_size":"763M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":3.55,"mem_mb":53.4,"disk_size":"478M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":0.1,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"gcp","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":47.3,"import_time_s":4.53,"mem_mb":70.1,"disk_size":"659M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":22.3,"import_time_s":2.89,"mem_mb":50.2,"disk_size":"411M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"gcp","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":3.77,"mem_mb":69.8,"disk_size":"657M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.5,"mem_mb":50.1,"disk_size":"410M"}]},"quickstart_checks":{"last_tested":"2026-04-23","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":1}]}}