{"id":7595,"library":"pyorc","title":"PyORC","description":"PyORC is a Python module designed for efficiently reading and writing data in the Apache ORC (Optimized Row Columnar) file format. It provides high-performance access to ORC files, commonly used in big data ecosystems like Apache Hive, Spark, and Flink. The current version is 0.11.0, and the library maintains an active development schedule with several releases per year.","status":"active","version":"0.11.0","language":"en","source_language":"en","source_url":"https://github.com/noirello/pyorc.git","tags":["orc","apache","data-format","big-data","columnar"],"install":[{"cmd":"pip install pyorc","lang":"bash","label":"Basic installation"},{"cmd":"pip install 'pyorc[dataframe]'","lang":"bash","label":"With Pandas DataFrame support"}],"dependencies":[{"reason":"Required for timezone-aware operations when using the optional 'dataframe' extra.","package":"pytz","optional":true}],"imports":[{"symbol":"Reader","correct":"from pyorc import Reader"},{"symbol":"Writer","correct":"from pyorc import Writer"},{"symbol":"TypeDescription","correct":"from pyorc import TypeDescription"},{"note":"TypeKind was moved directly under the pyorc module in version 0.9.0","wrong":"from pyorc.enums import TypeKind","symbol":"TypeKind","correct":"from pyorc import TypeKind"}],"quickstart":{"code":"import pyorc\nimport os\nimport datetime\nimport decimal\n\n# Define a schema for demonstration\nschema_str = \"struct<id:int,name:string,value:decimal(10,2),timestamp:timestamp>\"\nschema = pyorc.TypeDescription.from_string(schema_str)\n\nfile_path = \"example.orc\"\n\n# --- Writing an ORC file ---\nprint(f\"Writing to {file_path}\")\nwith open(file_path, \"wb\") as f:\n    with pyorc.Writer(f, schema) as writer:\n        writer.write((1, \"Alice\", decimal.Decimal(\"10.50\"), datetime.datetime(2023, 1, 1, 10, 0, 0, tzinfo=datetime.timezone.utc)))\n        writer.write((2, \"Bob\", decimal.Decimal(\"20.75\"), datetime.datetime(2023, 1, 2, 
11, 30, 0, tzinfo=datetime.timezone.utc)))\n        writer.write((3, \"Charlie\", decimal.Decimal(\"30.00\"), datetime.datetime(2023, 1, 3, 12, 0, 0, tzinfo=datetime.timezone.utc)))\n\nprint(f\"Successfully wrote {file_path}\")\n\n# --- Reading an ORC file ---\nprint(f\"Reading from {file_path}\")\nwith open(file_path, \"rb\") as f:\n    # Reader takes the open binary file object as its first positional argument\n    reader = pyorc.Reader(f)\n    print(\"Schema:\", reader.schema)\n    print(\"Rows:\")\n    for row in reader:\n        print(row)\n\n# Clean up\nif os.path.exists(file_path):\n    os.remove(file_path)\n    print(f\"Cleaned up {file_path}\")","lang":"python","description":"This quickstart defines an ORC schema, writes three rows to an ORC file with `pyorc.Writer`, and reads them back with `pyorc.Reader`. It covers the `int`, `string`, `decimal`, and `timestamp` types and uses timezone-aware `datetime` objects for the timestamp column."},"warnings":[{"fix":"Open the file in binary mode and pass the file object positionally, e.g. `with open('path/to/file.orc', 'rb') as f: reader = pyorc.Reader(f)`. In-memory buffers such as `io.BytesIO` are passed the same way.","message":"`pyorc.Reader` takes an open binary file-like object as its first positional argument, as in the quickstart (`pyorc.Reader(f)`). It does not open a filesystem path for you, and its documented signature has no `file_like_object` keyword argument, so neither `pyorc.Reader('path/to/file.orc')` nor `pyorc.Reader(file_like_object=f)` is a supported call.","severity":"gotcha","affected_versions":"All"},{"fix":"Update imports from `from pyorc.enums import TypeKind` to `from pyorc import TypeKind`. Review `pyorc.Column` usage if relying on default `tzinfo` or other schema defaults.","message":"The `TypeKind` enum was moved directly under the `pyorc` module. 
Additionally, default values for some `Column` attributes (e.g., `tzinfo`) were changed.","severity":"breaking","affected_versions":">=0.9.0"},{"fix":"Ensure all `datetime` objects passed to `pyorc.Writer` have an explicit timezone, preferably UTC. Example: `datetime.datetime(2023, 1, 1, 10, 0, 0, tzinfo=datetime.timezone.utc)`.","message":"When writing `datetime` objects to ORC `timestamp` columns, use timezone-aware `datetime` objects (e.g., using `datetime.timezone.utc` or `pytz`). Writing naive `datetime` objects can lead to ambiguous or incorrect time interpretations in downstream systems.","severity":"gotcha","affected_versions":"All"},{"fix":"Carefully define your `TypeDescription` to match the data you intend to write. Ensure `decimal` values have correct precision/scale. Validate your data against the schema before writing.","message":"PyORC enforces strict schema matching during writing. If the Python data types or structure do not align precisely with the defined ORC `TypeDescription`, a `PyorcDataError` will be raised. This includes precision and scale for `decimal` types.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Open the file in binary mode and pass the file object positionally, as in the quickstart: `reader = pyorc.Reader(f_obj)`.","cause":"`pyorc.Reader` received an argument it could not treat as an open binary ORC file. `Reader` expects a file-like object opened in `rb` mode as its first positional argument, not a path or a keyword-wrapped object.","error":"TypeError: object of type '_io.BufferedReader' has no len()"},{"fix":"Review your `TypeDescription` and the data being written. Ensure column order, data types, and (for decimals) precision/scale are correctly aligned. 
For example, if schema expects `string`, don't pass `int`.","cause":"The Python tuple/list passed to `writer.write()` does not match the structure or data types specified in the `TypeDescription` provided to the `pyorc.Writer`.","error":"pyorc.errors.PyorcDataError: The given data is not compatible with the current schema"},{"fix":"Verify the integrity of the ORC file. Ensure it's not truncated. If writing, try different `compression` or `stripe_size` options. If reading, ensure the file is indeed an ORC file.","cause":"The ORC file being read is corrupted, truncated, or not a valid ORC file. This can also occur if the file was written with incompatible compression or encoding options.","error":"pyorc.errors.ParseError: Malformed ORC file"}]}