{"library":"seqio","title":"SeqIO","description":"SeqIO is a Python library from Google for defining task-based datasets, preprocessing pipelines, and evaluation for sequence models. Its pipelines are built on tf.data, but processed datasets can be converted to NumPy iterators for use with JAX or PyTorch as well as TensorFlow. It integrates with T5/T5X and Gin (gin-config) and is aimed at machine learning research, particularly NLP. The current version is 0.0.20, and it is under active development with frequent minor releases.","language":"python","status":"active","last_verified":"Mon May 18","install":{"commands":["pip install seqio","pip install seqio[tf]","pip install seqio[jax]"],"cli":null},"imports":["import seqio","from seqio import Task","from seqio import TaskRegistry","from seqio import Mixture","from seqio import FunctionDataSource","from seqio import Feature","from seqio import Vocabulary","from seqio import preprocessors","from seqio import get_mixture_or_task"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import seqio\nimport tensorflow as tf\n\n# 1. Use SeqIO's built-in byte-level vocabulary: UTF-8 bytes shifted by 3,\n#    with IDs 0 (PAD), 1 (EOS) and 2 (UNK) reserved.\nvocab = seqio.ByteVocabulary()\n\n# 2. Define a data source function that returns a tf.data.Dataset of raw strings.\n#    FunctionDataSource requires the `split` and `shuffle_files` arguments.\ndef my_data_source_fn(split, shuffle_files=False):\n    del shuffle_files  # Unused: this toy dataset is a single in-memory source.\n    if split == \"train\":\n        return tf.data.Dataset.from_tensor_slices({\n            \"inputs\": [\"hello world\", \"python is fun\"],\n            \"targets\": [\"olleh dlrow\", \"nohtyp si nuf\"],  # Each word reversed\n        })\n    raise ValueError(f\"Unknown split: {split}\")\n\n# 3. Register the task. SeqIO passes `output_features` and `sequence_length`\n#    to each preprocessor whose signature asks for them.\nseqio.TaskRegistry.add(\n    \"simple_reverse_task\",\n    source=seqio.FunctionDataSource(\n        dataset_fn=my_data_source_fn,\n        splits=[\"train\"],\n    ),\n    preprocessors=[\n        # Strings -> int32 token IDs, using each feature's vocabulary.\n        seqio.preprocessors.tokenize,\n        # Trim to sequence_length - 1, then append EOS (features use add_eos=True).\n        seqio.preprocessors.append_eos_after_trim,\n    ],\n    output_features={\n        \"inputs\": seqio.Feature(vocabulary=vocab, add_eos=True),\n        \"targets\": seqio.Feature(vocabulary=vocab, add_eos=True),\n    },\n)\n\n# 4. Retrieve the task and get its processed dataset\ntask = seqio.get_mixture_or_task(\"simple_reverse_task\")\nds = task.get_dataset(\n    sequence_length={\"inputs\": 20, \"targets\": 20},  # Max length per feature (trimmed, not padded)\n    split=\"train\",\n    shuffle=False,\n)\n\n# 5. Iterate through an example to verify\nfor ex in ds.take(1):\n    print(\"\\n--- Processed Example ---\")\n    print(\"Raw features:\", {k: v.numpy() for k, v in ex.items()})\n\n    decoded_inputs = task.output_features[\"inputs\"].vocabulary.decode(ex[\"inputs\"].numpy().tolist())\n    decoded_targets = task.output_features[\"targets\"].vocabulary.decode(ex[\"targets\"].numpy().tolist())\n    print(f\"Decoded inputs: '{decoded_inputs}'\")\n    print(f\"Decoded targets: '{decoded_targets}'\")\n","lang":"python","description":"This quickstart defines a custom SeqIO task from an in-memory data source function, tokenizes it with SeqIO's built-in byte-level vocabulary, and appends EOS tokens after trimming to the requested sequence lengths. It registers the task and then retrieves a processed `tf.data.Dataset` for inspection. The resulting examples are trimmed but not padded; before training, they are typically passed through a SeqIO feature converter, which pads (and optionally packs) them into the shapes a model expects.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-18","installed_version":"0.0.20","pypi_latest":"0.0.20","is_stale":false,"summary":{"python_range":"3.9–3.12","success_rate":40,"avg_install_s":56.8,"avg_import_s":17.8,"wheel_type":"sdist"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"seqio","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"jax","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"tf","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"seqio","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":58.2,"import_time_s":14.08,"mem_mb":164.5,"disk_size":"3.0G"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"jax","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":58.3,"import_time_s":14.15,"mem_mb":164.5,"disk_size":"3.0G"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim 
(glibc)","variant":"tf","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":56.9,"import_time_s":14.13,"mem_mb":164.5,"disk_size":"3.0G"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"seqio","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"jax","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"tf","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"seqio","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":56.2,"import_time_s":20.96,"mem_mb":186.5,"disk_size":"3.1G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"jax","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":57,"import_time_s":20.68,"mem_mb":186.5,"disk_size":"3.1G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"tf","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":54.9,"import_time_s":20.94,"mem_mb":186.5,"disk_size":"3.1G"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine 
(musl)","variant":"seqio","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"jax","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"tf","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"seqio","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":54.4,"import_time_s":19.3,"mem_mb":175.9,"disk_size":"3.1G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"jax","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":54.3,"import_time_s":19.01,"mem_mb":175.9,"disk_size":"3.1G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"tf","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":54.2,"import_time_s":19.05,"mem_mb":175.9,"disk_size":"3.1G"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"seqio","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine 
(musl)","variant":"jax","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"tf","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"seqio","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":23.9,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"jax","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":23.9,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"tf","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":22.9,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"seqio","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"jax","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine 
(musl)","variant":"tf","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"seqio","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":60,"import_time_s":16.88,"mem_mb":173.4,"disk_size":"2.8G"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"jax","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":58.9,"import_time_s":17.02,"mem_mb":173.4,"disk_size":"2.8G"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"tf","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"noisy","install_time_s":58.8,"import_time_s":17.37,"mem_mb":173.4,"disk_size":"2.8G"}]}}