{"id":9694,"library":"docarray","title":"DocArray","description":"DocArray is a Python library that provides a data structure for multimodal data. It is designed to work efficiently with unstructured data such as text, images, and audio, and is often used in machine learning and vector database applications. The current version is 0.41.0, with frequent patch and minor releases, typically on a monthly to bi-monthly cadence.","status":"active","version":"0.41.0","language":"en","source_language":"en","source_url":"https://github.com/docarray/docarray","tags":["data structure","multimodal","vector database","embeddings","pydantic","machine learning","ai"],"install":[{"cmd":"pip install docarray","lang":"bash","label":"Standard install"},{"cmd":"pip install 'docarray[full]'","lang":"bash","label":"Install with all optional dependencies (e.g., Torch, TensorFlow, Jax, various vector DB clients)"}],"dependencies":[{"reason":"Core dependency for defining document schemas. Supports Pydantic v1 and v2.","package":"pydantic","optional":false},{"reason":"Core dependency; backs the `NdArray` tensor type.","package":"numpy","optional":false},{"reason":"Required for `TorchTensor`, included in the `full` extra.","package":"torch","optional":true},{"reason":"Required for `TensorFlowTensor`, included in the `full` extra.","package":"tensorflow","optional":true},{"reason":"Required for `JaxArray`, included in the `full` extra.","package":"jax","optional":true}],"imports":[{"note":"The `Document` class belongs to DocArray's legacy API (versions before 0.30.0) and is not available in current releases. `BaseDoc` is the current base class for custom documents.","wrong":"from docarray import Document","symbol":"BaseDoc","correct":"from docarray import BaseDoc"},{"symbol":"DocList","correct":"from docarray import DocList"},{"symbol":"DocVec","correct":"from docarray import DocVec"},{"symbol":"NdArray","correct":"from docarray.typing import NdArray"}],"quickstart":{"code":"from docarray import BaseDoc, DocList\nfrom docarray.typing import NdArray\nimport numpy as np\n\n# 1. Define your custom document schema using BaseDoc\nclass MyDocument(BaseDoc):\n    text: str\n    image_embedding: NdArray[128]  # Embedding field with a fixed shape of (128,)\n\n# 2. Create a single document instance\ndoc = MyDocument(text='hello world', image_embedding=np.random.rand(128))\nprint(f\"Created document with text: {doc.text}\")\n\n# 3. Create a collection of documents using DocList\ndocs = DocList[MyDocument]([\n    MyDocument(text='document one', image_embedding=np.random.rand(128)),\n    MyDocument(text='document two', image_embedding=np.random.rand(128)),\n])\nprint(f\"DocList contains {len(docs)} documents.\")\n\n# 4. Access individual documents and their fields\nprint(f\"First document's text: {docs[0].text}\")\nprint(f\"Second document's embedding shape: {docs[1].image_embedding.shape}\")","lang":"python","description":"Define custom document schemas using `BaseDoc` and type hints (including `NdArray` for numerical arrays and embeddings), then create single document instances or collections of documents with `DocList`."},"warnings":[{"fix":"Migrate your document definitions from `docarray.Document` to `docarray.BaseDoc`, and replace legacy `DocumentArray` collections with `docarray.DocList`. Refer to the official DocArray migration guide.","message":"The `docarray.Document` class belongs to the legacy API and was removed in v0.30.0; importing it from `docarray>=0.30.0` raises an `ImportError`. The current API uses `docarray.BaseDoc` for document definitions and `docarray.DocList` for collections.","severity":"breaking","affected_versions":">=0.30.0"},{"fix":"If your application expects a `dict` from `to_json()`, you must now explicitly parse the returned string. Example: `import json; data_dict = json.loads(doclist_instance.to_json())`.","message":"The `to_json()` method for `DocList` and `DocVec` changed its return type from a dictionary (`dict`) to a JSON-formatted string (`str`) to ensure consistency across serialization methods.","severity":"breaking","affected_versions":">=0.38.0"},{"fix":"Consult Pydantic's official migration guide for changes between v1 and v2. DocArray itself supports both versions, but your schema definitions may require adjustments.","message":"DocArray supports both Pydantic v1 and v2. However, if you upgrade your project's Pydantic dependency to v2, you may need to adapt your `BaseDoc` definitions to Pydantic v2's API changes (e.g., `@validator` is replaced by `@field_validator`, and `class Config` by `model_config`).","severity":"gotcha","affected_versions":">=0.39.0"},{"fix":"Ensure you are using `docarray>=0.39.1` if you rely on the `from_dataframe` method and have `numpy>=1.26.1` installed.","message":"A bug in `from_dataframe` when used with `numpy>=1.26.1` caused issues due to changes in NumPy's versioning semantics. This was fixed in 0.39.1.","severity":"gotcha","affected_versions":"=0.39.0"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Replace `from docarray import Document` with `from docarray import BaseDoc` when defining your document schemas.","cause":"You are importing the legacy `Document` class, which was removed from the `docarray` namespace in v0.30.0 and superseded by `BaseDoc`.","error":"ImportError: cannot import name 'Document' from 'docarray'"},{"fix":"If you need a `tags` field, declare it explicitly in your `BaseDoc` schema, e.g. `from typing import List` followed by `class MyDoc(BaseDoc): tags: List[str]`.","cause":"Attributes like `.tags` and `.chunks` were built into the legacy `Document` class. `BaseDoc` subclasses are Pydantic models that only have the fields you declare (plus the built-in `id`).","error":"AttributeError: 'BaseDoc' object has no attribute 'tags'"},{"fix":"Use the built-in `to_json()` method of `DocList`, which returns a JSON string. Example: `json_string = my_doclist.to_json()`.","cause":"Passing a `DocList` instance directly to `json.dumps()`, which cannot serialize DocArray objects without a prior conversion.","error":"TypeError: Object of type DocList is not JSON serializable"},{"fix":"Check the detailed error message for the specific field causing the validation error. Ensure data types and shapes match your `BaseDoc` schema definitions (e.g., `NdArray[128]` expects a NumPy array of shape (128,)).","cause":"Your `BaseDoc` model validation failed, usually because a field received a value of the wrong type or shape, e.g., a 64-element array passed to a field declared as `NdArray[128]`.","error":"pydantic.error_wrappers.ValidationError: 1 validation error for MyDocument"}]}