{"id":8078,"library":"deeplake","title":"Deep Lake","description":"Deep Lake is a Python library for building, managing, and querying multi-modal datasets for AI. It stores data (images, video, audio, text, embeddings) in local or cloud storage and streams it directly to machine learning models, with support for version control, indexing, and complex queries. The 4.x series, which includes version 4.5.10, is built on a C++ core for performance and uses a column-based API that differs from the 3.x API. The project is actively developed, with frequent minor releases.","status":"active","version":"4.5.10","language":"en","source_language":"en","source_url":"https://github.com/activeloopai/deeplake","tags":["data lake","multi-modal AI","vector database","dataset management","machine learning","data streaming"],"install":[{"cmd":"pip install deeplake","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"deeplake","correct":"import deeplake"},{"note":"Deep Lake 4.x creates datasets with deeplake.create(); deeplake.empty() belongs to the 3.x API.","wrong":"deeplake.empty(...)","symbol":"create","correct":"deeplake.create(...)"},{"note":"Deep Lake 4.x opens existing datasets with deeplake.open() or deeplake.open_read_only(); deeplake.load() belongs to the 3.x API.","wrong":"deeplake.load(...)","symbol":"open","correct":"deeplake.open(...)"},{"note":"The 3.x VectorStore class is not part of 4.x. Store vectors in a column typed with deeplake.types.Embedding and run similarity search with a TQL query through ds.query().","wrong":"from deeplake.core.vectorstore import VectorStore","symbol":"Embedding","correct":"deeplake.types.Embedding(...)"}],
"quickstart":{"code":"import os\nimport shutil\n\nimport deeplake\nimport numpy as np\n\n# Local path for quick testing (no token needed). For Activeloop-managed storage, use an\n# al://<org>/<dataset> URL and set ACTIVELOOP_TOKEN (or pass token=...) before running.\nds_path = os.environ.get(\"DEEPLAKE_PATH\", \"./my_local_dataset\")\n\n# Remove any previous local copy so deeplake.create() starts from an empty location\nif \"://\" not in ds_path:\n    shutil.rmtree(ds_path, ignore_errors=True)\n\nds = deeplake.create(ds_path)\n\n# Define the schema as typed columns\nds.add_column(\"images\", deeplake.types.Image())\nds.add_column(\"labels\", deeplake.types.ClassLabel(\"int32\"))\n\n# Append a batch of 5 rows: uint8 RGB images and integer class labels\nds.append({\n    \"images\": [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)],\n    \"labels\": [i % 2 for i in range(5)],\n})\nds.commit(\"Add 5 synthetic samples\")\n\nprint(f\"Dataset created at {ds_path} with {len(ds)} rows.\")\n\n# Reopen the dataset without write access\nds_loaded = deeplake.open_read_only(ds_path)\nprint(f\"Loaded dataset has {len(ds_loaded)} rows.\")\nds_loaded.summary()  # prints the dataset's columns and types\n\n# Indexing a column by row returns NumPy data\nfirst_image = ds_loaded[\"images\"][0]\nfirst_label = ds_loaded[\"labels\"][0]\nprint(f\"First image shape: {first_image.shape}, first label: {first_label}\")","lang":"python","description":"This quickstart uses the 4.x API to create a local dataset, define its schema as typed columns for images and class labels, append a batch of synthetic rows, and commit them. It then reopens the dataset read-only, prints a summary, and reads one row. To store the dataset in Activeloop-managed storage instead, point DEEPLAKE_PATH at an al://<org>/<dataset> URL and supply an API token through the ACTIVELOOP_TOKEN environment variable (recommended for non-interactive use) or the token= argument."},
"warnings":[{"fix":"Port code to the 4.x API. Old: `ds = deeplake.empty(path)`, `ds.create_tensor('images', htype='image')`, `ds = deeplake.load(path)`. New: `ds = deeplake.create(path)`, `ds.add_column('images', deeplake.types.Image())`, `ds = deeplake.open(path)`. Convert existing 3.x datasets with `deeplake.convert()`, or pin `deeplake<4` to keep running 3.x code unchanged.","message":"Deep Lake 4.0 replaced the 3.x dataset API: the creation and loading functions were renamed, and tensors defined with `create_tensor()` and `htype` became columns defined with `add_column()` and `deeplake.types`. Datasets written by 3.x must be converted before 4.x can open them.","severity":"breaking","affected_versions":"4.0.0 and above"},{"fix":"Set the `ACTIVELOOP_TOKEN` environment variable to your Activeloop API token, or pass `token=` to `deeplake.create()` / `deeplake.open()`, for non-interactive authentication.","message":"Datasets in Activeloop-managed storage require an Activeloop API token; local datasets do not.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Read data in slices and process one slice at a time, e.g. `ds['images'][start:end]`, instead of materializing the whole column.","message":"Deep Lake datasets are designed for streaming; reading an entire large column into memory at once (e.g. `ds['images'][:]`) can cause out-of-memory errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Check the URL prefix and organization name, and verify that the token in `ACTIVELOOP_TOKEN` has access to that organization.","message":"URLs for Activeloop-managed datasets must include the storage prefix and a valid organization or username, e.g. `al://org_name/dataset_name` in 4.x (`hub://org_name/dataset_name` in 3.x).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Define each column with `ds.add_column(name, deeplake.types.<Type>())` and convert data before appending, e.g. `np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)` for an RGB image rather than the float output of `np.random.rand()`.","message":"Appended data must match each column's declared type and shape: an `Image` column stores uint8 pixel arrays of shape (H, W, C) by default, so float arrays or mismatched shapes must be converted first.","severity":"gotcha","affected_versions":"All versions"}],
"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Set the `ACTIVELOOP_TOKEN` environment variable to your Activeloop API token before running your script, or pass `token=` to `deeplake.open()` / `deeplake.create()`.","cause":"Accessing an Activeloop-managed dataset without an API token.","error":"Hub API token not provided. Please provide a token through `deeplake.login()` or by setting the `DEEPLAKE_TOKEN` environment variable."},{"fix":"Verify that the dataset URL is correct. For Activeloop-managed datasets, ensure the token has access to the organization in the URL. For local datasets, check the path and file-system permissions.","cause":"The specified dataset path (local or cloud URL) does not exist, or the user lacks read/write permissions.","error":"Dataset not found. Please check the path and permissions."},{"fix":"Use `len(ds)` to get the number of rows. In 4.x every column holds one value per row, so `len(ds)` is also the length of each column.","cause":"Calling `len()` on a Deep Lake object that does not implement it, instead of on the dataset itself.","error":"TypeError: object of type 'Tensor' has no len()"},{"fix":"Append data that matches the column's declared type (e.g. uint8 image arrays to an `Image` column). To store a different kind of data, add a separate column with the appropriate `deeplake.types` type.","cause":"Appending data whose type does not match the type the column was created with.","error":"ValueError: Mismatch in data type. Expected 'image', got 'video' for tensor 'my_tensor'."}]}