{"id":5260,"library":"intake","title":"Intake","description":"Intake is a lightweight Python package for finding, investigating, loading, and distributing data. It provides a common API for loading data from a wide variety of sources (e.g., CSV, NetCDF, SQL, HDF5, Parquet, Zarr) and enables the creation and management of data catalogs. The current version is 2.0.9, and the project is in a stable maintenance phase for its 2.x series, with less frequent but significant updates.","status":"active","version":"2.0.9","language":"en","source_language":"en","source_url":"https://github.com/intake/intake","tags":["data catalog","data access","dataframe","science","discovery","metadata"],"install":[{"cmd":"pip install intake","lang":"bash","label":"Core library"},{"cmd":"pip install intake[parquet,s3,sql]","lang":"bash","label":"With common plugins (example)"}],"dependencies":[{"reason":"YAML parsing for catalogs","package":"pyyaml"},{"reason":"Filesystem abstraction","package":"fsspec"},{"reason":"Parallel computing capabilities (optional, but commonly used)","package":"dask","optional":true},{"reason":"DataFrame processing (optional, but commonly used)","package":"pandas","optional":true}],"imports":[{"symbol":"open_catalog","correct":"import intake\ncatalog = intake.open_catalog('my_catalog.yaml')"},{"symbol":"open_csv","correct":"import intake\ndf = intake.open_csv('data.csv').read()"}],"quickstart":{"code":"import intake\n\n# Open a public example catalog\ncatalog = intake.open_catalog(\"https://raw.githubusercontent.com/intake/intake-examples/master/catalogs/us_states.yml\")\n\n# Access a data source from the catalog\ndf = catalog.states.read()\n\nprint(df.head())\n","lang":"python","description":"This quickstart demonstrates how to open a remote Intake catalog and load a dataset (US States data) into a Pandas DataFrame."},"warnings":[{"fix":"Update catalog YAML files to the Intake 2.x schema. 
Use `intake.open_catalog()` for catalogs and `intake.open_csv()`, `intake.open_parquet()`, etc., for direct source access. Consult the official migration guide.","message":"Major API changes occurred between Intake 1.x and 2.x, particularly concerning how drivers are accessed and catalog specifications are defined. Directly using `intake.source.<driver>.SourceClass` is deprecated in favor of `intake.open_<format>(...)` functions.","severity":"breaking","affected_versions":"<2.0 migrating to >=2.0"},{"fix":"Install the plugin or filesystem package that matches the format or storage backend, e.g., `pip install intake-parquet` for Parquet files, `pip install intake-sql` for SQL databases, or `pip install s3fs` for S3 access (often included in `intake[s3]` extras).","message":"Intake relies heavily on a plugin system for specific data formats and remote storage. If you try to open a file type (e.g., Parquet, SQL) or access a remote system (e.g., S3) without the corresponding package installed (an `intake-<plugin_name>` plugin or a filesystem library such as `s3fs`), the call will fail, typically with an import-related error.","severity":"gotcha","affected_versions":"All 2.x versions"},{"fix":"Use `intake.open_catalog()` when you have a YAML file defining multiple data sources or remote catalogs. Use `intake.open_csv()`, `intake.open_parquet()`, etc., for quick, one-off access to individual files.","message":"`intake.open_catalog()` is easy to confuse with direct source-opening functions such as `intake.open_csv()`. `open_catalog` loads a YAML catalog file (which can define multiple sources), whereas `open_csv` and similar functions open a single data file directly, without a catalog.","severity":"gotcha","affected_versions":"All 2.x versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}