{"id":12732,"library":"hyparquet","title":"Hyparquet","description":"Hyparquet is a pure JavaScript library for parsing Apache Parquet files directly in web browsers and Node.js environments. It specializes in efficient data retrieval from cloud storage by leveraging HTTP range requests, allowing for direct querying of Parquet files over the network without requiring a server-side intermediary. The library has been dependency-free since 2023, which keeps it lightweight. Its current stable version is 1.25.6, with a release cadence that follows active development. Key differentiators include its ability to minimize data fetches through selective row and column filtering, comprehensive support for all Parquet types, encodings, and compression codecs, and inclusion of TypeScript definitions for improved developer experience. It is particularly well-suited for data engineering, data science, and machine learning applications where large datasets stored in Parquet format need to be accessed and processed client-side.","status":"active","version":"1.25.6","language":"javascript","source_language":"en","source_url":"https://github.com/hyparam/hyparquet","tags":["javascript","ai","data","dataset","hyperparam","hyparquet","geoparquet","llm","ml","typescript"],"install":[{"cmd":"npm install hyparquet","lang":"bash","label":"npm"},{"cmd":"yarn add hyparquet","lang":"bash","label":"yarn"},{"cmd":"pnpm add hyparquet","lang":"bash","label":"pnpm"}],"dependencies":[],"imports":[{"note":"Hyparquet is published as an ES module; CommonJS 'require()' is not supported directly. Use dynamic import() or ensure your Node.js environment supports ES modules.","wrong":"const { parquetReadObjects } = require('hyparquet')","symbol":"parquetReadObjects","correct":"import { parquetReadObjects } from 'hyparquet'"},{"note":"Used to create an AsyncBuffer from a URL, suitable for browser environments or Node.js network fetches. 
For local files in Node.js, use asyncBufferFromFile.","symbol":"asyncBufferFromUrl","correct":"import { asyncBufferFromUrl } from 'hyparquet'"},{"note":"parquetMetadataAsync and parquetSchema are named exports, not default. They are crucial for inspecting file structure and statistics without full data loading.","wrong":"import parquetMetadataAsync from 'hyparquet'","symbol":"parquetMetadataAsync","correct":"import { parquetMetadataAsync, parquetSchema } from 'hyparquet'"},{"note":"This is a TypeScript type import for custom implementations of the AsyncBuffer interface.","symbol":"AsyncBuffer","correct":"import type { AsyncBuffer } from 'hyparquet'"}],"quickstart":{"code":"import { asyncBufferFromUrl, parquetReadObjects } from 'hyparquet'\n\nasync function fetchData() {\n  const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'\n  // Wrap the URL for asynchronous fetching with HTTP range requests\n  const file = await asyncBufferFromUrl({ url })\n  // Read objects, filtering by specific columns and rows for efficiency\n  const data = await parquetReadObjects({\n    file,\n    columns: ['Breed Name', 'Lifespan'],\n    rowStart: 10,\n    rowEnd: 20,\n  })\n  console.log('Fetched data:', data)\n}\n\nfetchData().catch(console.error)\n","lang":"typescript","description":"Demonstrates how to fetch and parse a remote Parquet file in a browser or Node.js using HTTP range requests, filtering for specific columns and rows."},"warnings":[{"fix":"Use ESM import statements (e.g., `import { ... } from 'hyparquet'`) or dynamic `import()` for compatibility with older Node.js versions. Configure your project to handle ES modules.","message":"Hyparquet is distributed exclusively as an ES module (ESM). Direct 'require()' calls for CommonJS environments are not supported.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"Ensure your HTTP server (e.g., S3, Google Cloud Storage) is configured to handle `Range` headers for partial content requests. 
Test with small files first.","message":"When reading remote Parquet files with `asyncBufferFromUrl`, efficient performance relies on the server supporting HTTP range requests. Without proper server support, the entire file might be downloaded.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"Convert `BigInt` values to `Number` using `Number(metadata.num_rows)` when working with smaller numbers, or ensure your application logic correctly handles `BigInt` types for large row counts.","message":"The `num_rows` property from Parquet metadata is returned as a `BigInt`. Direct arithmetic operations or comparisons with standard `Number` types may lead to errors or unexpected results.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"Define `columns` to specify only the needed columns and `rowStart`/`rowEnd` to limit the row range, reducing network bandwidth and parsing overhead. Example: `parquetReadObjects({ file, columns: ['col1'], rowStart: 0, rowEnd: 100 })`.","message":"To optimize data fetching for large remote files, always specify `columns`, `rowStart`, and `rowEnd` parameters in `parquetReadObjects`. Failing to do so will result in downloading and parsing the entire file.","severity":"gotcha","affected_versions":">=1.0.0"}],"env_vars":null,"last_verified":"2026-04-19T00:00:00.000Z","next_check":"2026-07-18T00:00:00.000Z","problems":[{"fix":"Change `const { ... } = require('hyparquet')` to `import { ... } from 'hyparquet'`. 
Ensure your Node.js project or bundler is configured for ES modules (e.g., 'type': 'module' in package.json or a .mjs file extension).","cause":"Attempting to import hyparquet using CommonJS `require()` syntax in a Node.js environment or bundler that expects ES modules.","error":"ReferenceError: require is not defined"},{"fix":"Ensure `file` is created using `asyncBufferFromUrl({ url })` for remote files or `asyncBufferFromFile('path/to/file.parquet')` for local Node.js files, or provide a custom object implementing the `AsyncBuffer` interface correctly.","cause":"The `file` argument passed to `parquetReadObjects` or `parquetMetadataAsync` is not a valid `AsyncBuffer` instance or a compatible object.","error":"TypeError: Cannot read properties of undefined (reading 'slice')"},{"fix":"Verify the integrity and structure of your Parquet file. Ensure the `parquetSchema` function is used correctly, and its output (e.g., `schema.children`) is what you expect to iterate over. Inspect `metadata` and `schema` objects before mapping.","cause":"This error often occurs when attempting to map over a non-array result, sometimes indicating that the Parquet file was malformed, or the schema parsing returned an unexpected structure.","error":"Uncaught (in promise) TypeError: x.map is not a function"}],"ecosystem":"npm"}