Hyparquet

1.25.6 · active · verified Sun Apr 19

Hyparquet is a pure JavaScript library for parsing Apache Parquet files directly in web browsers and Node.js. It retrieves data efficiently from cloud storage by using HTTP range requests, so Parquet files can be queried over the network without a server-side intermediary. The library has been dependency-free since 2023, which keeps it lightweight. The current stable version is 1.25.6, and the project is under active development. Key differentiators include minimizing the data fetched through selective row and column reads, support for all Parquet types, encodings, and compression codecs, and bundled TypeScript definitions. It is particularly well suited to data engineering, data science, and machine learning applications where large Parquet datasets need to be accessed and processed client-side.
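
To make the range-request idea concrete, here is a minimal sketch of how any Parquet reader can locate the file metadata without downloading the whole file. It relies only on the Parquet file layout (the file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`). The helper names `rangeHeader` and `footerRange` are hypothetical and are not part of hyparquet, and this is not a description of hyparquet's internals.

```javascript
// Parquet trailer layout: [metadata bytes][4-byte LE metadata length]["PAR1"]
const MAGIC = 'PAR1'

// Build an HTTP Range header value for the half-open byte range [start, end).
// HTTP ranges are inclusive, hence the `end - 1`.
function rangeHeader(start, end) {
  return `bytes=${start}-${end - 1}`
}

// Given the last 8 bytes of a file and the total file size,
// return the byte range [start, end) that holds the footer metadata.
function footerRange(tail, fileSize) {
  if (tail.byteLength !== 8) throw new Error('expected exactly 8 trailing bytes')
  const view = new DataView(tail)
  let magic = ''
  for (let i = 4; i < 8; i++) magic += String.fromCharCode(view.getUint8(i))
  if (magic !== MAGIC) throw new Error('not a Parquet file')
  const metadataLength = view.getUint32(0, true) // little-endian
  const end = fileSize - 8
  const start = end - metadataLength
  if (start < 0) throw new Error('metadata length exceeds file size')
  return { start, end }
}

// Hypothetical example: a 1000-byte file whose footer metadata is 120 bytes long
const tail = new ArrayBuffer(8)
const tailView = new DataView(tail)
tailView.setUint32(0, 120, true)
for (let i = 0; i < 4; i++) tailView.setUint8(4 + i, MAGIC.charCodeAt(i))

console.log(rangeHeader(1000 - 8, 1000)) // first request: bytes=992-999
const footer = footerRange(tail, 1000)
console.log(rangeHeader(footer.start, footer.end)) // second request: bytes=872-991
```

Once the metadata is parsed, a reader knows the byte offsets of each row group and column chunk, which is what makes selective column and row reads possible over HTTP.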

Quickstart

Demonstrates how to fetch and parse a remote Parquet file in a browser or Node.js using HTTP range requests, filtering for specific columns and rows.

import { asyncBufferFromUrl, parquetReadObjects } from 'hyparquet'

async function fetchData() {
  const url = 'https://hyperparam-public.s3.amazonaws.com/bunnies.parquet'
  // Wrap the URL for asynchronous fetching with HTTP range requests
  const file = await asyncBufferFromUrl({ url })
  // Read objects, filtering by specific columns and rows for efficiency
  const data = await parquetReadObjects({
    file,
    columns: ['Breed Name', 'Lifespan'],
    rowStart: 10,
    rowEnd: 20,
  })
  console.log('Fetched data:', data)
}

fetchData().catch(console.error)
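
The `file` value in the quickstart is an asynchronous buffer: an object that reports its total size and returns byte ranges on demand. The sketch below is an in-memory stand-in that records which ranges are requested, which can help when checking how much of a file a read actually touches. It assumes that an object with a `byteLength` and an async `slice(start, end)` is enough to act as `file`, which is how hyparquet's `AsyncBuffer` type is understood here. The `recordingBuffer` helper and its `requests` log are hypothetical and not part of the library.

```javascript
// Wrap an in-memory ArrayBuffer in an object shaped like an async buffer:
// a byteLength plus an async slice(). Every requested range is logged.
function recordingBuffer(arrayBuffer) {
  const requests = []
  return {
    byteLength: arrayBuffer.byteLength,
    requests, // hypothetical extra field, used here only for inspection
    // Resolve to the bytes in [start, end); `end` defaults to the buffer's end
    async slice(start, end = arrayBuffer.byteLength) {
      requests.push({ start, end })
      return arrayBuffer.slice(start, end)
    },
  }
}

// Usage with dummy bytes (not a real Parquet file)
const bytes = new Uint8Array([10, 20, 30, 40, 50, 60]).buffer
const file = recordingBuffer(bytes)

file.slice(2, 4).then(chunk => {
  console.log(Array.from(new Uint8Array(chunk))) // the two bytes 30 and 40
  console.log(file.requests) // one logged range: start 2, end 4
})
```

A network-backed version would replace the body of `slice` with a `fetch` call carrying a `Range` header, which is the role `asyncBufferFromUrl` plays in the quickstart.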
