T-Digest data structure

0.5.2.2 · active · verified Thu Apr 16

The `tdigest` library is a Python implementation of Ted Dunning's t-digest data structure, designed for efficient, accurate estimation of percentiles and quantiles from streaming or distributed data. In addition to percentiles and quantiles, it can compute trimmed means. The current PyPI release is 0.5.2.2. Releases are occasional and focus mainly on performance improvements and bug fixes.
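To make the idea behind the data structure concrete, here is a toy sketch in plain Python. It is not the `tdigest` library's implementation: the class `ToyDigest`, its method names, and the exact size rule are assumptions chosen for illustration. The sketch keeps a sorted list of weighted centroids, lets a centroid absorb a new point only while it stays under a size limit that shrinks toward the tails, and merges two summaries by re-inserting centroids as weighted points, which is what makes the approach usable on distributed data.

```python
import bisect
import random


class ToyDigest:
    """Simplified centroid summary in the spirit of t-digest (illustrative only)."""

    def __init__(self, delta=0.01):
        self.delta = delta   # smaller delta -> more centroids, finer resolution
        self.means = []      # centroid means, kept in ascending order
        self.weights = []    # weights[i] = total weight absorbed by means[i]
        self.n = 0           # total weight seen so far

    def update(self, x, w=1):
        """Add value x with weight w."""
        self.n += w
        if not self.means:
            self.means.append(x)
            self.weights.append(w)
            return
        # Find the centroid whose mean is closest to x.
        i = bisect.bisect_left(self.means, x)
        if i == len(self.means):
            i -= 1
        elif i > 0 and x - self.means[i - 1] <= self.means[i] - x:
            i -= 1
        # Approximate quantile at the center of that centroid.
        q = (sum(self.weights[:i]) + self.weights[i] / 2) / self.n
        # Assumed size rule: centroids near the tails (q near 0 or 1) stay small.
        limit = 4 * self.n * self.delta * q * (1 - q)
        if self.weights[i] + w <= limit:
            # Absorb x by moving the weighted mean toward it.
            self.weights[i] += w
            self.means[i] += w * (x - self.means[i]) / self.weights[i]
        else:
            # Centroid is full: start a new one at x.
            j = bisect.bisect_left(self.means, x)
            self.means.insert(j, x)
            self.weights.insert(j, w)

    def quantile(self, q):
        """Estimate the value at quantile q in [0, 1] by linear interpolation."""
        if not self.means:
            raise ValueError("digest is empty")
        target = q * self.n
        cum = 0.0
        prev_mid = prev_mean = None
        for mean, weight in zip(self.means, self.weights):
            mid = cum + weight / 2  # cumulative weight at this centroid's center
            if target <= mid:
                if prev_mid is None:
                    return mean
                frac = (target - prev_mid) / (mid - prev_mid)
                return prev_mean + frac * (mean - prev_mean)
            prev_mid, prev_mean = mid, mean
            cum += weight
        return self.means[-1]

    def merge(self, other):
        """Return a new digest that summarizes both inputs."""
        merged = ToyDigest(self.delta)
        pairs = list(zip(self.means, self.weights))
        pairs += list(zip(other.means, other.weights))
        # Shuffle so centroids are not inserted in sorted order, which would
        # keep hitting the small tail limit and prevent absorption.
        random.Random(0).shuffle(pairs)
        for mean, weight in pairs:
            merged.update(mean, weight)
        return merged


rng = random.Random(42)
a, b = ToyDigest(), ToyDigest()
for _ in range(5000):
    a.update(rng.random())
    b.update(rng.random())

combined = a.merge(b)
print(len(combined.means), "centroids summarize", combined.n, "points")
print("estimated median:", combined.quantile(0.5))
```

Because only the centroids are stored, memory depends on `delta` rather than on the number of points, and the estimated median of the uniform sample should land close to 0.5. The real library exposes a similar workflow through `TDigest.update`, `TDigest.percentile`, and the `+` operator.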

Quickstart

Initializes a TDigest object, updates it with data either sequentially or in batches, and demonstrates how to compute percentiles and merge two digests. Requires `numpy` for random data generation.

import numpy as np
from tdigest import TDigest

# Create a TDigest instance
digest = TDigest()

# Update the digest sequentially with random data
for _ in range(5000):
    digest.update(np.random.random())

# Or update the digest in batches
another_digest = TDigest()
another_digest.batch_update(np.random.random(5000))

# Compute the 15th percentile
print(f"15th percentile (sequential): {digest.percentile(15)}")
print(f"15th percentile (batch): {another_digest.percentile(15)}")

# Merge two digests with +; the result summarizes the data from both
sum_digest = digest + another_digest
print(f"30th percentile (summed): {sum_digest.percentile(30)}")
