Flupy: Fluent Data Processing
Flupy is a lightweight Python library and command-line interface (CLI) for building data pipelines with a fluent, chainable interface. It is built on generators, so it processes data lazily and in constant memory, which makes it suitable for large datasets. The current stable version is 1.2.3, and the project is actively developed.
Warnings
- gotcha Flupy pipelines are built on generators and evaluate lazily. Each operation returns a new lazy pipeline, and no data is processed until the pipeline is iterated (e.g., in a `for` loop or by calling `.collect()`). A pipeline can be consumed only once: if you assign it to a variable and iterate it, iterating the same variable again yields nothing. Rebuild the pipeline from its source to iterate again.
- gotcha Methods on the `flu` object (like `.map()`, `.filter()`) return *new* `flu` instances, in keeping with a functional programming style. They do not modify the original iterable in place, so calling a chained method and discarding its return value has no visible effect. Capture the returned pipeline to use the transformed data.
- gotcha The `flupy` library ships with a command-line interface (CLI) named `flu`. In the CLI, input data is automatically assigned to a variable named `_`. This `_` variable exists only in the CLI context. In Python code, wrap your data with `flu(...)` yourself rather than relying on `_`.
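The single-use behavior in the first warning comes from Python generators themselves, so it can be reproduced without Flupy. The sketch below uses a plain generator expression as a stand-in for a pipeline; `build_pipeline` is a hypothetical helper name, not part of Flupy. It shows that a second pass yields nothing, that rebuilding gives a fresh pipeline, and that the source list is left unchanged.

```python
def build_pipeline(source):
    """Hypothetical helper: return a fresh lazy pipeline on every call."""
    squared = (x * x for x in source)           # lazy: nothing runs yet
    return (x for x in squared if x % 2 == 0)   # still lazy

data = [1, 2, 3, 4]

pipeline = build_pipeline(data)
first_pass = list(pipeline)    # consumes the generator
second_pass = list(pipeline)   # already exhausted, so this is empty

fresh_pass = list(build_pipeline(data))  # rebuild to iterate again

print(first_pass)   # [4, 16]
print(second_pass)  # []
print(fresh_pass)   # [4, 16]
print(data)         # [1, 2, 3, 4] (the source list is untouched)
```

Wrapping pipeline construction in a function like this is one way to get a re-iterable pipeline; the same pattern should carry over to a function that returns a `flu(...)` chain.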
Install
pip install flupy
Imports
- flu
from flupy import flu
Quickstart
from itertools import count
from flupy import flu
# Example: Process an infinite sequence in constant memory
pipeline = (
    flu(count())                      # Start with an infinite sequence
    .map(lambda x: x**2)              # Square each number
    .filter(lambda x: x % 517 == 0)   # Keep only multiples of 517
    .chunk(5)                         # Group into chunks of 5
    .take(3)                          # Take the first 3 chunks
)

# Nothing has been computed yet; iterating drives the pipeline.
results = []
for item in pipeline:
    results.append(item)

print(results)
# Expected output (deterministic: the squares of 0, 517, 1034, ..., 7238 in chunks of 5):
# [[0, 267289, 1069156, 2405601, 4276624], [6682225, 9622404, 13097161, 17106496, 21650409], [26728900, 32341969, 38489616, 45171841, 52388644]]
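To cross-check the expected output without installing Flupy, here is a rough standard-library equivalent of the same steps. It is only a sketch of what the chain computes, not a description of how Flupy is implemented; `chunked` is a hypothetical helper written for this example.

```python
from itertools import count, islice

def chunked(iterable, size):
    """Hypothetical helper: yield successive lists of up to `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

squares = (x**2 for x in count())                  # infinite, lazy
multiples = (x for x in squares if x % 517 == 0)   # keep multiples of 517
results = list(islice(chunked(multiples, 5), 3))   # first 3 chunks of 5

print(results[0])  # [0, 267289, 1069156, 2405601, 4276624]
```

Because every stage is lazy, the infinite `count()` source is only advanced as far as needed to fill three chunks.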