Fasttransform
Fasttransform is a Python library for creating reusable, reversible, and extensible data transformations. It is a core building block for data pipelines, particularly within the fastai ecosystem, and uses multiple dispatch to specialize transforms by argument type. The current version is 0.0.2, and releases appear to be infrequent.
Warnings
- breaking Migration from `fastcore.dispatch`'s `typedispatch` to `plum-dispatch` may introduce `AmbiguousLookupError` where `fastcore` would silently pick a function. `plum` is stricter to prevent unexpected behavior.
- breaking The `Pipeline` class has moved from `fastcore` to `fasttransform`. Older `fastai` versions (specifically 2.7.x) might expect `Pipeline` in `fastcore`, leading to `ModuleNotFoundError` when loading models or using certain functionalities.
- gotcha When implementing custom reversible transforms, remember to define both `encodes` and `decodes` methods for the `Transform` class or pass both functions to the `Transform` constructor. Forgetting the `decodes` method will prevent reversibility.
- gotcha Fasttransform's type-based multiple dispatch, powered by `plum-dispatch`, will return the original input if no matching type annotation is found for an argument. This can lead to unexpected no-op behavior if types are not correctly annotated or handled.
- gotcha An unrelated library named `fastflowtransform` also exists on PyPI. Make sure you install and import `fasttransform`, the data pipeline transformation library described here.
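The two gotchas about missing `decodes` and unmatched type annotations can be illustrated with a small standard-library model. This is a minimal sketch, not fasttransform's implementation: `ToyTransform`, `double`, and `halve` are hypothetical names, the type check uses plain `isinstance` rather than `plum-dispatch`, and treating a missing decoder as the identity is an assumption made for illustration.

```python
from typing import Any, Callable, Optional, get_type_hints


class ToyTransform:
    """Hypothetical stand-in for a reversible transform (not the library class)."""

    def __init__(self, enc: Callable, dec: Optional[Callable] = None):
        self.enc, self.dec = enc, dec

    @staticmethod
    def _apply(fn: Optional[Callable], x: Any) -> Any:
        if fn is None:
            return x  # assumption: no function supplied means pass-through
        hints = get_type_hints(fn)
        hints.pop("return", None)
        expected = next(iter(hints.values()), None)  # first annotated parameter
        if expected is not None and not isinstance(x, expected):
            return x  # argument type does not match the annotation: silent no-op
        return fn(x)

    def __call__(self, x: Any) -> Any:
        return self._apply(self.enc, x)

    def decode(self, x: Any) -> Any:
        return self._apply(self.dec, x)


def double(x: int) -> int:
    return x * 2


def halve(x: int) -> int:
    return x // 2


reversible = ToyTransform(double, halve)
one_way = ToyTransform(double)  # decoder forgotten

print(reversible(3))         # int matches the annotation, so double runs
print(reversible.decode(6))  # halve undoes the encoding
print(reversible("abc"))     # str does not match int, input comes back unchanged
print(one_way.decode(6))     # no decoder, so nothing is undone
```

In this model both failure modes are silent: the value simply comes back unchanged. That is why it can help to assert on the type or value of a transform's output while developing a pipeline.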
Install
- pip install fasttransform
- pip install git+https://github.com/AnswerDotAI/fasttransform.git
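After installing, it may be worth confirming which related distributions are actually present, since the warnings above involve `fastcore`, `fastai`, and the similarly named `fastflowtransform`. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list is just the set of names mentioned in this document.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the warnings in this document.
report = installed_versions(
    ["fasttransform", "fastcore", "fastai", "fastflowtransform"]
)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

If `fastflowtransform` shows a version while `fasttransform` does not, the wrong package was probably installed.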
Imports
- Transform
from fasttransform import Transform
- Pipeline
from fasttransform import Pipeline
Quickstart
from fasttransform import Transform, Pipeline
# Create a simple transform using a decorator
@Transform
def add_one(x: int) -> int:
    return x + 1

# Create a reversible transform
def multiply_by_two_encodes(x: int) -> int:
    return x * 2

def multiply_by_two_decodes(x: int) -> int:
    return x // 2

MultiplyByTwo = Transform(multiply_by_two_encodes, multiply_by_two_decodes)
# Use a Pipeline to chain transforms
my_pipeline = Pipeline([add_one, MultiplyByTwo])
# Demonstrate usage
result_encoded = my_pipeline(5)
print(f"Encoded result: {result_encoded}") # Expected: (5 + 1) * 2 = 12
result_decoded = MultiplyByTwo.decode(result_encoded)
print(f"Decoded result (MultiplyByTwo): {result_decoded}") # Expected: 12 // 2 = 6
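The quickstart decodes only through `MultiplyByTwo`, which is why the result is 6 rather than the original 5: `add_one` has no decoder. The following standard-library sketch models what decoding a whole chain might look like under two assumptions: decoding walks the steps in reverse order, and a step without a decoder leaves the value unchanged. The helpers `encode_all` and `decode_all` are hypothetical and do not use the fasttransform API.

```python
from typing import Callable, List, Optional, Tuple

# Each step is an (encode, decode) pair; decode may be None for one-way steps.
Step = Tuple[Callable[[int], int], Optional[Callable[[int], int]]]


def encode_all(steps: List[Step], x: int) -> int:
    """Apply every encoder in order."""
    for enc, _ in steps:
        x = enc(x)
    return x


def decode_all(steps: List[Step], x: int) -> int:
    """Apply decoders in reverse order, skipping steps that have none."""
    for _, dec in reversed(steps):
        if dec is not None:
            x = dec(x)
    return x


steps: List[Step] = [
    (lambda x: x + 1, None),              # like add_one: no decoder
    (lambda x: x * 2, lambda x: x // 2),  # like MultiplyByTwo: reversible
]

encoded = encode_all(steps, 5)
decoded = decode_all(steps, encoded)
print(encoded)  # (5 + 1) * 2 = 12
print(decoded)  # 12 // 2 = 6; the +1 is never undone
```

To recover the original input in this model, the first step would also need a decoder such as `lambda x: x - 1`.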