Slicerator
Slicerator is a Python library that provides a lazy-loading, fancy-sliceable iterable. It wraps existing classes or functions so that they behave like reusable generators with advanced slicing (slices, lists of integers, or boolean masks, as with NumPy arrays), and it loads data only when an item is explicitly accessed. The current version is 1.1.0, released in April 2022. Releases do not follow a fixed schedule, but the library remains functional and in use in downstream projects.
Common errors
-
TypeError: object is not subscriptable
cause You are slicing a regular Python object (e.g., an instance of `MyLazyLoader`) directly, without it being wrapped or decorated by `Slicerator`. Python raises this error when the object defines no `__getitem__` at all.
fix Decorate the class with `@Slicerator.from_class`, or wrap a function with `Slicerator.from_func`, to enable fancy slicing. For an undecorated class, replace `loader = MyLazyLoader()` with `loader = Slicerator.from_class(MyLazyLoader)()`; if `MyLazyLoader` is already decorated, `loader = MyLazyLoader()` is enough.
-
TypeError: object of type 'MyLazyLoader' has no len()
cause `Slicerator` needs to know the total number of items. With `@Slicerator.from_class` it reads the length from the class's `__len__` method; with `Slicerator.from_func` it reads it from the `length` argument.
fix Implement a `__len__` method in your class that returns the total number of items, or pass the `length` argument when creating a `Slicerator` with `Slicerator.from_func`.
-
AttributeError: 'Slicerator' object has no attribute 'my_custom_attribute'
cause Attributes of the original class are not propagated to the `Slicerator` object by default. You accessed an attribute defined on your base class on the Slicerator object returned by slicing.
fix List the attribute explicitly in one of three ways: pass `propagate_attrs` when calling `Slicerator.from_class` (e.g., `MyLazyLoader = Slicerator.from_class(MyLazyLoader, propagate_attrs=['my_custom_attribute'])`), apply the `@propagate_attr` decorator to the method or property within your class, or define a `propagate_attrs` class attribute.
Warnings
- gotcha `list(my_slicerator)` loads every item of the Slicerator into memory at once, and a complete `for item in my_slicerator:` loop ends up loading every item as well, so either one gives up the benefit of lazy loading for the whole dataset. Slice down to the items you need before triggering a full evaluation.
- gotcha Attribute propagation from the original class to the Slicerator object has three mechanisms (the `propagate_attrs` parameter, the `@propagate_attr` decorator, and a `propagate_attrs` class attribute). Combining them makes it hard to predict which attributes a sliced object exposes, so check how they take precedence before mixing them.
- breaking Version 0.9.8 introduced changes that let `Pipelines` modify properties. Existing pipelines that relied on properties being immutable, or on specific side effects, may behave differently after upgrading from an earlier version.
Install
-
pip install slicerator
Imports
- Slicerator
from slicerator import Slicerator
- pipeline
from slicerator import pipeline
Quickstart
from slicerator import Slicerator

@Slicerator.from_class
class MyLazyLoader:
    def __getitem__(self, i):
        # This method is wrapped by Slicerator to accept slices, lists of integers, or boolean masks.
        # The code below will only execute when an individual item (integer index) is requested.
        # In a real application, this would load data from disk, a database, or network.
        print(f"[DEBUG] Loading item {i}")
        return f"data_item_{i}"

    def __len__(self):
        # Slicerator needs __len__ for proper slicing (e.g., reverse slicing or knowing slice bounds).
        print("[DEBUG] Calculating total length")
        return 10

# Instantiate the lazy loader
loader = MyLazyLoader()

# Create a Slicerator object by slicing. No data is loaded yet.
sliced_data = loader[::2]  # Get every second item
print(f"\nInitial slice created: {sliced_data}")

# Further slice the Slicerator. Still no data loaded.
sub_sliced_data = sliced_data[1:]  # Get elements from index 1 onwards of the sliced_data
print(f"Further sliced: {sub_sliced_data}")

# Access a single item - this triggers loading for that specific item.
first_item = sub_sliced_data[0]
print(f"First item accessed: {first_item}")

# Convert to list - this triggers loading for all items in `sub_sliced_data`.
all_items = list(sub_sliced_data)
print(f"All items loaded: {all_items}")