Ploomber Core
Ploomber-core is a foundational Python library providing common utilities and functionality reused across projects within the Ploomber ecosystem. It includes modules for deprecations, telemetry, exceptions, and validations. As a core component, it supports the main Ploomber library, which is a framework for building modular data pipelines, integrating with Jupyter, and deploying to various platforms like Airflow and Kubernetes. The library is actively maintained, with a current version of 0.2.27, and the broader Ploomber project follows semantic versioning with frequent minor releases.
Common errors
- An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
  cause: Running Ploomber tasks (specifically `ploomber.tasks.PythonCallable`) from a script on macOS or Windows without the `if __name__ == '__main__':` guard. On these platforms child processes are started with the spawn method, which re-imports the script and triggers this multiprocessing error.
  fix: Wrap the main execution block of your script in `if __name__ == '__main__':` so that child processes initialize properly.
- Pipeline build failed due to an error in a downstream task, but the root cause is difficult to trace to an upstream data quality issue (e.g., unexpected nulls, incorrect join cardinality).
  cause: Unverified data assumptions or faulty logic in an upstream task let bad data propagate, causing a seemingly unrelated failure in a later task.
  fix: Use Ploomber's `on_finish` hooks to validate data quality and expectations immediately after each task runs, so that issues are caught close to their source.
- Incompatibility with ploomber-core when running 'pip install ploomber' or 'ploomber install'.
  cause: An older or incompatible version of `ploomber-core` is present or being installed, conflicting with the requirements of the main `ploomber` package.
  fix: Update `ploomber-core` to a version compatible with your `ploomber` installation. Use `pip install --upgrade ploomber ploomber-core` or consult the `ploomber` changelog for specific version requirements.
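The data-quality check suggested for the second error above can be sketched as a plain function. This is a minimal sketch, not Ploomber's own code: `check_no_missing_values` is a hypothetical name, and it assumes that a hook may accept a `product` argument whose string form is the path to a CSV file. It uses only the standard library, and the temporary files at the bottom stand in for real task outputs.

```python
import csv
import os
import tempfile


def check_no_missing_values(product):
    """Hypothetical on_finish-style hook: raise if the CSV output has empty cells.

    Assumption: str(product) is the path of the file the task just wrote.
    """
    path = str(product)
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames or []

    if not rows:
        raise ValueError(f"{path}: no data rows")

    # Line 1 is the header, so the first data row is line 2.
    for line_number, row in enumerate(rows, start=2):
        missing = [c for c in columns if row[c] is None or row[c].strip() == ""]
        if missing:
            raise ValueError(f"{path}: line {line_number} has empty values in {missing}")


# Demo with stand-in files instead of real task products.
message = None
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.csv")
    with open(good, "w", newline="") as f:
        f.write("id,amount\n1,10\n2,20\n")
    check_no_missing_values(good)  # passes silently

    bad = os.path.join(tmp, "bad.csv")
    with open(bad, "w", newline="") as f:
        f.write("id,amount\n1,10\n2,\n")
    try:
        check_no_missing_values(bad)
    except ValueError as error:
        message = str(error)

print(message)
```

In a pipeline, a function like this would be attached to the task as its `on_finish` hook instead of being called by hand, so the build stops at the task that produced the bad data.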
Warnings
- gotcha When running PythonCallable tasks (from the main Ploomber library, which uses ploomber-core) in a script on macOS or Windows, multiprocessing issues can arise if the task execution isn't protected by `if __name__ == '__main__':`.
- gotcha Ploomber tasks (relying on ploomber-core's foundations) are designed to produce at least one output (a 'product', e.g., a file or database entry) to enable incremental builds and dependency tracking. Tasks that generate no explicit output can break pipeline logic or prevent caching benefits.
- breaking The main Ploomber library (which depends on ploomber-core) follows semantic versioning. While the version stays below 1.0, a bump from `0.x` to `0.y` can introduce API-incompatible changes. Such changes are typically preceded by a `FutureWarning` for two minor releases.
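The guard described in the first warning follows the standard `multiprocessing` pattern. The sketch below uses only the standard library and no Ploomber code. `run_task` and `main` are hypothetical names standing in for task code and for the script's entry point.

```python
import multiprocessing


def run_task(name, results):
    """Stand-in for task code that runs in a child process."""
    results.put(f"{name} done")


def main():
    results = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=run_task, args=(name, results))
        for name in ("load", "clean")
    ]
    for worker in workers:
        worker.start()
    # Read the results before joining so the children can flush their queues.
    outputs = sorted(results.get() for _ in workers)
    for worker in workers:
        worker.join()
    return outputs


# Without this guard, a child process started with the spawn method would
# re-import the script and try to start processes of its own.
if __name__ == "__main__":
    print(main())
```

Everything that starts processes lives inside `main()`, so importing the module (which is what a spawned child does) only defines functions and has no side effects.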
Install
- pip install ploomber-core
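After installing, it can help to confirm which versions are present, since a mismatch between `ploomber` and `ploomber-core` is one source of install errors. The following is a minimal sketch using the standard library's `importlib.metadata`. `installed_version` is a hypothetical helper, not part of ploomber-core.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


for name in ("ploomber", "ploomber-core"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```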
Imports
- Telemetry
from ploomber_core.telemetry import Telemetry
- deprecated
import ploomber_core.deprecated
- exceptions
import ploomber_core.exceptions
- validations
import ploomber_core.validations
Quickstart
from ploomber_core.telemetry import Telemetry
# Create a Telemetry client. In a real application, this is often managed by Ploomber's main library.
# Assumption: the constructor takes `api_key`, `package_name`, and `version`;
# confirm this against the installed release. The values below are placeholders.
telemetry = Telemetry(
    api_key='YOUR_API_KEY',
    package_name='my-package',
    version='0.1.0',  # Typically, this would be `__version__` from the calling package
)
print(f"Telemetry client created for {telemetry.package_name} (version {telemetry.version}).")
# Assumption: setting the environment variable PLOOMBER_STATS_ENABLED=false opts out of tracking.
# Example: In a larger application, you might log events (this sends a network request):
# telemetry.log_api('some-internal-action')