RudderStack Profiles
RudderStack Profiles (version 0.25.2) is a warehouse-native semantic modeling layer that enables data teams to create unified customer profiles by transforming declarative YAML configurations into optimized SQL. It focuses on identity resolution and feature engineering within your data warehouse. The project is actively maintained with frequent releases and is delivered primarily as a CLI tool, `pb`.
Common errors
- 'pb' is not recognized as an internal or external command, operable program or batch file.
  cause: The Python installation's script directory (where 'pb' is installed) is not in your system's PATH, or the terminal session hasn't refreshed.
  fix: Restart your terminal. If the error persists, ensure Python and pip are correctly installed and added to your PATH. On Windows, a fresh reinstall might be needed: `pip3 uninstall profiles-rudderstack-bin; pip3 uninstall profiles-rudderstack; pip3 install profiles-rudderstack --no-cache-dir`.
- pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. profiles-pycorelib X requires profiles-rudderstack!=Y,<=Z,>=A, but you have profiles-rudderstack B which is incompatible.
  cause: Dependency conflicts, often when upgrading `profiles-rudderstack` without also updating `profiles-pycorelib` in a compatible way.
  fix: Perform a clean uninstall and reinstall: `pip3 uninstall profiles-pycorelib; pip3 uninstall profiles-rudderstack; pip3 install profiles-rudderstack --no-cache-dir`.
- Could not find parent table for alias "".
  cause: This error typically occurs during the ID stitcher model run when RudderStack cannot resolve a referenced table or alias within your YAML configurations.
  fix: Review your `profiles.yaml` and `inputs.yaml` files. Ensure all table names and aliases are correctly specified and accessible in your connected warehouse. Validate your project using `pb validate` for syntax errors.
- ERROR: (100033): User does not have privileges to operate on schema 'YOUR_SCHEMA'.
  cause: The warehouse user configured in `siteconfig.yaml` lacks the necessary permissions to read from input schemas or write to output schemas.
  fix: Grant the configured warehouse user read permissions on source schemas and write permissions on the schema where Profiles will output its tables. Refer to the RudderStack documentation for specific warehouse permission requirements.
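For the first error in the list above ('pb' not recognized), it can help to confirm whether `pb` is actually reachable from PATH before reinstalling. The following is a minimal diagnostic sketch using only the Python standard library; the helper name `diagnose_pb` is hypothetical and not part of Profiles. It assumes `pb` was installed with the same interpreter that runs the script and with the default install scheme (a `pip install --user` install places scripts in a different directory).

```python
import shutil
import sysconfig


def diagnose_pb():
    """Return (pb_path, scripts_dir); pb_path is None when pb is not on PATH."""
    # shutil.which searches the directories listed in PATH, as the shell would.
    pb_path = shutil.which("pb")
    # Directory where this interpreter's default install scheme puts console scripts.
    scripts_dir = sysconfig.get_path("scripts")
    return pb_path, scripts_dir


if __name__ == "__main__":
    pb_path, scripts_dir = diagnose_pb()
    if pb_path:
        print(f"pb found at: {pb_path}")
    else:
        print("pb was not found on PATH.")
        print(f"Check whether this directory is on PATH: {scripts_dir}")
```

If the script reports that `pb` was not found, adding the printed directory to PATH and restarting the terminal is the first thing to try.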
Warnings
- breaking: Upgrading to a new version might require manually uninstalling and reinstalling associated libraries such as `profiles-pycorelib` to resolve dependency conflicts.
- gotcha: The 'pb' command might not be recognized after installation, especially on Windows or if Python paths are not correctly configured.
- gotcha: The warehouse connection requires read permissions on source data schemas and write permissions on the target schema for Profiles outputs. Insufficient permissions will lead to runtime errors.
- gotcha: Cross-database references can fail on some Redshift clusters, and `pb insert` is not supported for Redshift, Databricks, and BigQuery.
- gotcha: Linux users might encounter a `DBUS_SESSION_BUS_ADDRESS` warning during command runs, which can usually be ignored.
Install
- pip install profiles-rudderstack
- pip install profiles-rudderstack==0.25.0b5
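After installing, or when chasing the dependency conflict between `profiles-rudderstack` and `profiles-pycorelib`, it may be useful to see which versions are actually installed. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the check only reports versions, it does not evaluate whether they are compatible.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the pip commands in this document.
PACKAGES = ("profiles-rudderstack", "profiles-pycorelib")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the same interpreter that `pip` installs into; otherwise the reported versions may describe a different environment.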
Imports
- CLI Tool (pb)
Interaction is primarily via the 'pb' command-line interface.
Quickstart
# 1. Install Profiles Builder
pip install profiles-rudderstack

# 2. Verify installation
pb version

# 3. Create warehouse connection (follow prompts)
pb init connection

# 4. Initialize a new Profiles project in a directory 'my-project'
pb init pb-project -o my-project
cd my-project

# 5. Open pb_project.yaml and set 'connection:' to your connection name.
#    Edit inputs.yaml and profiles.yaml with your data sources and features.

# 6. Validate warehouse access
pb validate access

# 7. Run the project to compile SQL and execute on your warehouse
pb run
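The non-interactive quickstart steps (verify, validate access, run) could also be scripted, for example in a scheduled job. The following is a rough sketch, not an official integration: `run_steps` and its `runner` parameter are hypothetical names, and it assumes `pb` is on PATH and that the project directory from step 4 already exists and is configured.

```python
import os
import shutil
import subprocess

# Non-interactive commands taken from the quickstart, in order.
STEPS = (
    ["pb", "version"],
    ["pb", "validate", "access"],
    ["pb", "run"],
)


def run_steps(steps=STEPS, cwd=None, runner=subprocess.run):
    """Run each command in order, stopping at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in steps:
        # check=False so the return code can be inspected instead of raising.
        result = runner(cmd, cwd=cwd, check=False)
        if result.returncode != 0:
            print(f"Step failed ({result.returncode}): {' '.join(cmd)}")
            break
        completed.append(cmd)
    return completed


if __name__ == "__main__":
    project_dir = "my-project"  # directory created in step 4 of the quickstart
    if shutil.which("pb") is None or not os.path.isdir(project_dir):
        print("pb or the project directory is missing; complete steps 1-5 first.")
    else:
        run_steps(cwd=project_dir)
```

Stopping at the first failure keeps `pb run` from executing against the warehouse when access validation has already failed. The `runner` parameter exists only so the sequence can be exercised with a stand-in instead of the real CLI.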