{"id":6257,"library":"streamsets","title":"StreamSets Python SDK","description":"The StreamSets Python SDK enables developers to programmatically interact with StreamSets DataOps Platform components, including Control Hub, Data Collector, and Transformer. It facilitates automation of data pipeline creation, management, monitoring, and deployment workflows. The library is currently at version 6.6.2 and receives regular updates to support new platform features and provide bug fixes.","status":"active","version":"6.6.2","language":"en","source_language":"en","source_url":"https://github.com/onefoursix/streamsets-platform-sdk-examples","tags":["data integration","ETL","StreamSets","Control Hub","Data Collector","pipeline automation","dataops"],"install":[{"cmd":"pip install streamsets","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required interpreter version.","package":"Python","version":">=3.6, <3.14"}],"imports":[{"note":"Main class for interacting with StreamSets Control Hub.","symbol":"ControlHub","correct":"from streamsets.sdk import ControlHub"},{"note":"Main class for interacting with StreamSets Data Collector instances.","symbol":"DataCollector","correct":"from streamsets.sdk import DataCollector"},{"note":"As of SDK v3.11.0, Topology-related methods were moved from `ControlHub` to `streamsets.sdk.sch_models.Topology` for proper scope.","wrong":"from streamsets.sdk import ControlHub # for Topology methods","symbol":"Topology","correct":"from streamsets.sdk.sch_models import Topology"}],"quickstart":{"code":"import os\nfrom streamsets.sdk import ControlHub\n\n# Ensure CRED_ID and CRED_TOKEN are set as environment variables\n# Example: export CRED_ID='your_credential_id'\n# Example: export CRED_TOKEN='your_credential_token'\n\ncred_id = os.environ.get('CRED_ID', 'YOUR_CRED_ID')\ncred_token = os.environ.get('CRED_TOKEN', 'YOUR_CRED_TOKEN')\nsch_url = os.environ.get('SCH_URL', 'https://cloud.streamsets.com') # Or your on-prem Control 
Hub URL\n\n# Connect to Control Hub\ntry:\n    # Platform SDK (4.0.0+): ControlHub is initialized with credential_id and token\n    control_hub = ControlHub(credential_id=cred_id, token=cred_token)\n    print(\"Successfully connected to Control Hub\")\n\n    # Example: List registered Data Collectors\n    data_collectors = control_hub.data_collectors\n    if data_collectors:\n        print(\"Available Data Collectors:\")\n        for dc in data_collectors:\n            print(f\" - {dc.name} (ID: {dc.id})\")\n    else:\n        print(\"No Data Collectors found.\")\n\n    # Example: Build a simple pipeline in memory (nothing is published to Control Hub yet).\n    # The pipeline builder is tied to an engine, so a registered Data Collector is required.\n    # To run the pipeline, publish it with control_hub.publish_pipeline() and then create a job for it.\n    if data_collectors:\n        print(\"\\nExample: Building a simple in-memory pipeline (not yet published to Control Hub).\")\n        pipeline_builder = control_hub.get_pipeline_builder(engine_type='data_collector', engine_id=data_collectors[0].id)\n        dev_data_generator = pipeline_builder.add_stage('Dev Data Generator')\n        trash = pipeline_builder.add_stage('Trash')\n        dev_data_generator >> trash # Connect stages\n\n        my_pipeline = pipeline_builder.build('My First SDK Pipeline')\n        print(f\"Built pipeline object: {my_pipeline.name}\")\n        # To publish this pipeline, you would use control_hub.publish_pipeline(my_pipeline)\n\nexcept Exception as e:\n    print(f\"Error connecting to Control Hub or performing operations: {e}\")","lang":"python","description":"This quickstart demonstrates how to connect to StreamSets Control Hub using API credentials and list available Data Collectors. It also includes an example of how to initialize a `PipelineBuilder` to create an in-memory pipeline definition. 
Remember that creating an object in code does not automatically deploy it to StreamSets; explicit `publish` or `add` calls are required."},"warnings":[{"fix":"Verify and align your installed SDK version with your StreamSets product (Platform or Legacy). Upgrade SDK to >=4.0.0 for Platform or use <4.0.0 for Legacy products.","message":"StreamSets SDK for Python versions 4.0.0 and higher are designed for the StreamSets DataOps Platform, while versions below 4.0.0 support legacy StreamSets products. Ensure your SDK version matches the StreamSets product version you intend to interact with for compatibility.","severity":"breaking","affected_versions":"<4.0.0"},{"fix":"Update imports and method calls to use `from streamsets.sdk.sch_models import Topology` and interact with `Topology` objects directly for topology management.","message":"Starting with SDK v3.11.0, methods related to Topologies were moved from the `streamsets.sdk.ControlHub` class to `streamsets.sdk.sch_models.Topology`. Existing code using `control_hub.get_topology_by_name()` or similar directly on the `ControlHub` object will fail.","severity":"breaking","affected_versions":">=3.11.0"},{"fix":"Refer to the updated documentation for `streamsets.sdk.sdc_models.Snapshot` to adjust your code to the new syntax for pipeline snapshot interactions.","message":"The SDK for Python v3.10.0 refactored SDC pipeline snapshots. Upgrading to 3.10.0 or later without modifying existing code to use the new snapshot syntax will result in execution failures.","severity":"breaking","affected_versions":">=3.10.0"},{"fix":"Generate API credentials (Credential ID and Token) in your StreamSets Control Hub instance and use them to initialize the `ControlHub` object, preferably via environment variables.","message":"For StreamSets SDK versions that interact with Control Hub (typically 4.0.0+), authentication requires API credentials (`CRED_ID` and `CRED_TOKEN`), which must be generated in Control Hub. 
This replaces the username and password authentication used by SDK v3.x.","severity":"gotcha","affected_versions":">=4.0.0"},{"message":"Creating an object (e.g., a `PipelineBuilder` or a pipeline object) in a Python script does not create it in StreamSets Control Hub or Data Collector. You must explicitly add or publish the object (e.g., `control_hub.publish_pipeline(pipeline)`) for it to appear in the StreamSets environment.","severity":"gotcha"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}