Singer SDK
The Singer SDK (Software Development Kit) is a Python framework for building Singer taps (data extractors) and targets (data loaders). It simplifies the creation of data connectors that comply with the open-source Singer Spec, handling much of the boilerplate for configuration, schema discovery, state management, and data serialization. The library is actively maintained by Meltano and the Singer community, with frequent releases that deliver bug fixes, new features, and deprecations.
Warnings
- breaking: SQL-related classes (e.g., `SQLTap`, `SQLStream`, `SQLSink`, `SQLConnector`, `SQLTarget`) will be moved from top-level `singer_sdk` imports to `singer_sdk.sql`.
- breaking: The functions `singer_sdk.testing.get_standard_tap_tests` and `singer_sdk.testing.get_standard_target_tests` will be removed. They are replaced by `singer_sdk.testing.get_tap_test_class` and `singer_sdk.testing.get_target_test_class`, which generate richer test suites.
- breaking: The `PyJWT` and `cryptography` libraries (used for JWT authentication) and `SQLAlchemy` (used for SQL connectors) will no longer be installed by default. They will become optional extra dependencies.
- deprecated: The `Stream.reset_state_progress_marker` method is deprecated; its logic was never used at the stream level.
- gotcha: Prior to v0.53.6, specific interactions with the `simpleeval` dependency could lead to issues when using `json` within stream map expressions.
- gotcha: The SDK previously crashed with `SyntaxError: Unexpected end of JSON input` if an API returned an empty body with a `200` status code but without a `Content-Length: 0` header or a `204` status code.
- gotcha: When using stream map expressions, `NameNotDefined` errors (e.g., `'datetime' is not defined for expression 'datetime.datetime.now()'`) can occur if the `simpleeval` context does not have the necessary built-ins or if the SDK version is too old for specific functions.
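The empty-body gotcha above can also be guarded against in tap code. The following is a minimal sketch using only the standard library; `parse_json_body` is a hypothetical helper (not part of the SDK), and treating an empty body as "no data" is an assumption about how the API behaves.

```python
import json
from typing import Any, Optional


def parse_json_body(body: Optional[str]) -> Any:
    """Decode a JSON response body, returning None for an empty one.

    Hypothetical helper: an empty or whitespace-only body is treated as
    "no data" instead of being passed to the JSON decoder, which would raise.
    """
    if body is None or not body.strip():
        return None
    return json.loads(body)
```

In a `RESTStream` subclass, a check along these lines could be applied to `response.text` inside an overridden `parse_response` before any JSON decoding takes place.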
Install
- pip install singer-sdk
- pip install "singer-sdk[sql]"
- pip install "singer-sdk[jwt]"
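Because the JWT and SQL dependencies are moving to optional extras, a tap may want to fail early with a clear message when one is missing. The sketch below is one possible check, not an SDK feature: `missing_extras` and `OPTIONAL_MODULES` are hypothetical names, and the mapping of extras to importable modules (`jwt`, `cryptography`, `sqlalchemy`) is an assumption based on the libraries named in the warnings.

```python
from importlib.util import find_spec
from typing import Dict, List

# Assumed mapping: extra name -> top-level modules it is expected to provide.
OPTIONAL_MODULES: Dict[str, List[str]] = {
    "jwt": ["jwt", "cryptography"],
    "sql": ["sqlalchemy"],
}


def missing_extras(modules_by_extra: Dict[str, List[str]] = OPTIONAL_MODULES) -> Dict[str, List[str]]:
    """Return, per extra, the top-level modules that cannot be found."""
    missing: Dict[str, List[str]] = {}
    for extra, modules in modules_by_extra.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[extra] = absent
    return missing
```

A non-empty result for `"sql"`, for example, would suggest reinstalling with `pip install "singer-sdk[sql]"`.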
Imports
- Tap
from singer_sdk import Tap
- Stream
from singer_sdk import Stream
- RESTStream
from singer_sdk.streams import RESTStream
- th (typing helpers)
import singer_sdk.typing as th
- SQLTap, SQLStream, SQLSink, SQLConnector, SQLTarget
from singer_sdk.sql import SQLTap, SQLStream, SQLSink, SQLConnector, SQLTarget
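Since the SQL classes are moving from top-level `singer_sdk` to `singer_sdk.sql`, code that has to run against more than one SDK version may need a tolerant import. The following is a rough sketch of that idea using only `importlib`; `import_first` is a hypothetical helper, not something the SDK provides.

```python
import importlib
from typing import Any


def import_first(attr: str, *module_names: str) -> Any:
    """Return `attr` from the first listed module that can be imported and defines it."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or its parent package) is not available; try the next one.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of: {', '.join(module_names)}")
```

With the SDK installed, `SQLTap = import_first("SQLTap", "singer_sdk.sql", "singer_sdk")` would prefer the new location and fall back to the old one.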
Quickstart
import os

from singer_sdk import Tap
from singer_sdk.streams import RESTStream
import singer_sdk.typing as th


class UsersStream(RESTStream):
    """Users stream."""

    name = "users"
    # Base URL is read from the API_URL environment variable, with a placeholder default.
    url_base = os.environ.get("API_URL", "https://api.example.com")
    path = "/users"
    primary_keys = ["id"]
    # Each element of the top-level "data" array is emitted as one record.
    records_jsonpath = "$.data[*]"
    schema = th.PropertiesList(
        th.Property("id", th.IntegerType),
        th.Property("name", th.StringType),
        th.Property("email", th.StringType),
    ).to_dict()


class MyTap(Tap):
    """My custom tap."""

    name = "tap-myapi"
    config_jsonschema = th.PropertiesList(
        th.Property("api_url", th.StringType, required=True),
        th.Property("api_key", th.StringType, required=True, secret=True),
    ).to_dict()

    def discover_streams(self):
        """Return the streams this tap exposes."""
        return [UsersStream(self)]


# To run the tap (e.g., discover catalog or sync data):
# python -m your_tap_module --config config.json --discover > catalog.json
# python -m your_tap_module --config config.json --catalog catalog.json --state state.json > output.jsonl
# (stdout carries the full Singer message stream: SCHEMA, RECORD, and STATE messages)
if __name__ == "__main__":
    MyTap.cli()
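The run commands above expect a `config.json` containing the two required settings, `api_url` and `api_key`. One way to produce that file is sketched below with the standard library only. `build_config` and `write_config` are hypothetical helpers, and the `API_KEY` environment variable name is an assumption (only `API_URL` appears in the quickstart).

```python
import json
import os
from pathlib import Path
from typing import Dict, Mapping


def build_config(env: Mapping[str, str]) -> Dict[str, str]:
    """Build the tap config dict from environment-style variables."""
    # API_KEY is an assumed variable name; adjust to your own setup.
    missing = [name for name in ("API_URL", "API_KEY") if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return {"api_url": env["API_URL"], "api_key": env["API_KEY"]}


def write_config(path: str, env: Mapping[str, str] = os.environ) -> Path:
    """Write the config as JSON to `path` and return the file path."""
    target = Path(path)
    target.write_text(json.dumps(build_config(env), indent=2))
    return target
```

Calling `write_config("config.json")` with both variables set would create the file passed to `--config`. Since `api_key` is marked as a secret, the generated file is best kept out of version control.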