{"id":7792,"library":"tinsel","title":"tinsel: PySpark schema generator","description":"Tinsel is a lightweight Python library that simplifies PySpark DataFrame schema generation. It lets users define complex PySpark schemas with native Python types such as `NamedTuple` and `dataclasses`, removing the need for the verbose PySpark schema DSL. The library is small and fast, and it provides type shims for Spark types that have no direct Python equivalent. The current version is 0.3.0; the last public update was in September 2018, so the project is effectively in maintenance mode.","status":"maintenance","version":"0.3.0","language":"en","source_language":"en","source_url":"https://github.com/benchsci/tinsel","tags":["pyspark","schema","dataclasses","namedtuple","data-engineering","etl"],"install":[{"cmd":"pip install tinsel","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Core functionality relies on PySpark for DataFrame operations and schema generation.","package":"pyspark","optional":false},{"reason":"Used for schema definition; built into Python 3.7+, requires the backport on Python 3.6.","package":"dataclasses","optional":true}],"imports":[{"note":"The `struct` decorator is directly available from the top-level `tinsel` package.","wrong":"from tinsel.schema import struct","symbol":"struct","correct":"from tinsel import struct"},{"note":"The `transform` function is a direct import from the `tinsel` package.","wrong":"import tinsel.transform","symbol":"transform","correct":"from tinsel import transform"}],"quickstart":{"code":"from dataclasses import dataclass\nfrom typing import NamedTuple, Optional, Dict, List\nfrom tinsel import struct, transform\nfrom pyspark.sql import SparkSession\n\n# Define nested schema using dataclass\n@struct\n@dataclass\nclass UserInfo:\n    hobby: List[str]\n    last_seen: Optional[int]\n    pet_ages: Dict[str, int]\n\n# Define root schema using NamedTuple\n@struct\nclass User(NamedTuple):\n    login: str\n    age: int\n    active: bool\n    info: Optional[UserInfo]\n\n# Transform the Python class into a PySpark schema\nspark_schema = transform(User)\n\n# Prepare sample data matching the defined structure\ndata = [\n    User(\n        login=\"Ben\",\n        age=18,\n        active=False,\n        info=None\n    ),\n    User(\n        login=\"Tom\",\n        age=32,\n        active=True,\n        info=UserInfo(\n            hobby=[\"pets\", \"flowers\"],\n            last_seen=16,\n            pet_ages={\n                \"Jack\": 2,\n                \"Sunshine\": 6\n            }\n        )\n    )\n]\n\n# Initialize SparkSession\nspark = SparkSession.builder.master('local').appName(\"TinselQuickstart\").getOrCreate()\n\n# Create DataFrame using the generated schema and data\ndf = spark.createDataFrame(data=data, schema=spark_schema)\ndf.printSchema()\ndf.show(truncate=False)\n\nspark.stop()\n","lang":"python","description":"This quickstart defines a PySpark schema with Tinsel using Python's `dataclasses` and `NamedTuple`, converts the definition into a PySpark-compatible `StructType`, and creates a DataFrame from sample data."},"warnings":[{"fix":"Thoroughly test `tinsel`-generated schemas with your specific PySpark and Python versions. Consider defining schemas manually for highly complex or recent PySpark features if issues arise.","message":"The `tinsel` library has not been updated since September 2018. Its core functionality remains valid, but it may not be compatible with recent PySpark releases or newer Python language constructs.","severity":"gotcha","affected_versions":"<=0.3.0"},{"fix":"Review the generated PySpark schema (via `df.printSchema()`) carefully to confirm that types and nullability match expectations. Refer to the `tinsel` source for explicit type mapping details if ambiguities occur.","message":"Tinsel handles nullable fields and provides type shims for Spark types that have no direct Python equivalent (e.g., `long` or `short`). Check how these types are mapped to avoid unexpected schema interpretations.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install tinsel` to install the library.","cause":"The `tinsel` library is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'tinsel'"},{"fix":"Assign the output of `transform()` to a variable (e.g., `schema = transform(YourClass)`) and pass it to PySpark's `createDataFrame` with the `schema=` keyword argument (e.g., `spark.createDataFrame(data, schema=schema)`).","cause":"Attempting to call the result of `transform(YourClass)` as if it were a function, or otherwise misusing the generated schema object.","error":"TypeError: 'StructType' object is not callable"},{"fix":"Verify that `from tinsel import struct, transform` is used. Check your Python environment for a local file or conflicting package named `tinsel`, and for a broken installation.","cause":"This usually means `struct` or `transform` was imported incorrectly, or the `tinsel` package is not properly installed or is shadowed by another module.","error":"AttributeError: module 'tinsel' has no attribute 'struct' (or 'transform')"}]}