RDT (Reversible Data Transforms)

1.21.0 · active · verified Wed Apr 15

RDT (Reversible Data Transforms) is a Python library that transforms raw data into fully numerical data, ready for data science tasks. Every transformation is designed to be reversible, so the numerical output can be converted back to the original data format. RDT is part of The Synthetic Data Vault Project and is actively maintained by DataCebo. The current version is 1.21.0.
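
To make the idea of a reversible transform concrete, here is a minimal pure-Python sketch of a categorical encoder that maps values to integers and back. It only illustrates the fit / transform / reverse_transform pattern; the `ToyLabelEncoder` class and the sample values are hypothetical and are not RDT's implementation or API.

```python
# Illustrative only: a hand-rolled reversible categorical encoder.
# ToyLabelEncoder is a hypothetical name, not a class provided by RDT.


class ToyLabelEncoder:
    """Map each distinct category to an integer code and back again."""

    def fit(self, values):
        # Assign codes in order of first appearance so the mapping is deterministic.
        self.category_to_code = {}
        for value in values:
            if value not in self.category_to_code:
                self.category_to_code[value] = len(self.category_to_code)
        # Keep the inverse mapping so the transform can be undone.
        self.code_to_category = {
            code: category for category, code in self.category_to_code.items()
        }
        return self

    def transform(self, values):
        # Replace every category with its integer code.
        return [self.category_to_code[value] for value in values]

    def reverse_transform(self, codes):
        # Replace every integer code with its original category.
        return [self.code_to_category[code] for code in codes]


cards = ["VISA", "AMEX", "VISA", "DISCOVER"]  # made-up sample values
encoder = ToyLabelEncoder().fit(cards)
encoded = encoder.transform(cards)
decoded = encoder.reverse_transform(encoded)

print(encoded)
print(decoded == cards)
```

The round trip works because the mapping learned during fitting is stored alongside its inverse, which is the property that makes a transform reversible.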

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to load a demo dataset, initialize a `HyperTransformer`, automatically detect its configuration based on the data, transform the data into a numerical format, and then reverse the transformation back to its original representation.

import pandas as pd
from rdt import HyperTransformer, get_demo

# Load a demo dataset
customers = get_demo()
print("Original Data:\n", customers.head())

# Initialize and detect config with HyperTransformer
ht = HyperTransformer()
ht.detect_initial_config(data=customers)
print("\nDetected Config:\n", ht.get_config())

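# Fit the transformers to the data, then transform it
# (the HyperTransformer must be fitted before transform() is called)
ht.fit(customers)
transformed_data = ht.transform(customers)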
print("\nTransformed Data (first 5 rows):\n", transformed_data.head())

# Reverse transform the data back to original format
reversed_data = ht.reverse_transform(transformed_data)
print("\nReversed Data (first 5 rows):\n", reversed_data.head())
