RNA-Norm

2.2.0 verified Fri May 01 auth: no python

rnanorm provides common RNA-seq normalization methods (TPM, CPM, FPKM, TMM, and others) behind a scikit-learn-like API. The current version, 2.2.0, requires Python >=3.9, <3.14. The library is actively maintained, with regular releases.

pip install rnanorm
error AttributeError: module 'rnanorm' has no attribute 'CountData'
cause CountData moved to submodule rnanorm.datasets in version 2.0.0.
fix
Use 'from rnanorm.datasets import CountData'
error TypeError: TPM.fit_transform() missing 1 required positional argument: 'lengths'
cause Length-dependent normalizers require gene lengths as the second argument. In version 1.x, passing lengths explicitly was not required.
fix
Provide a pandas Series or array of gene lengths as the second argument to fit_transform.
error ValueError: The feature names should match those that were passed during fit.
cause When using set_output(transform='pandas'), the feature names (gene IDs) must be consistent between fit and transform.
fix
Ensure the same column names (gene IDs), in the same order, are used in both the fit and transform calls.
breaking In version 2.0.0, the API changed from fit/transform on raw counts to requiring explicit gene lengths for length-dependent methods (TPM, FPKM). The 'expression' attribute of CountData now returns a DataFrame, not an object with .counts.
fix Upgrade code to pass lengths to fit_transform. See migration guide: https://rnanorm.readthedocs.io/en/latest/migration.html
breaking CountData is no longer importable directly from rnanorm; it must be imported from rnanorm.datasets.
fix Change 'from rnanorm import CountData' to 'from rnanorm.datasets import CountData'
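gotcha Length-dependent methods (TPM, FPKM) require gene lengths as a pandas Series or array with the same index as the expression DataFrame columns. Wrong, misaligned, or incomplete lengths can silently produce incorrect results.
fix Always provide correct gene lengths, and confirm that their index matches the expression columns. Checking that each sample's TPM values sum to approximately 1e6 is a useful sanity check on the normalization, but TPM is rescaled to that total by construction, so the check alone does not prove the lengths are right.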
gotcha The set_output(transform='pandas') method must be called before fit_transform to get a DataFrame output; otherwise output is a numpy array.
fix Chain .set_output(transform='pandas') on the estimator before calling fit_transform.
deprecated The 'log1p' parameter in some normalizers is deprecated and will be removed in future versions.
fix Apply log transformation manually after normalization using numpy.log1p.
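
The per-sample sum check and the manual log transform described in the entries above can be sketched without rnanorm at all. What follows is a minimal NumPy illustration of the textbook TPM formula, not rnanorm's implementation. The helper name tpm_by_hand and the toy counts and lengths are invented for this example.

```python
import numpy as np


def tpm_by_hand(counts, lengths_bp):
    """Textbook TPM for a samples x genes count matrix (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1_000.0
    # Reads per kilobase: divide each gene's counts by its length in kb.
    rpk = counts / lengths_kb
    # Rescale every sample (row) so that its values sum to one million.
    return rpk / rpk.sum(axis=1, keepdims=True) * 1e6


# Toy data: 2 samples (rows) x 3 genes (columns); values are illustrative only.
counts = np.array([[10, 20, 30], [5, 0, 15]])
lengths_bp = np.array([1_000, 2_000, 3_000])

tpm = tpm_by_hand(counts, lengths_bp)

# Sanity check: each sample sums to ~1e6 by construction.
row_sums = tpm.sum(axis=1)
assert np.allclose(row_sums, 1e6)

# Manual log transform applied after normalization.
log_tpm = np.log1p(tpm)
```

Because the final rescaling forces every row to sum to one million, this check passes for any positive lengths. It confirms that the normalization ran, not that the lengths were correct.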

Basic usage: load example count data, apply TPM normalization with dummy gene lengths.

import pandas as pd
from rnanorm import TPM
from rnanorm.datasets import CountData

# Load example dataset
counts = CountData()
exp = counts.expression

# TPM normalization (requires gene lengths)
# For demo only, use dummy lengths (1 for all genes). With identical lengths,
# TPM reduces to CPM, so use real gene lengths in practice.
lengths = pd.Series(1.0, index=exp.columns)
tpm = TPM().set_output(transform='pandas').fit_transform(exp, lengths)
print(tpm.iloc[:5, :5])
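
Before handing lengths to a length-dependent normalizer, it can help to align them explicitly with the expression columns. The sketch below uses only pandas. The helper align_lengths is hypothetical and the gene IDs are invented, so treat it as one possible pre-check rather than part of rnanorm.

```python
import pandas as pd


def align_lengths(exp, lengths):
    """Return lengths reordered to match exp.columns (hypothetical helper)."""
    # Genes present in the expression matrix but absent from the lengths.
    missing = exp.columns.difference(lengths.index)
    if len(missing) > 0:
        raise ValueError(f"No length for genes: {list(missing)}")
    # Reorder to the column order and drop lengths for genes not in exp.
    aligned = lengths.reindex(exp.columns)
    if aligned.isna().any() or (aligned <= 0).any():
        raise ValueError("Gene lengths must be positive and non-null")
    return aligned


# Toy expression matrix: samples in rows, genes in columns.
exp = pd.DataFrame(
    [[10, 20, 30], [5, 0, 15]],
    index=["sample_1", "sample_2"],
    columns=["gene_a", "gene_b", "gene_c"],
)
# Lengths arrive in a different order and include an extra gene.
lengths = pd.Series(
    {"gene_c": 3000.0, "gene_a": 1000.0, "gene_b": 2000.0, "gene_d": 500.0}
)

aligned = align_lengths(exp, lengths)
print(aligned.tolist())
```

Raising on a missing gene, rather than filling in a default length, keeps a misaligned annotation from silently skewing the normalized values.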