RNA-Norm
2.2.0 verified Fri May 01 auth: no python
Rnanorm provides common RNA-seq normalization methods (TPM, CPM, FPKM, TMM, etc.) with a scikit-learn-like API. Current version 2.2.0 requires Python >=3.9, <3.14. The library is actively maintained with regular releases.
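The version constraints above can be checked from Python before installing or upgrading. The following is a minimal sketch using only the standard library; the helper names `python_supported` and `installed_version` are hypothetical and are not part of rnanorm.

```python
import sys
from importlib import metadata


def python_supported(version_info=sys.version_info):
    """True if the interpreter falls in the stated range >=3.9, <3.14."""
    return (3, 9) <= tuple(version_info[:2]) < (3, 14)


def installed_version(package="rnanorm"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_supported())
print("rnanorm installed:", installed_version())
```

If `installed_version()` returns a 1.x version, the breaking changes listed under Warnings apply when upgrading.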
pip install rnanorm
Common errors
error AttributeError: module 'rnanorm' has no attribute 'CountData'
cause CountData moved to the rnanorm.datasets submodule in version 2.0.0.
fix Use 'from rnanorm.datasets import CountData'.
error TypeError: TPM.fit_transform() missing 1 required positional argument: 'lengths'
cause Length-dependent normalizers require gene lengths as the second argument. In version 1.x, lengths were optional or not needed.
fix Provide a pandas Series or array of gene lengths as the second argument to fit_transform.
error ValueError: The feature names should match those that were passed during fit.
cause When using set_output(transform='pandas'), the feature names (gene IDs) must be consistent between fit and transform.
fix Ensure the same column names/index are used in both fit and transform calls.
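The feature-name mismatch in the last error is easier to fix once you know whether genes are missing, unexpected, or merely reordered. The sketch below is plain Python and does not call rnanorm; `diff_feature_names`, `reorder_row`, and the gene IDs are hypothetical and only illustrate the check.

```python
def diff_feature_names(fit_names, transform_names):
    """Describe how transform-time feature names differ from fit-time ones."""
    fit_set, new_set = set(fit_names), set(transform_names)
    return {
        "missing": [g for g in fit_names if g not in new_set],
        "unexpected": [g for g in transform_names if g not in fit_set],
        "reordered": (
            fit_set == new_set and list(fit_names) != list(transform_names)
        ),
    }


def reorder_row(row, fit_names):
    """Return one sample's values in fit-time order; fail loudly on gaps."""
    missing = [g for g in fit_names if g not in row]
    if missing:
        raise KeyError(f"genes missing from sample: {missing}")
    return [row[g] for g in fit_names]


fit_names = ["geneA", "geneB", "geneC"]
new_names = ["geneC", "geneA", "geneB"]
report = diff_feature_names(fit_names, new_names)
print(report)

sample = {"geneC": 30, "geneA": 10, "geneB": 20}
print(reorder_row(sample, fit_names))
```

With a pandas DataFrame, selecting `df[fit_names]` performs the same reordering and raises a KeyError if any column is absent.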
Warnings
breaking In version 2.0.0, the API changed from fit/transform on raw counts to requiring explicit gene lengths for length-dependent methods (TPM, FPKM). The 'expression' attribute of CountData now returns a DataFrame, not an object with .counts.
fix Upgrade code to pass lengths to fit_transform. See migration guide: https://rnanorm.readthedocs.io/en/latest/migration.html
breaking CountData is no longer importable directly from rnanorm; it must be imported from rnanorm.datasets.
fix Change 'from rnanorm import CountData' to 'from rnanorm.datasets import CountData'.
gotcha Length-dependent methods (TPM, FPKM) require gene lengths as a pandas Series or array with the same index as the expression DataFrame columns. Wrong or misaligned lengths raise no error and silently produce incorrect results.
fix Always provide correct gene lengths and confirm that their index matches the expression columns. Each sample's TPM sums to 1e6 by construction, so that check confirms only that normalization ran, not that the lengths are correct.
gotcha The set_output(transform='pandas') method must be called before fit_transform to get a DataFrame output; otherwise output is a numpy array.
fix Chain .set_output(transform='pandas') on the estimator before calling fit_transform.
deprecated The 'log1p' parameter in some normalizers is deprecated and will be removed in future versions.
fix Apply log transformation manually after normalization using numpy.log1p.
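Two of the warnings above can be illustrated with the TPM arithmetic itself. The sketch below is plain Python, not rnanorm's implementation; the `tpm` function, the counts, and both length vectors are made up for illustration. It shows that a sample's TPM values sum to 1e6 whatever lengths are supplied, so the sum alone cannot reveal wrong lengths. It then applies a manual log transform, with `math.log1p` standing in for `numpy.log1p`.

```python
import math


def tpm(counts, lengths):
    """TPM for one sample: divide by gene length, then scale to sum to 1e6."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    if any(n <= 0 for n in lengths):
        raise ValueError("gene lengths must be positive")
    rates = [c / n for c, n in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [r / total * 1e6 for r in rates]


counts = [100, 200, 700]
right_lengths = [1000.0, 2000.0, 500.0]  # assumed true lengths
wrong_lengths = [500.0, 500.0, 500.0]    # deliberately incorrect

tpm_right = tpm(counts, right_lengths)
tpm_wrong = tpm(counts, wrong_lengths)

# Both samples pass the "sums to one million" check...
print(math.isclose(sum(tpm_right), 1e6), math.isclose(sum(tpm_wrong), 1e6))
# ...yet the per-gene values disagree.
print(tpm_right != tpm_wrong)

# Manual log transform applied after normalization.
log_tpm = [math.log1p(v) for v in tpm_right]
print(log_tpm)
```

The practical check is therefore on the inputs: lengths should be positive and aligned gene by gene with the expression columns.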
Imports
- CountData
  wrong: from rnanorm import CountData
  correct: from rnanorm.datasets import CountData
- TPM
  from rnanorm import TPM
- CPM
  from rnanorm import CPM
- FPKM
  from rnanorm import FPKM
- TMM
  from rnanorm import TMM
- UpperQuartile
  from rnanorm import UpperQuartile
- RemoveUninformative
  wrong: from rnanorm import RemoveUninformative
  correct: from rnanorm.filters import RemoveUninformative
- CountFilter
  from rnanorm.filters import CountFilter
Quickstart
import pandas as pd
from rnanorm import TPM
from rnanorm.datasets import CountData

# Load the example dataset
counts = CountData()
exp = counts.expression

# TPM normalization (requires gene lengths).
# For this demo only, use dummy lengths (1 for all genes). With equal lengths
# TPM reduces to CPM, so substitute real gene lengths for actual analyses.
lengths = pd.Series(1.0, index=exp.columns)
tpm = TPM().set_output(transform='pandas').fit_transform(exp, lengths)

# Show the first five rows and columns of the result
print(tpm.iloc[:5, :5])
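The dummy lengths in the quickstart have a side effect: when every gene has the same length, the length term cancels and TPM gives the same values as CPM. The following sketch shows this in plain Python under the standard definitions of the two measures. The `cpm` and `tpm` functions and the counts are illustrative and are not rnanorm's implementation.

```python
import math


def cpm(counts):
    """Counts per million for one sample (assumes a nonzero total)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def tpm(counts, lengths):
    """Transcripts per million for one sample (assumes a nonzero total)."""
    rates = [c / n for c, n in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


counts = [100, 200, 700]
dummy_lengths = [1.0] * len(counts)

same = all(
    math.isclose(a, b)
    for a, b in zip(tpm(counts, dummy_lengths), cpm(counts))
)
print(same)
```

Real gene lengths are what make TPM differ from CPM, so the quickstart output should be read as a smoke test rather than as meaningful TPM values.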