pca: A Python Package for Principal Component Analysis

2.10.2 · active · verified Thu Apr 16

pca is a Python package for Principal Component Analysis (PCA) that extends the basic scikit-learn implementation. It builds on sklearn's decomposition classes for compatibility, supports SparsePCA and TruncatedSVD in addition to standard PCA, and adds outlier detection, feature importance extraction, and plots such as biplots and explained variance plots. The current version is 2.10.2, and it is actively maintained.

Common errors

Warnings

Install

Imports

Quickstart

Initializes the `pca` model, fits it to sample data, and transforms the data into principal components. It then shows how to read the explained variance and the transformed data from the results. The plotting calls are included but left commented out.

import numpy as np
from pca import pca
import pandas as pd

# Sample data
X = pd.DataFrame(np.random.rand(100, 10), columns=[f'feature_{i}' for i in range(10)])

# Initialize PCA model with 3 components
model = pca(n_components=3)

# Fit and transform the data
out = model.fit_transform(X)

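# 'explained_var' holds the cumulative explained variance; per-component ratios are in 'variance_ratio'
print("Cumulative explained variance:", model.results['explained_var'])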
print("Principal Components (transformed data shape):", out['PC'].shape)

# To display the plots (requires matplotlib to be installed)
# import matplotlib.pyplot as plt
# model.plot()    # explained variance plot
# model.biplot()  # biplot of samples and feature loadings
# plt.show()
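
To make the quantities printed by the quickstart concrete, here is a minimal NumPy-only sketch of what a PCA computes on the same kind of data. It is an illustration under stated assumptions, not the package's implementation: the variable names (`variance_ratio`, `cumulative`, `scores`) are chosen for this example, the data is only mean-centered (not standardized), and the decomposition uses a plain SVD.

```python
import numpy as np

# Reproducible sample data with the same shape as the quickstart (100 samples, 10 features).
rng = np.random.default_rng(0)
X = rng.random((100, 10))

# PCA operates on mean-centered data.
Xc = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal axes,
# ordered by decreasing singular value.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance captured by each component, and its running sum.
variance_ratio = S**2 / np.sum(S**2)
cumulative = np.cumsum(variance_ratio)

# Project the data onto the first 3 principal axes.
n_components = 3
scores = Xc @ Vt[:n_components].T

print("Per-component variance ratio:", variance_ratio[:n_components])
print("Cumulative explained variance:", cumulative[:n_components])
print("Scores shape:", scores.shape)
```

The per-component ratios are non-increasing and the cumulative values sum to 1 across all 10 components, which is a useful sanity check when reading explained-variance output from any PCA tool.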
