pca: A Python Package for Principal Component Analysis
pca is a Python package for Principal Component Analysis (PCA), providing extended functionality beyond basic scikit-learn implementations. It leverages sklearn's core for compatibility while offering features like SparsePCA and TruncatedSVD, comprehensive analysis, and advanced plotting capabilities such as biplots, explained variance plots, outlier detection, and feature importance extraction. The current version is 2.10.2, and it is actively maintained.
Common errors
- ImportError: cannot import name 'pca' from 'pca'
  Cause: The main class of this library is lowercase `pca`, not uppercase `PCA`; the import can also fail if a conflicting module or a local file named `pca.py` shadows the installed package. Fix: Use `from pca import pca` (lowercase class name) and ensure your environment has no conflicting modules named 'pca'.
- AttributeError: 'PCA' object has no attribute 'biplot'
  Cause: You are likely using scikit-learn's `sklearn.decomposition.PCA` class, which does not have a `biplot` method; `biplot` is a feature of the `erdogant/pca` library. Fix: Import the class from the `pca` library with `from pca import pca`, then create an instance of it to access its extended methods like `biplot`.
- ValueError: not enough features to compute (n_components) components
  Cause: The `n_components` parameter specified for PCA exceeds the number of features (columns) in your input data; PCA cannot create more components than existing features. Fix: Reduce `n_components` to at most the number of features in your dataset. For `sklearn.decomposition.PCA` specifically, it must be strictly less than `min(n_samples, n_features)` when using solvers like 'arpack'.
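A minimal sketch of the `n_components` constraint, using scikit-learn's `PCA` (the same limit applies to this library, which wraps it):

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 samples, 10 features: n_components must not exceed min(n_samples, n_features)
X = np.random.RandomState(0).rand(100, 10)

try:
    PCA(n_components=12).fit(X)  # 12 > 10 features -> ValueError
except ValueError as e:
    print("ValueError:", e)

pca_ok = PCA(n_components=3).fit(X)  # valid: 3 <= 10
print(pca_ok.transform(X).shape)     # (100, 3)
```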
Warnings
- gotcha: PCA is highly sensitive to the scale of the input data. Features with larger value ranges or variances can disproportionately influence the principal components unless the data is scaled (e.g., with `sklearn.preprocessing.StandardScaler`) before PCA.
- gotcha: When using the plotting features of `pca` with `matplotlib` versions 3.10 and above, you might encounter warnings related to `get_cmap`.
- gotcha: scikit-learn ships its own `PCA` class (`sklearn.decomposition.PCA`). Calling methods unique to the `erdogant/pca` package (such as `.plot()` or `.biplot()`) on a `sklearn.decomposition.PCA` object raises an `AttributeError`.
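The scaling gotcha can be demonstrated with plain scikit-learn; the dominant feature below is synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
X[:, 0] *= 1000.0  # one feature on a much larger scale than the others

# Unscaled: the large-variance feature dominates the first component
ratio_raw = PCA(n_components=3).fit(X).explained_variance_ratio_

# Scaled: all features contribute on an equal footing
X_std = StandardScaler().fit_transform(X)
ratio_std = PCA(n_components=3).fit(X_std).explained_variance_ratio_

print(ratio_raw[0])  # close to 1.0
print(ratio_std[0])  # close to 1/3
```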
Install
- pip install pca
Imports
- pca (this library's main class)
  from pca import pca
- PCA (scikit-learn's implementation)
  from sklearn.decomposition import PCA
Quickstart
import numpy as np
import pandas as pd
from pca import pca
# Sample data
X = pd.DataFrame(np.random.rand(100, 10), columns=[f'feature_{i}' for i in range(10)])
# Initialize PCA model with 3 components
model = pca(n_components=3)
# Fit and transform the data
out = model.fit_transform(X)
print("Explained variance ratio:", model.results['explained_var'])
print("Principal Components (transformed data shape):", out['PC'].shape)
# To display plots (requires matplotlib to be installed):
# import matplotlib.pyplot as plt
# model.plot()    # cumulative explained variance plot
# model.biplot()  # biplot of samples and feature loadings
# plt.show()
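The feature-importance idea mentioned in the overview can be sketched with scikit-learn's loadings matrix (`components_`); the `pca` package automates this, but the underlying computation is roughly the following (column names here are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.RandomState(1)
X = pd.DataFrame(rng.rand(100, 10), columns=[f"feature_{i}" for i in range(10)])

model = PCA(n_components=3).fit(X)

# Rows of components_ are the loadings of each principal component;
# the feature with the largest absolute loading contributes most to that PC.
loadings = pd.DataFrame(model.components_, columns=X.columns,
                        index=[f"PC{i+1}" for i in range(3)])
top_features = loadings.abs().idxmax(axis=1)
print(top_features)
```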