{"id":7493,"library":"pca","title":"pca: A Python Package for Principal Component Analysis","description":"pca is a Python package for Principal Component Analysis (PCA), providing extended functionality beyond basic scikit-learn implementations. It leverages sklearn's core for compatibility while offering features like SparsePCA and TruncatedSVD, comprehensive analysis, and advanced plotting capabilities such as biplots, explained variance plots, outlier detection, and feature importance extraction. The current version is 2.10.2, and it is actively maintained.","status":"active","version":"2.10.2","language":"en","source_language":"en","source_url":"https://github.com/erdogant/pca","tags":["PCA","dimensionality reduction","machine learning","data analysis","visualization","scikit-learn wrapper","outlier detection"],"install":[{"cmd":"pip install pca","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core functionality is built on sklearn for PCA algorithms.","package":"scikit-learn","optional":false},{"reason":"Fundamental library for numerical operations and data handling.","package":"numpy","optional":false},{"reason":"Required for plotting functions like biplots and explained variance.","package":"matplotlib","optional":false}],"imports":[{"symbol":"pca","correct":"from pca import pca"},{"note":"The `pca` library provides its own `pca` class (lowercase 'p'). The uppercase 'PCA' is typically imported from scikit-learn's `sklearn.decomposition` module. 
Confusing these can lead to missing methods.","wrong":"from pca import PCA","symbol":"PCA","correct":"from sklearn.decomposition import PCA"}],"quickstart":{"code":"import numpy as np\nfrom pca import pca\nimport pandas as pd\n\n# Sample data\nX = pd.DataFrame(np.random.rand(100, 10), columns=[f'feature_{i}' for i in range(10)])\n\n# Initialize PCA model with 3 components\nmodel = pca(n_components=3)\n\n# Fit and transform the data\nout = model.fit_transform(X)\n\nprint(\"Cumulative explained variance:\", model.results['explained_var'])\nprint(\"Principal Components (transformed data shape):\", out['PC'].shape)\n\n# To display the biplot (requires matplotlib to be installed)\n# import matplotlib.pyplot as plt\n# model.biplot()\n# plt.show()","lang":"python","description":"Initializes the `pca` model, fits it to sample data, and projects the data onto three principal components. It then prints the cumulative explained variance and the shape of the transformed data. The commented-out lines show how to display a biplot."},"warnings":[{"fix":"Always preprocess your data using a scaling method like `StandardScaler` from scikit-learn before applying PCA, especially if features have different units or ranges.","message":"PCA is highly sensitive to the scale of input data. Features with larger value ranges or variances can disproportionately influence the principal components if data is not scaled (e.g., using `sklearn.preprocessing.StandardScaler`) prior to PCA.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure you are using the latest `pca` library version, which often includes fixes for newer dependency versions.
If warnings persist, refer to the library's GitHub issues for potential workarounds or updates, or consider temporarily downgrading `matplotlib` if the warnings are a problem in production.","message":"When using the plotting features of `pca`, specifically with `matplotlib` versions 3.10 and above, you might encounter warnings related to `get_cmap`.","severity":"gotcha","affected_versions":">=2.10.0 (with matplotlib >= 3.10)"},{"fix":"Ensure you are importing and instantiating the correct PCA object: `from pca import pca` for this package, or `from sklearn.decomposition import PCA` for scikit-learn's native implementation.","message":"There's a separate `PCA` class within scikit-learn (`sklearn.decomposition.PCA`). Attempting to use methods unique to the `erdogant/pca` package (like `.plot()` or `.biplot()`) on a `sklearn.decomposition.PCA` object will result in an `AttributeError`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Use `from pca import pca` (the class name is lowercase), and make sure no local file or directory named `pca` (for example your own `pca.py`) shadows the installed package.","cause":"Python resolved `pca` to something other than the installed package, most often a local module or directory named `pca` that takes precedence on the import path. A similar error naming 'PCA' appears if you try `from pca import PCA` (uppercase), because `PCA` is not the main class of this library.","error":"ImportError: cannot import name 'pca' from 'pca'"},{"fix":"Ensure you are importing the `pca` class from the `pca` library: `from pca import pca`. Then, create an instance of this class to access its extended methods like `biplot`.","cause":"You are likely using scikit-learn's `sklearn.decomposition.PCA` class, which does not have a `biplot` method.
The `biplot` method is a feature provided by the `erdogant/pca` library.","error":"AttributeError: 'PCA' object has no attribute 'biplot'"},{"fix":"Reduce `n_components` so that it does not exceed the number of features in your dataset. For `sklearn.decomposition.PCA` specifically, it must be at most `min(n_samples, n_features)`, and strictly less than that when using solvers like 'arpack'.","cause":"This error occurs when the `n_components` parameter specified for PCA is greater than the number of features (columns) in your input data, or equal to it with a solver that requires strictly fewer components. PCA cannot create more components than there are features.","error":"ValueError: not enough features to compute (n_components) components"}]}