{"id":1888,"library":"umap-learn","title":"UMAP (Uniform Manifold Approximation and Projection)","description":"UMAP (Uniform Manifold Approximation and Projection) is a general-purpose manifold learning and dimensionality reduction algorithm. It constructs a high-dimensional graph and then searches for a low-dimensional projection of the data that has the closest possible equivalent fuzzy topological structure. The current version is 0.5.12, with a release cadence that includes frequent patch releases and minor updates.","status":"active","version":"0.5.12","language":"en","source_language":"en","source_url":"https://github.com/lmcinnes/umap","tags":["dimensionality reduction","manifold learning","unsupervised learning","data visualization","machine learning"],"install":[{"cmd":"pip install umap-learn","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Numerical operations, array handling.","package":"numpy","optional":false},{"reason":"Scientific computing, sparse matrices, spatial algorithms.","package":"scipy","optional":false},{"reason":"Machine learning utilities, data preprocessing, dataset generation.","package":"scikit-learn","optional":false},{"reason":"Just-in-time compiler for performance-critical code sections.","package":"numba","optional":false},{"reason":"Approximate nearest neighbor search algorithm, used by UMAP.","package":"pynndescent","optional":false}],"imports":[{"note":"The primary class `UMAP` is typically imported from the top-level `umap` module, not `umap_learn`.","wrong":"from umap_learn import UMAP","symbol":"UMAP","correct":"import umap\nreducer = umap.UMAP()"},{"note":"Direct import from `umap` module.","symbol":"UMAP","correct":"from umap import UMAP"}],"quickstart":{"code":"import umap\nfrom sklearn.datasets import make_blobs\n\n# 1. Generate some sample data\nX, y = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)\n\n# 2. Initialize UMAP reducer\n# n_neighbors: Balances local vs. 
global structure. Larger values preserve more global structure.\n# min_dist: Controls how tightly points are packed together. Smaller values lead to denser clusters.\n# n_components: Desired dimensionality of the output embedding.\n# random_state: For reproducible results.\nreducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)\n\n# 3. Fit and transform the data\nembedding = reducer.fit_transform(X)\n\n# The 'embedding' now contains the 2D projection of the original data\nprint(f\"Original data shape: {X.shape}\")\nprint(f\"UMAP embedding shape: {embedding.shape}\")\n# print(embedding[:5]) # Display first 5 embedded points","lang":"python","description":"This quickstart demonstrates how to use `umap-learn` to reduce the dimensionality of a synthetic dataset. It covers generating data, initializing the `UMAP` reducer with common parameters, and performing the fit and transform operation."},"warnings":[{"fix":"Always pass an integer value to the `random_state` parameter during `UMAP` object initialization, e.g., `umap.UMAP(random_state=42)`.","message":"UMAP is stochastic, and results are not reproducible without setting `random_state`. This applies to both `UMAP` initialization and any subsequent operations like `transform`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Experiment with different values for `n_neighbors` (e.g., 5 to 50) and `min_dist` (e.g., 0.0 to 0.5). Higher `n_neighbors` captures more global structure, while lower `min_dist` allows for tighter clustering.","message":"The `n_neighbors` and `min_dist` parameters heavily influence the resulting manifold structure. Choosing appropriate values is critical for meaningful results, and defaults may not always be optimal for specific datasets.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Understand that `transform` is an approximation. 
If exact embeddings for new data are critical, consider retraining UMAP on the combined dataset or evaluating the stability of the transformation for your application. To fold a modest amount of new data into an already fitted model, the `update` method might be an option.","message":"The `transform` method for new, out-of-sample data points performs an *approximate* projection. It is not guaranteed to preserve the relationships learned from the training data, nor to match the quality of an embedding produced by `fit_transform`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure `numba` is correctly installed and compatible with your Python environment. Check for any `numba` warnings upon import or during execution. Refer to the `numba` documentation for troubleshooting installation issues.","message":"UMAP's performance relies heavily on `numba` for just-in-time compilation. Issues with `numba` installation or environment configuration (e.g., a `numba` or `llvmlite` build that does not match the installed Python or NumPy version) can lead to errors or significant performance degradation.","severity":"gotcha","affected_versions":"All versions"},{"fix":"It is generally recommended to preprocess data by scaling or normalizing features before applying UMAP, e.g., using `sklearn.preprocessing.StandardScaler` or `sklearn.preprocessing.MinMaxScaler`.","message":"UMAP is not inherently scale-invariant. Features with larger numeric ranges have a disproportionate influence on the distance calculations and therefore on the resulting manifold structure.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}