{"id":6411,"library":"pot","title":"Python Optimal Transport Library (POT)","description":"POT (Python Optimal Transport) is a Python library of solvers for optimal transport problems. It implements exact optimal transport (EMD), Wasserstein distances, the Sinkhorn algorithm, Gromov-Wasserstein, and extensions such as unbalanced OT and GMM-OT. The current version is 0.9.6.post1; minor releases are frequent and typically add features, solvers, and bug fixes.","status":"active","version":"0.9.6.post1","language":"en","source_language":"en","source_url":"https://github.com/PythonOT/POT","tags":["optimal transport","wasserstein distance","machine learning","statistics","geometric deep learning"],"install":[{"cmd":"pip install pot","lang":"bash","label":"Install stable release"}],"dependencies":[{"reason":"Core library for numerical operations and array manipulation.","package":"numpy"},{"reason":"Provides scientific computing tools, sparse matrix handling, and optimization algorithms.","package":"scipy"},{"reason":"Used for plotting and visualizing optimal transport plans and results, mainly in examples and tutorials.","package":"matplotlib","optional":true}],"imports":[{"note":"The entire library is typically imported under the alias 'ot'.","symbol":"ot","correct":"import ot"}],"quickstart":{"code":"import numpy as np\nimport ot\n\n# Generate two 1D samples\nn = 100\nnp.random.seed(0)\nxs = np.random.normal(0, 1, n)\nxt = np.random.normal(5, 1, n)\n\n# Uniform weights on the samples (each vector sums to 1)\na = np.ones(n) / n\nb = np.ones(n) / n\n\n# Cost matrix: squared Euclidean distance (the default metric of ot.dist)\nM = ot.dist(xs.reshape((n, 1)), xt.reshape((n, 1)))\nM /= M.max()  # Normalize the cost matrix for numerical stability\n\n# Solve the exact OT problem (EMD); ot.emd returns the optimal transport plan\nG = ot.emd(a, b, M)\ncost = np.sum(G * M)  # Total cost under the normalized squared Euclidean cost\nprint(f\"Optimal transport plan (first 5x5):\\n{G[:5, :5]}\")\nprint(f\"EMD cost: {cost}\")","lang":"python","description":"This example computes the exact optimal transport plan (EMD) between two 1D samples with POT's core `ot.emd` function. It generates the samples, defines uniform marginal distributions, builds a normalized squared Euclidean cost matrix, and computes the optimal transport plan and its total cost. Because the cost is squared Euclidean and rescaled, the printed value is a normalized squared-distance transport cost, not the Wasserstein-1 distance."},"warnings":[{"fix":"Review the official documentation and examples for GW solvers if migrating from versions older than 0.9.0. Verify results, especially when dealing with non-symmetric matrices.","message":"The Gromov-Wasserstein (GW) solvers were refactored in version 0.9.0, which brought performance gains and support for non-symmetric cost matrices. The API remained largely consistent, but code that relies on internal behavior or on the exact numerical output of the older GW implementations may see different results or performance characteristics.","severity":"breaking","affected_versions":">=0.9.0"},{"fix":"Create all inputs with the desired backend and device before calling POT (e.g., `a`, `b`, and `M` as `torch.Tensor` objects on a CUDA device), and do not mix array types within a single call. To check which backend POT infers for a set of inputs, call `ot.backend.get_backend(a, b, M)`.","message":"POT supports multiple array backends (NumPy, PyTorch, JAX, CuPy). There is no global backend setting: the backend is inferred from the type of the input arrays, so NumPy inputs are computed with NumPy on the CPU. To use GPU acceleration (e.g., with CuPy or PyTorch on CUDA), pass inputs that already live on the desired backend and device. Mixing array types from different backends in one call leads to errors, and passing NumPy arrays keeps the computation on the CPU.","severity":"gotcha","affected_versions":"All versions with backend support"},{"fix":"Check each function's documentation for the expected input shapes, and use `.reshape()` or `.T` to bring arrays into those shapes.","message":"Input array dimensions are critical and frequently a source of errors. 
For example, marginal distributions `a` and `b` are typically 1D arrays, coordinates `X` and `Y` are 2D arrays of shape (n_samples, n_features), and cost matrices `M` are 2D arrays of shape (n_samples_source, n_samples_target). Mismatched dimensions (e.g., `(n,)` instead of `(n,1)` for single-feature coordinates, or a transposed cost matrix) typically raise runtime errors; a transposed square cost matrix raises no error and silently produces a wrong result.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Normalize marginal distributions `a` and `b` so that `np.isclose(np.sum(a), 1)` and `np.isclose(np.sum(b), 1)` hold before passing them to POT functions, unless the specific function documentation explicitly states otherwise for unbalanced OT.","message":"For many optimal transport problems, particularly those with a probabilistic interpretation, the marginal distributions `a` and `b` are expected to sum to 1, and balanced solvers require `a` and `b` to carry the same total mass. Some solvers accept unnormalized inputs, but normalizing them (e.g., `a = a / np.sum(a)`) keeps the result interpretable and avoids numerical instabilities in certain algorithms.","severity":"gotcha","affected_versions":"All versions"},{"fix":"As a rule of thumb, for N > 1000 samples, favor `ot.sinkhorn` or other regularized/approximate solvers over `ot.emd`. Explore techniques like sub-sampling, multi-scale, or barycentric mapping for further scalability.","message":"Optimal transport problems, especially exact EMD, can be computationally very expensive for large numbers of samples. POT provides efficient C/Cython implementations, but exact solvers still scale poorly (roughly cubic in the number of samples for EMD). For large-scale applications, consider entropic regularized solvers (Sinkhorn) or specialized approximate methods, which trade accuracy for speed.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}