{"id":9505,"library":"autofaiss","title":"AutoFaiss","description":"AutoFaiss is a Python library that automatically selects and tunes a Faiss index for a given set of embeddings. It balances search quality against query speed and memory usage under user-specified constraints. It simplifies building and evaluating vector search indexes by hiding much of Faiss's configuration complexity. The current version is 2.18.0. Releases are mostly minor versions driven by dependency updates and small feature enhancements.","status":"active","version":"2.18.0","language":"en","source_language":"en","source_url":"https://github.com/criteo/autofaiss","tags":["machine-learning","vector-search","faiss","similarity-search","ann","approximate-nearest-neighbor"],"install":[{"cmd":"pip install autofaiss","lang":"bash","label":"Base installation (CPU)"},{"cmd":"pip install autofaiss[gpu]","lang":"bash","label":"Installation with GPU support (requires NVIDIA CUDA)"}],"dependencies":[{"reason":"Core Faiss library for CPU-based indexing. Automatically installed with `pip install autofaiss`.","package":"faiss-cpu","optional":false},{"reason":"Enables GPU acceleration for index building and searching. Requires `pip install autofaiss[gpu]` and a compatible NVIDIA CUDA setup.","package":"faiss-gpu","optional":true},{"reason":"Fundamental package for numerical operations and array handling.","package":"numpy","optional":false},{"reason":"Used for tabular data handling, such as input data and ID/metadata mappings.","package":"pandas","optional":false},{"reason":"Provides general machine learning utilities, often used for data preprocessing.","package":"scikit-learn","optional":false}],"imports":[{"note":"This is the primary function for automatically selecting and building a Faiss index.","symbol":"build_index","correct":"from autofaiss import build_index"}],"quickstart":{"code":"import numpy as np\nfrom autofaiss import build_index\n\n# Create dummy data: 1000 vectors of 128 dimensions; Faiss expects float32\ndata = np.float32(np.random.rand(1000, 128))\n\n# Build the index. Memory budgets are strings such as \"4G\"; adjust them to your machine.\n# metric_type=\"ip\" uses inner product; use metric_type=\"l2\" for L2 distance.\nindex, index_infos = build_index(\n    data,\n    index_path=\"my_autofaiss_index.bin\",\n    index_infos_path=\"my_autofaiss_index_infos.json\",\n    max_index_memory_usage=\"4G\",  # upper bound on the size of the built index\n    current_memory_available=\"4G\",  # IMPORTANT: RAM available on this machine\n    metric_type=\"ip\",\n)\n\nprint(\"Index saved to my_autofaiss_index.bin; index info saved to my_autofaiss_index_infos.json\")\n\n# Query the returned Faiss index for the 5 nearest neighbours of one random vector\nquery = np.float32(np.random.rand(1, 128))\ndistances, indices = index.search(query, 5)\nprint(indices)","lang":"python","description":"This quickstart builds a Faiss index from a NumPy array with `autofaiss.build_index`, saves it to disk, and runs a nearest-neighbour query against the returned index. It highlights the parameters `index_path`, `index_infos_path`, `max_index_memory_usage`, `current_memory_available`, and `metric_type`. Setting `current_memory_available` to match the machine's RAM is critical for avoiding out-of-memory errors on large datasets."},"warnings":[{"fix":"Always set `current_memory_available` in `build_index` to a value no higher than the RAM actually available on your machine (e.g., '4G', '16G'). Set `max_index_memory_usage` to the largest index you can afford to load at query time. Monitor memory usage during index building.","message":"Building large Faiss indices with `autofaiss` can be very memory-intensive. Out-of-memory (OOM) errors are common when `current_memory_available` is omitted or set higher than the RAM actually available on the machine.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install with `pip install autofaiss[gpu]`. Ensure you have a compatible NVIDIA CUDA toolkit and drivers installed on your system. Do not install `faiss-cpu` and `faiss-gpu` simultaneously.","message":"For GPU acceleration, `autofaiss` requires the `faiss-gpu` package. A plain `pip install autofaiss` pulls in `faiss-cpu` only, so GPU support is not enabled. Requesting GPU features (for example, `use_gpu=True` in `build_index`) without `faiss-gpu` installed results in runtime errors or a fallback to CPU.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Explicitly cast your input data to `np.float32` before passing it to `build_index`. Example: `data = np.float32(your_raw_data)`.","message":"Faiss, and therefore AutoFaiss, expects input vectors of type `np.float32`. Passing `np.float64` arrays doubles the memory footprint of the embeddings, can reduce performance, and can lead to out-of-memory errors on large datasets. `np.float64` is the default dtype of many NumPy operations, including `np.random.rand`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade your Python environment to Python 3.10 or newer. Use virtual environments to manage multiple Python versions if needed.","message":"AutoFaiss requires Python >= 3.10. Installing or running AutoFaiss on older Python versions leads to dependency resolution errors, installation failures, or a `SyntaxError` on import.","severity":"breaking","affected_versions":"< 2.0 (for Python < 3.8); >= 2.0 (for Python < 3.10)"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"To enable GPU support, install AutoFaiss with the `[gpu]` extra: `pip install autofaiss[gpu]`. Make sure your system has a compatible NVIDIA CUDA setup. If you do not intend to use a GPU, remove GPU-related settings such as `use_gpu=True`.","cause":"GPU features were requested (e.g., `use_gpu=True` in `build_index`) while only `faiss-cpu` is installed. This happens, for example, when AutoFaiss was installed without the `[gpu]` extra.","error":"ModuleNotFoundError: No module named 'faiss_gpu'"},{"fix":"Convert your input NumPy array to `np.float32` before passing it to `build_index`. For example: `data = np.float32(your_data)`.","cause":"The input data (embeddings) provided to `build_index` is in `np.float64` format, while Faiss expects and is optimized for `np.float32`.","error":"TypeError: Expected np.float32 for input vectors, got np.float64."},{"fix":"Set `current_memory_available` in your `build_index` call to the RAM actually available on the machine, without exceeding physical RAM. Raise `max_index_memory_usage` if the index itself needs to be larger. Example: `current_memory_available='16G'`.","cause":"AutoFaiss determined that the memory budget it was given (explicitly or by default) is insufficient for the size and dimensionality of your input data.","error":"ValueError: max_ram_usage is too low to build an index for the given data. Required RAM: X.XGB, Available RAM: Y.YGB."}]}