{"id":9350,"library":"tabpfn","title":"TabPFN: Foundation model for tabular data","description":"TabPFN is a transformer-based foundation model for tabular data. It is a prior-data fitted network: pretrained on synthetic datasets, it makes predictions by conditioning on the training set in a forward pass, so it achieves strong performance on small-to-medium-sized datasets without task-specific training or tuning. Currently at version 7.1.1 and actively developed by Prior Labs, it offers fast predictions that often outperform tuned tree-based models and AutoML systems on suitable datasets.","status":"active","version":"7.1.1","language":"en","source_language":"en","source_url":"https://github.com/PriorLabs/TabPFN","tags":["machine-learning","tabular-data","foundation-model","ai","transformer","zero-shot","classification","regression","pytorch"],"install":[{"cmd":"pip install tabpfn","lang":"bash","label":"Stable release"},{"cmd":"pip install --upgrade tabpfn","lang":"bash","label":"Upgrade to latest version"}],"dependencies":[{"reason":"Required Python version","package":"python","version":">=3.9"},{"reason":"Deep learning backend, especially for GPU acceleration","package":"torch","version":">=2.1"},{"reason":"Standard ML interface compatibility","package":"scikit-learn","version":">=1.0"},{"reason":"Numerical computation","package":"numpy","optional":true},{"reason":"Data manipulation","package":"pandas","optional":true}],"imports":[{"symbol":"TabPFNClassifier","correct":"from tabpfn import TabPFNClassifier"},{"symbol":"TabPFNRegressor","correct":"from tabpfn import TabPFNRegressor"}],"quickstart":{"code":"import numpy as np\nfrom tabpfn import TabPFNClassifier\nfrom sklearn.model_selection import train_test_split\n\n# Generate synthetic data with a seeded generator so runs are reproducible\nrng = np.random.default_rng(42)\nX = rng.random((100, 10))  # 100 samples, 10 features\ny = rng.integers(0, 2, size=100)  # Binary classification target\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Initialize and use TabPFNClassifier (sklearn-like interface)\n# The first call might trigger a license 
acceptance in browser.\nclf = TabPFNClassifier(device='cpu')  # Use 'cuda' if a GPU is available\nclf.fit(X_train, y_train)\npredictions = clf.predict(X_test)\nprobabilities = clf.predict_proba(X_test)\n\nprint(f\"Predictions: {predictions[:5]}\")\nprint(f\"Probabilities (first 5 samples):\\n{probabilities[:5]}\")","lang":"python","description":"Demonstrates basic usage of TabPFNClassifier through its scikit-learn compatible interface for binary classification. For best performance, pass `device='cuda'` if a GPU is available. The first execution may open a browser window for license acceptance."},"warnings":[{"fix":"For very large datasets, consider sampling, hybrid approaches (e.g., with Random Forests), or commercial versions. Use TabPFN Extensions for 'many_class' problems.","message":"TabPFN is optimized for small to medium-sized datasets, typically up to 50,000 rows and 2,000 features. Performance degrades significantly on larger datasets, and the core classifier supports only a limited number of classes (e.g., 10; extensions exist for more).","severity":"gotcha","affected_versions":"<7.0.0"},{"fix":"Ensure a CUDA-enabled PyTorch installation and set `device='cuda'` for `TabPFNClassifier` or `TabPFNRegressor`. If no GPU is available, consider using the TabPFN API Client for hosted inference.","message":"A GPU is strongly recommended for TabPFN; even an older GPU with ~8GB VRAM is sufficient. CPU inference is considerably slower and only feasible for very small datasets (≲1000 samples).","severity":"gotcha","affected_versions":"All"},{"fix":"Upgrade your Python environment to 3.9 or newer.","message":"TabPFN requires Python 3.9 or newer due to reliance on modern language features. Using older Python versions will result in import errors or installation failures.","severity":"breaking","affected_versions":"All"},{"fix":"Feed raw or minimally preprocessed numerical and categorical data directly to TabPFN. 
Let the model handle feature transformations.","message":"TabPFN performs internal data preprocessing (e.g., normalization, handling missing values, categorical features). Explicitly applying data scaling (e.g., StandardScaler) or one-hot encoding *before* feeding data to TabPFN is generally not recommended and can negatively impact performance.","severity":"gotcha","affected_versions":"All"},{"fix":"Pass all test samples (`X_test`) in a single call to `clf.predict(X_test)` or `clf.predict_proba(X_test)`. If `X_test` is very large, split it into chunks (e.g., 1000 samples each) and predict chunk by chunk.","message":"Calling `predict` or `predict_proba` repeatedly for single test samples is highly inefficient, as each call recomputes the training set context.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Remove the `N_ensemble_configurations` argument. In current versions the ensemble size is controlled by `n_estimators`; check the documentation for the parameters available in your installed version.","cause":"The `N_ensemble_configurations` parameter from early TabPFN releases is no longer accepted by the constructor in newer versions.","error":"TypeError: TabPFNClassifier.__init__() got an unexpected keyword argument 'N_ensemble_configurations'"},{"fix":"If you have a GPU, set `clf = TabPFNClassifier(device='cuda')` for significantly faster inference. Ensure PyTorch with CUDA support is installed.","cause":"TabPFN is running on CPU, either by default or because `device='cpu'` was set explicitly.","error":"UserWarning: Running on CPU to estimate real data statistics. Note: performing inference on CPU is considerably slower than GPU. Consider calling 'set_device('cuda')' on your TabPFNClassifier to set TabPFN to GPU."},{"fix":"Reduce the size of your training data (e.g., by sampling) or test data (by chunking `X_test` for prediction). 
Consider using a GPU with more VRAM, or switch to CPU inference (which will be slower).","cause":"The dataset (training and/or test set) is too large to fit into the GPU's VRAM.","error":"RuntimeError: CUDA out of memory. Tried to allocate XXX MiB (GPU XXX; YYY MiB total capacity; ZZZ MiB already allocated; AAA MiB free; BBB MiB reserved in total by PyTorch)"},{"fix":"Check whether `y` is constant before fitting; if it is, skip fitting and predict that constant value directly. Alternatively, add tiny, practically insignificant noise to `y`.","cause":"TabPFNRegressor can raise errors when the target variable `y` in the training data is constant.","error":"TabPFNRegressor fails on constant input data"},{"fix":"Ensure `tabpfn` is installed with `pip install tabpfn`. Verify that your Python environment meets the `>=3.9` requirement. If using a virtual environment, activate it before installing and importing.","cause":"This usually indicates an incorrect installation, a Python version mismatch, or an import from an old or incorrect path.","error":"ImportError: cannot import name 'TabPFNClassifier' from 'tabpfn'"}]}