TabPFN: Foundation model for tabular data

7.1.1 · active · verified Thu Apr 16

TabPFN is a transformer-based foundation model for tabular data. It is a prior-data fitted network: the model is pretrained once on synthetic datasets and then predicts on a new dataset through in-context learning, so it reaches strong performance on small-to-medium-sized datasets without task-specific training. Currently at version 7.1.1, it is actively developed by Prior Labs and delivers fast predictions that often outperform tuned tree-based models and AutoML systems on suitable datasets.

Common errors

Warnings

Install

Imports

Quickstart

This example demonstrates basic usage of `TabPFNClassifier`, which exposes a scikit-learn-compatible interface, on a binary classification task. For best performance, pass `device='cuda'` if a GPU is available. Note that the first run may open a browser window for license acceptance.

import numpy as np
from tabpfn import TabPFNClassifier
from sklearn.model_selection import train_test_split

# Generate reproducible synthetic data (labels are random, so expect chance-level accuracy)
rng = np.random.default_rng(42)
X = rng.random((100, 10))  # 100 samples, 10 features
y = rng.integers(0, 2, size=100)  # Binary classification target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and use TabPFNClassifier (sklearn-like interface)
# The first call might trigger a license acceptance in browser.
clf = TabPFNClassifier(device='cpu') # Use 'cuda' if GPU is available
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)

print(f"Predictions: {predictions[:5]}")
print(f"Probabilities (first 5 samples):\n{probabilities[:5]}")
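
The Quickstart hard-codes `device='cpu'` and leaves the switch to `'cuda'` to the reader. A minimal sketch of making that choice at runtime follows. It assumes PyTorch is the backend whose GPU availability matters here; `pick_device` is a hypothetical helper name, not part of the TabPFN API.

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        # Assumption: without PyTorch there is no GPU path, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")

# Pass the result to the estimator from the Quickstart, for example:
# clf = TabPFNClassifier(device=device)
```

Resolving the device once and passing it to the constructor keeps the rest of the Quickstart unchanged on machines with or without a GPU.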
