rpy2: Python interface to the R language
rpy2 is a Python package that bridges Python and the R programming language. It runs R embedded in the Python process, so Python code can call R functions, use R's statistical capabilities and specialized packages, and exchange data with the rest of the Python ecosystem. The library is actively developed; the current stable version is 3.6.7.
Warnings
- gotcha rpy2 requires a working R installation. Issues often arise from R not being in the system's PATH or the R_HOME environment variable not being set, leading to `OSError: cannot load library` on import.
- breaking Significant architectural and API changes occurred between major versions (e.g., RPy-1.x to rpy2, and further changes in rpy2 3.x, including a shift to `cffi` for its R interface). Code written for older versions will likely break.
- gotcha R packages (e.g., ggplot2, dplyr) must be installed in the R installation that rpy2 loads before `importr` can find them. Installing a package in R does not guarantee rpy2 sees it: if rpy2 picks up a different R installation, or R's library paths (`.libPaths()`) differ in the environment where Python runs, `importr` will fail.
- gotcha `pandas2ri.activate()` converts between pandas and R data structures automatically, but it is a global setting that can cause unexpected behavior or conflicts, and recent rpy2 3.x releases mark it as deprecated. For controlled conversions, use an explicit converter in a local context, e.g. `localconverter(ro.default_converter + pandas2ri.converter)`.
- gotcha Errors raised by R code executed via rpy2 surface in Python as `RRuntimeError`. In rpy2 3.x the class is defined in `rpy2.rinterface_lib.embedded`; older code imports it from `rpy2.rinterface`. However, some low-level failures (e.g., in compiled C/C++ code used by R packages) can crash the Python process with a core dump rather than raise a catchable exception.
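The first warning above can be checked before importing rpy2 at all. The following is a minimal sketch using only the standard library; the helper name `locate_r_home` is hypothetical, and the lookup order (the `R_HOME` variable first, then an `R` executable on `PATH`) is an assumption about what is useful to check, not a statement of rpy2's exact internal logic.

```python
import os
import shutil
import subprocess


def locate_r_home(env=None):
    """Return a plausible R home directory, or None if R cannot be found.

    Hypothetical helper: checks R_HOME first, then asks an `R` executable
    found on PATH. `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env

    r_home = env.get("R_HOME")
    if r_home:
        return r_home

    r_exe = shutil.which("R", path=env.get("PATH", ""))
    if r_exe is None:
        return None

    try:
        # `R RHOME` prints the R home directory and exits.
        proc = subprocess.run(
            [r_exe, "RHOME"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return proc.stdout.strip() or None


home = locate_r_home()
if home is None:
    print("R not found: install R, add it to PATH, or set R_HOME")
else:
    print(f"R home: {home}")
```

If rpy2 itself is already installed, running `python -m rpy2.situation` prints a similar diagnostic report from rpy2's point of view.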
Install
- pip install rpy2
- # System requirements (install R first)
  sudo apt-get install r-base r-base-dev  # Debian/Ubuntu
  brew install r  # macOS
Imports
- robjects
import rpy2.robjects as robjects  # not `import rpy2.r as r`: there is no `rpy2.r` module; the embedded R is exposed as `robjects.r`
- importr
from rpy2.robjects.packages import importr
- pandas2ri
from rpy2.robjects import pandas2ri
Quickstart
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr
# Converter combining rpy2's defaults with the pandas rules; applied locally
# instead of through the global pandas2ri.activate()
pandas_converter = ro.default_converter + pandas2ri.converter
# Import an R package (here 'stats', which ships with R)
stats = importr('stats')
# Define R code as a Python string and execute it
ro.r('''
# Generate a sequence of numbers
x <- 1:10
# Calculate the mean
mean_x <- mean(x)
# Create an R data frame
r_df <- data.frame(A=c(1,2,3), B=c('x','y','z'))
''')
# Access R variables from Python
mean_x = ro.r['mean_x'][0]
print(f"Mean of x from R: {mean_x}")
# Get an R data frame and convert it to pandas inside a local conversion context
r_dataframe = ro.r['r_df']
with localconverter(pandas_converter):
    py_dataframe = ro.conversion.get_conversion().rpy2py(r_dataframe)
print("Python DataFrame from R:")
print(py_dataframe)
print(f"Type of py_dataframe: {type(py_dataframe)}")
# Example: call an R function with data that originated in Python
# (t.test needs at least two observations in each group)
py_data = pd.DataFrame({'value': [10, 20, 30, 40], 'group': ['A', 'B', 'A', 'B']})
with localconverter(pandas_converter):
    r_data = ro.conversion.get_conversion().py2rpy(py_data)
# Use an R function ('t.test', exposed by importr as 't_test') from the 'stats' package
t_test_result = stats.t_test(ro.Formula('value ~ group'), data=r_data)
print("\nT-test result from R:")
print(t_test_result)