Google Tunix
Google Tunix (current version 0.1.6) is a lightweight, JAX-native framework for post-training Large Language Models (LLMs) with both reinforcement learning (RL) and supervised fine-tuning (SFT). It gives researchers and production teams fine-grained control and scalability when aligning and improving foundation models, particularly on accelerators such as TPUs. Releases are frequent, focusing on new model support, API stability, and performance improvements.
Common errors
- TypeError: GrpoLearner.__init__() got an unexpected keyword argument 'grpo_config'
cause: The `GrpoLearner` constructor's configuration parameter was renamed from `grpo_config` to `algo_config` in version 0.1.5.
fix: Change `grpo_config=your_config` to `algo_config=your_config` when instantiating `GrpoLearner`.
- ModuleNotFoundError: No module named 'tunix.rl_cluster_lib'
cause: Module paths and class names for cluster configuration and distributed training (`rl_cluster_lib`) were refactored in v0.1.4 and subsequent releases.
fix: Consult the latest Tunix documentation or examples for the current import paths; distributed-training utilities may now live under `tunix.cluster` or a similar updated namespace.
- RuntimeError: JAX/Flax compilation failed with an internal error...
cause: This often indicates an incompatibility between Tunix, your JAX/Flax versions, or your Python environment. Specific JAX/Flax versions are critical for TPU/GPU compilation.
fix: Verify that your `jax`, `jaxlib`, and `flax` packages meet Tunix's requirements and are correctly installed for your hardware accelerator (CPU, GPU, or TPU). Upgrading JAX to a newer compatible version often resolves this.
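The `grpo_config` to `algo_config` rename can be bridged with a small compatibility shim. The helper below is an illustrative sketch, not part of Tunix: it inspects the constructor signature and passes the config object under whichever keyword the installed version accepts.

```python
import inspect

def make_learner(learner_cls, config, **kwargs):
    """Instantiate a learner class, passing `config` under whichever keyword
    the installed version expects (compatibility shim, not a Tunix API)."""
    params = inspect.signature(learner_cls.__init__).parameters
    # Prefer the post-0.1.5 keyword; fall back to the older one.
    key = "algo_config" if "algo_config" in params else "grpo_config"
    return learner_cls(**{key: config}, **kwargs)
```

With this, `make_learner(GrpoLearner, my_config)` should work on either side of the 0.1.5 rename.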
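For the `rl_cluster_lib` refactor, one defensive pattern is to try several candidate module paths and take the first that defines the class you need. The helper below is generic stdlib code; the Tunix module names shown in the comment are guesses based on the notes above, not confirmed paths.

```python
import importlib

def import_first(attr_name, module_candidates):
    """Return the first attribute named `attr_name` found across the given
    candidate module paths. Useful when a library moves a class between
    releases."""
    for mod_name in module_candidates:
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue  # this path does not exist in the installed version
        if hasattr(mod, attr_name):
            return getattr(mod, attr_name)
    raise ImportError(f"{attr_name!r} not found in any of {module_candidates}")

# Hypothetical usage; both module paths are assumptions, not confirmed:
# ClusterConfig = import_first(
#     "ClusterConfig", ("tunix.cluster", "tunix.rl_cluster_lib"))
```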
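JAX/Flax version mismatches can be caught before a confusing compile-time failure by checking installed versions up front. The sketch below uses only the standard library; the version numbers in the example call are placeholders, so consult Tunix's own requirements for the real pins.

```python
from importlib import metadata

def parse_version(v):
    # Naive parse: keep only leading numeric components ("0.4.30" -> (0, 4, 30)).
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)

def check_minimums(minimums):
    """Return a list of packages that are missing or older than the stated
    minimum version. `minimums` maps package name -> minimum version string."""
    problems = []
    for pkg, min_ver in minimums.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            problems.append(f"{pkg}: not installed (need >= {min_ver})")
            continue
        if parse_version(installed) < parse_version(min_ver):
            problems.append(f"{pkg}: {installed} < required {min_ver}")
    return problems

# Example (these minimums are illustrative, not Tunix's actual requirements):
# print(check_minimums({"jax": "0.4.0", "jaxlib": "0.4.0", "flax": "0.8.0"}))
```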
Warnings
- breaking The `GrpoLearner` constructor changed the parameter name for the main configuration object from `grpo_config` to `algo_config`.
- breaking API changes were introduced for distributed training components, specifically impacting `rl_cluster_lib.ClusterConfig` and related utilities.
- gotcha As a JAX-native library, Tunix requires specific versions of JAX and Flax. Mismatched versions, especially with `jaxlib` for your accelerator (CPU/GPU/TPU), can lead to complex installation issues and runtime errors.
Install
- pip install google-tunix
Imports
- GrpoConfig
from tunix.configs import GrpoConfig
- PpoConfig
from tunix.configs import PpoConfig
- GrpoLearner (import path varies between releases; use whichever matches your installed version)
from tunix.rl.trainer import GrpoLearner
from tunix.trainer import GrpoLearner
- AgenticGRPOConfig
from tunix import AgenticGRPOConfig
- AgenticGRPOLearner
from tunix import AgenticGRPOLearner
Quickstart
from tunix import AgenticGRPOConfig

# Configure Agentic GRPO for LLM post-training.
# This is a minimal configuration; a real setup would require more specific
# parameters like model_config, optimizers, and potentially a tokenizer.
agentic_grpo_config = AgenticGRPOConfig(
    num_generations=2,        # Number of generations per iteration
    num_iterations=10,        # Total training iterations
    max_response_length=512,  # Maximum length for generated responses
    beta=0.1,                 # KL-divergence coefficient
    # Placeholders for complex objects; in a real scenario these would be
    # actual config objects.
    model_config=None,                      # e.g., Llama2Config, GemmaConfig
    optimizer_config_factory=lambda: None,  # Factory for optimizer configs
)

print(f"AgenticGRPOConfig initialized with num_generations: {agentic_grpo_config.num_generations}")
print(f"Max response length: {agentic_grpo_config.max_response_length}")

# Note: To run a full training loop, you would also need to instantiate
# AgenticGRPOLearner with actual JAX/Flax models, a tokenizer, and a dataset.