{"id":9789,"library":"google-tunix","title":"Google Tunix","description":"Google Tunix (current version 0.1.6) is a lightweight, JAX-native framework designed for post-training Large Language Models (LLMs) using both reinforcement learning (RL) and supervised fine-tuning (SFT). It provides powerful tools for researchers and production teams to achieve maximum control and scalability when aligning and improving foundation models, particularly on accelerators like TPUs. Releases are frequent, focusing on new model support, API stability, and performance enhancements.","status":"active","version":"0.1.6","language":"en","source_language":"en","source_url":"https://github.com/google/tunix","tags":["JAX","LLM","RLHF","Fine-tuning","Reinforcement Learning","Google"],"install":[{"cmd":"pip install google-tunix","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core machine learning framework, Tunix is JAX-native.","package":"jax","optional":false},{"reason":"Neural network library for JAX, often used with Tunix models.","package":"flax","optional":false},{"reason":"Required Python version.","package":"python","version":">=3.11"}],"imports":[{"symbol":"GrpoConfig","correct":"from tunix.configs import GrpoConfig"},{"symbol":"PpoConfig","correct":"from tunix.configs import PpoConfig"},{"note":"Module path changed in recent versions.","wrong":"from tunix.rl.trainer import GrpoLearner","symbol":"GrpoLearner","correct":"from tunix.trainer import GrpoLearner"},{"note":"Introduced in v0.1.6 for agentic RL workflows.","symbol":"AgenticGRPOConfig","correct":"from tunix import AgenticGRPOConfig"},{"note":"Introduced in v0.1.6 for agentic RL workflows.","symbol":"AgenticGRPOLearner","correct":"from tunix import AgenticGRPOLearner"}],"quickstart":{"code":"from tunix import AgenticGRPOConfig\n\n# Configure Agentic GRPO for LLM post-training\n# This is a minimal configuration; a real setup would require more specific parameters\n# like model_config, optimizers, 
and potentially a tokenizer.\nagentic_grpo_config = AgenticGRPOConfig(\n    num_generations=2,  # Responses sampled per prompt (the GRPO group size)\n    num_iterations=10,  # Optimization iterations per batch\n    max_response_length=512,  # Maximum length for generated responses\n    beta=0.1,  # KL-divergence penalty coefficient\n    # Placeholders for complex objects; in a real scenario these would be actual config objects\n    model_config=None,  # e.g., Llama2Config, GemmaConfig\n    optimizer_config_factory=lambda: None,  # Factory for optimizer configs\n)\n\nprint(f\"AgenticGRPOConfig initialized with num_generations: {agentic_grpo_config.num_generations}\")\nprint(f\"Max response length: {agentic_grpo_config.max_response_length}\")\n\n# Note: To run a full training loop, you would also need to instantiate\n# AgenticGRPOLearner with actual JAX/Flax models, a tokenizer, and a dataset.","lang":"python","description":"This quickstart shows how to initialize a basic `AgenticGRPOConfig`, which defines the training parameters for agentic GRPO (Group Relative Policy Optimization) reinforcement learning in Tunix. 
This config is typically passed to an `AgenticGRPOLearner`, together with actual JAX/Flax models and data, for a full training workflow."},"warnings":[{"fix":"Update `rl_trainer = GrpoLearner(grpo_config=grpo_config)` to `rl_trainer = GrpoLearner(algo_config=grpo_config)`.","message":"The `GrpoLearner` constructor renamed its main configuration parameter from `grpo_config` to `algo_config`.","severity":"breaking","affected_versions":"v0.1.4 to v0.1.5 (renamed in v0.1.5)"},{"fix":"Review the latest Tunix examples and documentation (especially for v0.1.4+) for updated module paths and class signatures related to distributed training and cluster configuration.","message":"API changes were introduced for distributed training components, specifically impacting `rl_cluster_lib.ClusterConfig` and related utilities.","severity":"breaking","affected_versions":"v0.1.3 to v0.1.4"},{"fix":"Always install JAX/Flax versions compatible with your hardware and the Tunix release. Check Tunix's `pyproject.toml` or `setup.py` for exact dependencies, and consult JAX's official documentation for correct `jaxlib` installation for your device.","message":"As a JAX-native library, Tunix requires specific versions of JAX and Flax. 
Mismatched versions, especially a `jaxlib` build that does not match your accelerator (CPU/GPU/TPU), can cause hard-to-diagnose installation failures and runtime errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Change `grpo_config=your_config` to `algo_config=your_config` when instantiating `GrpoLearner`.","cause":"The `GrpoLearner` constructor's configuration parameter was renamed from `grpo_config` to `algo_config` in version 0.1.5.","error":"TypeError: GrpoLearner.__init__() got an unexpected keyword argument 'grpo_config'"},{"fix":"Consult the latest Tunix documentation or examples for the correct import paths and usage of distributed training utilities, which may now be located under `tunix.cluster` or similar updated namespaces.","cause":"Module paths and class names related to cluster configuration and distributed training (`rl_cluster_lib`) were refactored in `v0.1.4` and subsequent releases.","error":"ModuleNotFoundError: No module named 'tunix.rl_cluster_lib'"},{"fix":"Verify that your `jax`, `jaxlib`, and `flax` packages meet Tunix's requirements and are correctly installed for your specific hardware accelerator (CPU, GPU, or TPU). Upgrading JAX to a newer compatible version is often necessary.","cause":"This often indicates an incompatibility among Tunix, your JAX/Flax versions, and your Python environment. Specific JAX/Flax versions are critical for TPU/GPU compilation.","error":"RuntimeError: JAX/Flax compilation failed with an internal error..."}]}