{"id":5215,"library":"fairscale","title":"FairScale: PyTorch Large-Scale Training Utilities","description":"FairScale is a PyTorch extension library providing utilities for large-scale and high-performance training, including Fully Sharded Data Parallel (FSDP) and Optimizer State Sharding (OSS). Although many features, especially FSDP, have been upstreamed to PyTorch, FairScale still offers specialized tools for memory and communication efficiency. The current version is 0.4.13. Releases are now infrequent because the core functionality has been integrated into PyTorch.","status":"maintenance","version":"0.4.13","language":"en","source_language":"en","source_url":"https://github.com/facebookresearch/fairscale","tags":["pytorch","distributed-training","data-parallel","fsdp","optimizer-sharding","gpu","hpc","deep-learning"],"install":[{"cmd":"pip install fairscale","lang":"bash","label":"Install FairScale"}],"dependencies":[{"reason":"Core deep learning framework. Requires torch>=1.11 for full compatibility.","package":"torch","optional":false}],"imports":[{"note":"FairScale's FSDP implementation is nested under the `data_parallel` submodule.","wrong":"from fairscale.nn import FullyShardedDataParallel","symbol":"FullyShardedDataParallel","correct":"from fairscale.nn.data_parallel import FullyShardedDataParallel"},{"symbol":"OSS","correct":"from fairscale.optim.oss import OSS"}],"quickstart":{"code":"import torch\nimport torch.nn as nn\nimport torch.distributed as dist\nfrom fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP\nfrom fairscale.optim.oss import OSS\n\n# NOTE: Real multi-GPU/multi-node runs must call dist.init_process_group on every rank.\n# This example simulates a single-process setup for the quickstart.\n# In a real distributed run, rank and world_size would come from the environment.\n\n# Dummy initialization for single-process quickstart\nif not dist.is_initialized():\n    try:\n        # Using HashStore for a simple single-node, 
single-process initialization\n        dist.init_process_group(backend='gloo', rank=0, world_size=1, store=dist.HashStore())\n    except RuntimeError as e:\n        # Initialization can fail in some environments (e.g., interactive sessions with an existing group)\n        print(f\"Could not initialize process group (might be already initialized): {e}\")\n\n# 1. Define a simple model\nclass MyModel(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.layer = nn.Linear(10, 10)\n    def forward(self, x):\n        return self.layer(x)\n\n# 2. Instantiate the model\nmodel = MyModel()\n\n# 3. Wrap the model with FairScale's FSDP\n# For simplicity, default options are used. Real-world usage often requires careful tuning.\nfsdp_model = FSDP(model)\n\n# 4. Create a sharded optimizer with FairScale's OSS\n# OSS takes the optimizer class (not an instance) and forwards extra keyword arguments such as lr to it.\noss_optimizer = OSS(params=fsdp_model.parameters(), optim=torch.optim.Adam, lr=1e-3)\n\n# 5. Dummy data and training step\ninput_data = torch.randn(2, 10)\nlabels = torch.randn(2, 10)\n\n# Forward pass\noutput = fsdp_model(input_data)\nloss = nn.MSELoss()(output, labels)\n\n# Backward pass and optimizer step\noss_optimizer.zero_grad()\nloss.backward()\noss_optimizer.step()\n\nprint(f\"FairScale FSDP and OSS example completed. Loss: {loss.item():.4f}\")\n\n# Clean up distributed environment if it was initialized by this script\nif dist.is_initialized() and dist.get_world_size() == 1:\n    dist.destroy_process_group()","lang":"python","description":"This quickstart demonstrates how to wrap a PyTorch model with FairScale's Fully Sharded Data Parallel (FSDP) and its Optimizer State Sharding (OSS) for memory-efficient training. Note that `dist.init_process_group` is essential for multi-GPU/node training; a dummy initialization is used here for a runnable single-process example. 
For new projects, PyTorch's native FSDP is the recommended choice."},"warnings":[{"fix":"Migrate your FSDP usage to `torch.distributed.fsdp.FullyShardedDataParallel`. Consult the official PyTorch FSDP documentation for migration guides and updated best practices.","message":"FairScale's FSDP (`fairscale.nn.data_parallel.FullyShardedDataParallel`) has been largely superseded by PyTorch's native FSDP (`torch.distributed.fsdp.FullyShardedDataParallel`), available since PyTorch 1.11. For new projects, the native PyTorch implementation is strongly encouraged because it receives ongoing development and optimizations.","severity":"deprecated","affected_versions":"0.4.0+"},{"fix":"Plan for migration to native PyTorch distributed features, especially `torch.distributed.fsdp`, to ensure future compatibility, access to the latest optimizations, and bug fixes.","message":"FairScale is in maintenance mode: active development of new features has largely shifted to PyTorch's native distributed modules. API changes or new features in PyTorch's core distributed components might not be backported to FairScale or remain fully compatible with it.","severity":"breaking","affected_versions":"0.4.0+"},{"fix":"Ensure `torch.distributed.init_process_group` is called before instantiating FairScale's FSDP or OSS. Use environment variables (e.g., `MASTER_ADDR`, `MASTER_PORT`, `RANK`, `WORLD_SIZE`) or helper functions for distributed setup.","message":"FairScale requires a properly initialized `torch.distributed` environment. Running without `dist.init_process_group` (even for single-GPU FSDP) will result in errors or unexpected behavior during model wrapping or training.","severity":"gotcha","affected_versions":"All"},{"fix":"Refer to FairScale's documentation on mixed precision usage with FSDP. 
In many cases, `torch.cuda.amp` can be used alongside FSDP, but careful integration is required.","message":"When using FairScale's FSDP with mixed precision, ensure that the `mixed_precision` argument in `FSDP` is configured correctly, or that you are using a compatible `torch.cuda.amp.GradScaler` outside of FSDP, depending on your PyTorch version and specific setup. Incorrect configuration can lead to performance issues or `NaN` gradients.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}