{"id":9195,"library":"prodigyopt","title":"ProdigyOpt Optimizer","description":"ProdigyOpt is an Adam-like optimizer for PyTorch neural networks that estimates the learning rate adaptively, so `lr` is normally left at its default of 1.0 rather than tuned. It applies weight decay in a decoupled (AdamW-style) manner by default and offers options such as `slice_p` to reduce memory use. The current version is 1.1.2; releases typically focus on minor bug fixes and performance improvements.","status":"active","version":"1.1.2","language":"en","source_language":"en","source_url":"https://github.com/konstmish/prodigy","tags":["optimizer","pytorch","machine-learning","deep-learning","neural-networks"],"install":[{"cmd":"pip install prodigyopt","lang":"bash","label":"Install ProdigyOpt"}],"dependencies":[],"imports":[{"note":"The package is named 'prodigyopt', not 'prodigy'.","wrong":"from prodigy import Prodigy","symbol":"Prodigy","correct":"from prodigyopt import Prodigy"}],"quickstart":{"code":"import torch\nimport torch.nn as nn\nfrom prodigyopt import Prodigy\n\n# 1. Define a simple model\nmodel = nn.Linear(10, 2)\n\n# 2. Initialize the optimizer with model parameters.\n#    Prodigy estimates the step size adaptively, so lr is normally left at its\n#    default of 1.0. decouple=True (AdamW-style weight decay) is also the default\n#    and is shown explicitly for clarity; it only matters when weight_decay > 0.\noptimizer = Prodigy(model.parameters(), lr=1.0, decouple=True)\n\n# 3. Define a loss function\nloss_fn = nn.MSELoss()\n\n# 4. Prepare dummy data\ninputs = torch.randn(5, 10)\ntargets = torch.randn(5, 2)\n\n# 5. Perform a training step\noptimizer.zero_grad() # Clear gradients\noutputs = model(inputs) # Forward pass\nloss = loss_fn(outputs, targets) # Compute loss\nloss.backward() # Backward pass to compute gradients\noptimizer.step() # Update model parameters\n\n# The loss was computed before optimizer.step(), so it reflects the pre-update parameters\nprint(f\"Loss before the update: {loss.item():.4f}\")","lang":"python","description":"This quickstart shows how to initialize the Prodigy optimizer for a PyTorch model and perform a single optimization step, covering model definition, loss computation, and the standard zero_grad/backward/step workflow."},"warnings":[{"fix":"If you encounter out-of-memory errors with large models, try a `slice_p` value greater than 1 and confirm that training quality is unaffected for your specific setup.","message":"The `slice_p` parameter (introduced in v1.1, default 1) controls how much memory Prodigy's extra optimizer state uses. With `slice_p` > 1, the statistics behind the learning-rate estimate are computed on only every `slice_p`-th entry of each parameter tensor. This lowers memory use for large models but makes the update an approximation of standard Prodigy.","severity":"gotcha","affected_versions":">=1.1"},{"fix":"If you need coupled (L2-style) weight decay, set `decouple=False` explicitly, e.g. `Prodigy(model.parameters(), lr=1.0, weight_decay=0.01, decouple=False)`.","message":"The `decouple` parameter defaults to `True` in Prodigy, so weight decay is applied in a decoupled (AdamW-style) manner rather than added to the gradient as L2 regularization. Because `weight_decay` itself defaults to 0, this setting only has an effect when a nonzero `weight_decay` is passed.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to `prodigyopt>=1.1.2` when training with FSDP. Prodigy tries to detect PyTorch's built-in FSDP automatically; if you shard parameters another way, pass `fsdp_in_use=True`. If some parameters are frozen, consider passing only trainable parameters (`requires_grad=True`) to the optimizer.","message":"Versions prior to `1.1.2` had known issues with PyTorch FSDP (Fully Sharded Data Parallel), particularly when some parameters were frozen, which could lead to incorrect behavior or crashes.","severity":"breaking","affected_versions":"<1.1.2"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install prodigyopt` to install the library. Use `python -m pip install prodigyopt` to make sure the package goes into the interpreter you are actually running.","cause":"The `prodigyopt` library is not installed in your current Python environment.","error":"ModuleNotFoundError: No module named 'prodigyopt'"},{"fix":"Ensure your import statement is exactly `from prodigyopt import Prodigy` (the class name starts with a capital `P`). Check that no local file or directory named `prodigyopt` shadows the installed package, and reinstall the package if the problem persists.","cause":"A typo in the import statement, a local module shadowing the package, or a broken installation.","error":"ImportError: cannot import name 'Prodigy' from 'prodigyopt'"},{"fix":"Verify that your model has trainable parameters. Pass a fresh `model.parameters()` (or a non-empty list of trainable parameters) to the optimizer, e.g. `optimizer = Prodigy(model.parameters(), lr=1.0)`.","cause":"The parameter iterable passed to `Prodigy` was empty. This happens if the model has no parameters, if filtering by `requires_grad` removed every parameter because all are frozen, or if a `model.parameters()` generator was already consumed before being passed in.","error":"ValueError: optimizer got an empty parameter list"},{"fix":"Ensure that your model's parameters have `requires_grad=True` and that the computation graph from your inputs through the model to the loss is intact. Look for inappropriate `.detach()` calls and for a forward pass run under `torch.no_grad()`.","cause":"`.backward()` was called on a loss tensor that is not connected to any tensor requiring gradients. This usually means the model parameters are frozen, an intermediate output was detached, or the forward pass ran without gradient tracking.","error":"RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn"}]}