{"library":"nvidia-modelopt","title":"NVIDIA Model Optimizer","description":"NVIDIA Model Optimizer (nvidia-modelopt) is an open toolkit that accelerates AI inference by applying model optimization techniques such as quantization, pruning, and distillation. It primarily targets PyTorch and ONNX models, integrates directly into the training loop, and supports deployment to NVIDIA inference frameworks such as TensorRT-LLM and TensorRT. The library is under active development: 0.42.0 was the stable release when this description was written, and frequent release candidates (e.g., 0.43.0rcX) indicate a rapid release cadence.","language":"python","status":"active","last_verified":"Sun May 17","install":{"commands":["pip install nvidia-modelopt","pip install \"nvidia-modelopt[all]\" --extra-index-url https://pypi.nvidia.com"],"cli":null},"imports":["from diffusers import NVIDIAModelOptConfig","from modelopt.torch.opt import enable_huggingface_checkpointing","import modelopt.torch.quantization as mtq","from modelopt.torch.export import export_hf_checkpoint"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import torch\nfrom diffusers import AutoModel, NVIDIAModelOptConfig\nfrom modelopt.torch.opt import enable_huggingface_checkpointing\n\n# Enable Model Optimizer checkpointing for Hugging Face models\nenable_huggingface_checkpointing()\n\n# Define the model ID and data type\nmodel_id = \"Efficient-Large-Model/Sana_600M_1024px_diffusers\"\ndtype = torch.bfloat16\n\n# Define the quantization configuration for FP8.\n# This model loads without authentication. For a gated model, import os and\n# pass token=os.environ.get('HF_TOKEN') to from_pretrained.\nquantization_config = NVIDIAModelOptConfig(quant_type=\"FP8\", quant_method=\"modelopt\")\n\n# Load the model with quantization 
configuration\n# Requires NVIDIA drivers and a working CUDA setup.\ntry:\n    print(f\"Attempting to load model {model_id} with FP8 quantization...\")\n    model = AutoModel.from_pretrained(\n        model_id,\n        subfolder=\"transformer\",\n        quantization_config=quantization_config,\n        torch_dtype=dtype,\n    )\n    print(\"Model loaded successfully with quantization enabled.\")\n\n    # The loaded transformer expects latents, text embeddings, and a timestep,\n    # so it is normally run through a diffusers pipeline, not called on a bare image tensor.\n\n    # To save the quantized model (requires a path)\n    # model.save_pretrained('path/to/sana_fp8', safe_serialization=False)\nexcept Exception as e:\n    print(f\"Error loading or processing model: {e}\")\n    print(\"Ensure `diffusers` is installed and a compatible GPU is available. If installation fails, try adding `--extra-index-url https://pypi.nvidia.com`.\")\n","lang":"python","description":"This quickstart demonstrates how to load a Hugging Face model and apply FP8 quantization using `NVIDIAModelOptConfig`. 
It shows how Model Optimizer integrates with Hugging Face Diffusers to prepare models for efficient deployment.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-17","installed_version":"0.29.0","pypi_latest":"0.44.0","is_stale":true,"summary":{"python_range":"3.9–3.13","success_rate":25,"avg_install_s":65.8,"avg_import_s":null,"wheel_type":"sdist"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"nvidia-modelopt","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"nvidia-modelopt","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":82.9,"import_time_s":null,"mem_mb":null,"disk_size":"4.9G"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine 
(musl)","variant":"nvidia-modelopt","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"nvidia-modelopt","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":81.2,"import_time_s":null,"mem_mb":null,"disk_size":"5.0G"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nvidia-modelopt","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"nvidia-modelopt","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":75.7,"import_time_s":null,"mem_mb":null,"disk_size":"5.0G"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine 
(musl)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nvidia-modelopt","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"nvidia-modelopt","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":74.5,"import_time_s":null,"mem_mb":null,"disk_size":"5.0G"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"nvidia-modelopt","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"all","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":61.4,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim 
(glibc)","variant":"nvidia-modelopt","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":14.8,"import_time_s":null,"mem_mb":null,"disk_size":"261M"}]}}