NVIDIA Resiliency Extension

JSON →
library 0.5.0 ·python
verified May 25, 2026

NVIDIA Resiliency Extension (NVRE) is a Python package that provides fault-tolerant features for framework developers and users, aiming to minimize downtime in deep learning training due to failures and interruptions. It supports features like checkpointing (local and cloud), in-job restarts, and health checks. The current version is 0.5.0, with minor releases occurring every few months to introduce new features and bug fixes.

total hits 16
actors 8 distinct systems
last hit 3d ago MetaBot
MetaBot
4
GPTBot
2
Script
2
ClaudeBot
1
ChatGPT-User
1
Search engines
1

top countries 🇺🇸 United States · 🇨🇦 Canada · 🇩🇪 Germany · 🇫🇷 France