Torch Memory Saver

0.0.9 · active · verified Wed Apr 15

Torch Memory Saver is a PyTorch library that reduces GPU memory pressure by allowing the memory behind `torch` tensors to be temporarily released and later resumed. This helps developers manage memory more efficiently, especially with large models or operations that would otherwise exceed available VRAM. The library is actively developed; its latest stable release, version 0.0.9, was published in October 2025.

Warnings

Install
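The Install section is empty in the source; a minimal sketch, assuming the package is published on PyPI under the same name as its import (`torch_memory_saver`):

```shell
# Requires a CUDA-enabled PyTorch build; the library targets NVIDIA GPUs.
pip install torch_memory_saver
```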

Imports

Quickstart

This quickstart demonstrates the core functionality of `torch-memory-saver`. It shows how to define a memory region, create a large tensor within it, and then temporarily release and resume its GPU memory. By default, tensor content is discarded during `pause()` for maximum memory savings, but it can be preserved using `enable_cpu_backup=True` when defining the region. The example also includes print statements to observe CUDA memory changes.

import torch
from torch_memory_saver import torch_memory_saver

if not torch.cuda.is_available():
    print("CUDA is not available. This library is designed for GPU memory saving.")
    raise SystemExit

print(f"Initial CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")

# 1. Tensors that should be pauseable must be created inside a `region`.
with torch_memory_saver.region():
    # Create a large tensor (~1 GB here; adjust the size to your GPU).
    pauseable_tensor = torch.full((1_000_000_000,), 100, dtype=torch.uint8, device="cuda")
print(f"Tensor created. CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")

# 2. Temporarily release the GPU memory backing tensors from the region.
# By default the contents are discarded; pass `enable_cpu_backup=True` to
# `region()` to have them copied to host memory and restored on resume.
torch_memory_saver.pause()

# `nvidia-smi` now shows reduced GPU memory usage for this process. Note that
# `torch.cuda.memory_allocated()` may not drop, since pause releases physical
# memory below PyTorch's caching allocator, whose bookkeeping is unchanged.
# You can perform other memory-intensive operations here.

# 3. After `resume`, CUDA memory is re-occupied for those tensors.
torch_memory_saver.resume()
print(f"Memory resumed. CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")

# Without `enable_cpu_backup=True`, the tensor's contents are undefined after a
# pause/resume cycle; with it, `pauseable_tensor` would read back intact.

# Delete tensors and clear the cache if running multiple experiments in one script.
del pauseable_tensor
torch.cuda.empty_cache()
print(f"Final CUDA memory allocated: {torch.cuda.memory_allocated() / (1024**2):.2f} MB")
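To preserve tensor contents across a pause, the quickstart mentions `enable_cpu_backup=True`. A minimal sketch of that variant, guarded so it degrades gracefully without a GPU or without the library installed (the `AVAILABLE` guard variable is illustrative, not part of the library):

```python
try:
    import torch
    from torch_memory_saver import torch_memory_saver
    AVAILABLE = torch.cuda.is_available()
except ImportError:  # torch or torch_memory_saver not installed
    AVAILABLE = False

if AVAILABLE:
    # Ask the region to keep a host-side copy of its tensors so that
    # their contents survive a pause/resume cycle.
    with torch_memory_saver.region(enable_cpu_backup=True):
        t = torch.full((1_000_000,), 42, dtype=torch.uint8, device="cuda")

    torch_memory_saver.pause()   # GPU pages released; contents backed up to host
    torch_memory_saver.resume()  # GPU pages re-mapped; contents restored

    print(t[0].item())  # should read back 42 thanks to the CPU backup
else:
    print("Skipped: requires CUDA and torch_memory_saver")
```

The CPU backup trades pause/resume latency (a device-to-host and host-to-device copy) for content preservation; omit it when the tensors are cheap to recompute.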
