PyTorch Profiler TensorBoard Plugin
torch-tb-profiler is a TensorBoard plugin that provides rich visualizations and analysis tools for profiling PyTorch models. It parses, processes, and visualizes profiling results dumped by `torch.profiler`, helping users identify performance bottlenecks and receive optimization recommendations. The current version is 0.4.3, with releases often tied to PyTorch updates or major bug fixes.
Warnings
- breaking The TensorBoard integration with PyTorch profiler (the `tb_plugin` submodule, which this library provides) is deprecated and scheduled for permanent removal on March 5, 2026. Plan to migrate your workflow before then.
- gotcha Profiling results will not appear in TensorBoard unless `torch.profiler` is actively enabled and writing trace data in your code. Simply installing `torch-tb-profiler` is not enough to see results.
- gotcha Trace files containing invalid values like 'inf' (e.g., in 'memory bandwidth (GB/s)') can cause `torch-tb-profiler` to fail when opening and visualizing the data in TensorBoard.
- gotcha When profiling CUDA activities, issues like mismatched CUDA toolkit versions between your environment and PyTorch installation, or improper use of `autocast` for mixed precision, can lead to incorrect or misleading GPU profiling results and performance.
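For the 'inf' gotcha above, a pragmatic workaround is to sanitize the trace JSON before loading it in TensorBoard. This is a minimal sketch, not part of `torch-tb-profiler` itself; the field name in the test mirrors the example from the warning, and real trace schemas may differ:

```python
import json
import math

def _clean(obj, stats):
    """Recursively replace non-finite floats (inf/-inf/NaN) with 0.0."""
    if isinstance(obj, float) and not math.isfinite(obj):
        stats["replaced"] += 1
        return 0.0
    if isinstance(obj, dict):
        return {k: _clean(v, stats) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_clean(v, stats) for v in obj]
    return obj

def sanitize_trace(in_path, out_path):
    """Zero out non-finite values in a trace JSON and return the count replaced.

    Note: Python's json module parses the bare tokens Infinity/-Infinity/NaN;
    string-valued fields such as "inf" would need separate handling.
    """
    with open(in_path) as f:
        data = json.load(f)
    stats = {"replaced": 0}
    with open(out_path, "w") as f:
        json.dump(_clean(data, stats), f)
    return stats["replaced"]
```

Run this once on a failing trace file, then point TensorBoard at the sanitized copy.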
Install
-
pip install torch-tb-profiler
Imports
- tensorboard_trace_handler
from torch.profiler import tensorboard_trace_handler
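Beyond the bare import, `tensorboard_trace_handler` also accepts `worker_name` (labels this process's trace file, useful in multi-process or distributed runs) and `use_gzip` (compresses the output). A small sketch; the directory and worker name here are illustrative:

```python
from torch.profiler import tensorboard_trace_handler

# worker_name tags the trace file for this process; use_gzip shrinks it on disk.
handler = tensorboard_trace_handler("./runs/demo", worker_name="rank0", use_gzip=True)
```

Pass the resulting `handler` as `on_trace_ready=` when constructing `torch.profiler.profile`.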
Quickstart
import torch
import torch.nn as nn
import torch.optim as optim
from torch.profiler import profile, record_function, ProfilerActivity, tensorboard_trace_handler
import os
# Create a dummy model and data
model = nn.Linear(10, 10).cuda() if torch.cuda.is_available() else nn.Linear(10, 10)
optimizer = optim.SGD(model.parameters(), lr=0.01)
dummy_input = torch.randn(64, 10).cuda() if torch.cuda.is_available() else torch.randn(64, 10)
# Define log directory for TensorBoard
log_dir = "./runs/profiler_test"
# Build the activity list; add CUDA only when a GPU is available
# (the conditional-expression form would duplicate ProfilerActivity.CPU on CPU-only machines)
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
# Run profiler
with profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=tensorboard_trace_handler(log_dir),
    activities=activities,
    record_shapes=True,
    with_stack=True,
) as prof:
    for i in range(10):
        optimizer.zero_grad()
        output = model(dummy_input)
        loss = output.sum()
        loss.backward()
        optimizer.step()
        prof.step()  # advance the profiler schedule to the next step
print(f"Profiling results saved to {log_dir}.\n")
print("To view in TensorBoard, run:")
print(f"tensorboard --logdir {os.path.abspath(os.path.dirname(log_dir))}")
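If you want a quick text summary without opening TensorBoard, `profile` objects also expose `key_averages()`. A minimal CPU-only sketch (no schedule or trace handler needed for this use):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

model = nn.Linear(10, 10)
x = torch.randn(8, 10)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Aggregate per-operator stats and print the top entries by total CPU time
summary = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(summary)
```

This is handy for a sanity check before diving into the full TensorBoard UI.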