XProf Profiler Plugin
XProf is a comprehensive profiling and performance analysis tool for machine learning workloads. It supports frameworks such as JAX, TensorFlow, and PyTorch/XLA, running on various hardware including CPUs, GPUs, and TPUs. The library offers a suite of tools like Overview, Trace Viewer, Memory Profile Viewer, and Graph Viewer to aid in understanding, debugging, and optimizing ML programs. It is actively maintained with frequent minor and patch releases.
Warnings
- breaking A known regression in `libtpu` versions `0.0.35` and `0.0.37` causes tools dependent on HLO Modules (e.g., HLO Op Profile, Trace Viewer, Graph Viewer) to not work as intended across all XProf versions. This significantly impacts core visualization features.
- gotcha XProf requires internet access to load the Google Chart library. If running offline, behind a corporate firewall, or in a datacenter without external access, some charts and tables in the UI may be missing or fail to load.
- gotcha Python 3.12+ users may encounter a `ModuleNotFoundError: No module named 'pkg_resources'` during installation or runtime due to changes in Python's packaging system.
- gotcha When used with TensorBoard, version conflicts with the `protobuf` package (a common dependency for TensorFlow/TensorBoard) can lead to `TypeError: Descriptors cannot be created directly`. This indicates potential incompatibility with specific `tensorflow` or `tensorboard` versions.
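The version-related warnings above can be checked before launching the UI. The following is a minimal sketch, not an official XProf tool. It assumes the relevant distributions are published under the names `xprof`, `libtpu`, `protobuf`, `tensorflow`, and `tensorboard`. The `preflight` and `installed_version` helpers are hypothetical names introduced here for illustration. The script only reports what is installed and does not decide which `protobuf` version is compatible.

```python
import importlib.util
from importlib import metadata

# libtpu versions named in the first warning above.
AFFECTED_LIBTPU = {"0.0.35", "0.0.37"}

# Assumption: these are the distribution names as installed by pip.
DISTRIBUTIONS = ("xprof", "libtpu", "protobuf", "tensorflow", "tensorboard")


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def preflight(versions, has_pkg_resources):
    """Return a list of warning messages for the given environment facts."""
    messages = []
    if versions.get("libtpu") in AFFECTED_LIBTPU:
        messages.append(
            f"libtpu {versions['libtpu']} has a known regression affecting "
            "tools that depend on HLO Modules."
        )
    if not has_pkg_resources:
        messages.append(
            "pkg_resources is not importable; it is normally provided by setuptools."
        )
    return messages


# Gather facts about the current environment and print a short report.
versions = {name: installed_version(name) for name in DISTRIBUTIONS}
has_pkg_resources = importlib.util.find_spec("pkg_resources") is not None

for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
for message in preflight(versions, has_pkg_resources):
    print("WARNING:", message)
```

If the `protobuf` line in the report coincides with the `Descriptors cannot be created directly` error, the versions printed here are the ones to compare against the requirements of the installed `tensorflow` or `tensorboard` release. A missing `pkg_resources` on Python 3.12+ is often resolved by installing `setuptools`, though whether that is sufficient for a given XProf release is an assumption.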
Install
- `pip install xprof`
- `pip install xprof tensorboard`
Quickstart
# 1. Collect profile data (example using the JAX profiler; actual collection varies by framework).
# In your ML training code (e.g., JAX):
# import jax.profiler
# jax.profiler.start_server(9012)
# ... run your model ...
# jax.profiler.stop_server()

# 2. Run XProf as a standalone server to view collected profiles.
# Assuming profile data is saved to the 'profiler/demo' directory:
# To run XProf standalone:
xprof --logdir=profiler/demo --port=6006
# Or, to view with TensorBoard (if installed):
tensorboard --logdir=profiler/demo
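The quickstart assumes profile data already exists in `profiler/demo`. `jax.profiler.start_server` only opens a port for on-demand capture, so one way to write a trace straight into that directory is JAX's `start_trace`/`stop_trace` pair. The sketch below wraps that pair. `profile_to_dir` and its optional `profiler` parameter are hypothetical names introduced here so the helper can be exercised without JAX installed. The directory name is taken from the quickstart.

```python
def profile_to_dir(logdir, workload, profiler=None):
    """Run workload() while tracing, writing the profile under logdir.

    `profiler` defaults to jax.profiler; any object exposing
    start_trace(logdir) and stop_trace() can be passed instead.
    """
    if profiler is None:
        import jax.profiler as profiler  # requires JAX to be installed

    profiler.start_trace(logdir)
    try:
        return workload()
    finally:
        # Always stop the trace so the profile is flushed, even if workload() raises.
        profiler.stop_trace()
```

With JAX installed, a call such as `profile_to_dir("profiler/demo", run_model)`, where `run_model` is your own function, should leave a profile that `xprof --logdir=profiler/demo` can open. Because JAX dispatches work asynchronously, the workload should block on its results (for example with `block_until_ready()`) before returning so the device work is captured in the trace.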