Clusterscope
Clusterscope is a Python library and CLI tool for extracting information from High-Performance Computing (HPC) clusters and jobs, particularly those using Slurm. It provides cluster detection, GPU/CPU/memory information, job requirement generation, and AWS environment detection. The library is actively maintained, with frequent releases, and is currently at version 0.0.32.
Common errors
-
ModuleNotFoundError: No module named 'clusterscope'
cause The `clusterscope` package is not installed in the current Python environment.
fix Install the package using pip: `pip install clusterscope`
-
cscope: command not found
cause The `cscope` command-line entry point is not in your system's PATH, or the package was installed into an environment that is not activated in your shell.
fix Ensure your Python environment is activated (e.g., `source .venv/bin/activate`) or that pip's script directory is in your system's PATH. If using `pipx`, ensure `pipx ensurepath` has been run.
-
(CLI output is empty or shows 'Unknown cluster')
cause Clusterscope could not detect the cluster type or retrieve information from the underlying HPC system (e.g., Slurm). This often happens if Slurm commands are not available or if their output format is unexpected.
fix Verify that you are on a node with Slurm client tools installed and configured. Run native Slurm commands such as `sinfo` or `squeue` to confirm they work. Ensure necessary environment variables (e.g., `SLURM_CONF`) are set correctly.
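The three checks above can be scripted. The sketch below is a minimal, standard-library-only diagnostic and is not part of Clusterscope; `diagnose_environment` is a hypothetical helper name. It assumes the package is importable as `clusterscope`, the CLI is named `cscope`, and the Slurm client tools of interest are `sinfo` and `squeue`, as described in the entries above.

```python
import importlib.util
import os
import shutil


def diagnose_environment(module="clusterscope", cli="cscope",
                         slurm_tools=("sinfo", "squeue")):
    """Return a dict describing what is available in this environment.

    Hypothetical helper for illustration; it only inspects the interpreter,
    PATH and environment variables and never runs any Slurm command.
    """
    return {
        # True if the package can be imported by the active interpreter.
        "module_installed": importlib.util.find_spec(module) is not None,
        # Absolute path of the CLI entry point, or None if it is not on PATH.
        "cli_path": shutil.which(cli),
        # Per-tool lookup of the Slurm client commands on PATH.
        "slurm_tools": {tool: shutil.which(tool) for tool in slurm_tools},
        # None only means the variable is unset, which many sites allow.
        "slurm_conf": os.environ.get("SLURM_CONF"),
    }


if __name__ == "__main__":
    for key, value in diagnose_environment().items():
        print(f"{key}: {value}")
```

A `None` under `cli_path` or `slurm_tools` points at a PATH or environment-activation problem rather than at Clusterscope itself.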
Warnings
- breaking Changes in v0.0.31 removed direct `init` calls to methods. Code relying on implicit initialization or specific method call order in constructors might break.
- gotcha Clusterscope heavily integrates with Slurm. Misconfigurations in the Slurm environment (e.g., missing `sinfo`, `squeue` commands, incorrect `srun` environment variables, or unexpected Slurm output formats) can lead to incorrect or incomplete information being extracted.
- gotcha The library's functionality might be affected by specific cluster hardware or cloud configurations, especially regarding GPU visibility or resource allocation details. For instance, the changelog mentions `getting max gpus, cpus out of sinfo output` and updates related to `aws nccl defaults`.
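When reported GPU or CPU counts look wrong, it can help to cross-check them against raw `sinfo` output. The sketch below is one way to do that and is not Clusterscope's own implementation. `gpus_from_gres` and `max_resources` are hypothetical helper names, the sample text is made up for illustration, and the parser assumes output produced by `sinfo -h -N -o "%c %G"` (CPUs per node followed by the GRES string).

```python
import re


def gpus_from_gres(gres):
    """Count GPUs in a Slurm GRES string such as 'gpu:a100:8(S:0-1)'.

    Entries without an explicit numeric count are ignored in this sketch.
    """
    # Drop parenthesised suffixes like '(S:0-1)' before splitting on commas.
    cleaned = re.sub(r"\([^)]*\)", "", gres)
    total = 0
    for entry in cleaned.split(","):
        fields = entry.strip().split(":")
        if fields[0] == "gpu" and fields[-1].isdigit():
            total += int(fields[-1])
    return total


def max_resources(sinfo_output):
    """Return (max_cpus, max_gpus) over all '<cpus> <gres>' lines."""
    max_cpus = max_gpus = 0
    for line in sinfo_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        cpus, gres = parts
        cpus = cpus.rstrip("+")  # sinfo may append '+' to a minimum value
        if cpus.isdigit():
            max_cpus = max(max_cpus, int(cpus))
        max_gpus = max(max_gpus, gpus_from_gres(gres))
    return max_cpus, max_gpus


# Made-up sample in the assumed "%c %G" layout.
SAMPLE = "64 gpu:a100:8(S:0-1)\n32 (null)\n96 gpu:4,mps:400"

if __name__ == "__main__":
    print(max_resources(SAMPLE))  # (96, 8)
```

On a real cluster the input would come from something like `subprocess.run(["sinfo", "-h", "-N", "-o", "%c %G"], capture_output=True, text=True).stdout`. If the numbers differ from what Clusterscope reports, the site's `sinfo` output format is a likely place to look.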
Install
-
pip install clusterscope
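After installing, it may be worth confirming which version actually landed in the environment, since the initialization change noted under Warnings arrived in v0.0.31. The sketch below uses only the standard library's `importlib.metadata`; `version_tuple` and `installed_version` are hypothetical helper names, and the version parsing is deliberately simple (plain dotted integers only).

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.0.32' into (0, 0, 32), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package="clusterscope"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("clusterscope is not installed in this environment")
    elif version_tuple(found) >= (0, 0, 31):
        print(f"clusterscope {found}: includes the v0.0.31 initialization change")
    else:
        print(f"clusterscope {found}: predates the v0.0.31 initialization change")
```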
Imports
- cluster
import clusterscope; clusterscope.cluster()
- get_gpus
from clusterscope.gpus import get_gpus
Quickstart
import clusterscope
# Get the detected cluster name
cluster_name = clusterscope.cluster()
print(f"Detected cluster: {cluster_name}")
# The CLI gives more specific information. Try these commands in your terminal:
#   $ cscope gpus   # Show GPU information
#   $ cscope cpus   # Show CPU counts per node
#   $ cscope mem    # Show memory information per node
#   $ cscope aws    # Check if running on AWS and show NCCL settings
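The CLI subcommands listed above can also be driven from Python. The following is a minimal sketch rather than an official interface: `run_cscope` is a hypothetical wrapper, it assumes the `cscope` entry point is on PATH, and it treats the output as opaque text because the exact format is not specified here.

```python
import shutil
import subprocess


def run_cscope(subcommand, executable="cscope", timeout=30):
    """Run `<executable> <subcommand>` and return its stdout, or None on failure.

    Hypothetical wrapper for illustration; returns None when the executable
    is not on PATH, exits non-zero, or does not finish within `timeout`.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, subcommand],
            capture_output=True,
            text=True,
            stdin=subprocess.DEVNULL,  # never wait for interactive input
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

For example, `run_cscope("gpus")` would return the text printed by `cscope gpus`, and `None` signals one of the failure modes listed under Common errors.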