{"id":5892,"library":"cut-cross-entropy","title":"Cut Cross Entropy","description":"Cut Cross Entropy provides a highly memory-efficient implementation of the linear-cross-entropy loss function, optimized primarily for large language models and high-throughput inference scenarios. It is part of the vLLM project and is designed for NVIDIA GPUs. The current version, 25.1.1, suggests a rapid, likely date-based release cadence.","status":"active","version":"25.1.1","language":"en","source_language":"en","source_url":"https://github.com/vllm-project/cut-cross-entropy","tags":["machine learning","deep learning","loss function","pytorch","gpu","cuda","vllm","llm"],"install":[{"cmd":"pip install cut-cross-entropy \"torch>=2.0.0\"","lang":"bash","label":"Install with PyTorch (requires CUDA); the version specifier is quoted so the shell does not treat `>=` as a redirect"}],"dependencies":[{"reason":"PyTorch is a mandatory dependency for tensor operations and GPU kernels. Requires version >=2.0.0 and a CUDA-enabled installation.","package":"torch","optional":false}],"imports":[{"note":"The PyPI package name uses hyphens (`cut-cross-entropy`), but the Python module name uses underscores (`cut_cross_entropy`).","wrong":"from cut-cross-entropy import cut_cross_entropy","symbol":"cut_cross_entropy","correct":"from cut_cross_entropy import cut_cross_entropy"},{"note":"The PyPI package name uses hyphens (`cut-cross-entropy`), but the Python module name uses underscores (`cut_cross_entropy`).","wrong":"from cut-cross-entropy import cut_cross_entropy_reference","symbol":"cut_cross_entropy_reference","correct":"from cut_cross_entropy import cut_cross_entropy_reference"}],"quickstart":{"code":"import torch\nfrom cut_cross_entropy import cut_cross_entropy\n\nif torch.cuda.is_available():\n    device = torch.device(\"cuda\")\n    print(f\"Using CUDA device: {device}\")\n\n    # Example: logits (batch_size, vocab_size), labels (batch_size,)\n    batch_size = 2\n    vocab_size = 4\n\n    # Data often uses float16 for memory 
efficiency and performance on GPU\n    logits = torch.randn(batch_size, vocab_size, device=device, dtype=torch.float16)\n    labels = torch.randint(0, vocab_size, (batch_size,), device=device, dtype=torch.int64)\n\n    # Calculate loss\n    loss = cut_cross_entropy(logits, labels)\n    print(f\"Calculated loss: {loss.item():.4f}\")\n\n    # Example with num_total_tokens (for distributed/batched scenarios)\n    num_total_tokens = torch.tensor([10], device=device, dtype=torch.int64)\n    loss_with_tokens = cut_cross_entropy(logits, labels, num_total_tokens)\n    print(f\"Calculated loss with total tokens: {loss_with_tokens.item():.4f}\")\nelse:\n    print(\"CUDA is not available. This library is designed for NVIDIA GPUs.\")\n    print(\"Please ensure you have a CUDA-enabled GPU and the correct PyTorch installation.\")","lang":"python","description":"This quickstart demonstrates how to use `cut_cross_entropy` to calculate the loss. It explicitly checks for CUDA availability, as the library is fundamentally designed for and requires a CUDA-enabled NVIDIA GPU. The example shows both basic usage and an application with `num_total_tokens`, using `float16` for logits as is common for memory-efficient GPU workloads."},"warnings":[{"fix":"Ensure your environment has a CUDA-enabled GPU and a PyTorch installation compiled with CUDA support (e.g., `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`).","message":"This library is exclusively designed for and requires a CUDA-enabled NVIDIA GPU. It will not function on CPU-only systems, even if PyTorch is installed.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use `from cut_cross_entropy import ...` for imports.","message":"The PyPI package name is `cut-cross-entropy` (using hyphens), but the Python module you import is `cut_cross_entropy` (using underscores). 
Incorrect module import paths are a common mistake.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consider using `torch.float16` for logits where appropriate to maximize memory efficiency and performance, ensuring your model and hardware support it.","message":"The library is optimized for memory efficiency and is typically used with `torch.float16` (half-precision) logits. `float32` may work, but the primary performance and memory benefits are realized with `float16`; using `float32` can negate some of the library's advantages.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Regularly consult the GitHub repository's README, release notes, or changelog for specific updates and potential API adjustments when upgrading to new versions.","message":"The versioning scheme (e.g., 25.1.1) suggests a rapid development pace, likely tied to the `vllm` project's release cycle. This can mean more frequent API changes than in libraries that follow strict semantic versioning.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}