{"id":7798,"library":"torch-stoi","title":"Torch-STOI","description":"Torch-STOI is a Python library that provides a PyTorch implementation of the Short-Time Objective Intelligibility (STOI) metric, designed primarily for use as a loss function when training deep learning models for tasks such as speech enhancement and source separation. It follows the reference `pystoi` package and can compute both classical and extended STOI. The current version is 0.2.3; releases are infrequent and focus on functional improvements and on correlation with the reference `pystoi` implementation.","status":"active","version":"0.2.3","language":"en","source_language":"en","source_url":"https://github.com/mpariente/pytorch_stoi","tags":["pytorch","audio","speech processing","stoi","loss function","metrics"],"install":[{"cmd":"pip install torch-stoi","lang":"bash","label":"Install stable release"}],"dependencies":[{"reason":"Core deep learning framework dependency.","package":"torch"},{"reason":"Backend implementation for STOI calculation, required at runtime.","package":"pystoi","optional":false}],"imports":[{"symbol":"NegSTOILoss","correct":"from torch_stoi import NegSTOILoss"}],"quickstart":{"code":"import torch\nfrom torch_stoi import NegSTOILoss\n\nsample_rate = 16000\nloss_func = NegSTOILoss(sample_rate=sample_rate)\n\n# Dummy data: a batch of 2 one-second signals\nclean_speech = torch.randn(2, sample_rate)\nnoisy_speech = torch.randn(2, sample_rate)\n\n# In a real scenario, a neural network would map noisy_speech to an\n# estimate of the clean speech. Here noisy_speech stands in for that estimate.\nest_speech = noisy_speech  # Replace with your model's output\n\n# Compute the loss for each item in the batch (estimate first, reference second)\nloss_batch = loss_func(est_speech, clean_speech)\n\nprint(f\"Mean negative STOI loss: {loss_batch.mean().item()}\")","lang":"python","description":"Initializes `NegSTOILoss` with a sample 
rate and demonstrates its use as a loss function on example clean and estimated speech tensors. In practice, `NegSTOILoss` is typically called inside a neural network training loop."},"warnings":[{"fix":"Use `pystoi` directly for accurate STOI metric evaluation: `from pystoi import stoi; stoi(clean_audio, degraded_audio, fs)`.","message":"The `NegSTOILoss` provided by `torch-stoi` is intended as a loss function for optimization and does not exactly replicate the values of the reference STOI metric. For objective evaluation, use the original `pystoi` library or `torchmetrics.audio.stoi.ShortTimeObjectiveIntelligibility` (which wraps `pystoi`).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Account for device transfers when evaluating from GPU tensors. For performance-critical applications, move data to the CPU ahead of time where feasible, or choose batch sizes that limit the number of transfers.","message":"`pystoi`-based evaluation (including `torchmetrics`'s STOI wrapper) is performed on the CPU. 
Input tensors are automatically moved to the CPU for processing and then moved back to their original device, which adds overhead, especially with large batches or frequent calls in GPU-accelerated workflows.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If closer agreement with the standard STOI metric is desired, keep `use_vad` set to `True` (the default).","message":"Setting the `use_vad` parameter to `False` in `NegSTOILoss` can produce results that differ substantially from the standard STOI metric, because it bypasses silent-frame detection.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the `pystoi` dependency: `pip install pystoi`.","cause":"The `torch-stoi` library depends on `pystoi` for its underlying STOI calculation, but `pystoi` was not installed.","error":"ModuleNotFoundError: No module named 'pystoi'"},{"fix":"If working with `torchtext`, update your code from `vocab.stoi` to `vocab.get_stoi()`. This error is unrelated to `torch-stoi`.","cause":"This error typically occurs when using `torchtext`'s `Vocab` object, where `stoi` (string-to-integer) was a direct attribute in older versions but has been replaced by `get_stoi()` in newer `torchtext` releases. 
This is *not* an error related to the `torch-stoi` library, but a common confusion due to the shared 'stoi' acronym.","error":"AttributeError: 'Vocab' object has no attribute 'stoi'"},{"fix":"Ensure that the `est_targets` (predicted speech) and `targets` (clean reference speech) tensors have identical shapes (e.g., `[batch_size, num_samples]`).","cause":"The `est_targets` and `targets` tensors passed to `NegSTOILoss` (or any STOI calculation) do not have matching shapes; the two signals must have identical shapes to be compared.","error":"RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension Z"}]}