{"id":6918,"library":"torchcrepe","title":"Torchcrepe","description":"Torchcrepe is a PyTorch implementation of the CREPE pitch tracker, a state-of-the-art monophonic pitch estimation tool based on a deep convolutional neural network. It computes pitch and periodicity from audio signals and provides helpers for processing files directly, filtering, thresholding, and choosing among several decoding methods. The library is actively maintained, with regular releases on PyPI.","status":"active","version":"0.0.24","language":"en","source_language":"en","source_url":"https://github.com/maxrmorrison/torchcrepe","tags":["audio","pitch tracking","deep learning","pytorch","music information retrieval"],"install":[{"cmd":"pip install torchcrepe","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core deep learning framework dependency.","package":"torch","optional":false},{"reason":"Commonly used for audio loading and processing in examples and real-world usage.","package":"librosa","optional":true},{"reason":"Alternative or complementary library for audio I/O and transformations.","package":"torchaudio","optional":true}],"imports":[{"symbol":"torchcrepe","correct":"import torchcrepe"},{"note":"Main function for pitch prediction","symbol":"predict","correct":"from torchcrepe import predict"},{"note":"Loads an audio file and returns the audio tensor and its sample rate","symbol":"load.audio","correct":"from torchcrepe.load import audio"}],"quickstart":{"code":"import math\nimport torch\nimport torchcrepe\n\n# Synthesize 1 second of a 440 Hz sine wave at 16 kHz so the example runs without an audio file.\n# To use a real file instead: audio, sr = torchcrepe.load.audio('path/to/file.wav')\nsr = 16000\nt = torch.arange(sr, dtype=torch.float32) / sr\naudio = 0.5 * torch.sin(2 * math.pi * 440.0 * t)\naudio = audio.unsqueeze(0)  # predict expects shape (1, samples)\n\n# Here we'll use a 5 millisecond hop length\nhop_length = int(sr / 200.)\n\n# Provide a sensible frequency range for your domain (upper limit is 2006 Hz)\n# This would be a reasonable range for speech\nfmin = 50\nfmax = 550\n\n# Select a model capacity--one of \"tiny\" or \"full\"\nmodel = 'tiny'\n\n# Choose a device to use for inference\ndevice = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n\n# Pick a batch size that doesn't cause memory errors on your GPU.\n# This batches audio frames internally; it does not batch multiple audio files.\nbatch_size = 2048\n\n# Compute pitch\npitch = torchcrepe.predict(\n    audio,\n    sr,\n    hop_length,\n    fmin,\n    fmax,\n    model,\n    batch_size=batch_size,\n    device=device,\n    return_periodicity=False  # Set to True to also return periodicity (confidence)\n)\n\nprint(f\"Predicted pitch shape: {pitch.shape}\")\nif pitch.shape[-1] > 0:\n    print(f\"First few pitch values: {pitch[0, :5].tolist()}\")","lang":"python","description":"This quickstart synthesizes a short test tone so it runs without an audio file, sets common parameters (hop length, frequency range, model capacity, device, batch size), and calls `torchcrepe.predict` to estimate pitch. For real audio, load a file with `torchcrepe.load.audio`, which returns the audio tensor and its sample rate. It shows the basic workflow for integrating torchcrepe into a PyTorch-based audio processing pipeline."},"warnings":[{"fix":"Be aware of this default behavior. To approximate the original CREPE decoding, pass `decoder=torchcrepe.decode.weighted_argmax` to `torchcrepe.predict`; `torchcrepe.decode` also provides `argmax` and `viterbi` (the default), or you can implement custom post-processing.","message":"Torchcrepe's default decoding differs from the original CREPE (TensorFlow) implementation. Torchcrepe applies Viterbi decoding to the softmax of the network output, whereas the original uses a weighted average near the argmax of the output probabilities. Viterbi decoding penalizes large pitch jumps, which helps suppress double/half frequency (octave) errors, but it changes the default pitch estimates.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `torchcrepe.threshold.Silence` to set periodicity (confidence) to zero in silent regions, for example `periodicity = torchcrepe.threshold.Silence(-60.)(periodicity, audio, sr, hop_length)`, or apply your own silence detection and masking.","message":"CREPE models were not trained on silent audio, so the model sometimes assigns high confidence to pitch bins in silent regions. Expect spurious pitch predictions in quiet sections unless you mask them.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Process audio files one at a time, or implement your own padding and batching if you need to run multiple signals through the model concurrently. For convenience with many files, `torchcrepe.predict_from_files_to_files` processes them sequentially.","message":"The `batch_size` argument of `torchcrepe.predict` controls internal batching over audio frames; it does not batch multiple audio files into one call. `predict` takes a single signal of shape (1, samples), so running several signals of varying lengths together requires custom padding and may not deliver the expected speedup because of padding overhead and other design choices.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}