Torchcrepe

0.0.24 · active · verified Wed Apr 15

Torchcrepe is a PyTorch implementation of the CREPE pitch tracker, a state-of-the-art monophonic pitch estimator based on a deep convolutional neural network. It computes pitch and periodicity from audio signals, and provides utilities for direct file processing, filtering, thresholding, and several decoding strategies. The library is actively maintained, with regular updates to its PyPI package.
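CREPE's network outputs a distribution over discrete pitch bins for each frame; a decoder then converts those distributions into a pitch track. Torchcrepe ships several decoders (argmax, weighted argmax, Viterbi), selectable via `torchcrepe.predict`'s `decoder` argument. The sketch below illustrates only the argmax-decoding idea in plain NumPy, with a toy bin-to-frequency mapping (CREPE's real mapping spans 360 bins in 20-cent steps); it is not torchcrepe's implementation.

```python
import numpy as np

# Toy posterior over 5 pitch bins for 3 frames (rows: frames)
probs = np.array([
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.05, 0.10, 0.80, 0.03, 0.02],
    [0.60, 0.20, 0.10, 0.05, 0.05],
])

# Hypothetical frequency (Hz) assigned to each bin -- illustration only
bin_freqs = np.array([100.0, 200.0, 300.0, 400.0, 500.0])

# Argmax decoding: pick the most probable bin in each frame
pitch = bin_freqs[np.argmax(probs, axis=1)]
print(pitch)  # -> [200. 300. 100.]
```

Viterbi decoding additionally penalizes large frame-to-frame pitch jumps, which smooths out spurious octave errors at a small computational cost.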

Warnings

Install
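Torchcrepe is distributed on PyPI and can be installed with pip:

```shell
pip install torchcrepe
```

PyTorch is a dependency; for GPU inference, install a CUDA-enabled PyTorch build first.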

Imports

Quickstart

This quickstart demonstrates how to load an audio signal (using a mocked function for a self-contained example), set common parameters like hop length, frequency range, model capacity, and device, and then use `torchcrepe.predict` to estimate the pitch. It highlights the basic workflow for integrating torchcrepe into a PyTorch-based audio processing pipeline.

import torch
import torchcrepe
import numpy as np

# Mock torchcrepe.load.audio for a runnable example without external files
class MockLoadAudio:
    def audio(self, *args, **kwargs):
        # Generate a dummy 16kHz sine wave audio (1 second)
        sr = 16000
        duration = 1.0
        frequency = 440.0 # Hz
        t = np.linspace(0., duration, int(sr * duration), endpoint=False)
        audio_np = 0.5 * np.sin(2 * np.pi * frequency * t).astype(np.float32)
        return torch.from_numpy(audio_np).unsqueeze(0), sr # unsqueeze for batch dimension

torchcrepe.load = MockLoadAudio()

# Load dummy audio
audio, sr = torchcrepe.load.audio('dummy.wav', sr=16000)

# Here we'll use a 5 millisecond hop length
hop_length = int(sr / 200.)

# Provide a sensible frequency range for your domain (upper limit is 2006 Hz)
# This would be a reasonable range for speech
fmin = 50
fmax = 550

# Select a model capacity--one of "tiny" or "full"
model = 'tiny'

# Choose a device to use for inference
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

# Pick a batch size that doesn't cause memory errors on your GPU
batch_size = 2048 # Batching here refers to internal frame processing, not input audio files

# Compute pitch
pitch = torchcrepe.predict(
    audio,
    sr,
    hop_length,
    fmin,
    fmax,
    model,
    batch_size=batch_size,
    device=device,
    return_periodicity=False # Set to True to get a confidence score
)

print(f"Predicted pitch shape: {pitch.shape}")
if pitch.shape[-1] > 0:
    print(f"First few pitch values: {pitch[0, :5].tolist()}")
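When called with `return_periodicity=True`, `torchcrepe.predict` also returns a per-frame periodicity (confidence) signal. A common post-processing step is to treat low-periodicity frames as unvoiced; torchcrepe provides helpers for this (e.g. `torchcrepe.filter.median` and `torchcrepe.threshold.At`). The NumPy sketch below illustrates only the masking idea, using synthetic stand-in arrays rather than real torchcrepe output:

```python
import numpy as np

# Synthetic stand-ins for torchcrepe's pitch and periodicity output
# (shape: [batch, frames]); values are for illustration only.
pitch = np.array([[440.0, 441.0, 300.0, 439.0, 440.5]])
periodicity = np.array([[0.90, 0.85, 0.10, 0.92, 0.88]])

# Mask frames whose periodicity falls below a confidence threshold,
# analogous to torchcrepe.threshold.At(0.21) (unvoiced frames -> NaN)
threshold = 0.21
voiced_pitch = np.where(periodicity >= threshold, pitch, np.nan)

print(voiced_pitch)  # frame 2 becomes NaN (unvoiced)
```

In practice, median-filtering the periodicity before thresholding reduces spurious voiced/unvoiced flips at frame boundaries.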
