TorchFCPE: Fast Context-based Pitch Estimation

0.0.4 · active · verified Fri Apr 17

`torchfcpe` is the official PyTorch implementation of Fast Context-based Pitch Estimation (FCPE), providing a robust and efficient way to extract the fundamental frequency (F0) from audio signals. It is currently at version `0.0.4` and receives occasional updates, primarily model improvements and compatibility fixes.

Common errors

Warnings

Install
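
The package is published on PyPI; assuming an existing PyTorch environment, the usual route is:

```shell
pip install torchfcpe
```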

Imports
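
The quickstart below relies on the following imports. `torchaudio` is only needed when loading real audio files, and `os` only for the optional device override:

```python
import os          # optional device override via the FCPE_DEVICE env var
import torch       # tensors and device handling
import torchaudio  # only needed to load real audio files
from torchfcpe import FCPE  # the pitch-estimation model used in the quickstart
```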

Quickstart

This quickstart shows how to initialize the `FCPE` model, prepare dummy audio data, and run pitch estimation. It includes the essential preprocessing steps: moving data to the correct device and shaping the input tensor. For real-world usage, replace the dummy audio with audio loaded via `torchaudio.load`.

import torch
import torchaudio
from torchfcpe import FCPE
import os

# --- Configuration ---
sampling_rate = 44100 # Default sample rate for FCPE
hop_length = 512    # Default hop length
device = os.environ.get('FCPE_DEVICE', 'cuda' if torch.cuda.is_available() else 'cpu')

print(f"Using device: {device}")

# --- 1. Initialize the FCPE model ---
model = FCPE(
    sampling_rate=sampling_rate,
    hop_length=hop_length,
    device=device
)

# --- 2. Create dummy audio data (or load real audio) ---
# For a real scenario, replace with torchaudio.load('your_audio.wav')
# This creates a 5-second mono sine wave at 440 Hz
num_samples = sampling_rate * 5 # 5 seconds of audio
t = torch.linspace(0, 5, num_samples, device=device)
frequency = 440.0 # Hz
# Generate a simple sine wave. FCPE expects mono audio.
# Shape: (batch_size, num_samples)
audio = torch.sin(2 * torch.pi * frequency * t).unsqueeze(0)

# --- 3. Preprocess audio for the model ---
# Ensure audio is on the correct device (already done for dummy data)
# Ensure audio is 2D (batch_size, num_samples)
if audio.dim() == 1:
    audio = audio.unsqueeze(0)
# If audio were loaded as stereo (e.g., shape (channels, num_samples)), downmix to mono:
if audio.shape[0] > 1:  # for torchaudio.load output, the first dim is channels
    audio = torch.mean(audio, dim=0, keepdim=True)

print(f"Input audio shape: {audio.shape}")

# --- 4. Perform pitch estimation ---
with torch.no_grad():
    f0, uv = model(audio)

# f0: fundamental frequency (Hz), uv: unvoiced/voiced decision (boolean-like)
print(f"Estimated F0 shape: {f0.shape}")
print(f"Estimated UV shape: {uv.shape}")
print(f"First 10 F0 values: {f0[0, :10].cpu().numpy()}")
print(f"First 10 UV values: {uv[0, :10].cpu().numpy()}")
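
A common next step is to combine the two outputs: mask F0 in unvoiced frames, or map voiced frames to MIDI note numbers for downstream use. The sketch below assumes `f0` and `uv` tensors shaped like the quickstart outputs, `(batch, frames)`; the tensor values here are made up for illustration:

```python
import torch

# Hypothetical F0/UV tensors shaped like the quickstart outputs: (batch, frames).
f0 = torch.tensor([[440.0, 442.0, 0.0, 438.0]])
uv = torch.tensor([[1.0, 1.0, 0.0, 1.0]])  # 1.0 = voiced, 0.0 = unvoiced

# Zero out F0 wherever the frame is unvoiced.
f0_voiced = f0 * uv

# Convert voiced frames to MIDI note numbers (A4 = 440 Hz = note 69);
# the clamp avoids log2(0) on unvoiced frames before they are masked out.
midi = torch.where(
    uv.bool(),
    69.0 + 12.0 * torch.log2(f0.clamp(min=1e-6) / 440.0),
    torch.zeros_like(f0),
)
print(midi)  # 440 Hz maps to note 69; unvoiced frames stay 0
```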
