# TorchFCPE: Fast Context-based Pitch Estimation

The official PyTorch implementation of Fast Context-based Pitch Estimation (FCPE). `torchfcpe` provides a robust and efficient way to extract the fundamental frequency (F0) from audio signals. The library is currently at version `0.0.4` and sees occasional updates, primarily model improvements and compatibility fixes.
## Common errors

- `ModuleNotFoundError: No module named 'torchfcpe'`
  - Cause: the `torchfcpe` library is not installed in your current Python environment.
  - Fix: install it with pip: `pip install torchfcpe`.
- `RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) do not match`
  - Cause: the input audio tensor is on a different device (e.g. CPU) than the `FCPE` model (e.g. CUDA or MPS).
  - Fix: move the input tensor to the model's device: `audio = audio.to(model.device)`.
- `ValueError: expected 2D input (got N-D input)`
  - Cause: the `FCPE` model expects a 2D input tensor of shape `(batch_size, num_samples)`. Your input may be 1D (mono, no batch dim) or 3D (e.g. batch, channels, samples).
  - Fix: give the audio tensor the correct shape. If 1D: `audio = audio.unsqueeze(0)`. If stereo (e.g. `(2, num_samples)`), convert to mono: `audio = torch.mean(audio, dim=0, keepdim=True)`.
- `AttributeError: 'FCPE' object has no attribute 'get_device'`
  - Cause: this error usually comes from an older, unofficial, or modified version of FCPE, or from a misunderstanding of how `torchfcpe` handles devices. The official model exposes `model.device` directly.
  - Fix: query `model.device` instead, and make sure your `torchfcpe` install is up to date and comes from the official PyPI or GitHub source.
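The shape and device errors above can all be headed off with one small normalization step before calling the model. The helper below is a sketch, not part of the `torchfcpe` API, and the name `prepare_audio` is illustrative:

```python
import torch

def prepare_audio(audio: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Normalize an audio tensor to the (batch_size, num_samples) shape and
    device the model expects. Illustrative helper, not part of torchfcpe."""
    if audio.dim() == 1:
        # Mono with no batch dimension: add one.
        audio = audio.unsqueeze(0)
    elif audio.dim() == 2 and audio.shape[0] > 1:
        # (channels, num_samples): average channels down to mono.
        audio = audio.mean(dim=0, keepdim=True)
    elif audio.dim() == 3:
        # (batch, channels, num_samples): average out the channel dimension.
        audio = audio.mean(dim=1)
    return audio.to(device)
```

Calling `prepare_audio(audio, model.device)` before inference avoids the shape and device mismatches listed above in one place.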
## Warnings

- **breaking** Starting from `v0.0.4`, all FCPE models inherit directly from `torch.nn.Module`. Direct instantiation of `FCPE(...)` is generally stable, but custom subclasses or code that manipulates internal model structure and relied on the previous inheritance hierarchy may require review.
- **gotcha** Improved device detection and support for Apple Silicon's MPS device were added in `v0.0.2`. Users upgrading from `v0.0.1`, or anyone hitting device-related issues (especially with MPS), should see more reliable behavior in newer versions.
- **gotcha** The `sampling_rate` parameter passed to the `FCPE` model at initialization must match the sample rate of your input audio. A mismatch leads to incorrect pitch estimates or errors, because the internal feature extraction assumes that rate.
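If your audio's sample rate does not match the rate the model was initialized with, resample before inference. `torchaudio.functional.resample` is the proper tool; the sketch below shows the idea with a crude linear-interpolation resampler using only `torch`, adequate for a quick experiment but not production quality:

```python
import torch
import torch.nn.functional as F

def resample_linear(audio: torch.Tensor, orig_sr: int, target_sr: int) -> torch.Tensor:
    """Crude linear-interpolation resampler for (batch, num_samples) audio.
    Illustrative only; prefer torchaudio.functional.resample in real code."""
    if orig_sr == target_sr:
        return audio
    num_out = int(round(audio.shape[-1] * target_sr / orig_sr))
    # F.interpolate expects (batch, channels, length).
    out = F.interpolate(audio.unsqueeze(1), size=num_out,
                        mode="linear", align_corners=False)
    return out.squeeze(1)
```

For example, 44.1 kHz audio fed to a model initialized with `sampling_rate=16000` would first go through `resample_linear(audio, 44100, 16000)`.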
## Install

```bash
pip install torchfcpe
```
## Imports

```python
from torchfcpe import FCPE
```
## Quickstart

```python
import os

import torch
from torchfcpe import FCPE

# --- Configuration ---
sampling_rate = 44100  # default sample rate for FCPE
hop_length = 512       # default hop length
device = os.environ.get('FCPE_DEVICE', 'cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# --- 1. Initialize the FCPE model ---
model = FCPE(
    sampling_rate=sampling_rate,
    hop_length=hop_length,
    device=device,
)

# --- 2. Create dummy audio data (or load real audio) ---
# For a real file, replace this with: audio, sr = torchaudio.load('your_audio.wav')
# Here we synthesize a 5-second mono 440 Hz sine wave.
num_samples = sampling_rate * 5  # 5 seconds of audio
t = torch.linspace(0, 5, num_samples, device=device)
frequency = 440.0  # Hz
# FCPE expects mono audio of shape (batch_size, num_samples).
audio = torch.sin(2 * torch.pi * frequency * t).unsqueeze(0)

# --- 3. Preprocess audio for the model ---
# Ensure the tensor is 2D (batch_size, num_samples) and on the model's device.
if audio.dim() == 1:
    audio = audio.unsqueeze(0)
if audio.shape[0] > 1:  # stereo (channels, N): average channels down to mono
    audio = torch.mean(audio, dim=0, keepdim=True)
print(f"Input audio shape: {audio.shape}")

# --- 4. Perform pitch estimation ---
with torch.no_grad():
    f0, uv = model(audio)
# f0: fundamental frequency per frame (Hz); uv: unvoiced/voiced decision (boolean-like)
print(f"Estimated F0 shape: {f0.shape}")
print(f"Estimated UV shape: {uv.shape}")
print(f"First 10 F0 values: {f0[0, :10].cpu().numpy()}")
print(f"First 10 UV values: {uv[0, :10].cpu().numpy()}")
```
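Once you have per-frame F0 values, two small post-processing steps are often useful: mapping frame indices to timestamps, and converting frequencies to MIDI note numbers for pitch-name display. The helpers below are illustrative, not part of `torchfcpe`, and `frame_times` assumes the common convention that frame `i` starts at sample `i * hop_length` (verify against your model configuration):

```python
import math

def frame_times(num_frames: int, hop_length: int, sampling_rate: int) -> list:
    """Timestamp (in seconds) of each analysis frame, assuming frame i
    starts at sample i * hop_length. Illustrative helper."""
    return [i * hop_length / sampling_rate for i in range(num_frames)]

def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number,
    using A4 = 440 Hz = MIDI note 69."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)
```

For the sine-wave example above, voiced frames should report F0 near 440 Hz, which `hz_to_midi` maps to MIDI note 69 (A4); unvoiced frames (where `uv` indicates no pitch) should be masked out before conversion.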