PyWorld: Python Wrapper for WORLD Vocoder
PyWorld is a Python wrapper for the WORLD vocoder, a fast, high-quality system for speech analysis, manipulation, and synthesis. It lets you extract the fundamental frequency (f0), the harmonic spectral envelope (sp), and the aperiodic spectral envelope (ap) from speech, and then synthesize speech from these parameters. The library is currently at version 0.3.5 and is actively maintained, though major releases have been infrequent. [1, 7, 9, 11]
Common errors
- library not found: sndfile error
  cause The underlying C library `libsndfile` is not installed on your system. PySoundFile, a common dependency for audio I/O with PyWorld, relies on this system library. [1]
  fix On Debian/Ubuntu: `sudo apt-get install libsndfile1`. On macOS: `brew install libsndfile`. On Windows, this library is usually bundled with PySoundFile wheels or needs to be manually installed/configured if building from source.
- Building wheel for pyworld (pyproject.toml) ... error
  cause This often indicates a problem during the compilation of the Cython extensions or the C++ WORLD vocoder. Common causes include missing C/C++ compilers, incorrect Cython versions, or environmental issues during the `git submodule update` step for the C++ WORLD source. [1, 12]
  fix Ensure you have a C/C++ compiler (e.g., `build-essential` on Linux, Xcode command line tools on macOS, Visual Studio on Windows). Try updating Cython (`pip install --upgrade Cython`). If installing from source, ensure `git submodule update --init` completes successfully to fetch the C++ WORLD source. Consider `pip install pyworld-prebuilt` if platform-specific wheels are available and easier to install. [3]
- ValueError: frame_period must be a positive value.
  cause This error can occur if the sampling frequency (`fs`) is too low for PyWorld (e.g., < 16 kHz), leading to invalid internal parameter calculations. [1]
  fix Check your audio's sampling rate. If it's below 16 kHz, resample it to 16 kHz or higher before passing it to PyWorld functions. E.g., `resampled_x = librosa.resample(x, orig_sr=fs, target_sr=16000)`.
Warnings
- breaking The WORLD vocoder, and thus PyWorld, is designed for speech sampled at 16 kHz or higher. Applying it to audio sampled below 16 kHz will fail or produce incorrect output. [1, 7]
- gotcha When loading audio with libraries like `scipy` or `librosa` for PyWorld processing, ensure the audio data is converted to `numpy.float64` (double precision). PyWorld's C backend expects this data type, and incorrect types can lead to errors or unexpected behavior. [1]
- gotcha For audio with a low Signal-to-Noise Ratio (SNR), the `pyworld.dio` pitch extractor may perform poorly. The `pyworld.harvest` extractor is often a better alternative in such conditions. [1]
- gotcha When installing `pyworld` from source, especially if encountering build errors, an outdated `Cython` version can be the cause. [1]
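The sampling-rate and data-type warnings above can be checked in one place before any PyWorld call. The following is a minimal sketch using only NumPy; `prepare_for_pyworld` and `MIN_FS` are hypothetical names, not part of PyWorld, and the scaling of signed-integer PCM to roughly [-1, 1) is an assumption about how the audio was loaded.

```python
import numpy as np

MIN_FS = 16000  # lower bound on the sampling rate stated in the warnings above


def prepare_for_pyworld(x, fs):
    """Hypothetical helper: validate fs and return a float64 mono waveform."""
    if fs < MIN_FS:
        raise ValueError(f"fs={fs} Hz is below {MIN_FS} Hz; resample first")
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError("expected a mono (1-D) waveform")
    if np.issubdtype(x.dtype, np.signedinteger):
        # Assumption: signed-integer PCM (e.g. int16) is scaled to about [-1, 1)
        x = x / float(np.iinfo(x.dtype).max + 1)
    # Cast to double precision and make the buffer C-contiguous
    return np.ascontiguousarray(x, dtype=np.float64)
```

Calling `x = prepare_for_pyworld(x, fs)` right after loading a file gives an array that can be passed to functions such as `pw.wav2world` without a separate `astype` step.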
Install
- pip install pyworld
Imports
- pyworld
import pyworld as pw
Quickstart
import numpy as np
import pyworld as pw
# Simulate a mono audio waveform (e.g., 2 seconds at 44.1 kHz)
fs = 44100 # Sampling frequency
t = np.arange(0, 2.0, 1.0/fs) # Time vector
f0_val = 200 # Hz
# PyWorld expects a float64 waveform, so cast the whole expression explicitly
x = (0.5 * np.sin(2 * np.pi * f0_val * t)).astype(np.float64)
# Extract WORLD features
f0, sp, ap = pw.wav2world(x, fs)
print(f"Extracted f0 shape: {f0.shape}")
print(f"Extracted spectral envelope shape: {sp.shape}")
print(f"Extracted aperiodicity shape: {ap.shape}")
# Resynthesize a waveform from the extracted features
y_synthesized = pw.synthesize(f0, sp, ap, fs)
print(f"Synthesized audio shape: {y_synthesized.shape}")