VoxCPM

Version 2.0.2 · verified Sat May 09 · auth: no · language: python

VoxCPM is a tokenizer-free text-to-speech (TTS) model for context-aware speech generation and voice cloning, developed by OpenBMB. It uses a causal transformer trained on continuous speech representations, so it produces expressive and cloned voices without discrete tokens. Version 2.0.2 requires Python 3.10 or later. The library is under active development.

pip install voxcpm
error TypeError: 'NoneType' object is not callable
cause The underlying model download failed or was interrupted, leaving the model object as None.
fix
Check that the internet connection is stable, clear the cache with `rm -rf ~/.cache/voxcpm`, and retry so the model downloads again. If the error persists, reinstall the package.
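The cache-clearing step can also be done from Python, which helps on systems without `rm`. This is a minimal sketch that assumes the cache lives at `~/.cache/voxcpm`, as stated in the fix above; `clear_model_cache` is a hypothetical helper, not part of the voxcpm package.

```python
import shutil
from pathlib import Path


def clear_model_cache(cache_dir=Path.home() / ".cache" / "voxcpm"):
    """Delete the cache directory if it exists.

    Returns True when a directory was removed, False when there was
    nothing to remove. The default path is the one named in the fix
    above; pass a different path if your cache lives elsewhere.
    """
    cache_dir = Path(cache_dir).expanduser()
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False
```

Call `clear_model_cache()` once, then load the model again so the download restarts from scratch.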
error RuntimeError: CUDA out of memory. Tried to allocate ... MiB
cause Insufficient GPU memory for the model or batch.
fix
Reduce the batch size, use a smaller model (if one is available), or run on CPU by setting `device='cpu'`.
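One way to avoid hard-coding the device is to choose it at runtime. This is a sketch only: `pick_device` is a hypothetical helper, and it assumes PyTorch is the backend that reports GPU availability. The returned string is meant to be passed wherever `device` is accepted, as described in the fix above.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

Passing `prefer_gpu=False` forces CPU, which is useful after an out-of-memory error on a GPU that is nominally available.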
error FileNotFoundError: [Errno 2] No such file or directory: 'path/to/ref_audio.wav'
cause The voice cloning reference file path is incorrect or the file does not exist.
fix
Verify the file path, and confirm it points to an existing, valid WAV file. Relative paths are resolved against the current working directory.
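Checking the reference file before synthesis gives a clearer error than a failure deep inside the model call. The sketch below uses only the standard library; `check_reference_wav` is a hypothetical helper. Note that the `wave` module reads PCM WAV files, so other encodings (for example floating-point WAV) may be rejected here even if an audio library could read them.

```python
import wave
from pathlib import Path


def check_reference_wav(path):
    """Return (sample_rate, duration_seconds) or raise a descriptive error."""
    p = Path(path).expanduser()
    if not p.is_file():
        raise FileNotFoundError(f"Reference audio not found: {p.resolve()}")
    try:
        with wave.open(str(p), "rb") as wf:
            rate = wf.getframerate()
            frames = wf.getnframes()
    except (wave.Error, EOFError) as exc:
        # Raised for files that are empty, truncated, or not PCM WAV.
        raise ValueError(f"Not a readable PCM WAV file: {p}") from exc
    return rate, frames / rate
```

Run the check once on the reference path and pass that path on only if it succeeds.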
gotcha The `voice_clone` parameter expects a file path to a WAV file. Passing a NumPy array or an in-memory audio buffer raises a TypeError.
fix Pass a file path string to `voice_clone`. If the audio exists only in memory, write it to a WAV file first and pass that file's path.
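When the reference audio exists only in memory, it can be written to a temporary WAV file so that a path is available. This is a standard-library sketch; `samples_to_wav_path` is a hypothetical helper that assumes mono float samples in the range [-1.0, 1.0] (a plain list or a one-dimensional NumPy array both work).

```python
import array
import sys
import tempfile
import wave


def samples_to_wav_path(samples, sample_rate):
    """Write mono float samples to a temporary 16-bit WAV file and return its path."""
    # Clip to [-1.0, 1.0] and scale to signed 16-bit integers.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    with wave.open(tmp.name, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return tmp.name
```

The file is not deleted automatically, so remove it with `os.remove` once synthesis has finished.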
gotcha The model requires significant GPU memory. Even on a 16 GB GPU, batch inference may cause out-of-memory (OOM) errors.
fix Reduce the batch size, or use a smaller model variant if one is available.
deprecated Initializing `voxcpm.VoxCPM` without an explicit `model_path` argument downloads the default model. This implicit behavior is deprecated in favor of explicit model selection.
fix Specify `model_path='default'` or a custom path to future-proof your code.
pip install git+https://github.com/OpenBMB/VoxCPM.git

Load the VoxCPM model, generate speech with optional voice cloning from a reference audio file, and save the output.

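from voxcpm import VoxCPM
import soundfile as sf  # third-party package, used here to write the WAV file

# Load the model; the weights are downloaded on first use.
model = VoxCPM()

# Generate speech, cloning the voice from the reference WAV file.
# voice_clone must be a file path string, not an array or buffer.
waveform, sr = model.synthesize("Hello, this is a test of voice cloning.", voice_clone="path/to/ref_audio.wav")

# Save the waveform at the sample rate returned by the model.
sf.write("output.wav", waveform, sr)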