VoxCPM
2.0.2 verified Sat May 09 auth: no python
VoxCPM is a tokenizer-free text-to-speech (TTS) model for context-aware speech generation and voice cloning. Version 2.0.2 requires Python >=3.10. It leverages a causal transformer trained on continuous speech representations, enabling expressive and cloned voice outputs without discrete tokens. The library is under active development by OpenBMB.
pip install voxcpm
Common errors
error TypeError: 'NoneType' object is not callable
cause The underlying model download failed or was interrupted, leaving the model object as None.
fix Reinstall the package and ensure a stable internet connection. Then clear the cache and retry:
rm -rf ~/.cache/voxcpm
error RuntimeError: CUDA out of memory. Tried to allocate ... MiB
cause Insufficient GPU memory for the model or batch.
fix Reduce the batch size, use a smaller model (if available), or run on CPU by setting device='cpu'.
error FileNotFoundError: [Errno 2] No such file or directory: 'path/to/ref_audio.wav'
cause The voice cloning reference file path is incorrect or the file does not exist.
fix Verify the file path and ensure it points to a valid WAV file.
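The path check suggested for the FileNotFoundError above can be sketched with the standard library alone. `check_reference_wav` is a hypothetical helper, not part of VoxCPM. It relies on Python's `wave` module, which only parses PCM WAV files, so other encodings (for example float WAV) may be rejected even if the model would accept them.

```python
import wave
from pathlib import Path


def check_reference_wav(path: str) -> Path:
    """Return the path if it points to a readable, non-empty PCM WAV file.

    Hypothetical helper for illustration; not part of the VoxCPM API.
    """
    ref = Path(path).expanduser()
    if not ref.is_file():
        raise FileNotFoundError(f"Reference audio not found: {ref}")
    try:
        # wave.open parses the RIFF/WAVE header and raises wave.Error
        # (or EOFError for truncated files) when the file is not a WAV.
        with wave.open(str(ref), "rb") as wav:
            if wav.getnframes() == 0:
                raise ValueError(f"Reference audio is empty: {ref}")
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"Not a readable PCM WAV file: {ref}") from exc
    return ref
```

Calling this before synthesis turns a failure deep inside the model call into an early, descriptive error.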
Warnings
gotcha The `voice_clone` parameter expects a file path to a WAV file. Passing a numpy array or audio buffer will raise a TypeError.
fix Ensure you provide a file path string to `voice_clone`.
gotcha The model requires significant GPU memory. On a 16GB GPU, batch inference may cause out-of-memory (OOM) errors.
fix Reduce batch size or use smaller model variants if available.
deprecated Initializing `voxcpm.VoxCPM` without an explicit `model_path` argument downloads the default model. This implicit behavior is deprecated in favor of explicit model selection.
fix Specify `model_path='default'` or a custom path to future-proof your code.
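When the reference audio exists only in memory, one way to satisfy the file-path requirement of `voice_clone` is to write the samples to a temporary WAV file first. The sketch below assumes mono float samples in the range [-1.0, 1.0]. `samples_to_wav_path` is a hypothetical helper, not part of VoxCPM, and it uses only the standard library.

```python
import struct
import tempfile
import wave
from typing import Iterable


def samples_to_wav_path(samples: Iterable[float], sample_rate: int) -> str:
    """Write mono float samples to a 16-bit PCM WAV file and return its path.

    Hypothetical helper for illustration; not part of the VoxCPM API.
    """
    # Clip each sample to [-1.0, 1.0], then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    # delete=False keeps the file on disk after the handle is closed;
    # the caller is responsible for removing it when done.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return path
```

The returned string can then be passed as `voice_clone`. If `soundfile` is already installed, `sf.write(path, array, sample_rate)` does the same job for numpy arrays in one call.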
Install
pip install git+https://github.com/OpenBMB/VoxCPM.git
Imports
- VoxCPM
wrong import voxcpm
correct from voxcpm import VoxCPM
Quickstart
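from voxcpm import VoxCPM
import soundfile as sf

# Load the model; the default weights are downloaded on first use.
model = VoxCPM()
# Synthesize speech in the voice of the reference WAV file.
waveform, sr = model.synthesize("Hello, this is a test of voice cloning.", voice_clone="path/to/ref_audio.wav")
# Save the audio at the sample rate returned by the model.
sf.write("output.wav", waveform, sr)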
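The CPU fallback suggested under Common errors can be wrapped around a call like the one above. This is a minimal sketch: `run_with_cpu_fallback` is a hypothetical helper, and it assumes the out-of-memory failure surfaces as a `RuntimeError` whose message contains "out of memory", as in the error text quoted earlier. Where exactly `device` is passed to VoxCPM is also an assumption, so the helper takes a callable and leaves that wiring to the caller.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_cpu_fallback(task: Callable[[str], T]) -> T:
    """Call task("cuda"); on a CUDA out-of-memory error, retry with task("cpu").

    Hypothetical helper for illustration; not part of the VoxCPM API.
    """
    try:
        return task("cuda")
    except RuntimeError as exc:
        # Only handle out-of-memory failures; re-raise anything else
        # so unrelated errors are not masked.
        if "out of memory" not in str(exc).lower():
            raise
        return task("cpu")


# Hypothetical usage (the `device` keyword is an assumption):
# waveform, sr = run_with_cpu_fallback(
#     lambda device: VoxCPM(device=device).synthesize("Hello")
# )
```

Running on CPU is typically much slower, so the fallback is better suited to occasional requests than to batch workloads.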