OmniVoice
0.1.5 verified Sat May 09 auth: no python
OmniVoice is a zero-shot text-to-speech library using diffusion language models. It supports multilingual TTS with voice cloning from short audio samples. Current version 0.1.5, actively maintained. Requires Python >= 3.10.
pip install omnivoice
Common errors
error RuntimeError: Audio length mismatch
cause The reference audio and reference text do not align, or the audio is longer than the recommended 30 s maximum.
fix
Trim reference audio to 3-30 seconds and ensure the text corresponds exactly.
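The trimming step can be sketched with the standard-library wave module. This is a minimal sketch, not part of OmniVoice: the helper name trim_reference is hypothetical, it handles uncompressed PCM WAV files only, and the 3 s and 30 s bounds are simply the ones quoted in the fix above.

```python
import wave

MIN_SECONDS = 3.0   # lower bound quoted in the fix above
MAX_SECONDS = 30.0  # upper bound quoted in the fix above


def trim_reference(src_path, dst_path, max_seconds=MAX_SECONDS):
    """Copy src_path to dst_path, keeping at most max_seconds of audio.

    Hypothetical helper (not an OmniVoice API). Returns the duration in
    seconds of the file that was written.
    """
    with wave.open(src_path, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        keep = min(reader.getnframes(), int(max_seconds * rate))
        frames = reader.readframes(keep)

    duration = keep / rate
    if duration < MIN_SECONDS:
        raise ValueError(
            f"reference is only {duration:.2f}s; need at least {MIN_SECONDS}s"
        )

    with wave.open(dst_path, "wb") as writer:
        writer.setparams(params)    # frame count in the header is patched on close
        writer.writeframes(frames)
    return duration
```

If the audio is trimmed, the reference text has to be shortened to match what is still spoken in the trimmed file.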
error AttributeError: module 'torchaudio' has no attribute 'resample'
cause The installed torchaudio is too old (< 0.12) to provide the resample function.
fix
Install torchaudio >= 0.12: pip install --upgrade torchaudio
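Before upgrading, it can help to confirm which torchaudio version is actually installed in the active environment. The sketch below uses only the standard library; the helper names major_minor and torchaudio_is_recent are hypothetical, and the 0.12 threshold is the one stated above.

```python
import re
from importlib import metadata


def major_minor(version):
    """Return (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torchaudio_is_recent(minimum=(0, 12)):
    """True if the installed torchaudio is at least `minimum`.

    Returns False when torchaudio is older or not installed at all.
    """
    try:
        installed = metadata.version("torchaudio")
    except metadata.PackageNotFoundError:
        return False
    return major_minor(installed) >= minimum
```

A False result means torchaudio is either missing or older than 0.12 in the interpreter that ran the check, which may differ from the one pip upgrades by default.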
error ImportError: cannot import name 'OmniVoice' from 'omnivoice'
cause The import path is incorrect; older documentation showed the wrong path.
fix
Use 'from omnivoice import OmniVoice' instead of 'from omnivoice.model import OmniVoice'.
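error ValueError: The truth value of an array with more than one element is ambiguous
cause Stereo audio was passed as the reference; the library expects mono.
fix
Convert the reference audio to mono before passing it, for example by averaging the channels of a (channels, samples) tensor with waveform.mean(dim=0, keepdim=True).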
error FileNotFoundError: No such file or directory: 'path/to/model'
cause Model not downloaded or cache path misconfigured.
fix
Ensure internet connection for first download, or set OMNIVOICE_CACHE_DIR to a valid path.
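One way to rule out a misconfigured cache path is to create the directory and export the variable from Python before the model is loaded. This is a sketch under assumptions: configure_cache is a hypothetical helper, and the source does not say whether OmniVoice reads OMNIVOICE_CACHE_DIR at import time or at load time, so setting it before importing omnivoice is the cautious order.

```python
import os
from pathlib import Path


def configure_cache(cache_dir):
    """Create cache_dir if needed and export it as OMNIVOICE_CACHE_DIR.

    Hypothetical helper (not an OmniVoice API). Returns the resolved path.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["OMNIVOICE_CACHE_DIR"] = str(path)
    return path
```

The variable only affects the current process and its children; for a shell-wide setting, export it in the shell profile instead.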
Warnings
breaking Loading the model without an internet connection fails if the cache is missing. Pass a local pretrained path explicitly.
fix Set OMNIVOICE_CACHE_DIR or download the model files manually.
gotcha Reference audio must be mono with a 24 kHz sample rate. A mismatch degrades output quality.
fix Resample audio to 24000 Hz and convert to mono before passing.
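A quick header check can catch a wrong channel count or sample rate before the file reaches the model. The sketch below is a standard-library illustration only: reference_problems is a hypothetical name, and the wave module reads uncompressed PCM WAV files, so other formats need a different loader.

```python
import wave

EXPECTED_RATE = 24000  # sample rate named in the gotcha above


def reference_problems(path):
    """Return a list of format problems; an empty list means the header looks fine.

    Hypothetical helper (not an OmniVoice API). It inspects the WAV header only.
    """
    problems = []
    with wave.open(path, "rb") as reader:
        channels = reader.getnchannels()
        rate = reader.getframerate()
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != EXPECTED_RATE:
        problems.append(f"expected {EXPECTED_RATE} Hz, found {rate} Hz")
    return problems
```

When a problem is reported, the conversion itself can be done with torchaudio, for instance torchaudio.functional.resample(waveform, orig_freq, 24000) for the sample rate and channel averaging for the downmix.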
gotcha Inference on MPS (Apple Silicon) may fail because some operations are unsupported. Use CPU or CUDA.
fix Set device='cpu' explicitly on Apple Silicon instead of 'mps'.
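A device choice that never lands on MPS can be sketched as follows. pick_device is a hypothetical helper, and how the resulting string is handed to OmniVoice is not specified in this page, so that part is left to the caller.

```python
def pick_device():
    """Return 'cuda' when a CUDA GPU is usable, otherwise 'cpu'.

    Hypothetical helper (not an OmniVoice API). MPS is skipped on purpose,
    following the warning above; a missing torch install also falls back
    to 'cpu'.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

On an Apple Silicon machine this returns 'cpu', which matches the fix above.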
deprecated The `load_asr` argument in model loading is deprecated. The ASR model is now loaded automatically.
fix Remove `load_asr=True` from `OmniVoice.from_pretrained`.
Imports
- OmniVoice
  wrong: from omnivoice.model import OmniVoice
  correct: from omnivoice import OmniVoice
- infer
  wrong: from omnivoice.inference import infer
  correct: from omnivoice import infer
Quickstart
import torchaudio
from omnivoice import OmniVoice, infer
# Load model (downloaded on first use)
model = OmniVoice.from_pretrained("k2-fsa/OmniVoice")
# Synthesize speech; reference_text should be the transcript of ref.wav
audio = infer(model, text="Hello world", reference_audio="ref.wav", reference_text="The quick brown fox")
# Save to file at 24 kHz; unsqueeze(0) adds the channel dimension that torchaudio.save expects
torchaudio.save("output.wav", audio.unsqueeze(0), 24000)