F5-TTS: Flow Matching Text-to-Speech
Version 1.1.20, verified Fri May 01, auth: no, python
F5-TTS is a text-to-speech library that uses flow matching, designed for faithful and fluent speech synthesis. The current version is 1.1.20, and the project is under active development with frequent releases.
pip install f5-tts
Common errors
error ModuleNotFoundError: No module named 'f5_tts'
cause Library not installed or virtual environment not activated.
fix
pip install f5-tts
error ImportError: cannot import name 'F5TTS' from 'f5_tts'
cause Outdated version (<1.1.0) where F5TTS was in a submodule.
fix
Run `pip install --upgrade f5-tts`, then import with `from f5_tts import F5TTS`.
error RuntimeError: MelSpectrogram cache issue
cause Known bug in v1.1.18 fixed in v1.1.19.
fix
Upgrade to f5-tts>=1.1.19
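The three errors above come down to the package being missing or older than a fixed release. A minimal sketch of a pre-flight check, using only the standard library, might look like the following. The helper names `parse_release` and `diagnose` are hypothetical, and the default threshold of 1.1.19 is simply the fix version quoted on this page.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn '1.1.19' into (1, 1, 19); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def diagnose(dist_name: str = "f5-tts", minimum: str = "1.1.19") -> str:
    """Return 'missing', 'outdated', or 'ok' for the installed distribution."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        # Not installed in the active environment (or the venv is not activated)
        return "missing"
    if parse_release(installed) < parse_release(minimum):
        return "outdated"
    return "ok"
```

Calling `diagnose()` in the environment that raises the error shows whether to install the package or upgrade it.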
Warnings
gotcha Gradio version conflict: Gradio >=6.11 causes the UI to freeze. Pin Gradio to <6.11 or use the version listed in the project's requirements.
fix pip install 'gradio<6.11'
deprecated Importing model classes directly (e.g., `from f5_tts.model import F5TTS`) may break in future releases. Use the top-level import `from f5_tts import F5TTS` instead.
fix from f5_tts import F5TTS
gotcha Inference expects both reference audio and reference text. Omitting ref_text may yield poor-quality output.
fix Always provide reference text that matches the reference audio.
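The reference-pair warning can be caught before any model is loaded. The sketch below is a hypothetical helper, not part of F5-TTS. It only checks that the audio file exists and that the transcript is non-empty, and it cannot tell whether the text actually matches the recording.

```python
from pathlib import Path


def check_reference(ref_audio: str, ref_text: str) -> list:
    """Return a list of problems with the reference pair; empty means OK."""
    problems = []
    if not Path(ref_audio).is_file():
        problems.append(f"reference audio not found: {ref_audio}")
    if not ref_text or not ref_text.strip():
        problems.append("ref_text is empty; provide the transcript of the reference audio")
    return problems
```

Running this on the arguments intended for inference and stopping when the list is non-empty gives an early, readable failure.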
Install
pip install f5-tts[infer]
Imports
- F5TTS
wrong: from f5_tts.model import F5TTS
correct: from f5_tts import F5TTS
- infer_batch
wrong: from f5_tts import infer_batch
correct: from f5_tts.infer import infer_batch
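Code that has to run against more than one release can try the preferred import path first and fall back to the older one. This is a minimal sketch with a hypothetical helper, `import_first`. The candidate paths are copied from this page and have not been checked against any particular installed release.

```python
import importlib


def import_first(candidates):
    """Return the first attribute that resolves from (module, attribute) pairs."""
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Preferred path first, older submodule path second (both taken from this page)
F5TTS_CANDIDATES = [("f5_tts", "F5TTS"), ("f5_tts.model", "F5TTS")]
```

`import_first(F5TTS_CANDIDATES)` returns the class when either path works, and otherwise raises an ImportError that lists every attempt.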
Quickstart
import soundfile as sf
from f5_tts import F5TTS

tts = F5TTS()
# Generate speech from text, conditioned on a reference clip and its transcript
waveform, sample_rate = tts.infer(text="Hello, world!", ref_audio="path/to/ref.wav", ref_text="Reference text")
# Save to file
sf.write("output.wav", waveform, sample_rate)
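If soundfile is not installed, the standard-library `wave` module can write the result instead. The sketch below assumes the waveform is a one-dimensional (mono) sequence of floats in the range [-1.0, 1.0]. This page does not state the return format, so treat that as an assumption to verify. `write_wav_pcm16` is a hypothetical helper name.

```python
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip out-of-range values so they do not overflow the 16-bit range
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono (assumed)
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
        wf.setframerate(int(sample_rate))
        wf.writeframes(bytes(frames))
```

With that assumption, `write_wav_pcm16("output.wav", waveform, sample_rate)` takes the place of the `sf.write` call.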