F5-TTS: Flow Matching Text-to-Speech

1.1.20 verified Fri May 01 auth: no python

F5-TTS is a text-to-speech library built on flow matching, designed for faithful and fluent speech synthesis. Current version: 1.1.20. The project is under active development with frequent releases.

pip install f5-tts
error ModuleNotFoundError: No module named 'f5_tts'
cause Library not installed or virtual environment not activated.
fix
pip install f5-tts
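Before reinstalling, it can help to confirm which interpreter is active and whether it can see the package at all. The following is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper name, not part of F5-TTS.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    # find_spec looks the module up without importing it
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable shows which interpreter (and so which environment) is active
    print("interpreter:", sys.executable)
    print("f5_tts importable:", is_importable("f5_tts"))
```

If the interpreter path points outside the expected virtual environment, activating the environment is likely the fix rather than reinstalling.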
error ImportError: cannot import name 'F5TTS' from 'f5_tts'
cause Outdated version (<1.1.0) where F5TTS was in a submodule.
fix
Run `pip install --upgrade f5-tts`, then use `from f5_tts import F5TTS`.
error RuntimeError: MelSpectrogram cache issue
cause Known bug in v1.1.18 fixed in v1.1.19.
fix
Upgrade to f5-tts>=1.1.19: pip install --upgrade 'f5-tts>=1.1.19'
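To check whether the installed release already includes the fix, the version can be read from package metadata. This is a rough sketch that assumes plain dotted release numbers such as 1.1.19; `parse_release`, `meets_minimum`, and `installed_version` are hypothetical helper names, and the parsing is deliberately simpler than full PEP 440 handling.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple:
    """Turn '1.1.19' into (1, 1, 19); stops at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two release strings, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("f5-tts")
    if found is None:
        print("f5-tts is not installed")
    else:
        print(found, "ok" if meets_minimum(found, "1.1.19") else "needs upgrade")
```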
gotcha Gradio version conflict: Gradio >=6.11 can freeze the UI. Pin Gradio to <6.11 or use the version listed in the project's requirements.
fix pip install 'gradio<6.11'
deprecated Direct import of model classes (e.g., `from f5_tts.model import F5TTS`) may break in future releases. Use top-level `from f5_tts import F5TTS`.
fix from f5_tts import F5TTS
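Code that has to run against both older and newer installs can try the top-level import first and fall back to a submodule path. The sketch below shows the general pattern with a hypothetical helper, `load_attr`; the module paths in the usage line are the ones named in the note above and are not otherwise verified here.

```python
import importlib


def load_attr(attr: str, module_names: list):
    """Return `attr` from the first module in `module_names` that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module missing in this environment; try the next candidate
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of: {', '.join(module_names)}")


# Usage (assumed paths, preferred location first):
# F5TTS = load_attr("F5TTS", ["f5_tts", "f5_tts.model"])
```

Listing the top-level module first keeps the preferred import path in use whenever it is available.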
gotcha Inference requires both reference audio and reference text. Omitting ref_text may yield poor quality.
fix Always provide matching reference text for best results.
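A small pre-flight check can catch a missing clip or an empty transcript before inference runs. This is an illustrative sketch only: `check_reference` is a hypothetical helper, not an F5-TTS function, and it cannot verify that the transcript actually matches the audio.

```python
from pathlib import Path


def check_reference(ref_audio: str, ref_text: str) -> None:
    """Raise early if the reference clip is missing or its transcript is empty."""
    if not Path(ref_audio).is_file():
        raise FileNotFoundError(f"reference audio not found: {ref_audio}")
    if not ref_text.strip():
        raise ValueError("ref_text is empty; provide the transcript of the reference clip")


# Usage: call before inference, e.g.
# check_reference("path/to/ref.wav", "Reference text")
```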
pip install "f5-tts[infer]"

Quick start for basic inference with a reference audio clip.

import soundfile as sf
from f5_tts import F5TTS

# Create the model with its default settings
tts = F5TTS()

# Generate speech from text, conditioned on a reference clip and its transcript
waveform, sample_rate = tts.infer(
    text="Hello, world!",
    ref_audio="path/to/ref.wav",
    ref_text="Reference text",
)

# Save to file
sf.write("output.wav", waveform, sample_rate)