{"id":7655,"library":"qwen-tts","title":"Qwen-TTS","description":"Qwen-TTS is a powerful text-to-speech (TTS) synthesis library developed by the Qwen team (Alibaba Cloud). It enables high-quality speech generation from text, supporting various languages and speaking styles. The library is currently at version 0.1.1 and is under active development, with updates typically coinciding with major model releases or feature improvements.","status":"active","version":"0.1.1","language":"en","source_language":"en","source_url":"https://github.com/Qwen/Qwen3-TTS","tags":["AI","TTS","Speech Synthesis","Alibaba","Qwen","deep learning"],"install":[{"cmd":"pip install qwen-tts","lang":"bash","label":"Install Qwen-TTS"},{"cmd":"pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118","lang":"bash","label":"Install PyTorch (CUDA 11.8 example)"}],"dependencies":[{"reason":"Deep learning framework for model operations; requires specific CUDA setup for GPU acceleration.","package":"torch","optional":false},{"reason":"Used for tokenizer and model loading infrastructure from Hugging Face.","package":"transformers","optional":false},{"reason":"Required for saving synthesized audio to WAV files.","package":"soundfile","optional":false},{"reason":"Used for efficient model loading and inference, especially on different devices.","package":"accelerate","optional":false},{"reason":"A dependency for some text processing components, particularly tokenization.","package":"sentencepiece","optional":false}],"imports":[{"note":"The main model class is nested within the `models` submodule.","wrong":"from qwen_tts import QwenTTS","symbol":"QwenTTS","correct":"from qwen_tts.models import QwenTTS"},{"note":"The text processing frontend is found in the `frontend` submodule.","wrong":"from qwen_tts.utils import get_frontend","symbol":"get_frontend","correct":"from qwen_tts.frontend import get_frontend"}],"quickstart":{"code":"import torch\nimport soundfile as sf\nfrom qwen_tts.frontend import get_frontend\nfrom qwen_tts.models import QwenTTS\n\n# Define text and style for synthesis\ntext = \"Hello, this is a test from Qwen TTS, demonstrating speech synthesis.\"\nlanguage = \"en\"\nstyle_name = \"neutral\" # Other options: 'happy', 'sad', etc.\n\n# Determine device for model loading (GPU if available, else CPU)\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\nprint(f\"Attempting to load model on: {device}\")\n\n# Load the QwenTTS model from Hugging Face Hub\ntry:\n    model = QwenTTS.from_pretrained('Qwen/Qwen3-TTS', device=device)\nexcept Exception as e:\n    print(f\"Failed to load model on {device}: {e}. Retrying with 'cpu'.\")\n    device = 'cpu'\n    model = QwenTTS.from_pretrained('Qwen/Qwen3-TTS', device=device)\n\n# Initialize the frontend for text processing\n# The exp_name is retrieved from the loaded model's hyperparameters\nfrontend = get_frontend(model.hparams.data.exp_name)\n\n# Get text and style tokens from the frontend\ntext_token, style_token = frontend.get_text_token_and_style_token(\n    text=text,\n    language=language,\n    style_name=style_name\n)\n\n# Synthesize speech using the model\noutput = model.synthesize(text_token, style_token)\nwav = output['wav'][0].cpu().numpy() # Extract waveform and move to CPU\nsampling_rate = model.hparams.data.sampling_rate\n\n# Save the synthesized audio to a WAV file\noutput_filename = \"qwen_tts_output.wav\"\nsf.write(output_filename, wav, sampling_rate)\nprint(f\"Speech synthesized and saved to {output_filename}\")\n","lang":"python","description":"This quickstart demonstrates how to load the Qwen-TTS model, prepare text with its frontend, synthesize speech, and save the output to a WAV file. It includes robust device selection (GPU/CPU) and handles common initialization steps."},"warnings":[{"fix":"Follow PyTorch's official installation instructions for your specific CUDA version and OS. Check the `transformers` and `accelerate` package documentation for any specific environment requirements. Often, installing PyTorch *before* `qwen-tts` is recommended.","message":"Qwen-TTS relies heavily on PyTorch and other deep learning dependencies. Ensuring correct installation, especially for GPU (CUDA) acceleration, is crucial. Mismatched CUDA versions between your system, PyTorch, and other libraries can lead to runtime errors or poor performance.","severity":"gotcha","affected_versions":"0.1.x"},{"fix":"Ensure a stable internet connection. If behind a proxy, configure environment variables like `HTTP_PROXY` and `HTTPS_PROXY`. If downloads are consistently failing, check your disk space and consider clearing the Hugging Face cache (`~/.cache/huggingface/hub/`) if corruption is suspected.","message":"The `QwenTTS.from_pretrained()` method downloads model weights from Hugging Face Hub. This requires an active internet connection and significant disk space (several GBs for the model). Slow connections or network issues can cause downloads to fail or be very slow.","severity":"gotcha","affected_versions":"0.1.x"},{"fix":"Consult the Qwen-TTS documentation or the model's configuration for a list of supported languages and available styles for each language. Start with 'en' and 'neutral' to ensure basic functionality before exploring other options.","message":"The `frontend.get_text_token_and_style_token()` method requires valid `language` and `style_name` parameters. Using unsupported languages or style names (e.g., 'happy' for a language that only supports 'neutral') will result in errors.","severity":"gotcha","affected_versions":"0.1.x"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Explicitly set `device='cpu'` when loading the model: `model = QwenTTS.from_pretrained('Qwen/Qwen3-TTS', device='cpu')`. Ensure your PyTorch installation matches your CUDA toolkit version if you intend to use a GPU.","cause":"The system attempted to load the model on a CUDA device, but no compatible GPU with a correctly configured PyTorch/CUDA environment was detected.","error":"RuntimeError: No available GPU(s) found"},{"fix":"Verify your internet connection and ensure direct access to Hugging Face Hub. Check the model ID for typos. If the issue persists, try clearing the Hugging Face cache: `rm -rf ~/.cache/huggingface/`.","cause":"The model or tokenizer files could not be downloaded or found locally. This often indicates network issues, incorrect model ID, or corrupted cached files.","error":"OSError: Can't load tokenizer for 'Qwen/Qwen3-TTS'. If you were trying to load it from 'https://huggingface.co/Qwen/Qwen3-TTS', make sure you don't have a local directory with the same name."},{"fix":"Install the library using pip: `pip install qwen-tts`.","cause":"The `qwen-tts` library is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'qwen_tts'"}]}