{"id":8637,"library":"sherpa-onnx","title":"Sherpa ONNX","description":"sherpa-onnx is a next-generation speech recognition (ASR) and text-to-speech (TTS) toolkit built with k2 and ONNX Runtime. It provides high-performance, cross-platform inference for state-of-the-art speech models, supporting both real-time (streaming) and offline processing. The library is actively maintained, with minor releases often landing multiple times a week as new models and features are integrated. The current version is 1.12.38.","status":"active","version":"1.12.38","language":"en","source_language":"en","source_url":"https://github.com/k2-fsa/sherpa-onnx","tags":["speech-recognition","asr","tts","onnx","audio","machine-learning","deep-learning","real-time"],"install":[{"cmd":"pip install sherpa-onnx","lang":"bash","label":"Install CPU version"},{"cmd":"pip install sherpa-onnx onnxruntime-gpu","lang":"bash","label":"Install GPU version (requires compatible CUDA setup)"}],"dependencies":[{"reason":"Core inference engine for ONNX models. Automatically installed with `sherpa-onnx`.","package":"onnxruntime","optional":false},{"reason":"Required for GPU acceleration. Must be installed instead of `onnxruntime` for GPU support.","package":"onnxruntime-gpu","optional":true}],"imports":[{"symbol":"OfflineRecognizer","correct":"from sherpa_onnx import OfflineRecognizer"},{"symbol":"OfflineRecognizerConfig","correct":"from sherpa_onnx import OfflineRecognizerConfig"},{"symbol":"FeatureConfig","correct":"from sherpa_onnx import FeatureConfig"},{"symbol":"OnlineRecognizer","correct":"from sherpa_onnx import OnlineRecognizer"},{"symbol":"OfflineTts","correct":"from sherpa_onnx import OfflineTts"}],"quickstart":{"code":"import os\nimport tarfile\nimport urllib.request\nimport wave\n\nimport numpy as np\nfrom sherpa_onnx import OfflineRecognizer\n\n# --- Configuration and Model Download ---\n# This example uses a small, popular ASR model.\n# You can find more models at https://k2-fsa.github.io/sherpa-onnx/index.html\n\nMODEL_URL = \"https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.12.38/sherpa-onnx-offline-zh-en-conformer-mix-init-transducer-2023-12-13.tar.bz2\"\nMODEL_PARENT_DIR = \"./sherpa-onnx-models\"\n# The tarball extracts to a top-level directory named after the archive\nMODEL_DIR = os.path.join(MODEL_PARENT_DIR, \"sherpa-onnx-offline-zh-en-conformer-mix-init-transducer-2023-12-13\")\nMODEL_TAR_FILE = os.path.join(MODEL_PARENT_DIR, os.path.basename(MODEL_URL))\nMODEL_FILES = {\n    \"encoder\": \"encoder-epoch-99-avg-1.onnx\",\n    \"decoder\": \"decoder-epoch-99-avg-1.onnx\",\n    \"joiner\": \"joiner-epoch-99-avg-1.onnx\",\n    \"tokens\": \"tokens.txt\"\n}\n\nAUDIO_FILE = os.path.join(MODEL_PARENT_DIR, \"test.wav\")\nAUDIO_URL = \"https://github.com/k2-fsa/sherpa-onnx/raw/master/sherpa-onnx/python/test.wav\"\n\ndef download_file_if_not_exists(url, filename):\n    if not os.path.exists(filename):\n        print(f\"Downloading {os.path.basename(filename)} from {url}...\")\n        os.makedirs(os.path.dirname(filename), exist_ok=True)\n        urllib.request.urlretrieve(url, filename)\n        print(\"Download complete.\")\n\n# Download and extract the model if any required file is missing.\n# Extracting into MODEL_PARENT_DIR places the files under MODEL_DIR.\nif not all(os.path.exists(os.path.join(MODEL_DIR, f)) for f in MODEL_FILES.values()):\n    download_file_if_not_exists(MODEL_URL, MODEL_TAR_FILE)\n    print(f\"Extracting {os.path.basename(MODEL_TAR_FILE)}...\")\n    with tarfile.open(MODEL_TAR_FILE, \"r:bz2\") as tar:\n        tar.extractall(path=MODEL_PARENT_DIR)\n    print(\"Extraction complete.\")\n\n# Ensure test audio exists\ndownload_file_if_not_exists(AUDIO_URL, AUDIO_FILE)\n\n# --- Create Recognizer ---\nrecognizer = OfflineRecognizer.from_transducer(\n    encoder=os.path.join(MODEL_DIR, MODEL_FILES[\"encoder\"]),\n    decoder=os.path.join(MODEL_DIR, MODEL_FILES[\"decoder\"]),\n    joiner=os.path.join(MODEL_DIR, MODEL_FILES[\"joiner\"]),\n    tokens=os.path.join(MODEL_DIR, MODEL_FILES[\"tokens\"]),\n    num_threads=1,  # 1 thread is sufficient for short CPU inference\n    sample_rate=16000,\n    feature_dim=80,\n    decoding_method=\"modified_beam_search\",\n    max_active_paths=4,\n)\n\n# --- Read the audio file ---\nwith wave.open(AUDIO_FILE, \"rb\") as f:\n    assert f.getframerate() == 16000, f.getframerate()\n    assert f.getnchannels() == 1, f.getnchannels()\n    assert f.getsampwidth() == 2, f.getsampwidth()\n    audio_bytes = f.readframes(f.getnframes())\n\n# accept_waveform expects float32 samples in [-1, 1), so scale the int16 PCM data\nsamples = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float32) / 32768.0\n\n# Create a stream, feed it the audio, and decode\nstream = recognizer.create_stream()\nstream.accept_waveform(16000, samples)\nrecognizer.decode_stream(stream)\n\nprint(f\"Recognition Result: {stream.result.text}\")\n","lang":"python","description":"This quickstart demonstrates offline speech recognition with `sherpa-onnx`. It first ensures that a sample ASR model and an audio file are downloaded locally, then creates an `OfflineRecognizer` via its `from_transducer` factory method, processes the sample audio file, and prints the transcribed text. For GPU inference, install `onnxruntime-gpu` and ensure CUDA is properly set up."},"warnings":[{"fix":"Always refer to the latest `sherpa-onnx` documentation and examples for your specific version. Download models specified for the exact version or release date you are using. Retrain or convert models if necessary.","message":"Model file formats and required configurations can change between major updates, especially with the introduction of new model architectures or ONNX Runtime versions. Models prepared for older `sherpa-onnx` versions might not be directly compatible with newer APIs, or vice versa.","severity":"breaking","affected_versions":"All versions (due to continuous rapid development)"},{"fix":"For CPU: `pip install sherpa-onnx`. For GPU: `pip install sherpa-onnx onnxruntime-gpu`. Ensure you do not have both `onnxruntime` and `onnxruntime-gpu` installed simultaneously in your environment; uninstall one before installing the other when switching.","message":"sherpa-onnx depends on `onnxruntime` for its core inference. By default, `pip install sherpa-onnx` installs the CPU version of `onnxruntime`. For GPU acceleration, `onnxruntime-gpu` must be explicitly installed in place of `onnxruntime`. Mixing `onnxruntime` and `onnxruntime-gpu` in the same environment can lead to unexpected behavior or failures.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure all required model files are downloaded from the official `sherpa-onnx` GitHub releases or specified model repositories. Verify that the paths provided in your `OfflineRecognizerConfig` or `OnlineRecognizerConfig` objects accurately point to these files.","message":"Models for `sherpa-onnx` (encoder, decoder, joiner, tokens.txt, etc.) are not bundled with the pip package and must be downloaded separately. Incorrect paths, missing files, or using models not specifically designed for `sherpa-onnx` will lead to model loading failures.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Prefer the official pre-built wheels via `pip install sherpa-onnx`. If building from source, ensure you have the necessary build tools (CMake, a C++ compiler) and dependencies installed as outlined in the project's contribution guide.","message":"The library relies on native C++ extensions compiled for specific platforms. While pre-built wheels are provided for common platforms, users on less common environments or custom builds might encounter compilation issues or an `ImportError` if the native components cannot be loaded.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the library using pip: `pip install sherpa-onnx`.","cause":"The `sherpa-onnx` library is not installed in your current Python environment.","error":"ModuleNotFoundError: No module named 'sherpa_onnx'"},{"fix":"Verify that all model files exist at the specified paths. Re-download the model files to ensure they are not corrupt. Check the `sherpa-onnx` documentation or model release notes for any specific `onnxruntime` version requirements.","cause":"This error typically indicates that one of the ONNX model files (encoder, decoder, joiner) is missing, corrupt, or in an incompatible format for the `onnxruntime` version being used. It can also occur if the specified path to the model is incorrect.","error":"onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model failed:"},{"fix":"Double-check all model paths provided in your configuration. Ensure the model directory and all its required files are present and accessible from your script's execution environment. Download the complete model archive if any files are missing.","cause":"One of the model's essential configuration files (like `tokens.txt` or a `.onnx` model file) could not be found at the path specified in your `OfflineRecognizerConfig` or `OnlineRecognizerConfig`.","error":"FileNotFoundError: [Errno 2] No such file or directory: './path/to/tokens.txt'"},{"fix":"Ensure you access `stream.result.text` only after calling `recognizer.decode_stream(stream)` for offline recognition; the online API exposes results differently (e.g. via `recognizer.get_result(stream)`). Always check the current documentation for the exact result structure of your recognizer type and version.","cause":"This usually means the result attribute is being accessed incorrectly. `OfflineRecognizer` exposes a `result` object on the stream that contains `text`, but it might be nested or accessed differently in newer versions or specific configurations.","error":"AttributeError: 'OfflineRecognizerResult' object has no attribute 'text'"}]}