Sherpa ONNX
sherpa-onnx is a next-generation speech recognition (ASR) and text-to-speech (TTS) toolkit from the Next-gen Kaldi (k2-fsa) project, built on ONNX Runtime. It provides high-performance, cross-platform inference for a range of state-of-the-art speech models, supporting both streaming (real-time) and offline processing. The library is actively maintained, with frequent minor releases (often several per week) as new models and features land. The version covered here is 1.12.38.
Common errors
- `ModuleNotFoundError: No module named 'sherpa_onnx'`
  - Cause: the `sherpa-onnx` library is not installed in your current Python environment.
  - Fix: install it with `pip install sherpa-onnx`.
- `onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model failed:`
  - Cause: one of the ONNX model files (encoder, decoder, joiner) is missing, corrupt, or incompatible with the installed `onnxruntime` version. It can also occur when the specified model path is wrong.
  - Fix: verify that all model files exist at the specified paths. Re-download the model archive if a file may be corrupt. Check the `sherpa-onnx` documentation or the model's release notes for specific `onnxruntime` version requirements.
- `FileNotFoundError: [Errno 2] No such file or directory: './path/to/tokens.txt'`
  - Cause: an essential model file (such as `tokens.txt` or a `.onnx` file) was not found at the path given in your `OfflineRecognizerConfig` or `OnlineRecognizerConfig`.
  - Fix: double-check every model path in your configuration, make sure the model directory and all its files are accessible from the script's execution environment, and download the complete model archive if anything is missing.
- `AttributeError: 'OfflineRecognizerResult' object has no attribute 'text'`
  - Cause: the result is being accessed incorrectly; the result object's layout can differ between versions and configurations.
  - Fix: access `stream.result.text` after `recognizer.decode_stream(stream)` for offline recognition, and consult the current documentation for the exact result structure of the online API.
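Since missing or mistyped paths cause two of the errors above, it can be worth verifying every required file before constructing a recognizer. A minimal sketch (the file names and directory below are placeholders; substitute the actual names from your model archive):

```python
import os

# Placeholder file names; real archives name these differently
# (e.g. "encoder-epoch-99-avg-1.onnx").
REQUIRED_FILES = ("encoder.onnx", "decoder.onnx", "joiner.onnx", "tokens.txt")

def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required model files that are absent from model_dir."""
    return [f for f in required if not os.path.exists(os.path.join(model_dir, f))]

missing = missing_model_files("./my-model-dir")  # hypothetical directory
if missing:
    print(f"Missing model files: {missing}")
```

Failing fast with a clear list of missing files is easier to debug than the opaque `Load model failed` raised from inside `onnxruntime`.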
Warnings
- breaking: Model file formats and required configurations can change between major updates, especially when new model architectures or ONNX Runtime versions are introduced. Models packaged for older `sherpa-onnx` versions may not be compatible with newer APIs, and vice versa.
- gotcha: sherpa-onnx depends on `onnxruntime` for its core inference. By default, `pip install sherpa-onnx` installs the CPU build of `onnxruntime`. For GPU acceleration, `onnxruntime-gpu` must be installed explicitly and must replace `onnxruntime`; mixing the two in the same environment can lead to unexpected behavior or failures.
- gotcha: Models for `sherpa-onnx` (encoder, decoder, joiner, `tokens.txt`, etc.) are not bundled with the pip package and must be downloaded separately. Incorrect paths, missing files, or models not exported for `sherpa-onnx` will lead to model-loading failures.
- gotcha: The library relies on native C++ extensions compiled for specific platforms. Pre-built wheels cover the common platforms, but users on less common environments or custom builds may encounter compilation issues or an `ImportError` if the native components cannot be loaded.
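Because both wheels install the same `onnxruntime` Python module, checking the available execution providers is a reliable way to tell which build you actually have. A small diagnostic sketch:

```python
import importlib.util

def onnxruntime_flavor():
    """Report which onnxruntime build is importable, if any.

    Both the CPU ("onnxruntime") and GPU ("onnxruntime-gpu") wheels install
    the same "onnxruntime" module, so the package name alone is ambiguous;
    the list of execution providers compiled into the build is not.
    """
    if importlib.util.find_spec("onnxruntime") is None:
        return "not installed"
    import onnxruntime
    providers = onnxruntime.get_available_providers()
    return "gpu" if "CUDAExecutionProvider" in providers else "cpu"

print(onnxruntime_flavor())
```

If this reports "cpu" after you installed `onnxruntime-gpu`, the two wheels have likely overwritten each other; uninstall both and reinstall only the one you need.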
Install
- CPU (default): `pip install sherpa-onnx`
- GPU (CUDA): `pip install sherpa-onnx onnxruntime-gpu`
Imports
- `OfflineRecognizer`: `from sherpa_onnx import OfflineRecognizer`
- `OfflineRecognizerConfig`: `from sherpa_onnx import OfflineRecognizerConfig`
- `FeatureConfig`: `from sherpa_onnx import FeatureConfig`
- `OnlineRecognizer`: `from sherpa_onnx import OnlineRecognizer`
- `VitsModel`: `from sherpa_onnx import VitsModel`
Quickstart
```python
import os
import tarfile
import urllib.request
import wave

import numpy as np
from sherpa_onnx import OfflineRecognizer

# --- Configuration and model download ---
# This example uses a small offline transducer model.
# More models are listed at https://k2-fsa.github.io/sherpa-onnx/index.html
MODELS_ROOT = "./sherpa-onnx-models"
MODEL_URL = "https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.12.38/sherpa-onnx-offline-zh-en-conformer-mix-init-transducer-2023-12-13.tar.bz2"
# The archive extracts into a top-level directory named after itself.
MODEL_DIR = os.path.join(MODELS_ROOT, os.path.basename(MODEL_URL).replace(".tar.bz2", ""))
MODEL_TAR_FILE = os.path.join(MODELS_ROOT, os.path.basename(MODEL_URL))
MODEL_FILES = {
    "encoder": "encoder-epoch-99-avg-1.onnx",
    "decoder": "decoder-epoch-99-avg-1.onnx",
    "joiner": "joiner-epoch-99-avg-1.onnx",
    "tokens": "tokens.txt",
}
AUDIO_FILE = os.path.join(MODELS_ROOT, "test.wav")
AUDIO_URL = "https://github.com/k2-fsa/sherpa-onnx/raw/master/sherpa-onnx/python/test.wav"

def download_file_if_not_exists(url, filename):
    if not os.path.exists(filename):
        print(f"Downloading {os.path.basename(filename)} from {url}...")
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        urllib.request.urlretrieve(url, filename)
        print("Download complete.")

# Download and extract the model if any required file is missing.
if not all(os.path.exists(os.path.join(MODEL_DIR, f)) for f in MODEL_FILES.values()):
    download_file_if_not_exists(MODEL_URL, MODEL_TAR_FILE)
    print(f"Extracting {os.path.basename(MODEL_TAR_FILE)}...")
    with tarfile.open(MODEL_TAR_FILE, "r:bz2") as tar:
        # Extract into MODELS_ROOT so the archive's own top-level
        # directory becomes MODEL_DIR.
        tar.extractall(path=MODELS_ROOT)
    print("Extraction complete.")

# Ensure the test audio exists.
download_file_if_not_exists(AUDIO_URL, AUDIO_FILE)

# --- Create the recognizer ---
recognizer = OfflineRecognizer.from_transducer(
    encoder=os.path.join(MODEL_DIR, MODEL_FILES["encoder"]),
    decoder=os.path.join(MODEL_DIR, MODEL_FILES["decoder"]),
    joiner=os.path.join(MODEL_DIR, MODEL_FILES["joiner"]),
    tokens=os.path.join(MODEL_DIR, MODEL_FILES["tokens"]),
    num_threads=1,  # CPU inference
    sample_rate=16000,
    feature_dim=80,
    decoding_method="modified_beam_search",
    max_active_paths=4,
)

# --- Read the audio file ---
with wave.open(AUDIO_FILE, "rb") as f:
    assert f.getframerate() == 16000, f.getframerate()
    assert f.getnchannels() == 1, f.getnchannels()
    assert f.getsampwidth() == 2, f.getsampwidth()  # 16-bit PCM
    audio_bytes = f.readframes(f.getnframes())

# accept_waveform expects float32 samples in [-1, 1), so scale the int16 data.
samples = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float32) / 32768.0

# --- Decode and print the result ---
stream = recognizer.create_stream()
stream.accept_waveform(16000, samples)
recognizer.decode_stream(stream)
print(f"Recognition result: {stream.result.text}")
```
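If your own recordings are not already 16 kHz mono 16-bit PCM, the assertions in the quickstart will fail. A small helper (a sketch; sherpa-onnx itself does not ship this function) that loads such a file into the float32 format `accept_waveform` expects:

```python
import wave

import numpy as np

def read_wav_as_float32(path):
    """Read a mono, 16-bit PCM WAV file.

    Returns (sample_rate, samples), where samples is a float32 array
    scaled to [-1.0, 1.0), the format accept_waveform expects.
    """
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM; convert the file first")
        rate = f.getframerate()
        data = f.readframes(f.getnframes())
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
    return rate, samples
```

For stereo or non-PCM inputs, convert with an external tool (e.g. `sox` or `ffmpeg`) before loading; the helper deliberately rejects anything it cannot scale correctly.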