FunASR
FunASR is a fundamental, end-to-end speech recognition toolkit from Alibaba DAMO Academy, currently at version 1.3.1. It provides a wide range of features, including Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), Punctuation Restoration, Language Models, Speaker Verification, and Speaker Diarization. The library is actively maintained with frequent updates, regularly releasing new models and features such as Fun-ASR-Nano-2512, which supports 31 languages and low-latency real-time transcription.
Common errors
- × python setup.py egg_info did not run successfully.
  cause: This error typically occurs during installation from source when `setuptools` or `pip` is outdated, preventing proper package metadata generation.
  fix: Update `pip`, `setuptools`, and `wheel` before attempting installation: `pip install --upgrade pip setuptools wheel`. If installing from a cloned repository, try `pip install -e .`.
- error while loading shared libraries: libtorch_global_deps.so: cannot open shared object file: No such file or directory
  cause: The PyTorch (libtorch) shared libraries are not on the system's dynamic linker search path, often due to a custom PyTorch installation or missing environment variables in containerized environments.
  fix: Add the directory containing your PyTorch installation's shared libraries to `LD_LIBRARY_PATH` (Linux) or `PATH` (Windows), e.g. `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/pytorch/lib`. Reinstalling PyTorch or FunASR in a clean environment can also help.
- AssertionError: FunASRNano is not registered
  cause: This error arises when a specific model (e.g., `Fun-ASR-Nano-2512`) is used with a `funasr` version that does not yet recognize or properly register that model's class, or when there is a model-loading configuration mismatch.
  fix: Update `funasr` and `modelscope` to their latest versions: `pip install -U funasr modelscope`. Also review the model's documentation on ModelScope or Hugging Face for required initialization parameters such as `model_revision` or `trust_remote_code=True`.
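For the shared-library error above, a hedged shell sketch of the `LD_LIBRARY_PATH` fix on Linux, locating the `lib` directory via the installed `torch` package itself (assumes `torch` is importable in the active Python environment):

```shell
# Locate the lib/ directory inside the installed torch package
# (it contains libtorch_global_deps.so on Linux) and add it to the
# dynamic linker search path for the current shell session.
TORCH_LIB="$(python -c 'import os, torch; print(os.path.join(os.path.dirname(torch.__file__), "lib"))')"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${TORCH_LIB}"
echo "Added to LD_LIBRARY_PATH: ${TORCH_LIB}"
```

To make the change persistent, add the `export` line to your shell profile or container entrypoint.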
Warnings
- gotcha The `funasr` library primarily supports PyTorch-based inference, while `funasr_onnx` is a separate package designed for ONNX Runtime. Ensure you are using the correct library or runtime components for your desired deployment, especially if using ONNX for optimized inference.
- gotcha When initializing `AutoModel`, models are downloaded from ModelScope by default. To use models hosted on Hugging Face, you must explicitly set `hub="hf"` in the `AutoModel` constructor.
- gotcha Model interfaces or required revisions (`model_revision` parameter) may change frequently with updates. It's advisable to specify `model_revision` if a specific model version is needed for reproducibility or compatibility.
- gotcha For model export operations (e.g., to ONNX), specific PyTorch versions might be required. For example, `torch >= 1.11.0` is necessary for ONNX export functionality.
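The hub-selection and `model_revision` gotchas above can be combined in a single initialization. A minimal sketch; the revision string is illustrative, not a guaranteed-current version, so check the model card before pinning:

```python
from funasr import AutoModel

# Download from Hugging Face instead of the default ModelScope hub,
# pinning a model revision for reproducibility.
model = AutoModel(
    model="paraformer-zh",     # model id on the chosen hub
    model_revision="v2.0.4",   # example revision; check the model card
    hub="hf",                  # default hub is ModelScope; "hf" = Hugging Face
    device="cpu",
)
```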
Install
- pip install -U funasr
- pip install -U modelscope
Imports
- AutoModel
from funasr import AutoModel
Quickstart
from funasr import AutoModel
import soundfile as sf
import os
# You might need to set an environment variable for modelscope token if hitting rate limits or private models
# os.environ['MODELSCOPE_API_TOKEN'] = 'your_token_here'
# Initialize the ASR model, will download 'paraformer-zh' from ModelScope if not local
# 'paraformer-zh' is a multi-functional model, with VAD and PUNC integrated.
# Use a public audio URL for demonstration
model = AutoModel(model="paraformer-zh",
                  vad_model="fsmn-vad",
                  punc_model="ct-punc-c",
                  device="cpu")  # Specify 'cuda:0' for GPU if available
# Example audio input: a remote URL or a local file path
# For a local file, ensure it exists, e.g., 'path/to/your/audio.wav'
# Using a provided example audio from FunASR's repository
audio_input = "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav"
print(f"Processing audio: {audio_input}")
# Perform speech recognition
# The generate method returns a list of dictionaries with transcription results
result = model.generate(input=audio_input)
# Print the transcription result
if result and result[0].get('text'):
    print(f"Transcription: {result[0]['text']}")
else:
    print("No transcription result found.")
# Example of VAD (Voice Activity Detection)
# model_vad = AutoModel(model="fsmn-vad", device="cpu")
# vad_result = model_vad.generate(input=audio_input)
# print(f"VAD Result: {vad_result}")