Microsoft Azure Cognitive Services Speech SDK for JavaScript
The Microsoft Cognitive Services Speech SDK for JavaScript provides APIs for integrating speech-to-text, text-to-speech, and speech translation into JavaScript applications. It supports both browser and Node.js environments, making it versatile for a range of use cases. The current stable version is 1.49.0, with a release cadence that appears to be monthly or bi-monthly, indicating active development. Key differentiators include official support for Azure Speech Services, a comprehensive speech-AI feature set, and first-class TypeScript type definitions, giving a more robust development experience than hand-rolled WebSocket or REST API integrations.
Common errors
- `CANCELED: Reason=ErrorDetails=<Connection was closed by the remote host. ErrorCode: 1000. ErrorType: ConnectionFailure>`
  - Cause: an invalid authentication key or region, or network connectivity problems preventing the WebSocket connection to the Azure Speech Service.
  - Fix: verify that `SPEECH_KEY` and `SPEECH_REGION` are correctly configured and match your Azure resource. Check network connectivity and firewall rules.
- `TypeError: Cannot read properties of undefined (reading 'recognizeOnceAsync')`
  - Cause: `SpeechRecognizer` (or another SDK class) was not properly initialized, or the `new` keyword was omitted during instantiation.
  - Fix: ensure that `new SpeechRecognizer(speechConfig, audioConfig)` is used and that `speechConfig` and `audioConfig` are valid instances.
Warnings
- breaking Speaker recognition and intent recognition support were removed from the SDK. Applications relying on these features must use older versions of the SDK or migrate to alternative services/implementations.
- breaking SpeechRecognizer and TranslationRecognizer were updated to use V2 endpoints by default. This change means that `NoMatch` results may no longer be received as they were in previous versions, significantly altering recognition behavior.
- deprecated The `SpeechServiceConnection_EndSilenceTimeoutMs` property has been deprecated. While it may still function, its use is discouraged and it may be removed in future releases.
- gotcha `SpeechConfig.fromEndpoint` is now the recommended way to construct a `SpeechConfig` for most scenarios, especially when working with `SpeechRecognizer`. `fromSubscription` still works, but `fromEndpoint` offers more flexibility and control.
Install
- `npm install microsoft-cognitiveservices-speech-sdk`
- `yarn add microsoft-cognitiveservices-speech-sdk`
- `pnpm add microsoft-cognitiveservices-speech-sdk`
Imports
- SpeechConfig
  const { SpeechConfig } = require('microsoft-cognitiveservices-speech-sdk');
  import { SpeechConfig } from 'microsoft-cognitiveservices-speech-sdk';
- SpeechRecognizer
  const { SpeechRecognizer } = require('microsoft-cognitiveservices-speech-sdk');
  import { SpeechRecognizer } from 'microsoft-cognitiveservices-speech-sdk';
- AudioConfig
  const { AudioConfig } = require('microsoft-cognitiveservices-speech-sdk');
  import { AudioConfig } from 'microsoft-cognitiveservices-speech-sdk';
Quickstart
```typescript
import {
  SpeechConfig,
  AudioConfig,
  SpeechRecognizer,
  ResultReason,
  CancellationDetails,
} from 'microsoft-cognitiveservices-speech-sdk';

const speechKey: string = process.env.SPEECH_KEY ?? '';
const speechRegion: string = process.env.SPEECH_REGION ?? '';

function recognizeFromMicrophone(): void {
  if (!speechKey || !speechRegion) {
    console.error('Please set the SPEECH_KEY and SPEECH_REGION environment variables.');
    return;
  }

  const speechConfig = SpeechConfig.fromSubscription(speechKey, speechRegion);
  speechConfig.speechRecognitionLanguage = 'en-US';

  const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
  const recognizer = new SpeechRecognizer(speechConfig, audioConfig);

  console.log('Say something into your microphone...');
  recognizer.recognizeOnceAsync(result => {
    switch (result.reason) {
      case ResultReason.RecognizedSpeech:
        console.log(`RECOGNIZED: Text=${result.text}`);
        break;
      case ResultReason.NoMatch:
        console.log('NOMATCH: Speech could not be recognized.');
        break;
      case ResultReason.Canceled: {
        // Cancellation details are not exposed directly on the result;
        // use CancellationDetails.fromResult to inspect them.
        const cancellationDetails = CancellationDetails.fromResult(result);
        console.log(`CANCELED: Reason=${cancellationDetails.reason}`);
        if (cancellationDetails.errorDetails) {
          console.log(`CANCELED: ErrorDetails=${cancellationDetails.errorDetails}`);
        }
        break;
      }
    }
    recognizer.close();
  });
}

recognizeFromMicrophone();
```