AST Finetuned Speech Commands v2

MIT audio

An Audio Spectrogram Transformer fine-tuned on the Speech Commands v2 dataset for keyword spotting.

context window 4K tokens
max output 4K tokens
input price $1 / 1M tokens
output price $2 / 1M tokens
streaming
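
As a rough illustration of how the listed per-token rates translate into a per-request charge, the sketch below multiplies token counts by the input and output prices shown above. The function name `estimate_cost_usd` and the constant names are hypothetical, and how audio input is counted in tokens is not stated in this listing, so the token counts are assumed to be known by the caller.

```python
# Rates copied from the listing above; all names here are illustrative assumptions.
INPUT_USD_PER_MILLION_TOKENS = 1.0   # "input price $1 / 1M tokens"
OUTPUT_USD_PER_MILLION_TOKENS = 2.0  # "output price $2 / 1M tokens"


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the charge in USD for one request from its token counts.

    Assumes billing is strictly linear in tokens, with no minimum charge
    or rounding; the listing does not confirm either way.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION_TOKENS
        + output_tokens * OUTPUT_USD_PER_MILLION_TOKENS
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 3,000 input tokens and 500 output tokens.
    print(f"${estimate_cost_usd(3_000, 500):.6f}")
```

Under these assumptions, a request with 3,000 input tokens and 500 output tokens comes to about $0.004.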