AST Finetuned Speech Commands v2

mit audio


An Audio Spectrogram Transformer fine-tuned on the Speech Commands v2 dataset for keyword spotting.
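A minimal sketch of running keyword spotting locally. It assumes that this listing corresponds to the Hugging Face Hub checkpoint MIT/ast-finetuned-speech-commands-v2, and that the transformers library (plus ffmpeg, for decoding audio files) is installed. The hosted endpoint implied by the Specs is not documented here, so the sketch does not call it. The detect_keyword helper and its 0.5 confidence threshold are hypothetical additions for illustration, not part of the model.

```python
from typing import Optional

# Assumption: the Hub checkpoint this listing describes (not confirmed by the listing).
MODEL_ID = "MIT/ast-finetuned-speech-commands-v2"


def classify_clip(path: str, top_k: int = 3) -> list[dict]:
    """Run the checkpoint on one audio file and return [{'label', 'score'}, ...]."""
    # Imported lazily so detect_keyword below works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=MODEL_ID)
    return classifier(path, top_k=top_k)


def detect_keyword(predictions: list[dict], threshold: float = 0.5) -> Optional[str]:
    """Hypothetical decision rule: return the best label only if its score
    reaches the threshold; otherwise report no keyword (None)."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None
```

For example, detect_keyword(classify_clip("clip.wav")) returns the top label for a hypothetical one-second recording, or None when no label is confident enough. The threshold lets an application ignore background audio instead of always reporting the top class.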

Specs

Context window: 4K tokens

Max output: 4K tokens

Input price: $1 / 1M tokens

Output price: $2 / 1M tokens

Streaming
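The listed prices translate into a per-request estimate with simple arithmetic: cost = (input tokens × $1 + output tokens × $2) / 1,000,000. The sketch below encodes that formula. The listing does not say how audio input is converted into billable tokens, so the function takes token counts as given, and its name is a hypothetical choice for illustration.

```python
# Prices taken from the Specs list, in US dollars per 1M tokens.
INPUT_PRICE_PER_M = 1.0
OUTPUT_PRICE_PER_M = 2.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts at the listed prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000
```

For example, estimate_cost_usd(1_000_000, 1_000_000) returns 3.0, so one million input tokens plus one million output tokens cost $3.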