AST Finetuned Speech Commands v2
An Audio Spectrogram Transformer fine-tuned on the Speech Commands v2 dataset for keyword spotting.
Specs
context window 4K tokens
max output 4K tokens
input price $1 / 1M tokens
output price $2 / 1M tokens
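As a rough illustration of how the listed prices translate into per-request cost, here is a minimal sketch. The helper name `estimate_cost` and the constant names are hypothetical, and the sketch assumes billing is linear in token count with no minimum charge or rounding, which the listing does not confirm.

```python
# Prices copied from the Specs list (USD per 1M tokens).
INPUT_PRICE_PER_M = 1.0
OUTPUT_PRICE_PER_M = 2.0
TOKENS_PER_M = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Hypothetical helper: assumes linear per-token billing,
    which the listing does not state explicitly.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Illustrative token counts only; not taken from the listing.
    print(f"${estimate_cost(4_000, 4_000):.4f}")
```

Under these assumptions, a request with 4,000 input tokens and 4,000 output tokens would come to about $0.012.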
Capabilities
streaming