CLAP HTSAT Fused

laion multimodal
audio text

A contrastive language-audio pretraining (CLAP) model that uses an HTSAT audio encoder. It embeds audio and text in a shared space, which enables zero-shot audio classification and audio-text retrieval.
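
As a rough illustration of how zero-shot classification works with a contrastive audio-text model, the sketch below scores one audio embedding against the text embeddings of candidate labels using cosine similarity and a softmax. It does not load the model: the embedding vectors are made-up placeholders, and the function names and the `scale` parameter are hypothetical, chosen only for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_scores(audio_emb, label_embs, scale=1.0):
    """Softmax over scaled audio-to-label cosine similarities.

    `scale` is an assumed temperature-like factor for this sketch.
    """
    logits = [scale * cosine(audio_emb, emb) for emb in label_embs.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(label_embs, exps)}


# Placeholder embeddings (NOT produced by the real model).
audio_emb = [0.9, 0.1, 0.0]
label_embs = {
    "dog barking": [1.0, 0.0, 0.0],
    "rain": [0.0, 1.0, 0.0],
    "piano": [0.0, 0.0, 1.0],
}

scores = zero_shot_scores(audio_emb, label_embs, scale=10.0)
best = max(scores, key=scores.get)
print(best, scores)
```

In practice the audio and text embeddings would come from the model's two encoders. Retrieval follows the same idea, ranking a collection of audio clips by their similarity to a single text query instead of ranking labels for a single clip.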

released Mar 2023