CLIP ViT-B/32 Multilingual v1

sentence-transformers multimodal
text image

A multilingual vision-language embedding model that maps images and text to a shared embedding space.
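
As a rough illustration of what a shared embedding space allows, the sketch below ranks captions against an image by cosine similarity. The helper names (`cosine_similarity`, `rank_captions`, `embed`) are hypothetical. The model identifiers passed to `SentenceTransformer` are assumptions based on how the sentence-transformers hub typically pairs this multilingual text encoder with the original CLIP image encoder; verify them against the model card before use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs):
    """Return caption indices sorted from most to least similar to the image."""
    scores = [cosine_similarity(image_vec, c) for c in caption_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def embed(image_path, captions):
    """Embed one image and a list of captions (downloads weights on first use)."""
    # Imported lazily so the helpers above run without these packages installed.
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    # Assumed identifiers: image encoder and multilingual text encoder.
    image_model = SentenceTransformer("clip-ViT-B-32")
    text_model = SentenceTransformer(
        "sentence-transformers/clip-ViT-B-32-multilingual-v1"
    )

    image_vec = image_model.encode([Image.open(image_path)])[0]
    caption_vecs = text_model.encode(captions)
    return image_vec.tolist(), caption_vecs.tolist()
```

With a local image file, `rank_captions(*embed("photo.jpg", ["a dog", "ein Hund", "a car"]))` would return the caption indices ordered by similarity to the image; the file name and captions here are placeholders.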

vision
released Mar 2022