CLIP ViT-B/32 Multilingual v1
A multilingual vision-language embedding model that maps images and text to a shared embedding space.
Capabilities
vision
Dates
Released: Mar 2022
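Because images and text land in one shared embedding space, an image can be compared against captions in any supported language by vector similarity. The following is a minimal sketch of that comparison, not the model's own API: the vectors are made-up 3-dimensional placeholders rather than real model output, and `cosine_similarity` and `rank_captions` are hypothetical helper names introduced only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs):
    """Return (caption, score) pairs sorted from most to least similar."""
    scored = [
        (caption, cosine_similarity(image_vec, vec))
        for caption, vec in caption_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings (assumption): real embeddings would come from the
# model and have far more dimensions.
image_vec = [0.9, 0.1, 0.0]
caption_vecs = {
    "two dogs in the snow": [0.8, 0.2, 0.1],
    "zwei Hunde im Schnee": [0.85, 0.15, 0.05],
    "a city skyline at night": [0.0, 0.3, 0.9],
}

ranking = rank_captions(image_vec, caption_vecs)
for caption, score in ranking:
    print(f"{score:.3f}  {caption}")
```

With real embeddings, the same ranking step would serve for cross-lingual image search: the English and German captions describing the same scene should score close to each other, while unrelated text scores lower.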