Talk2DINO ViT-B

lorebianchi98
multimodal, image-text

A vision-language model for referring segmentation that uses a DINOv2 ViT-B backbone.

vision
Released Mar 2024
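This listing does not say how the model turns a text prompt into a mask. As a rough illustration only, the sketch below assumes the simplest form of the idea: a text embedding and the backbone's per-patch features share one vector space, and a patch is kept when its cosine similarity to the text reaches a threshold. The function names, the `threshold` value, and the toy 3-dimensional features are hypothetical. A real DINOv2 ViT-B backbone produces 768-dimensional patch features, and Talk2DINO's actual pipeline may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_to_patch_mask(
    patch_feats: List[List[Sequence[float]]],  # H x W grid of patch embeddings (assumed layout)
    text_emb: Sequence[float],                 # one text embedding in the same space (assumption)
    threshold: float = 0.5,                    # hypothetical cut-off, not from the model card
) -> List[List[int]]:
    """Return an H x W binary mask: 1 where a patch is similar enough to the text."""
    return [
        [1 if cosine(patch, text_emb) >= threshold else 0 for patch in row]
        for row in patch_feats
    ]


# Toy usage: a 2 x 2 patch grid with 3-dimensional features.
grid = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]],
]
prompt = [1.0, 0.0, 0.0]
print(text_to_patch_mask(grid, prompt))  # → [[1, 0], [1, 0]]
```

A fixed threshold keeps the example short; a multi-class variant would instead score every patch against several prompt embeddings and take the argmax per patch.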