DepthVLM-4B

JonnyYu828 multimodal
image, text

A vision-language model with 4 billion parameters for depth-aware visual reasoning and question answering.

vision, reasoning
Released Aug 2024