Prompt Depth Anything ViT-L

depth-anything vision
image

A promptable depth estimation model built on a Vision Transformer Large (ViT-L) backbone. Its depth predictions are conditioned on a supplied prompt.

vision
Released Nov 2024