Prompt Depth Anything ViT-L

depth-anything vision

image

A promptable depth estimation model built on a Vision Transformer Large (ViT-L) backbone. It predicts depth for an input image conditioned on a user-supplied prompt.