DPT SwinV2 Large 384

intel vision

image

A Dense Prediction Transformer (DPT) model for monocular depth estimation, developed by Intel. It uses a Swin Transformer V2 Large backbone and takes 384x384 input images.
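A minimal usage sketch, under some assumptions not stated in this entry: that the model is published on the Hugging Face Hub under the ID `Intel/dpt-swinv2-large-384`, and that the `transformers`, `torch`, and `Pillow` packages are installed. The function names `estimate_depth` and `normalize_depth` are hypothetical helpers introduced for illustration only. The raw prediction is treated here as a relative depth map without metric units, so it is rescaled to 0..255 before display.

```python
def normalize_depth(depth_rows):
    """Rescale a 2-D list of depth values to integers in 0..255 for display."""
    flat = [v for row in depth_rows for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no contrast; return all zeros.
        return [[0 for _ in row] for row in depth_rows]
    return [[round(255 * (v - lo) / span) for v in row] for row in depth_rows]


def estimate_depth(image_path, model_id="Intel/dpt-swinv2-large-384"):
    """Run depth estimation on one image and return a 2-D list of raw values.

    The default model_id is an assumption about the Hub name of this model.
    """
    # Imported lazily so normalize_depth works without these packages.
    from PIL import Image
    from transformers import pipeline

    estimator = pipeline("depth-estimation", model=model_id)
    result = estimator(Image.open(image_path))
    # "predicted_depth" is a torch tensor; squeeze drops any batch dimension.
    return result["predicted_depth"].squeeze().tolist()


if __name__ == "__main__":
    # A tiny hand-made map stands in for a real prediction here.
    print(normalize_depth([[1.0, 2.0], [3.0, 6.0]]))
```

To run it on a real image, call `normalize_depth(estimate_depth("photo.jpg"))`, where `photo.jpg` is a placeholder path. The first call downloads the model weights, which are large for this backbone size.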