DPT SwinV2 Large 384

intel vision
image

A monocular depth estimation model developed by Intel. It uses the DPT (Dense Prediction Transformer) architecture with a Swin Transformer V2 Large backbone and a 384x384 input resolution.

Released Dec 2023