ViT Base Patch32 384
A base-size Vision Transformer (ViT-Base) with a 32x32 patch size and 384x384 input resolution, pretrained on ImageNet-21k with augmentation and regularization (AugReg) and fine-tuned on ImageNet-1k.
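The patch size and input resolution above fix how many tokens the transformer processes per image. The following is a minimal sketch of that arithmetic, assuming the standard ViT layout of non-overlapping square patches, RGB input, and a single prepended class token. The function name `vit_sequence_length` is hypothetical and is not part of any library.

```python
def vit_sequence_length(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Return the number of tokens a ViT sees for a square image.

    Assumes non-overlapping square patches and, optionally, one class token
    (the standard ViT layout; this is an assumption, not taken from the model card).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + (1 if class_token else 0)


# Values from the description: 384x384 input, 32x32 patches.
num_tokens = vit_sequence_length(384, 32)
# Each flattened patch holds 32 * 32 * 3 values, assuming 3-channel RGB input.
patch_values = 32 * 32 * 3

print(num_tokens)    # 145 (12 x 12 = 144 patches plus 1 class token)
print(patch_values)  # 3072
```

With these settings the sequence is short compared with a 16x16-patch model at the same resolution, which would produce 576 patches, so the larger patch size trades spatial detail for lower compute.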
Capabilities
vision