ViT Base Patch32 384

timm vision
image

A Vision Transformer (ViT-Base) with 32x32 pixel patches and 384x384 input resolution, pretrained on ImageNet-21k with augmentation and regularization (AugReg) and fine-tuned on ImageNet-1k.
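To make the patch size and resolution concrete, the short sketch below works out how many tokens the model sees per image. It is only an illustration: the helper name `vit_token_count` is hypothetical, and it assumes a square RGB input, non-overlapping patches, and a single class token prepended to the patch sequence, as in the standard ViT design.

```python
from typing import Tuple


def vit_token_count(
    image_size: int = 384,
    patch_size: int = 32,
    class_token: bool = True,
) -> Tuple[int, int]:
    """Return (num_patches, sequence_length) for a square-input ViT.

    Hypothetical helper for illustration; not part of any library.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size          # patches per side
    num_patches = grid * grid                # non-overlapping patches
    seq_len = num_patches + (1 if class_token else 0)
    return num_patches, seq_len


# Values flattened from one RGB patch before the linear projection.
patch_values = 32 * 32 * 3

num_patches, seq_len = vit_token_count()
print(num_patches, seq_len, patch_values)
```

With the defaults this gives a 12x12 grid of patches. In timm, the architecture is typically created with `timm.create_model("vit_base_patch32_384", pretrained=True)`; the exact pretrained tag may vary by timm version, so treat that name as an assumption to check against your installed release.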
