ViT-Base (Patch16, 224px, AugReg v2, IN21k → IN1k)

timm vision
image

A Vision Transformer base model with patch size 16, pretrained on ImageNet-21k with AugReg v2 and fine-tuned on ImageNet-1k for classification.
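As a rough illustration of what "patch size 16" means at the 224px input resolution given in the title, the sketch below works out the patch grid. The helper name `patch_grid` is hypothetical, and the 3-channel RGB input is an assumption, not something stated on this page.

```python
def patch_grid(image_size: int = 224, patch_size: int = 16, channels: int = 3):
    """Return (patches_per_side, num_patches, values_per_patch) for a square image.

    Hypothetical helper for illustration only; channels=3 assumes RGB input.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size                 # patches along one edge
    num_patches = per_side * per_side                   # non-overlapping patches in total
    values_per_patch = patch_size * patch_size * channels  # raw pixel values per patch
    return per_side, num_patches, values_per_patch


per_side, num_patches, values_per_patch = patch_grid()
print(per_side, num_patches, values_per_patch)  # 14 196 768
```

Under these assumptions a 224px image is split into a 14 x 14 grid of 196 patches, each flattened to 768 values before the linear patch embedding. In timm, a checkpoint matching this description appears to be published as `vit_base_patch16_224.augreg2_in21k_ft_in1k`; treat that identifier as an assumption and confirm it against the timm model list before relying on it.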

Released: Oct 2021