ViT Base Patch8 224

timm vision
image

A base Vision Transformer with 8x8 patch size and 224x224 input resolution, pretrained on ImageNet-21k with second-generation augmentation regularization and fine-tuned on ImageNet-1k.
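The patch size and input resolution fix how many patch tokens the model sees per image. The following is a minimal sketch of that arithmetic, not code from timm: the helper name `patch_grid` and the assumption of 3-channel (RGB) input are illustrative only.

```python
def patch_grid(image_size: int = 224, patch_size: int = 8, channels: int = 3):
    """Return (patches per side, total patches, values per flattened patch).

    Hypothetical helper for illustration; assumes square images, square
    non-overlapping patches, and RGB input (channels=3).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size               # patches along one edge
    num_patches = per_side * per_side                 # patches in the full grid
    patch_dim = patch_size * patch_size * channels    # raw values in one patch
    return per_side, num_patches, patch_dim


per_side, num_patches, patch_dim = patch_grid()
print(per_side, num_patches, patch_dim)  # 28 784 192
```

For comparison, the same input with 16x16 patches gives a 14x14 grid of 196 patches, so the 8x8 variant processes four times as many patch tokens per image.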