ViT Base Patch8 224
A base Vision Transformer with 8x8 patch size and 224x224 input resolution, pretrained on ImageNet-21k with a second-generation augmentation and regularization recipe and fine-tuned on ImageNet-1k.
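The patch size and input resolution together fix how many tokens the transformer processes. The sketch below works through that arithmetic. It assumes RGB input (3 channels) and a single prepended class token as in the original ViT design. The `ViTConfig` class and helper functions are hypothetical illustrations, not part of any library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical config holding only the fields needed for token arithmetic."""
    image_size: int = 224       # input height and width in pixels
    patch_size: int = 8         # each patch is patch_size x patch_size pixels
    in_channels: int = 3        # assumption: RGB input
    use_class_token: bool = True  # assumption: one [CLS] token, as in the original ViT


def patch_grid(cfg: ViTConfig) -> int:
    """Number of patches along one side of the image."""
    if cfg.image_size % cfg.patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return cfg.image_size // cfg.patch_size


def num_patches(cfg: ViTConfig) -> int:
    """Total non-overlapping patches covering the image."""
    side = patch_grid(cfg)
    return side * side


def sequence_length(cfg: ViTConfig) -> int:
    """Tokens fed to the encoder: one per patch, plus the class token if used."""
    return num_patches(cfg) + (1 if cfg.use_class_token else 0)


def patch_vector_dim(cfg: ViTConfig) -> int:
    """Length of a flattened patch before the linear patch embedding."""
    return cfg.patch_size * cfg.patch_size * cfg.in_channels


if __name__ == "__main__":
    p8 = ViTConfig()
    p16 = ViTConfig(patch_size=16)
    print(num_patches(p8), sequence_length(p8), patch_vector_dim(p8))  # 784 785 192
    print(num_patches(p16), sequence_length(p16))                      # 196 197
    # Self-attention cost grows with the square of the sequence length.
    ratio = sequence_length(p8) ** 2 / sequence_length(p16) ** 2
    print(round(ratio, 1))                                             # 15.9
```

Under these assumptions, halving the patch size from 16 to 8 quadruples the patch count (196 to 784). Because self-attention scales quadratically with sequence length, that makes each attention layer roughly 16 times more expensive at the same 224x224 resolution.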
Capabilities
vision