ViT Base Patch16 224
A Vision Transformer (ViT) base model pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k at 224x224 resolution.
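The name encodes the input geometry: 16x16 pixel patches over a 224x224 image. As a rough illustration of what that implies, the sketch below computes the patch grid and the number of patches. The helper name `vit_patch_grid` is hypothetical and not part of any library, and the extra class token in the final comment assumes the standard ViT design.

```python
def vit_patch_grid(image_size: int = 224, patch_size: int = 16) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square image.

    Hypothetical helper for illustration only; the defaults are taken
    from the model name (Patch16, 224).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, num_patches = vit_patch_grid()
print(per_side, num_patches)  # 14 196

# Assuming the standard ViT layout, one class token is prepended,
# giving a sequence of num_patches + 1 = 197 tokens.
```

A different input resolution changes the patch count, which is why the resolution is stated alongside the model name.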
Capabilities
vision