ViT Base Patch16 224

google vision
image

A Vision Transformer (ViT) base model that splits each image into 16x16 patches, pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k at 224x224 resolution.
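
As a rough illustration of what the name implies, the sketch below computes how many tokens the encoder would see for a 224x224 input cut into 16x16 patches. The function name and parameters are hypothetical, and the extra classification token is an assumption based on the standard ViT design rather than something stated on this card.

```python
def vit_sequence_length(image_size: int = 224,
                        patch_size: int = 16,
                        add_cls_token: bool = True) -> int:
    """Return the token count for a square image split into square patches.

    Hypothetical helper for illustration only; not part of any library.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    # Patches along one side of the image: 224 / 16 = 14.
    patches_per_side = image_size // patch_size

    # Total non-overlapping patches in the grid: 14 * 14 = 196.
    num_patches = patches_per_side ** 2

    # Assumption: one learnable [CLS] token is prepended, as in standard ViT.
    return num_patches + (1 if add_cls_token else 0)


if __name__ == "__main__":
    print(vit_sequence_length())  # 197
```

Under these assumptions the model processes 196 patch tokens plus one classification token per image.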
