ConvNeXt Large MLP CLIP LAION-2B

timm vision
image

ConvNeXt-Large model with an MLP head, pretrained with CLIP on LAION-2B, model-soup averaged, then fine-tuned on ImageNet-12k and ImageNet-1k at 320x320 resolution.
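
"Model soup averaged" generally means that the weights of several checkpoints were averaged parameter by parameter into a single model. The sketch below illustrates the simplest variant, a uniform soup, on toy data. It is an assumption for illustration only: the description does not say which soup variant or which checkpoints were used, and the `uniform_soup` helper and the parameter names are hypothetical.

```python
from typing import Dict, List, Sequence

# A toy "checkpoint": parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def uniform_soup(checkpoints: Sequence[Weights]) -> Weights:
    """Average matching parameters element-wise across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share parameter names")
    count = len(checkpoints)
    soup: Weights = {}
    for name in names:
        size = len(checkpoints[0][name])
        if any(len(ckpt[name]) != size for ckpt in checkpoints):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # zip(*...) groups the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(column) / count for column in columns]
    return soup


# Hypothetical toy checkpoints, not real model weights.
ckpt_a = {"head.weight": [1.0, 2.0], "head.bias": [0.0]}
ckpt_b = {"head.weight": [3.0, 4.0], "head.bias": [1.0]}
soup = uniform_soup([ckpt_a, ckpt_b])
```

With real checkpoints the same idea applies to tensors rather than lists, and all checkpoints must share one architecture so that parameter names and shapes line up.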
