Grok 2 Vision

xai multimodal
text, image

A multimodal version of Grok 2 from xAI capable of understanding and analyzing images alongside text.

context window 33K tokens
max output 33K tokens
input price $2 / 1M tokens
output price $10 / 1M tokens
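As a rough illustration of how the two prices above combine, here is a minimal sketch that estimates the cost of one request. The function name `estimate_cost_usd` and the constant names are hypothetical, chosen for this example only. The sketch assumes billing is strictly linear in token count and does not model how image inputs are converted to tokens, which this listing does not specify.

```python
# Prices copied from the listing above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumes simple linear per-token billing (an assumption, not
    confirmed by the listing) and that image inputs have already
    been counted as part of input_tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(10_000, 1_000):.4f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long replies tend to dominate the bill even when the prompt is large.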
tool-use, json-mode, vision, streaming, reasoning, code-generation, function-calling
released Aug 2024
knowledge cutoff Jul 2024
full doc /v1/models/grok-2-vision