Grok Vision Beta

xai multimodal
text, image

Beta multimodal model from xAI capable of understanding and analyzing images alongside text.

context window: 8K tokens
max output: 8K tokens
input price: $5 / 1M tokens
output price: $15 / 1M tokens
vision, streaming, function-calling, tool-use
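
The listed per-token prices can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name and constants are hypothetical, the prices are copied from this listing, and token counts are assumed to be known in advance. The listing does not say how image inputs are converted to tokens, so that step is left out.

```python
# Prices copied from the listing above (USD per 1M tokens).
INPUT_PRICE_PER_MILLION = 5.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes billing is linear in token count at the listed rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 6,000 input tokens and 1,000 output tokens.
    print(f"${estimate_cost_usd(6_000, 1_000):.4f}")
```

Because output tokens cost three times as much as input tokens at these rates, long responses tend to dominate the bill even when the prompt is larger.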