Grok Vision Beta
Beta multimodal model from xAI capable of understanding and analyzing images alongside text.
Specs
context window 8K tokens
max output 8K tokens
input price $5 / 1M tokens
output price $15 / 1M tokens
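As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical, and the sketch assumes billing is linear in token counts; how image inputs are converted to tokens is not stated here, so that part is left out.

```python
# Prices copied from the spec list above (USD per 1M tokens).
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 15.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming linear per-token billing.

    Hypothetical helper for illustration only; not part of any xAI SDK.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 6,000 input tokens and 1,000 output tokens (made-up counts).
    print(f"${estimate_cost(6_000, 1_000):.4f}")  # prints $0.0450
```

With these prices, output tokens cost three times as much as input tokens, so long responses dominate the bill faster than long prompts.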
Capabilities
vision, streaming, function-calling, tool-use
API
full doc /v1/models/grok-vision-beta