Arctic Inference
Arctic Inference is an open-source vLLM plugin developed by Snowflake AI Research, designed for high-throughput, low-latency inference of Large Language Models (LLMs) and embeddings. It achieves this through advanced optimizations such as Shift Parallelism, Speculative Decoding, SwiftKV, and Arctic Ulysses. The library integrates seamlessly with and automatically patches vLLM (v0.8.4 and later), so users gain these performance improvements while continuing to use the familiar vLLM APIs and CLI. The current version is 0.1.2, and the project is under active development, with regular releases and accompanying research findings.
Common errors
- Failure to load and run a vLLM server for a model
  Cause: This error often indicates an incompatibility between your Python environment, CUDA toolkit, and the installed vLLM/Arctic Inference versions, or a problem loading the model (e.g., missing model files or an incorrect path).
  Fix: Verify that your CUDA toolkit version is compatible with your PyTorch and vLLM installations. Use Python 3.10, as in the examples. Check vLLM's official documentation for the exact hardware and software requirements. Confirm that the model path is correct and accessible. If necessary, reinstall `arctic-inference[vllm]` in a fresh virtual environment. The version-check sketch after this list can help pinpoint a mismatch.
- RPC timeout error after a 5-minute hang with the speculative config `arctic-suffix=True`
  Cause: This usually points to a performance bottleneck or configuration issue when using suffix decoding, particularly under high concurrency or with specific models/workloads, causing the request to exceed the default RPC timeout.
  Fix: Evaluate your GPU resources and model size. Consider adjusting vLLM's parallelization settings (`--tensor-parallel-size`, `--pipeline-parallel-size`). Temporarily disable suffix decoding to isolate whether it is the root cause, then re-evaluate performance with adjusted parameters. Check the GitHub issues for similar reports and suggested configurations.
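To narrow down the environment-related failures above, a quick version check can be run first. This is a minimal diagnostic sketch using only standard-library metadata lookups plus PyTorch; it assumes `arctic-inference` and `vllm` were installed with pip under the names shown.

import sys
from importlib.metadata import version, PackageNotFoundError

import torch

# Python version (the examples in this document use Python 3.10)
print("Python:", sys.version.split()[0])

# PyTorch and the CUDA toolkit it was built against
print("PyTorch:", torch.__version__, "| CUDA:", torch.version.cuda,
      "| GPU available:", torch.cuda.is_available())

# Installed package versions (Arctic Inference requires vLLM >= 0.8.4)
for pkg in ("vllm", "arctic-inference"):
    try:
        print(pkg + ":", version(pkg))
    except PackageNotFoundError:
        print(pkg + ": not installed")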
Warnings
- breaking Arctic Inference requires vLLM version 0.8.4 or newer. Older versions of vLLM are not compatible and will not benefit from or correctly integrate with Arctic Inference's optimizations.
- gotcha Arctic Inference operates as a plugin that patches vLLM. Its advanced features (e.g., Shift Parallelism, Speculative Decoding) are not automatically enabled just by installation. You must explicitly activate them by setting the `ARCTIC_INFERENCE_ENABLED=1` environment variable or by passing specific CLI flags to `vllm serve` (e.g., `--enable-shift-parallel`, `--speculative-config`).
- gotcha When using `arctic-inference` v0.1.2 with speculative decoding, there's a known issue leading to 'Structured output error in parallel' when sending multiple structured output requests concurrently.
Install
pip install arctic-inference[vllm]
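After installing, it can be worth confirming that vLLM can discover the plugin. The sketch below lists entry points in the group vLLM uses for general plugins; the group name `vllm.general_plugins` is an assumption based on vLLM's plugin mechanism, so adjust it if your vLLM version uses a different one.

from importlib.metadata import entry_points

# The entry-point group name below is an assumption based on vLLM's
# general-plugin discovery mechanism; adjust it for your vLLM version.
plugins = list(entry_points(group="vllm.general_plugins"))
if plugins:
    for ep in plugins:
        print("Discovered vLLM plugin:", ep.name, "->", ep.value)
else:
    print("No vLLM general plugins found; check the installation.")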
Quickstart
import os
from vllm import LLM, SamplingParams
# Enable Arctic Inference (optional, but recommended for full benefits)
# This can also be set as an environment variable before running vLLM:
# os.environ['ARCTIC_INFERENCE_ENABLED'] = '1'
# NOTE: For advanced features like Shift Parallelism or Speculative Decoding,
# you typically run vLLM via CLI with specific arguments, e.g.:
# ARCTIC_INFERENCE_ENABLED=1 vllm serve Snowflake/Llama-3.1-SwiftKV-8B-Instruct \
# --quantization "fp8" \
# --tensor-parallel-size 1 \
# --ulysses-sequence-parallel-size 2 \
# --enable-shift-parallel \
# --speculative-config '{ "method": "arctic", "model":"Snowflake/Arctic-LSTM-Speculator-Llama-3.1-8B-Instruct", "num_speculative_tokens": 3, "enable_suffix_decoding": true, "disable_by_batch_size": 64 }'
# Example for programmatic offline inference (basic usage)
# Ensure a model is available locally or on Hugging Face Hub
model_name = "meta-llama/Llama-2-7b-hf" # Replace with your chosen model
# Initialize LLM with Arctic Inference enabled (implicitly if env var is set)
# You can also pass vLLM arguments directly to LLM constructor
llm = LLM(model=model_name,
          enable_prefix_caching=True,
          # Environment variables are strings, so cast to int for vLLM
          tensor_parallel_size=int(os.environ.get('TP_SIZE', '1')))
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=50)
prompts = [
"Hello, my name is",
"The president of the United States is",
"The capital of France is",
"What is the best way to learn Python programming?"
]
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for prompt, output in zip(prompts, outputs):
generated_text = output.outputs[0].text
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")