{"library":"arctic-inference","type":"library","category":null,"description":"Arctic Inference is an open-source vLLM plugin developed by Snowflake AI Research for high-throughput, low-latency inference of Large Language Models (LLMs) and embeddings. Its optimizations include Shift Parallelism, Speculative Decoding, SwiftKV, and Arctic Ulysses. The library integrates with vLLM (v0.8.4 and later) by patching it automatically, so users get these performance gains while continuing to use the familiar vLLM APIs and CLI. The current version is 0.1.2, and the project is under active development, with ongoing releases and published research findings.","language":"python","status":"active","version":"0.1.2","tags":["LLM inference","vLLM","Snowflake","performance optimization","GPU acceleration","speculative decoding","shift parallelism","AI","embeddings"],"last_verified":"Thu Apr 16","install":[{"cmd":"pip install arctic-inference[vllm]","imports":[]}],"homepage":"https://www.snowflake.com/","github":null,"docs":null,"changelog":null,"pypi":"https://pypi.org/project/arctic-inference/","npm":null,"openapi_spec":null,"status_page":null,"smithery":null,"compatibility":null}