{"id":9090,"library":"lmcache","title":"lmcache","description":"lmcache is a Python library that provides an LLM serving engine extension. It aims to reduce Time To First Token (TTFT) and increase throughput, particularly in long-context scenarios, by storing and reusing the KV caches of repeated text. The current version is 0.4.3, and development appears to be active.","status":"active","version":"0.4.3","language":"en","source_language":"en","source_url":"https://github.com/lm-cache/lmcache","tags":["LLM","AI","serving","caching","throughput"],"install":[{"cmd":"pip install lmcache","lang":"bash","label":"Install lmcache"}],"dependencies":[{"reason":"Core deep learning library for LLM operations, used mainly on the server side.","package":"torch","optional":false},{"reason":"Used on the server side to load and run transformer models.","package":"transformers","optional":false},{"reason":"Defines the data schemas used in client-server communication.","package":"pydantic","optional":false},{"reason":"Asynchronous HTTP client used for client-server communication.","package":"httpx","optional":false}],"imports":[{"note":"The client import path changed in the releases leading up to 0.4.0: `Client` moved from `lmcache.core.client` to `lmcache.client`.","wrong":"from lmcache.core.client import Client","symbol":"Client","correct":"from lmcache.client import Client"},{"symbol":"ChatCompletionRequest","correct":"from lmcache.schemas import ChatCompletionRequest"},{"symbol":"ChatCompletionMessage","correct":"from lmcache.schemas import ChatCompletionMessage"}],"quickstart":{"code":"import os\nfrom lmcache.client import Client\nfrom lmcache.schemas import ChatCompletionRequest, ChatCompletionMessage\n\n# NOTE: An lmcache server must be running separately for this client to connect.\n# Default server host is 'localhost', port 13333.\n\ntry:\n    client = Client(host=os.environ.get('LMCACHE_HOST', 'localhost'),\n                    port=int(os.environ.get('LMCACHE_PORT', 13333)))\n\n    request = ChatCompletionRequest(\n        model=os.environ.get('LMCACHE_MODEL', 'gpt-3.5-turbo'),  # Replace with a model supported by your lmcache server\n        messages=[\n            ChatCompletionMessage(role=\"user\", content=\"Hello, how are you?\"),\n            ChatCompletionMessage(role=\"assistant\", content=\"I am doing well, thank you!\"),\n            ChatCompletionMessage(role=\"user\", content=\"What is your purpose?\")\n        ]\n    )\n\n    response = client.chat_completion(request)\n    print(f\"Assistant: {response.choices[0].message.content}\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Ensure the lmcache server is running and accessible at the specified host and port.\")\n","lang":"python","description":"This quickstart shows how to use the lmcache client to send an OpenAI-style chat completion request to a running lmcache server. The server must be started independently before you run this client code."},"warnings":[{"fix":"Update client code to use `lmcache.client.Client` and schema objects such as `ChatCompletionRequest` from `lmcache.schemas`. Replace old methods like `complete` with `chat_completion`.","message":"The client-side API underwent a significant refactor in version 0.4.0 to align more closely with the OpenAI API. Code written for versions prior to 0.4.0 will likely be incompatible.","severity":"breaking","affected_versions":"<0.4.0"},{"fix":"Make sure the lmcache server is running (e.g., started via `lmcache serve`) and reachable at the host and port the client specifies. If the server runs remotely, check the network configuration.","message":"lmcache uses a client-server architecture: the client library cannot function without a separately running lmcache server instance. If the server has not been started or is unreachable, the client typically fails with a 'Connection refused' error.","severity":"gotcha","affected_versions":"All"},{"fix":"Allocate sufficient GPU memory and CPU resources. Consult the lmcache documentation for hardware recommendations and model-specific resource requirements.","message":"The lmcache server often requires substantial GPU memory and compute, especially for large language models. Insufficient resources can lead to degraded performance or outright failures.","severity":"gotcha","affected_versions":"All"},{"fix":"Verify the model name and configuration on the lmcache server. Ensure the required model weights are available to the server and that the client requests a compatible model.","message":"Model compatibility and configuration can be tricky. The client's `model` parameter must name a model that the lmcache server has successfully loaded and is serving, which may require specific server configuration or local model files.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Start the lmcache server (e.g., `lmcache serve`) and ensure the client's `host` and `port` parameters match the server's configuration.","cause":"The lmcache server is not running, or the client is trying to connect to the wrong host/port.","error":"ConnectionRefusedError: [Errno 111] Connection refused"},{"fix":"Update your client code to use the new OpenAI-compatible API, specifically `client.chat_completion()` and the related schema objects.","cause":"You are calling an older API method (e.g., `complete`) on an `lmcache` client of version 0.4.0 or newer.","error":"AttributeError: 'Client' object has no attribute 'complete'"},{"fix":"Change the import statement from `from lmcache.core.client import Client` to `from lmcache.client import Client`.","cause":"You are trying to import the `Client` class from an old module path.","error":"ModuleNotFoundError: No module named 'lmcache.core.client'"}]}