GPTCache
GPTCache is a powerful caching library designed to speed up and lower the cost of chat applications that rely on Large Language Model (LLM) services. It functions as a semantic cache, storing and retrieving responses for similar (not just exact) queries using embedding algorithms and vector stores. The library is actively maintained with frequent minor releases.
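Conceptually, a semantic cache embeds each query and serves a stored response when a new query's embedding is close enough to a cached one. The toy sketch below (plain Python with bag-of-words vectors and cosine similarity; all names are illustrative, not GPTCache's actual API) shows the lookup logic:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words frequency vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToySemanticCache:
    """Illustrative semantic cache: store (embedding, response) pairs,
    answer from the cache when a query is similar enough to a stored one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, query):
        qv = embed(query)
        best = max(self.entries, key=lambda e: cosine(qv, e[0]), default=None)
        if best and cosine(qv, best[0]) >= self.threshold:
            return best[1]  # semantic hit: a similar query was seen before
        return None         # miss: caller should invoke the LLM and put()

    def put(self, query, response):
        self.entries.append((embed(query), response))

toy = ToySemanticCache()
toy.put("what is the capital of France", "Paris")
print(toy.get("what is the capital of France ?"))  # similar (not identical) query hits
```

A real deployment replaces the toy embedding with a learned model and the linear scan with a vector store, which is exactly what GPTCache orchestrates.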
Warnings
- gotcha When integrating with LangChain, older versions of GPTCache can raise metaclass conflict errors against LangChain's Pydantic v2-based classes; upgrading both libraries typically resolves this.
- gotcha Remote Redis cache stores and distributed caching require the `redis` package to be installed explicitly, and older versions could run into connection issues.
- gotcha Changes in external LLM APIs (e.g., OpenAI's API base for embeddings) can cause unexpected behavior or errors if GPTCache is not updated to reflect these changes.
Install
- pip install gptcache
- pip install gptcache[openai]
- pip install gptcache[langchain]
- pip install gptcache[redis]
Imports
- cache
from gptcache import cache
- Cache
from gptcache import Cache
- openai
from gptcache.adapter import openai
Quickstart
import os
from gptcache import cache
from gptcache.adapter import openai
# The OpenAI API key is read from the environment; set it before running
assert "OPENAI_API_KEY" in os.environ, "set OPENAI_API_KEY before running"
# Initialize GPTCache
cache.init()
# The gptcache.adapter.openai module automatically wraps the openai library
# Subsequent OpenAI API calls will use the cache
response1 = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello, what is the capital of France?"}
    ]
)
print(f"First response (likely from the LLM): {response1['choices'][0]['message']['content']}")
# A second identical request will hit the cache for faster response and cost savings
response2 = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello, what is the capital of France?"}
    ]
)
print(f"Second response (likely from cache): {response2['choices'][0]['message']['content']}")
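Stripped of the semantics, what the adapter does is wrap the original API call: check the cache first, call the LLM only on a miss, then store the result. A minimal sketch of that pattern (plain Python; an exact-match dict stands in for GPTCache's semantic store, and `cached_llm_call`/`fake_llm` are illustrative names, not GPTCache's API):

```python
import functools

def cached_llm_call(llm_fn):
    """Wrap an LLM call so repeated identical requests are served from a cache."""
    store = {}  # exact-match stand-in for GPTCache's semantic vector store

    @functools.wraps(llm_fn)
    def wrapper(prompt):
        if prompt in store:
            return store[prompt]   # cache hit: no LLM call, no API cost
        result = llm_fn(prompt)    # cache miss: call the real model
        store[prompt] = result
        return result
    return wrapper

calls = []

@cached_llm_call
def fake_llm(prompt):
    calls.append(prompt)           # record each real "LLM" invocation
    return f"answer to: {prompt}"

fake_llm("Hello, what is the capital of France?")
fake_llm("Hello, what is the capital of France?")  # second call hits the cache
print(len(calls))  # the underlying model ran only once
```

GPTCache generalizes this by matching on embedding similarity rather than exact prompt equality, so paraphrased queries can also hit the cache.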