GPTCache

0.1.44 · active · verified Tue Apr 14

GPTCache is a caching library designed to speed up and lower the cost of chat applications that rely on Large Language Model (LLM) services. It functions as a semantic cache, storing and retrieving responses for semantically similar (not just identical) queries using embedding models and vector stores. The library is actively maintained with frequent minor releases.
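To make the idea concrete, here is a minimal, library-free sketch of how a semantic-cache lookup works: each query is embedded as a vector, and a cached response is returned when cosine similarity to a stored query clears a threshold. The `ToySemanticCache` class and the letter-frequency embedding are illustrative stand-ins, not GPTCache's actual implementation, which uses real embedding models and vector stores.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def embed(text):
    # Toy embedding: letter-frequency vector (real systems use neural embeddings)
    text = text.lower()
    return [text.count(chr(c)) for c in range(ord("a"), ord("z") + 1)]

class ToySemanticCache:
    """Serve a cached response when a new query's embedding is close
    enough to a previously seen query's embedding."""

    def __init__(self, embed_func, threshold=0.9):
        self.embed_func = embed_func
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, query, response):
        self.entries.append((self.embed_func(query), response))

    def get(self, query):
        q = self.embed_func(query)
        best_resp, best_sim = None, self.threshold
        for emb, resp in self.entries:
            sim = cosine_similarity(q, emb)
            if sim >= best_sim:
                best_resp, best_sim = resp, sim
        return best_resp  # None on a cache miss

sc = ToySemanticCache(embed)
sc.put("What is the capital of France?", "Paris")
print(sc.get("what is the capital of France"))  # cache hit: Paris
print(sc.get("zzz qqq jjj"))                    # cache miss: None
```

GPTCache follows the same shape but swaps in production components: neural embedding functions, a vector index for nearest-neighbor search instead of a linear scan, and a pluggable similarity evaluator.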

Warnings

The `gptcache.adapter.openai` module used in the quickstart below mirrors the legacy (pre-1.0) `openai` Python SDK interface (`openai.ChatCompletion.create`). The `openai` package changed this interface in v1.0, so you may need to pin a compatible `openai` release, or verify that your GPTCache version supports the newer client.

Install
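GPTCache is published on PyPI; a typical installation is:

```shell
pip install gptcache
```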

Imports

The quickstart below uses two imports: `from gptcache import cache` for cache initialization, and `from gptcache.adapter import openai` for the drop-in OpenAI wrapper.

Quickstart

This quickstart demonstrates how to integrate GPTCache with the OpenAI API. Once the cache is initialized, OpenAI calls made through the `gptcache.adapter.openai` module are cached automatically: the first query goes to the LLM, while identical or semantically similar follow-up queries are served from the cache.

import os
from gptcache import cache
from gptcache.adapter import openai

# Provide your OpenAI API key via the OPENAI_API_KEY environment variable
# (the placeholder is only used if no key is already set)
os.environ.setdefault("OPENAI_API_KEY", "sk-...")

# Initialize GPTCache with its default configuration
cache.init()
cache.set_openai_key()  # passes OPENAI_API_KEY through to the openai client

# The gptcache.adapter.openai module automatically wraps the openai library
# Subsequent OpenAI API calls will use the cache
response1 = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello, what is the capital of France?"}
    ]
)
print(f"First response (likely from LLM): {response1.choices[0].message.content}")

# A second identical request will hit the cache for faster response and cost savings
response2 = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello, what is the capital of France?"}
    ]
)
print(f"Second response (likely from cache): {response2.choices[0].message.content}")
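Beyond the defaults, `cache.init` accepts a custom embedding function, data manager, and similarity evaluator. The sketch below, adapted from common GPTCache usage patterns, wires an ONNX embedding model to a FAISS vector store with SQLite scalar storage. The helper name `build_semantic_cache` is ours, and the exact module paths and parameters should be checked against your installed GPTCache version.

```python
def build_semantic_cache():
    """Initialize GPTCache with an ONNX embedding model and a FAISS
    vector store backed by SQLite (illustrative configuration)."""
    # Imports are local so this module loads even without GPTCache installed
    from gptcache import cache
    from gptcache.embedding import Onnx
    from gptcache.manager import CacheBase, VectorBase, get_data_manager
    from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

    onnx = Onnx()  # downloads a small ONNX embedding model on first use
    data_manager = get_data_manager(
        CacheBase("sqlite"),
        VectorBase("faiss", dimension=onnx.dimension),
    )
    cache.init(
        embedding_func=onnx.to_embeddings,
        data_manager=data_manager,
        similarity_evaluation=SearchDistanceEvaluation(),
    )
    cache.set_openai_key()
```

Call `build_semantic_cache()` in place of the plain `cache.init()` above to enable similarity-based matching across paraphrased queries rather than only near-exact ones.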
