Databricks LangChain Integration
The databricks-langchain library provides support for integrating Databricks AI capabilities, such as LLMs and Vector Search, directly into LangChain applications. It enables developers to leverage Databricks-hosted models and data artifacts within the LangChain ecosystem. Releases typically track LangChain community updates and Databricks feature rollouts.
Warnings
- gotcha Authentication failures are common. Ensure `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables are correctly set or passed directly. The host URL should include the `https://` prefix.
- gotcha Incorrect import paths. The package is installed as `databricks-langchain` but imported as `databricks_langchain` (underscore). The older `langchain_community` import paths for these components are deprecated in favor of this dedicated package.
- gotcha Pydantic version conflicts are frequent within the LangChain ecosystem. Recent LangChain packages require Pydantic v2, so if other libraries in your environment pin Pydantic v1, you may encounter dependency resolution issues. Keep all packages on a consistent Pydantic major version.
- gotcha The `endpoint` parameter for `ChatDatabricks` (and similar components) must refer to a valid and accessible model serving endpoint in your Databricks workspace, such as a Foundation Model API endpoint. Using an incorrect or unavailable endpoint name will result in an API error.
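Since misconfigured credentials are the most common failure mode, it can help to validate the environment up front before constructing any LangChain components. The following is a minimal, stdlib-only sketch; the helper name `validate_databricks_env` is illustrative, not part of any library.

```python
import os

def validate_databricks_env() -> list[str]:
    """Return a list of problems with the Databricks auth environment, empty if OK."""
    problems = []
    host = os.environ.get("DATABRICKS_HOST", "")
    token = os.environ.get("DATABRICKS_TOKEN", "")
    if not host:
        problems.append("DATABRICKS_HOST is not set")
    elif not host.startswith("https://"):
        # A common gotcha: the host URL must include the https:// prefix.
        problems.append("DATABRICKS_HOST must include the https:// prefix")
    if not token:
        problems.append("DATABRICKS_TOKEN is not set")
    return problems

# Example: a host missing the scheme is flagged.
os.environ["DATABRICKS_HOST"] = "my-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-example"
print(validate_databricks_env())
# → ['DATABRICKS_HOST must include the https:// prefix']
```

Running such a check early turns an opaque authentication error deep inside an API call into an actionable message at startup.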
Install
pip install databricks-langchain
Imports
- ChatDatabricks
from databricks_langchain import ChatDatabricks
- DatabricksVectorSearch
from databricks_langchain import DatabricksVectorSearch
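The document mentions Vector Search but does not show it in use. Below is a hedged sketch of wiring `DatabricksVectorSearch` into a retrieval flow; the index name `main.default.docs_index` is a placeholder, and the sketch assumes a Delta Sync index with Databricks-managed embeddings plus valid workspace credentials, so it will not run outside a configured environment.

```python
from databricks_langchain import DatabricksVectorSearch

# Hypothetical index: a Delta Sync index with Databricks-managed embeddings.
vs = DatabricksVectorSearch(index_name="main.default.docs_index")

# Plain similarity search returns LangChain Document objects.
docs = vs.similarity_search("What is RAG?", k=3)
for doc in docs:
    print(doc.page_content[:80])

# Or expose the index as a retriever for use in chains.
retriever = vs.as_retriever(search_kwargs={"k": 3})
```

For self-managed embeddings, the vector store also accepts an embedding model and a text column; consult your workspace's index configuration for the exact setup.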
Quickstart
import os
from databricks_langchain import ChatDatabricks
from langchain_core.messages import HumanMessage
# Ensure these environment variables are set for authentication:
# os.environ["DATABRICKS_HOST"] = "https://<your-workspace-url>"
# os.environ["DATABRICKS_TOKEN"] = "dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Retrieve credentials from environment variables
host = os.environ.get("DATABRICKS_HOST", "")
token = os.environ.get("DATABRICKS_TOKEN", "")
if not host or not token:
    print("Error: DATABRICKS_HOST and DATABRICKS_TOKEN environment variables must be set.")
    # In a production environment, you would likely raise an exception or handle gracefully.
    exit(1)
# Initialize the Databricks chat model. Credentials are resolved automatically
# from the DATABRICKS_HOST / DATABRICKS_TOKEN environment variables.
llm = ChatDatabricks(
    endpoint="databricks-mixtral-8x7b-instruct",  # use a serving endpoint available in your workspace
    temperature=0.1,
)
# Invoke the model with a string prompt
response = llm.invoke("Explain the concept of RAG in LLMs in one sentence.")
print(f"\nString prompt response: {response.content}")
# Example with a list of messages (common for chat models)
messages = [
    HumanMessage(content="What is the capital of France?"),
]
response_chat = llm.invoke(messages)
print(f"\nChat response: {response_chat.content}")
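Beyond direct `invoke` calls, `ChatDatabricks` composes with LangChain's expression language like any other chat model. A minimal sketch, assuming the same placeholder endpoint as above and valid workspace credentials (so it will not run without a configured environment):

```python
from databricks_langchain import ChatDatabricks
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

llm = ChatDatabricks(endpoint="databricks-mixtral-8x7b-instruct", temperature=0.1)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical assistant."),
    ("human", "{question}"),
])

# Pipe prompt -> model -> plain-string output.
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"question": "What is the capital of France?"}))
```

The same chain object also supports `stream` and `batch`, which is usually how these models are consumed in applications.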