LangChain Hugging Face Integration
The langchain-huggingface package provides integrations for using Hugging Face models and pipelines within the LangChain ecosystem. It supports Hugging Face LLMs, chat models, and embeddings, letting users connect to the Hugging Face Hub and Inference Endpoints or run models locally via the transformers library. It is a dedicated integration package that follows LangChain's modular architecture, with updates tracking the broader LangChain ecosystem.
Warnings
- breaking The LangChain ecosystem transitioned from a monolithic 'langchain' package to modular 'langchain-*' integration packages. Components previously imported from `langchain` (e.g., `langchain.llms.HuggingFacePipeline`) now live in `langchain-huggingface` (e.g., `langchain_huggingface.llms.HuggingFacePipeline`).
- gotcha Running large Hugging Face models locally requires significant hardware resources (RAM/VRAM). Attempting to load or run large models on insufficient hardware will lead to slow inference, out-of-memory errors, or crashes.
- gotcha Access to many Hugging Face models, especially via the Hugging Face Hub Inference API or when downloading gated/private models, requires a `HUGGINGFACEHUB_API_TOKEN`. Without it, you may encounter authentication errors or rate limits.
- gotcha Using `trust_remote_code=True` when loading models from Hugging Face can be a security risk as it executes arbitrary code from the model repository. This is sometimes required for custom architectures or tokenizers.
- gotcha Version conflicts can arise between `langchain-huggingface`, `langchain-core`, `transformers`, and `huggingface-hub`. These packages often have strict dependency ranges, and mismatched versions can cause unexpected behavior or import errors.
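To diagnose the version-conflict gotcha above, you can inspect installed distribution versions with the standard library alone. A minimal sketch (the names below are the PyPI distribution names; compare the printed versions against each package's declared dependency ranges):

```python
from importlib import metadata
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Check the packages whose version ranges must stay mutually compatible
for name in ("langchain-huggingface", "langchain-core", "transformers", "huggingface-hub"):
    print(f"{name}: {installed_version(name)}")
```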
Install
pip install langchain-huggingface
Imports
- ChatHuggingFace
from langchain_huggingface.chat_models import ChatHuggingFace
- HuggingFacePipeline
from langchain_huggingface.llms import HuggingFacePipeline
- HuggingFaceEmbeddings
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
Quickstart
import os
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Set your Hugging Face API token if accessing gated models or the Hub
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."  # placeholder: your token
# Initialize the HuggingFacePipeline with a small, accessible model
# Ensure 'transformers' and a deep learning backend (e.g., 'torch') are installed.
llm = HuggingFacePipeline.from_model_id(
model_id="google/flan-t5-small",
task="text2text-generation",
pipeline_kwargs={"max_new_tokens": 100},
# For gated/private models, pass the token through to transformers' from_pretrained:
# model_kwargs={"token": os.environ.get("HUGGINGFACEHUB_API_TOKEN")}
)
# Create a simple prompt template
template = "Question: {question}\nAnswer:"
prompt = PromptTemplate.from_template(template)
# Create a chain
chain = prompt | llm | StrOutputParser()
# Invoke the chain
question = "What is the capital of France?"
response = chain.invoke({"question": question})
print(response)
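The `prompt | llm | StrOutputParser()` line composes the chain left to right. The piping idea can be sketched in plain Python with a hypothetical `Pipe` wrapper (an illustration only, not LangChain's actual Runnable implementation):

```python
class Pipe:
    """Minimal left-to-right composition via the | operator (illustration only)."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        # Chain self first, then the right-hand step
        return Pipe(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

prompt = Pipe(lambda d: f"Question: {d['question']}\nAnswer:")
llm = Pipe(lambda text: text + " Paris")  # stand-in for the model call
parser = Pipe(lambda text: text.strip())

chain = prompt | llm | parser
print(chain.invoke({"question": "What is the capital of France?"}))
```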