{"id":6707,"library":"llama-index-llms-ollama","title":"LlamaIndex Ollama LLM Integration","description":"The `llama-index-llms-ollama` library provides an integration for LlamaIndex to utilize Large Language Models (LLMs) hosted locally via Ollama. It enables users to leverage various open-source models (such as Llama, Mistral, Gemma, and Phi-3) for tasks such as completions and chat within a LlamaIndex application, without relying on cloud-based LLM services. The current version is 0.10.1, released on March 20, 2026, and follows LlamaIndex's active and rapid release cadence for its integration packages.","status":"active","version":"0.10.1","language":"en","source_language":"en","source_url":"https://github.com/run-llama/llama_index","tags":["llm","ollama","llama-index","rag","local-llm","on-premise"],"install":[{"cmd":"pip install llama-index-llms-ollama","lang":"bash","label":"Install package"}],"dependencies":[{"reason":"This is a LlamaIndex integration, requiring the core LlamaIndex framework to function.","package":"llama-index-core","optional":false},{"reason":"Requires the Ollama server application to be installed and running locally to serve the LLM models.","package":"ollama","optional":false}],"imports":[{"note":"This is the standard import path for the Ollama LLM integration in LlamaIndex 0.10 and later. Earlier releases exposed the class as `from llama_index.llms import Ollama`.","symbol":"Ollama","correct":"from llama_index.llms.ollama import Ollama"}],"quickstart":{"code":"# First, ensure Ollama is installed and running, and pull a model:\n# On your terminal:\n# curl -fsSL https://ollama.com/install.sh | sh\n# ollama serve\n# ollama pull llama3.1\n\nfrom llama_index.llms.ollama import Ollama\nfrom llama_index.core.llms import ChatMessage\n\n# Initialize Ollama LLM. 
Adjust model and timeout as needed.\n# Ensure the model 'llama3.1' is pulled via 'ollama pull llama3.1'\nllm = Ollama(\n    model=\"llama3.1:latest\",\n    request_timeout=120.0, # Increase timeout from default 30s if model is slow\n    # context_window=8000 # Optionally set context window to limit memory usage\n)\n\n# Generate a completion\nresponse_completion = llm.complete(\"Tell me a short story about a brave knight.\")\nprint(\"\\n--- Completion Response ---\")\nprint(response_completion)\n\n# Send a chat message\nmessages = [\n    ChatMessage(role=\"system\", content=\"You are a helpful assistant.\"),\n    ChatMessage(role=\"user\", content=\"What is the capital of France?\")\n]\nresponse_chat = llm.chat(messages)\nprint(\"\\n--- Chat Response ---\")\nprint(response_chat.message.content)","lang":"python","description":"This quickstart demonstrates how to initialize the `Ollama` LLM and use it for both text completion and chat interactions within LlamaIndex. It assumes the Ollama server is running locally and that a model like `llama3.1` has been pulled using `ollama pull llama3.1`. 
The `request_timeout` is increased for robustness, and `context_window` is an optional parameter for memory management."},"warnings":[{"fix":"Install Ollama from https://ollama.ai/, run `ollama serve` in your terminal, and pull your chosen model (e.g., `ollama pull llama3.1`).","message":"Ollama Server Prerequisite: The Ollama application must be installed and actively running on your local machine, and the desired LLM model (e.g., `llama3.1`) must be pulled using `ollama pull <model_name>` before this integration can connect to it.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Increase the `request_timeout` parameter when initializing `Ollama`: `llm = Ollama(..., request_timeout=120.0)`.","message":"Default Timeout: The default request timeout (often 30 seconds) may be too short for larger local LLMs or slower machines, leading to `Timeout` errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Monitor system RAM usage. Consider using smaller, quantized models (e.g., `llama3.1:8b-instruct-q4_0`), or adjust the `context_window` parameter in the `Ollama` constructor to limit memory consumption.","message":"High Memory Usage: Running large local LLMs (e.g., Llama 3.1 8B) through Ollama can be memory-intensive, often requiring 32GB of RAM or more, especially when combined with embedding models.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure the package is installed in your active environment (`pip install llama-index-llms-ollama`) and that your IDE is configured to use the correct Python interpreter. 
Restarting the IDE can also help.","message":"ModuleNotFoundError: Users frequently encounter `ModuleNotFoundError` if `llama-index-llms-ollama` is not installed in the currently active Python environment, or if their IDE (e.g., VS Code's Pylance) is configured to use a different interpreter.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If both integrations are needed, check the `ollama` Python client version that each one requires and look for releases whose ranges overlap. If none exist, use separate virtual environments or report the issue to LlamaIndex.","message":"Conflicting Ollama Client Versions: There have been reports of conflicts when trying to install both `llama-index-multi-modal-llms-ollama` and `llama-index-llms-ollama` due to differing `ollama` Python client version requirements.","severity":"gotcha","affected_versions":"Specific version ranges of `llama-index-multi-modal-llms-ollama` and `llama-index-llms-ollama` (e.g., around 0.11.1 for multi-modal package)"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}