LlamaIndex Readers Web
0.6.0 · verified Fri May 01 · auth: no · python
A collection of web-based data readers for LlamaIndex that ingest content from URLs, web pages, and online documents. The current version is 0.6.0, published as part of the LlamaIndex ecosystem. Releases are irregular and tied to LlamaIndex updates.
pip install llama-index-readers-web
Common errors
error ModuleNotFoundError: No module named 'llama_index.readers.web'
cause llama-index-readers-web is not installed, the installed llama-index predates v0.10, or the import path is wrong.
fix
Install the package on llama-index >=0.10.0 (pip install llama-index-readers-web) and use the correct import: from llama_index.readers.web import ...
error ImportError: cannot import name 'SimpleWebPageReader' from 'llama_index'
cause Using the old top-level import path from before llama-index v0.10.
fix
Change import to: from llama_index.readers.web import SimpleWebPageReader
error ValueError: You must provide at least one URL.
cause Called load_data() with an empty list or no urls parameter.
fix
Pass a non-empty list of URL strings: reader.load_data(urls=['https://example.com'])
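The empty-list error above can be avoided by filtering URLs before the reader is called. The following is a minimal sketch, not part of the library: clean_urls and safe_load are hypothetical helper names, and the only assumption about the reader is that it exposes load_data(urls=...) as shown in the fix above.

```python
from typing import Any, Iterable, List
from urllib.parse import urlparse


def clean_urls(urls: Iterable[str]) -> List[str]:
    """Return stripped, de-duplicated http(s) URLs in their original order."""
    seen = set()
    cleaned: List[str] = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Keep only absolute http/https URLs that name a host.
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def safe_load(reader: Any, urls: Iterable[str]) -> list:
    """Call reader.load_data only when at least one usable URL remains."""
    cleaned = clean_urls(urls)
    if not cleaned:
        # Nothing to fetch: skip the call instead of passing an empty list.
        return []
    return reader.load_data(urls=cleaned)
```

With a real reader this would be called as safe_load(SimpleWebPageReader(), candidate_urls). Returning an empty list rather than raising is a design choice for batch jobs; raise instead if an empty input should be treated as a bug.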
Warnings
breaking Import paths changed in llama-index v0.10+. Web readers are now under llama_index.readers.web, not top-level llama_index or llama_index.web.
fix Use 'from llama_index.readers.web import ...' instead of 'from llama_index import ...' or 'from llama_index.web import ...'.
deprecated SimpleWebPageReader requires requests and beautifulsoup4 as dependencies; they are not installed by default.
fix Install extra dependencies: pip install llama-index-readers-web[beautifulsoup4] or pip install beautifulsoup4 requests.
gotcha Some readers (e.g., BeautifulSoupWebReader) require additional dependencies like lxml for certain parsers.
fix Install lxml if you encounter parser errors: pip install lxml.
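The dependency warnings above can be checked up front, before any reader is constructed. This is a rough sketch using only the standard library; missing_modules and OPTIONAL_DEPS are hypothetical names, and the module list simply mirrors the packages named in the warnings (beautifulsoup4 is imported as bs4).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names, not pip names: beautifulsoup4 is imported as "bs4".
OPTIONAL_DEPS = ["requests", "bs4", "lxml"]

if __name__ == "__main__":
    absent = missing_modules(OPTIONAL_DEPS)
    print("missing:", ", ".join(absent) if absent else "none")
```

The check only locates modules and does not import them, so it stays cheap and has no side effects from the dependencies themselves.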
Imports
- BeautifulSoupWebReader
wrong: from llama_index.readers.web.BeautifulSoupWebReader import BeautifulSoupWebReader
correct: from llama_index.readers.web import BeautifulSoupWebReader
- SimpleWebPageReader
wrong: from llama_index.web import SimpleWebPageReader
correct: from llama_index.readers.web import SimpleWebPageReader
- TrafilaturaWebReader
from llama_index.readers.web import TrafilaturaWebReader
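When the package may be absent (for example in an optional feature), the import can be wrapped so that a failure carries an install hint. A minimal sketch under that assumption: load_reader_class is a hypothetical helper, and the only library detail it relies on is the llama_index.readers.web import path listed above.

```python
import importlib


def load_reader_class(name: str = "SimpleWebPageReader"):
    """Return the named reader class from llama_index.readers.web.

    Raises ImportError with an install hint if the package is unavailable.
    """
    try:
        module = importlib.import_module("llama_index.readers.web")
    except ImportError as exc:
        raise ImportError(
            "llama_index.readers.web is not importable; "
            "try: pip install llama-index-readers-web"
        ) from exc
    try:
        return getattr(module, name)
    except AttributeError as exc:
        # The package is present but does not export the requested name.
        raise ImportError(
            f"{name} is not exported by llama_index.readers.web"
        ) from exc
```

A caller would then write reader = load_reader_class("TrafilaturaWebReader")() and handle ImportError once, rather than scattering try/except blocks around each import.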
Quickstart
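from llama_index.readers.web import SimpleWebPageReader

# Fetch each URL and wrap the response in a Document.
reader = SimpleWebPageReader()
docs = reader.load_data(urls=["https://example.com"])
# Preview the first 100 characters of the first document.
print(docs[0].text[:100])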