LlamaIndex File Readers

0.6.0 · active · verified Fri Apr 10

The `llama-index-readers-file` library provides specialized data loaders for local file formats (e.g., PDF, DOCX, CSV, TXT, and images) within the LlamaIndex ecosystem. Each reader converts a file into a list of LlamaIndex `Document` objects that can then be indexed and queried. The current version is 0.6.0; releases typically track updates to the LlamaIndex core library (`llama-index-core`).


Install

pip install llama-index-readers-file
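To confirm which version is installed, and which `llama-index-core` it sits alongside, the standard-library `importlib.metadata` module can be queried. This is a small sketch: the helper name `installed_version` is hypothetical, and the code only reports versions without checking any compatibility rules.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the reader package and the core library it is released alongside
for dist in ("llama-index-readers-file", "llama-index-core"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```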

Imports

from llama_index.readers.file import FlatReader

Quickstart

This quickstart uses `FlatReader` from `llama-index-readers-file` to load a plain-text document. It writes a temporary `.txt` file, loads it into a list of LlamaIndex `Document` objects, prints the first 100 characters and the metadata of the first document, and then deletes the file. The `llama-index-readers-file` package must be installed first.

```python
import tempfile
from pathlib import Path

from llama_index.readers.file import FlatReader

# Create a sample text file; delete=False keeps it on disk after the
# `with` block closes it, so the reader can open it by path
file_content = "This is a sample document for LlamaIndex. It contains some text."
with tempfile.NamedTemporaryFile(
    mode="w", delete=False, suffix=".txt", encoding="utf-8"
) as tmp_file:
    tmp_file.write(file_content)
    tmp_file_path = Path(tmp_file.name)

try:
    # Initialize the FlatReader
    reader = FlatReader()

    # Load the file; FlatReader returns a list of Document objects
    documents = reader.load_data(file=tmp_file_path)

    # Print the start of the first document's text and its metadata
    if documents:
        print(f"Loaded document content: {documents[0].text[:100]}...")
        print(f"Metadata: {documents[0].metadata}")
finally:
    # Remove the temporary file even if loading fails
    tmp_file_path.unlink()
```
