Itemloaders

1.4.0 · active · verified Fri Apr 10

Itemloaders is the base library behind Scrapy's ItemLoader. It provides a flexible way to collect values and populate items. Values can come from XPath, CSS, or JMESPath selectors (optionally filtered by a regular expression) or from literal values, and each value passes through a chain of input and output processors before it is stored on the item. The current version is 1.4.0, and the library is actively maintained, with releases that regularly update the supported Python versions.

Warnings

Install

Imports

Quickstart

This quickstart defines a simple item, subclasses `itemloaders.ItemLoader`, and populates the item from an HTML string wrapped in a `parsel.Selector`. Values are collected with CSS and XPath selectors plus a literal value, then cleaned with the built-in `MapCompose` and `TakeFirst` processors.

from itemloaders import ItemLoader
from itemloaders.processors import TakeFirst, MapCompose

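# A minimal item. itemloaders accesses items through itemadapter, which accepts
# types such as dicts, dataclasses, attrs classes, and scrapy.Item, but not
# arbitrary plain classes, so a dataclass with default values is used here.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MyItem:
    name: Optional[str] = None
    price: Optional[float] = None
    description: Optional[str] = None

    def __repr__(self):
        return str(self.__dict__)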

# Define an ItemLoader for MyItem
class ProductLoader(ItemLoader):
    default_item_class = MyItem
    default_output_processor = TakeFirst()

    name_in = MapCompose(lambda x: x.strip(), str.title)
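    # Input processor: MapCompose returns a list, so the default TakeFirst output processor reduces it to a single float
    price_in = MapCompose(lambda x: x.replace('$', ''), float)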
    description_in = MapCompose(lambda x: x.strip())

# Example HTML fragment
html_data = '''
<div class="product">
    <h1 class="name">  product a  </h1>
    <span class="price">$12.99</span>
    <div class="description">A really good product.</div>
</div>
'''

# Using parsel.Selector for data extraction
from parsel import Selector
selector = Selector(text=html_data)

# Instantiate the loader and populate the item
loader = ProductLoader(selector=selector)
loader.add_css('name', '.name::text')
loader.add_xpath('price', '//span[@class="price"]/text()')
loader.add_css('description', '.description::text') # Preferred source: the page itself
loader.add_value('description', 'Short description from custom source.') # Fallback; TakeFirst keeps the first non-empty value collected

# Load the item
item = loader.load_item()

print(item)
# Expected output: {'name': 'Product A', 'price': 12.99, 'description': 'A really good product.'}
