Parsel

1.11.0 · active · verified Thu Apr 09

Parsel is a Python library for extracting data from HTML and XML documents using XPath and CSS selectors. Because it handles both navigating and querying web content, it is a common dependency for web scraping tools. The current version is 1.11.0. The project is actively developed, and releases often track changes in Python version support and dependency requirements.

Warnings

Install

Imports

Quickstart

This quickstart creates a `Selector` from HTML text and extracts data with both CSS selectors and XPath expressions. It then creates a second `Selector` from JSON text and queries it with JMESPath, which is supported as of Parsel 1.8.0.

from parsel import Selector

html_doc = '''
<html>
<head><title>My Awesome Page</title></head>
<body>
    <div id="main">
        <h1>Hello Parsel!</h1>
        <p class="intro">This is an <a href="/example">introductory</a> paragraph.</p>
        <ul>
            <li>Item 1</li>
            <li>Item 2</li>
        </ul>
    </div>
</body>
</html>
'''

# Create a Selector from HTML text
selector = Selector(text=html_doc)

# Extract title using CSS selector
title = selector.css('title::text').get()
print(f"Title: {title}")

# Extract H1 text using XPath
h1_text = selector.xpath('//h1/text()').get()
print(f"H1 Text: {h1_text}")

# Extract all list items using CSS selector
list_items = selector.css('ul li::text').getall()
print(f"List Items: {list_items}")

# Extract attribute using CSS selector
link_href = selector.css('.intro a::attr(href)').get()
print(f"Link href: {link_href}")

# Example with JSON and JMESPath (Parsel >= 1.8.0)
json_doc = '{"data": {"products": [{"id": 1, "name": "Laptop"}, {"id": 2, "name": "Mouse"}]}}'
json_selector = Selector(text=json_doc, type='json')
product_names = json_selector.jmespath('data.products[*].name').getall()
print(f"Product names (JMESPath): {product_names}")
