CSS Selectors for Python ElementTree

0.9.0 · active · verified Sun Mar 29

cssselect2 is a straightforward implementation of CSS3 and CSS4 Selectors for markup documents (HTML, XML, etc.) that can be read by ElementTree-like parsers (including cElementTree, lxml, and html5lib). Unlike its predecessor `cssselect`, it does not translate selectors to XPath, which avoids the correctness issues inherent in that approach. The library is actively maintained; the current version is 0.9.0. Releases occur several times a year, often coinciding with Python version updates and new CSS selector features.

Quickstart

This quickstart demonstrates the core `cssselect2` workflow: parse a CSS stylesheet with `tinycss2`, compile its selectors and register them in a `cssselect2.Matcher`, parse an HTML document with an ElementTree-like parser, wrap the root element with `cssselect2.ElementWrapper`, and iterate over the wrapped elements to find the CSS rules that match each one.

from xml.etree import ElementTree
import cssselect2
import tinycss2

# 1. Parse CSS and add rules to the matcher
matcher = cssselect2.Matcher()
css_rules = tinycss2.parse_stylesheet('p { color: blue; } body p { background: red; }', skip_whitespace=True)
for rule in css_rules:
    if rule.type == 'qualified-rule': # Handle only actual CSS rules
        selectors = cssselect2.compile_selector_list(rule.prelude)
        payload = (tinycss2.serialize(rule.prelude), tinycss2.serialize(rule.content))
        for selector in selectors:
            matcher.add_selector(selector, payload)

# 2. Parse HTML and wrap the tree
html_content = '''
<html>
<body>
    <div>
        <p class="intro">Hello <span>World</span>!</p>
        <p>Another paragraph.</p>
    </div>
</body>
</html>
'''
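# The standard-library XML parser only accepts well-formed markup.
html_tree = ElementTree.fromstring(html_content)
# from_html_root marks the tree as an HTML document for selector matching.
wrapper = cssselect2.ElementWrapper.from_html_root(html_tree)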

# 3. Find CSS rules applying to each tag
print('Matching CSS rules:')
for element in wrapper.iter_subtree():
    tag = element.etree_element.tag.split('}')[-1]  # Strip a '{namespace}' prefix if present
    matches = matcher.match(element)
    if matches:
        print(f'  Tag "{tag}" matches:')
        for match in matches:
            specificity, order, pseudo_type, payload = match
            selector_string, content_string = payload
            print(f'    - Selector: "{selector_string}" (Declarations: "{content_string}")')
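
For one-off lookups, building a `Matcher` may not be necessary. The following is a minimal sketch, assuming the `query` and `query_all` helpers on `ElementWrapper` behave as in recent cssselect2 releases; the sample markup and variable names are illustrative only.

```python
from xml.etree import ElementTree

import cssselect2

# Illustrative markup, mirroring the quickstart document.
root = ElementTree.fromstring(
    '<html><body><div>'
    '<p class="intro">Hello <span>World</span>!</p>'
    '<p>Another paragraph.</p>'
    '</div></body></html>'
)
wrapper = cssselect2.ElementWrapper.from_html_root(root)

# query_all yields every wrapped element in the subtree matching the selector.
paragraphs = list(wrapper.query_all('p'))

# query returns the first match, or None when nothing matches.
intro = wrapper.query('p.intro')
missing = wrapper.query('table')

print(len(paragraphs))
print(intro.etree_element.get('class'))
print(missing)
```

This form suits ad hoc queries against a single tree. When many selectors have to be tested against every element, as in the stylesheet example above, the `Matcher` approach is likely the better fit.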
