CSS Selectors for Python ElementTree

0.9.0 verified Tue May 12 auth: no python install: verified

cssselect2 is a straightforward implementation of CSS3 and CSS4 Selectors for markup documents (HTML, XML, etc.) that can be read by ElementTree-like parsers, including cElementTree, lxml, and html5lib. Unlike its predecessor `cssselect`, it does not translate selectors to XPath, with the aim of resolving the correctness issues inherent in that approach. The library is actively maintained, and the current version is 0.9.0. Releases occur several times a year, often alongside support for new Python versions and new CSS selector features.

pip install cssselect2
error ModuleNotFoundError: No module named 'cssselect2'
cause The `cssselect2` library has not been installed in the active Python environment.
fix
Install the library using pip: pip install cssselect2
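If the error persists after installing, the package may have gone into a different environment than the one running your code. The following is a minimal sketch for checking this using only the standard library; the helper name `is_installed` is hypothetical and not part of cssselect2.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the active interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter is running, then whether it can see cssselect2.
print("Interpreter:", sys.executable)
print("cssselect2 importable:", is_installed("cssselect2"))
```

If the interpreter path is not the one you expected, running `python -m pip install cssselect2` with that same interpreter installs into the matching environment.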
error cssselect2.parser.SelectorError: (<FunctionBlock url( … )>, 'expected a compound selector, got function')
cause This error occurs when the provided CSS selector string is syntactically incorrect or attempts to use features (like `@import url(...)`) that are not valid for element selection in `cssselect2`.
fix
Check the selector string for syntax errors and make sure it contains only selector syntax. cssselect2 selects elements; it does not parse whole stylesheets, so at-rules such as @import must be handled separately (for example with tinycss2) before the selector text reaches cssselect2.
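One way to handle untrusted or user-supplied selectors is to catch the exception at compile time. This is a sketch, assuming `SelectorError` is importable from `cssselect2.parser` as the error message above suggests; `try_compile` is a hypothetical helper name.

```python
import cssselect2
from cssselect2.parser import SelectorError


def try_compile(selector_text):
    """Return the compiled selector list, or None if the text is invalid."""
    try:
        return cssselect2.compile_selector_list(selector_text)
    except SelectorError as error:
        print(f"Invalid selector {selector_text!r}: {error}")
        return None


try_compile("div > p.intro")  # a valid selector
try_compile("div >")          # dangling combinator, reported as invalid
```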
error AttributeError: 'ElementTree' object has no attribute 'getiterator'
cause This error typically arises when `cssselect2` (or the code interacting with it) is used with `xml.etree.ElementTree` in Python 3.9 or later, where the `getiterator()` method was removed.
fix
Update cssselect2 to its latest version (pip install --upgrade cssselect2). If your own code calls getiterator() on ElementTree objects, replace those calls with iter().
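For code that calls `getiterator()` directly, the replacement is usually mechanical. A small standard-library sketch with a made-up document:

```python
from xml.etree import ElementTree

document = ElementTree.fromstring("<root><item>a</item><item>b</item></root>")

# iter() walks the element and all of its descendants in document order.
tags = [element.tag for element in document.iter()]

# Passing a tag name restricts the walk to matching elements.
items = [element.text for element in document.iter("item")]

print(tags)
print(items)
```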
breaking Support for older Python versions is regularly dropped. Version 0.9.0 dropped Python 3.9 support, and previous versions (0.8.0, 0.5.0, 0.4.0) have also removed support for Python 3.8, 3.6, and 3.5 respectively. Ensure your environment uses a supported Python version (currently >=3.10 for 0.9.0).
fix Upgrade your Python environment to a version officially supported by `cssselect2` (e.g., Python 3.10+ for `cssselect2` 0.9.0) before upgrading the library.
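A quick guard can make an unsupported interpreter fail early with a clear message. This sketch hard-codes the 3.10 minimum stated above for cssselect2 0.9.0; `python_is_supported` is a hypothetical helper, not something the library provides.

```python
import sys

# Minimum Python version stated above for cssselect2 0.9.0.
MINIMUM_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Compare the (major, minor) part of a version tuple to the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("This Python is older than the minimum required by cssselect2 0.9.0")
```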
deprecated The `iter_ancestors` and `iter_previous_siblings` methods on `ElementWrapper` were deprecated in version 0.6.0 and removed in 0.7.0. Attempting to call these methods will result in an AttributeError.
fix Replace calls to `element.iter_ancestors()` with `element.ancestors` and `element.iter_previous_siblings()` with `element.previous_siblings`. These are now properties returning tuples.
gotcha When working with `ElementWrapper` objects, it's crucial not to instantiate them directly. Doing so can lead to unexpected behavior and may not correctly establish the necessary parent/sibling relationships for selector matching.
fix Always use factory methods like `cssselect2.ElementWrapper.from_xml_root(element)` or `cssselect2.ElementWrapper.from_html_root(element)` to create the initial `ElementWrapper` for the document's root element. Other elements should be accessed through the methods provided by the `ElementWrapper` itself (e.g., `iter_children()`, `iter_subtree()`).
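As an illustration, the sketch below wraps only the root element and reaches every other element through it. The sample markup is made up, and the calls used (`from_html_root`, `query_all`, `query`, `iter_children`) are `ElementWrapper` methods as I understand the library's API.

```python
from xml.etree import ElementTree

import cssselect2

tree = ElementTree.fromstring(
    "<html><body><div><p class='intro'>Hi</p><p>Bye</p></div></body></html>"
)

# Wrap the root once; never call ElementWrapper(...) directly.
root = cssselect2.ElementWrapper.from_html_root(tree)

# Descendant wrappers are produced by the root wrapper itself.
paragraphs = list(root.query_all("p"))
intro = root.query("p.intro")

print(len(paragraphs))
print(intro.etree_element.text)
for child in root.iter_children():
    print(child.local_name)
```

Because each wrapper is created by its parent, the parent and sibling links that selector matching relies on are set up consistently.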
python  os / libc      status wheel  install  import  disk
3.10    alpine (musl)  wheel         -        0.02s   18.3M
3.10    alpine (musl)  -             -        0.02s   18.3M
3.10    slim (glibc)   wheel         1.6s     0.01s   19M
3.10    slim (glibc)   -             -        0.01s   19M
3.11    alpine (musl)  wheel         -        0.03s   20.2M
3.11    alpine (musl)  -             -        0.04s   20.2M
3.11    slim (glibc)   wheel         1.7s     0.03s   21M
3.11    slim (glibc)   -             -        0.02s   21M
3.12    alpine (musl)  wheel         -        0.02s   12.0M
3.12    alpine (musl)  -             -        0.03s   12.0M
3.12    slim (glibc)   wheel         1.6s     0.03s   13M
3.12    slim (glibc)   -             -        0.02s   13M
3.13    alpine (musl)  wheel         -        0.02s   11.8M
3.13    alpine (musl)  -             -        0.03s   11.7M
3.13    slim (glibc)   wheel         1.6s     0.02s   12M
3.13    slim (glibc)   -             -        0.02s   12M
3.9     alpine (musl)  wheel         -        0.02s   17.8M
3.9     alpine (musl)  -             -        0.03s   17.8M
3.9     slim (glibc)   wheel         1.9s     0.01s   18M
3.9     slim (glibc)   -             -        0.02s   18M

This quickstart demonstrates the core workflow of `cssselect2`. It involves parsing a CSS stylesheet using `tinycss2`, compiling its selectors into a `cssselect2.Matcher` object, parsing an HTML document with an ElementTree-like parser, wrapping the root element in `cssselect2.ElementWrapper`, and then iterating through the wrapped elements to find matching CSS rules.

from xml.etree import ElementTree
import cssselect2
import tinycss2

# 1. Parse CSS and add rules to the matcher
matcher = cssselect2.Matcher()
css_rules = tinycss2.parse_stylesheet('p { color: blue; } body p { background: red; }', skip_whitespace=True)
for rule in css_rules:
    if rule.type == 'qualified-rule': # Handle only actual CSS rules
        selectors = cssselect2.compile_selector_list(rule.prelude)
        payload = (tinycss2.serialize(rule.prelude), tinycss2.serialize(rule.content))
        for selector in selectors:
            matcher.add_selector(selector, payload)

# 2. Parse HTML and wrap the tree
html_content = '''
<html>
<body>
    <div>
        <p class="intro">Hello <span>World</span>!</p>
        <p>Another paragraph.</p>
    </div>
</body>
</html>
'''
html_tree = ElementTree.fromstring(html_content)
wrapper = cssselect2.ElementWrapper.from_html_root(html_tree)

# 3. Find CSS rules applying to each tag
print('Matching CSS rules:')
for element in wrapper.iter_subtree():
    tag = element.etree_element.tag.split('}')[-1] # Handle namespaces if present
    matches = matcher.match(element)
    if matches:
        print(f'  Tag "{tag}" matches:')
        for match in matches:
            specificity, order, pseudo_type, payload = match
            selector_string, content_string = payload
            print(f'    - Selector: "{selector_string}" (Declarations: "{content_string}")')