CSS Selectors for Python ElementTree
cssselect2 is a straightforward implementation of CSS3 and CSS4 Selectors for markup documents (HTML, XML, etc.) that can be read by ElementTree-like parsers (including cElementTree, lxml, and html5lib). Unlike its predecessor `cssselect`, it does not translate selectors to XPath, which avoids the correctness issues inherent in that approach. The library is actively maintained: the current version is 0.9.0, and releases occur several times a year, often coinciding with Python version updates and new CSS selector features.
Common errors
- ModuleNotFoundError: No module named 'cssselect2'
  cause: The `cssselect2` library has not been installed in the active Python environment.
  fix: Install the library using pip: `pip install cssselect2`
- cssselect2.parser.SelectorError: (<FunctionBlock url( … )>, 'expected a compound selector, got function')
  cause: This error occurs when the provided CSS selector string is syntactically incorrect or attempts to use features (like `@import url(...)`) that are not valid for element selection in `cssselect2`.
  fix: Review the CSS selector for syntax errors. Ensure the selector adheres to valid CSS selector syntax for selecting elements within a document. For example, `cssselect2` is for selecting elements, not parsing entire CSS stylesheets with directives like `@import`.
- AttributeError: 'ElementTree' object has no attribute 'getiterator'
  cause: This error typically arises when `cssselect2` (or the code interacting with it) is used with `xml.etree.ElementTree` in Python 3.9 or later, where the `getiterator()` method was removed.
  fix: Ensure `cssselect2` is updated to its latest version (`pip install --upgrade cssselect2`). If directly manipulating `ElementTree` objects, replace deprecated methods like `getiterator()` with the modern `iter()` method.
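One way to surface the `SelectorError` case early is to compile selector strings up front and catch the exception. The sketch below assumes `compile_selector_list` is given a plain selector string; `is_valid_selector` is a hypothetical helper name, not part of cssselect2.

```python
import cssselect2
from cssselect2.parser import SelectorError


def is_valid_selector(selector_text):
    """Hypothetical helper: True if cssselect2 can compile the selector."""
    try:
        cssselect2.compile_selector_list(selector_text)
    except SelectorError:
        return False
    return True


valid = is_valid_selector('div > p.intro')
# A trailing combinator leaves no compound selector to parse.
invalid = is_valid_selector('p >')
```

Checking selectors this way keeps a malformed rule from aborting a whole matching pass later on.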
Warnings
- breaking Support for older Python versions is regularly dropped. Version 0.9.0 dropped Python 3.9 support, and previous versions (0.8.0, 0.5.0, 0.4.0) have also removed support for Python 3.8, 3.6, and 3.5 respectively. Ensure your environment uses a supported Python version (currently >=3.10 for 0.9.0).
- deprecated The `iter_ancestors` and `iter_previous_siblings` methods on `ElementWrapper` were deprecated in version 0.6.0 and removed in 0.7.0. Attempting to call these methods will result in an AttributeError.
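- gotcha Do not instantiate `ElementWrapper` directly. Doing so may not establish the parent/sibling relationships that selector matching relies on and can lead to unexpected behavior. Create the root wrapper with `ElementWrapper.from_html_root()` or `ElementWrapper.from_xml_root()`, and obtain other wrappers through methods such as `iter_children()` and `iter_subtree()`.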
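As a rough illustration of the last two warnings, the sketch below builds wrappers only through `from_html_root()` and `iter_subtree()`, and walks the `parent` attribute instead of relying on a dedicated ancestor-iteration method. `ancestors_of` is a hypothetical helper written for this example, and the markup is made up.

```python
from xml.etree import ElementTree

import cssselect2

tree = ElementTree.fromstring('<html><body><p>Hi</p></body></html>')
# Wrap the root once; child wrappers come from iteration, not the constructor.
root = cssselect2.ElementWrapper.from_html_root(tree)


def ancestors_of(wrapper):
    """Hypothetical helper: yield ancestors from the parent up to the root."""
    current = wrapper.parent
    while current is not None:
        yield current
        current = current.parent


paragraph = next(el for el in root.iter_subtree() if el.local_name == 'p')
ancestor_names = [a.local_name for a in ancestors_of(paragraph)]
```

Because each wrapper produced by `iter_subtree()` carries a reference to its parent wrapper, the walk needs no extra bookkeeping.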
Install
- pip install cssselect2
Imports
- Matcher
from cssselect2 import Matcher
- ElementWrapper
cssselect2.ElementWrapper(etree_element, parent, index, previous, in_html_document)
from cssselect2 import ElementWrapper
- compile_selector_list
from cssselect2 import compile_selector_list
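The three imports above can be combined without a separate CSS parser when selectors are already available as strings. This is a minimal sketch, assuming `compile_selector_list` is given a plain selector string; the payload `'highlight'` and the sample markup are made up for illustration.

```python
from xml.etree import ElementTree

from cssselect2 import ElementWrapper, Matcher, compile_selector_list

matcher = Matcher()
# One compiled selector is produced per comma-separated selector.
compiled = compile_selector_list('p.intro, span')
for selector in compiled:
    matcher.add_selector(selector, 'highlight')

tree = ElementTree.fromstring(
    '<html><body><p class="intro">Hello <span>World</span></p>'
    '<p>Other</p></body></html>'
)
root = ElementWrapper.from_html_root(tree)

# Collect the tag names of elements that match at least one selector.
matched_tags = [el.local_name for el in root.iter_subtree() if matcher.match(el)]
```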
Quickstart
from xml.etree import ElementTree

import cssselect2
import tinycss2

# 1. Parse CSS and add rules to the matcher
matcher = cssselect2.Matcher()
css_rules = tinycss2.parse_stylesheet(
    'p { color: blue; } body p { background: red; }', skip_whitespace=True)
for rule in css_rules:
    if rule.type == 'qualified-rule':  # Handle only actual CSS rules
        selectors = cssselect2.compile_selector_list(rule.prelude)
        payload = (tinycss2.serialize(rule.prelude), tinycss2.serialize(rule.content))
        for selector in selectors:
            matcher.add_selector(selector, payload)

# 2. Parse HTML and wrap the tree
html_content = '''
<html>
  <body>
    <div>
      <p class="intro">Hello <span>World</span>!</p>
      <p>Another paragraph.</p>
    </div>
  </body>
</html>
'''
html_tree = ElementTree.fromstring(html_content)
wrapper = cssselect2.ElementWrapper.from_html_root(html_tree)

# 3. Find CSS rules applying to each tag
print('Matching CSS rules:')
for element in wrapper.iter_subtree():
    tag = element.etree_element.tag.split('}')[-1]  # Handle namespaces if present
    matches = matcher.match(element)
    if matches:
        print(f'  Tag "{tag}" matches:')
        for match in matches:
            specificity, order, pseudo_type, payload = match
            selector_string, content_string = payload
            print(f'    - Selector: "{selector_string}" (Declarations: "{content_string}")')
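When the goal is only to find elements rather than to map stylesheet rules onto them, the wrapper's `query_all()` and `query()` methods may be a shorter route than a `Matcher`. The sketch below reuses the Quickstart markup in condensed form; the variable names are illustrative.

```python
from xml.etree import ElementTree

import cssselect2

html = (
    '<html><body><div>'
    '<p class="intro">Hello <span>World</span>!</p>'
    '<p>Another paragraph.</p>'
    '</div></body></html>'
)
root = cssselect2.ElementWrapper.from_html_root(ElementTree.fromstring(html))

# query_all() yields matching wrappers in tree order.
paragraph_texts = [el.etree_element.text for el in root.query_all('body p')]

# query() returns the first match, or None when nothing matches.
first_span = root.query('span')
missing = root.query('table')
```

A `Matcher` is still the better fit when many selectors carry payloads, since it lets each element be tested against the whole rule set in one call.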