CSS Selectors for Python ElementTree
cssselect2 is a straightforward implementation of CSS3 and CSS4 Selectors for markup documents (HTML, XML, etc.) that can be read by ElementTree-like parsers (including cElementTree, lxml, and html5lib). Unlike its predecessor `cssselect`, it does not translate selectors to XPath, which avoids the correctness issues inherent in that approach. The library is actively maintained: the current version is 0.9.0, and new releases appear several times a year, often alongside new Python versions and newly supported CSS selector features.
Warnings
- breaking: Support for older Python versions is regularly dropped. Version 0.9.0 dropped Python 3.9 support, and earlier versions (0.8.0, 0.5.0, 0.4.0) removed support for Python 3.8, 3.6, and 3.5 respectively. Make sure your environment uses a supported Python version (currently >=3.10 for 0.9.0).
- deprecated: The `iter_ancestors` and `iter_previous_siblings` methods on `ElementWrapper` were deprecated in version 0.6.0 and removed in 0.7.0. Calling them now raises an AttributeError.
- gotcha: Do not instantiate `ElementWrapper` objects directly. Build them with a root constructor such as `ElementWrapper.from_html_root` (or `ElementWrapper.from_xml_root` for XML documents) and reach other elements from that root. A directly instantiated wrapper may lack the parent/sibling relationships that selector matching relies on and can behave unexpectedly.
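The last two warnings can be illustrated with a minimal sketch. It assumes the wrapper's `parent` attribute (None on the root) as a way to walk upwards now that `iter_ancestors` is gone; the `ancestors` helper is a hypothetical name introduced here for illustration and is not part of cssselect2.

```python
from xml.etree import ElementTree

import cssselect2

root = ElementTree.fromstring(
    '<html><body><div><p>Hello <span>World</span></p></div></body></html>'
)

# Wrap the tree once, from its root, instead of calling ElementWrapper(...) directly.
# For a non-HTML document, ElementWrapper.from_xml_root(root) plays the same role.
wrapper = cssselect2.ElementWrapper.from_html_root(root)

# Reach other elements from the root wrapper so relationships stay intact.
span = wrapper.query('span')


def ancestors(element):
    """Hypothetical helper: yield wrappers from the parent up to the root."""
    current = element.parent
    while current is not None:
        yield current
        current = current.parent


ancestor_tags = [a.etree_element.tag for a in ancestors(span)]
print(ancestor_tags)
```

Wrappers obtained this way (through `query`, `query_all`, `iter_subtree`, or `parent`) all descend from the same root wrapper, which is what keeps combinators such as `body p` working.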
Install
pip install cssselect2
Imports
- Matcher
from cssselect2 import Matcher
- ElementWrapper
from cssselect2 import ElementWrapper
- compile_selector_list
from cssselect2 import compile_selector_list
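As a rough sketch of how these imports fit together without a `Matcher`, the example below compiles a selector list from a plain string and tests each element against it. The sample document and the variable names are made up for illustration; the sketch assumes that each compiled selector exposes a `test` callable taking a wrapped element.

```python
from xml.etree import ElementTree

from cssselect2 import ElementWrapper, compile_selector_list

# A tiny XML document, wrapped from its root.
root = ElementTree.fromstring('<ul><li class="a">one</li><li>two</li></ul>')
wrapper = ElementWrapper.from_xml_root(root)

# One compiled selector per comma-separated selector in the list.
selectors = compile_selector_list('li.a, li:last-child')

# Keep the text of every element matched by at least one selector.
hits = [
    element.etree_element.text
    for element in wrapper.iter_subtree()
    if any(selector.test(element) for selector in selectors)
]
print(hits)
```

For one-off lookups, `wrapper.query_all('li.a, li:last-child')` should yield the same elements. Compiling up front mainly pays off when the same selectors are tested against many elements.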
Quickstart
from xml.etree import ElementTree
import cssselect2
import tinycss2
# 1. Parse CSS and add rules to the matcher
matcher = cssselect2.Matcher()
css_rules = tinycss2.parse_stylesheet('p { color: blue; } body p { background: red; }', skip_whitespace=True)
for rule in css_rules:
    if rule.type == 'qualified-rule':  # Handle only actual CSS rules
        selectors = cssselect2.compile_selector_list(rule.prelude)
        payload = (tinycss2.serialize(rule.prelude), tinycss2.serialize(rule.content))
        for selector in selectors:
            matcher.add_selector(selector, payload)
# 2. Parse HTML and wrap the tree
html_content = '''
<html>
<body>
<div>
<p class="intro">Hello <span>World</span>!</p>
<p>Another paragraph.</p>
</div>
</body>
</html>
'''
html_tree = ElementTree.fromstring(html_content)
wrapper = cssselect2.ElementWrapper.from_html_root(html_tree)
# 3. Find CSS rules applying to each tag
print('Matching CSS rules:')
for element in wrapper.iter_subtree():
    tag = element.etree_element.tag.split('}')[-1]  # Handle namespaces if present
    matches = matcher.match(element)
    if matches:
        print(f' Tag "{tag}" matches:')
        for match in matches:
            specificity, order, pseudo_type, payload = match
            selector_string, content_string = payload
            print(f' - Selector: "{selector_string}" (Declarations: "{content_string}")')
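The match tuples can also be used to pick a single winning rule per element. The sketch below assumes that `Matcher.match` returns its results ordered from lowest to highest specificity, with ties broken by the order in which selectors were added, so the last tuple is the one a cascade would apply. The stylesheet, the document, and the `winners` dictionary are illustrative only.

```python
from xml.etree import ElementTree

import cssselect2
import tinycss2

matcher = cssselect2.Matcher()
css = 'p { color: blue } .intro { color: green } body p { color: red }'
for rule in tinycss2.parse_stylesheet(css, skip_whitespace=True, skip_comments=True):
    if rule.type != 'qualified-rule':
        continue
    # Store the serialized declarations as the payload for every selector of the rule.
    declarations = tinycss2.serialize(rule.content).strip()
    for selector in cssselect2.compile_selector_list(rule.prelude):
        matcher.add_selector(selector, declarations)

root = ElementTree.fromstring(
    '<html><body><p class="intro">Hi</p><p>Bye</p></body></html>'
)
wrapper = cssselect2.ElementWrapper.from_html_root(root)

# Map each matched element's text to the payload of its highest-ranked match.
winners = {}
for element in wrapper.iter_subtree():
    matches = matcher.match(element)
    if matches:
        specificity, order, pseudo, payload = matches[-1]
        winners[element.etree_element.text] = payload

print(winners)
```

Here the class selector `.intro` outranks `body p` for the first paragraph, while `body p` outranks the bare `p` selector for the second. A real cascade would also have to account for `!important`, origins, and per-property resolution, which this sketch leaves out.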