cssselect: CSS Selectors for Python
cssselect is a BSD-licensed Python library that parses CSS3 Selectors and translates them into XPath 1.0 expressions. These XPath expressions can then be evaluated by an XPath engine such as lxml to find matching elements in XML or HTML documents. The current version is 1.4.0, and releases are published on PyPI.
Warnings
- breaking Version 1.2.0 (released 2022-10-27) dropped support for Python 2.7, 3.4, 3.5, and 3.6. Ensure your environment uses Python 3.7 or newer.
- breaking Between versions 0.9 and 0.9.1, the default of the `translate_pseudo_elements` parameter of the `selector_to_xpath()` method changed. In 0.9.1 and later it defaults to `False` (pseudo-elements are ignored), reverting an accidental change in 0.9, which defaulted to `True` (pseudo-elements are rejected). When calling `selector_to_xpath()` directly, pass `translate_pseudo_elements=True` if you want pseudo-elements to be translated rather than silently dropped. `css_to_xpath()` is unaffected.
- gotcha The customization API (subclassing `GenericTranslator` or `HTMLTranslator` and overriding their methods) is not considered stable. Method signatures or behavior may change in future versions and break custom subclasses.
- gotcha XPath 1.0, the target of `cssselect`'s translation, has no notion of pseudo-elements (e.g., `::before`, `::after`). `css_to_xpath()` asks the translator to handle them, and the stock translators reject them with an error unless a subclass overrides `xpath_pseudo_element()`, while `selector_to_xpath()` ignores them by default. Either behavior can produce unexpected results if pseudo-elements are part of your CSS selectors.
Install
pip install cssselect
Imports
- GenericTranslator
from cssselect import GenericTranslator
- HTMLTranslator
from cssselect import HTMLTranslator
- SelectorError
from cssselect import SelectorError
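- css_to_xpath (a method of the translator classes, not a name importable from the package)
from cssselect import GenericTranslator  # then call GenericTranslator().css_to_xpath(...)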
- SelectorSyntaxError
from cssselect import SelectorSyntaxError
Quickstart
from lxml.etree import fromstring
from cssselect import HTMLTranslator, SelectorError
html_doc = '''
<div id="outer">
<p class="content">
<span>Text 1</span>
</p>
<div id="inner" class="content body">
Text 2
<span>Text 3</span>
</div>
</div>
'''
try:
    # Use HTMLTranslator for HTML documents for better pseudo-class handling
    translator = HTMLTranslator()
    xpath_expression = translator.css_to_xpath('div.content > span')
    print(f"Generated XPath: {xpath_expression}")
    document = fromstring(html_doc)
    # Find all elements matching the XPath expression
    matches = document.xpath(xpath_expression)
    for element in matches:
        print(f"Matched element tag: {element.tag}, text: {element.text.strip() if element.text else ''}")
except SelectorError as e:
    print(f"Invalid CSS selector: {e}")