cssselect: CSS Selectors for Python
cssselect is a BSD-licensed Python library that parses CSS3 selectors and translates them into XPath 1.0 expressions. These expressions can then be passed to an XPath engine such as lxml to find matching elements in XML or HTML documents. The current version is 1.4.0; the library is actively developed, with releases published on PyPI.
Common errors
- ImportError: cssselect seems not to be installed.
  cause The cssselect library, although historically integrated with lxml, is now an independent package and needs to be installed separately for lxml's CSS selector functionality to work.
  fix pip install cssselect
- AttributeError: 'lxml.etree._Element' object has no attribute 'cssselect'
  cause This error occurs when attempting to use the .cssselect() method on an lxml element without the cssselect package being properly installed in the Python environment.
  fix pip install cssselect
- cssselect.parser.SelectorSyntaxError: Expected selector, got <DELIM '(' at ...>
  cause The CSS selector string provided contains a syntax error, such as unescaped special characters (e.g., parentheses, colons) in class names or malformed selector patterns, which prevents cssselect from parsing it correctly.
  fix Review the CSS selector for syntax validity, ensuring proper escaping of special characters (e.g., `\` for colons or periods in names, or quoting attribute values with spaces) and correct adherence to CSS selector grammar.
- cssselect.xpath.ExpressionError: Unknown or unsupported selector (eg. pseudo-class)
  cause The CSS selector uses a pseudo-class, pseudo-element, or other advanced feature that cssselect cannot translate into a valid XPath 1.0 expression because XPath 1.0 has limitations compared to modern CSS selectors.
  fix Simplify the CSS selector to use only features supported by cssselect and XPath 1.0, or consider using direct XPath expressions for complex selections.
Warnings
- breaking Version 1.2.0 (released 2022-10-27) dropped support for Python 2.7, 3.4, 3.5, and 3.6. Ensure your environment uses Python 3.7 or newer.
- breaking Between versions 0.9 and 0.9.1, the `selector_to_xpath()` function's default behavior for `translate_pseudo_elements` changed. In 0.9.1+, it defaults to `False` (ignoring pseudo-elements), reverting an accidental change in 0.9 which defaulted to `True` (rejecting them). When using `selector_to_xpath()` directly, explicitly set `translate_pseudo_elements=True` if you need pseudo-element support. `css_to_xpath()` is unaffected.
- gotcha The customization API, allowing subclassing of `GenericTranslator` or `HTMLTranslator` to override methods, is not considered stable. Its signature or behavior might change in future versions, potentially breaking your custom subclasses.
- gotcha XPath 1.0, which `cssselect` translates to, has no notion of pseudo-elements (e.g., `::before`, `::after`). `css_to_xpath()` rejects them with an `ExpressionError` unless a translator subclass overrides `xpath_pseudo_element()`, while `selector_to_xpath()` ignores them by default. This can lead to unexpected results if pseudo-elements are part of your CSS selectors.
- gotcha `lxml` is not a dependency of `cssselect`: the package installs and imports without it, and translating selectors to XPath needs nothing else. Evaluating the resulting XPath against a document does require an XPath engine, so if your application or test script uses `lxml` (e.g., for parsing documents, as in the quickstart), install it explicitly with `pip install lxml`; otherwise `import lxml` fails with a `ModuleNotFoundError`.
Install
- pip install cssselect
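To confirm from Python that the install landed in the active environment, one option is to query the package metadata with the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Hypothetical helper: return the installed version string, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("cssselect"))
```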
Imports
- GenericTranslator
from cssselect import GenericTranslator
- HTMLTranslator
from cssselect import HTMLTranslator
- SelectorError
from cssselect import SelectorError
- css_to_xpath (a method of the translator classes, not a module-level function)
from cssselect import GenericTranslator
xpath = GenericTranslator().css_to_xpath('div.content')
- SelectorSyntaxError
from cssselect import SelectorSyntaxError
Quickstart
from lxml.etree import fromstring
from cssselect import HTMLTranslator, SelectorError

html_doc = '''
<div id="outer">
  <p class="content">
    <span>Text 1</span>
  </p>
  <div id="inner" class="content body">
    Text 2
    <span>Text 3</span>
  </div>
</div>
'''

try:
    # Use HTMLTranslator for HTML documents for better pseudo-class handling
    translator = HTMLTranslator()
    xpath_expression = translator.css_to_xpath('div.content > span')
    print(f"Generated XPath: {xpath_expression}")

    document = fromstring(html_doc)
    # Find all elements matching the XPath expression
    matches = document.xpath(xpath_expression)
    for element in matches:
        print(f"Matched element tag: {element.tag}, text: {element.text.strip() if element.text else ''}")
except SelectorError as e:
    print(f"Invalid CSS selector: {e}")