cssselect: CSS Selectors for Python

1.4.0 verified Tue May 12 auth: no python install: verified

cssselect is a BSD-licensed Python library that parses CSS3 selectors and translates them into XPath 1.0 expressions. These expressions can then be evaluated by an XPath engine such as lxml to find matching elements in XML or HTML documents. The current version is 1.4.0; releases are published on PyPI.

pip install cssselect
error ImportError: cssselect seems not to be installed.
cause The cssselect library, although historically integrated with lxml, is now an independent package and needs to be installed separately for lxml's CSS selector functionality to work.
fix
pip install cssselect
error AttributeError: 'lxml.etree._Element' object has no attribute 'cssselect'
cause This error occurs when attempting to use the .cssselect() method on an lxml element without the cssselect package being properly installed in the Python environment.
fix
pip install cssselect
error cssselect.parser.SelectorSyntaxError: Expected selector, got <DELIM '(' at ...>
cause The CSS selector string provided contains a syntax error, such as unescaped special characters (e.g., parentheses, colons) in class names or malformed selector patterns, which prevents cssselect from parsing it correctly.
fix
Review the CSS selector for syntax validity, ensuring proper escaping of special characters (e.g., \ for colons or periods in names, or quoting attribute values with spaces) and correct adherence to CSS selector grammar.
error cssselect.xpath.ExpressionError: Unknown or unsupported selector (eg. pseudo-class)
cause The CSS selector uses a pseudo-class, pseudo-element, or other advanced feature that cssselect cannot translate into a valid XPath 1.0 expression because XPath 1.0 has limitations compared to modern CSS selectors.
fix
Simplify the CSS selector to use only features supported by cssselect and XPath 1.0, or consider using direct XPath expressions for complex selections.
breaking Version 1.2.0 (released 2022-10-27) dropped support for Python 2.7, 3.4, 3.5, and 3.6. Ensure your environment uses Python 3.7 or newer.
fix Upgrade Python to version 3.7 or higher.
breaking Between versions 0.9 and 0.9.1, the default of `translate_pseudo_elements` in `selector_to_xpath()` changed. In 0.9.1+ it defaults to `False` (pseudo-elements are ignored), reverting the 0.9 default of `True` (pseudo-elements are rejected). `css_to_xpath()` is unaffected and always passes `True`.
fix When calling `selector_to_xpath()` directly, pass `translate_pseudo_elements` explicitly: `True` to reject selectors with pseudo-elements (or to route them to an overridden `xpath_pseudo_element()` in a custom translator), `False` to ignore them. Use `css_to_xpath()` if you want the same handling regardless of version.
gotcha The customization API, which lets you subclass `GenericTranslator` or `HTMLTranslator` and override methods, is not considered stable. Method signatures or behavior might change in future versions, potentially breaking your custom subclasses.
fix Be aware that custom translator subclasses may require updates with new `cssselect` releases. Review the changelog and source code for any changes to the translation API.
gotcha XPath 1.0, which `cssselect` translates to, only knows about real elements and has no equivalent of pseudo-elements (e.g., `::before`, `::after`). With the stock translators, `css_to_xpath()` raises `ExpressionError` for a selector that contains a pseudo-element, while `selector_to_xpath()` silently ignores it by default. The same selector can therefore fail through one entry point and match the bare element through the other.
fix Avoid pseudo-elements in selectors intended for XPath 1.0. If you call `selector_to_xpath()` directly, check the selector's `pseudo_element` attribute yourself, or pass `translate_pseudo_elements=True` so that unsupported pseudo-elements are rejected rather than ignored.
gotcha `cssselect` is a pure-Python package and does not list `lxml` as a dependency, so `pip install cssselect` does not install `lxml`. `cssselect` itself imports and translates selectors without it, but code that evaluates the resulting XPath with `lxml` (such as the quickstart) raises `ModuleNotFoundError` if `lxml` is missing.
fix Install `lxml` explicitly with `pip install lxml` if your project evaluates the XPath with it. Note that `lxml` may need compilation tools and development headers on systems without a prebuilt wheel.
python  os / libc      wheel  install  import  disk
3.10    alpine (musl)  wheel  -        0.02s   17.9M
3.10    alpine (musl)  -      -        0.02s   17.9M
3.10    slim (glibc)   wheel  1.5s     0.01s   18M
3.10    slim (glibc)   -      -        0.01s   18M
3.11    alpine (musl)  wheel  -        0.05s   19.8M
3.11    alpine (musl)  -      -        0.07s   19.8M
3.11    slim (glibc)   wheel  1.6s     0.04s   20M
3.11    slim (glibc)   -      -        0.04s   20M
3.12    alpine (musl)  wheel  -        0.04s   11.7M
3.12    alpine (musl)  -      -        0.04s   11.7M
3.12    slim (glibc)   wheel  1.4s     0.04s   12M
3.12    slim (glibc)   -      -        0.04s   12M
3.13    alpine (musl)  wheel  -        0.03s   11.4M
3.13    alpine (musl)  -      -        0.03s   11.3M
3.13    slim (glibc)   wheel  1.4s     0.03s   12M
3.13    slim (glibc)   -      -        0.04s   12M
3.9     alpine (musl)  wheel  -        0.02s   17.4M
3.9     alpine (musl)  -      -        0.02s   17.4M
3.9     slim (glibc)   wheel  1.7s     0.01s   18M
3.9     slim (glibc)   -      -        0.02s   18M

This quickstart demonstrates how to use `cssselect` to translate a CSS selector into an XPath 1.0 expression and then apply it to an HTML document using `lxml` to find matching elements. It highlights the use of `HTMLTranslator` for HTML-specific translations.

from lxml.etree import fromstring
from cssselect import HTMLTranslator, SelectorError

html_doc = '''
<div id="outer">
  <p class="content">
    <span>Text 1</span>
  </p>
  <div id="inner" class="content body">
    Text 2
    <span>Text 3</span>
  </div>
</div>
'''

try:
    # Use HTMLTranslator for HTML documents for better pseudo-class handling
    translator = HTMLTranslator()
    xpath_expression = translator.css_to_xpath('div.content > span')
    print(f"Generated XPath: {xpath_expression}")

    document = fromstring(html_doc)
    # Find all elements matching the XPath expression
    matches = document.xpath(xpath_expression)

    for element in matches:
        print(f"Matched element tag: {element.tag}, text: {element.text.strip() if element.text else ''}")

except SelectorError as e:
    print(f"Invalid CSS selector: {e}")