{"id":2278,"library":"selectolax","title":"selectolax","description":"Selectolax is a fast and lightweight Python library for parsing HTML5 documents with CSS selectors. It provides Cython bindings to the high-performance Modest and Lexbor engines; Lexbor is the recommended and actively developed backend. The project is actively maintained, with frequent releases that fix bugs and add features.","status":"active","version":"0.4.7","language":"en","source_language":"en","source_url":"https://github.com/rushter/selectolax","tags":["html-parser","css-selectors","scraping","cython","performance","web-scraping","dom"],"install":[{"cmd":"pip install selectolax","lang":"bash","label":"Standard installation"},{"cmd":"pip install selectolax[cython]","lang":"bash","label":"Installation with Cython, for when compilation fails"}],"dependencies":[{"reason":"Required Python version range.","package":"python","version":"<3.15,>=3.9","optional":false},{"reason":"May be needed to compile the extension if `pip install selectolax` fails, especially with older selectolax versions on newer Python.","package":"Cython","optional":true}],"imports":[{"note":"`LexborHTMLParser` from `selectolax.lexbor` is the recommended parser, offering better performance and more features. 
The `HTMLParser` from `selectolax.parser` uses the deprecated Modest backend.","wrong":"from selectolax.parser import HTMLParser","symbol":"LexborHTMLParser","correct":"from selectolax.lexbor import LexborHTMLParser"}],"quickstart":{"code":"from selectolax.lexbor import LexborHTMLParser\n\nhtml_content = \"\"\"\n<html>\n<head><title>My Awesome Page</title></head>\n<body>\n    <h1 id=\"main-title\" data-version=\"1.0\">Welcome!</h1>\n    <div class=\"post\">\n        <p>This is the first post.</p>\n        <a href=\"/post/1\">Read more</a>\n    </div>\n    <div class=\"post\">\n        <p>This is the second post.</p>\n        <a href=\"/post/2\">Read more</a>\n    </div>\n    <p class=\"footer\">© 2026</p>\n</body>\n</html>\n\"\"\"\n\ntree = LexborHTMLParser(html_content)\n\n# Get the title; css_first() returns None when nothing matches\ntitle_node = tree.css_first('title')\ntitle = title_node.text() if title_node is not None else 'No Title'\nprint(f\"Page Title: {title}\")\n\n# Get the text of the main heading\nheading_node = tree.css_first('h1#main-title')\nmain_heading = heading_node.text() if heading_node is not None else 'N/A'\nprint(f\"Main Heading: {main_heading}\")\n\n# Get all post paragraphs and their links\nposts_data = []\nfor post_node in tree.css('.post'):\n    paragraph_node = post_node.css_first('p')\n    link_node = post_node.css_first('a')\n    paragraph_text = paragraph_node.text() if paragraph_node is not None else ''\n    link_href = link_node.attrs.get('href') if link_node is not None else ''\n    posts_data.append({'paragraph': paragraph_text, 'link': link_href})\n\nprint(\"\\nPosts Found:\")\nfor post in posts_data:\n    print(f\"- {post['paragraph']} (Link: {post['link']})\")","lang":"python","description":"This example demonstrates how to parse an HTML string using `LexborHTMLParser`, extract text from specific elements using CSS selectors, and iterate through multiple matching elements to gather data. 
It also shows how to access attributes and safely handle cases where an element might not be found."},"warnings":[{"fix":"Adjust any code that relies on the exact serialization of attributes with empty values. If you parse and then re-serialize HTML, verify that the new format does not break downstream processes.","message":"Attributes with empty values are now serialized as `<tag value=\"\">` instead of `<tag value>`. This changes how such attributes appear in the output HTML.","severity":"breaking","affected_versions":"0.4.7+"},{"fix":"Replace `from selectolax.parser import HTMLParser` with `from selectolax.lexbor import LexborHTMLParser` and update instantiation accordingly.","message":"The `HTMLParser` (Modest backend) from `selectolax.parser` is deprecated. Users should migrate to `LexborHTMLParser` from `selectolax.lexbor` for improved performance, features, and continued support.","severity":"deprecated","affected_versions":"0.4.0+"},{"fix":"Store the result of `css_first()` and check it before use: assign `node = tree.css_first('selector')`, then guard with `if node is not None: ...`.","message":"The `css_first()` method returns `None` if no element matches the given CSS selector. Failing to check for `None` before accessing attributes or methods (e.g., `.text()`, `.attrs`) will result in an `AttributeError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Avoid installing `0.4.5`. Upgrade to `0.4.6` or later, or downgrade to `0.4.4` if necessary.","message":"Version `0.4.5` was a buggy release and was subsequently yanked from PyPI. 
Installing or using this specific version is not recommended.","severity":"gotcha","affected_versions":"0.4.5"},{"fix":"Upgrade to `selectolax` `0.4.6` or later to benefit from these memory and stability fixes.","message":"Earlier versions of selectolax (prior to 0.4.0 and 0.4.6) contained memory leaks in the fragment parser and could segfault when accessing attributes or performing DOM modifications such as `decompose()` or `unwrap()`.","severity":"gotcha","affected_versions":"<0.4.6, <0.4.0"},{"fix":"If compilation errors occur, try `pip install selectolax[cython]`, which explicitly installs Cython and can help resolve them.","message":"Installation via `pip install selectolax` may fail with compilation errors, especially when installing an outdated version on a newer Python environment, or when Cython is not available.","severity":"gotcha","affected_versions":"All versions (under specific conditions)"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}