{"id":6873,"library":"scrapling","title":"Scrapling","description":"Scrapling is an adaptive, high-performance Python library designed for robust web scraping. It emphasizes undetectability and efficiency, automatically bypassing common anti-bot systems like Cloudflare and adapting to minor website structure changes. It offers various fetcher types for HTTP, headless browser, and stealthy browser interactions, alongside a Scrapy-like asynchronous spider framework for large-scale, concurrent crawling. The library is actively maintained with frequent updates, focusing on stealth, performance, and developer experience.","status":"active","version":"0.4.6","language":"en","source_language":"en","source_url":"https://github.com/D4Vinci/Scrapling","tags":["web scraping","undetectable","browser automation","async","anti-bot","adaptive scraping"],"install":[{"cmd":"pip install \"scrapling[fetchers]\"","lang":"bash","label":"Install core library with fetcher dependencies"},{"cmd":"scrapling install","lang":"bash","label":"Download browser dependencies (Playwright binaries)"}],"dependencies":[{"reason":"Requires Python 3.10 or higher.","package":"Python","optional":false},{"reason":"Used by browser-based fetchers (StealthyFetcher, DynamicFetcher) for headless automation. Binaries are installed via `scrapling install`.","package":"Playwright","optional":false}],"imports":[{"note":"The `scrapling.defaults` path appeared in older examples; `scrapling.fetchers` is the current, recommended path for fetcher classes.","wrong":"from scrapling.defaults import Fetcher","symbol":"Fetcher","correct":"from scrapling.fetchers import Fetcher"},{"symbol":"StealthyFetcher","correct":"from scrapling.fetchers import StealthyFetcher"},{"symbol":"Spider","correct":"from scrapling.spiders import Spider"},{"symbol":"Response","correct":"from scrapling.spiders import Response"},{"symbol":"FetcherSession","correct":"from scrapling.fetchers import FetcherSession"}],"quickstart":{"code":"from scrapling.fetchers import Fetcher\nfrom scrapling.spiders import Spider, Response\nimport asyncio\n\n# --- Basic HTTP Fetching ---\nprint(\"\\n--- Basic HTTP Fetching ---\")\npage = Fetcher.get('https://quotes.toscrape.com/')\nquotes = page.css('.quote .text::text').getall()\nauthors = page.css('.quote .author::text').getall()\nprint(f\"First quote: {quotes[0]}\\nAuthor: {authors[0]}\")\n\n# --- Basic Spider Framework ---\nprint(\"\\n--- Basic Spider Framework ---\")\nclass QuotesSpider(Spider):\n    name = \"quotes_spider\"\n    start_urls = [\"https://quotes.toscrape.com\"]\n\n    async def parse(self, response: Response):\n        for quote in response.css(\"div.quote\"):\n            yield {\n                \"text\": quote.css(\"span.text::text\").get(\"\"),\n                \"author\": quote.css(\"small.author::text\").get(\"\"),\n            }\n\nasync def run_spider():\n    # Note: for production, consider `QuotesSpider().start()`, which manages the event loop itself.\n    # For direct asyncio integration as below, ensure no other event loop is running.\n    result = await QuotesSpider().start_async()\n    print(f\"Scraped {len(result.items)} items with the spider.\")\n    if result.items:\n        print(f\"First item from spider: {result.items[0]}\")\n\nif __name__ == \"__main__\":\n    # The basic HTTP fetch above runs synchronously at import time.\n    # The spider needs an async context when run outside `Spider().start()`,\n    # so we wrap it in asyncio.run.\n    asyncio.run(run_spider())\n","lang":"python","description":"This quickstart demonstrates basic HTTP fetching with `Fetcher`, extracting data via CSS selectors. It also includes a minimal example of Scrapling's `Spider` framework for structured, asynchronous crawling, similar to Scrapy."},"warnings":[{"fix":"Consult the official Scrapling documentation and the v0.4 release notes for migration guidance, especially for the new `Spider` framework and updated fetcher APIs.","message":"Version 0.4 introduced a new asynchronous Spider framework and significant API changes. Scraping logic written for earlier versions, especially code not using the new Spider API, may require substantial refactoring; review the v0.4 release notes for the specific breaking changes.","severity":"breaking","affected_versions":">=0.4.0"},{"fix":"Review the v0.3.13 release notes for instructions on continuing to use `Camoufox` if desired, or adapt your code to Scrapling's updated browser fetching mechanisms (e.g., `StealthyFetcher`, `DynamicFetcher`).","message":"As of version 0.3.13, Scrapling no longer uses `Camoufox`. Scrapers that relied on the `Camoufox` integration will break or behave differently.","severity":"breaking","affected_versions":">=0.3.13"},{"fix":"After `pip install scrapling`, run `scrapling install` in your terminal to set up the browser dependencies.","message":"To use browser-based fetchers (such as `StealthyFetcher` or `DynamicFetcher`), `pip install scrapling` is not sufficient. You must also run `scrapling install` (or `playwright install` directly if Playwright is installed separately) to download the necessary browser binaries.","severity":"gotcha","affected_versions":"All versions supporting browser fetchers"},{"fix":"Set `auto_save=True` when first defining element patterns and `adaptive=True` when fetching pages whose structure may have changed. Example: `page.css('.product', auto_save=True)` initially, then `page.css('.product', adaptive=True)` on later runs.","message":"The adaptive scraping feature, which lets selectors auto-relocate elements after website changes, must be explicitly enabled: use `auto_save=True` during the initial scrape and `adaptive=True` on subsequent runs.","severity":"gotcha","affected_versions":"All versions supporting adaptive scraping"},{"fix":"Always use `getall()` when expecting a list of results from a selector. Version 0.4.3 unified this method name to match the `Selector` class.","message":"For the `TextHandler` and `Selector` classes, the method that retrieves all matched text or elements is `getall()` (e.g., `page.css('selector').getall()`), not `get_all()`.","severity":"gotcha","affected_versions":"<0.4.3"},{"fix":"Set `robots_txt_obey=True` in your spider's configuration if you need to comply with `robots.txt` rules; be aware this may alter your crawl's behavior and speed.","message":"When running spiders, the `robots_txt_obey` option (introduced in v0.4.4) is disabled by default. If enabled, the spider will pre-fetch and respect `robots.txt` rules, including `Disallow`, `Crawl-delay`, and `Request-rate` directives, which can affect crawling speed and scope.","severity":"gotcha","affected_versions":">=0.4.4"},{"fix":"Exercise caution and validate the security of any external services or configurations (proxies, CDP endpoints) provided to Scrapling's fetchers or sessions.","message":"Supplying proxy credentials, CDP URLs, or `user_data_dir` paths can expose sensitive data or connect to untrusted remote browsers. Always ensure these sources are secure and trustworthy.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}