{"id":3943,"library":"crawl4ai","title":"Crawl4AI: LLM Friendly Web Crawler & Scraper","description":"Crawl4AI is an open-source, LLM-friendly web crawler and scraper designed for AI agents, RAG, and data pipelines. It provides fast, controllable, and customizable web content extraction, converting pages into clean Markdown. The library supports dynamic content handling, caching, custom hooks, real-time monitoring via Docker, and flexible deployment options. It is actively maintained with frequent minor releases focused on performance, anti-bot detection, and security; the current version is 0.8.6. [8, 9]","status":"active","version":"0.8.6","language":"en","source_language":"en","source_url":"https://github.com/unclecode/crawl4ai","tags":["web scraping","crawler","llm","ai","data extraction","rag","playwright"],"install":[{"cmd":"pip install -U crawl4ai","lang":"bash","label":"Basic Installation"},{"cmd":"playwright install","lang":"bash","label":"Install Browser Dependencies (Mandatory)"},{"cmd":"pip install crawl4ai[torch]","lang":"bash","label":"With PyTorch for Clustering (Optional)"},{"cmd":"pip install crawl4ai[transformer]","lang":"bash","label":"With Transformers for LLM Integration (Optional)"}],"dependencies":[{"reason":"Required Python version","package":"python","version":">=3.10","optional":false},{"reason":"Core browser automation; requires separate installation step: `playwright install`","package":"playwright","optional":false},{"reason":"Utility for human-readable formats","package":"humanize","optional":false},{"reason":"Asynchronous HTTP client","package":"httpx","optional":false},{"reason":"Data validation and settings management","package":"pydantic","optional":false},{"reason":"LLM integration (replaces 'litellm' due to security fix in v0.8.6)","package":"unclecode-litellm","optional":false},{"reason":"HTML parsing","package":"beautifulsoup4","optional":false},{"reason":"YAML configuration parsing","package":"pyyaml","optional":false}],"imports":[{"note":"The primary crawler class is available directly from the top-level package; v0.6.0 removed the legacy browser modules. [1]","wrong":"from crawl4ai.web_crawler import AsyncWebCrawler","symbol":"AsyncWebCrawler","correct":"from crawl4ai import AsyncWebCrawler"}],"quickstart":{"code":"import asyncio\nfrom crawl4ai import AsyncWebCrawler\n\nasync def main():\n    # Initialize the crawler. Ensure 'playwright install' has been run.\n    async with AsyncWebCrawler() as crawler:\n        # Perform a basic crawl and extract content as Markdown\n        result = await crawler.arun(\n            url=\"https://www.nbcnews.com/business\"\n        )\n        print(\"--- Extracted Markdown ---\")\n        print(result.markdown[:500])  # Print first 500 chars of Markdown\n\n        # The raw HTML is also available on the same result object:\n        # print(\"--- Raw HTML ---\")\n        # print(result.html[:500])\n\nif __name__ == \"__main__\":\n    asyncio.run(main())","lang":"python","description":"This quickstart performs a basic crawl with `AsyncWebCrawler`, fetching a URL and returning its content as Markdown. It uses Python's `asyncio` for non-blocking operation. Before running, execute `playwright install` to set up the required browser binaries. [13]"},"warnings":[{"fix":"Upgrade to crawl4ai==0.8.6 or newer: `pip install -U crawl4ai`","message":"Critical Security Hotfix (v0.8.6): The `litellm` dependency was replaced with `unclecode-litellm` due to a PyPI supply chain compromise. Users on v0.8.5 or earlier are strongly advised to upgrade to v0.8.6 or later immediately to mitigate this risk. [13]","severity":"breaking","affected_versions":"<=0.8.5"},{"fix":"Execute `playwright install` in your environment after `pip install crawl4ai`.","message":"Mandatory Playwright Installation: After installing `crawl4ai` via pip, you must run `playwright install` to download and set up the required browser binaries. Failing to do so will cause runtime errors when attempting to crawl. [6, 10, 13]","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review the v0.8.0 release notes and documentation for how to securely re-enable and configure Docker API hooks if absolutely necessary. Ensure your Docker environment is secure.","message":"Docker API Hooks Disabled by Default (v0.8.0): For security reasons (fix for a Remote Code Execution vulnerability), hooks in the Docker API are now disabled by default. If you rely on hooks with the Docker API, you must re-enable them with caution. [2]","severity":"breaking","affected_versions":">=0.8.0"},{"fix":"Update import statements (e.g., `from crawl4ai import AsyncWebCrawler` instead of deeper paths) and review method signatures if using custom crawler strategies. Consult the changelog for specific API changes in v0.6.0.","message":"Legacy Browser Modules Removed (v0.6.0): Modules under `crawl4ai/browser/*` were removed, and the `AsyncPlaywrightCrawlerStrategy.get_page` function signature changed. Update imports and method calls accordingly. [1]","severity":"breaking","affected_versions":"<0.6.0"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}