Crawl4AI: LLM Friendly Web Crawler & Scraper

0.8.6 · active · verified Sat Apr 11

Crawl4AI is an open-source, LLM-friendly web crawler and scraper designed for AI agents, RAG, and data pipelines. It extracts web content quickly and with fine-grained control, typically converting pages into clean Markdown. The library supports dynamic content handling, caching, custom hooks, and real-time monitoring via Docker, and it offers flexible deployment options. It is actively maintained, with frequent minor releases focused on performance, anti-bot detection, and security; the current version is 0.8.6. [8, 9]

Warnings

Install

Imports

Quickstart

This quickstart performs a basic crawl with `AsyncWebCrawler`: it fetches a single URL and returns the page as Markdown. The crawler is asynchronous, so the example runs under Python's `asyncio`. Before running it, execute `playwright install` once to download the required browser binaries. [13]

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Initialize the crawler. Ensure 'playwright install' has been run.
    async with AsyncWebCrawler() as crawler:
        # Perform a basic crawl and extract content as Markdown
        result = await crawler.arun(
            url="https://www.nbcnews.com/business"
        )
        # Stop early if the crawl failed; error_message explains why
        if not result.success:
            print(f"Crawl failed: {result.error_message}")
            return

        print("--- Extracted Markdown ---")
        print(result.markdown[:500])  # Print first 500 chars of Markdown

        # The fetched HTML is available on the same result object,
        # so a second crawl is not needed to read it.
        # print("--- Raw HTML ---")
        # print(result.html[:500])

if __name__ == "__main__":
    asyncio.run(main())
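
Because the crawl is an ordinary coroutine, several pages can be fetched concurrently with plain `asyncio`. The following is a minimal sketch of that idea, not a description of how Crawl4AI batches requests internally. The names `crawl_many`, `fetch`, `max_concurrency`, and `fake_fetch` are hypothetical and do not come from the library; `fake_fetch` stands in for a real crawl so the sketch runs without a browser or network access.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def crawl_many(
    fetch: Callable[[str], Awaitable[Any]],
    urls: list[str],
    max_concurrency: int = 3,
) -> dict[str, Any]:
    """Run `fetch` over `urls` with at most `max_concurrency` in flight.

    Hypothetical helper, not part of Crawl4AI. A URL whose fetch raises
    is mapped to the exception instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> Any:
        async with semaphore:
            try:
                return await fetch(url)
            except Exception as exc:  # keep going if one page fails
                return exc

    # gather preserves input order, so results line up with urls
    results = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(zip(urls, results))


async def fake_fetch(url: str) -> str:
    """Stand-in for a real crawl (assumption: lets the sketch run offline)."""
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"# Markdown for {url}"


if __name__ == "__main__":
    pages = asyncio.run(
        crawl_many(fake_fetch, ["https://example.com/a", "https://example.com/bad"])
    )
    for url, page in pages.items():
        print(url, "->", page)
```

To try this with the real crawler, one option is to pass `lambda url: crawler.arun(url=url)` as `fetch` inside the `async with AsyncWebCrawler() as crawler:` block from the quickstart. Whether a single crawler instance handles concurrent calls well is an assumption to verify against the library's documentation, which may also offer its own batch interface.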
