Scrapy

2.14.1 · active · verified Fri Mar 27

High-level web crawling and scraping framework. Current version is 2.14.1 (Jan 2026). Requires Python >=3.10. Two major breaking changes in 2.13: start_requests() (sync) replaced by start() (async), and TWISTED_REACTOR now defaults to asyncio — both can silently break existing spiders.

Warnings

Install

Imports

Quickstart

Basic spider. Run with: scrapy crawl quotes -o output.json

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

        # Follow pagination
        next_page = response.css('li.next a::attr(href)').get()
        if next_page:
            yield response.follow(next_page, self.parse)

view raw JSON →