{"id":227,"library":"Scrapy","title":"Scrapy","description":"High-level web crawling and scraping framework. Current version is 2.14.1 (Jan 2026). Requires Python >=3.10. Two major breaking changes in 2.13: start_requests() (sync) replaced by start() (async), and TWISTED_REACTOR now defaults to asyncio — both can silently break existing spiders.","status":"active","version":"2.14.1","language":"python","source_language":"en","source_url":"https://docs.scrapy.org/en/latest/news.html","tags":["web-scraping","crawling","spider","html-parsing","css-selectors","xpath","twisted"],"install":[{"cmd":"pip install Scrapy","lang":"bash","label":"Standard (note capital S in package name)"},{"cmd":"scrapy startproject myproject","lang":"bash","label":"Create new project"}],"dependencies":[{"reason":"Required. Core async engine. Installed automatically.","package":"Twisted>=18.7.0","optional":false},{"reason":"Required for CSS/XPath selectors. Installed automatically.","package":"parsel>=1.5.0","optional":false},{"reason":"Required for HTTPS. Installed automatically.","package":"cryptography","optional":false},{"reason":"Required. Installed automatically.","package":"itemadapter","optional":false}],"imports":[{"note":"start_requests() (sync) was replaced by start() (async) in Scrapy 2.13. start_requests() still works but start() is the new preferred interface. 
Custom start_requests() overrides still work, but unlike start() they cannot yield items directly.","wrong":"# Old sync start_requests() — still works but deprecated pattern\ndef start_requests(self):\n    for url in self.start_urls:\n        yield scrapy.Request(url, callback=self.parse)","symbol":"Spider.start","correct":"import scrapy\n\nclass MySpider(scrapy.Spider):\n    name = 'myspider'\n    start_urls = ['https://example.com']\n\n    # New async start() method (2.13+) — preferred over start_requests()\n    async def start(self):\n        for url in self.start_urls:\n            yield scrapy.Request(url, callback=self.parse)\n\n    def parse(self, response):\n        yield {'title': response.css('title::text').get()}"}],"quickstart":{"code":"import scrapy\n\nclass QuotesSpider(scrapy.Spider):\n    name = 'quotes'\n    start_urls = ['https://quotes.toscrape.com']\n\n    def parse(self, response):\n        for quote in response.css('div.quote'):\n            yield {\n                'text': quote.css('span.text::text').get(),\n                'author': quote.css('small.author::text').get(),\n                'tags': quote.css('div.tags a.tag::text').getall(),\n            }\n\n        # Follow pagination\n        next_page = response.css('li.next a::attr(href)').get()\n        if next_page:\n            yield response.follow(next_page, self.parse)","lang":"python","description":"Basic spider. Run with: scrapy crawl quotes -o output.json"},"warnings":[{"fix":"Explicitly set TWISTED_REACTOR in settings.py if you need a specific reactor. To restore old behavior: TWISTED_REACTOR = None. New projects use the asyncio reactor by default, which is the recommended configuration.","message":"TWISTED_REACTOR default changed to asyncio (AsyncioSelectorReactor) in 2.13. Existing projects that relied on the default reactor being None may behave differently. 
Projects whose Twisted code assumed the previous default reactor may break silently.","severity":"breaking","affected_versions":">= 2.13"},{"fix":"Override start() instead of start_requests() in new spiders. For existing spiders: start_requests() still works but its iteration behavior changed. See 'Delaying start request iteration' in the docs to restore the previous behavior.","message":"start_requests() (sync) replaced by start() (async) in 2.13. The iteration behavior changed: start requests now run continuously rather than stopping when the scheduler has pending requests. This can cause different crawl ordering and memory behavior on large crawls.","severity":"breaking","affected_versions":">= 2.13"},{"fix":"Pin Scrapy<2.13 for Python 3.9 environments.","message":"Python 3.9 dropped in Scrapy 2.13. Minimum is now Python 3.10.","severity":"breaking","affected_versions":">= 2.13"},{"fix":"Use .get() for the first match (returns str or None), .getall() for all matches (returns list of str). Example: response.css('h1::text').get() not response.css('h1::text').","message":"response.css() and response.xpath() return SelectorList, not strings. Forgetting .get() or .getall() returns a SelectorList object, not the text. A common source of silent data bugs.","severity":"gotcha","affected_versions":"all"},{"fix":"Replace return [item1, item2] with yield item1; yield item2. Or return a generator expression. Scrapy 2.13 added a warning for this (WARN_ON_GENERATOR_RETURN_VALUE setting).","message":"return in a parse callback instead of yield causes items/requests to be silently dropped. parse() must be a generator (use yield), not return a list.","severity":"gotcha","affected_versions":"all"},{"fix":"Always run scrapy commands from inside a project directory (where scrapy.cfg is). Create a project first: scrapy startproject myproject.","message":"Running scrapy crawl outside a Scrapy project directory raises ConfigError. 
The scrapy CLI locates the project by searching for a scrapy.cfg file in the current directory and its ancestors.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-05-12T12:03:24.637Z","next_check":"2026-06-27T00:00:00.000Z","problems":[{"fix":"Install Scrapy using `pip install scrapy` (or `pip3 install scrapy`). Ensure the directory where Scrapy's executable is installed (e.g., Python's `Scripts` directory on Windows or `bin` in a virtual environment) is included in your system's PATH. Alternatively, run Scrapy commands using `python -m scrapy`.","cause":"Scrapy is not installed, or its executable script is not in the system's PATH environment variable. This is common if pip installs packages to a user-specific directory not automatically added to PATH, or if a virtual environment is not activated.","error":"'scrapy' is not recognized as an internal or external command, operable program or batch file."},{"fix":"Verify Scrapy is installed for your active Python environment using `pip show scrapy`. If not, install it with `pip install scrapy` (or `python3.x -m pip install scrapy` for a specific Python version). Check your project directory and Python path for any conflicting files or folders named `scrapy`.","cause":"Scrapy is not installed for the Python interpreter currently being used, or there is a local file or directory named `scrapy.py` or `scrapy` that is shadowing the installed library.","error":"ModuleNotFoundError: No module named 'scrapy'"},{"fix":"If your spider uses `async` operations for initial requests, rename `async def start_requests(self)` to `async def start(self)`. The `start()` method should be an `async` generator yielding `Request` objects. If you intend to use synchronous `start_requests()`, ensure it's not defined as `async`.","cause":"In Scrapy versions 2.13 and newer, for asynchronous spiders, the `start_requests()` method has been replaced by `async def start()`. 
If you define `async def start_requests()`, Scrapy may ignore it or raise this error, because the engine expects either a synchronous `start_requests()` or the new `async def start()`.","error":"AttributeError: 'Spider' object has no attribute 'start_requests'"},{"fix":"Explicitly set `TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'` in your `settings.py`. Review your project for any early imports of `twisted.internet.reactor` or other Twisted components and move them to local scopes or after Scrapy's reactor initialization if possible. If running Scrapy from a script, consider using `scrapy.utils.reactor.install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')` at the very beginning.","cause":"Scrapy 2.13+ defaults the `TWISTED_REACTOR` to `asyncio` (`twisted.internet.asyncioreactor.AsyncioSelectorReactor`), but another part of your code or a third-party library might be implicitly or explicitly installing a different Twisted reactor before Scrapy can configure its own. Importing `twisted.internet.reactor` too early is a common cause.","error":"twisted.internet.error.ReactorAlreadyRunning"},{"fix":"Ensure that all URLs passed to `scrapy.Request` include a valid scheme, such as `http://` or `https://`. For example, instead of `yield scrapy.Request('example.com')`, use `yield scrapy.Request('https://example.com/')`.","cause":"This error occurs when a `scrapy.Request` object is created with a URL that lacks a proper scheme (e.g., `http://` or `https://`). 
The URL provided is incomplete or malformed.","error":"ValueError: Missing scheme in request url: ..."}],"ecosystem":"pypi","meta_description":null,"install_score":100,"install_tag":"verified","quickstart_score":80,"quickstart_tag":"verified","pypi_latest":null,"install_checks":{"last_tested":"2026-05-12","tag":"verified","tag_description":"installs cleanly on critical runtimes, fast import, recently tested","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.2,"mem_mb":21.8,"disk_size":"90.2M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.94,"mem_mb":21.8,"disk_size":"91M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.63,"mem_mb":24.2,"disk_size":"102.0M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim 
(glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.41,"mem_mb":24.2,"disk_size":"103M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.74,"mem_mb":23.8,"disk_size":"91.7M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.75,"mem_mb":23.8,"disk_size":"92M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.71,"mem_mb":24.4,"disk_size":"91.0M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim 
(glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.7,"mem_mb":24.4,"disk_size":"92M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.08,"mem_mb":21.9,"disk_size":"90.2M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.03,"mem_mb":21.9,"disk_size":"91M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]},"quickstart_checks":{"last_tested":"2026-04-23","tag":"verified","tag_description":"quickstart runs on critical runtimes, recently tested","results":[{"runtime":"python:3.10-alpine","exit_code":0},{"runtime":"python:3.10-slim","exit_code":0},{"runtime":"python:3.11-alpine","exit_code":0},{"runtime":"python:3.11-slim","exit_code":0},{"runtime":"python:3.12-alpine","exit_code":0},{"runtime":"python:3.12-slim","exit_code":0},{"runtime":"python:3.13-alpine","exit_code":0},{"runtime":"python:3.13-slim","exit_code":0},{"runtime":"python:3.9-alpine","exit_code":0},{"runtime":"python:3.9-slim","exit_code":0}]}}