Cloudscraper
Cloudscraper is a Python library built on top of the `requests` library and designed to bypass Cloudflare's anti-bot page (also known as "I'm Under Attack Mode" or IUAM). It mimics a real web browser, handles JavaScript challenges, and manages cookies automatically, which lets users scrape websites protected by Cloudflare. The library is actively maintained and is updated frequently to keep up with Cloudflare's evolving security measures. At the time of writing, the latest version on PyPI is 1.2.71, while a major version, 3.0.0, has been released on GitHub with significant changes.
Warnings
- breaking Cloudscraper v3.0.0 (released on GitHub; it may be installable with `pip install "cloudscraper>=3.0.0"` if it is published to PyPI, or directly from GitHub) introduces breaking changes: it requires Python 3.8+ (dropping support for Python 3.6 and 3.7), significantly upgrades its dependencies, and removes the remaining Python 2 compatibility code. If you are upgrading from a 1.x or 2.x release, make sure your environment meets these requirements first.
- gotcha Cloudflare continuously updates its anti-bot techniques, leading to an 'arms race' where `cloudscraper` versions can become outdated and cease to work against the latest Cloudflare protections. This often results in `403 Forbidden` errors or being stuck on a Cloudflare challenge page.
- gotcha While `cloudscraper` can handle JavaScript challenges, it does not solve CAPTCHA challenges (such as reCAPTCHA or hCaptcha) by itself. If Cloudflare presents a CAPTCHA, the request will be blocked unless a third-party CAPTCHA-solving service has been configured through the `captcha` parameter of `create_scraper`.
- gotcha Cloudscraper is not a full-fledged browser and may not suffice for complex web scraping scenarios involving heavy client-side JavaScript rendering, dynamic content loaded via XHR, or highly sophisticated Cloudflare protection layers. It primarily tackles JavaScript challenges and cookie management.
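Since an outdated release often surfaces as a 403 or as a response that is really a challenge page, it can help to inspect responses explicitly rather than rely on the status code alone. The sketch below is a hypothetical heuristic, not part of cloudscraper: the function name `looks_like_cloudflare_challenge`, the status codes it treats as suspicious, and the markers it looks for (a `cf-mitigated: challenge` header, a `Server: cloudflare` header, and body strings such as the "Just a moment..." interstitial title) are assumptions that Cloudflare can change at any time.

```python
from typing import Mapping

# Hypothetical markers (assumptions); Cloudflare may change its pages at any time.
CHALLENGE_STATUS_CODES = {403, 429, 503}
CHALLENGE_BODY_MARKERS = ("Just a moment...", "cf-chl", "challenge-platform")


def looks_like_cloudflare_challenge(status_code: int, headers: Mapping[str, str], body: str) -> bool:
    """Heuristically decide whether a response is a Cloudflare challenge or block page."""
    # HTTP header names are case-insensitive, so normalise names and values first
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    # An explicit challenge marker header is treated as the strongest signal
    if lowered.get("cf-mitigated") == "challenge":
        return True
    # Otherwise require an error status served by Cloudflare plus a known body marker
    served_by_cloudflare = lowered.get("server", "") == "cloudflare"
    if status_code in CHALLENGE_STATUS_CODES and served_by_cloudflare:
        return any(marker in body for marker in CHALLENGE_BODY_MARKERS)
    return False
```

The helper takes plain values, so with a cloudscraper response it could be called as `looks_like_cloudflare_challenge(response.status_code, response.headers, response.text)`.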
Install
pip install cloudscraper
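To confirm which release is installed and whether the interpreter satisfies the Python 3.8+ minimum stated for v3.0.0 in the warnings above, a small standard-library check can be run before upgrading. This is a minimal sketch; the helper names `installed_cloudscraper_version` and `python_meets_v3_minimum` are hypothetical, and the (3, 8) threshold simply mirrors that warning.

```python
import sys
from typing import Optional, Tuple

# Hypothetical threshold mirroring the Python 3.8+ minimum stated for v3.0.0
V3_MIN_PYTHON: Tuple[int, int] = (3, 8)


def installed_cloudscraper_version() -> Optional[str]:
    """Return the installed cloudscraper version, or None if it cannot be determined."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        return None  # older interpreters: cannot query package metadata this way
    try:
        return metadata.version("cloudscraper")
    except metadata.PackageNotFoundError:
        return None  # cloudscraper is not installed


def python_meets_v3_minimum(version_info: Tuple[int, ...] = sys.version_info[:2]) -> bool:
    """Check a (major, minor, ...) version against the v3.0.0 minimum (default: this interpreter)."""
    return tuple(version_info[:2]) >= V3_MIN_PYTHON


if __name__ == "__main__":
    print("cloudscraper:", installed_cloudscraper_version() or "not installed")
    print("Python meets v3.0.0 minimum:", python_meets_v3_minimum())
```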
Imports
- cloudscraper
import cloudscraper
Quickstart
import os
import cloudscraper
import requests
# Instantiate a CloudScraper session. This object works like a requests.Session
scraper = cloudscraper.create_scraper(
    # Optionally, provide an existing `requests` Session object to base it on
    # sess=requests.Session(),
    # Or choose the JavaScript interpreter, e.g. 'nodejs' if Node.js is installed
    # interpreter='nodejs'
)
# Make a GET request to a Cloudflare-protected site.
# Set the TARGET_URL environment variable to your target URL; otherwise a
# public bot-detection test site is used as a placeholder.
# cloudscraper already sends browser-like headers; override the User-Agent only if needed.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36'}
url = os.environ.get('TARGET_URL', 'https://nowsecure.nl')  # A common test site for bot detection
try:
    response = scraper.get(url, headers=headers)
    response.raise_for_status()  # Raise an HTTPError for 4xx/5xx status codes
    print(f"Successfully accessed {url} (Status: {response.status_code})")
    # print(response.text[:500])  # Print the first 500 characters of content
except requests.exceptions.HTTPError as e:
    # The request completed but returned an error status; e.response holds the response
    print(f"Failed to access {url}: {e}")
    if e.response is not None and e.response.status_code == 403:
        print("Access denied (403 Forbidden). Cloudflare protection might be too strong or settings need adjustment.")
except Exception as e:
    # Connection errors, or cloudscraper failing to handle the challenge (no response object)
    print(f"Failed to access {url}: {e}")
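When a request is rejected, retrying a bounded number of times with a growing delay is one option before concluding that the installed version no longer works. The sketch below is a generic exponential-backoff pattern, not a cloudscraper feature; `fetch_with_retries`, its parameters, and the default set of retryable status codes are hypothetical choices. It accepts any zero-argument callable that returns an object with a `status_code` attribute, so it can wrap the quickstart's `scraper.get` call.

```python
import time
from typing import Any, Callable, Iterable


def fetch_with_retries(
    fetch: Callable[[], Any],
    retry_statuses: Iterable[int] = (403, 429, 503),  # hypothetical default
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `fetch` until it returns a non-retryable status or attempts run out.

    `fetch` must return an object with a `status_code` attribute (e.g. a requests.Response).
    The last response is returned even if its status is still retryable.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    retryable = set(retry_statuses)
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        if response.status_code not in retryable or attempt == max_attempts:
            return response
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

With the quickstart's variables, a call might look like `response = fetch_with_retries(lambda: scraper.get(url, headers=headers))`. Injecting `sleep` keeps delays out of tests. Persistent 403s across every attempt more likely mean the challenge itself is not being solved, in which case updating cloudscraper is the more relevant fix.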