{"id":7935,"library":"apify","title":"Apify SDK for Python","description":"The Apify SDK for Python is the official library for creating Apify Actors. Actors are serverless cloud programs that can perform various web scraping and automation tasks. This SDK provides tools for Actor lifecycle management, local storage emulation, and event handling, allowing developers to build scalable data extraction solutions. It is actively maintained, with the current stable version being 3.3.2.","status":"active","version":"3.3.2","language":"en","source_language":"en","source_url":"https://github.com/apify/apify-sdk-python","tags":["web scraping","automation","actors","apify platform","serverless","data extraction"],"install":[{"cmd":"pip install apify","lang":"bash","label":"Install Apify SDK"},{"cmd":"pip install apify[scrapy]","lang":"bash","label":"Install with Scrapy integration"}],"dependencies":[{"reason":"Used internally by Apify SDK v3.0+ for local storage emulation, providing updated storage APIs for Dataset, KeyValueStore, and RequestQueue.","package":"crawlee","optional":false}],"imports":[{"symbol":"Actor","correct":"from apify import Actor"},{"note":"Request is now directly under the 'apify' namespace for consistency across SDK versions.","wrong":"from apify.storages import Request","symbol":"Request","correct":"from apify import Request"}],"quickstart":{"code":"import asyncio\nimport httpx\nfrom bs4 import BeautifulSoup\nfrom apify import Actor\n\nasync def main() -> None:\n    async with Actor:\n        # Retrieve the Actor input, or use a default if not provided\n        actor_input = await Actor.get_input() or {}\n        start_urls = actor_input.get('start_urls', [{'url': 'https://apify.com'}])\n\n        # Open the default request queue\n        request_queue = await Actor.open_request_queue()\n\n        # Enqueue the start URLs\n        for start_url_obj in start_urls:\n            url = start_url_obj.get('url')\n            if url:\n                await 
request_queue.add_request(url)\n\n        # Process the URLs from the request queue\n        while True:\n            request = await request_queue.fetch_next_request()\n\n            if not request:\n                break\n\n            Actor.log.info(f'Processing {request.url}')\n            try:\n                async with httpx.AsyncClient() as client:\n                    response = await client.get(request.url)\n                    response.raise_for_status()\n\n                soup = BeautifulSoup(response.content, 'html.parser')\n                data = {\n                    'url': request.url,\n                    'title': soup.title.string if soup.title else None,\n                    'status_code': response.status_code\n                }\n                await Actor.push_data(data)\n            except httpx.HTTPError as e:\n                # HTTPError covers bad status codes as well as network errors\n                Actor.log.error(f'Failed to fetch {request.url}: {e}')\n            finally:\n                await request_queue.mark_request_handled(request)\n\nif __name__ == '__main__':\n    asyncio.run(main())","lang":"python","description":"This quickstart creates a simple Apify Actor that reads start URLs from the Actor input, fetches each page with HTTPX, extracts the page title with BeautifulSoup, and pushes the result to the default dataset. It uses the `async with Actor:` context manager for lifecycle management and a `RequestQueue` to track URLs. The example also imports `httpx` and `bs4`, so both must be available in the environment (`pip install httpx beautifulsoup4`)."},"warnings":[{"fix":"Consult the official 'Upgrading to v3' documentation. Replace removed methods with their v3 counterparts (e.g., `open` for storages, `get_metadata` for info). Adapt to `crawlee` v1.0 storage API changes.","message":"The Apify SDK v3.0 introduced significant breaking changes from v2.x, including a complete overhaul of storage APIs (Dataset, KeyValueStore, RequestQueue). Older methods like `from_storage_object`, `get_info`, and `storage_object` have been removed or replaced. 
Default storage IDs in configuration changed from 'default' to `None`.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Refactor your Actor's main logic to use `async with Actor:`: `async def main(): async with Actor: # Your Actor logic here`. This ensures proper lifecycle management and resource handling.","message":"The `Actor.main()` method was removed in SDK v2.0 and is no longer supported in v3.x. Its functionality is replaced by the `async with Actor:` context manager, which handles initialization and graceful shutdown automatically.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Ensure your Python environment is running version 3.10 or newer. Upgrade Python if necessary.","message":"Apify SDK v3.x requires Python 3.10 or higher. The previous major version (v2.x) required Python 3.9 or higher, having dropped support for Python 3.8.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"If you need to preserve local storage between runs for testing or specific workflows, disable automatic purging through the SDK configuration rather than an `Actor` keyword argument, e.g., `from apify import Actor, Configuration` followed by `async with Actor(configuration=Configuration(purge_on_start=False)):`. Treat the `purge_on_start` option name as an assumption and verify it against your installed SDK version.","message":"In Apify SDK v3.0+, local storage is automatically purged (cleared) at the start of an Actor run (during `Actor.init()` or `async with Actor:`). This differs from v2.x, where the `--purge` CLI argument was required.","severity":"gotcha","affected_versions":">=3.0.0"},{"fix":"Avoid using mutable objects as default arguments. Instead, use `None` as the default and initialize the mutable object inside the function if `None` is detected: `def func(arg=None): arg = arg if arg is not None else []`.","message":"Python's mutable default arguments can lead to unexpected behavior if not handled correctly. 
If a function's default argument is a mutable object (like a list or dictionary) and it's modified within the function, the change persists across subsequent calls, leading to state leakage.","severity":"gotcha","affected_versions":"All Python versions (general Python gotcha)"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Refactor your Actor's entry point to use the recommended asynchronous context manager: `async def main(): async with Actor: # your logic` and run with `asyncio.run(main())`.","cause":"Attempting to call the `main()` method on the `Actor` class, which was removed in Apify SDK v2.0 and replaced by the `async with Actor:` context manager pattern.","error":"AttributeError: 'Actor' object has no attribute 'main'"},{"fix":"Pass the request as the first positional argument instead of a `url=` keyword. A plain URL string works for simple cases: `await request_queue.add_request('http://example.com')`. For requests that need extra options, build a `Request` with `Request.from_url()`: `from apify import Request; await request_queue.add_request(Request.from_url('http://example.com'))`.","cause":"In Apify SDK v2.0+, `RequestQueue.add_request()` takes the request as a single argument, either a URL string or an `apify.Request` object, and has no `url` keyword parameter. Calling it as `add_request(url=...)`, or unpacking a dictionary with a `url` key into it, raises this error.","error":"TypeError: RequestQueue.add_request() got an unexpected keyword argument 'url'"},{"fix":"Review your Actor's `INPUT_SCHEMA.json` and ensure that the input data strictly matches the defined schema, including data types, required fields, and patterns. Use the Apify Console's visual input schema editor or the `apify validate-schema` CLI command to check that the schema itself is valid.","cause":"The input provided to the Actor (either via Apify Console, API, or local `INPUT.json`) does not conform to the `INPUT_SCHEMA.json` defined for the Actor. This validation happens before the Actor's code even starts.","error":"ApifyApiError: Actor input schema validation failed"}]}