CDP Socket
cdp-socket is a Python library for programmatically interacting with the Chrome DevTools Protocol (CDP). It provides an asynchronous API for controlling Chromium-based browsers, supporting tasks such as browser automation, scraping, and debugging. The latest release at the time of writing is version 1.2.8 (April 2024).
Common errors
- RuntimeWarning: Enable tracemalloc to get the object allocation traceback
  cause: The Chrome process started by `launch_chrome`, or resources opened by later operations, are not cleaned up correctly, leaving orphaned processes and leaked resources.
  fix: Call `os.kill(process.pid, 15)` in a `finally` block so that Chrome is terminated even if an error occurs. Once the process has exited, remove the temporary data directory with `shutil.rmtree()`.
- cdp_socket.utils.conn.WebSocketConnectionError: Failed to connect to websocket url ... after ... seconds
  cause: Chrome did not start correctly, the remote debugging port was not exposed, or the connection attempt timed out.
  fix: Verify that Chrome is installed and runnable, and check the output of `launch_chrome` for errors. On slower systems, increase the `timeout` parameter of `get_websock_url` and `SingleCDPSocket`.
- `await sock.exec("Page.navigate", {"url": "https://example.com"})` fails with an error such as 'Page.navigate' wasn't found
  cause: A page-scoped command (such as `Page.navigate`) is being sent to a browser-level target, or a browser-scoped command to a page target. Some commands apply to the browser as a whole; others apply only to a specific page or execution context.
  fix: Call `Target.getTargets` to list the available targets and note the `targetId` of the entry whose `type` is "page" (Chrome's `http://127.0.0.1:<port>/json` endpoint lists the matching page WebSocket URL). Then open a separate `SingleCDPSocket` connected to that page URL, or make your initial connection to a page target if that is your intent. Alternatively, attach to the target and address commands with the returned `sessionId`, if your version of `sock.exec` supports passing one.
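To pick a page target out of the `Target.getTargets` output, a small filter is enough. The sketch below assumes `sock.exec` returns the command's CDP `result` object (a dict with a `targetInfos` list) and that page WebSocket URLs follow Chrome's `ws://<host>:<port>/devtools/page/<targetId>` layout; `page_targets` and `page_websocket_url` are hypothetical helpers, not part of cdp-socket, and the sample data is made up.

```python
from typing import Any, Dict, List


def page_targets(get_targets_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the entries of type 'page' from a Target.getTargets result."""
    infos = get_targets_result.get("targetInfos", [])
    return [info for info in infos if info.get("type") == "page"]


def page_websocket_url(port: int, target_id: str, host: str = "127.0.0.1") -> str:
    """Build a page-level DevTools WebSocket URL (assumes Chrome's URL layout)."""
    return f"ws://{host}:{port}/devtools/page/{target_id}"


# Sample data shaped like a Target.getTargets result (values are invented).
sample = {
    "targetInfos": [
        {"targetId": "ABC123", "type": "page", "url": "about:blank"},
        {"targetId": "DEF456", "type": "service_worker", "url": "https://example.com/sw.js"},
    ]
}

pages = page_targets(sample)
print([p["targetId"] for p in pages])  # ['ABC123']
print(page_websocket_url(9222, pages[0]["targetId"]))  # ws://127.0.0.1:9222/devtools/page/ABC123
```

With a real browser, the resulting URL would be passed to a second `SingleCDPSocket`, and page-scoped commands such as `Page.navigate` would be sent through that socket rather than the browser-level one.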
Warnings
- gotcha Chrome/Chromium browser must be installed and accessible in the system's PATH, or its path explicitly provided to `launch_chrome`. If the browser is not found, `launch_chrome` will fail.
- breaking Connecting to the wrong CDP endpoint (e.g., browser-level vs. page-level) will cause commands to fail with 'command not found' or 'invalid session ID' errors. Browser-level commands require a browser target, while DOM/Page/Network commands require a page target.
- gotcha Because `cdp-socket` is an asynchronous library, its operations must be awaited inside an `async` function. A missing `await` triggers `RuntimeWarning: coroutine '...' was never awaited`, and blocking calls made inside the event loop stall execution.
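The async gotcha above can be illustrated without a browser. In this sketch, `fake_exec` is a hypothetical stand-in for `sock.exec` and `slow_blocking_work` is a hypothetical synchronous helper; the patterns themselves (awaiting coroutines, offloading blocking work with `asyncio.to_thread`, available in Python 3.9+) are standard asyncio.

```python
import asyncio
import time
from typing import Optional


async def fake_exec(method: str, params: Optional[dict] = None) -> dict:
    """Hypothetical stand-in for sock.exec: echoes the call after a short delay."""
    await asyncio.sleep(0.01)
    return {"method": method, "params": params or {}}


def slow_blocking_work() -> str:
    """Hypothetical blocking helper (e.g. a synchronous file or HTTP call)."""
    time.sleep(0.01)
    return "done"


async def main() -> list:
    # Wrong: without await, this only creates a coroutine object; nothing runs.
    pending = fake_exec("Target.getTargets")
    kind = type(pending).__name__  # "coroutine"
    pending.close()  # close it so Python does not warn that it was never awaited

    # Right: await the call inside the async function.
    result = await fake_exec("Target.getTargets")

    # Blocking code would stall the event loop; run it in a worker thread instead.
    blocking = await asyncio.to_thread(slow_blocking_work)
    return [kind, result["method"], blocking]


print(asyncio.run(main()))  # ['coroutine', 'Target.getTargets', 'done']
```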
Install
pip install cdp-socket
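`pip` installs only the Python package; as noted under Warnings, Chrome or Chromium must be available separately. A quick way to check whether a browser is reachable on PATH is `shutil.which`. The executable names below are common conventions on Linux and are an assumption, not a list taken from cdp-socket; on macOS and Windows, Chrome is usually not on PATH, so its full path would have to be supplied instead. `find_chrome` is a hypothetical helper.

```python
import shutil
from typing import Iterable, Optional

# Common executable names (assumed, not defined by cdp-socket).
DEFAULT_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_chrome(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


chrome = find_chrome()
if chrome is None:
    print("No Chrome/Chromium executable found on PATH")
else:
    print(f"Found browser at {chrome}")
```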
Imports
- SingleCDPSocket
from cdp_socket.socket import SingleCDPSocket
- launch_chrome
from cdp_socket.utils.utils import launch_chrome
- random_port
from cdp_socket.utils.utils import random_port
- get_websock_url
from cdp_socket.utils.conn import get_websock_url
Quickstart
import asyncio
import os
import shutil

from cdp_socket.socket import SingleCDPSocket
from cdp_socket.utils.conn import get_websock_url
from cdp_socket.utils.utils import launch_chrome, random_port


async def main():
    data_dir = os.path.join(os.getcwd(), "cdp_data_dir")
    port = random_port()
    process = None
    try:
        process = launch_chrome(data_dir, port)
        websock_url = await get_websock_url(port, timeout=5)
        async with SingleCDPSocket(websock_url, timeout=5) as sock:
            targets = await sock.exec("Target.getTargets")
            print(f"Active targets: {targets}")
            # This socket talks to the browser-level endpoint, so browser-scoped
            # commands such as Target.getTargets work directly on `sock`.
            # Page-scoped commands (e.g. Page.navigate) need a connection to, or a
            # session with, a page target; the Target.getTargets output identifies one.
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if process and process.poll() is None:  # Chrome is still running
            os.kill(process.pid, 15)  # send SIGTERM so Chrome can shut down cleanly
            process.wait(timeout=10)  # wait for exit so Chrome releases files in data_dir
        if os.path.exists(data_dir):
            shutil.rmtree(data_dir)


if __name__ == "__main__":
    asyncio.run(main())
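When `get_websock_url` times out (the connection error listed under Common errors), it can help to check by hand whether Chrome's remote debugging port is answering at all. The sketch below polls Chrome's `/json/version` HTTP endpoint, which reports the browser-level `webSocketDebuggerUrl`, using only the standard library. `wait_for_devtools` is a hypothetical diagnostic helper, independent of cdp-socket, and the default host and timeouts are assumptions.

```python
import json
import time
import urllib.request


def wait_for_devtools(port: int, host: str = "127.0.0.1", timeout: float = 10.0) -> str:
    """Poll /json/version until it answers and return its webSocketDebuggerUrl.

    Raises TimeoutError if nothing answers within `timeout` seconds.
    """
    url = f"http://{host}:{port}/json/version"
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                info = json.load(resp)
                return info["webSocketDebuggerUrl"]
        except OSError as exc:  # covers URLError, refused connections, socket timeouts
            last_error = exc
            time.sleep(0.2)  # browser not ready yet; retry shortly
    raise TimeoutError(f"No DevTools endpoint at {url} after {timeout}s: {last_error}")
```

The helper is synchronous; from inside async code it could be called as `await asyncio.to_thread(wait_for_devtools, port)` so it does not block the event loop. If it raises `TimeoutError`, the problem lies with the browser launch or the port rather than with the WebSocket connection.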