MechanicalSoup
MechanicalSoup is a Python library for automating interaction with websites. It builds on top of `requests` and `BeautifulSoup4` to provide a stateful browser experience, making it easy to navigate, fill forms, and submit data without a full-fledged browser. The current version is 1.4.0, and it maintains a moderate release cadence, typically releasing minor versions every 6-12 months with occasional patch releases.
Common errors
-
FileNotFoundError: [Errno 2] No such file or directory: '/path/to/file'
cause Attempting to upload a file in MechanicalSoup 1.3.0+ by passing a string path directly to a form field instead of an opened file object.
fix Pass an explicitly opened file object: `browser["upload_field"] = open("/path/to/file.txt", "rb")`. Close the file handle after submission, or open it in a `with` block that encloses the submission.
-
AttributeError: 'StatefulBrowser' object has no attribute 'get_current_page'
cause Calling one of the deprecated methods (`get_current_page`, `get_current_form`, or `get_url`) after upgrading to MechanicalSoup 1.0.0+, in an environment where the method is missing or not mapped as expected. These methods are generally still present as of 1.4.0, but they are a common point of confusion after an upgrade.
fix Use the properties introduced in 1.0.0: `browser.page`, `browser.form`, `browser.url`.
-
mechanicalsoup.LinkNotFoundError: No link found with selector 'a.broken-link'
cause The specified link selector did not match any link on the current page, or a 404 Not Found response was received while `raise_on_404` is enabled for the browser.
fix Verify that the selector is correct and matches an existing link. If the error comes from a 404, check that the URL is valid, or catch `LinkNotFoundError` when the browser was created with `raise_on_404=True`.
-
ValueError: No form selected
cause Attempting to interact with form fields (e.g., `browser['field_name'] = 'value'`) or submit a form without first successfully selecting one using `browser.select_form()`.
fix Call `browser.select_form()` with a valid selector (e.g., `browser.select_form('form[action="/login"]')`) before manipulating form fields or submitting.
Warnings
- breaking As of v1.3.0, uploading files in forms requires explicitly opening the file object (e.g., `open('/path/to/file', 'rb')`) instead of just passing the file path as a string. This change was implemented to prevent malicious web servers from reading arbitrary local files.
- breaking MechanicalSoup v1.4.0 dropped support for Python 3.6, 3.7, and 3.8. An earlier release (v1.1.0) dropped Python 2.7 and 3.5. Ensure your environment uses Python 3.9 or higher.
- gotcha Since v1.0.0, `StatefulBrowser` introduced properties (`.page`, `.form`, `.url`) to access the current page, form, and URL. The older method calls (`.get_current_page()`, `.get_current_form()`, `.get_url()`) are still present but are considered deprecated and may be removed in future versions.
- gotcha The `StatefulBrowser` and `Browser` constructors accept a `raise_on_404=True` argument, which is highly recommended. By default, it's `False` for backward compatibility, meaning HTTP 404 errors might not immediately raise an exception, potentially leading to silent failures.
Install
-
pip install mechanicalsoup
-
pip install mechanicalsoup[full]
Imports
- StatefulBrowser
from mechanicalsoup import StatefulBrowser
- Browser
from mechanicalsoup import Browser
- Form
from mechanicalsoup.form import Form
from mechanicalsoup import Form
Quickstart
import mechanicalsoup
import os
# Create a headless browser instance
browser = mechanicalsoup.StatefulBrowser()
# In a real script you would open the target URL:
# browser.open("http://example.com/login")
# For this demonstration, load a small HTML page with a login form from a string,
# so the example runs without network access.
html_content = '''
<html><body>
<form action="/login" method="post">
<input type="text" name="username" value="">
<input type="password" name="password" value="">
<input type="submit" value="Login">
</form>
</body></html>
'''
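# StatefulBrowser has no set_content(); open_fake_page() loads HTML from a string
browser.open_fake_page(html_content, url="http://example.com/login")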
# Select the form (by index or CSS selector)
browser.select_form('form[action="/login"]')
# Fill in the form fields
browser["username"] = os.environ.get('TEST_USERNAME', 'testuser')
browser["password"] = os.environ.get('TEST_PASSWORD', 'testpass')
# Submit the form
# In a real scenario, this would send the request to the action URL
# response = browser.submit_selected()
print(f"Form selected: {browser.form}")
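# StatefulBrowser supports item assignment only, so read the values back
# from the underlying BeautifulSoup form tag
username_input = browser.form.form.find("input", {"name": "username"})
password_input = browser.form.form.find("input", {"name": "password"})
print(f"Username field value: {username_input['value']}")
print(f"Password field value: {password_input['value']}")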
# print(f"Response URL after submission: {browser.url}")
# print(f"Response content: {browser.page.text}")