Python Wikipedia API
The `wikipedia` library is a Python wrapper that makes it easy to access and parse data from Wikipedia. It lets users search Wikipedia, retrieve article summaries, and extract page data such as links and images. The current stable version is 1.4.0, released in 2014; there has been no release since. The library is designed for ease of use rather than advanced, high-volume scraping.
Warnings
- gotcha Calling `wikipedia.summary()` or `wikipedia.page()` with an ambiguous query (e.g., 'Mercury') will raise a `wikipedia.exceptions.DisambiguationError`.
- gotcha If a query does not match any Wikipedia page, `wikipedia.summary()` or `wikipedia.page()` will raise a `wikipedia.exceptions.PageError`.
- gotcha With the default `auto_suggest=True`, the library may silently replace a query with a search suggestion, which can return an unintended page or raise `PageError` for what seems like a valid query. For example, 'Commander Worf' might become 'commander wharf'. Passing `auto_suggest=False` to `wikipedia.summary()` or `wikipedia.page()` looks up the title exactly as given.
- deprecated This `wikipedia` library (goldsmith/Wikipedia) has not been updated since November 2014 (version 1.4.0). While still functional, it may not support the latest Wikipedia API features, might have unaddressed bugs, or could eventually break due to API changes.
- gotcha This library is designed for simple, casual use. Rate limiting is off by default (it can be turned on with `wikipedia.set_rate_limiting(True)`), and the library has no retry logic, robust handling of network errors, or extensive scraping capabilities. Using it for high-volume or aggressive scraping can lead to IP blocking or violations of Wikimedia's terms of service.
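The first three warnings can be handled in one place. The following is a minimal sketch, not part of the library: `safe_summary` is a hypothetical helper name, and it takes the imported `wikipedia` module as its `wiki` argument (an assumption made here so the helper can be exercised offline with a stand-in object). It relies only on `summary()`, its `sentences` and `auto_suggest` parameters, and the two exception classes named above.

```python
def safe_summary(wiki, title, sentences=2):
    """Look up a summary without auto-suggest and report what happened.

    `wiki` is expected to be the imported `wikipedia` module (hypothetical
    design choice: passing it in keeps this helper testable offline).
    Returns a (status, payload) tuple.
    """
    try:
        # auto_suggest=False looks up the title exactly as given,
        # avoiding silent corrections to an unintended page.
        text = wiki.summary(title, sentences=sentences, auto_suggest=False)
        return ("ok", text)
    except wiki.exceptions.DisambiguationError as err:
        # err.options holds the candidate page titles.
        return ("ambiguous", list(err.options))
    except wiki.exceptions.PageError:
        # No page matched the title.
        return ("missing", None)
```

With the real library this would be called as `safe_summary(wikipedia, "Mercury")` after `import wikipedia`. Network failures are deliberately not caught here and would still propagate to the caller.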
Install
pip install wikipedia
Imports
- wikipedia
import wikipedia
- DisambiguationError
from wikipedia.exceptions import DisambiguationError
- PageError
from wikipedia.exceptions import PageError
Quickstart
import wikipedia
# Set language (optional, default is 'en')
wikipedia.set_lang("en")
# Search for pages
search_results = wikipedia.search("Artificial Intelligence")
print(f"Search results: {search_results[:3]}...")
# Get a summary of a page
try:
    summary_text = wikipedia.summary("Artificial intelligence", sentences=2)
    print(f"Summary: {summary_text}")
except wikipedia.exceptions.DisambiguationError as e:
    print(f"Disambiguation options: {e.options}")
    # Example of handling by picking the first option
    # print(f"Picking first option: {wikipedia.summary(e.options[0], sentences=2)}")
except wikipedia.exceptions.PageError:
    print("Page not found.")
# Get a full page object
try:
    page = wikipedia.page("Artificial intelligence")
    print(f"Page title: {page.title}")
    print(f"Page URL: {page.url}")
    # Access content, links, etc.
    # print(f"Page content (first 200 chars): {page.content[:200]}...")
    # print(f"Page links (first 5): {page.links[:5]}")
except wikipedia.exceptions.PageError:
    print("Page not found for full object.")
except wikipedia.exceptions.DisambiguationError as e:
    print(f"Disambiguation options for page: {e.options}")
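The commented-out "picking the first option" idea in the Quickstart can be sketched as a small helper. This is an illustrative sketch only: `page_or_first_option` is a hypothetical name, and the `wiki` parameter is assumed to be the imported `wikipedia` module, passed in so the logic can be tried without network access. It uses `page()` with `auto_suggest=False` and the `options` attribute of `DisambiguationError`.

```python
def page_or_first_option(wiki, title):
    """Return a page object, falling back to the first disambiguation option.

    `wiki` is expected to be the imported `wikipedia` module (hypothetical
    design choice so the helper can be tested with a stand-in).
    """
    try:
        return wiki.page(title, auto_suggest=False)
    except wiki.exceptions.DisambiguationError as err:
        if not err.options:
            # Nothing to fall back to; let the caller see the original error.
            raise
        # The first option is only a guess at what the caller meant.
        return wiki.page(err.options[0], auto_suggest=False)
```

With the real library this would be called as `page_or_first_option(wikipedia, "Mercury")`. The first option is not guaranteed to be the intended page, and the fallback lookup can itself raise `PageError` or another `DisambiguationError`, so an interactive application would probably present `err.options` to the user instead.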