arXiv Python API Wrapper
The `arxiv` library is a Python wrapper for the arXiv API, providing programmatic access to over a million scholarly articles in physics, mathematics, computer science, and other fields. It allows users to search, retrieve metadata, and download papers from the arXiv open-access repository. The library is actively maintained with frequent minor and patch releases.
Warnings
- breaking Python 3.7 and 3.8 are no longer officially supported as of version 2.2.0. While existing code might still run, CI validation has ceased for these versions, and future compatibility is not guaranteed.
- deprecated Submodule-style imports like `import arxiv.arxiv` or `from arxiv import arxiv` were deprecated in version 2.1.0.
- deprecated Direct download helper methods (e.g., `result.download_pdf()`, `result.download_source()`) were deprecated in version 2.3.0. While they might still function, their use is discouraged.
- gotcha Version 2.3.1 introduced a fallback for missing titles by string matching `/pdf/`, which was reverted in 2.3.2 due to potential issues. If you encountered unexpected title parsing behavior around this version, it might have been related to this change.
- gotcha The arXiv API requests that users 'make no more than one request every three seconds'. The library's `Client` can be configured with `delay_seconds` to respect this, but excessive unthrottled requests can lead to IP blocking by arXiv.
Install
-
pip install arxiv
Imports
- Client
from arxiv import Client
- Search
from arxiv import Search
- Result
from arxiv import Result
- SortCriterion
from arxiv import SortCriterion
- SortOrder
from arxiv import SortOrder
Quickstart
import arxiv
# Construct the default API client
client = arxiv.Client()
# Search for the 10 most recent articles matching 'quantum'
search = arxiv.Search(
query = "quantum",
max_results = 10,
sort_by = arxiv.SortCriterion.SubmittedDate,
sort_order = arxiv.SortOrder.Descending
)
for result in client.results(search):
print(f"Title: {result.title}")
print(f"Authors: {', '.join(author.name for author in result.authors)}")
print(f"Published: {result.published}")
# Example: download PDF to current directory (note deprecation warning in v2.3.0)
# result.download_pdf(dirpath='./downloads')