scholarly

1.7.11 · active · verified Thu Apr 16

scholarly is a Python module for programmatically retrieving author and publication information from Google Scholar without having to solve CAPTCHAs manually. At version 1.7.11, the library is under active development and receives frequent updates to keep up with changes to Google Scholar's page structure and anti-bot measures.

Install

pip install scholarly

Imports

from scholarly import scholarly, ProxyGenerator

Quickstart

This quickstart shows how to search for an author, retrieve their full profile and publication list, and find papers that cite a specific publication. It also shows how to configure a `ProxyGenerator`, which helps keep requests from being blocked by Google Scholar's anti-bot mechanisms during sustained scraping.

from itertools import islice
import os

from scholarly import scholarly, ProxyGenerator

# It is recommended to set up a proxy at the start of your application.
# scholarly is designed to use proxies only when necessary.
pg = ProxyGenerator()
# Free proxies are an alternative, but often less reliable for continuous scraping:
# if pg.FreeProxies():
#     scholarly.use_proxy(pg)
# else:
#     print("Could not set up free proxies. Continuing without.")

# ScraperAPI (recommended for reliability; requires an API key).
# Set the SCRAPERAPI_API_KEY environment variable before running.
scraperapi_key = os.environ.get('SCRAPERAPI_API_KEY', '')
# pg.ScraperAPI() returns True only if the proxy was set up successfully
if scraperapi_key and pg.ScraperAPI(scraperapi_key):
    print("Using ScraperAPI for proxies.")
    scholarly.use_proxy(pg)
else:
    print("SCRAPERAPI_API_KEY not set or ScraperAPI setup failed. Using the default "
          "connection (may hit rate limits). Consider setting up a proxy for robust scraping.")

# Search for an author and fill in the full profile
search_query = scholarly.search_author('Steven A Cholewiak')
author = scholarly.fill(next(search_query))
print(f"Author Name: {author['name']}")
print(f"Author Affiliation: {author['affiliation']}")
print(f"Author Interests: {author['interests']}")

# Print the titles of the author's publications
publication_titles = [pub['bib']['title'] for pub in author['publications']]
print(f"First 3 publication titles: {publication_titles[:3]}")

# Take a closer look at the first publication
if author['publications']:
    first_publication = scholarly.fill(author['publications'][0])
    print(f"\nFirst Publication Title: {first_publication['bib']['title']}")
    # Not every publication has an abstract, so fall back to an empty string
    abstract = first_publication['bib'].get('abstract', '')
    print(f"First Publication Abstract: {abstract[:100]}...")

    # Which papers cited that publication? citedby() returns an iterator, so
    # islice() stops after three results instead of walking every results page.
    citations = [citation['bib']['title']
                 for citation in islice(scholarly.citedby(first_publication), 3)]
    print(f"First 3 papers citing this publication: {citations}")
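Taking only the first few citing papers matters because, assuming scholarly's search iterators behave as they appear to, `scholarly.citedby()` requests further Google Scholar result pages only as iteration reaches them. Building a complete list and then slicing it can therefore trigger many more requests than needed. The sketch below illustrates that cost offline: `fake_citedby` and `first_titles` are hypothetical stand-ins (no scholarly import, no network access), and the page size of 10 is an assumption.

```python
from itertools import islice


def fake_citedby(total_results, page_size, fetch_log):
    """Hypothetical stand-in for scholarly.citedby(): yields citation records
    lazily and 'fetches' one simulated results page at a time."""
    for start in range(0, total_results, page_size):
        fetch_log.append(start)  # record one simulated page request
        for i in range(start, min(start + page_size, total_results)):
            yield {'bib': {'title': f'Citing paper {i}'}}


def first_titles(results, n):
    """Hypothetical helper: titles of the first n results, consuming only what is needed."""
    return [r['bib']['title'] for r in islice(results, n)]


# Eager: the list comprehension drains every page before slicing.
eager_log = []
eager = [r['bib']['title'] for r in fake_citedby(95, 10, eager_log)][:3]

# Lazy: islice stops after 3 items, so only the first page is requested.
lazy_log = []
lazy = first_titles(fake_citedby(95, 10, lazy_log), 3)

print(eager == lazy)   # True
print(len(eager_log))  # 10 simulated page requests
print(len(lazy_log))   # 1 simulated page request
```

With the real library the same pattern is `islice(scholarly.citedby(pub), 3)`, since `islice` works on any iterator.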
