{"id":7703,"library":"scholarly","title":"scholarly","description":"scholarly is a Python module designed to programmatically retrieve author and publication information from Google Scholar without requiring you to solve CAPTCHAs, although Google Scholar's anti-bot measures can still block requests that are not routed through a proxy. The current version is 1.7.11; the library is actively maintained, with frequent updates to keep up with changes in Google Scholar's page structure and anti-bot measures.","status":"active","version":"1.7.11","language":"en","source_language":"en","source_url":"https://github.com/scholarly-python-package/scholarly","tags":["web scraping","google scholar","academic research","citations","authors","publications"],"install":[{"cmd":"pip install scholarly","lang":"bash","label":"Install latest stable version"},{"cmd":"pip install -U git+https://github.com/scholarly-python-package/scholarly.git","lang":"bash","label":"Install from GitHub (latest development)"}],"dependencies":[{"reason":"Used for HTTP requests; replaced `requests` in v1.7.7.","package":"httpx","optional":false},{"reason":"Used for parsing HTML responses from Google Scholar.","package":"beautifulsoup4","optional":false},{"reason":"Used to rotate user agents to mimic human browsing and reduce detection. If it cannot be imported, the library falls back to a default user agent.","package":"fake-useragent","optional":true},{"reason":"Used by `ProxyGenerator` for setting up free proxy rotation.","package":"free-proxy","optional":true},{"reason":"Required for Tor integration; installed with the `scholarly[tor]` extra.","package":"stem","optional":true}],"imports":[{"symbol":"scholarly","correct":"from scholarly import scholarly"},{"symbol":"ProxyGenerator","correct":"from scholarly import ProxyGenerator"}],"quickstart":{"code":"from itertools import islice\nimport os\n\nfrom scholarly import scholarly, ProxyGenerator\n\n# It is recommended to set up a proxy from the start of your application.\n# scholarly is designed to intelligently use proxies only when necessary.\npg = ProxyGenerator()\n# For using free proxies (often less reliable for continuous scraping)\n# success = pg.FreeProxies()\n# if not success: print(\"Could not set up free proxies. Continuing without.\")\n\n# Example for ScraperAPI (recommended for reliability, requires an API key).\n# Set the SCRAPERAPI_API_KEY environment variable; pg.ScraperAPI() returns\n# False if the proxy could not be set up.\nscraperapi_key = os.environ.get('SCRAPERAPI_API_KEY', '')\nif scraperapi_key and pg.ScraperAPI(scraperapi_key):\n    print(\"Using ScraperAPI for proxies.\")\n    scholarly.use_proxy(pg)\nelse:\n    print(\"ScraperAPI not set up (SCRAPERAPI_API_KEY missing or invalid). \"\n          \"Using the default connection, which may hit limits. \"\n          \"Consider setting up a proxy for robust scraping.\")\n\n# Search for an author\nsearch_query = scholarly.search_author('Steven A Cholewiak')\nauthor = scholarly.fill(next(search_query))\nprint(f\"Author Name: {author['name']}\")\nprint(f\"Author Affiliation: {author['affiliation']}\")\nprint(f\"Author Interests: {author['interests']}\")\n\n# Print the titles of the author's publications\npublication_titles = [pub['bib']['title'] for pub in author['publications']]\nprint(f\"First 3 publication titles: {publication_titles[:3]}\")\n\n# Take a closer look at the first publication\nif author['publications']:\n    first_publication = scholarly.fill(author['publications'][0])\n    print(f\"\\nFirst Publication Title: {first_publication['bib']['title']}\")\n    # Not every publication has an abstract, so fall back to an empty string\n    abstract = first_publication['bib'].get('abstract', '')\n    print(f\"First Publication Abstract: {abstract[:100]}...\")\n\n    # Which papers cited that publication? citedby() yields results lazily,\n    # so islice() stops after 3 instead of fetching every citing paper.\n    citations = [c['bib']['title'] for c in islice(scholarly.citedby(first_publication), 3)]\n    print(f\"First 3 papers citing this publication: {citations}\")","lang":"python","description":"This quickstart demonstrates how to search for an author, retrieve their full profile and publications, and find papers that cite a specific publication. It also shows how to set up a `ProxyGenerator` (here with ScraperAPI), which is crucial for reliable scraping given Google Scholar's anti-bot mechanisms."},"warnings":[{"fix":"Update code that relies on `requests`-specific behavior, such as `requests.Session` objects, to use the `httpx` equivalents, or keep that code isolated from `scholarly`.","message":"Version 1.7.7 introduced a breaking change by switching the underlying HTTP client from `requests` to `httpx`. Code relying on `requests`-specific functionality, such as its session objects, will break.","severity":"breaking","affected_versions":">=1.7.7"},{"fix":"Always configure a `ProxyGenerator` instance and pass it to `scholarly.use_proxy()`. For higher reliability, consider a premium proxy service (such as ScraperAPI or Bright Data); `pg.FreeProxies()` is a free but less robust alternative.","message":"Google Scholar employs aggressive anti-bot measures, including CAPTCHAs and rate limiting. Without proper proxy configuration, your IP address may be temporarily or permanently blocked, leading to `MaxTriesExceededException` errors.","severity":"gotcha","affected_versions":"All"},{"fix":"Migrate to other proxy methods, such as `FreeProxies()` or premium services via `ScraperAPI()` or `Luminati()` (Bright Data). If you still wish to use Tor, install `scholarly` with the `[tor]` extra (e.g., `pip install scholarly[tor]`).","message":"The Tor-based proxy methods (`Tor_External`, `Tor_Internal`) have been deprecated since v1.5 and are no longer actively tested or supported.","severity":"deprecated","affected_versions":">=1.5"},{"fix":"Upgrade to `scholarly` version 1.7.8 or newer: `pip install --upgrade scholarly`.","message":"Version 1.7.7 introduced an incompatibility with ScraperAPI that was fixed in v1.7.8. ScraperAPI proxies do not work correctly on v1.7.7.","severity":"breaking","affected_versions":"1.7.7"},{"fix":"Update to version 1.7.11 or newer to benefit from improved handling of `scholar_id` redirects.","message":"Since v1.7.11, `search_author_id` follows the redirects that Google Scholar issues for approximate or outdated `scholar_id` values. Earlier versions could return incorrect results or fail for such IDs.","severity":"gotcha","affected_versions":"<1.7.11"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Configure a `ProxyGenerator` (e.g., with `pg.ScraperAPI(api_key)` or `pg.FreeProxies()`) and pass it to `scholarly.use_proxy(pg)`. For sustained scraping, consider a paid proxy service (e.g., ScraperAPI), or check that `pg.FreeProxies()` returns `True`. Add delays between requests if scraping in a loop.","cause":"Google Scholar has detected automated access and is blocking requests, often because of rate limiting or CAPTCHA challenges.","error":"scholarly.exceptions.MaxTriesExceededException: Exceeded maximum number of tries to fetch url. Check if your connection is good."},{"fix":"After getting an initial author object (e.g., `next(search_query)`), call `scholarly.fill(author_object)` to retrieve comprehensive details such as publications, co-authors, and citation counts. To fetch only specific sections, use `scholarly.fill(author_object, sections=['publications'])`.","cause":"The `author` object was not 'filled' with detailed information, including publications. Search results provide only summary data by default, which avoids sending unnecessary requests to Google Scholar.","error":"AttributeError: 'Author' object has no attribute 'publications'"},{"fix":"Rename your local `scholarly.py` file to something else (e.g., `my_script.py`) or run your script from a directory where no such file exists.","cause":"This error can occur if a local file named `scholarly.py` in your working directory shadows the installed library.","error":"ImportError: cannot import name 'scholarly' from 'scholarly'"},{"fix":"Use the access patterns of the current API: `pub['bib']['title']` (dictionary access) for bibliographic data and `scholarly.citedby(pub)` (a generator) for citing papers. Upgrade to the latest `scholarly` version and update older code accordingly.","cause":"Older versions of `scholarly` exposed publication data through attributes, for example `pub.bib['title']`, and `pub.citedby` was a method in some versions and an attribute in others. Code written for those versions does not match the current dictionary-based API.","error":"TypeError: 'builtin_function_or_method' object is not subscriptable (when accessing `pub.bib['title']`)"}]}