JobSpy
JobSpy is a Python library for scraping job postings from major job boards, including LinkedIn, Indeed, Glassdoor, ZipRecruiter, Google Jobs, Bayt, and Naukri. It aggregates job data into a pandas DataFrame, supports concurrent scraping, and includes features such as proxy support to help manage rate limiting. The library is actively maintained, with frequent updates adding new features and improving scraper reliability.
Common errors
- Received a response code 429
  cause The job board has temporarily blocked your IP address after too many requests.
  fix Wait between scrape requests, pass a proxy list through the `proxies` parameter, or consider `use_creds=True` for authenticated access where applicable.
- Segmentation fault: 11 on macOS Catalina
  cause This issue is typically related to the `tls_client` dependency not fully supporting the specific macOS architecture or version.
  fix Upgrade to a newer version of macOS if possible, or refer to the `tls_client` repository for known workarounds or fixes.
- No results when using 'google'
  cause The `google_search_term` parameter requires a very specific syntax, usually the exact search string that appears in the Google Jobs search box after applying filters in a browser.
  fix Perform the desired search on Google Jobs in your browser, then copy and paste the exact query from the Google Jobs search bar into the `google_search_term` parameter.
- Indeed giving unrelated roles
  cause Indeed's search engine matches keywords in job descriptions, not just titles, which can lead to broader, less relevant results.
  fix Refine your `search_term` using exact phrase matching (e.g., `"engineering intern"`) and negation (e.g., `-marketing`) to narrow down results.
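The Indeed fix above can be sketched as a small helper that assembles a `search_term` from exact phrases and excluded words. `build_search_term` is a hypothetical name used for illustration, not part of JobSpy; it only formats a string using the quoting and `-` negation syntax described above.

```python
def build_search_term(phrases, keywords=(), exclude=()):
    """Assemble a search string: quoted exact phrases, plain keywords, then -negations.

    Hypothetical helper for illustration; it is not part of the JobSpy API.
    """
    parts = [f'"{p.strip()}"' for p in phrases if p.strip()]
    parts += [k.strip() for k in keywords if k.strip()]
    parts += [f"-{w.strip()}" for w in exclude if w.strip()]
    return " ".join(parts)


term = build_search_term(
    ["engineering intern"],
    keywords=["software"],
    exclude=["marketing", "tax"],
)
print(term)  # "engineering intern" software -marketing -tax
```

The resulting string would then be passed as `search_term=term` in a `scrape_jobs` call that targets Indeed.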
Warnings
- gotcha Default logging verbosity changed. Logs are now suppressed by default, showing only errors.
- gotcha Indeed job sorting changed from date to relevance by default, which may affect expected results.
- gotcha Job boards aggressively block IP addresses for too many requests, leading to `response code 429` (rate limiting).
- gotcha When searching Indeed or Glassdoor, the `country_indeed` parameter is often required for specific countries to yield correct results.
- gotcha Some filters are mutually exclusive on specific sites. On Indeed, a search can use only one of these groups: `hours_old`, `job_type` together with `is_remote`, or `easy_apply`. On LinkedIn, a search can use only one of `hours_old` or `easy_apply`.
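One way to cope with the rate limiting described above is to retry with exponential backoff and route requests through proxies. The following is a minimal sketch, not JobSpy's own mechanism: `retry_with_backoff` and `fetch_jobs` are hypothetical names, the proxy addresses are placeholders, and the sketch assumes a blocked request surfaces as a raised exception. If the library only logs the 429 and returns fewer rows, the retry condition would need to check the result instead.

```python
import random
import time


def retry_with_backoff(fetch, attempts=4, base_delay=5.0, sleep=time.sleep):
    """Call fetch(); on an exception, wait base_delay * 2**n plus jitter, then retry.

    Hypothetical helper for illustration; it is not part of the JobSpy API.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Exponential backoff with up to one second of random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)


def fetch_jobs():
    # Imported lazily so the helper above stays usable without JobSpy installed.
    from jobspy import scrape_jobs

    return scrape_jobs(
        site_name=["indeed"],
        search_term="software engineer",
        location="San Francisco, CA",
        country_indeed="USA",
        results_wanted=10,
        # Placeholder proxy addresses (assumption); replace with real ones.
        proxies=["user:pass@host1:port", "user:pass@host2:port"],
    )


# jobs = retry_with_backoff(fetch_jobs)
```

Keeping `results_wanted` small and spacing out calls reduces the chance of hitting the limit in the first place; the backoff only helps once a block has already occurred.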
Install
pip install -U python-jobspy
Imports
- scrape_jobs
from jobspy import scrape_jobs
Quickstart
import pandas as pd
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google", "glassdoor"],
    search_term="software engineer",
    location="San Francisco, CA",
    results_wanted=10,
    country_indeed="USA",  # Required for Indeed/Glassdoor in many cases
    hours_old=72,  # Jobs posted within the last 72 hours
    description_format="markdown",
    verbose=1,  # Show warnings and errors
)

# An empty DataFrame means no postings were returned, so check for rows too.
if isinstance(jobs, pd.DataFrame) and not jobs.empty:
    print(f"Found {len(jobs)} jobs")
    print(jobs.head())
    # To save to CSV:
    # import csv
    # jobs.to_csv(
    #     "jobs.csv",
    #     quoting=csv.QUOTE_NONNUMERIC,
    #     escapechar="\\",
    #     index=False,
    # )
else:
    print("No jobs found or an error occurred.")