Trying to learn how to scrape Glassdoor data? You're at the right place. In this guide, I’ll show you exactly how to extract job titles, descriptions, salaries, and company information using ScrapingBee’s API.
You may already know this – Glassdoor is a goldmine of information, but scraping it is a challenging task. The site uses dynamic content loading and sophisticated bot protection, which puts it out of reach of the average web scraper. I’ve spent countless hours battling these defenses with custom solutions, with no luck.
Now I use ScrapingBee as my main web scraper, simply because it handles all the complex stuff for you – proxies, browser headers, and JavaScript rendering. That lets you focus on the Glassdoor data rather than sourcing and configuring proxies.
I'll show exactly how straightforward this service is. By the end of this guide, you'll know how to build a reliable Glassdoor scraper to access data, such as company details, job descriptions, and employee reviews – all with basic knowledge of Python coding. Let's dive in.
Quick Answer (TL;DR)
ScrapingBee provides a scraping API with JavaScript rendering and custom headers that handles Glassdoor well. Using it is simple: define your target Glassdoor URL and extract job titles, companies, ratings, and salaries with this code:
from scrapingbee import ScrapingBeeClient

# Step 1: Initialize the ScrapingBee client
api_key = "YOUR_API_KEY"
client = ScrapingBeeClient(api_key=api_key)

# Step 2: Define the target URL (job listings for a search term)
url = "https://www.glassdoor.com/Job/software-engineer-jobs-SRCH_KO0,17.htm"

# Step 3: Set up extract rules for job data
extract_rules = {
    "jobs": {
        "selector": ".react-job-listing",
        "type": "list",
        "output": {
            "title": {"selector": ".jobLink", "output": "text"},
            "company": {"selector": ".jobEmployerName", "output": "text"},
            "location": {"selector": ".jobLocation", "output": "text"},
            "salary": {"selector": ".salarySnippet", "output": "text"},
            "rating": {"selector": ".jobRating span", "output": "text"}
        }
    }
}

# Step 4: Call ScrapingBee with the extract rules
response = client.get(url, params={
    'extract_rules': extract_rules,
    'render_js': True  # ensure JS content is rendered
})
data = response.json()

# Step 5: Iterate over the results and display them
for job in data.get("jobs", []):
    print("Title   :", job.get("title"))
    print("Company :", job.get("company"))
    print("Location:", job.get("location"))
    print("Salary  :", job.get("salary"))
    print("Rating  :", job.get("rating"))
    print("-" * 40)
This code snippet demonstrates the core functionality, but there’s much more to learn about effectively scraping Glassdoor data. Let’s break down this process step by step so you understand exactly how it works.
How to Scrape Glassdoor with ScrapingBee
If you tried web scraping before, you've certainly faced JavaScript rendering issues and anti-bot measures. Glassdoor loads content dynamically, which means a simple HTTP request won’t work. You need to arm yourself with a browser environment to execute JavaScript and render the page properly.
That's what I like about our platform. It solves this problem by providing a headless browser infrastructure that renders pages just like a real browser would.
When you make a request through ScrapingBee, the scraping process looks like this:
It sends the request through clean proxies to avoid IP blocks
Then, renders all JavaScript on the page
Waits for the content to fully load
Extracts the data according to your specifications
I’ve found this approach much more reliable than trying to maintain my own proxy infrastructure or browser automation setup. If you want to learn more about our platform's capabilities, check out the documentation.
Understanding Glassdoor’s Structure to Extract Company Details
Before diving into the code, it’s important to understand how Glassdoor organizes its data. The website follows specific URL patterns that make systematic scraping possible:
Company details: https://www.glassdoor.com/Overview/Working-at-[Company]-EI_IE[CompanyID].htm
Company reviews: https://www.glassdoor.com/Reviews/[Company]-Reviews-E[CompanyID].htm
Job listings: https://www.glassdoor.com/Job/[search-term]-jobs-SRCH_KO0,[length].htm
Salary information: https://www.glassdoor.com/Salaries/[Company]-Salaries-E[CompanyID].htm
Each of these sections contains valuable data fields that can be extracted using the right selectors. The company ID is a unique identifier assigned by the Glassdoor website to each company. Keep this in mind when web scraping – this ID is essential for targeting specific company data.
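As a quick sketch, these URL patterns can be wrapped in small helper functions so you build them consistently. The company name and ID in the usage example below are hypothetical placeholders:

```python
def job_search_url(term: str) -> str:
    """Build a job-search URL; the number after KO0 is the search term's length."""
    term = term.strip()
    slug = term.lower().replace(" ", "-")
    return f"https://www.glassdoor.com/Job/{slug}-jobs-SRCH_KO0,{len(term)}.htm"

def company_reviews_url(company: str, company_id: int, page: int = 1) -> str:
    """Build a reviews URL; pages after the first append a _P<n> suffix."""
    base = f"https://www.glassdoor.com/Reviews/{company}-Reviews-E{company_id}"
    return f"{base}_P{page}.htm" if page > 1 else f"{base}.htm"

# Example with a placeholder company ID:
print(job_search_url("software engineer"))
print(company_reviews_url("ExampleCorp", 12345, page=2))
```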
Set Up Python and ScrapingBee
Now it's time to kick-start the web scraping process. First, you’ll need to set up your environment. Here’s how you should do it:
We'll be scraping with Python, so go to the official website and download it.
Run the installer and make sure to check the box that says “Add Python to PATH” before clicking Install.
Verify the installation by opening a terminal or command prompt and typing:
python --version
You should see the Python version you installed.
If you haven't already, sign up at ScrapingBee and grab your API key from the Dashboard.
Install the ScrapingBee Python SDK (it wraps the requests logic and simplifies extract rules):
pip install scrapingbee
Now, initialize the client in your Python script:
from scrapingbee import ScrapingBeeClient
api_key = "YOUR_API_KEY" # Replace with your actual key
client = ScrapingBeeClient(api_key=api_key)
The ScrapingBee SDK makes the whole process much simpler than writing raw requests code. It handles authentication, request formatting, and response parsing for you.
One feature I particularly like is the JavaScript scenario feature. This allows you to execute custom JavaScript on the page before extraction. For example, you could use it to:
Click on elements to load more content
Scroll down to trigger lazy loading
Dismiss popup dialogs that might block content
Fill in forms or perform searches
Here’s an example of using a JavaScript scenario to scroll down a Glassdoor page to load more reviews:
scroll_script = """
// Scroll to bottom of page
window.scrollTo(0, document.body.scrollHeight);
// Wait for content to load
await new Promise(r => setTimeout(r, 2000));
// Scroll again to trigger more loading
window.scrollTo(0, document.body.scrollHeight);
// Final wait for content
await new Promise(r => setTimeout(r, 2000));
"""
response = client.get(
url,
params={
'render_js': True,
'js_snippet': scroll_script
}
)
This capability is particularly useful when scraping interview reviews that load incrementally as you scroll down the page.
Data extraction with CSS Selectors or Extraction Rules
Now, let's use your Glassdoor scraper to extract data.
When you scrape Glassdoor reviews, you gain valuable insights into company culture and employee satisfaction. The key to successful extraction is identifying the right CSS selectors.
Glassdoor uses React and loads data dynamically, but many fields are embedded in the initial HTML or GraphQL payload. For demonstration, we’ll target a job listing page for a particular search term:
https://www.glassdoor.com/Job/software-engineer-jobs-SRCH_KO0,17.htm
Using extract rules is the most efficient way to get structured data from Glassdoor.
Here’s how to set them up:
url = "https://www.glassdoor.com/Job/software-engineer-jobs-SRCH_KO0,17.htm"
extract_rules = {
"jobs": {
"selector": ".react-job-listing",
"type": "list",
"output": {
"title": {"selector": ".jobLink", "output": "text"},
"company": {"selector": ".jobEmpolyerName", "output": "text"},
"location": {"selector": ".jobLocation", "output": "text"},
"salary": {"selector": ".salarySnippet", "output": "text"},
"rating": {"selector": ".jobRating span", "output": "text"}
}
}
}
This configuration tells ScrapingBee to:
Find all elements matching .react-job-listing (each job data card)
For each job card, extract the text from the specified selectors
Return the data as a structured JSON object
I always recommend previewing selectors in browser developer tools using document.querySelector to confirm they’re valid before running your scraper. This saves time and prevents errors in your extraction process.
For example, to test if a job title selector works, open the Console tab in your browser’s developer tools while viewing a Glassdoor job listings page and type:
document.querySelectorAll('[data-test="job-title"]').forEach(el =>
console.log(el.textContent.trim())
);
If this returns the job title correctly, you know your selector is working.
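Another option is to sanity-check selectors offline: fetch the page once without extract_rules, save response.text, and count matching elements locally. Here’s a minimal stdlib-only sketch that counts elements by class name (a full CSS engine like BeautifulSoup would be more flexible; the sample HTML is a simplified stand-in for Glassdoor’s markup):

```python
from html.parser import HTMLParser

class ClassCounter(HTMLParser):
    """Count elements carrying a given CSS class (stdlib-only check)."""
    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1

def count_class(html: str, class_name: str) -> int:
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count

# Simplified stand-in for a saved Glassdoor page:
SAMPLE = '<li class="react-job-listing"><a class="jobLink">Engineer</a></li>'
print(count_class(SAMPLE, "react-job-listing"))  # how many job cards matched
```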
Once you’ve set up your extraction rules, making the request is straightforward:
response = client.get(url, params={
'extract_rules': extract_rules,
'render_js': True # ensure JS content is rendered
})
data = response.json()
The response will contain your structured data, ready for analysis or storage.
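It’s also worth checking the response before trusting it: ScrapingBee returns HTTP 200 on success, and anything else means the scrape failed. Here’s a small defensive-parsing sketch; the helper and its error shape are my own convention, not part of the SDK:

```python
import json

def parse_scrape_response(status_code: int, body: str) -> dict:
    """Return parsed job data, or an empty result describing the failure.
    Assumes `body` is the JSON string produced by extract_rules."""
    if status_code != 200:
        return {"jobs": [], "error": f"HTTP {status_code}"}
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return {"jobs": [], "error": "invalid JSON"}

# Usage with a ScrapingBee response object:
# data = parse_scrape_response(response.status_code, response.text)
```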
Scraping Different Types of Glassdoor Data
Glassdoor contains a wide variety of valuable information beyond just job titles. For anyone interested in career research, salary benchmarking, creating a job board, or market insights, it’s useful to understand the different categories to ensure data accuracy.
With a web scraping tool like ScrapingBee, you can simulate a browser request, bypass many anti-bot measures, and reliably fetch every data set. Let's take a look at how it works in action.
Scraping Glassdoor Reviews
To scrape reviews, you’ll need to target the reviews page and extract each review element.
Use this Python script to fetch Glassdoor reviews:
def scrape_company_reviews(company_name, company_id, page=1):
    url = f"https://www.glassdoor.com/Reviews/{company_name}-Reviews-E{company_id}_P{page}.htm"
    extract_rules = {
        "reviews": {
            "selector": ".empReview",
            "type": "list",
            "output": {
                "title": {"selector": ".reviewLink", "output": "text"},
                "rating": {"selector": ".ratingNumber", "output": "text"},
                "position": {"selector": ".authorJobTitle", "output": "text"},
                "date": {"selector": ".authorInfo .date", "output": "text"},
                "pros": {"selector": ".pros", "output": "text"},
                "cons": {"selector": ".cons", "output": "text"},
                "advice": {"selector": ".adviceMgmt", "output": "text"}
            }
        }
    }
    response = client.get(
        url,
        params={
            'extract_rules': extract_rules,
            'render_js': True
        }
    )
    return response.json()
This function allows you to scrape employee reviews for a specific company, with pagination support. The extracted data includes the review title, rating, employee position, date, pros, cons, and advice to management.
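With pagination support in place, you can aggregate reviews across several pages. A sketch of that loop, written to accept any page-scraping callable (pass scrape_company_reviews from above) so the aggregation logic stays easy to test, and stopping at the first empty page:

```python
import time

def collect_reviews(scrape_page, company, company_id, max_pages=3, delay=1.5):
    """Aggregate reviews across pages.
    `scrape_page(company, company_id, page)` must return a dict with a
    "reviews" list, like scrape_company_reviews does."""
    all_reviews = []
    for page in range(1, max_pages + 1):
        data = scrape_page(company, company_id, page)
        reviews = data.get("reviews", [])
        if not reviews:          # stop when a page comes back empty
            break
        all_reviews.extend(reviews)
        time.sleep(delay)        # be polite between requests
    return all_reviews

# Usage: collect_reviews(scrape_company_reviews, "ExampleCorp", 12345)
```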
Scraping Salary Data with a Glassdoor Scraper
Salary information is another valuable data point available on Glassdoor. Here’s how to extract it:
def scrape_salary_data(company_name, company_id):
    url = f"https://www.glassdoor.com/Salaries/{company_name}-Salaries-E{company_id}.htm"
    extract_rules = {
        "salaries": {
            "selector": ".salaryRow",
            "type": "list",
            "output": {
                "job_title": {"selector": ".jobTitle", "output": "text"},
                "salary_range": {"selector": ".salaryRange", "output": "text"},
                "base_pay": {"selector": ".basePay", "output": "text"},
                "additional_pay": {"selector": ".additionalPay", "output": "text"},
                "sample_size": {"selector": ".sampleSize", "output": "text"}
            }
        }
    }
    response = client.get(
        url,
        params={
            'extract_rules': extract_rules,
            'render_js': True
        }
    )
    return response.json()
This function extracts salary information for different job titles within company pages, including salary ranges, base pay, additional pay, and sample size.
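Note that salary fields come back as free-form text such as "$80K - $120K (Glassdoor est.)". Here’s a small sketch for normalizing those strings into numbers; the exact snippet format varies across listings, so treat this as a starting point rather than a complete parser:

```python
import re

def parse_salary(text: str):
    """Extract (low, high) dollar figures from a salary snippet.
    Returns None if no dollar amount is found."""
    nums = []
    for value, suffix in re.findall(r"\$([\d,]+(?:\.\d+)?)\s*([KkMm]?)", text):
        n = float(value.replace(",", ""))
        if suffix.lower() == "k":
            n *= 1_000          # "$80K" -> 80000
        elif suffix.lower() == "m":
            n *= 1_000_000
        nums.append(int(n))
    if not nums:
        return None
    return (min(nums), max(nums))

print(parse_salary("$80K - $120K (Glassdoor est.)"))  # -> (80000, 120000)
```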
Scraping Interview Questions
Interview questions can provide valuable insights for job seekers. Here’s how to extract them:
def scrape_interview_questions(company_name, company_id):
    url = f"https://www.glassdoor.com/Interview/{company_name}-Interview-Questions-E{company_id}.htm"
    extract_rules = {
        "interviews": {
            "selector": ".interviewQuestion",
            "type": "list",
            "output": {
                "question": {"selector": ".questionText", "output": "text"},
                "job_title": {"selector": ".jobTitle", "output": "text"},
                "difficulty": {"selector": ".difficultyLabel", "output": "text"},
                "experience": {"selector": ".interviewReview", "output": "text"}
            }
        }
    }
    response = client.get(
        url,
        params={
            'extract_rules': extract_rules,
            'render_js': True
        }
    )
    return response.json()
This function extracts interview questions, associated job titles, difficulty ratings, and interview reviews.
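Once scraped, the difficulty labels are easy to summarize. For instance, a quick tally with collections.Counter (the sample records mimic the dicts returned by the extraction rules above):

```python
from collections import Counter

def difficulty_breakdown(interviews):
    """Tally difficulty labels across scraped interview records,
    skipping entries where the field was missing."""
    return Counter(i.get("difficulty") for i in interviews if i.get("difficulty"))

sample = [{"difficulty": "Medium"}, {"difficulty": "Hard"}, {"difficulty": "Medium"}, {}]
print(difficulty_breakdown(sample))  # e.g. Counter({'Medium': 2, 'Hard': 1})
```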
Exporting Scraped Glassdoor Data
Once you’ve collected company data, the next step is to store the scraped data in a format that’s easy to analyze or share. The two most common options are CSV files and JSON files.
Export to JSON
ScrapingBee’s API already returns data as JSON by default. You can save it directly to a file like this:
import json

with open("glassdoor_data.json", "w") as f:
    json.dump(data, f, indent=2)
This format is great if you plan to process the data programmatically or feed it into another application.
Export to CSV
If you want to work with your results in Excel, Google Sheets, or a data analysis tool, convert them into CSV:
import csv

with open("glassdoor_data.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Company", "Location", "Salary", "Rating"])  # headers
    for job in data.get("jobs", []):
        writer.writerow([
            job.get("title"),
            job.get("company"),
            job.get("location"),
            job.get("salary"),
            job.get("rating")
        ])
This creates a spreadsheet-friendly version of your scraped Glassdoor dataset, making it easy to filter, chart, or compare different roles across company pages and job boards.
Additional Considerations
When web scraping Glassdoor, there are three key technical challenges to keep in mind: respecting rate limits, managing pagination, and waiting for dynamically loaded elements. Handling these properly ensures your scraper runs smoothly, avoids unnecessary errors, and minimizes the risk of being blocked.
Waiting for Elements
If job listings load slowly, you can use the wait_for parameter to ensure content is fully loaded before extraction:
response = client.get(url, params={
'extract_rules': extract_rules,
'render_js': True,
'wait_for': '.react-job-listing' # Wait until this selector appears
})
Pagination
Glassdoor pages use React and AJAX for pagination. For basic listings, you can modify the URL to access different pages. For example:
# Page 1
url = "https://www.glassdoor.com/Job/software-engineer-jobs-SRCH_KO0,17.htm"
# Page 2
url = "https://www.glassdoor.com/Job/software-engineer-jobs-SRCH_KO0,17_IP2.htm"
For deeper scraping (like all reviews, job descriptions, or salary history), you might need to use GraphQL extraction or implement scrolling strategies using JavaScript scenarios.
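The _IP<n> suffix can also be generated programmatically. A small sketch, assuming the page-1 URL ends in .htm as in the examples above:

```python
def job_page_url(base_url: str, page: int) -> str:
    """Build the URL for page N of a Glassdoor job search.
    Page 1 is the plain URL; later pages insert _IP<n> before .htm."""
    if page <= 1:
        return base_url
    return base_url.replace(".htm", f"_IP{page}.htm")

url = "https://www.glassdoor.com/Job/software-engineer-jobs-SRCH_KO0,17.htm"
print(job_page_url(url, 2))
```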
Respect Rate Limits & Robots.txt
Even with ScrapingBee handling the technical aspects, it’s important to scrape responsibly:
Space out your requests to avoid overwhelming the site
Don’t extract more data than you need
Check Glassdoor’s robots.txt for any specific directives
Consider using the official Glassdoor API if available for your use case
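The first point can be sketched as a throttled fetch loop with retries. The fetch callable is anything that performs one request – for example, lambda u: client.get(u, params={'render_js': True}) – which also keeps the loop testable without touching the network:

```python
import random
import time

def polite_get(fetch, urls, min_delay=2.0, max_delay=5.0, retries=2, backoff=1.0):
    """Fetch URLs one at a time, sleeping a random interval between requests
    and retrying failures with exponential backoff.
    Returns one result (or None, if all retries failed) per URL."""
    results = []
    for url in urls:
        for attempt in range(retries + 1):
            try:
                results.append(fetch(url))
                break
            except Exception:
                if attempt == retries:
                    results.append(None)   # give up on this URL
                else:
                    time.sleep(backoff * (2 ** attempt))
        time.sleep(random.uniform(min_delay, max_delay))  # pause between URLs
    return results
```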
Even when scraping publicly available data is permissible, scraping responsibly is your first line of defense, and it may not always be enough. Glassdoor employs a range of anti-bot measures designed to detect and block automated traffic. In the next section, we’ll look at how to recognize these barriers and handle them effectively.
Handling Glassdoor Anti-Bot Measures
Learning how to avoid getting blocked on Glassdoor can save you hours of development time. One of the biggest challenges is dealing with anti-bot protection.
You’ll need a few specific web scraping tools, which I’ll cover below.
User-Agent Rotation
Glassdoor tracks browser fingerprints to identify Glassdoor scrapers. Our solution rotates user-agents to make each request appear to come from a different browser:
response = client.get(url, params={
'render_js': True,
'premium_proxy': 'true' # Uses premium proxies with rotating user-agents
})
IP Rotation
To avoid getting blocked when scraping Glassdoor, IP address rotation is crucial. ScrapingBee automatically rotates IP addresses to prevent Glassdoor from detecting patterns in your requests:
response = client.get(url, params={
'render_js': True,
'country_code': 'us' # Optionally specify country for IPs
})
JavaScript Rendering
Since Glassdoor relies heavily on JavaScript, our platform renders the full page just like a real browser:
response = client.get(url, params={
'render_js': True,
'premium_proxy': 'true'
})
I’ve found that these features, when combined, make our platform significantly more reliable than custom solutions. In my experience, custom scrapers for Glassdoor typically need constant maintenance as the site updates its defenses.
Why Scrape Glassdoor Data?
Now that you're familiar with the technical details of how to scrape Glassdoor, let’s explore why this data is so valuable. Glassdoor provides a wide range of information that benefits various stakeholders.
Market Research and Competitive Intelligence
Glassdoor provides unique insights that aren’t easily available elsewhere. When you extract data from Glassdoor, you gain access to:
Salary benchmarks: Understanding compensation trends across industries, roles, and locations
Company culture insights: Employee sentiment analysis based on reviews
Benefits information: You can compare what perks and benefits companies offer
Interview processes: How companies conduct their hiring
For example, a tech startup might scrape Glassdoor job data to understand the competitive salary landscape before setting its own compensation packages. This helps them stay competitive without overspending.
Recruitment and HR Applications
HR professionals and recruiters can leverage a Glassdoor scraper to:
Track employer brand reputation through company reviews
Monitor employee satisfaction trends
Identify common complaints or praise points
Benchmark their company against competitors
I once worked with a mid-sized tech company that utilized scraped Glassdoor company reviews to identify and address recurring themes in negative feedback, ultimately improving their retention rates.
Investment Research
Investors and financial analysts often use web scraping Glassdoor to gather intelligence on companies they’re evaluating:
Employee sentiment as a leading indicator of company performance
Job openings may signal expansion or contraction
Executive approval ratings
Salary growth or stagnation
The data points available through Glassdoor data extraction can provide valuable signals that complement traditional financial analysis.
Unlocking Glassdoor Data with Web Scraping
With the right web scraper, Glassdoor becomes a rich source of public data on job postings, company pages, and salary information from current and former employees. Using a tool like ScrapingBee makes it simple to extract desired data fields – from job openings and company names to total pay and reviews – without worrying about JavaScript, CAPTCHAs, or IP blocks.
The scraped results can easily be exported into a CSV file or JSON file, ready for analysis or integration into a job board or HR project. Whether you’re tracking machine learning engineer roles on the first page of a search or collecting data across more pages, our platform handles the heavy lifting so you can focus on insights.
Always scrape responsibly: respect Glassdoor’s servers, check robots.txt, and confirm whether it’s legal to scrape for your use case.
With these practices in place, you can reliably collect all the data you need from Glassdoor and turn it into actionable insights.
Start Scraping Glassdoor with ScrapingBee Today
Ready to extract valuable insights from Glassdoor? ScrapingBee makes it simple to get started:
Sign up for a free account at scrapingbee.com
Get 1,000 free API credits to test your Glassdoor scraping
Use the code examples from this guide to start extracting job listings, company reviews, and salary data
The setup takes less than 5 minutes, and you’ll save countless hours compared to building and maintaining your own scraping infrastructure.
Frequently Asked Questions (FAQs)
Is scraping Glassdoor legal?
Web scraping publicly available data is generally legal, but you should use the data responsibly. Avoid republishing content verbatim, respect Glassdoor’s terms of service regarding data usage, and consider consulting legal advice for your specific use case.
Why does Glassdoor block my scraper?
Glassdoor blocks scrapers to protect its data and server resources. They detect unusual patterns like too many requests from one IP, missing cookies/headers, or bot-like behavior. ScrapingBee helps avoid these issues by mimicking real user behavior.
Can I get salary info from Glassdoor listings?
Yes, you can scrape salary information from Glassdoor job listings when available. Not all listings include salary data, but when present, it can be extracted using the .salarySnippet selector as shown in our examples.
Does ScrapingBee work for logged-in Glassdoor pages?
ScrapingBee can handle some login-protected content using cookies. For simple cases, you can extract cookies from your browser and pass them with your request. For more complex scenarios, ScrapingBee’s JavaScript scenario feature can automate the login process.
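For the simple case, ScrapingBee accepts cookies as a "name1=value1;name2=value2" string via its cookies parameter (check the current docs for the exact format). A small sketch for serializing them; the cookie names below are hypothetical examples of what you might copy from a logged-in browser session:

```python
def format_cookies(cookie_dict: dict) -> str:
    """Serialize cookies into the 'name1=value1;name2=value2' string format."""
    return ";".join(f"{name}={value}" for name, value in cookie_dict.items())

# Hypothetical cookie names copied from a logged-in browser session:
session_cookies = format_cookies({"GDSession": "abc123", "at": "xyz789"})
print(session_cookies)

# Then pass them along with the request:
# response = client.get(url, params={'render_js': True, 'cookies': session_cookies})
```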

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.