If you're looking for a straightforward way to scrape Google Jobs, you're in the right place. In this guide, we'll walk through the steps to extract job listings and related data in just minutes using ScrapingBee. Our powerful web scraping API handles the toughest parts of the process for you: JavaScript rendering, proxy rotation, and CAPTCHA bypassing, giving you the necessary tools for consistent and reliable data extraction.
Quick Answer (TL;DR)
To scrape Google Jobs with our HTML API, write a Python script that sends a GET request to its endpoint with your target Google search URL. Our tools let you enable JavaScript rendering by adding a single render_js=true parameter, which uses a headless browser to bypass bot restrictions and load Google's dynamic content. Add a BS4 parser to remove clutter and focus only on the HTML elements that carry relevant job data.
Need more help? Check out our specialized Google Jobs Scraping API for an even easier time extracting job postings!
Quickstart: Full Parser Code
If you want to skip the tutorial and start scraping immediately with our API, copy the entire script below: it runs several job searches at once and saves the listings to a CSV file. Just adjust the position and location parameters within the asyncio loop and you're good to go!
import asyncio
from scrapingbee import ScrapingBeeClient
import pandas as pd
import json

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

def google_jobs_api(position='jobs', location="USA"):
    extract_rules = {
        "jobs": {
            "selector": ".EimVGf",
            "type": "list",
            "output": {
                "title": "div.tNxQIb.PUpOsf",
                "company": "div.wHYlTd.MKCbgd.a3jPc",
                "location_and_portal": "div.wHYlTd.FqK3wc.MKCbgd",
                "posted": "span[aria-label$=ago]",
                "employment_type": "span[aria-label^=Employment]",
                "salary": "span[aria-label^=Salary]",
                "job_detail": ".MQUd2b@href"
            }
        }
    }
    js_scenario = {
        "instructions": [
            {"evaluate": "(async()=>{let startTime=Date.now();let timeLimit=30000;let lastScrollHeight=0;function sleep(ms){return new Promise(resolve=>setTimeout(resolve,ms));}async function scrollToEnd(){while(Date.now()-startTime<timeLimit){window.scrollTo(0,document.body.scrollHeight);await sleep(1000);let currentScrollHeight=document.body.scrollHeight;if(currentScrollHeight===lastScrollHeight){console.log('Reached the end of the page.');break;}else{lastScrollHeight=currentScrollHeight;}}console.log('Scrolling completed or time limit reached.');}await scrollToEnd();let indicator=document.createElement('div');indicator.id='scrollCompleted';document.body.appendChild(indicator);})();"},
            {"wait_for": "#scrollCompleted"}
        ]
    }
    position = position.replace(' ', '+')
    location = location.replace(' ', '+')
    response = client.get(
        f'https://www.google.com/search?q={position}+job+near+{location}&udm=8',
        params={
            "custom_google": "true",
            "stealth_proxy": "true",
            "render_js": "true",
            "extract_rules": extract_rules,
            "js_scenario": js_scenario,
            "country_code": "us"
        },
        retries=2
    )
    job_data = response.json().get('jobs', [])
    df = pd.DataFrame(job_data)
    df.to_csv("google_jobs_output.csv", mode='a', index=False, header=False)
    return {'count': len(job_data)}

async def run_all_queries():
    loop = asyncio.get_event_loop()
    tasks = [
        loop.run_in_executor(None, google_jobs_api, "Software Developer", "New York, USA"),
        loop.run_in_executor(None, google_jobs_api, "Data Analyst", "San Francisco, USA"),
        loop.run_in_executor(None, google_jobs_api, "Marketing Manager", "Austin, USA"),
    ]
    results = await asyncio.gather(*tasks)

asyncio.run(run_all_queries())
Setting Up Your Python Environment
The ScrapingBee HTML API greatly simplifies data extraction from well-protected sources. All you need is a working Python environment with our tools and a few essential libraries installed.
Python is the most popular programming language and an undeniable favorite for automated data collection. Its ease of use and seamless integration with external libraries make it approachable even for absolute beginners.
Note: For more information on coding essentials, check out our post on Python web scraping.
Install Python and required libraries
Before we begin, make sure to install Python 3.6 or later. On Windows, download the installer from python.org, run it, and tick “Add Python to PATH”.
Note: On macOS, use the official .pkg or Homebrew (brew install python); on Linux, install via your distro’s package manager (apt, dnf, etc.).
For this tutorial, we will focus on the Windows setup. Proceed through the installation wizard; once it completes, verify the version by running this command in your Command Prompt:
python --version
If installation is successful, you will see your Python version. Once everything checks out, use pip, Python's package installer, to install the necessary packages from the Python Package Index (PyPI):
scrapingbee – routes your requests through a headless-browser API, handling JavaScript and proxies so you get clean HTML.
pandas – structures the scraped data into DataFrames, making filtering, cleaning, and export painless.
beautifulsoup4 – parses the raw HTML, letting you locate and extract tags, text, and attributes with simple selectors.
You can install each package individually, or list them all in a single command separated by spaces, like this:
pip install scrapingbee pandas beautifulsoup4
Together, these libraries handle HTTP requests, parsing, and data manipulation: our API extracts the data, while beautifulsoup4 and pandas work together to clean up, organize, and structure it in a readable and understandable format.
Note: Want to work with different coding tools? Check out our guide on Best Python scraping libraries!
After successful installation, you can double-check the packages with a "pip list" command.
Get access to a scraping API
If it's your first time working with ScrapingBee, sign up and test the benefits yourself with our free trial, which includes 1,000 free API calls! It's a convenient, no-commitment way to experience our tools firsthand and see the full scope of scraping capabilities they unlock.
Register your account to get your API key; this key is essential for authenticating your requests to our API.
After a successful signup, you will be presented with a dashboard displaying the expiration date of your free trial, available credits, concurrent connections, and most importantly, your API key. Copy it so we can finally start building our scraping script.
Test your first API request
To make the first API request, let's prepare the coding environment and import the installed scraping libraries into the script. Start your code by entering these lines:
import pandas as pd
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup
Note: Python library imports have to be at the top of your script, before any of their functions are used.
Now it's time to create the connection variable "client" (or any name of your choice), which creates the ScrapingBee client. Enter your API key in the parentheses:
# Set up the ScrapingBee client
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
Once that is set up, create the "response" variable, which will store the data returned by the GET API call to the webpage URL you provide:
# Store data in the "response" variable by invoking the GET method in your ScrapingBee client
response = client.get("YOUR-URL")
The GET API call in the response element can be customized with parameters, defined within the parentheses of your API call. When extracting information from Google, two parameters are essential:
render_js (True/False): Enables/disables the use of a headless browser for JavaScript rendering
custom_google (True/False): A mandatory parameter for accessing Google and its subdomains with our HTML API.
After the provided webpage URL, add a comma and pass these additional instructions as a dictionary of parameters via the "params" argument:
# Add ScrapingBee parameters to the GET API call
response = client.get('YOUR URL',
    params={
        'custom_google': True,
        'render_js': True,
    }
)
Let's test that everything works by making a simple request to Google. Before extracting any data, let's add the webpage URL and a print function that retrieves the HTTP status code.
import pandas as pd
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup
# Set up the ScrapingBee client
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
# Request rendered Google Jobs search results
response = client.get(
    'https://www.google.com/',
    params={
        'custom_google': True,
        'render_js': True,
    }
)
# Check if connection is successful, look for response status code: 200
print('Response HTTP Status Code: ', response.status_code)
If your connection is successful, your Command Prompt should look like this:
The render_js=true parameter is crucial here – it tells our API to execute JavaScript, which is necessary for Google Jobs content to appear.
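If you're curious, an optional sanity check (a small sketch reusing the client we just created) is to request the same page with and without JavaScript rendering and compare the sizes; the rendered response is typically much larger because the dynamic widgets are included:
# Optional check: compare the page with and without JavaScript rendering.
# The rendered version is usually much larger because dynamic content is included.
for js_enabled in (False, True):
    check = client.get(
        'https://www.google.com/',
        params={'custom_google': True, 'render_js': js_enabled},
    )
    print(f"render_js={js_enabled}: status={check.status_code}, bytes={len(check.content)}")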
Building the Scraper Step-by-Step
Now that our environment is ready, let's create a scraper that can handle different job titles and locations. I've built dozens of these scrapers, and the key is constructing valid search URLs that Google understands.
Create search queries and locations
Effective extraction of Google Jobs listings depends on two inputs: the job position and the location. By structuring a search query that triggers the Google Jobs widget, we can see how these inputs appear in the URL, which will help us form a dynamic URL for each scraping request.
Here is an example of a URL that will trigger the widget for the desired position and location:
https://www.google.com/search?q={position}+job+near+{location}&udm=8
Note: The "&udm=8" suffix is a special query parameter that ensures Google renders the Jobs panel instead of standard search results.
For more details on how to approach search results, check out our blog on scraping Google Search!
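If your search terms contain characters beyond spaces (commas, ampersands, accents), a safer option than plain string replacement is to URL-encode the query. Here's a small helper sketch (not part of the final script) using Python's standard library:
from urllib.parse import quote_plus

def build_jobs_url(position, location):
    # quote_plus turns spaces into '+' and escapes other special characters
    query = f"{position} job near {location}"
    return f"https://www.google.com/search?q={quote_plus(query)}&udm=8"

print(build_jobs_url("Software Developer", "New York, USA"))
# -> https://www.google.com/search?q=Software+Developer+job+near+New+York%2C+USA&udm=8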
Form dynamic URLs for each query
Next, we'll create properly formatted URLs for Google Jobs searches. To keep our data collection tool organized, we will define a function that ties together all the elements of our Google Jobs scraper.
# Creating the google_jobs_api function with position and location parameters
def google_jobs_api(position='jobs', location="USA"):
Within the function, we will redefine the response variable so that the GET API call uses a dynamic URL built from the position and location parameters:
def google_jobs_api(position='jobs', location="USA"):
    position = position.replace(' ', '+')
    location = location.replace(' ', '+')
    response = client.get(
        f'https://www.google.com/search?q={position}+job+near+{location}&udm=8',
Prepare the API payload with parsing logic
Now we'll set up our ScrapingBee request with additional parameters:
stealth_proxy: Enables proxy connections that mimic human-like browsing.
country_code: Routes the request through a proxy server in the United States.
response = client.get(
    f'https://www.google.com/search?q={position}+job+near+{location}&udm=8',
    params={
        "render_js": "true",
        "custom_google": "true",
        "stealth_proxy": "true",
        'country_code': 'us'
    },
    retries=2
)
The next portion covers parsing the extracted HTML with BeautifulSoup. The "soup" variable runs the parser and stores the result. After inspecting the Google Jobs search results, we can see that each individual job listing is stored in a div element with the .EimVGf class.
soup = BeautifulSoup(response.text, 'html.parser')
job_cards = soup.select('.EimVGf')
Now the job_cards variable holds only these specific div elements. By inspecting them, we can separate and extract the job details and assign each one to a variable. Let's build a for loop that scans all elements with the .EimVGf class: each job card becomes a Python dictionary of key-value pairs and is appended to the jobs list.
jobs = []
for card in job_cards:
    title = card.select_one('div.tNxQIb.PUpOsf')
    company = card.select_one('div.wHYlTd.MKCbgd.a3jPc')
    location = card.select_one('div.wHYlTd.FqK3wc.MKCbgd')
    posted = card.select_one('span[aria-label$="ago"]')
    employment_type = card.select_one('span[aria-label^="Employment"]')
    salary = card.select_one('span[aria-label^="Salary"]')
    job_link = card.select_one('.MQUd2b')
    jobs.append({
        "title": title.get_text(strip=True) if title else None,
        "company": company.get_text(strip=True) if company else None,
        "location_and_portal": location.get_text(strip=True) if location else None,
        "posted": posted.get_text(strip=True) if posted else None,
        "employment_type": employment_type.get_text(strip=True) if employment_type else None,
        "salary": salary.get_text(strip=True) if salary else None,
        "job_detail": job_link['href'] if job_link and job_link.has_attr('href') else None
    })

return {
    'count': len(jobs),
    'jobs': jobs,
    'info': f"{response.status_code} {'SUCCESS' if jobs else 'NO JOBS FOUND'}"
}
After defining the GET API request and the parsing steps, we can close the google_jobs_api function definition and invoke it with the desired parameters. Let's test our first script by looking for Software Developer listings in New York:
def google_jobs_api(position='jobs', location="USA"):
    position = position.replace(' ', '+')
    location = location.replace(' ', '+')
    response = client.get(
        f'https://www.google.com/search?q={position}+job+near+{location}&udm=8',
        params={
            "custom_google": "true",
            "stealth_proxy": "true",
            "render_js": "true",
            # "js_scenario": js_scenario,
            'country_code': 'us'
        },
        retries=2
    )
    soup = BeautifulSoup(response.text, 'html.parser')
    job_cards = soup.select('.EimVGf')
    jobs = []
    for card in job_cards:
        title = card.select_one('div.tNxQIb.PUpOsf')
        company = card.select_one('div.wHYlTd.MKCbgd.a3jPc')
        location = card.select_one('div.wHYlTd.FqK3wc.MKCbgd')
        posted = card.select_one('span[aria-label$="ago"]')
        employment_type = card.select_one('span[aria-label^="Employment"]')
        salary = card.select_one('span[aria-label^="Salary"]')
        job_link = card.select_one('.MQUd2b')
        jobs.append({
            "title": title.get_text(strip=True) if title else None,
            "company": company.get_text(strip=True) if company else None,
            "location_and_portal": location.get_text(strip=True) if location else None,
            "posted": posted.get_text(strip=True) if posted else None,
            "employment_type": employment_type.get_text(strip=True) if employment_type else None,
            "salary": salary.get_text(strip=True) if salary else None,
            "job_detail": job_link['href'] if job_link and job_link.has_attr('href') else None
        })
    return {
        'count': len(jobs),
        'jobs': jobs,
    }

# Run
print(google_jobs_api(position='Software Developer', location='New York, USA'))
After running the scraping script, we can see that it works, but it only shows 10 job postings:
Let's make some adjustments to extract the fully rendered job listings by defining additional parameters in the GET API request and by leaning on ScrapingBee's extraction engine, which handles dynamic, JavaScript-rendered content more reliably.
Extracting and Saving Job Data
With our scraper fetching the pages, we now need to extract structured data from them. In my experience, well-targeted CSS or XPath selectors are the most reliable way to navigate Google's complex DOM structure.
Use CSS / XPath to extract job details
To extract the job information, we'll describe the target elements with CSS or XPath expressions. Instead of parsing with BeautifulSoup, here we will define the "extract_rules" dictionary, which becomes one of the parameters in our GET API request:
def google_jobs_api(position='jobs', location="USA"):
    extract_rules = {
        "jobs": {
            "selector": ".EimVGf",
            "type": "list",
            "output": {
                "title": "div.tNxQIb.PUpOsf",
                "company": "div.wHYlTd.MKCbgd.a3jPc",
                "location_and_portal": "div.wHYlTd.FqK3wc.MKCbgd",
                "posted": "span[aria-label$=ago]",
                "employment_type": "span[aria-label^=Employment]",
                "salary": "span[aria-label^=Salary]",
                "job_detail": ".MQUd2b@href"
            }
        }
    }
These CSS expressions target the specific elements containing job information. You might need to adjust them if Google changes its page structure. They can also be rewritten as XPath selectors to handle class combinations, nested divs, and partial attribute matches with more precision. For more information, check out our blog on XPath scraping.
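As an illustration of the XPath route, here's a sketch that parses the rendered HTML locally with lxml (an extra dependency, installed with pip install lxml) instead of using extract_rules; the selectors mirror two of the CSS rules above and may need adjusting if Google's markup changes:
from lxml import html

tree = html.fromstring(response.text)
# XPath equivalents of the title and link rules above
titles = tree.xpath("//div[contains(@class,'EimVGf')]//div[contains(@class,'tNxQIb') and contains(@class,'PUpOsf')]/text()")
links = tree.xpath("//div[contains(@class,'EimVGf')]//*[contains(@class,'MQUd2b')]/@href")
print(len(titles), "titles found")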
To make sure that our API not only takes care of JavaScript rendering but also scrolls through more job listings, we need to define instructions for the "js_scenario" parameter:
js_scenario = {
    "instructions": [
        {"evaluate": "(async()=>{let startTime=Date.now();let timeLimit=30000;let lastScrollHeight=0;function sleep(ms){return new Promise(resolve=>setTimeout(resolve,ms));}async function scrollToEnd(){while(Date.now()-startTime<timeLimit){window.scrollTo(0,document.body.scrollHeight);await sleep(1000);let currentScrollHeight=document.body.scrollHeight;if(currentScrollHeight===lastScrollHeight){console.log('Reached the end of the page.');break;}else{lastScrollHeight=currentScrollHeight;}}console.log('Scrolling completed or time limit reached.');}await scrollToEnd();let indicator=document.createElement('div');indicator.id='scrollCompleted';document.body.appendChild(indicator);})();"},
        {"wait_for": "#scrollCompleted"}
    ]
}
Now, integrate it into the list of GET API call parameters. We also added the "retries" argument, which resends the request up to two more times if the first attempt fails.
response = client.get(
    f'https://www.google.com/search?q={position}+job+near+{location}&udm=8',
    params={
        "custom_google": "true",
        "stealth_proxy": "true",
        "extract_rules": extract_rules,
        "js_scenario": js_scenario,
        "render_js": "true",
        'country_code': 'us'
    },
    retries=2
)
Save results to CSV using pandas
Once we have our job data, we can save it to a CSV file:
job_data = response.json().get('jobs', [])
df = pd.DataFrame(job_data)
file_name = "google_jobs_output.csv"
df.to_csv(file_name, index=False)

return {
    'count': len(job_data),
    'csv_file': file_name
}
This creates a neatly organized spreadsheet with all your job listings.
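One caveat: the full script at the top of this guide appends with mode='a' and header=False so that several queries land in one file, which means the CSV ends up with no header row at all. If you'd rather keep a header, a small sketch is to write it only when the file doesn't exist yet:
import os

file_name = "google_jobs_output.csv"
# Append each batch, but write the header row only on the first write
df.to_csv(file_name, mode='a', index=False, header=not os.path.exists(file_name))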
Handle multiple queries asynchronously
To speed up our scraping, let's add an async function that runs three job queries at the same time. First, import the asyncio library at the top of the scraping script:
import asyncio
Now define the async function that runs multiple google_jobs_api calls concurrently.
async def run_all_queries():
    loop = asyncio.get_event_loop()
    tasks = [
        loop.run_in_executor(None, google_jobs_api, "Software Developer", "New York, USA"),
        loop.run_in_executor(None, google_jobs_api, "Data Analyst", "San Francisco, USA"),
        loop.run_in_executor(None, google_jobs_api, "Marketing Manager", "Austin, USA"),
    ]
    results = await asyncio.gather(*tasks)
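Finally, kick everything off from the top level; this same line also closes the full script shown earlier:
# Entry point: schedule all three queries and wait for them to finish
asyncio.run(run_all_queries())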
This approach will significantly reduce the total time needed to scrape multiple job and location combinations. To learn more about concurrent requests, check out our tutorial on Python async scraping.
Running and Scaling the Scraper
Now let's put everything together and run our scraper at scale. It's a bit like organizing a fleet of ships – we need to coordinate multiple requests while avoiding detection.
import asyncio
from scrapingbee import ScrapingBeeClient
import pandas as pd
import json

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

def google_jobs_api(position='jobs', location="USA"):
    extract_rules = {
        "jobs": {
            "selector": ".EimVGf",
            "type": "list",
            "output": {
                "title": "div.tNxQIb.PUpOsf",
                "company": "div.wHYlTd.MKCbgd.a3jPc",
                "location_and_portal": "div.wHYlTd.FqK3wc.MKCbgd",
                "posted": "span[aria-label$=ago]",
                "employment_type": "span[aria-label^=Employment]",
                "salary": "span[aria-label^=Salary]",
                "job_detail": ".MQUd2b@href"
            }
        }
    }
    js_scenario = {
        "instructions": [
            {"evaluate": "(async()=>{let startTime=Date.now();let timeLimit=30000;let lastScrollHeight=0;function sleep(ms){return new Promise(resolve=>setTimeout(resolve,ms));}async function scrollToEnd(){while(Date.now()-startTime<timeLimit){window.scrollTo(0,document.body.scrollHeight);await sleep(1000);let currentScrollHeight=document.body.scrollHeight;if(currentScrollHeight===lastScrollHeight){console.log('Reached the end of the page.');break;}else{lastScrollHeight=currentScrollHeight;}}console.log('Scrolling completed or time limit reached.');}await scrollToEnd();let indicator=document.createElement('div');indicator.id='scrollCompleted';document.body.appendChild(indicator);})();"},
            {"wait_for": "#scrollCompleted"}
        ]
    }
    position = position.replace(' ', '+')
    location = location.replace(' ', '+')
    response = client.get(
        f'https://www.google.com/search?q={position}+job+near+{location}&udm=8',
        params={
            "custom_google": "true",
            "stealth_proxy": "true",
            "render_js": "true",
            "extract_rules": extract_rules,
            "js_scenario": js_scenario,
            "country_code": "us"
        },
        retries=2
    )
    job_data = response.json().get('jobs', [])
    df = pd.DataFrame(job_data)
    df.to_csv("google_jobs_output.csv", mode='a', index=False, header=False)
    return {'count': len(job_data)}

async def run_all_queries():
    loop = asyncio.get_event_loop()
    tasks = [
        loop.run_in_executor(None, google_jobs_api, "Software Developer", "New York, USA"),
        loop.run_in_executor(None, google_jobs_api, "Data Analyst", "San Francisco, USA"),
        loop.run_in_executor(None, google_jobs_api, "Marketing Manager", "Austin, USA"),
    ]
    results = await asyncio.gather(*tasks)

asyncio.run(run_all_queries())
After a successful extraction, we have a well-structured CSV file that combines concurrent extractions into one data set.
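As a quick check (a sketch, assuming you kept the default file name and the header-less append mode from the script above), you can load the combined file back with pandas and supply the column names yourself:
import pandas as pd

# Column names match the extract_rules output keys
columns = ["title", "company", "location_and_portal", "posted",
           "employment_type", "salary", "job_detail"]
df = pd.read_csv("google_jobs_output.csv", names=columns)
print(len(df), "rows collected")
print(df.head())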
Monitor job status and handle errors
To make our scraper more robust, you can add if statements and try/except blocks around the GET API calls for cases when no jobs are found in specific locations. Another preventive measure is the retries parameter, which keeps the scraper from stopping after a single unsuccessful connection request.
These changes ensure that each URL is targeted multiple times before giving up, which helps handle temporary network issues or rate limiting.
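For example, here is a minimal sketch of such a guard, wrapping the google_jobs_api function we already defined (the wrapper name is just illustrative):
def safe_google_jobs_api(position, location):
    # One failed query shouldn't stop the whole batch
    try:
        result = google_jobs_api(position, location)
        if result['count'] == 0:
            print(f"No jobs found for '{position}' in '{location}'")
        return result
    except Exception as exc:
        print(f"Request failed for '{position}' in '{location}': {exc}")
        return {'count': 0}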
Understanding Google Jobs Structure
To effectively scrape Google Jobs, it helps to understand how the data is structured. Google Jobs data is rendered dynamically with JavaScript, which is why ScrapingBee's render_js feature is essential for accessing the complete information.
How Google Jobs displays listings
When you search for jobs on Google, the results appear in a dedicated "Jobs" widget that contains cards for each position. These cards show a preview of the job information, and clicking on one expands it to reveal more details like the full description, application link, and posting date.
Key elements to extract from job cards
The most valuable data points from Google Jobs listings include:
Job title
Company name
Location
Salary (when available)
Job description snippet
Posted date
Application link
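Most of these map directly onto the extract_rules output keys used earlier, so a single parsed record looks roughly like this (the values below are placeholders, not a real listing):
# Shape of one scraped record; values are placeholders
{
    "title": "…",
    "company": "…",
    "location_and_portal": "… via <job portal>",
    "posted": "… ago",
    "employment_type": "Full-time / Contractor / …",
    "salary": "…",
    "job_detail": "https://… (link to the original posting)"
}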
Tips for scaling to more locations
If you're not getting enough results for certain locations, try the following:
Use state/province names for broader searches
Add radius qualifiers like "within 50 miles" for rural areas
Rotate through different search patterns to avoid detection
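If you want to cover many of these variations at once, you can also generate the query list programmatically instead of hard-coding three tasks. Here is a small sketch that reuses the executor pattern from run_all_queries:
import asyncio
from itertools import product

positions = ["Software Developer", "Data Analyst"]
locations = ["New York, USA", "Texas, USA", "Washington, USA"]

async def run_many():
    loop = asyncio.get_event_loop()
    # One executor task per position/location combination
    tasks = [
        loop.run_in_executor(None, google_jobs_api, pos, loc)
        for pos, loc in product(positions, locations)
    ]
    return await asyncio.gather(*tasks)

asyncio.run(run_many())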
Start Scraping Google Jobs with Confidence
By now, you should have a fully functional Google Jobs scraper powered by ScrapingBee. The beauty of this approach is that you don't have to worry about proxy rotation, browser fingerprinting, or JavaScript rendering – ScrapingBee handles all of that behind the scenes. This means you can focus on extracting and analyzing the job data rather than fighting with anti-scraping measures.
Ready to start collecting job data? Grab your API key today and run your first scraper. And if you run into any challenges, our detailed documentation and support team are here to help.
Frequently Asked Questions (FAQs)
Is it legal to scrape from Google?
Scraping publicly available data from Google is generally acceptable for personal use, but it's important to respect Google's Terms of Service. Avoid scraping at high volumes, and don't republish the data as your own. ScrapingBee helps ensure your scraping activities remain responsible by managing request rates and using legitimate proxies.
Can you scrape from Google?
Yes, you can scrape from Google, but it requires handling challenges like JavaScript rendering, dynamic content loading, and anti-bot measures. ScrapingBee's API makes this process much simpler by providing the rendering capabilities and proxy management needed to access Google's data successfully.
Is scraping job postings legal?
Scraping job postings for personal use, research, or aggregation with proper attribution is generally acceptable. However, you should always respect robots.txt files and avoid overloading servers with requests. Additionally, consider the terms of service of the specific job platforms you're accessing through Google Jobs.

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.