Want to extract app names, ratings, reviews, and install counts from Google Play? Scraping is one of the fastest ways to collect valuable mobile app data, but dynamic content and anti-bot systems make traditional scrapers unreliable.
In this guide, we will teach you how to scrape Google Play using Python and our ScrapingBee API. You will find everything you need to reach your data collection goals and export the results in clean, structured formats. Let’s make scraping simple and scalable!
Quick Answer (TL;DR)
To scrape Google Play, use Python with our HTML API. It handles JavaScript rendering and rotating proxies automatically. Set up a js_scenario to click buttons that reveal app descriptions and reviews. Then use BeautifulSoup to parse the fully rendered HTML.
With just a few lines of code, you can extract app names, ratings, downloads, and reviews. Try it now with our Google Scraping API, or test it with our fully functional Google Play scraper:
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup
import pandas as pd

# Initialize two ScrapingBee clients to target different parts of the site with JS rendering.
# Don't forget to enter your unique API key!
client1 = ScrapingBeeClient(api_key='YOUR_API_KEY')
client2 = ScrapingBeeClient(api_key='YOUR_API_KEY')

def google_play_store_app_data(app_id):
    # Instructions to open the description pop-up
    js_description = {
        'instructions': [
            {
                'wait_for_and_click': 'button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF'
            },
            {'wait': 1000}
        ]
    }
    # Instructions to open the reviews pop-up
    js_reviews = {
        'instructions': [
            {
                'wait_for_and_click': 'button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF[aria-label="See more information on Ratings and reviews"]'
            },
            {'wait': 500}
        ]
    }
    # Send two GET API calls
    response_description = client1.get(
        f'https://play.google.com/store/apps/details?id={app_id}',
        params={
            "custom_google": "true",
            "wait_browser": "networkidle2",
            "premium_proxy": "true",
            "js_scenario": js_description,
            "render_js": "true",
            'country_code': 'us'
        },
        retries=2
    )
    response_reviews = client2.get(
        f'https://play.google.com/store/apps/details?id={app_id}',
        params={
            "custom_google": "true",
            "wait_browser": "networkidle2",
            "premium_proxy": "true",
            "js_scenario": js_reviews,
            "render_js": "true",
            'country_code': 'us'
        },
        retries=2
    )
    # Bail out early if either call failed
    if response_description.status_code != 200 or response_reviews.status_code != 200:
        return "Failed to retrieve the page."

    soup_description = BeautifulSoup(response_description.text, "lxml")
    soup_reviews = BeautifulSoup(response_reviews.text, "lxml")

    def extract_text(selector):
        el = soup_description.select_one(selector)
        return el.get_text(strip=True) if el else None

    def extract_reviews_dict():
        review_divs = soup_reviews.select("div.RHo1pe")
        return {
            f"review_{i+1}": div.get_text(strip=True)
            for i, div in enumerate(review_divs)
        }

    data = {
        "name": extract_text("span.AfwdI"),
        "rating": extract_text("div.TT9eCd"),
        "description": extract_text("div.fysCi > div:has(br)"),
        "downloads": extract_text(".wVqUob:nth-child(2) > .ClM7O"),
        "content_rating": extract_text(".wVqUob:nth-child(3) > .g1rdde > span > span"),
        "support_email": extract_text(".VfPpkd-WsjYwc.VfPpkd-WsjYwc-OWXEXe-INsAgc.KC1dQ.Usd1Ac.AaN0Dd.VVmwY:nth-child(2) .pSEeg"),
        "updated_on": extract_text(".lXlx5 + .xg1aie"),
        "tags": extract_text(".TKjAsc + .Uc6QCc"),
        "whats_new": extract_text(".c-wiz:nth-child(6) .SfzRHd > div"),
        "developer": extract_text(".sMUprd:nth-child(10) > .reAt0"),
        "android_os_requirement": extract_text(".sMUprd:nth-child(3) > .reAt0"),
        "in_app_purchase_range": extract_text(".sMUprd:nth-child(5) > .reAt0"),
        "released_on": extract_text(".sMUprd:nth-child(9) > .reAt0"),
        "reviews": extract_reviews_dict()
    }

    df = pd.DataFrame([data])
    df.to_csv("app_data.csv", index=False)
    return data

print(google_play_store_app_data(app_id='com.ludo.king'))
Set Up Your Environment
Before scraping Google Play, you’ll need to prepare your local environment with Python and a few essential libraries. If you haven’t already, download and install Python, available for all major operating systems. As one of the most popular programming languages, Python makes it comfortable to customize connection parameters and simplifies your interaction with our HTML API.
Note: During installation on Windows, make sure to check the box that says “Add Python to PATH” so you can use it from the command line. If you need more information on coding basics, check out our blog on Python web scraping.
Install required libraries
To scrape Google Play efficiently, install the following Python libraries. Each serves a specific role, but the ScrapingBee API actually combines many of these features:
pip install requests – basic HTTP requests (optional if using ScrapingBee)
pip install beautifulsoup4 – HTML parsing (needed if parsing manually)
pip install lxml – fast, reliable parsing backend for BeautifulSoup
pip install pandas – organize extracted data into tables (DataFrames)
pip install scrapingbee – handles requests, JavaScript rendering, and can parse with extract_rules, making most other tools optional
To show just how much ScrapingBee simplifies the stack, you can get away with installing only two of them:
pip install scrapingbee pandas
However, if you need more customization, check out our guide on the best Python scraping libraries.
Create a virtual environment (optional but recommended)
Creating a virtual environment helps avoid package conflicts and keeps your scraping project isolated from other Python setups. Here is how you can set it up:
Create the environment folder. The following terminal command creates a folder named venv containing a fresh Python environment:
python -m venv venv
Activate the environment. On Windows, run:
venv\Scripts\activate
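On macOS or Linux, the equivalent activation command is:
source venv/bin/activate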
You’ll now see (venv) in your command prompt; any libraries you install stay inside this environment.
Scrape Google Play Store with Python
Register for a ScrapingBee account to get your API key. After registration, you'll find your key in the dashboard. This key is essential for authenticating your requests to our API.
After a successful signup, you will be presented with a dashboard displaying the expiration date of your free trial, available credits, concurrent connections, and your API key.
Now it's time to start building the script for a Google Play scraper. Begin by importing the installed libraries and assigning the ScrapingBee client to a variable:
from scrapingbee import ScrapingBeeClient
import pandas as pd
from bs4 import BeautifulSoup
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
The rest of the code will be wrapped in a function called "google_play_store_app_data". Here we will define BeautifulSoup parsing rules and parameters for JavaScript rendering.
def google_play_store_app_data(app_id):
All of the following steps live inside this function (note the indentation). Let's start by adding a "js_scenario" dictionary, which attaches to the GET API request and instructs the headless browser to interact with JavaScript elements on the loaded page.
js_scenario = {
    "instructions": [
        {"wait_for_and_click": "c-wiz:nth-child(2) button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF"},
        {"wait": 500}
    ]
}
Let's break them down a bit for a better understanding:
"wait_for_and_click": tells our API to wait until a button, identified by CSS selectors, appears to click it.
"wait": 500: After clicking, waits 500 milliseconds before continuing.
These instructions give us a big advantage in reaching valuable public data hidden behind JavaScript rendering.
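If a page needs more interaction before it is ready, js_scenario supports further instruction types, such as scroll_y (scroll by a pixel amount), per the ScrapingBee documentation. Here is a quick sketch, not part of the final scraper, that scrolls before clicking the same expand button:
# A sketch of a richer scenario: scroll down to trigger lazy-loaded content,
# then click the same expand button as before.
js_scenario_extended = {
    "instructions": [
        {"scroll_y": 2000},  # scroll 2000 pixels down the page
        {"wait": 500},
        {"wait_for_and_click": "button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF"},
        {"wait": 500}
    ]
}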
Send a GET request to the app page
Now we can define a response variable that will store the instructions for the GET API call:
response = client.get(
    f'https://play.google.com/store/apps/details?id={app_id}',
    params={
        "custom_google": "true",
        "wait_browser": "networkidle2",
        "premium_proxy": "true",
        "js_scenario": js_scenario,
        "render_js": "true",
        'country_code': 'us'
    },
    retries=2
)
The "app_id" will be a customizable parameter that will give us webpage URL's to scrape and reach the information on specific apps. Let's take a closer look at the included list of GET API parameters:
"custom_google": "true" – Activates Google-specific optimizations (like correct headers and handling) for better scraping reliability.
"wait_browser": "networkidle2" – Waits until the page has fully loaded — specifically, when there are no more than 2 active network connections for at least 500ms.
"premium_proxy": "true" – Routes our scraping API through premium proxies to avoid blocks, rate-limiting, or region restrictions.
"js_scenario": js_scenario – Runs the defined JavaScript actions (e.g., clicking a button, waiting) before scraping — mimics user interaction.
"render_js": "true" – Tells ScrapingBee to render JavaScript, so you get the fully loaded DOM as a real user would see it.
'country_code': 'us' – Loads the app page as if the user is in the United States — helpful for region-specific content.
All of these parameters are defined in the ScrapingBee documentation, so make sure to check it out!
Use BeautifulSoup to parse HTML
After configuring the GET API call with the help of customizable parameters, it's time to apply the mechanism that restructures raw HTML into a readable and understandable format.
soup = BeautifulSoup(response.text, "lxml")
This line of code initializes BeautifulSoup with the lxml parser for the content stored in the response variable. It loads the fully rendered HTML into the soup variable, so you can use CSS selectors or tags to find and extract specific data from the page. Now let's define an "extract_text" function that accepts a single argument – a CSS selector – and returns the stripped text of the first matching element, or None if nothing matches.
def extract_text(selector):
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None
The last step is to create a dictionary that stores the data scraped from Google Play. To identify the CSS selectors, use your browser's developer tools (right-click the page and choose "Inspect"). For example, after right-clicking the name of the app, we can see how each important data point is defined in the HTML.
Then, by applying the "extract_text" and assigning this selector, the function will return the app's name.
# Note: we are still writing the code within the definition of the google_play_store_app_data function
print(extract_text("span.AfwdI"))
If you want to learn more on how to target a Google Play page or any other site that needs parsing, check out our tutorial on scraping with BeautifulSoup.
Extract App Details and Reviews
Time to finish up our "google_play_store_app_data" function, which ties all the parsing steps together. Now that we know how to target specific content blocks on the page, let's build a dictionary that extracts the data into a set of details, effectively building a Google Play Store API.
Accessing user reviews can be a bit tricky, because the main page only renders a few of them. Luckily, with ScrapingBee's page rendering capabilities, we will reach them too. Below we will start building the list of data elements, expanding it as we identify key sections to scrape and parse.
Get app name, developer, and description
Here is the first version that only includes our first example – content within the span.AfwdI element, which stores the app name.
data = {
    "name": extract_text("span.AfwdI"),
    # Here we will append additional calls to the extract_text function and assign them to new keys.
}
Let's continue with our example of analyzing the Ludo King® game. Head over to the web page, highlight the developer name, then right-click on the screen and click "Inspect". After inspecting these elements, we can structure the parsing instructions and call the "extract_text" function:
"description": extract_text(".fysCi div") – extract text within the .fysCi class, which stores the description of the targeted app.
"developer": extract_text(".sMUprd:nth-child(10) > .reAt0") – a complex parsing expression, targets the text within the .reAt0, which is a direct child of the 10th .sMUprd element
Note: For the browser to load the contents of the .fysCi class, your headless browser has to interact with the arrow button near the "About this game" text. Thankfully, we already addressed this step in the definition of the js_scenario, which waits for the button to appear and presses it before extracting the HTML code.
By now, you should have a data dictionary of extract_text calls that looks like this:
data = {
    "name": extract_text("span.AfwdI"),
    "description": extract_text(".fysCi div"),
    "developer": extract_text(".sMUprd:nth-child(10) > .reAt0")
}
Extract ratings and number of downloads
Following the same logic, let's expand the list with additional data from Google Play. Add these two lines into the data dictionary:
"rating": extract_text("div.TT9eCd"),
"downloads": extract_text(".wVqUob:nth-child(2) > .ClM7O"),
While some CSS selectors are harder to pinpoint, you can experiment with different parsers and test alternative selector definitions that return the same results. It's all about trial and error.
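One defensive pattern worth sketching here: a helper that tries several candidate selectors and returns the first match, so a single renamed class doesn't silently break the whole scraper. The fallback selector in the usage example is hypothetical, not taken from the live page:
def extract_text_any(selectors):
    # Try each candidate selector in order; return the first non-empty match.
    # Useful because obfuscated class names like "TT9eCd" rotate over time.
    for selector in selectors:
        el = soup.select_one(selector)
        if el:
            return el.get_text(strip=True)
    return None

# Hypothetical usage -- "div.some-fallback" is illustrative only:
rating = extract_text_any(["div.TT9eCd", "div.some-fallback"])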
Scrape user reviews
Unfortunately, to access user reviews we need to interact with another pop-up window, and a single rendered page cannot keep both the reviews pop-up and the product description open at once. To capture both within one application, we will create two JavaScript scenarios and two separate GET API calls:
client1 = ScrapingBeeClient(api_key='YOUR_API_KEY')
client2 = ScrapingBeeClient(api_key='YOUR_API_KEY')

def google_play_store_app_data(app_id):
    js_description = {
        'instructions': [
            {
                'wait_for_and_click': 'button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF'
            },
            {'wait': 500}
        ]
    }
    js_reviews = {
        'instructions': [
            {
                'wait_for_and_click': 'button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF[aria-label="See more information on Ratings and reviews"]'
            },
            {'wait': 500}
        ]
    }
    response_description = client1.get(
        f'https://play.google.com/store/apps/details?id={app_id}',
        params={
            "custom_google": "true",
            "wait_browser": "networkidle2",
            "premium_proxy": "true",
            "js_scenario": js_description,
            "render_js": "true",
            'country_code': 'us'
        },
        retries=2
    )
    response_reviews = client2.get(
        f'https://play.google.com/store/apps/details?id={app_id}',
        params={
            "custom_google": "true",
            "wait_browser": "networkidle2",
            "premium_proxy": "true",
            "js_scenario": js_reviews,
            "render_js": "true",
            'country_code': 'us'
        },
        retries=2
    )
One of them waits for the button that opens the full app description, while the other opens a pop-up with the first 20 loaded reviews (which can be extended with scrolling instructions). Because of these changes, we need two BeautifulSoup parsers and two extraction functions: one for individual data points and one for collecting all reviews.
soup_description = BeautifulSoup(response_description.text, "lxml")
soup_reviews = BeautifulSoup(response_reviews.text, "lxml")

def extract_text(selector):
    el = soup_description.select_one(selector)
    return el.get_text(strip=True) if el else None

def extract_reviews_dict():
    review_divs = soup_reviews.select("div.RHo1pe")
    return {
        f"review_{i+1}": div.get_text(strip=True)
        for i, div in enumerate(review_divs)
    }
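If you would rather have each review split into fields than one text blob, you can parse child elements inside each div.RHo1pe. The child selectors below are illustrative assumptions; verify the current class names in your browser's inspector before relying on them:
def extract_reviews_structured():
    # A sketch: split each review card into separate fields.
    # The child selectors are assumptions -- inspect the live page,
    # since Google rotates these obfuscated class names.
    reviews = []
    for div in soup_reviews.select("div.RHo1pe"):
        author = div.select_one("div.X5PpBb")  # assumed author selector
        body = div.select_one("div.h3YV2d")    # assumed review-text selector
        reviews.append({
            "author": author.get_text(strip=True) if author else None,
            "text": body.get_text(strip=True) if body else None,
        })
    return reviews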
Now, after adding additional useful parameters, our collection of Google Play data should look like this:
data = {
    "name": extract_text("span.AfwdI"),
    "rating": extract_text("div.TT9eCd"),
    "description": extract_text("div.fysCi > div:has(br)"),
    "downloads": extract_text(".wVqUob:nth-child(2) > .ClM7O"),
    "content_rating": extract_text(".wVqUob:nth-child(3) > .g1rdde > span > span"),
    "support_email": extract_text(".VfPpkd-WsjYwc.VfPpkd-WsjYwc-OWXEXe-INsAgc.KC1dQ.Usd1Ac.AaN0Dd.VVmwY:nth-child(2) .pSEeg"),
    "updated_on": extract_text(".lXlx5 + .xg1aie"),
    "tags": extract_text(".TKjAsc + .Uc6QCc"),
    "whats_new": extract_text(".c-wiz:nth-child(6) .SfzRHd > div"),
    "developer": extract_text(".sMUprd:nth-child(10) > .reAt0"),
    "android_os_requirement": extract_text(".sMUprd:nth-child(3) > .reAt0"),
    "in_app_purchase_range": extract_text(".sMUprd:nth-child(5) > .reAt0"),
    "released_on": extract_text(".sMUprd:nth-child(9) > .reAt0"),
    "reviews": extract_reviews_dict()
}
Once everything is done, close the function and call it with the desired app ID:
print(google_play_store_app_data(app_id='com.ludo.king'))
Avoid Getting Blocked While Scraping
Scraping sensitive platforms like Google and its related services can get your IP banned, even if you only target public data. Here are a few ways to ensure consistent scraping without getting blocked.
Use user-agent headers
Web servers often block requests that don’t look like they’re from real browsers. Setting a User-Agent header makes your request appear more natural. We include realistic headers by default, but just like other parameters, request headers can be adjusted. For example, here is how you can attach a specific User-Agent to your connection:
response_reviews = client2.get(
    f'https://play.google.com/store/apps/details?id={app_id}',
    params={
        "custom_google": "true",
        "wait_browser": "networkidle2",
        "premium_proxy": "true",
        "js_scenario": js_reviews,
        "render_js": "true",
        'country_code': 'us'
    },
    # Expanding the GET API request with a manually chosen User-Agent:
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    },
    retries=2
)
Add delays between requests
Sending requests too quickly can trigger anti-bot systems. Adding short, irregular delays and interactions with the page's JavaScript elements mimics human browsing and reduces the risk of blocks. It’s good practice to introduce delays, especially as your Google Play scraper grows more complex, to avoid blocks and unnecessary strain on the target servers.
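A minimal sketch with Python's built-in time and random modules (the two-to-five-second range is an arbitrary choice, not a recommendation from Google):
import random
import time

# Pause for a random 2-5 seconds between consecutive requests so the
# traffic pattern doesn't look robotic.
time.sleep(random.uniform(2, 5))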
Consider rotating proxies
Sites may block your IP if they detect repeated access. Using rotating proxies changes your IP for each request. Fortunately, if you work with ScrapingBee, our API automatically rotates proxies and handles geolocation, so you don’t have to manage proxy pools manually. Just add { "premium_proxy": "true" } to your GET API parameters.
"premium_proxy": "true"
Automate and Save Your Scraper
Once your scraper works reliably, it’s time to introduce efficient data storage and prepare it for repeated use. Wrap your logic in functions, save data persistently, and handle errors gracefully. This allows you to run your scraper repeatedly: daily, weekly, or across thousands of app IDs.
ScrapingBee supports large-scale scraping with built-in proxy rotation and JavaScript rendering, making it ideal for repeated use and automated workflows.
Wrap your code in a function
Combining the entire logic into functions makes it more structured and easier to adjust and reuse. Throughout this tutorial, we created one big function that encapsulates all steps, plus two inner functions that extract the data parsed by BeautifulSoup:
def google_play_store_app_data(app_id): the main function that covers all steps of scraping app data from the Google Play Store using ScrapingBee.
def extract_text(selector): a nested function that extracts the title and other data points (except reviews).
def extract_reviews_dict(): another inner function, which extracts visible reviews from the reviews pop-up window.
Use conditional statements to handle errors
To catch errors without wondering what went wrong, introduce conditional statements that handle temporary failures and surface HTTP status codes. For example, even a simple if statement that checks the GET API call's status code can tell you whether a persistent problem is tied to your connection or to the unavailability of Google Play servers.
# Note: response is a variable we defined to store data from the API request
if response.status_code != 200:
    return "Failed to retrieve the page."
Save data to JSON or CSV
Collecting public data means little if you cannot store it in a readable, structured format. Let's add the final lines to our main scraper function, which transform the collected data dictionary into a pandas DataFrame and save it to CSV:
df = pd.DataFrame([data])
df.to_csv("app_data.csv", index=False)
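To save JSON instead (or in addition), the standard library's json module handles the nested reviews dictionary more gracefully than a flat CSV; a minimal sketch:
import json

# Nested fields like the reviews dictionary survive better in JSON than in CSV.
with open("app_data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)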
After putting everything together, here is the example code for a functional scraper. Feel free to expand upon it, feed it lists of app IDs, or target more data points:
from scrapingbee import ScrapingBeeClient
from bs4 import BeautifulSoup
import pandas as pd

# Initialize two ScrapingBee clients to target different parts of the site with JS rendering.
# Don't forget to enter your unique API key!
client1 = ScrapingBeeClient(api_key='YOUR_API_KEY')
client2 = ScrapingBeeClient(api_key='YOUR_API_KEY')

def google_play_store_app_data(app_id):
    # Instructions to open the description pop-up
    js_description = {
        'instructions': [
            {
                'wait_for_and_click': 'button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF'
            },
            {'wait': 1000}
        ]
    }
    # Instructions to open the reviews pop-up
    js_reviews = {
        'instructions': [
            {
                'wait_for_and_click': 'button.VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.QDwDD.mN1ivc.VxpoF[aria-label="See more information on Ratings and reviews"]'
            },
            {'wait': 500}
        ]
    }
    # Send two GET API calls
    response_description = client1.get(
        f'https://play.google.com/store/apps/details?id={app_id}',
        params={
            "custom_google": "true",
            "wait_browser": "networkidle2",
            "premium_proxy": "true",
            "js_scenario": js_description,
            "render_js": "true",
            'country_code': 'us'
        },
        retries=2
    )
    response_reviews = client2.get(
        f'https://play.google.com/store/apps/details?id={app_id}',
        params={
            "custom_google": "true",
            "wait_browser": "networkidle2",
            "premium_proxy": "true",
            "js_scenario": js_reviews,
            "render_js": "true",
            'country_code': 'us'
        },
        retries=2
    )
    # Bail out early if either call failed
    if response_description.status_code != 200 or response_reviews.status_code != 200:
        return "Failed to retrieve the page."

    soup_description = BeautifulSoup(response_description.text, "lxml")
    soup_reviews = BeautifulSoup(response_reviews.text, "lxml")

    def extract_text(selector):
        el = soup_description.select_one(selector)
        return el.get_text(strip=True) if el else None

    def extract_reviews_dict():
        review_divs = soup_reviews.select("div.RHo1pe")
        return {
            f"review_{i+1}": div.get_text(strip=True)
            for i, div in enumerate(review_divs)
        }

    data = {
        "name": extract_text("span.AfwdI"),
        "rating": extract_text("div.TT9eCd"),
        "description": extract_text("div.fysCi > div:has(br)"),
        "downloads": extract_text(".wVqUob:nth-child(2) > .ClM7O"),
        "content_rating": extract_text(".wVqUob:nth-child(3) > .g1rdde > span > span"),
        "support_email": extract_text(".VfPpkd-WsjYwc.VfPpkd-WsjYwc-OWXEXe-INsAgc.KC1dQ.Usd1Ac.AaN0Dd.VVmwY:nth-child(2) .pSEeg"),
        "updated_on": extract_text(".lXlx5 + .xg1aie"),
        "tags": extract_text(".TKjAsc + .Uc6QCc"),
        "whats_new": extract_text(".c-wiz:nth-child(6) .SfzRHd > div"),
        "developer": extract_text(".sMUprd:nth-child(10) > .reAt0"),
        "android_os_requirement": extract_text(".sMUprd:nth-child(3) > .reAt0"),
        "in_app_purchase_range": extract_text(".sMUprd:nth-child(5) > .reAt0"),
        "released_on": extract_text(".sMUprd:nth-child(9) > .reAt0"),
        "reviews": extract_reviews_dict()
    }

    df = pd.DataFrame([data])
    df.to_csv("app_data.csv", index=False)
    return data

print(google_play_store_app_data(app_id='com.ludo.king'))
Ready to Scrape Smarter? Try ScrapingBee
Scraping doesn’t have to be complicated. We handle the toughest parts for you: rotating proxies, rendering JavaScript, and avoiding detection without the hassle of managing headless browsers or server setups. Whether you're scraping a simple HTML page or dynamic JavaScript content, ScrapingBee makes it seamless. Just plug in your API key and start collecting data reliably in minutes.
Ready to dive deeper? Explore our tutorials to level up your scraping projects and make the most of our powerful HTML API.
Frequently Asked Questions (FAQs)
Is it legal to scrape data from Google Play?
Generally, yes: scraping publicly available data from Google Play is considered legal in most jurisdictions, although it may violate Google's terms of service, so review them and the laws that apply to you before scraping at scale.
Can I scrape user reviews that load dynamically on the page?
Yes, you can scrape dynamically loaded user reviews by using tools that support JavaScript rendering, like ScrapingBee. By simulating clicks and adding wait times, you can access content that isn’t available in the initial HTML.
Why does my scraper get blocked or return incomplete data?
Your scraper may be getting blocked due to missing headers, too many rapid requests, or a lack of IP rotation. Incomplete data usually means the page wasn't fully rendered, so double-check your JavaScript rendering parameters.
How do I scrape multiple app pages in one run?
To scrape multiple app pages, loop through a list of app IDs and fetch each page in sequence. Make sure to add short delays between requests to reduce the risk of rate limiting from the recipient server.
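As a sketch, such a loop could reuse the google_play_store_app_data function from this guide (note that the function overwrites app_data.csv on every call, so here we collect the returned dictionaries and write one combined file; the app IDs and delay range are arbitrary examples):
import random
import time

import pandas as pd

app_ids = ["com.ludo.king", "com.whatsapp", "com.instagram.android"]  # example IDs
results = []

for app_id in app_ids:
    results.append(google_play_store_app_data(app_id))
    time.sleep(random.uniform(2, 5))  # short, randomized delay between apps

# Write one combined CSV instead of the per-app file the function saves.
pd.DataFrame(results).to_csv("all_apps.csv", index=False)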

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.