Learning how to scrape Yellow Pages can unlock access to a rich database of business listings. With minimal technical knowledge, our approach to scraping HTML content extracts data that you can use for lead generation, market research, or local SEO.
Like most online platforms rich with useful business data, Yellow Pages presents JavaScript-rendered content and anti-scraping measures that often stop traditional scraping efforts. Our HTML API is built to export data while handling these restrictions automatically – it loads dynamic content and applies smart proxy rotation to ensure consistent access with minimal coding required.
Whether you’re a developer building a business directory or just automating data collection for internal use, this guide will help you extract key details like business names, addresses, phone numbers, and websites from Yellow Pages using our beginner-friendly Python SDK. Let's get to work!
Quick Answer (TL;DR)
You can scrape Yellow Pages using our API by making a single API call with JavaScript rendering and stealth proxy configurations, plus a few CSS selectors. That's it! Our tools automatically bypass anti-bot protections and extract business data like names, phone numbers, addresses, and categories in a clean format.
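Here is a minimal sketch of what that call can look like (the API key is a placeholder; the URL and CSS selectors are the same ones used later in this guide):
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='YOUR_API_KEY')  # placeholder – use your own key
response = client.get(
    "https://www.yellowpages.com/search?search_terms=Dentist&geo_location_terms=Denver",
    params={
        "js_scenario": {"instructions": [{"wait": 2000}]},  # render JS and wait for listings to load
        "stealth_proxy": "True",  # bypass Cloudflare-style anti-bot checks
        "extract_rules": {  # return only the business data we care about
            "Page list": {
                "selector": "div.search-results.organic div.v-card",
                "type": "list",
                "output": {
                    "Title": "h2",
                    "Phone number": "div.phones.phone.primary",
                    "Address": "div.adr"
                }
            }
        },
    },
)
print(response.json())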
Need more help? Check out our in-depth page on a Yellow Pages Scraping API!
Scrape Yellow Pages Listings Using ScrapingBee
Unlock the full potential of Yellow Pages data with our all-in-one scraping solution. Here you will find a fully annotated script that's easy to adapt for extracting business details.
With our tools, you can forget about most scraping headaches. We handle proxy management, JavaScript rendering, and smart request routing for you. That means you can focus entirely on collecting business names, phone numbers, addresses, and website URLs—delivered in a clean, readable format.
Yellow Pages listings load dynamically through JavaScript, and anti-bot measures can block basic scrapers. Let's address both issues by building a Yellow Pages scraper from scratch. If you want to learn more about how our services handle scraping connections and bypass anti-scraping measures on the target URL, check out our blog article on Web Scraping Without Getting Blocked.
Get Started with Python on Your Device
Start by making sure Python 3.6 or newer is installed on your machine. You can get it from python.org or install it directly from the Microsoft Store by searching for “Python.”
Python makes it easy to install external libraries through its package manager, pip. To use it, open your Terminal (Command Prompt on Windows) and type pip install <package name>. For Yellow Pages scraping, here are the main packages you will need to scrape data with our HTML API:
scrapingbee – our Python Software Development Kit (SDK) that can send API calls, handle headless browsers, route connection through remote proxy IPs, and interact with JavaScript elements on the site.
pandas – a powerful toolkit for data analysis and manipulation, featuring structures like DataFrames that reorganize business information from HTML content into a readable and understandable format.
To install these packages, open your Terminal, or Command Prompt on Windows devices, and enter this pip command:
pip install scrapingbee pandas
Set Up ScrapingBee and Your Environment
Now, head over to our website and create or log in to your ScrapingBee account to retrieve the API key. After your first registration, you will receive 1,000 free credits as a 7-day free trial to test your web scraping skills, scale connection requests, and explore additional features.
Once that is taken care of, you will be greeted by our dashboard, which displays the available resources and usage data tied to your account. Copy the API key – we will attach it to a Python variable that authorizes our API in your script.
For our next step, pick an appropriate folder for your project, and create a text file that will store our web scraping logic, for example Yellow_pages_scraper.py, with the extension indicating that it is a Python file.
Note: Python files can be edited in any text editor, but we highly recommend using a dynamic IDE like VSCode, where IntelliSense can flag code mistakes in real time – a great way to learn coding and only send properly structured API requests so you don't waste credits.
Then, begin your script by importing the downloaded external libraries. We also added the urllib.parse module for a more comfortable conversion and generation of desired URLs (more on URLs in the Pagination section).
#Importing our HTML API
from scrapingbee import ScrapingBeeClient
# pandas dataframes for better data formatting
import pandas as pd
# An internal Python library to integrate input parameters into a URL string
from urllib.parse import urlencode
Note: Python comment lines, starting with a hash symbol or triple quotes, are ignored by the interpreter. Our code examples use them to explain and clarify written code.
After that, we create a "client" variable which initializes our scraping client with the previously copied key to authorize access to the API. Then, we add a base URL that will be configured based on your search query:
# Initializing our API client in the "client" variable
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
# base URL
base = "https://www.yellowpages.com/search"
If you want to learn more about how each of our tools and parameters works, check out the ScrapingBee Documentation. Once the setup is in order, we can start working on the actual scraping logic and send the first API requests.
Make an API Call to Yellow Pages
Because the Yellow Pages platform is kind enough to store search query information within the URL, we can start our script by asking for user input and assigning it to variables.
# Enter a profession, company, or any other entity that you are looking for
search_term = input("Search_term: ").strip()
# Choose a location
location_term = input("Location_term: ").strip()
# Yellow Pages shows 30 listings per page; the Pagination variable lets us specify how many pages to target
Pagination = int(input("How many pages to scrape? "))
Note: Ignore the "Pagination" variable for now – we will come back to it after building a simple web scraper that just extracts raw HTML content.
After collecting our input, the next step is to build an appropriate URL that targets that specific results page. Below is a section which constructs it based on what was entered in the input function calls:
# search and location parameters
params = {"search_terms": search_term, "geo_location_terms": location_term}
# encoding the search parameters into the base URL
url = f"{base}?{urlencode(params)}"
For example, if you enter "Dentist" as your search term and Denver as the location, the url variable will automatically be transformed to look like this:
https://www.yellowpages.com/search?search_terms=Dentist&geo_location_terms=Denver
With that out of the way, we create a function, scrape_Yellow_Pages(), that will contain all the logic for your Yellow Pages scraper. To extract data using custom JavaScript rendering rules, the following dictionary variable "js_scenario" covers instructions for how a headless browser should behave. For now, all we are going to do is add an instruction to wait for 2 seconds before extracting data, to make sure all of it is loaded.
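If you are coding along, the function can open like this, with the snippets below sitting inside it (the complete, indented version appears in the full script at the end of this guide):
def scrape_Yellow_Pages():
    # everything that follows – js_scenario, the GET call, and the parsing logic – goes inside this function
    ...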
js_scenario = {
"instructions": [
{"wait": 2000}
]
}
If you want to learn more about customizable scenarios, check out our page about JavaScript Scenario Support. The next step is defining the variable that will store the data exported from Yellow Pages through a GET API call.
response = client.get(
url,
params={
"js_scenario": js_scenario,
'stealth_proxy': 'True',
}
)
You've probably already noticed that we have snuck in an additional parameter – 'stealth_proxy': 'True'. That is because visiting the page without it triggered the Cloudflare anti-bot measures. Usually, proxy support with a 'premium_proxy' variable is enough for penetrating these hurdles, but sometimes 'stealth_proxy' is needed for sites that do not accept regular proxy connections. Add this parameter and we will take care of the rest!
After defining the GET API call, we store the outcome in a "result" variable, which takes the data from the response and structures it in JSON format. Now all that is left is to print out the data, close the function, and call it:
result=(response.json())
print(result)
scrape_Yellow_Pages()
Call your function and enter the desired search information. The scraper should print a result that looks like this:
Of course, the HTML code with all of its tags is not useful for research in any way. Let's go back and add another dictionary to the GET API call, named "extract_rules". It will store CSS selectors that ensure only relevant information is extracted from the web page.
To find the appropriate selectors, go to Yellow Pages in your browser, enter the same URL, and press F12, or right-click and select Inspect. Here we will identify the sections that contain relevant data for each listing card, as well as CSS selectors for specific data points like business name, categories, phone number, address, and experience.
The following image highlights a section where organic listing cards (no ads included) are all grouped together.
Within it, we must find a CSS selector that separates each individual business listing card. That will be the main selector for our "extract_rules" definition. By picking the first card on the list, we can see that it is a div with a v-card class:
To copy a specific selector, find the div in the HTML code, right-click it, and select Copy - Copy selector. Within developer tools, there is also a search bar that can be accessed by pressing Ctrl+F; if you paste a selector there, it will highlight every element on the page that matches it. Alternatively, just use the CSS selector visible in the image above: "div.v-card".
However, both of these options pose a problem. Here are the two CSS selectors in question – one visible above the selected HTML section, and one copied from the first listing card with the Copy selector feature:
• div.v-card – selector works, but selects cards outside the "organic" range, including ads and featured listings.
• #lid-7870042 > div > div – a copied selector which only targets a listing card with a specific ID.
However, by combining these two observations, we can create a selector that only targets .v-card class elements within the organic listings section:
div.search-results.organic div.v-card
If we put this selector into the developer tools' search bar, we can see that it only returns 30 results – the exact number of business listings shown on one page.
Within "extract_rules" we define a dictionary called "Page list". Its first selector will be the container for each result, while a "list" type tells the API to return more than one card. Then, the output section extracts specific data points within each container. Following the same principles, we find CSS selectors for Business titles, categories, phone numbers, addresses, and experience in the field:
extract_rules = {
"Page list": { # A label for the group of extracted data
"selector": "div.search-results.organic div.v-card",
# CSS selector for the "container" of each result (a business card entry).
"type": "list",
# Means: return multiple items (a list of results), not just one.
"output": {
# Inside each "v-card", extract the following fields:
"Title": 'h2', # Business name/title
"Categories": 'div.categories', # Categories it belongs to
"Phone number": 'div.phones.phone.primary', # Main phone number
"Address": 'div.adr', # Address text
"Experience": 'div.years-in-business' # "Years in business" info
}
},
}
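Don't forget to pass this new dictionary to the API call. The GET request from earlier now takes extract_rules as an extra parameter, exactly as it appears in the full script at the end of this guide:
response = client.get(
    url,
    params={
        "extract_rules": extract_rules,  # the CSS selectors defined above
        "js_scenario": js_scenario,
        'stealth_proxy': 'True',
    }
)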
Now, to improve readability of data scraped from Yellow Pages, let's go back to the end of our script, right before the function definition is closed.
Parse and Store the Result
Do you remember the "result" variable that stored the raw HTML data? Add the following lines to use the pandas library and build a clean data set from the data extracted with CSS selectors:
df=pd.DataFrame(result['Page list'])
print(df)
Let's run the script once again to see if there is any improvement:
Now that's a massive difference! Let's add an extra line to export business data from the Yellow Pages scraper into a CSV file:
df.to_csv("YellowPages_extraction.csv", index=False)
If everything was done correctly, you should have a clean exported data set:
And that's it! Isn't that convenient? By following the same principles, you can also work on Scraping an E-commerce Product Page, travel aggregator platforms, or any page that fits your use case. The extracted CSV file can be opened instantly in Excel for further analysis.
Handle Pagination and Anti-Bot Protections
Now we can finally return to the "Pagination" variable, which defines how many pages you want to target. Add the following section of code before the function definition, which will create a new URL for each page and group them all together in a list:
url_list=[url] # URL list definition, starts with our first page already in it
# if only one page is requested, keep just the base URL
if Pagination<=1:
    pass
# with each additional page, append the list of URLs, but change it by adding &page=n
else:
    for i in range(2, Pagination+1):  # Pagination=1 -> only the base URL
        next_url = f"{url}&page={i}"
        url_list.append(next_url)
print(url_list)
#print(url)
As for the "js_scenario" variable, the Yellow Pages scraper doesn't need any extra instructions. If you encounter a page that requires more JS rendering customization, check out our page on How to Use js_scenario.
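As an illustration only, a more involved scenario for a hypothetical page could chain several instructions – the selectors below are made up, so adapt them to whatever page you are targeting:
js_scenario = {
    "instructions": [
        {"wait": 2000},                     # let the initial content load
        {"click": "#load-more-button"},     # hypothetical "load more" button
        {"wait_for": "div.extra-results"},  # hypothetical container that appears after the click
        {"scroll_y": 1000}                  # scroll down to trigger lazy-loaded elements
    ]
}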
To connect multiple pages into one data set, add an empty list right after the "js_scenario" definition:
Pages_list=[]
Then, we change the structure of the GET API call, adding it into a "for" loop that will cycle through our list of newly generated URLs. The "result" variable is now within the loop, and the "Pages_list" gets appended with each new DataFrame "df":
# List of results for each page
Pages_list=[]
# For loop going through the list of URLs generated by your input
for urls in url_list:
    response = client.get(
        #iterating through pages within the URL list
        urls,
        params={
            "extract_rules": extract_rules,
            "js_scenario": js_scenario,
            'stealth_proxy': 'True',
        }
    )
    result=(response.json())
    df=pd.DataFrame(result['Page list'])
    #appending the list of pages with a DataFrame for each page
    Pages_list.append(df)
After closing the loop, there is another new variable – "Pages_DataFrame", which connects business data from all pages into one. Then we print out the result and export data into a CSV file. Now the end of your code should look like this:
#uses concatenation to merge all DataFrames from the Pages_List
Pages_DataFrame= pd.concat(Pages_list, ignore_index=True)
print(Pages_DataFrame)
Pages_DataFrame.to_csv("YellowPages_extraction.csv", index=False)
scrape_Yellow_Pages()
Once everything is in order, you should retrieve a working data set, and you can test the Yellow Pages scraper with different parameters. Below is an example of an extracted CSV file after targeting 2 pages with dentists from Miami:
Full Code Example (Python)
And finally, we are done! The following section contains the full code used in this tutorial, including the for loops for targeting multiple pages at once and exporting the combined results as one organized data set. Feel free to copy it and make adjustments.
#Importing our HTML API
from scrapingbee import ScrapingBeeClient
# pandas dataframes for better data formatting
import pandas as pd
# An internal Python library to integrate input parameters into a URL string
from urllib.parse import urlencode

# Initializing our API client in the "client" variable
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
base = "https://www.yellowpages.com/search"

search_term = input("Search_term: ").strip()
location_term = input("Location_term: ").strip()
Pagination = int(input("How many pages to scrape? "))

params = {"search_terms": search_term, "geo_location_terms": location_term}
url = f"{base}?{urlencode(params)}"

url_list = [url]
# if only one page is requested, keep just the base URL
if Pagination <= 1:
    pass
# with each additional page, append the list of URLs, but change it by adding &page=n
else:
    for i in range(2, Pagination + 1):  # Pagination=1 -> only the base URL
        next_url = f"{url}&page={i}"
        url_list.append(next_url)
print(url_list)
#print(url)

def scrape_Yellow_Pages():
    extract_rules = {
        "Page list": {
            "selector": "div.search-results.organic div.v-card",
            "type": "list",
            "output": {
                "Title": 'h2',
                "Categories": 'div.categories',
                "Phone number": 'div.phones.phone.primary',
                "Address": 'div.adr',
                "Experience": 'div.years-in-business'
            }
        },
    }
    js_scenario = {
        "instructions": [
            {"wait": 2000}
        ]
    }
    # List of results for each page
    Pages_list = []
    # For loop going through the list of URLs generated by your input
    for urls in url_list:
        response = client.get(
            # iterating through pages within the URL list
            urls,
            params={
                "extract_rules": extract_rules,
                "js_scenario": js_scenario,
                'stealth_proxy': 'True',
            }
        )
        result = response.json()
        df = pd.DataFrame(result['Page list'])
        # appending the list of pages with a DataFrame for each page
        Pages_list.append(df)
    # uses concatenation to merge all DataFrames from Pages_list
    Pages_DataFrame = pd.concat(Pages_list, ignore_index=True)
    print(Pages_DataFrame)
    Pages_DataFrame.to_csv("YellowPages_extraction.csv", index=False)

scrape_Yellow_Pages()
If you want to extract data from other pages, you can apply the same principles from this example, while we will take care of JS rendering and anti-bot bypass measures for you!
Start Scraping Business Leads Today
Why struggle with JavaScript rendering and anti-bot challenges when you can let our API handle it for you? With our tools, Yellow Pages scraping is fast, clean, and headache-free: no headless browsers, no CAPTCHAs, no wasted time. Whether you need one page or a hundred, go to the ScrapingBee Signup Page and start using the optimal solution for reliable business lead extraction!
Frequently Asked Questions (FAQs)
How do I scrape phone numbers from Yellow Pages?
You can use our extract_rules parameter with CSS selectors to pull phone numbers directly from Yellow Pages listings. It’s fast, accurate, and doesn’t require HTML parsing, which makes Data Extraction with ScrapingBee very simple!
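For example, a stripped-down rule set that grabs only the phone numbers from the listing cards used in this tutorial could look like this:
extract_rules = {
    "Phone numbers": {
        # same organic-card selector as in the tutorial, narrowed down to the phone element
        "selector": "div.search-results.organic div.v-card div.phones.phone.primary",
        "type": "list"
    }
}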
Can I scrape multiple Yellow Pages pages at once?
Yes. Our API supports pagination and concurrent requests, so you can collect business data across several pages quickly and efficiently. In the tutorial, we only used a pagination approach, but you can find more details on running parallel scraping sessions on our How to Use Concurrency page.
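As a rough sketch – assuming the client, extract_rules, js_scenario, and url_list variables from the tutorial above – several pages could be fetched in parallel with Python's built-in thread pool:
from concurrent.futures import ThreadPoolExecutor

def fetch_page(page_url):
    # one API call per page URL
    response = client.get(
        page_url,
        params={"extract_rules": extract_rules, "js_scenario": js_scenario, "stealth_proxy": "True"}
    )
    return response.json()["Page list"]

with ThreadPoolExecutor(max_workers=5) as pool:
    # results is a list of "Page list" entries, one per page, ready for pandas
    results = list(pool.map(fetch_page, url_list))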
Does ScrapingBee support JavaScript-based Yellow Pages content?
Absolutely! Yellow Pages listings are rendered with JavaScript, and our API handles it out of the box with the render_js=True parameter enabled by default – no headless browser setup required. You can learn more about JS scenarios in our article – Understanding JavaScript Rendering.
Will I get blocked by Yellow Pages while scraping?
Most likely not. Our API uses smart proxy rotation and stealth settings to avoid common anti-bot protections like Cloudflare. Even on well-guarded web pages, you can keep Avoiding Blocks with ScrapingBee thanks to our stealth_proxy configuration.

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.