How to Scrape Yahoo: Step-by-Step Tutorial

06 September 2025 | 16 min read

Scraping Yahoo search results and finance data is a powerful way to collect real-time insights on market trends, stock performance, and company profiles. With ScrapingBee, you can extract this information easily — even from JavaScript-heavy pages that typically block traditional scrapers.

Yahoo’s dynamic content and anti-bot protections make it difficult to scrape using basic tools. But ScrapingBee handles these challenges out of the box. Our API automatically renders JavaScript, rotates proxies, and bypasses bot detection to deliver clean, structured data from both Yahoo Search and Yahoo Finance.

Whether you're tracking stock tickers, pulling company financials, or building a competitive analysis tool, this guide will walk you through how to scrape Yahoo data using our Python SDK. You’ll get actionable examples for sending API calls, extracting content with CSS selectors, and exporting it into clean, readable formats — all with minimal setup.

Quick Answer (TL;DR)

Our HTML API makes it easy to scrape Yahoo Search and Yahoo Finance with a single GET request. JavaScript rendering is enabled by default, and additional browser instructions can be defined in a "js_scenario" parameter. This way, you can access stock prices, financial information, or search results. Take advantage of our intuitive Python SDK to build a Yahoo scraper and parse clean data!
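
Here's a minimal sketch of such a request with our Python SDK (replace 'YOUR_API_KEY' with your own key; the search URL is just an example):

# Minimal sketch: one GET request through the ScrapingBee Python SDK.
# Replace 'YOUR_API_KEY' with the key from your dashboard.
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='YOUR_API_KEY')
response = client.get(
    "https://search.yahoo.com/search?p=web+scraping",
    params={"js_scenario": {"instructions": [{"wait": 3000}]}},
)
print(response.status_code)    # 200 on success
print(response.content[:500])  # first bytes of the rendered HTML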

Scraping Yahoo with ScrapingBee

Because most people reach news headlines, product listings, and services through search engines, Yahoo is an appealing target for web scraping. However, platforms like Yahoo try to limit visits coming from automated connections.

Fortunately, our HTML API is well equipped for extracting and analyzing data from Yahoo. Here are its biggest strengths, all available through our Python SDK with beginner-friendly code:

  • JavaScript Rendering. Our API loads dynamic Yahoo Search and Finance pages without extra setup.

  • Proxy Rotation. Automatic proxy management bypasses IP blocks and rate limits without any manual setup.

  • CAPTCHA Management. ScrapingBee detects and resolves Yahoo’s CAPTCHA challenges internally, keeping requests uninterrupted.

  • One-Line API Calls. A single GET request to ScrapingBee handles rendering, proxying, and headers, avoiding custom infrastructure.

  • Structured Output. Choose between raw HTML, screenshots, or structured JSON for direct parsing of Yahoo results.
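
As a quick illustration of how these options map onto a single call, here's a sketch that combines two of the parameters above, "premium_proxy" and "screenshot" (the target URL and file name are placeholders for your own use case):

# Sketch: the same client.get() call can return rendered HTML or a screenshot,
# depending on the parameters you pass (values here are illustrative).
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='YOUR_API_KEY')

# Rendered HTML through a premium proxy (JavaScript rendering is on by default)
html = client.get(
    "https://finance.yahoo.com/quote/AAPL",
    params={"premium_proxy": True},
).content

# A screenshot of the rendered page instead of HTML
png = client.get(
    "https://finance.yahoo.com/quote/AAPL",
    params={"screenshot": True},
).content
with open("quote.png", "wb") as f:
    f.write(png)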

Combined, these features create a working environment where you can build a custom scraper for market research, financial analysis, real-time stock data tracking, and other high-volume extraction tasks. To learn more about our broad range of customizable parameters, check out our extensive ScrapingBee Documentation page.

If you're web scraping Yahoo, or any high-traffic platform, for the first time, Python is the best introduction. Its clear, beginner-friendly syntax lowers the learning curve for automation and parsing tasks, and it offers extensive support for external packages, including ScrapingBee's official SDK.

Start by installing Python (3.6 or newer) on your system:

  • Straight from the website – python.org

  • Microsoft Store (simpler setup for Windows users)

  • Linux package manager (for distributions like Ubuntu, Debian, Fedora, or Arch; ensures easy updates through system packages)

Python

Python makes it straightforward to work with external libraries. Just launch your Terminal (or Command Prompt on Windows) and run pip install <package-name> to get started. For scraping Yahoo using our HTML API, you’ll want to install the following key packages:

  • scrapingbee – This Python SDK simplifies sending requests to our API. It supports JavaScript rendering, handles proxies behind the scenes, and makes it easy to interact with dynamic content without integrating headless browsers yourself.

  • pandas – A versatile data analysis library that turns raw JSON content into structured DataFrames, making it easier to clean, organize, and explore extracted data in real time.

You can install both libraries at the same time with a single pip request:

pip install scrapingbee pandas

Okay, one more step before we start working on your web scraping script. Log in or register a ScrapingBee account to get access to our API. If you've never used our services, don't worry! After registration, you will receive a one-week free trial with 1,000 credits to experience the convenience of our Python SDK.

Register

After signing up, you will be greeted by our dashboard, which provides a clear overview of your available credits, usage statistics, and account settings. Copy the unique API key from the top-right section. This key is required for authentication and usage tracking, so you will have to add it to your script.

API key

It's time to start coding! Create a directory to hold all of your scraping project files, then create a text file with a .py extension, which marks it as a Python script. For example, we named our script "Yahoo_scraper.py".

Then, open a text editor of your choice. While a simple notepad works fine, we strongly recommend a dedicated IDE that flags mistakes before you run the code, or at least highlights syntax elements in different colors.

We begin shaping the code by importing the libraries we will use to define the web scraping logic. The following section of code enables our SDK, plus additional tools from other packages:

#Importing our HTML API
from scrapingbee import ScrapingBeeClient 
# pandas dataframes for better data formatting
import pandas as pd 
# A built-in Python library to encode input parameters into a URL string
from urllib.parse import urlencode

Now, it's time to add the copied API key. We attach it to a "client" variable that initializes our Python SDK, ensuring that every request is authorized and can use the available parameters for efficient data extraction.

# Initializing our API client in the "client" variable
client = ScrapingBeeClient(api_key='YOUR_API_KEY')

The "base" variable stores the initial Yahoo search URL template, which is later modified with user input. By applying Python’s "urlencode" function, we safely append the query parameters and dinamically update our link with both the chosen search term and the results page number.

After accepting the provided parameters via Python's built-in "input" function, the new link is created by encoding the base URL. Then, we create a list that starts with the first page for the search term. The pagination logic is completed by an "if/else" statement: for each additional page, a new URL is encoded and appended to the list, based on the number of pages requested.

# The base search page URL
base = "https://search.yahoo.com/search"
# Invoking Python's built-in "input" function - the .strip() method removes surrounding spaces, which cannot exist inside the URL
search_term = input("Search_term: ").strip()
pages = int(input("How many pages to scrape? "))
# "p" is Yahoo's query parameter for the search term
params = {"p": search_term}
# Encoding the new URL
url = f"{base}?{urlencode(params)}"
url_list = [url]

if pages <= 1:
    # optional: printing the single generated URL if the user requests only one page
    print(url)
else:
    # for each additional page, append a new URL with Yahoo's "b" parameter, which sets the index of the first result (11, 21, ...)
    for page in range(2, pages + 1):
        offset = (page - 1) * 10 + 1
        next_url = f"{url}&b={offset}"
        url_list.append(next_url)
    # optional: printing to test and see how the generated URLs look
    print(url_list)

Note: Python treats lines starting with a hash symbol (#) as comments, which are ignored when the code runs. Use them as guidelines to better understand the provided code.

Set Up Your ScrapingBee Request

Now, we can start defining a function that contains our scraping logic. At the start, we define the "js_scenario" variable, which is a dictionary that contains customizable parameters for JavaScript rendering. We added a "wait" parameter, which instructs the headless browser to wait for 3 seconds so the page has enough time to load fully, reducing the risk of missing or incomplete data during extraction.

# Start the function definition, the indented lines define its boundaries
def scrape_Yahoo():
    # Rules for JavaScript Rendering
    js_scenario = {
        "instructions": [
        # Tells the headless browser to wait for 3 seconds
            {"wait": 3000}
        ]
    }

Scrape Yahoo Search Results

And here we arrive at the tricky part, so stick with us. The next variable, an "extract_rules" dictionary, tells the API how to clean up and parse the raw HTML. Before sending any requests, we define which data to extract from Yahoo with CSS selectors.

But how do we find them? Before automating access to the site, visit the Yahoo page and open developer tools by pressing the F12 key (or right-click the area and select "Inspect").

SERP

Then, as you navigate the different HTML elements in developer tools, they will be highlighted on the page. To extract data from search results, we first find a selector that encompasses all relevant data in each result.

The class attribute of each container will be our main CSS selector, which highlights the key areas and separates them from the rest of the clutter. The following image shows which <div> element contains all the relevant information. Entering the CSS selector in the search bar within developer tools shows how many instances match it (there should be one per search result).

SERP

The following section of code starts defining the extraction rules, picking this CSS selector as the main container for all extractions and preparing a list of targeted data points within each section.

    extract_rules = {
        "Search_Result": {
            # the main selector picking out each search result
            "selector": "div.dd.algo.algo-sr.relsrch.Sr",
            # extracts a list of parameters within the main selector
            "type": "list",
            "output": {
                # Output: three columns, one for each data point from every search result
                "Title": 'h3',
                "link": 'a @href',
                "description": 'div.compText'
            }
        },
    }

After defining the extraction rules, we create an empty list called "Pages_list". It supports pagination by collecting the results from each scraped page, so the pandas library can later restructure them into columns for an easy-to-read CSV file that you can open in Google Sheets, Excel, or any other editor of your choice.

    Pages_list=[]

Now comes the main for loop. The following section issues one GET API call for each generated link, attaching the previously defined parameters plus some additional instructions for our API client.

    for page_url in url_list:
        response = client.get(
            page_url,
            params={
                "extract_rules": extract_rules,
                "js_scenario": js_scenario,
                'premium_proxy': 'True',
                'block_resources' : 'False'
                }
            )

Let's take a closer look at each parameter:

  • extract_rules – Defines CSS/XPath selectors for targeted data extraction from the page.

  • js_scenario – A script of browser actions (clicks, inputs, waits) to run before scraping.

  • premium_proxy – Boolean flag to use our network of fast, residential-quality proxies.

  • block_resources – Blocks images, CSS, or other heavy assets to speed up requests.
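
For pages that need more than a simple pause, "js_scenario" can chain several instructions. The sketch below uses instruction types from our js_scenario documentation; the "#load-more" selector is hypothetical and stands in for a real element on your target page:

# Illustrative js_scenario chaining several browser actions before extraction.
# The "#load-more" selector is hypothetical -- substitute a real element.
js_scenario = {
    "instructions": [
        {"wait": 2000},                # let the initial page settle
        {"scroll_y": 1000},            # scroll down to trigger lazy-loaded content
        {"click": "#load-more"},       # click a (hypothetical) "load more" button
        {"wait_for": "div.compText"},  # wait until result descriptions are present
    ]
}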

Then, after each API call, the loop stores the JSON response in the "result" variable. Next, the "df" variable restructures that result into a pandas DataFrame, which is appended to "Pages_list". The last variable, "Pages_DataFrame", concatenates all DataFrames in the list into one data set.

        # stores the response in JSON format
        result = response.json()
        # transforms each result into a pandas DataFrame
        df = pd.DataFrame(result['Search_Result'])
        # append it to the previously defined list
        Pages_list.append(df)
    # Loop ends; the line below connects all results into one DataFrame
    Pages_DataFrame = pd.concat(Pages_list, ignore_index=True)
    # Exporting data to a CSV file
    Pages_DataFrame.to_csv("result.csv", index=False)
    print(Pages_DataFrame)

After the last section, we close the scraping function definition, and the last line in the script calls it:

scrape_Yahoo()

If everything is done correctly, your result should look like this example of a "what is web scraping" search term targeting two pages:

Table

Scrape Yahoo Finance Data

Using the same rules and principles, we can also extract Yahoo Finance data. Let's expand our script with an additional function that collects stock market information.

Before calling the "scrape_Yahoo" function, we add another function definition containing the logic for financial web scraping. In it, we use similar URL-creation logic to target the desired data from Yahoo Finance:

def scrape_Yahoo_finance():
    # Accepts user input of a ticker symbol for a particular stock
    finance_term = input("Finance_term: ").strip()
    base_finance = "https://finance.yahoo.com/quote/"
    url = f"{base_finance}{finance_term}"
    print(url)

After that, we have the previously used "js_scenario" and "extract_rules" variables. The JavaScript rendering rules remain the same, because we did not encounter any extra challenges scraping financial data. However, the "extract_rules" section is completely different, targeting CSS selectors from the stock data summary on the quote page:

    js_scenario = {
        "instructions": [
            {"wait": 3000}
        ]
    }
    extract_rules = {
        "Finance_Result": {
            # the main selector picking out data from the Yahoo Finance summary
            "selector": "div.container.yf-1qull9i.split-panel > ul",
            # extracts a list of parameters within the main selector
            "type": "list",
            "output": {
                "Previous Close": 'span[title="Previous Close"] + span.value',
                "Open": 'span[title="Open"] + span.value',
                "Bid": 'span[title="Bid"] + span.value',
                "Ask": 'span[title="Ask"] + span.value',
                "Day's Range": 'span[title="Day\'s Range"] + span.value',
                "52 Week Range": 'span[title="52 Week Range"] + span.value',
                "Volume": 'span[title="Volume"] + span.value',
                "Avg. Volume": 'span[title="Avg. Volume"] + span.value',
                "Market Cap (intraday)": 'span[title="Market Cap (intraday)"] + span.value',
                "Beta (5Y Monthly)": 'span[title="Beta (5Y Monthly)"] + span.value',
                "PE Ratio (TTM)": 'span[title="PE Ratio (TTM)"] + span.value',
                "EPS (TTM)": 'span[title="EPS (TTM)"] + span.value',
                "Earnings Date": 'span[title="Earnings Date"] + span.value',
                "Forward Dividend & Yield": 'span[title="Forward Dividend & Yield"] + span.value',
                "Ex-Dividend Date": 'span[title="Ex-Dividend Date"] + span.value',
                "1y Target Est": 'span[title="1y Target Est"] + span.value'
            }
        },
    }

Note: To keep your first Yahoo scraper simple, we only defined logic to target financial data for one URL here, but you can reuse the same pagination approach to extract data from multiple Yahoo Finance URLs.

Note: Even though we use the same variable names, our script will have two function definitions: one for search results and the other for financial data. After accepting user input on which function to call, the other one is ignored unless you change the script to run both at the same time (see the sketch below).
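
If you do want both data sets in one run, the simplest tweak is to call the two functions back to back instead of branching, as in this sketch (assuming the function definitions from this tutorial):

# Sketch: run both scrapers sequentially instead of choosing one.
scrape_Yahoo()          # writes result.csv
scrape_Yahoo_finance()  # writes result_finance.csv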

Once the hard part is done, we define the API call again, and the remaining steps are near identical:

    response = client.get(
        url,
        params={
            "extract_rules": extract_rules,
            "js_scenario": js_scenario,
            'premium_proxy': 'True',
            'block_resources': 'False'
        }
    )
    result = response.json()
    print(result)
    df = pd.DataFrame(result['Finance_Result'])
    df.to_csv("result_finance.csv", index=False)

Now, after closing the function, the last section of code introduces the logic to pick which scraper to use: extracting search results or scraping stock data from Yahoo Finance.

Search_option = int(input("1: Scrape search results\n2: Scrape Finance\n\n:"))
if Search_option == 1:
    # if you enter "1", call the search pages function
    scrape_Yahoo()
elif Search_option == 2:
    # if you enter "2", call the Yahoo Finance function
    scrape_Yahoo_finance()

In the end, we have a multifunctional scraper. Running the script allows you to choose between the two defined options. Here is an example of choosing Yahoo Finance data and targeting Apple's stock:

Data

After the extraction, you will see scraped financial data in your terminal:

Data

The same stock market data will be in your "result_finance.csv" file as well:

Table

Final Code Example: Yahoo Scraper with ScrapingBee

If you feel confused or lost after this tutorial, please don't worry! The code below contains a functional prototype of our multifunctional scraper that lets you target search results and scrape stock market data. Our HTML API is very flexible, and the code can be expanded to scrape more financial statements, news, service data, and other details that fit your use case.

#Importing our HTML API
from scrapingbee import ScrapingBeeClient 
# pandas dataframes for better data formatting
import pandas as pd 
# A built-in Python library to encode input parameters into a URL string
from urllib.parse import urlencode 

# Initializing our API client in the "client" variable
client = ScrapingBeeClient(api_key='YOUR_API_KEY')



def scrape_Yahoo():
    base = "https://search.yahoo.com/search?p="

    search_term = input("Search_term: ").strip()
    pages = int(input("How many pages to scrape? "))

    params = {"search_terms": search_term}
    url = f"{base}?{urlencode(params)}"
    url_list=[url]

    if pages <= 1:
        print(url)
    else:
        # for each additional page, append a new URL with Yahoo's "b" result-offset parameter (11, 21, ...)
        for page in range(2, pages + 1):
            offset = (page - 1) * 10 + 1
            next_url = f"{url}&b={offset}"
            url_list.append(next_url)
        print(url_list)




    js_scenario = {
        "instructions": [
            {"wait": 3000}
        ]
    }

    extract_rules = {
        "Search_Result": {
            # the main selector picking out each search result
            "selector": "div.dd.algo.algo-sr.relsrch.Sr",
            # extracts a list of parameters within the main selector
            "type": "list",
            "output": {
                # Output: three columns, one for each data point from every search result
                "Title": 'h3',
                "link": 'a @href',
                "description": 'div.compText'
            }
        },
    }

    Pages_list=[]
    for page_url in url_list:
        response = client.get(
            page_url,
            params={
                "extract_rules": extract_rules,
                "js_scenario": js_scenario,
                'premium_proxy': 'True',
                'block_resources' : 'False'
                }
            )
        # stores the response in JSON format
        result = response.json()
        # transforms each result into a pandas DataFrame
        df = pd.DataFrame(result['Search_Result'])
        # append it to the previously defined list
        Pages_list.append(df)
    # Loop ends; the line below connects all results into one DataFrame
    Pages_DataFrame = pd.concat(Pages_list, ignore_index=True)
    # Exporting data to a CSV file
    Pages_DataFrame.to_csv("result.csv", index=False)
    print(Pages_DataFrame)

def scrape_Yahoo_finance():
    finance_term = input("Finance_term: ").strip()
    base_finance = "https://finance.yahoo.com/quote/"
    url = f"{base_finance}{finance_term}"
    print(url)

    js_scenario = {
        "instructions": [
            {"wait": 3000}
        ]
    }
    extract_rules = {
        "Finance_Result": {
            # the main selector picking out data from the Yahoo Finance summary
            "selector": "div.container.yf-1qull9i.split-panel > ul",
            # extracts a list of parameters within the main selector
            "type": "list",
            "output": {
                "Previous Close": 'span[title="Previous Close"] + span.value',
                "Open": 'span[title="Open"] + span.value',
                "Bid": 'span[title="Bid"] + span.value',
                "Ask": 'span[title="Ask"] + span.value',
                "Day's Range": 'span[title="Day\'s Range"] + span.value',
                "52 Week Range": 'span[title="52 Week Range"] + span.value',
                "Volume": 'span[title="Volume"] + span.value',
                "Avg. Volume": 'span[title="Avg. Volume"] + span.value',
                "Market Cap (intraday)": 'span[title="Market Cap (intraday)"] + span.value',
                "Beta (5Y Monthly)": 'span[title="Beta (5Y Monthly)"] + span.value',
                "PE Ratio (TTM)": 'span[title="PE Ratio (TTM)"] + span.value',
                "EPS (TTM)": 'span[title="EPS (TTM)"] + span.value',
                "Earnings Date": 'span[title="Earnings Date"] + span.value',
                "Forward Dividend & Yield": 'span[title="Forward Dividend & Yield"] + span.value',
                "Ex-Dividend Date": 'span[title="Ex-Dividend Date"] + span.value',
                "1y Target Est": 'span[title="1y Target Est"] + span.value'
            }
        },
    }
    response = client.get(
        url,
        params={
            "extract_rules": extract_rules,
            "js_scenario": js_scenario,
            'premium_proxy': 'True',
            'block_resources': 'False'
        }
    )
    result = response.json()
    print(result)
    df = pd.DataFrame(result['Finance_Result'])
    df.to_csv("result_finance.csv", index=False)
Search_option = int(input("1: Scrape search results\n2: Scrape Finance\n\n:"))
if Search_option == 1:
    # if you enter "1", call the search pages function
    scrape_Yahoo()
elif Search_option == 2:
    # if you enter "2", call the Yahoo Finance function
    scrape_Yahoo_finance()

Use ScrapingBee for Effortless Yahoo Scraping

Scraping Yahoo doesn't have to be hard. As you can see, with our flexible API, you can collect data from Yahoo Search or Finance using a single API call, while we take care of proxy connections, CAPTCHAs, and headless browser navigation. Just send a GET request with our customizable parameters, and we will handle the common obstacles found on popular websites.

Whether you're tracking stock prices, analyzing competitors, or researching search visibility, our platform gives you clean, structured HTML or JSON results. Register today, use your 1,000 free credits, and modify our ready-to-use script to fit your use case!

Frequently Asked Questions (FAQs)

Can I scrape Yahoo Search using Python and ScrapingBee?

Yes. Our Python SDK lets you scrape Yahoo Search with just a few lines of code. It handles JavaScript rendering, proxies, and CAPTCHAs automatically, creating a reliable environment for data extraction, as described in our Python Web Scraping Guide.

Does Yahoo block scrapers?

Yes, Yahoo blocks scrapers by detecting bot-like behavior, rate-limiting IPs, and showing CAPTCHAs. Our API bypasses these challenges and keeps public-data web scraping running with built-in proxy rotation and headless browser rendering.

What data can I get from Yahoo Finance?

You can extract stock prices, market caps, P/E ratios, earnings dates, dividend yields, and other financial information in real time, while our API takes care of JavaScript rendering and other connection details.

Is it legal to scrape Yahoo?

Yes, scraping publicly available Yahoo data is legal. However, always review Yahoo's Terms of Service and seek legal advice before scraping at scale or for commercial use.

Kevin Sahin

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.