In this blog post, I'll show you how to scrape Google News using Python and our Google News API, even if you're not a Python developer. You'll start with the straightforward RSS feed URL method to grab news headlines as structured XML. Then I'll show you how ScrapingBee's web scraping API, with its built-in Google News support and IP rotation, can extract public data at scale.
By the end of this guide, you'll have easy access to every news title you need without getting bogged down in complex infrastructure. Let's begin!
Quick Answer (TL;DR)
Need a quick start? The most efficient way to scrape is with ScrapingBee’s Google News Scraper API. It handles JavaScript rendering, proxy rotation, user agent management, and parsing for you.
This web scraping tool saves you from dealing with anti-bot measures, raw HTTP requests, and HTML data extraction. Use the code below to give it a try:
from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='YOUR-API-KEY')

response = client.get(
    url="https://app.scrapingbee.com/api/v1/store/google",
    params={
        "search": "artificial intelligence",
        "search_type": "news",
        "country_code": "us"
    }
)

print(response.json()['organic_results'])
This approach to scraping news results will help you gather valuable data for your projects without building your own scraping infrastructure or worrying about handling response codes yourself.
Set Up Your Python Environment
Before diving deeper into building your Google News scraper, let’s establish a clean, isolated environment for your project. This step is crucial because it ensures your scraper works consistently across different platforms (Windows, macOS, Linux) and prevents dependency conflicts with other projects.
Let’s walk through the setup process to create a clean, isolated environment for your Python web scraping project.
Get your API key
To use ScrapingBee, you’ll need an API key:
Sign up at ScrapingBee.com
Navigate to your dashboard
Copy your API key from the account section
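Rather than hard-coding the key into your scripts, you may prefer to keep it in an environment variable. Here's a minimal sketch; the variable name SCRAPINGBEE_API_KEY is just an example I'm introducing for illustration:

import os

from scrapingbee import ScrapingBeeClient

# Read the key from an environment variable instead of hard-coding it
# (SCRAPINGBEE_API_KEY is an arbitrary name chosen for this example)
api_key = os.environ["SCRAPINGBEE_API_KEY"]
client = ScrapingBeeClient(api_key=api_key)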
Install Python and required libraries
Before creating a virtual environment, ensure you have Python installed on your system. Download the latest version from python.org if needed. Verify your installation by running this command in your terminal window:
python --version
# or on some systems
python3 --version
You should see output showing Python 3.6 or higher. Next, make sure pip (Python’s package manager) is installed and updated:
python -m pip install --upgrade pip
Create and activate a virtual environment
Now that you have Python installed and ready, you can create a new environment by running these commands in your project directory:
# For Windows
python -m venv venv
venv\Scripts\activate
# For macOS/Linux
python -m venv venv
source venv/bin/activate
You’ll notice your terminal window command prompt changes to indicate the active environment, showing you’re working in an isolated space.
Install feedparser and pyshorteners
With your environment activated, you can now install the libraries needed for your web scraper:
pip install feedparser pyshorteners
Feedparser will handle the RSS parsing, while pyshorteners will help create more readable links in your output. The requests library will handle HTTP requests when we need to go beyond RSS feeds. You can verify the installations with:
pip list
Great, your environment is ready. It's time to get into scraping.
Scrape Google News Using RSS Feeds
Did you know that the Google News website provides a simpler backdoor for data extraction through RSS feeds? In my years of large-scale scraping, I've found RSS feeds to be like that reliable old friend who doesn't have all the fancy features but gets the job done consistently. They deliver clean, structured data that's significantly easier to parse than the complex HTML of the Google News page.
What makes this approach particularly valuable for scraping news is its stability. Since RSS is an officially supported format, it’s less likely to break when Google updates their site. As a result, you’ll get standardized fields for titles, links, publication dates, and news sources.
Also, the data comes in a predictable format that remains consistent across different search result pages. Whether you’re monitoring specific topics or keywords over time, this method is perfect for extracting data from Google News articles.
Let’s explore how to access and leverage these feeds for your scraping projects.
Find the correct RSS feed URL
The first step of accessing data via RSS is constructing the appropriate RSS feed URL. Here is the base format:
https://news.google.com/rss/search?q=your+search+term
Now you can launch a topical search to scrape Google News results. Simply replace “your+search+term” with your keywords (use the + sign for spaces).
I’ve found that being specific with your search terms yields better news results. For example, “electric vehicle batteries” will give you more focused content than just “electric vehicles” when you scrape Google for specific news articles.
Use these parameters to further refine your search results:
hl=[language-code] – Interface language (e.g., en-US, fr, es)
gl=[country-code] – Geographic location (e.g., US, UK, CA)
ceid=[country]:[language] – Edition ID
For instance, to get Canadian tech news in French:
https://news.google.com/rss/search?q=technologie&hl=fr-CA&gl=CA&ceid=CA:fr
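If you'd rather build these URLs programmatically than by hand, here's a minimal sketch using Python's standard urllib.parse module (build_feed_url is a helper name introduced here for illustration):

from urllib.parse import urlencode

def build_feed_url(query, hl="en-US", gl="US", ceid="US:en"):
    # urlencode handles spaces and special characters in the search term
    params = {"q": query, "hl": hl, "gl": gl, "ceid": ceid}
    return f"https://news.google.com/rss/search?{urlencode(params)}"

# Reproduces the Canadian French tech news example above
print(build_feed_url("technologie", hl="fr-CA", gl="CA", ceid="CA:fr"))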
Once you’ve constructed the proper source URL, you’re ready to retrieve and process the RSS feed data.
Parse the feed using feedparser
The feedparser library transforms complex RSS XML into Python-friendly objects without you having to worry about XML parsing intricacies.
First, install the library in your project directory:
pip install feedparser
Then implement the parsing with just a few lines of code:
import feedparser
# Get news about climate change
search_term = "climate change"
query = search_term.replace(" ", "+")
feed_url = f"https://news.google.com/rss/search?q={query}"
# Parse and access the feed data
news_feed = feedparser.parse(feed_url)
# Basic feed information
print(f"Feed title: {news_feed.feed.title}")
print(f"Articles found: {len(news_feed.entries)}")
Feedparser handles all the HTTP requests, XML parsing, character encoding, and data normalization. It even takes care of different RSS format versions and extensions without you having to specify anything.
Extract title, link, date, and source
At this point, you’re ready to extract the specific data fields from each news article. Google News has a helpful convention where the source publication name is appended after the news title, separated by a dash:
import datetime
# Process the first 5 articles
for i, entry in enumerate(news_feed.entries[:5], 1):
    # Google typically includes the source after the title with a " - " separator
    title_parts = entry.title.split(" - ")
    clean_title = " - ".join(title_parts[:-1]) if len(title_parts) > 1 else entry.title
    source = title_parts[-1] if len(title_parts) > 1 else "Unknown"

    # Convert publication date to a more readable format
    pub_date = datetime.datetime(*entry.published_parsed[:6]).strftime("%Y-%m-%d %H:%M:%S")

    print(f"Article {i}:")
    print(f"  Title: {clean_title}")
    print(f"  Source: {source}")
    print(f"  Published: {pub_date}")
    print(f"  Link: {entry.link}")
The published_parsed attribute gives you a time tuple that you can convert to any date format you prefer. For production systems, I recommend storing both the original and formatted timestamps to maintain flexibility for different display needs.
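For example, one simple way to keep both versions side by side is a small dictionary per article. This is a sketch, assuming news_feed was parsed with feedparser as shown above:

import datetime

entry = news_feed.entries[0]
timestamps = {
    # Keep the original string exactly as Google provided it
    "published_raw": entry.published,
    # And a normalized ISO version for sorting or display
    "published_iso": datetime.datetime(*entry.published_parsed[:6]).isoformat(),
}
print(timestamps)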
Now that we’ve covered the basics of extracting data from RSS feeds, it's time to clean up the raw data in your Python project.
Clean and Organize the Data
RSS feed scraping offers more stability than HTML scraping since the format rarely changes. However, the raw data still needs refinement before it’s truly useful.
You need to remove unnecessary elements, standardize formats, and structure the data in a way that’s easy to work with. Let’s look at some essential extraction techniques for this web scraping project.
Remove source from headlines
Google News typically appends the source publication at the end of each headline, separated by a dash. To get clean headlines, you need to extract just the news article title:
def clean_headline(title):
    # Split on " - " and drop the final segment (the source name)
    parts = title.split(" - ")
    if len(parts) > 1:
        return " - ".join(parts[:-1])
    return title

# Example usage
clean_title = clean_headline(entry.title)
This function handles cases where the headline itself contains dashes, ensuring you only remove the source portion.
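For instance, a headline that itself contains a dash is still handled correctly, because only the final segment is treated as the source (the headline below is a made-up example):

# Only the last " - " segment (the publication name) is stripped
print(clean_headline("Solid-state batteries - what comes next - Example Times"))
# -> "Solid-state batteries - what comes next"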
Shorten URLs for readability
Google News links are often long and contain tracking parameters. Using pyshorteners, you can create more manageable URLs:
import pyshorteners
def shorten_url(url):
    shortener = pyshorteners.Shortener()
    try:
        return shortener.tinyurl.short(url)
    except Exception:
        return url  # Return original if shortening fails

# Example usage
short_link = shorten_url(entry.link)
Now you have more readable links that are easier to share or display in reports when you extract data.
Format and structure the output
Let's make your Google News data more usable by organizing it into a structured format, such as a list of dictionaries:
def structure_news_data(entries):
    structured_data = []
    for entry in entries:
        article = {
            'title': clean_headline(entry.title),
            'source': entry.title.split(" - ")[-1],
            'published': entry.published,
            'link': shorten_url(entry.link)
        }
        structured_data.append(article)
    return structured_data
Now you have data that is easier to process, store, and analyze in subsequent steps.
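Putting it together with the feed parsed earlier, you can structure all entries in one call. This is a quick sketch reusing the news_feed object from the feedparser example:

articles = structure_news_data(news_feed.entries)
print(f"Structured {len(articles)} articles")
print(articles[0])  # first article as a dictionary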
Export and Scale Your Scraper
Your Python web scraping approach can be easily scaled to handle multiple search queries. Once you’ve extracted and cleaned your results, you’ll want to export data for later use and potentially expand your web scraper to monitor multiple topics.
Save data to a CSV file
CSV (Comma-Separated Values) files are an excellent format for storing structured data. They’re easy to create and can be opened in spreadsheet applications or imported into databases:
import csv
def save_to_csv(data, filename='google_news.csv'):
    # Nothing to write if the list is empty
    if not data:
        return

    # Get field names from the first dictionary
    fieldnames = data[0].keys()

    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)

    print(f"Data saved to {filename}")
This function creates a CSV file with headers matching your data structure and writes each article as a row. You can analyze the scraped data in tools like Google Sheets or Excel.
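Used together with the structuring function from the previous section, exporting becomes a one-liner (assuming articles holds the structured list from the earlier example):

# Write the structured articles to disk for later analysis
save_to_csv(articles, filename="climate_change_news.csv")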
Now that you can store your data in a CSV file, let’s expand your scraper to handle multiple search terms.
Scrape multiple search terms
Finally, you can monitor various news topics simultaneously by creating a flexible function that processes a list of search terms:
def scrape_multiple_terms(search_terms):
    all_results = {}
    for term in search_terms:
        print(f"Scraping news for: {term}")
        feed_url = f"https://news.google.com/rss/search?q={term.replace(' ', '+')}"
        feed = feedparser.parse(feed_url)

        # Process and store results
        all_results[term] = structure_news_data(feed.entries)

        # Save term-specific results
        save_to_csv(all_results[term], f"{term.replace(' ', '_')}_news.csv")
    return all_results

# Example usage
topics = ["climate change", "artificial intelligence", "renewable energy"]
results = scrape_multiple_terms(topics)
You could easily extend this further by scheduling your Python script to run periodically, storing search results in a database, or adding email notifications for important news items.
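As a sketch of the scheduling idea, a simple loop with time.sleep can re-run the multi-term scraper at a fixed interval. For production, a cron job or a scheduling library would be more robust, and the one-hour interval here is just an example:

import time

while True:
    scrape_multiple_terms(topics)
    # Wait an hour between runs; adjust to your monitoring needs
    time.sleep(60 * 60)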
Why Scrape Google News?
Google News aggregates content from thousands of publishers worldwide, making it an invaluable resource for data-driven decision making. By automating the extraction of news data, you gain access to real-time information that would be impossible to collect manually.
From tracking brand mentions to trend analysis in the stock market, Google News scraping opens up possibilities for businesses, researchers, and marketers alike. The structured nature of news data makes it perfect for analysis, visualization, and integration with other systems.
Now that you understand the overall value of scraping Google News, let’s explore how it enables organizations to stay on top of rapidly evolving information landscapes.
Track real-time trends and events
For marketing teams, real-time data makes it possible to identify viral topics and create timely content that captures peak interest. Research analysts can track breaking industry developments to update forecasts and reports.
The time-sensitive nature of news data is what makes automated media monitoring particularly valuable. Without it, organizations often discover critical information too late to capitalize on opportunities or mitigate risks.
Use cases: sentiment analysis, brand monitoring, research
Google News data serves as the foundation for numerous practical applications that can transform raw information into actionable insights:
Brand Reputation Tracking: Monitor how your company is portrayed across thousands of news sources
Competitive Intelligence: Track product launches, partnerships, and strategic moves by competitors
Market Research: Identify emerging trends and consumer interests based on news coverage patterns
Investment Research: Gather news about specific companies, sectors or the stock market to inform investment decisions
Academic Studies: Analyze media coverage patterns for research on communication and journalism
These applications demonstrate why scraping Google News has become essential for organizations seeking data-driven advantages.
Limit results and handle errors
When scraping at scale, proper error handling is crucial. I recommend always implementing try/except blocks to gracefully handle empty feeds, connection problems, or API credential issues:
import time

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key='YOUR-API-KEY')

def scrape_with_error_handling(search_term, max_retries=3, delay=5):
    for attempt in range(max_retries):
        try:
            response = client.get(
                url="https://app.scrapingbee.com/api/v1/store/google",
                params={
                    "search": search_term,
                    "search_type": "news",
                    "nb_results": 20  # Limit results
                }
            )
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:  # Too many requests
                print(f"Rate limit hit. Waiting {delay * (attempt + 1)} seconds...")
                time.sleep(delay * (attempt + 1))
            else:
                print(f"Error {response.status_code}. Attempt {attempt + 1}/{max_retries}")
                time.sleep(delay)
        except Exception as e:
            print(f"Exception: {e}. Attempt {attempt + 1}/{max_retries}")
            time.sleep(delay)
    return None  # Return None if all retries failed
Implementing this error handling approach transforms your scraping script from a fragile prototype into a production-ready tool.
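As a quick usage sketch, reusing the client created above (the organic_results field name matches the TL;DR example at the start of this post):

data = scrape_with_error_handling("artificial intelligence")
if data:
    for result in data.get("organic_results", []):
        print(result.get("title"))
else:
    print("All retries failed; no data returned.")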
Get Clean Google News Data with ScrapingBee
Web scraping is like a delicate dance - too aggressive, and you'll be blocked; too timid, and you won't get the data you need. By using ScrapingBee's Google News Scraper API, you avoid all these challenges. Our API handles proxy rotation, JavaScript rendering, and browser fingerprinting for you, so you can focus on using the data rather than fighting to collect it.
ScrapingBee offers 1,000 free API credits when you sign up, which is perfect for testing the service and seeing how it works with your specific use case. If you're serious about scraping Google News at scale, our paid plans provide reliable access with excellent support. Get started now!
Frequently Asked Questions (FAQs)
Is scraping Google News legal?
While web scraping itself isn't illegal, it's important to respect terms of service and copyright laws. ScrapingBee helps ensure compliance by following proper scraping etiquette. For commercial use, consult Google News legal guidelines and a legal professional about your specific use case.
Can I scrape Google News without coding?
Yes! ScrapingBee offers a no-code solution through their user interface, where you can set up Google News scraping without writing a single line of code. This is perfect for non-developers who need to extract news data regularly.
How often can I scrape Google News?
ScrapingBee handles rate limiting for you, but as a best practice, avoid excessive scraping. For real-time monitoring, scraping once every few hours is usually sufficient unless you have specific high-frequency needs.
What’s the best way to scrape news from multiple keywords?
The most efficient approach is to use the batch processing method shown earlier, where you loop through keywords while implementing proper delays between requests. ScrapingBee's API is designed to handle this type of usage efficiently.

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.