
How to Scrape YouTube Comments for Insights and Analysis

30 April 2026 | 12 min read

Scraping YouTube comments is one of the most practical ways to collect large-scale public opinion data, and demand for it has grown alongside YouTube's position as the largest video platform on the internet. Comment sections capture unfiltered reactions to products, political events, health topics, and cultural moments: the kind of data that surveys rarely reach and that social media APIs increasingly restrict or put behind paid access.

This guide covers three methods for collecting YouTube comment data in 2026: the official YouTube Data API, Python-based scraping with yt-dlp, and managed scraper APIs for production pipelines. Each section includes working code so you can start collecting data regardless of which approach fits your scale and technical setup.


Quick Answer (TL;DR)

The YouTube Data API is free and returns structured JSON, but its 10,000 unit daily quota limits you to roughly 3,000–10,000 comments per day. For larger volumes, the YouTube comment scraper API from ScrapingBee handles pagination and proxy rotation automatically for production-scale collection.

Here's what scraping YouTube comments looks like:

import requests

API_KEY = "YOUR_API_KEY"
resp = requests.get(
    "https://www.googleapis.com/youtube/v3/commentThreads",
    params={"part": "snippet", "videoId": "dQw4w9WgXcQ", "key": API_KEY, "maxResults": 10}
)
for item in resp.json()["items"]:
    print(item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])

Why People Scrape YouTube Comments

The most common use case is sentiment analysis. Brands track public reaction to product launches, ad campaigns, or PR events by pulling comments from relevant videos and running them through a classifier. A product announcement with 50,000 comments gives a far larger signal than any survey.

Academic and market researchers use comment data as a proxy for public discourse, studying how people discuss health topics, political events, or economic conditions across thousands of videos without needing access to private data.

Content creators and agencies analyze what audiences ask for, complain about, or praise to inform future video topics and formats. Moderation teams scrape comments to detect spam, coordinated inauthentic behavior, or hate speech at scale. Competitive analysts compare comment volume and sentiment on their own videos versus a competitor's.

The web scraping YouTube patterns that work for video metadata apply directly to comment extraction, since both live behind the same anti-bot layer.

Methods to Scrape YouTube Comments

Three approaches cover most use cases. The right one depends on your volume requirements, technical constraints, and whether you need a recurring pipeline or a one-off pull.

| Method | Pros | Cons | Best For |
| --- | --- | --- | --- |
| YouTube Data API | Official, structured JSON, free tier | 10,000 units/day quota, requires API key setup | Small-to-medium projects |
| Python scraping (yt-dlp) | No quota limits, full control | Fragile to YouTube updates, slower than API | One-off extractions, exhausted quota |
| Managed scraper API | Handles blocks/proxies/JS, scales easily | Paid service | Production pipelines, large-scale collection |

The YouTube Data API is the right starting point for most developers. It is official, well-documented, and free for modest volumes. When quota becomes the bottleneck, yt-dlp covers one-off bulk pulls without any rate ceiling. For recurring collection across many videos, a managed API removes the maintenance overhead entirely.

Method 1: YouTube Data API (Official)

The Data API is the most reliable and beginner-friendly approach. Setup takes about ten minutes and gives you access to structured comment data with author names, like counts, timestamps, and reply thread information.

1. Get an API key

Go to the Google Cloud Console, create a project, enable the YouTube Data API v3, and create an API key under Credentials. Copy the key; you will pass it as a query parameter on every request. No OAuth is required for reading public comment data.
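Hardcoding the key works for a quick test, but a safer habit is reading it from an environment variable so it never lands in source control. A minimal sketch, assuming a variable named YT_API_KEY (the name is arbitrary):

```python
import os

def load_api_key(env_var: str = "YT_API_KEY") -> str:
    # Read the API key from the environment instead of hardcoding it.
    # YT_API_KEY is an assumed variable name; use whatever fits your setup.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before running the scraper")
    return key
```

Every snippet below that uses API_KEY can then call load_api_key() instead.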

2. Understand the endpoint

The commentThreads endpoint returns top-level comments on a video. The comments endpoint returns replies to a specific comment using the parent comment's ID. Most workflows start with commentThreads and call comments selectively for threads with high reply counts.

3. Make a basic request

This request pulls the top 100 comments from a video, sorted by relevance, and prints the author name alongside the first 80 characters of each comment:

import requests

API_KEY = "YOUR_API_KEY"
VIDEO_ID = "dQw4w9WgXcQ"

params = {
    "part": "snippet",
    "videoId": VIDEO_ID,
    "key": API_KEY,
    "maxResults": 100,
    "order": "relevance",  # or "time"
    "textFormat": "plainText",
}
resp = requests.get("https://www.googleapis.com/youtube/v3/commentThreads", params=params)
resp.raise_for_status()  # fail fast on a bad key or exhausted quota
data = resp.json()

for item in data["items"]:
    comment = item["snippet"]["topLevelComment"]["snippet"]
    print(f"{comment['authorDisplayName']}: {comment['textDisplay'][:80]}")

4. Handle pagination

YouTube returns up to 100 comments per page. A nextPageToken field in the response tells you whether more pages exist. Loop until it is absent, collecting each comment's author, text, like count, publish date, and reply count:

import requests

all_comments = []
next_token = None

while True:
    params = {
        "part": "snippet",
        "videoId": VIDEO_ID,
        "key": API_KEY,
        "maxResults": 100,
        "textFormat": "plainText",
    }
    if next_token:
        params["pageToken"] = next_token

    resp = requests.get("https://www.googleapis.com/youtube/v3/commentThreads", params=params)
    data = resp.json()

    for item in data["items"]:
        c = item["snippet"]["topLevelComment"]["snippet"]
        all_comments.append({
            "author": c["authorDisplayName"],
            "text": c["textDisplay"],
            "likes": c["likeCount"],
            "published": c["publishedAt"],
            "reply_count": item["snippet"]["totalReplyCount"],
        })

    next_token = data.get("nextPageToken")
    if not next_token:
        break

print(f"Collected {len(all_comments)} comments")

5. Fetch replies

To pull replies for a specific comment thread, call the comments endpoint with the parent comment's ID:

def get_replies(parent_id, api_key):
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/comments",
        params={"part": "snippet", "parentId": parent_id, "key": api_key, "maxResults": 100, "textFormat": "plainText"}
    )
    return [item["snippet"]["textDisplay"] for item in resp.json().get("items", [])]
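As step 2 noted, most workflows only call this endpoint for threads that actually have replies, since each request spends quota. A small filter sketch, assuming the raw items list from a commentThreads response (where, per the API docs, each thread's id matches its top-level comment's ID and works as a parentId); the threshold of 5 is an arbitrary example value:

```python
def threads_needing_replies(items, min_replies=5):
    # Return the thread IDs worth a follow-up `comments` call.
    # `items` is the raw "items" list from a commentThreads response.
    return [
        item["id"]
        for item in items
        if item["snippet"]["totalReplyCount"] >= min_replies
    ]
```

You can then pass each returned ID to get_replies from the previous step.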

6. Save to CSV

Once the loop finishes, the all_comments list is already structured as a list of dicts, which pandas loads directly into a DataFrame:

import pandas as pd
df = pd.DataFrame(all_comments)
df.to_csv("youtube_comments.csv", index=False)

7. Understand quota limits

Each commentThreads request costs 1–3 quota units depending on the parameters used. With the default 10,000-unit daily limit per project, you can collect roughly 3,000–10,000 comments per day. This is the main bottleneck of the official API: not rate limiting or IP blocks, but a hard daily ceiling that resets at midnight Pacific time. If your project needs more than that, request a quota increase from Google or switch to one of the other methods covered below. Note that videos with disabled comments return a 403 with a commentsDisabled error; handle this in your pagination loop so it does not break on those videos.
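One way to catch the disabled-comments case before the pagination loop hits a KeyError is to inspect the error body. A sketch, assuming Google's standard error envelope (a top-level "error" object containing an "errors" list); comments_disabled is a hypothetical helper name:

```python
def comments_disabled(payload: dict) -> bool:
    # True if the response body is an API error whose reason is
    # "commentsDisabled"; the shape is assumed from Google's error envelope.
    errors = payload.get("error", {}).get("errors", [])
    return any(e.get("reason") == "commentsDisabled" for e in errors)
```

In the loop from step 4, check comments_disabled(data) right after resp.json() and skip the video if it returns True.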

Method 2: Python Scraping Without the Official API

When the Data API quota runs out or you need a one-off bulk pull without setting up a Google Cloud project, yt-dlp is the most practical option. It is a community-maintained command-line tool that extracts YouTube data including comments without requiring an API key.

1. Using yt-dlp

Install it with pip install yt-dlp, then run from the command line or call it from Python:

# Command line:
# yt-dlp --write-comments --skip-download "https://www.youtube.com/watch?v=VIDEO_ID"

# Python:
import subprocess, json

result = subprocess.run(
    ["yt-dlp", "--write-comments", "--skip-download", "--dump-json",
     "https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    capture_output=True, text=True, check=True,  # raise if yt-dlp exits non-zero
)
data = json.loads(result.stdout)
for comment in data.get("comments", [])[:5]:
    print(f"{comment['author']}: {comment['text'][:80]}")

2. Limitations

Because the YouTube comment section loads dynamically via AJAX, traditional scraping methods using raw requests and BeautifulSoup will not work as they cannot execute JavaScript. While browser automation tools like Selenium or Playwright can handle this dynamic content, they are often slow, resource-intensive, and difficult to scale because they require loading a full browser instance for every session.

3. When this breaks

The main challenge with this approach is that YouTube updates its internal API frequently to prevent unauthorized access. Since yt-dlp is community-maintained, the developers usually release updates to keep up with these changes, but there can be a lag between a YouTube update and a library fix. This makes the method less stable than the official API for long-term production use.

Method 3: Using a Managed Scraper API

For production environments where reliability is critical, a managed solution is the most efficient choice. This approach bypasses the rigid 10,000-unit quota of the official API and eliminates the need to maintain fragile custom scrapers. By using a Web Scraping API from ScrapingBee, you offload the complexities of proxy rotation, IP blocks, and JavaScript rendering to a dedicated service.

1. Get an API key from ScrapingBee

To get started, create an account on the ScrapingBee dashboard. Once logged in, your unique API key will be visible on the main interface. You will use this key to authenticate your requests, allowing the service to manage the headless browsers and proxy pools required to extract data from YouTube's dynamic pages.

2. Make a request

You can fetch structured comment data by sending a simple GET request to the ScrapingBee API endpoint with your video ID and desired comment count. This script uses the requests library to call the API and prints the author and text of the first five comments:

import requests

resp = requests.get(
    "https://app.scrapingbee.com/api/v1/store/youtube/comments",
    params={
        "api_key": "YOUR_SCRAPINGBEE_KEY",
        "video_id": "dQw4w9WgXcQ",
        "count": 500,
    },
)
comments = resp.json()
for c in comments[:5]:
    print(c["author"], c["text"][:80])

3. Scale across multiple videos

To collect data from a list of different videos, you can iterate through a list of video IDs and save the resulting JSON data into individual local files. This approach is ideal for building larger datasets while using a small delay to keep your local execution orderly:

import time, json
import requests

video_ids = ["id1", "id2", "id3"]

for vid in video_ids:
    resp = requests.get(
        "https://app.scrapingbee.com/api/v1/store/youtube/comments",
        params={"api_key": "YOUR_KEY", "video_id": vid, "count": 1000},
    )
    with open(f"comments_{vid}.json", "w") as f:
        json.dump(resp.json(), f)
    time.sleep(1)
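Even with a managed service, transient network failures happen, so a little defensive client code is worth having. A generic retry sketch with exponential backoff (not specific to any provider; fetch is any zero-argument callable that raises on failure, such as a lambda wrapping requests.get plus raise_for_status):

```python
import time

def fetch_with_retry(fetch, retries=3, backoff=0.5):
    # Call `fetch` up to `retries` times, sleeping backoff * 2**attempt
    # seconds between failures, and re-raise after the final attempt.
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```

Wrapping each per-video request in fetch_with_retry keeps one flaky response from aborting a long batch run.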

4. Why this approach

Managed APIs provide set-and-forget infrastructure. You never have to worry about YouTube's internal updates or proxy management; the service handles dynamic rendering automatically and imposes no daily quota.

Which Method Should You Use?

Choosing the right method for your project comes down to your required volume and technical overhead.

  • If you need fewer than 5,000 comments per day and want a free, official solution, the YouTube Data API is the best fit.
  • For a one-off bulk extraction where you need to bypass quotas without a Google Cloud project, yt-dlp is a powerful alternative.
  • However, for a production pipeline requiring recurring, high-volume collection, a managed scraper API is the professional choice to ensure stability.

What Data You Can Extract from Comments

All three methods return the same essential data points, though the JSON structure varies slightly across tools. You can extract the full comment text in plain text or original HTML, alongside metadata such as the author display name, channel URL, and like count. You also receive the reply count, publish date, and a flag indicating whether the comment comes from the video creator. Unique comment and parent IDs are also provided so you can reconstruct reply chains.
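Because the structure varies across tools, it helps to normalize records into one schema before analysis. A sketch, assuming Data API field names (authorDisplayName, textDisplay, likeCount) on one side and yt-dlp-style names (author, text, like_count) on the other:

```python
def normalize(raw: dict) -> dict:
    # Map the field aliases used by different tools onto one flat schema.
    return {
        "author": raw.get("authorDisplayName") or raw.get("author"),
        "text": raw.get("textDisplay") or raw.get("text"),
        "likes": raw.get("likeCount", raw.get("like_count", 0)),
    }
```

Running every record through normalize() lets the same downstream analysis code consume data from any of the three methods.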

Comments are just one piece of a larger YouTube data workflow. Combining comment data with video metadata, transcripts, and search results gives a complete picture of an audience's reaction and behavior. For example, using a YouTube video scraper API gives you access to view counts, upload dates, and tags. This allows you to correlate comment sentiment with the overall reach of the video.

Similarly, a YouTube transcript scraper API provides the actual content discussed in the video, helping you verify if the audience is reacting to specific claims or moments. Finally, using a YouTube search API shows what videos rank for specific queries, enabling you to identify which high-performing comment sections are worth monitoring. Together, these tools transform raw text into a deep, multi-dimensional analysis of platform trends and user engagement.

Advanced YouTube Scraping Options

Technical teams often require more than basic engagement metrics to understand the full scope of a niche. For more specialized research, you can move beyond comments to track broader platform trends and monetization strategies. For instance, you can collect video titles at scale using a YouTube title scraper API to monitor how creators structure their hooks and click-through strategies.

For short-form content, the YouTube shorts scraper API helps you track unique engagement patterns that differ from traditional long-form videos. Additionally, using a YouTube ad results API allows you to understand which ads appear on specific types of content, giving you a transparent view of where marketing budgets are currently focused. These advanced tools ensure your research covers every aspect of platform behavior.

Is It Legal to Scrape YouTube Comments?

YouTube comments are publicly visible to any web user, but YouTube's Terms of Service restrict automated access. The official Data API is the safest route legally, since Google explicitly provides it for programmatic access to public data. When using other methods such as direct scraping, do not attempt to scrape comments on private or unlisted videos, as these are not intended for public consumption.

You should also avoid republishing personal data, such as usernames tied to real identities, without considering regional privacy laws like GDPR. Maintaining a responsible scraping cadence is also vital; always rate-limit your requests to avoid disrupting the platform's service or triggering security blocks. For any large-scale commercial use of scraped data, it is best to consult legal counsel to ensure your specific data pipeline remains fully compliant with evolving regulations.
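A minimal way to enforce that cadence in your own scripts is a limiter that spaces requests at a fixed interval. The class below is a sketch; the interval you choose depends on your own risk tolerance:

```python
import time

class RateLimiter:
    # Allow at most one call per `interval` seconds; call wait() before
    # each request to pause just long enough to honor the spacing.
    def __init__(self, interval: float):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        delay = self.interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Create one RateLimiter(1.0) per scraper and call limiter.wait() before each request to space calls roughly a second apart.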

Start Scraping YouTube Comments at Scale

Transitioning from manual data collection to an automated pipeline is the only way to gain deep, actionable audience insights without wasting engineering resources. You can begin immediately by choosing the path that fits your current requirements.

  • For a quick test, copy the Data API code provided above to pull comments from a single video in under five minutes.
  • If you are managing a medium-scale research project, utilize yt-dlp or our pagination scripts to collect data across multiple channels efficiently.
  • For true production environments, a managed API offers the most reliable, hands-off solution for recurring, high-volume collection.

Automate your YouTube research today to ensure your data remains stable, scalable, and ready for high-stakes decision-making.

Frequently Asked Questions (FAQs)

Is it possible to scrape YouTube comments without coding?

Yes. Some browser extensions and no-code tools can export comments to CSV. Managed APIs with dashboard UIs also work for this purpose. However, for anything beyond a single video, using Python or an API is far more practical for handling large datasets and automation.

How many comments can I scrape from a single video?

There's no hard cap, as videos can have millions of comments. The YouTube Data API returns them in pages of up to 100. Practical limits are usually set by your API quota (for official methods) or rate limiting (for direct scraping). A managed API can pull tens of thousands per video reliably.

Are YouTube comments public data?

Comments on public videos are visible to anyone on the internet. However, "publicly visible" doesn't automatically mean the data is free to scrape at scale without restriction. The method of collection is still subject to YouTube's Terms of Service and local data privacy regulations.

Can I scrape replies to YouTube comments?

Yes. The Data API has a dedicated comments endpoint for replies that uses the parentId parameter. The code example in Method 1, Step 5 shows exactly how to implement this. Managed APIs and yt-dlp also include these reply threads in their structured output.

Jakub Zielinski

Jakub is a Senior Content Manager at ScrapingBee, a T-shaped content marketer deeply rooted in the IT and SaaS industry.