What does BeautifulSoup do in Python?

BeautifulSoup is a Python library that parses HTML (and XML) into a tree of objects, allowing you to extract information from it.

When scraping the web, you are usually not interested in the HTML of the page itself, but in the data it contains. This is where BeautifulSoup comes into play.

BeautifulSoup takes that HTML and lets you navigate and search it to pull out the data you're interested in. Here is a quick example of how to extract the title of a webpage:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://news.ycombinator.com/")
response.raise_for_status()  # Stop early if the request failed
soup = BeautifulSoup(response.content, 'html.parser')

# The title tag of the page
print(soup.title)
# Output: <title>Hacker News</title>

# The title of the page as a string
print(soup.title.string)
# Output: Hacker News
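
The same approach works on any HTML string, not only on a downloaded page. As a minimal sketch, assuming a made-up product list (the markup, class names, and values below are hypothetical and only serve as an illustration), this is how HTML can be turned into plain Python data:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, invented for this example
html = """
<html>
  <head><title>Example Store</title></head>
  <body>
    <ul>
      <li class="product"><span class="name">Keyboard</span> <span class="price">49.90</span></li>
      <li class="product"><span class="name">Mouse</span> <span class="price">19.50</span></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect each product as a dictionary
products = []
for item in soup.find_all("li", class_="product"):
    name = item.find("span", class_="name").get_text(strip=True)
    price = float(item.find("span", class_="price").get_text(strip=True))
    products.append({"name": name, "price": price})

print(products)
```

Because the HTML is defined inline, this example runs without a network connection, which makes it handy for experimenting with selectors before pointing them at a real page.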

If you want to learn more about BeautifulSoup and how to extract links, custom attributes, siblings, and more, feel free to check out our BeautifulSoup tutorial.
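
As a small taste of those features, here is a hedged sketch that extracts links, a custom attribute, and a sibling element. The HTML, the `data-section` attribute, and the URLs are hypothetical and chosen only for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical navigation menu, invented for this example
html = """
<div id="menu">
  <a href="/home" data-section="main">Home</a>
  <a href="/blog" data-section="content">Blog</a>
  <a href="/about" data-section="main">About</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Links: read the href attribute of every <a> tag
links = [a["href"] for a in soup.find_all("a")]

# Custom attributes: .get() returns None if the attribute is missing
sections = [a.get("data-section") for a in soup.find_all("a")]

# Siblings: move from the first link to the next <a> at the same level
first_link = soup.find("a")
next_link = first_link.find_next_sibling("a")

print(links)
print(sections)
print(next_link.get_text())
```

Using `a.get("data-section")` instead of `a["data-section"]` avoids a `KeyError` on tags that lack the attribute, which is common on real pages.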
