Want to learn how to scrape TripAdvisor? Tired of overpaying for your trips? As one of the biggest online travel platforms, it has tons of valuable information that can help you save money and enjoy your time abroad.
Scraping TripAdvisor is a great way to keep an eye on price changes, customer sentiment, and other details that can impact your trips and vacations. In this tutorial, we will explain how to extract hotel names, prices, ratings, and reviews from TripAdvisor using our web scraping API with Python.
Even if you have no prior coding experience, combining our tools with the most popular coding languages will make challenges like JavaScript content rendering and bot detection feel like a breeze. Let’s get straight to it!
Quick Answer (TL;DR)
Our API makes it easy to extract data from TripAdvisor and other platforms with quick, configurable API calls that do not require much technical proficiency. On top of that, it handles JavaScript rendering, rotates proxies, and bypasses anti-bot measures automatically, which makes it an effective TripAdvisor scraping API.
Below is the full code for a basic TripAdvisor scraper. It should feel intuitive enough to get you started without diving into complexities too early and wasting your free plan:
from scrapingbee import ScrapingBeeClient
import pandas as pd

client = ScrapingBeeClient(api_key='YOUR_API_KEY')
url = "https://www.tripadvisor.com/Hotels-g187791-Rome_Lazio-Hotels.html"

def scrape_tripadvisor():
    # js_scenario: wait for the page to load, then click the button that loads more hotels
    js_scenario = {
        "instructions": [
            {"wait": 2000},
            {"click": "button.rmyCe._G.B-.z._S.c.Wc.wSSLS.AeLHi.sOtnj"}
        ]
    }
    # extract_rules dictionary holds instructions for CSS selectors
    extract_rules = {
        "Hotel card": {
            # selector for the hotel listing card
            "selector": "div.XIWnB.z.y.rCDYP",
            # Defines its type - each selector creates a list of dictionaries for each hotel card
            "type": "list",
            "output": {
                # CSS selectors for values within the hotel card
                "Hotel name": 'div.XIWnB.z.y.rCDYP > div.IcVzi.y._T > div > div > a > h3',
                "Price from": 'div.dDjJv button.BrNIB div.SewaP',
                "rating": '[data-automation="bubbleRatingValue"] span',
                "review amount": '[data-automation="bubbleReviewCount"] span',
                "ranking": 'div.XIWnB.z.y.rCDYP > div.IcVzi.y._T > div > span > div > div:nth-child(2) > span',
                "description": 'div.NpDbk > div > div > a > span'
            }
        },
    }
    response = client.get(
        url,
        params={
            "extract_rules": extract_rules,
            "premium_proxy": "True",
            "js_scenario": js_scenario
        },
    )
    if response.status_code != 200:
        print(f"Extraction failed: {response.status_code}")
        return
    # The JSON response is keyed by the rule name, so unwrap the list of hotel cards first
    df = pd.DataFrame(response.json()["Hotel card"])
    df.to_csv("tripadvisor_data.csv", index=False)
    print(response.text)

scrape_tripadvisor()
Scraping TripAdvisor with ScrapingBee
Take advantage of our complete package of scrape-ready tools when accessing TripAdvisor. By the end of this tutorial, you will have a simple and customizable script, plus plenty of comments to guide you through the process, even without any programming experience.
Our goal is to handle the extraction and parsing, plus bypass any hurdles with proxy servers, appropriate URLs, and JavaScript rendering, and let you focus on booking prices and user reviews, extracted in a readable and understandable format.
The biggest challenge with TripAdvisor is its strong resistance to scraping. The URL stays the same when you change your search options: parameters such as the start and end dates of your hotel stay are managed internally or within the page state and sent through client-side GraphQL requests.
Begin by installing Python (version 3.6 or newer) on your device. Download the installer from the official Python website (python.org) or directly from the Microsoft Store by searching for Python and clicking "Get" to install the latest version.
Python makes it easy to import external libraries. To use its package manager, pip, open your Terminal (Command Prompt on Windows) and type pip install <package name>.
Thanks to our HTML API, the process is even simpler. Here is what you need:
scrapingbee – our Python library that provides a web scraping API, handling headless browsers, proxy rotation, and JavaScript rendering.
pandas – package for data analysis and manipulation, offering structures such as DataFrames to efficiently handle extracted information.
To install these packages, open your Terminal (or Command Prompt on Windows devices) and enter the following line:
pip install scrapingbee pandas
Set Up ScrapingBee Access
Log in or register a ScrapingBee account to get your API key. If you don’t have one, don’t worry! After registering an account, you will receive 1,000 free credits for a week to test the code described in this guide.
After a successful sign-up, you will see a dashboard that tells you everything you need to know about the available resources tied to our API. The top-right section is your API key:
Make Your First Request to TripAdvisor
Create a dedicated folder and, inside it, a text file with a .py extension, which will contain our TripAdvisor scraping script. First, we import the downloaded libraries to make sure the script can use their tools.
from scrapingbee import ScrapingBeeClient
import pandas as pd
Adding these packages will let you use the tools and parameters defined in the ScrapingBee API Documentation. Once everything is in order, we can start building our web scraping tool that will collect hotel data from the TripAdvisor website.
First, choose a location and manually extract its TripAdvisor URL. In this tutorial, we will extract hotel data from accommodation spots in Rome. This will be our URL:
https://www.tripadvisor.com/Hotels-g187791-Rome_Lazio-Hotels.html
After importing Python libraries, we assign our HTML API to a variable "client". In the parentheses, add your API key from your account's dashboard.
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
Once that is taken care of, start defining a function that will carry all variables that are relevant to your data scraper's functionality. We called it "scrape_tripadvisor":
def scrape_tripadvisor():
The following lines will be indented from the start of the function's definition and will be executed once the function is called. Then, create a Python dictionary of instructions that will guide our JavaScript Scenario Feature, which tells the headless browser how to behave on the visited page. In its current version, all it does is wait 2 seconds for the page to fully load before scraping.
    # js_scenario dictionary that will be attached to your GET API call
    js_scenario = {
        "instructions": [
            {"wait": 2000}
        ]
    }
After that, we assign our URL to a "url" variable and proceed with defining the GET API call, which will hold its results in the "response" variable:
    url = "https://www.tripadvisor.com/Hotels-g187791-Rome_Lazio-Hotels.html"
    response = client.get(
        url,
        params={
            "premium_proxy": "True",
            "js_scenario": js_scenario
        },
    )
Before we continue, let's break down what is going on in the "params" dictionary, which is attached to the API call as its arguments:
• "js_scenario" – attaches the previously defined rules for JavaScript Rendering.
• "premium_proxy" – tells our API to route data scraping connections through a network of legitimate proxy servers, protecting the web scraping process.
Now, all that is left is to add an if statement that prints the HTTP status code if the extraction is unsuccessful, so you can work on resolving the issue. Otherwise, the script prints the data extracted from TripAdvisor. The last line is no longer indented because it sits outside the function body and calls the function we just defined:
    if response.status_code != 200:
        print(f"Extraction failed: {response.status_code}")
        return
    print(response.text)

scrape_tripadvisor()
Finally, we can begin TripAdvisor data scraping. Run your Python code. The first result will look very messy:
Parse Results for Hotel Data
Raw HTML code is not useful, as it does not reflect any public opinions, competitor ratings, or insights that help identify trends. Here we will cover how to retrieve publicly available data in a structured format with an automated scraping script.
First, create an additional dictionary that will define Data Extraction Rules – an additional argument to the GET API call that will cherry pick which information to include in your response with the help of CSS selectors. Keep it empty for now, as we need to manually visit the TripAdvisor URL to find the right selectors in the HTML code.
Fortunately, this task is much simpler thanks to your browser's developer tools, which you can open by pressing F12 (or by right-clicking the page and choosing "Inspect element"). First, find a selector for the element that wraps everything you want to extract from each hotel listing. Don't pick a single data point from the card. Instead, make sure all relevant information stays within the boundaries of the selected element.
The image below shows an appropriate selection of a CSS selector that we will include in the "extract_rules" dictionary. Right-click the div and copy its selector.
Note: If the copied selector only applies to one hotel card, remove its additional parts to only target sections using this specific class.
After including the CSS selector, your dictionary should look like this:
    extract_rules = {
        "Hotel card": {
            "selector": "div.XIWnB.z.y.rCDYP",
            "type": "list",
            "output": {
            }
        },
    }
You have probably noticed that we added more elements to "extract_rules". The extracted CSS selector sets the boundaries within which the scraper collects data. The "type" element defines "Hotel card" as a list of values containing hotel data. As for "output", it is another dictionary of selectors that point to the actual data inside each card.
By using the same principles as in the "selector" definition, pick out desired key names and their values. Here is an example of a finished "extract_rules" dictionary:
    extract_rules = {
        "Hotel card": {
            # selector for the hotel listing card
            "selector": "div.XIWnB.z.y.rCDYP",
            # Defines its type - each selector creates a list of dictionaries for each hotel card
            "type": "list",
            "output": {
                # CSS selectors for values within the hotel card
                "Hotel name": 'div.XIWnB.z.y.rCDYP > div.IcVzi.y._T > div > div > a > h3',
                "Price from": 'div.dDjJv button.BrNIB div.SewaP',
                "rating": '[data-automation="bubbleRatingValue"] span',
                "review amount": '[data-automation="bubbleReviewCount"] span',
                "ranking": 'div.XIWnB.z.y.rCDYP > div.IcVzi.y._T > div > span > div > div:nth-child(2) > span',
                "description": 'div.NpDbk > div > div > a > span'
            }
        },
    }
Now if we run the script once again, the result looks a lot better:
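For orientation, here is a minimal sketch of how the structured response could be handled once it arrives. It assumes the API returns JSON keyed by the rule name ("Hotel card"), with one dictionary per hotel card. The sample payload is made up for illustration and is not real TripAdvisor output, and the helper name parse_price is hypothetical.

```python
import json
import re

# Made-up sample mimicking the assumed response shape; not real TripAdvisor data.
sample_response = '''
{"Hotel card": [
  {"Hotel name": "1. Example Hotel Roma", "Price from": "$215", "rating": "4.5"},
  {"Hotel name": "2. Sample Suites", "Price from": null, "rating": "4.0"}
]}
'''

def parse_price(text):
    """Return the digits of a price string as an int, or None if it is missing."""
    if not text:
        return None
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None

# Unwrap the list of cards stored under the rule name
cards = json.loads(sample_response)["Hotel card"]
for card in cards:
    print(card["Hotel name"], parse_price(card["Price from"]), card["rating"])
```

Cards without a visible price come back with an empty value, so the helper returns None instead of raising an error.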
Use js_scenario to Simulate Interaction
After the last extraction, there is a problem: we only extracted data from 6 hotels! That is because the remaining listings are hidden behind a button that loads more hotels. To reach all of them with your web scraping tool, first find the CSS selector of this button:
Add this button to your "js_scenario" instructions, ordering our HTML API to click the button before extracting information from the page:
    js_scenario = {
        "instructions": [
            {"wait": 2000},
            {"click": "button.rmyCe._G.B-.z._S.c.Wc.wSSLS.AeLHi.sOtnj"}
        ]
    }
Now, if you run the code again, the number of extracted hotel cards has increased to 30:
Finally, the last lines of the script in the Quick Answer section convert the extracted data into a pandas DataFrame, which we export to a CSV file. The end result is much better than what we started with:
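If you would rather not depend on pandas for the export step, the sketch below shows one possible alternative using Python's standard-library csv module instead. The function name save_cards_to_csv and the sample rows are hypothetical, and the example writes to the system's temporary folder only so that it runs anywhere.

```python
import csv
import os
import tempfile

def save_cards_to_csv(cards, path):
    """Write a list of hotel-card dictionaries to a CSV file; return the row count."""
    if not cards:
        return 0
    # Collect column names in first-seen order, since some cards may lack fields
    fieldnames = []
    for card in cards:
        for key in card:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cards)  # missing fields are written as empty cells
    return len(cards)

# Made-up rows for illustration; real rows would come from the API response.
cards = [
    {"Hotel name": "Example Hotel Roma", "Price from": "$215", "rating": "4.5"},
    {"Hotel name": "Sample Suites", "rating": "4.0"},
]
output_path = os.path.join(tempfile.gettempdir(), "tripadvisor_data.csv")
print(save_cards_to_csv(cards, output_path), "rows written to", output_path)
```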
Handle Anti-Bot Techniques on TripAdvisor
Scraping TripAdvisor is challenging because the platform actively deploys anti-bot measures to prevent automated data collection. Frequent requests from a single IP address, unusual traffic patterns, or incomplete rendering can quickly trigger blocks. That’s why our API is a difference maker, handling the most persistent challenges and allowing you to focus on working with extracted data.
Through implementation of premium residential proxies, we distribute requests across a large pool of IPs, plus options to pick your desired geolocation. If default configurations do not allow you to scrape data, you can add spoofed headers and user-agents, ensuring each request looks like it comes from a real browser rather than a script.
TripAdvisor content relies heavily on JavaScript. Because our API incorporates headless browser rendering and executes JavaScript before extraction, you can add specific instructions to deal with dynamic elements such as “See all hotels” buttons. If you utilize our extensive documentation, you can control JavaScript scenarios and add simple instructions to simulate real user behavior without additional code.
By combining these tools and adding extra guidelines, we can bypass TripAdvisor’s defenses and collect data automatically. If you need more information on how to stay undetected with our API, check out our guide on how to avoid getting blocked.
Scraping TripAdvisor Reviews at Scale
Our Python SDK creates a robust environment, allowing you to easily improve and scale your TripAdvisor scraper. Because the first page only loads 30 TripAdvisor hotels, you will need additional URLs.
Fortunately, with small and predictable changes, we can identify the pattern on how to reach more review data by accessing additional pages:
'https://www.tripadvisor.com/Hotels-g187791-Rome_Lazio-Hotels.html' – page 1
'https://www.tripadvisor.com/Hotels-g187791-oa30-Rome_Lazio-Hotels.html' – page 2
While the first page has no indicators, the URL of the second page has an "oa30" string inserted in the middle. With each further page, this number increases by 30. Understanding this allows us to create an additional function, build_tripadvisor_urls. Let's take a closer look:
# BEFORE THE FUNCTION:
base_url = "https://www.tripadvisor.com/Hotels-g187791-Rome_Lazio-Hotels.html"
# A list of URLs (our base URL is the first page, and the list gets appended to if additional pages are required)
# unchanged page 1 URL
urls = [base_url]

# n_pages - a variable storing how many pages you want to scrape
def build_tripadvisor_urls(n_pages):
    # if the number of desired pages is less than or equal to one, change nothing
    if n_pages <= 1:
        return urls
    else:
        # with each additional page, append to the list of URLs, adding oa30, oa60, oa90, etc.
        for i in range(1, n_pages):  # n_pages=1 -> only base
            offset = i * 30
            urls.append(base_url.replace("Hotels-g187791-", f"Hotels-g187791-oa{offset}-"))
        return urls
Now, just like we did with the main function, we can call it at the end, and even ask for user input to give a different number of pages every time you run the script:
n_pages=int(input("Input how many pages to target: "))
build_tripadvisor_urls(n_pages)
scrape_tripadvisor()
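To show how the generated URLs could actually reach the scraper, here is a minimal sketch in which the scraping function takes the URL as a parameter. The names build_urls, scrape_page, and scrape_pages are hypothetical, and scrape_page is a stand-in that makes no network call, so the example runs without an API key.

```python
BASE_URL = "https://www.tripadvisor.com/Hotels-g187791-Rome_Lazio-Hotels.html"

def build_urls(n_pages, page_size=30):
    """Return page URLs: the base URL first, then the oa30, oa60, ... variants."""
    urls = [BASE_URL]
    for i in range(1, n_pages):
        offset = i * page_size
        urls.append(BASE_URL.replace("Hotels-g187791-", f"Hotels-g187791-oa{offset}-"))
    return urls

def scrape_page(url):
    """Hypothetical stand-in for the real scraper: no network call is made."""
    # In the real script, this is where client.get(url, params=...) would go.
    return {"url": url, "cards": []}

def scrape_pages(n_pages):
    """Scrape each page in turn and collect the per-page results."""
    return [scrape_page(url) for url in build_urls(n_pages)]

results = scrape_pages(3)
for result in results:
    print(result["url"])
```

Building the list inside the function, rather than appending to a shared list, means that calling it twice does not produce duplicate URLs.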
Also, by importing Python's concurrent.futures module, we can add the following section of code to the script to target multiple pages at the same time:
import concurrent.futures  # place this with the other imports at the top of the script

# MAX_THREADS - number of threads used for concurrent connections (5 is only an example value)
MAX_THREADS = 5
with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
    # the arguments are your scraping function, which must accept a URL, and the created list of URLs
    executor.map(scrape, urls)
Running multiple connections at the same time adds a lot of complexity to your scraper. If you want to learn more to better target TripAdvisor hotels, check out our blog on How to Use Concurrency in your scraping script.
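As a self-contained illustration of the thread pool pattern, the sketch below runs a placeholder function over two page URLs. The scrape function here is hypothetical and makes no network call, and the MAX_THREADS value is an assumption that you should adjust to the concurrency limit of your plan.

```python
import concurrent.futures

MAX_THREADS = 4  # assumed value; keep it within your plan's concurrency limit

def scrape(url):
    """Hypothetical placeholder for the real scraping function (no network call)."""
    return f"scraped {url}"

urls = [
    "https://www.tripadvisor.com/Hotels-g187791-Rome_Lazio-Hotels.html",
    "https://www.tripadvisor.com/Hotels-g187791-oa30-Rome_Lazio-Hotels.html",
]

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
    # executor.map yields results in the same order as the input URLs
    results = list(executor.map(scrape, urls))

for line in results:
    print(line)
```

Wrapping executor.map in list() forces every task to finish inside the with block and surfaces any exception raised by a worker.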
Start Scraping with 1,000 Free Credits
Ready to try it yourself? Sign up for ScrapingBee and get 1,000 free credits to start scraping TripAdvisor today. No need to manage proxies, rotate IPs, or set up headless browsers — we handle all the hard parts so you can focus on extracting the data you need!
Frequently Asked Questions (FAQs)
Is scraping TripAdvisor legal?
Yes, scraping publicly available data from sites in the travel industry is generally legal, but make sure to read TripAdvisor's terms of service and stay mindful of them as you begin scraping the platform.
Does ScrapingBee work on dynamic content like TripAdvisor?
Yes, our HTML API supports JavaScript rendering, which allows it to extract content loaded dynamically by the browser. This is especially useful on platforms like TripAdvisor, where much of the data (like prices and reviews) is injected via JavaScript after the initial page load.
How to extract only reviews or prices from TripAdvisor?
Make sure to properly modify the extract_rules parameter to target specific CSS selectors like review text or price containers. Use our Python SDK to customize what data to extract, and focus only on the content you need, such as reviews or pricing info.
Why am I getting empty responses from TripAdvisor?
Empty responses usually mean JavaScript hasn’t fully rendered or a CAPTCHA was triggered. If extractions are unsuccessful, inspect your js_scenario instructions, such as a wait or a button click, and do not forget to use "premium_proxy=True" to avoid IP-based blocks.

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.