So, you wanna do C# web scraping without losing your sanity? This guide's got you! We'll go from zero to a working scraper that actually does something useful: fetching real HTML, parsing it cleanly, and saving the data to a nice CSV file.
You'll learn how to use HtmlAgilityPack for parsing, CsvHelper for export, and ScrapingBee as your all-in-one backend that handles headless browsers, proxies, and JavaScript. Yeah, all the messy stuff nobody wants to deal with manually.
By the end you'll have a complete scraper you can run, tweak, or build on. So, fire up VS Code, and let's make this thing scrape.

Quick answer (TL;DR)
If you just want to get scraping right now, here's the fast lane. This section gives you a fully working, copy-pasteable C# web scraping script that calls ScrapingBee, parses book data from "Books to Scrape", and exports everything to a products.csv file.
using System.Globalization;
using System.Text;
using CsvHelper;
using CsvHelper.Configuration;
using HtmlAgilityPack;

var apiKey = "YOUR_API_KEY"; // get from ScrapingBee dashboard
var baseUrl = "https://books.toscrape.com/";

// Build ScrapingBee URL for the first page (add &render_js=true if needed)
var requestUrl =
    $"https://app.scrapingbee.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(baseUrl)}";

// Fetch HTML via ScrapingBee
using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(20) };
var html = await http.GetStringAsync(requestUrl);

// Parse HTML
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Extract products from the first page
var products = new List<Product>();
var cards = doc.DocumentNode.SelectNodes("//article[@class='product_pod']")?.ToList() ?? new List<HtmlNode>();

foreach (var card in cards)
{
    var a = card.SelectSingleNode(".//h3/a");
    var titleRaw = a?.GetAttributeValue("title", a?.InnerText.Trim() ?? "")?.Trim() ?? "";
    var title = HtmlEntity.DeEntitize(titleRaw).Replace('\u00A0', ' ').Trim();

    var href = a?.GetAttributeValue("href", "") ?? "";
    var url = new Uri(new Uri(baseUrl), href).ToString();

    var priceText = card.SelectSingleNode(".//p[@class='price_color']")?.InnerText?.Trim() ?? "£0.00";
    priceText = HtmlEntity.DeEntitize(priceText).Replace('\u00A0', ' ').Trim();
    var numeric = priceText.Replace("£", "").Trim();
    decimal.TryParse(numeric, NumberStyles.Any, CultureInfo.InvariantCulture, out var price);

    products.Add(new Product { Title = title, Price = price, Url = url });
}

// Write CSV next to the executable
var csvPath = Path.Combine(AppContext.BaseDirectory, "products.csv");

// UTF-8 with BOM helps Excel on Windows detect encoding
using var writer = new StreamWriter(csvPath, false, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture) { NewLine = Environment.NewLine };
using var csv = new CsvWriter(writer, csvConfig);
csv.WriteRecords(products);

Console.WriteLine($"Saved {products.Count} products to {csvPath}");

// Minimal product model (type declarations must come after top-level statements)
public sealed class Product
{
    public string Title { get; set; } = "";
    public decimal Price { get; set; }
    public string Url { get; set; } = "";
}
What this code does:
- Calls the ScrapingBee API with your key to fetch HTML (add &render_js=true if you need JavaScript rendering).
- Uses HtmlAgilityPack to parse book cards and extract titles, prices, and URLs.
- Converts relative links to absolute URLs so every product path works standalone.
- Saves clean data into products.csv using CsvHelper, ready to open in Excel or import anywhere.
- Writes in UTF-8 with BOM, so older Windows Excel detects the file encoding correctly.
Setting up your C# web scraping environment
Alright, before we dive into the code, let's get your setup ready for some C# web scraping magic. You don't need a monster IDE or a spaceship workstation; Visual Studio Code and the .NET SDK will do just fine.
Installing .NET SDK and Visual Studio Code
First things first: grab the latest .NET SDK for your system. I'll be rolling with .NET SDK 9 in this guide.
Next up, install Visual Studio Code. Once it's open, hit the extensions tab and add C# Dev Kit. That extension brings you IntelliSense, debugging, and project templates.
Verifying .NET CLI installation
Before you start scraping the web like a digital mole, let's make sure the tools are actually working. Pop open your terminal and run:
dotnet --info
If it spits out details about your .NET install (version, runtime, and paths), congrats: your setup's alive and kicking.
You're now locked and loaded to start building your first C# web scraping project. If you're new to the concept, check out our web scraping guide for a quick refresher.
Building a basic C# web scraper with HtmlAgilityPack
Now that the setup's ready, let's build a simple C# web scraper from scratch. We'll use the HtmlAgilityPack library to parse HTML and ScrapingBee's API to fetch pages without getting into heavy browser automation. The goal is to keep things light; just a small console app that grabs a page and pulls out some text.
We'll create a new project, install a couple of NuGet packages, and write a bit of code to fetch and process HTML. By the end of this part, you'll have a working scraper that can request a web page, read its content, and print specific pieces of data.
Creating a console app with dotnet new
Time to kick things off with a fresh console project. Open your terminal and run:
dotnet new console -n CsBeeScraping
cd CsBeeScraping
This sets up a new folder called CsBeeScraping with a basic "Hello World" app inside. Open it in Visual Studio Code, take a quick look at the Program.cs file: that's where all the scraping magic will happen in a minute.
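By the way, the generated Program.cs is nothing fancy; with modern .NET console templates it's just a one-liner built on top-level statements (the exact comment may differ slightly between SDK versions):
// See https://aka.ms/new-console-template for more information
Console.WriteLine("Hello, World!");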
Adding HtmlAgilityPack and CsvHelper via NuGet
Now let's bring in two libraries that will make your C# web scraping setup actually useful. HtmlAgilityPack will help you parse and navigate HTML documents with XPath or DOM traversal. CsvHelper takes care of exporting scraped data to a clean CSV file, including all those annoying details like escaping quotes and handling encodings.
You can install both straight from the terminal using the .NET CLI. Inside your project folder, run:
dotnet add package HtmlAgilityPack
dotnet add package CsvHelper
This downloads and adds the latest versions of both libraries to your project. When it's done, open your .csproj file. You should see something like this:
<ItemGroup>
<PackageReference Include="CsvHelper" Version="33.1.0" />
<PackageReference Include="HtmlAgilityPack" Version="1.12.4" />
</ItemGroup>
If those entries are there, you're good to go. The dependencies are in place and ready for action.
Fetching and parsing HTML with HtmlAgilityPack
Let's get to the fun part: actually pulling some HTML from the web. Sure, you could just use HttpClient.GetStringAsync() to grab the raw page source, but that only works for the simplest sites. Most modern pages use JavaScript, have rate limits, or block suspicious traffic. Handling that yourself means juggling headers, proxies, and retries. That's not how we want to spend a Friday night. (Funnily enough, I'm writing this on a Halloween Friday night, so trust me, I know what bad ideas look like.)
That's where ScrapingBee saves the day. It acts as your friendly web-scraping middleman: it loads the page in a real browser, rotates proxies for you, and returns clean HTML. No fighting with headless browsers or IP bans.
Here's what to do:
- Sign up for a free trial — you get 1000 free credits right away. Each request costs a few credits, depending on proxy type and options.
- Open the Request Builder — HTML in your ScrapingBee dashboard.
- Copy your API key — you'll need it in the code.
- For now, let's stick with the classic proxy type (no action is needed from your side).
The API endpoint looks like this:
https://app.scrapingbee.com/api/v1?api_key=YOUR_API_KEY&url=URL_TO_FETCH
You just plug in your API key and the target URL. Then you can use HttpClient to fetch the HTML and pass it to HtmlAgilityPack for parsing.
Open up Program.cs and replace its contents with:
using HtmlAgilityPack;
var apiKey = "YOUR_API_KEY";
// Let's test with a simple page first.
// We'll switch to a real page for parsing in the next section.
var targetUrl = "https://httpbin.scrapingbee.com/anything?json";
// Build the ScrapingBee request URL with your API key.
var requestUrl = $"https://app.scrapingbee.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(targetUrl)}";
// Send the request and grab the HTML.
using var http = new HttpClient();
var html = await http.GetStringAsync(requestUrl);
// Load the returned content into HtmlAgilityPack for parsing.
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Just to confirm everything works.
Console.WriteLine("HTML fetched successfully!");
Console.WriteLine($"Length: {html.Length} chars");
A few notes before you run it:
- If you don't specify a proxy type, ScrapingBee uses the classic one by default.
- The Uri.EscapeDataString() call makes sure your URL is properly encoded, so never skip that.
- HtmlAgilityPack doesn't fetch content; it only parses what you feed it. That's why we use HttpClient here.
From your project root, run:
dotnet run
If everything is wired up correctly, you should see output like:
HTML fetched successfully!
Length: 962 chars
Nice. That means your scraper works, your API key is valid, and the pipeline is ready for real parsing.
Parsing HTML with SelectNodes() and XPath
Once you've got the HTML from ScrapingBee, it's time to pull out the bits you care about. HtmlAgilityPack supports XPath, which is perfect for selecting elements without writing a parser yourself. As a quick example, let's grab story titles and links from the Hacker News front page.
Replace the contents of your Program.cs with this:
using HtmlAgilityPack;
class Program
{
static async Task Main()
{
var apiKey = "YOUR_API_KEY";
var targetUrl = "https://news.ycombinator.com/";
// Build the ScrapingBee request URL.
// ScrapingBee acts as a proxy that fetches the page using a real browser, bypassing JS blocks and bot filters.
var requestUrl =
$"https://app.scrapingbee.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(targetUrl)}";
// HttpClient handles HTTP requests; reuse or dispose properly to avoid socket exhaustion.
using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(20) };
// Optional cancellation token: helps stop the request if it takes too long or hangs.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(25));
string html;
try
{
// Fetch the page HTML through ScrapingBee.
html = await http.GetStringAsync(requestUrl, cts.Token);
}
catch (HttpRequestException ex)
{
// Network or HTTP-level errors (DNS failures, bad status codes, etc.).
Console.Error.WriteLine($"HTTP error: {ex.Message}");
return;
}
catch (TaskCanceledException)
{
// Thrown when the timeout or the cancellation token kicks in.
Console.Error.WriteLine("Request timed out.");
return;
}
// HtmlAgilityPack parses the HTML so you can use XPath selectors instead of regex.
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Hacker News story titles live under:
// <span class="titleline"><a href="...">Title</a></span>
var anchors = doc.DocumentNode.SelectNodes("//span[@class='titleline']/a");
if (anchors is null || anchors.Count == 0)
{
Console.WriteLine("No titles found. The page structure may have changed or the request failed.");
return;
}
const int maxItems = 5;
Console.WriteLine("Top Hacker News Stories:\n");
var baseUri = new Uri(targetUrl);
foreach (var a in anchors.Take(maxItems))
{
// Extract the text inside the <a> tag.
// HtmlAgilityPack keeps entities encoded (e.g. &amp;), so decode or trim if needed.
var title = a.InnerText.Trim();
// Grab the href attribute value.
var href = a.GetAttributeValue("href", "");
// Convert relative URLs (like "item?id=123") into absolute ones.
var link = Uri.TryCreate(href, UriKind.Absolute, out var abs)
? abs
: new Uri(baseUri, href);
// Print results.
Console.WriteLine(title);
Console.WriteLine(link);
Console.WriteLine();
}
}
}
Key things to note:
- SelectNodes("//span[@class='titleline']/a") targets the story links. If the site updates its markup, this is where you'll need to adjust.
- Relative URLs are converted to absolute with new Uri(baseUri, href) so all links resolve properly.
- A short HttpClient timeout and CancellationToken keep the app from hanging on slow or blocked requests.
- The HttpRequestException and TaskCanceledException blocks handle network errors and timeouts cleanly instead of crashing.
- maxItems just limits how much you print; bump it up or remove it later.
Run it:
dotnet run
You should see a few titles and URLs printed out. That's SelectNodes() + XPath doing their thing.
Setting custom headers when calling ScrapingBee
Sometimes a site behaves differently depending on what kind of "visitor" it thinks you are (desktop vs. mobile, English vs. French, bot vs. human). That's where custom headers come in. When using ScrapingBee, you can pass things like a specific User-Agent, preferred language, or API tokens, and it'll forward them to the target site for you.
ScrapingBee lets you do this using the forward_headers=true parameter. Every header you forward must start with the spb- prefix — that's how ScrapingBee knows it should pass them through.
Here's how it looks in C#, continuing from our Hacker News example:
using HtmlAgilityPack;
var apiKey = "YOUR_API_KEY";
var targetUrl = "https://news.ycombinator.com/";
// Build request URL with forward_headers enabled.
// This tells ScrapingBee to pass your prefixed headers to the target site.
var requestUrl =
$"https://app.scrapingbee.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(targetUrl)}&forward_headers=true";
// Create HttpClient instance to send the request.
using var http = new HttpClient();
// Add custom headers. Each must start with "spb-" to be forwarded correctly.
// ScrapingBee will strip the prefix and send them as regular headers.
http.DefaultRequestHeaders.Add("spb-user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
http.DefaultRequestHeaders.Add("spb-accept-language", "en-US,en;q=0.9");
Console.WriteLine("Fetching Hacker News with custom headers...");
// Perform the HTTP request through ScrapingBee.
var html = await http.GetStringAsync(requestUrl);
// Load the returned HTML into HtmlAgilityPack for parsing.
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Extract titles using XPath (same selector as before).
var titles = doc.DocumentNode.SelectNodes("//span[@class='titleline']/a");
if (titles is null || titles.Count == 0)
{
Console.WriteLine("No titles found. Check your API key or headers.");
return;
}
Console.WriteLine("Top Hacker News Stories:\n");
// Limit output for clarity; decode HTML entities for readable text.
foreach (var t in titles.Take(5))
{
var title = HtmlEntity.DeEntitize(t.InnerText.Trim());
var link = t.GetAttributeValue("href", "");
Console.WriteLine($"{title}\n{link}\n");
}
A quick rundown:
- The forward_headers=true flag tells ScrapingBee to relay your prefixed headers to the destination.
- Prefixing with spb- is mandatory. For example, spb-user-agent becomes User-Agent on the other side.
- This is useful when a site adjusts layout, content, or language based on region or device.
- You can combine this with ScrapingBee's proxy and rendering options for even finer control.
For a full list of header rules and supported options, check the ScrapingBee documentation.
Extracting and structuring data from HTML elements
Now we're getting into the real meat of C# web scraping: taking that pile of HTML and shaping it into clean, structured data. For this part, we'll use the Books to Scrape demo site (a classic for testing scrapers). The plan: fetch each product card, grab the title, price, and link, and map it all into a small C# model.
This gives us a predictable scraping pipeline: parse → normalize → store. We'll also clean up the price into a proper decimal and make sure all links are absolute (no weird ../ stuff).
Defining a product class for data storage
Before we touch the HTML, let's create a simple model to store the scraped data. This helps keep your logic tidy and makes it easier to export results later (like saving to CSV).
Create a new file at Models/Product.cs and drop this in:
namespace CsBeeScraping.Models;
// A simple data model representing a scraped product.
public sealed class Product
{
// Book title (as shown on the page)
public string Title { get; set; } = "";
// Parsed numeric price (we'll strip currency symbols later)
public decimal Price { get; set; }
// Absolute product URL
public string Url { get; set; } = "";
}
This Product class will hold the essentials for each scraped book: its title, numeric price, and the absolute URL to the product page. One book, one Product instance.
Using QuerySelectorAll() for CSS selection
Not a fan of XPath? No worries, you can use CSS selectors instead, which often feel more intuitive if you've done any front-end work. HtmlAgilityPack doesn't support them out of the box, but adding that feature is just one NuGet install away.
Run this to add CSS selector support:
dotnet add package HtmlAgilityPack.CssSelectors.NetCore
Now replace the contents of your Program.cs with the snippet below. It uses QuerySelectorAll() and QuerySelector(); same idea as document.querySelectorAll() in JavaScript.
using HtmlAgilityPack;
using HtmlAgilityPack.CssSelectors.NetCore;
var apiKey = "YOUR_API_KEY";
var targetUrl = "https://books.toscrape.com/";
var requestUrl =
$"https://app.scrapingbee.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(targetUrl)}";
using var http = new HttpClient();
Console.WriteLine("Fetching data through ScrapingBee...");
// Fetch the HTML content via ScrapingBee
var html = await http.GetStringAsync(requestUrl);
// Load the response into HtmlAgilityPack
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Select all product cards on the page using a CSS selector
var cards = doc.DocumentNode.QuerySelectorAll("article.product_pod");
if (cards is null || cards.Count == 0)
{
Console.WriteLine("No products found. Check your API key or response.");
return;
}
// Limit to 5 products for clarity
foreach (var card in cards.Take(5))
{
// Each product card has a link inside <h3><a ...></a></h3>
var a = card.QuerySelector("h3 > a");
// Extract title and relative link safely
var title = a?.GetAttributeValue("title", "") ?? "";
var href = a?.GetAttributeValue("href", "") ?? "";
// Price element: <p class="price_color">£51.77</p>
var price = card.QuerySelector("p.price_color")?.InnerText?.Trim() ?? "";
Console.WriteLine($"{title} — {price} — {href}");
}
A few quick notes:
- CSS selectors are often easier to read and maintain than XPath, especially for simple pages like this one.
- You can mix CSS and XPath in the same project, HtmlAgilityPack doesn't mind.
- QuerySelectorAll("article.product_pod") grabs each product card, then h3 > a and .price_color pull the details inside.
- Later, we'll normalize prices and links to prepare the data for export.
Run it:
dotnet run
You'll get something like this:
A Light in the Attic — £51.77 — catalogue/a-light-in-the-attic_1000/index.html
Tipping the Velvet — £53.74 — catalogue/tipping-the-velvet_999/index.html
If you'd rather skip local parsing entirely, ScrapingBee can return structured JSON directly. Check out our data extraction in C# tutorial.
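To give you a rough idea of what that looks like, here's an illustrative sketch using ScrapingBee's extract_rules parameter. Treat the rule syntax below as an assumption and double-check it against the data extraction docs before relying on it:
var apiKey = "YOUR_API_KEY";
// Illustrative rule set: map a field name to a CSS selector.
var rules = """{"titles": {"selector": "article.product_pod h3 a", "type": "list"}}""";
var requestUrl =
    $"https://app.scrapingbee.com/api/v1?api_key={apiKey}" +
    $"&url={Uri.EscapeDataString("https://books.toscrape.com/")}" +
    $"&extract_rules={Uri.EscapeDataString(rules)}";
using var http = new HttpClient();
var json = await http.GetStringAsync(requestUrl); // JSON payload instead of raw HTML
Console.WriteLine(json);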
Cleaning HTML entities with HtmlEntity.DeEntitize()
Sometimes the text you scrape looks fine at first glance, but it's hiding HTML entities like &amp; or &nbsp;, or other encoded symbols. These can sneak into titles, prices, and author names. Luckily, HtmlAgilityPack gives us a simple cleanup tool for this: HtmlEntity.DeEntitize().
You don't have to rewrite your scraper. Just tweak your text-cleaning lines a bit. For example, inside your book-parsing loop, update these lines:
// Extract and clean the title text
var title = a?.GetAttributeValue("title", a?.InnerText.Trim() ?? "")?.Trim() ?? "";
title = HtmlEntity.DeEntitize(title).Replace('\u00A0', ' ').Trim();
// Extract and clean the price text
var price = card.QuerySelector("p.price_color")?.InnerText?.Trim() ?? "£0.00";
price = HtmlEntity.DeEntitize(price).Replace('\u00A0', ' ').Trim();
Here's what happens:
- HtmlEntity.DeEntitize() turns things like &amp; into & and decodes other encoded symbols.
- The Replace('\u00A0', ' ') part converts non-breaking spaces to normal ones, so you don't end up with weird invisible characters.
- A final .Trim() ensures no leftover whitespace.
Exporting scraped data to CSV in C#
Now that your scraper's pulling real data, let's make it actually useful by saving it to a CSV file. A CSV is perfect for quick exports: you can open it in Excel, import it into a database, or feed it to another script. We'll use CsvHelper, a solid C# library for reading and writing CSV files with proper culture and encoding settings.
If you want to go deeper on Excel workflows, check out this guide: How to web scrape in Excel
Importing CsvHelper
You already installed CsvHelper earlier, so now it's just about adding the right namespaces. At the top of your Program.cs, add:
using CsvHelper;
using CsvHelper.Configuration;
using System.Globalization;
using System.IO;
using System.Text;
using CsBeeScraping.Models;
Then prepare a CSV configuration before writing data:
// Define configuration for CsvHelper
// CultureInfo.InvariantCulture ensures consistent number and date formatting across systems
// NewLine uses your environment's default line endings
var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
NewLine = Environment.NewLine
};
We'll use this configuration in the next section to export our list of Product objects.
Writing records with CsvWriter.WriteRecords()
Here's a full working example that ties everything together. We'll fetch Books to Scrape through ScrapingBee, parse products with HtmlAgilityPack, then export them to products.csv using CsvHelper (it'll include a header row automatically).
Put this in your Program.cs (assuming you've already got Models/Product.cs defined):
using System.Globalization;
using System.Text;
using HtmlAgilityPack;
using CsvHelper;
using CsvHelper.Configuration;
using CsBeeScraping.Models; // Product model
// 1) Fetch HTML through ScrapingBee
var apiKey = "YOUR_API_KEY";
var targetUrl = "https://books.toscrape.com/";
var requestUrl = $"https://app.scrapingbee.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(targetUrl)}";
using var http = new HttpClient();
Console.WriteLine("Fetching Books to Scrape via ScrapingBee...");
// Fetch page HTML
var html = await http.GetStringAsync(requestUrl);
// 2) Load HTML into HtmlAgilityPack
var doc = new HtmlDocument();
doc.LoadHtml(html);
// 3) Extract product info (title, price, and URL)
var baseUrl = "https://books.toscrape.com/";
var products = new List<Product>();
// Each product lives under <article class="product_pod">
var cards = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
if (cards is null)
{
Console.WriteLine("No products found.");
}
else
{
foreach (var card in cards)
{
// Extract title and relative link
var a = card.SelectSingleNode(".//h3/a");
var title = a?.GetAttributeValue("title", a?.InnerText.Trim() ?? "")?.Trim() ?? "";
title = HtmlEntity.DeEntitize(title).Replace('\u00A0', ' ').Trim();
var href = a?.GetAttributeValue("href", "") ?? "";
var url = new Uri(new Uri(baseUrl), href).ToString();
// Extract and parse price text
var priceText = card.SelectSingleNode(".//p[@class='price_color']")?.InnerText?.Trim() ?? "£0.00";
priceText = HtmlEntity.DeEntitize(priceText).Replace('\u00A0', ' ').Trim();
var numeric = priceText.Replace("£", "").Trim();
decimal.TryParse(numeric, NumberStyles.Any, CultureInfo.InvariantCulture, out var price);
// Add to product list
products.Add(new Product { Title = title, Price = price, Url = url });
}
}
Console.WriteLine($"Parsed {products.Count} products.");
// 4) Write results to CSV (CsvHelper automatically includes headers)
var csvPath = Path.Combine(AppContext.BaseDirectory, "products.csv");
// UTF-8 with BOM ensures Excel on Windows opens special characters correctly
using var writer = new StreamWriter(csvPath, false, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
// Configure CsvHelper
var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
NewLine = Environment.NewLine
};
// Write all products to the CSV file
using var csv = new CsvWriter(writer, csvConfig);
csv.WriteRecords(products);
Console.WriteLine($"Saved {products.Count} products to: {csvPath}");
Key things to note:
- ScrapingBee handles the fetching; HtmlAgilityPack parses the HTML.
- Relative links are normalized to absolute URLs.
- Prices are cleaned and parsed into decimals.
- CsvHelper writes everything to products.csv with headers included.
- UTF-8 with BOM ensures Excel opens the file without encoding issues.
Run it:
dotnet run
You should see output like this:
Fetching Books to Scrape via ScrapingBee...
Parsed 20 products.
Saved 20 products to: .../products.csv
That's it! You now have a working C# web scraping pipeline that fetches, parses, cleans, and exports structured data.
Pro tip: the UTF-8 BOM we added ensures Excel on Windows opens your file without garbling any non-ASCII characters. For more Excel-focused scraping tricks, check out our How to web scrape in Excel guide.
Handling CultureInfo for CSV formatting
When exporting data, culture and encoding settings affect how your CSV behaves across systems. Without them, numbers or special characters can appear differently depending on locale. The safest bet is to use CultureInfo.InvariantCulture. It keeps things consistent: decimal points stay as dots (.), and data looks the same everywhere.
We already used it when creating the CSV configuration:
var csvConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
NewLine = Environment.NewLine
};
This ensures that values like 51.77 won't turn into 51,77 on systems using European locales.
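If you want to see the difference for yourself, here's a tiny standalone snippet (pure illustration, not part of the scraper):
using System.Globalization;
var price = 51.77m;
// InvariantCulture always uses a dot as the decimal separator...
Console.WriteLine(price.ToString(CultureInfo.InvariantCulture));        // 51.77
// ...while a European culture like de-DE switches it to a comma.
Console.WriteLine(price.ToString(CultureInfo.GetCultureInfo("de-DE"))); // 51,77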
If your CSV will be opened in Excel on Windows, add a UTF-8 BOM (Byte Order Mark) so it recognizes the encoding correctly:
using var writer = new StreamWriter(csvPath, false, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
If you're sending the file to APIs or automated systems, skip the BOM:
using var writer = new StreamWriter(csvPath, false, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
In short:
- CultureInfo.InvariantCulture — consistent numeric and date formatting across systems
- UTF-8 with BOM — ensures Excel reads UTF-8 files correctly on Windows
- UTF-8 without BOM — better for server-side and automated workflows
💡 Note: Newer versions of Excel usually detect UTF-8 correctly even without a BOM, but older versions (Excel 2013 and below) may still misread characters. Including a BOM remains a reliable choice for compatibility.
Scraping JavaScript-heavy sites with Selenium WebDriver
Modern websites love JavaScript. They load content dynamically, update prices on the fly, and hide data behind API calls. That's why sometimes your scraper gets an empty page even though the site looks fine in a browser.
Before jumping to Selenium though, know this: in certain cases, you don't need it. ScrapingBee can already handle JavaScript rendering for you by adding a single query parameter:
&render_js=true
For example:
https://app.scrapingbee.com/api/v1?api_key=YOUR_API_KEY&url=https%3A%2F%2Fexample.com&render_js=true
This tells ScrapingBee to load the page in a real browser, execute all JavaScript, and return the final rendered HTML — no local browser automation needed.
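In C#, that's just one extra query parameter on the request URL we've been building all along. Here's a minimal sketch (the target URL is a placeholder; swap in your own JavaScript-heavy page):
var apiKey = "YOUR_API_KEY";
var targetUrl = "https://example.com/"; // placeholder target

// Same pipeline as before; the only change is the render_js=true flag.
var requestUrl =
    $"https://app.scrapingbee.com/api/v1?api_key={apiKey}" +
    $"&url={Uri.EscapeDataString(targetUrl)}" +
    "&render_js=true";

// JS rendering takes longer, so give the request some extra headroom.
using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(60) };
var html = await http.GetStringAsync(requestUrl);
Console.WriteLine($"Rendered HTML length: {html.Length} chars");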
That said, if you want to see how local browser automation works in C#, here's a minimal setup using Selenium WebDriver. It gives you full control of the browser, which can be handy for sites requiring interaction (like login forms, button clicks, or infinite scroll).
For a deeper dive into browser scraping, check the Web Scraping Tutorial Using Selenium.
Installing Selenium.WebDriver and ChromeDriver
If you want to run a real browser locally, you'll need two NuGet packages:
dotnet add package Selenium.WebDriver
dotnet add package Selenium.WebDriver.ChromeDriver
That's all you need: no manual downloads or driver path configuration. The Selenium.WebDriver.ChromeDriver package bundles a ChromeDriver binary for you; keep its version roughly in line with your installed Chrome (recent Selenium versions can also fetch a matching driver on their own), and it works out of the box.
Once installed, you can launch a headless Chrome session and fetch fully rendered pages in just a few lines of C#. Keep in mind, though: running Selenium locally means you're spinning up a full browser instance. It's powerful but heavier on resources. In many cases, ScrapingBee's render_js=true parameter achieves the same effect with zero setup or driver maintenance.
Launching Chrome in headless mode
If you really need to run Selenium locally (for example, when testing scripts or debugging dynamic pages) you can launch Chrome in headless mode so it runs without opening a visible browser window.
Here's a minimal working example:
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
// Set up Chrome options for headless mode
var options = new ChromeOptions();
options.AddArgument("--headless"); // run without UI
options.AddArgument("--disable-gpu"); // avoid GPU usage issues
options.AddArgument("--no-sandbox"); // needed for some Linux environments
// Initialize the ChromeDriver with the configured options
using var driver = new ChromeDriver(options);
// Navigate to the target page
driver.Navigate().GoToUrl("https://example.com");
// Output the page title to confirm it loaded correctly
Console.WriteLine(driver.Title);
// Close the browser and release resources
driver.Quit();
This snippet starts a real Chrome process in the background, loads a page, prints its title, and shuts everything down cleanly.
Extracting elements with driver.FindElements()
Once the page is loaded, you can interact with it just like a real browser. Selenium supports both CSS selectors and XPath for finding elements. Similar to HtmlAgilityPack, but on a live, rendered page.
Here's a clean example using CSS selectors:
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;
// Configure Chrome to run in headless mode (no visible window)
var options = new ChromeOptions();
options.AddArgument("--headless");
options.AddArgument("--disable-gpu");
options.AddArgument("--no-sandbox");
// Start a new Chrome browser session with the given options
using var driver = new ChromeDriver(options);
// Navigate to the target website
driver.Navigate().GoToUrl("https://books.toscrape.com/");
// Wait for the page to finish loading and the product cards to appear
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
wait.Until(d =>
{
// Wait until the document is fully loaded
var ready = ((IJavaScriptExecutor)d)
.ExecuteScript("return document.readyState")
?.ToString() == "complete";
// Ensure at least one product card exists before proceeding
var cardsReady = d.FindElements(By.CssSelector("article.product_pod")).Count > 0;
return ready && cardsReady;
});
// Find all product cards using a CSS selector
var cards = driver.FindElements(By.CssSelector("article.product_pod"));
if (cards.Count == 0)
{
Console.WriteLine("No products found.");
}
else
{
// Loop through the first 5 product cards for demonstration
foreach (var card in cards.Take(5))
{
try
{
// Find the link element inside <h3><a></a></h3>
var a = card.FindElement(By.CssSelector("h3 a"));
// Try to read the title attribute; if missing, fallback to link text
var title = a?.GetAttribute("title");
if (string.IsNullOrWhiteSpace(title))
title = a?.Text ?? "(no title)";
title = title.Trim();
// Extract the price text (e.g., "£51.77")
var priceNode = card.FindElement(By.CssSelector("p.price_color"));
var price = priceNode?.Text?.Trim() ?? "N/A";
// Print result to console
Console.WriteLine($"{title} — {price}");
}
catch (NoSuchElementException)
{
// Skip any malformed or incomplete product cards
continue;
}
}
}
// Close the browser and release resources
driver.Quit();
Key things to note:
- FindElements(By.CssSelector(...)) returns all matching nodes, suitable for iterating over repeated items.
- WebDriverWait ensures the page and elements are fully loaded before scraping.
- Titles come from the title attribute (fallback: link text).
- Each lookup is wrapped in try/catch to skip incomplete cards gracefully.
- driver.Quit() cleanly shuts down Chrome when done.
Advanced techniques: Pagination and crawling
Let's level up our C# web scraping flow with a tiny crawler. We'll page through Books to Scrape using a queue, keep a HashSet<string> of visited URLs, and stop after a safe limit. ScrapingBee stays in front for reliability (proxy pool, optional JS rendering), so we don't hand-roll anti-bot tricks.
Implementing pagination with a queue
We'll start at the front page, look for the "next" link in the pager, and BFS through pages until we hit a max page count.
using System.Globalization;
using System.Text;
using CsvHelper;
using CsvHelper.Configuration;
using HtmlAgilityPack;
using CsBeeScraping.Models;
var apiKey = "YOUR_API_KEY";
var baseUrl = "https://books.toscrape.com/";
var startUrl = baseUrl;
int maxPages = 3; // keep it small for demos
var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
var queue = new Queue<string>();
queue.Enqueue(startUrl);
var products = new List<Product>();
using var http = new HttpClient();
// CSV setup (UTF-8 BOM helps Excel)
var csvPath = Path.Combine(AppContext.BaseDirectory, "products.csv");
using var writer = new StreamWriter(csvPath, false, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
using var csv = new CsvWriter(writer, new CsvConfiguration(CultureInfo.InvariantCulture)
{
NewLine = Environment.NewLine
});
// write header once
csv.WriteHeader<Product>();
csv.NextRecord();
int pagesCrawled = 0;
while (queue.Count > 0 && pagesCrawled < maxPages)
{
var current = queue.Dequeue();
if (!visited.Add(current)) continue;
var requestUrl = $"https://app.scrapingbee.com/api/v1?api_key={apiKey}&url={Uri.EscapeDataString(current)}";
Console.WriteLine($"Fetching: {current}");
var html = await http.GetStringAsync(requestUrl);
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Extract products on this page
var cards = doc.DocumentNode.SelectNodes("//article[@class='product_pod']");
if (cards != null)
{
foreach (var card in cards)
{
var a = card.SelectSingleNode(".//h3/a");
var title = a?.GetAttributeValue("title", a?.InnerText.Trim() ?? "")?.Trim() ?? "";
title = HtmlEntity.DeEntitize(title).Replace('\u00A0', ' ').Trim();
var href = a?.GetAttributeValue("href", "") ?? "";
var url = new Uri(new Uri(baseUrl), href).ToString();
var priceText = card.SelectSingleNode(".//p[@class='price_color']")?.InnerText?.Trim() ?? "£0.00";
var numeric = HtmlEntity.DeEntitize(priceText).Replace("£", "").Trim();
decimal.TryParse(numeric, NumberStyles.Any, CultureInfo.InvariantCulture, out var price);
var p = new Product { Title = title, Price = price, Url = url };
products.Add(p);
// stream row to CSV
csv.WriteRecord(p);
csv.NextRecord();
}
}
// Find "next" link and enqueue
var nextHref = doc.DocumentNode
.SelectSingleNode("//ul[@class='pager']/li[@class='next']/a")
?.GetAttributeValue("href", "");
if (!string.IsNullOrEmpty(nextHref))
{
var nextUrl = new Uri(new Uri(current), nextHref).ToString();
queue.Enqueue(nextUrl);
}
pagesCrawled++;
}
Console.WriteLine($"Crawled {pagesCrawled} pages, collected {products.Count} products.");
Console.WriteLine($"Saved CSV: {csvPath}");
How it works:
- Pages to visit go into a Queue<string>.
- A HashSet<string> keeps track of visited URLs.
- Each page is fetched via ScrapingBee, parsed with HtmlAgilityPack, and scanned for product cards.
- Products are written to both memory and CSV in real time (efficient for long crawls).
- The "Next" link is resolved to an absolute URL and enqueued.
- Uses decimal.TryParse() with InvariantCulture for consistent number parsing.
- Cleans text with HtmlEntity.DeEntitize() and handles missing nodes with safe defaults.
This turns your simple scraper into a lightweight, robust crawler ready for multi-page data extraction.
Avoiding duplicate URLs with HashSet
The HashSet<string> visited above prevents re-fetching the same page (useful on sites with circular pagination or filters). If you also crawl product detail pages, use a second set for item URLs:
var seenItems = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
// ...when you build the absolute product URL:
if (seenItems.Add(url))
{
products.Add(new Product { Title = title, Price = price, Url = url });
}
Notes:
- Keep maxPages low during development to avoid long runs.
- If a page requires JavaScript to load content, just add &render_js=true to your ScrapingBee URL instead of introducing Selenium locally.
- For websites that behave differently depending on headers, you can forward custom ones (like User-Agent or Accept-Language) by adding forward_headers=true and prefixing each header with spb-.
Taking your C# web scraping further
You've just built a complete C# scraper! It fetches, parses, cleans, and exports data like a champ. But the real power comes from keeping your code lean while letting the infrastructure do the heavy lifting.
That's exactly what ScrapingBee is built for. It handles headless browsers, JavaScript rendering, rotating proxies, and CAPTCHAs: all the messy parts that make scraping unreliable. You just call an API endpoint and get clean, ready-to-parse HTML in return.
Instead of scaling Selenium instances or maintaining proxy lists, focus on the logic that matters — what to scrape and how to use the data.
👉 Start your free ScrapingBee trial and keep your C# scrapers fast, simple, and production-ready.
Conclusion
Web scraping in C# doesn't have to be complicated. With the right libraries and a clean structure, you can fetch, parse, and export data in just a few lines of code. HtmlAgilityPack handles parsing, CsvHelper manages exports, and ScrapingBee takes care of all the tough stuff like proxies, JavaScript rendering, and browser handling.
Start small, keep your code modular, and build up from there. A reliable scraper is just good engineering, not black magic. Whether you're collecting research data, tracking prices, or automating reports, this setup gives you everything you need to scrape the web responsibly and efficiently.
Frequently asked questions
What are the main libraries used for web scraping in C#?
The core toolkit usually includes:
- HtmlAgilityPack — for parsing and querying HTML using XPath or CSS selectors.
- CsvHelper — for exporting structured data to CSV files cleanly and efficiently.
- HttpClient — for making network requests in a modern, async-friendly way.
How do you handle pagination when scraping multiple pages?
Use a combination of:
- A Queue<string> to store pending URLs.
- A HashSet<string> to track visited pages and prevent duplicates.
Each time you detect a "Next" link (via XPath or CSS selector), enqueue its absolute URL. Then keep crawling until you hit a safe limit like maxPages or the queue is empty.
Can C# web scrapers handle dynamic content loaded by JavaScript?
Yes — but not through plain HttpClient, since it only fetches raw HTML. You have two solid options:
- ScrapingBee — add &render_js=true to your API request and it'll return fully rendered HTML. No browser setup, just one API call.
- Selenium WebDriver — launches a headless Chrome session, runs scripts, and gives you live DOM access. It's slower but great for debugging or simulating real user interactions.
For production work, ScrapingBee is typically faster and more reliable.
How can I avoid getting blocked while web scraping?
A few key practices:
- Use ScrapingBee's rotating proxies to spread traffic across many IPs.
- Add realistic User-Agent and Accept-Language headers (you can forward them via forward_headers=true).
- Respect polite crawl rates — avoid hammering sites with too many requests per second.
- Cache pages you've already scraped to minimize unnecessary hits.
- Always review a site's robots.txt and terms of service before scraping.
How do I handle 429 errors or rate limits?
A 429 means “Too Many Requests.” To fix it:
- Add exponential backoff (wait longer after each retry).
- Randomize delays between requests.
- Cache already-scraped pages.
- Use ScrapingBee's rotating proxies or premium pool to balance traffic.
If you keep hitting 429s, slow down your scraper or use a queue-based worker system to control throughput.
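Here's a minimal backoff sketch in C# (illustrative only; tune the retry count and delays to match your quota):
using System.Net;

async Task<string?> FetchWithBackoffAsync(HttpClient http, string url, int maxRetries = 4)
{
    var rng = new Random();
    for (var attempt = 0; attempt < maxRetries; attempt++)
    {
        using var response = await http.GetAsync(url);
        if (response.StatusCode != HttpStatusCode.TooManyRequests)
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }

        // Exponential backoff (1s, 2s, 4s, ...) plus random jitter.
        var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt))
                    + TimeSpan.FromMilliseconds(rng.Next(0, 1000));
        Console.WriteLine($"Got 429, retrying in {delay.TotalSeconds:F1}s...");
        await Task.Delay(delay);
    }
    return null; // still rate-limited after all retries
}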
Should I use CSS selectors or XPath in C#?
Both work, so it's mostly preference and context.
- XPath is native in HtmlAgilityPack and great for structured or nested elements.
- CSS selectors (via HtmlAgilityPack.CssSelectors.NetCore) feel more intuitive if you come from front-end or JS backgrounds.
In general: use XPath for deeply nested, table-like layouts, and CSS for simpler, modern HTML.
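To make that concrete, here's the same element grabbed both ways from a tiny inline snippet of Books to Scrape-style markup:
using HtmlAgilityPack;
using HtmlAgilityPack.CssSelectors.NetCore;

var doc = new HtmlDocument();
doc.LoadHtml("""<article class="product_pod"><h3><a title="A Light in the Attic" href="#">A Light...</a></h3></article>""");

// XPath: built into HtmlAgilityPack
var viaXPath = doc.DocumentNode.SelectSingleNode("//article[@class='product_pod']//h3/a");

// CSS: via the HtmlAgilityPack.CssSelectors.NetCore package
var viaCss = doc.DocumentNode.QuerySelector("article.product_pod h3 > a");

Console.WriteLine(viaXPath?.GetAttributeValue("title", ""));
Console.WriteLine(viaCss?.GetAttributeValue("title", ""));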
How do I store results beyond CSV?
CSV is great for small projects, but once data grows or needs updates, use a database:
- SQLite for local, lightweight scraping.
- PostgreSQL or MySQL for scalable storage.
- Use an ORM like Entity Framework Core to define models and upsert (update/insert) by a unique key, like a product URL.
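For example, a minimal EF Core + SQLite sketch could look like this. It assumes the Microsoft.EntityFrameworkCore.Sqlite package and the Product model from this guide, with the product URL as the natural key; adapt it to your own schema:
using Microsoft.EntityFrameworkCore;
using CsBeeScraping.Models;

// Upsert scraped products by URL: insert new ones, update prices on existing ones.
var products = new List<Product>(); // replace with the list you built while scraping
using (var db = new ScraperContext())
{
    db.Database.EnsureCreated();
    foreach (var p in products)
    {
        var existing = db.Products.Find(p.Url);
        if (existing is null) db.Products.Add(p);   // insert new
        else existing.Price = p.Price;              // update existing
    }
    db.SaveChanges();
}

public class ScraperContext : DbContext
{
    public DbSet<Product> Products => Set<Product>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlite("Data Source=products.db");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
        => modelBuilder.Entity<Product>().HasKey(p => p.Url); // product URL as the unique key
}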

Jennifer Marsh is a software developer and technology writer for a number of publications across several industries including cybersecurity, programming, DevOps, and IT operations.


