Kotlin web scraping: Learn how to extract data step by step

Q: Is Kotlin good for web scraping?

Yes, Kotlin is a strong choice for web scraping. It runs on the JVM, gives you access to mature Java libraries, and adds cleaner syntax with built-in null safety. This makes scraping code easier to write, safer to run, and simpler to maintain as projects grow.

Q: Do Kotlin developers use Java scraping libraries?

Yes, Kotlin developers commonly use Java scraping libraries. Kotlin is fully interoperable with Java, so tools like Jsoup, OkHttp, Selenium, and Playwright work seamlessly. This allows developers to reuse proven libraries without giving up Kotlin's cleaner syntax.

Q: Can Kotlin scrape JavaScript websites?

Kotlin can scrape JavaScript-heavy websites, but it usually requires extra tooling. Headless browsers like Playwright or HtmlUnit can render pages before parsing. Scraping APIs can also handle JavaScript rendering and blocking automatically.

Michael Nyamande | 11 January 2026 | 22 min read

Table of contents

Kotlin web scraping is a practical way to extract data from websites using a modern, JVM-based language. Developers choose Kotlin because it combines clean syntax, strong typing, and full access to the Java ecosystem, making scraping code easier to write and safer to maintain.

In this guide, you will learn how web scraping works in Kotlin from the ground up. We will cover the tools you need, how to fetch and parse HTML, and how to extract real data using clear, step-by-step examples. By the end of the article, you will know how to build a working Kotlin scraper, understand when simple HTTP requests are enough, and recognize when more advanced solutions are needed for JavaScript-heavy or protected sites.

Kotlin web scraping: Learn how to extract data step by step

💡 Interested in web scraping with Java? Check out our guide to the best Java web scraping libraries

Quick answer (TL;DR)

Kotlin web scraping is straightforward: fetch the page HTML with an HTTP client, parse it with an HTML parser, then extract the fields you need using CSS selectors. A simple stack is OkHttp for requests and Jsoup for parsing. If the site is JavaScript-heavy or blocks bots, a scraping API can handle rendering and anti-bot friction so you can focus on extraction.

If you want the bigger picture around terminology, check out Scraping vs Crawling in Scala.

Full Kotlin code (Books to Scrape)

This example shows the full Kotlin web scraping flow in one place. It sends an HTTP request to the Books to Scrape site, parses the returned HTML with Jsoup, and selects all book cards using simple CSS selectors. Each book's title, price, availability, rating, and links are extracted safely and printed to the console.

import okhttp3.OkHttpClient
import okhttp3.Request
import org.jsoup.Jsoup
import org.jsoup.nodes.Element

// Simple data model to store scraped book data
data class Book(
    val title: String,
    val price: String,
    val availability: String,
    val rating: String,
    val productUrl: String,
    val imageUrl: String
)

fun main() {
    // Base URL is used by Jsoup to resolve relative links
    val baseUrl = "https://books.toscrape.com/"
    val targetUrl = baseUrl

    // OkHttp client handles HTTP requests
    val client = OkHttpClient()

    // Build a basic GET request with a User-Agent header
    val request = Request.Builder()
        .url(targetUrl)
        .header(
            "User-Agent",
            "Mozilla/5.0 (compatible; Kotlin web scraping tutorial)"
        )
        .build()

    try {
        // Execute the HTTP request
        client.newCall(request).execute().use { response ->

            // Check for non-200 responses
            if (!response.isSuccessful) {
                println("Request failed with status: ${response.code}")
                return
            }

            // Read the response body as raw HTML
            val html = response.body?.string()
            if (html.isNullOrBlank()) {
                println("Empty response body, nothing to parse.")
                return
            }

            // Parse the HTML into a Jsoup Document
            val doc = Jsoup.parse(html, baseUrl)

            // Select all book cards on the page
            // Layout: ol.row > li > article.product_pod
            val bookCards = doc.select("ol.row > li")
            if (bookCards.isEmpty()) {
                println("No books found. The page structure may have changed.")
                return
            }

            // Parse each book card into a Book object
            val books = bookCards.mapNotNull { card ->
                parseBookCard(card)
            }

            // Print results to the console
            println("Found ${books.size} books on the page.\n")

            books.forEachIndexed { index, book ->
                println("#${index + 1}")
                println("Title: ${book.title}")
                println("Price: ${book.price}")
                println("Availability: ${book.availability}")
                println("Rating: ${book.rating}")
                println("Product URL: ${book.productUrl}")
                println("Image URL: ${book.imageUrl}")
                println()
            }
        }
    } catch (e: Exception) {
        // Catch network, parsing, or unexpected errors
        println("Scrape failed: ${e.message}")
    }
}

// Extracts all relevant fields from a single book card element
private fun parseBookCard(card: Element): Book? {

    // Select elements using CSS selectors
    val titleEl = card.selectFirst("article.product_pod h3 a")
    val priceEl = card.selectFirst("article.product_pod .product_price .price_color")
    val availabilityEl = card.selectFirst("article.product_pod .product_price .instock.availability")
    val ratingEl = card.selectFirst("article.product_pod p.star-rating")
    val imageEl = card.selectFirst("article.product_pod .image_container img")

    // Title is stored in the "title" attribute to avoid truncated text
    val title = titleEl?.attr("title")?.trim().orEmpty()
    if (title.isBlank()) return null

    // Price text (e.g. £51.77)
    val price = priceEl?.text()?.trim().orEmpty()

    // Availability text often contains extra whitespace and line breaks
    val availability = availabilityEl
        ?.text()
        ?.replace(Regex("\\s+"), " ")
        ?.trim()
        .orEmpty()

    // Rating is encoded in a class name like "star-rating Three"
    val rating = ratingEl
        ?.classNames()
        ?.map { it.trim() }
        ?.firstOrNull { it.equals("One", true) || it.equals("Two", true) || it.equals("Three", true) || it.equals("Four", true) || it.equals("Five", true) }
        ?: "Unknown"

    // Convert relative URLs into absolute URLs
    val productUrl = titleEl?.absUrl("href").orEmpty()
    val imageUrl = imageEl?.absUrl("src").orEmpty()

    return Book(
        title = title,
        price = price,
        availability = availability,
        rating = rating,
        productUrl = productUrl,
        imageUrl = imageUrl
    )
}

This keeps things intentionally simple so you can see the core mechanics without distractions.

In the next sections of the tutorial, we will break these ideas down further and cover more concepts in detail.

What is Kotlin web scraping

Kotlin web scraping is the process of extracting data from websites using Kotlin code. You make a request to a page, read the HTML response, and pull out the information you care about. This data is already public, you are just collecting it in a structured way. (You still need to respect the site's ToS and avoid abusive traffic.)

Web scraping is commonly used for things like tracking prices, collecting job listings, monitoring content updates, or building datasets for analysis. Instead of copying data by hand, your code does the boring work for you.

Kotlin works well for this because it runs on the JVM and gives you access to the entire Java ecosystem. You can reuse mature Java libraries while writing cleaner and safer Kotlin code. The language helps reduce boilerplate and makes common mistakes harder to introduce.

Some websites rely heavily on JavaScript or block automated requests. In those cases, tools like ScrapingBee can help by handling HTML rendering, rotating IPs, and bypassing common blocks, so you can focus on extracting data instead of fighting infrastructure issues.

If you are curious how scraping looks in other ecosystems, you can also check out Web scraping with Rust.

How web scraping works in Kotlin

The basic workflow of Kotlin web scraping is simple and predictable.

First, your code sends an HTTP request to a target URL. The server responds with HTML content, which represents the structure of the page.
Next, you parse the HTML to turn it into something you can work with. This is usually done using an HTML parser that lets you select elements by tag, class, or attribute.
Once parsed, you extract the data you need and store it in a format like JSON, CSV, or a database.

For pages that rely on JavaScript, a headless browser or HTML rendering tool may be required. Libraries and tools like HtmlUnit can simulate a real browser and execute scripts before returning the final HTML. You can learn more about this approach in Using HtmlUnit.

When to use Kotlin instead of Java

Kotlin is often a better choice than Java when you want cleaner and safer scraping code. The language reduces boilerplate, which means fewer lines of code for the same logic. This makes scrapers easier to read, debug, and maintain. Null safety is another big advantage. Kotlin forces you to handle missing values explicitly, which helps avoid runtime crashes when a page structure changes or data is missing. These issues are very common in web scraping projects.

At the same time, Kotlin does not limit your options. You can still use all Java scraping libraries, HTTP clients, and parsing tools without restrictions. If you want a Java-focused overview for comparison, take a look at Intro to Java Web Scraping.

Choosing Kotlin tools for web scraping

Kotlin web scraping usually comes down to three pieces: an HTTP client to fetch pages, an HTML parser to extract data, and sometimes a headless browser when JavaScript is involved. You do not need a huge stack to start, but you do want to pick tools that match the site you are scraping.

For fetching pages, you can use OkHttp, Ktor Client, or even Java's built-in HTTP client.

OkHttp is popular because it is simple and stable.
Ktor Client feels more "Kotlin-first" and works nicely with coroutines. If you are building something bigger, choosing a client that supports timeouts, retries, and headers cleanly will save you pain later.

For parsing HTML, Jsoup is the usual go-to. It supports CSS selectors, which makes scraping feel like picking elements in DevTools. You can grab text, read attributes, and navigate parent or child nodes without writing messy string logic.

Some sites load content with JavaScript, so the raw HTML response is missing the data you want. That is when headless tools come in, like HtmlUnit (works for some sites), Playwright, or Selenium. These tools can run JavaScript and give you the final rendered DOM, but they cost more in CPU and time, so you should only use them when needed.

Blocking is also a real thing. You might hit CAPTCHAs, rate limits, or get served different content. If you want to avoid fighting with proxies, browser fingerprints, and rendering, an API like ScrapingBee can handle that layer and return HTML you can parse. It is especially handy for sites that require JavaScript rendering or have aggressive bot protection.

If you like seeing how scraping stacks differ across languages, you can also check out Haskell web scraping overview.

Parsing HTML with Jsoup in Kotlin

Jsoup works great in Kotlin because it is just a Java library with a clean API. The flow is simple: fetch or load HTML, parse it into a Document, then use CSS selectors to find elements. From there you can read text content or attributes like href, src, and title.

Here is a tiny example using the Books to Scrape layout. It finds the first book card and extracts the product title from the link inside h3.

import org.jsoup.Jsoup

fun main() {
    val html = """
        <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
          <article class="product_pod">
            <h3>
              <a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">
                A Light in the ...
              </a>
            </h3>
          </article>
        </li>
    """.trimIndent()

    try {
        val doc = Jsoup.parse(html)
        val firstTitle = doc.selectFirst("article.product_pod h3 a")
            ?.attr("title")
            ?: "Missing title"

        println(firstTitle)
    } catch (e: Exception) {
        println("Failed to parse HTML: ${e.message}")
    }
}

This example uses selectFirst so you do not crash if the selector does not match. In scraping, pages change and elements disappear, so null-safe reads like this are worth it. You can also extract other fields using the same idea, like a.attr("href") for the product link or img.attr("src") for the cover image.

If you want a deeper Jsoup selector walkthrough, this Java-focused guide still maps cleanly to Kotlin: Parse HTML with Jsoup (Java).

Other libraries you can use

Jsoup is great for most HTML scraping, and it supports both CSS selectors and XPath-style queries. If you need different DOM tooling (or you’re working in an existing Java parsing stack), alternatives like TagSoup, NekoHTML, or Java’s built-in DOM/XPath APIs are also options. They’re usually more verbose than Jsoup, but can fit better in certain legacy or XML-heavy setups.

If your target pages require JavaScript, you will likely pick a headless solution. HtmlUnit is lightweight compared to full browsers, but modern sites often behave better with Playwright or Selenium. Playwright tends to be faster and more reliable for modern rendering, while Selenium has huge ecosystem support and tons of examples.

You can also add helper libraries depending on what you are doing. A JSON library like kotlinx.serialization helps when you store results. A CSV writer helps when you export datasets. Retry helpers and rate limiting utilities help when you scale beyond a few pages, because websites will eventually push back if you hit them too hard.

For another language perspective on building a scraping stack, here is a good overview: OCaml web scraping.

Step-by-step Kotlin web scraping tutorial

This Kotlin web scraping walkthrough uses a real practice site: Books to Scrape. We will fetch the homepage, parse the HTML, extract all books in the list, and print each book's data to the console. The flow stays linear, so you can follow it without bouncing around.

Step 1: Create a Kotlin project and add dependencies

Use Gradle (Kotlin DSL) and add an HTTP client plus an HTML parser. OkHttp fetches pages. Jsoup parses and queries HTML with CSS selectors.

Add these dependencies to your build.gradle.kts:

dependencies {
    implementation("com.squareup.okhttp3:okhttp:5.3.2")
    implementation("org.jsoup:jsoup:1.21.2")
}

Example provided in this tutorial works in a simple Kotlin CLI application, a backend service, or any JVM-based project. Using Java 11 or newer is recommended, as most modern HTTP clients and libraries expect it. As long as your project can run Kotlin on the JVM, this setup will work the same way.

Step 2: Fetch the page HTML

We request https://books.toscrape.com/ and read the response body as a string. This response contains the full HTML of the page, which is the input for the parsing step.

Step 3: Parse the HTML and select all books

The page layout is simple and consistent. All books are listed inside an ol.row container. Each book is represented by an li element that wraps an article.product_pod.

Example HTML to scrape with Kotlin

Inside each book card you can extract:

The title from h3 a, preferably using the title attribute
The price from p.price_color
The availability status from p.instock.availability
The rating from p.star-rating, where the value is encoded in the class name
The product link from the href attribute of h3 a

Step 4: Extract fields and print results

We loop over every li element inside ol.row and extract each field using CSS selectors. When a selector does not match, we fall back to a safe default instead of letting the scraper crash. This approach makes the scraper more resilient to small layout changes.

If you want to compare this flow with another Java-based scraping approach, you can check out Jaunt for Java web scraping.

Complete Kotlin code example

This is a full copy-paste example. It fetches the page, parses all books on the page, and prints the extracted data to the console.

import okhttp3.OkHttpClient
import okhttp3.Request
import org.jsoup.Jsoup
import org.jsoup.nodes.Element

data class Book(
    val title: String,
    val price: String,
    val availability: String,
    val rating: String,
    val productUrl: String,
    val imageUrl: String
)

fun main() {
    val baseUrl = "https://books.toscrape.com/"
    val targetUrl = baseUrl

    val client = OkHttpClient()

    val request = Request.Builder()
        .url(targetUrl)
        .header(
            "User-Agent",
            "Mozilla/5.0 (compatible; Kotlin web scraping tutorial; +https://example.com)"
        )
        .build()

    try {
        client.newCall(request).execute().use { response ->
            if (!response.isSuccessful) {
                println("Request failed with status: ${response.code}")
                return
            }

            val html = response.body?.string()
            if (html.isNullOrBlank()) {
                println("Empty response body, nothing to parse.")
                return
            }

            val doc = Jsoup.parse(html, baseUrl)

            // Books are listed inside: ol.row > li
            val bookCards = doc.select("ol.row > li")

            if (bookCards.isEmpty()) {
                println("No books found. The page structure may have changed.")
                return
            }

            val books = bookCards.mapNotNull { card ->
                parseBookCard(card)
            }

            println("Found ${books.size} books on the page.\n")

            books.forEachIndexed { index, book ->
                println("#${index + 1}")
                println("Title: ${book.title}")
                println("Price: ${book.price}")
                println("Availability: ${book.availability}")
                println("Rating: ${book.rating}")
                println("Product URL: ${book.productUrl}")
                println("Image URL: ${book.imageUrl}")
                println()
            }
        }
    } catch (e: Exception) {
        println("Scrape failed: ${e.message}")
    }
}

private fun parseBookCard(card: Element): Book? {
    val titleEl = card.selectFirst("article.product_pod h3 a")
    val priceEl = card.selectFirst("article.product_pod .product_price .price_color")
    val availabilityEl = card.selectFirst("article.product_pod .product_price .instock.availability")
    val ratingEl = card.selectFirst("article.product_pod p.star-rating")
    val imageEl = card.selectFirst("article.product_pod .image_container img")

    // Title is best read from the "title" attribute.
    val title = titleEl?.attr("title")?.trim().orEmpty()
    if (title.isBlank()) return null

    val price = priceEl?.text()?.trim().orEmpty()

    // Availability text often contains extra whitespace and newlines.
    val availability = availabilityEl
        ?.text()
        ?.replace(Regex("\\s+"), " ")
        ?.trim()
        .orEmpty()

    // Rating is encoded as a class name like: "star-rating Three"
    val rating = ratingEl
        ?.classNames()
        ?.map { it.trim() }
        ?.firstOrNull { it.equals("One", true) || it.equals("Two", true) || it.equals("Three", true) || it.equals("Four", true) || it.equals("Five", true) }
        ?: "Unknown"

    // Jsoup uses the base URL we provided to resolve relative links.
    val productUrl = titleEl?.absUrl("href").orEmpty()
    val imageUrl = imageEl?.absUrl("src").orEmpty()

    return Book(
        title = title,
        price = price,
        availability = availability,
        rating = rating,
        productUrl = productUrl,
        imageUrl = imageUrl
    )
}

Code walkthrough:

A Book data class is used to keep scraped data structured and readable. Each field maps directly to a visible part of the product card.
OkHttp handles the HTTP request. It is reliable, easy to configure, and works well for simple scraping jobs. A basic User-Agent header helps avoid trivial blocks (it's not a panacea though).
Jsoup parses the HTML into a DOM-like structure. Providing the base URL allows relative links to be resolved automatically.
All books are selected using the ol.row > li selector. This targets only book cards and avoids unrelated page elements.
Each book card is parsed in a separate function. This keeps the main flow clean and makes the scraping logic easier to reuse or extend.
selectFirst is used to safely read elements without crashing when something is missing. This is important because page layouts can change.
The title is extracted from the title attribute to avoid truncated text shown on the page.
Availability text is cleaned to remove extra whitespace and line breaks.
The rating is derived from the class name on p.star-rating, where the actual value is encoded.
absUrl converts relative links into full URLs, which avoids manual string handling.

If you want to avoid blocks, handle rendering, and scale this kind of scraper without babysitting proxies, check out the Web scraping API.

What about pagination?

Many real websites spread data across multiple pages. Pagination usually works by adding a page number to the URL or following a "next" link in the HTML. The core idea stays the same: fetch a page, extract data, find the next page, and repeat until there are no more pages.

import okhttp3.OkHttpClient
import okhttp3.Request
import org.jsoup.Jsoup
import kotlin.time.Duration.Companion.seconds

// Singleton object holding a single OkHttpClient instance.
// This client is reused for all HTTP requests in the app.
object HttpClient {
    val client: OkHttpClient by lazy {
        OkHttpClient.Builder()
            // Max time allowed to establish a TCP connection
            .connectTimeout(10.seconds)
            // Max time allowed to read the response body
            .readTimeout(20.seconds)
            .build()
    }
}

// Pagination example for Books to Scrape.
// The scraper loads a page, extracts book titles,
// finds the "next" page link, and repeats.

fun main() {
    val startUrl = "https://books.toscrape.com/"
    var currentUrl = startUrl

    // Limit how many pages we scrape in this demo
    val maxPages = 3
    var currentPage = 1

    while (currentPage <= maxPages) {
        println("Scraping page $currentPage: $currentUrl")

        // Build an HTTP GET request for the current page
        val request = Request.Builder()
            .url(currentUrl)
            // User-Agent header to look like a real browser
            .header(
                "User-Agent",
                "Mozilla/5.0 (compatible; Kotlin web scraping pagination example)"
            )
            .build()

        try {
            // Execute the request using the shared HTTP client
            HttpClient.client.newCall(request).execute().use { response ->

                // Stop if the server returned a non-OK status
                if (!response.isSuccessful) {
                    println("Request failed with status: ${response.code}")
                    return
                }

                // Read the HTML response body as a string
                val html = response.body?.string()
                if (html.isNullOrBlank()) {
                    println("Empty response body, stopping.")
                    return
                }

                // Parse HTML using the *current page URL* as base
                // This allows Jsoup to resolve relative links correctly
                val doc = Jsoup.parse(html, currentUrl)

                // Select all book cards on the page
                val bookCards = doc.select("ol.row > li")
                println("Found ${bookCards.size} books on this page.")

                // Extract and print each book title
                bookCards.forEach { card ->
                    val title = card
                        .selectFirst("article.product_pod h3 a")
                        ?.attr("title")
                        ?.trim()
                        ?: "Unknown title"

                    println(" - $title")
                }

                // Locate the "next" page link in the pagination section
                val nextLinkEl = doc.selectFirst("ul.pager li.next a")
                if (nextLinkEl == null) {
                    println("No next page found. Done.")
                    return
                }

                // Convert the relative "href" into an absolute URL
                val nextUrl = nextLinkEl.absUrl("href")
                if (nextUrl.isBlank()) {
                    println("Next link exists but could not be resolved. Done.")
                    return
                }

                // Update state for the next loop iteration
                currentUrl = nextUrl
                currentPage++
            }

            // Small pause between requests to avoid hitting the site too fast
            Thread.sleep(1500)

        } catch (e: Exception) {
            println("Pagination scrape failed: ${e.message}")
            return
        }
    }

    println("Finished pagination example.")
}

This example shows a simple pagination flow for Books to Scrape. The code loads a page, extracts book data, finds the next link, and moves on. The key detail is parsing HTML with the current page URL as the base, so relative pagination links resolve correctly.

Once you understand how to scrape a single page reliably, adding pagination is mostly about looping and knowing when to stop.

Error handling and troubleshooting Kotlin web scraping

Scraping breaks for boring reasons. Timeouts happen. HTML changes. Sites block requests. A good scraper expects this and fails in a controlled way. The goal is simple: get data when possible, and get useful logs when it is not.

Handle HTTP errors and retries

Always check status codes and treat non-200 responses as normal. Add timeouts, and retry a few times for temporary failures like 429 or 5xx. Keep retries limited so you do not hammer the site.

import okhttp3.OkHttpClient
import okhttp3.Request
import java.io.IOException
import kotlin.time.Duration.Companion.seconds

// Singleton HTTP client reused for all requests.
// Keeps connections pooled and avoids rebuilding clients.
object HttpClient {
    val client: OkHttpClient by lazy {
        OkHttpClient.Builder()
            // Maximum time allowed to establish a connection
            .connectTimeout(10.seconds)
            // Maximum time allowed to read the response
            .readTimeout(20.seconds)
            .build()
    }
}

// Fetches HTML from a URL with retry logic for temporary failures.
fun fetchHtmlWithRetries(
    url: String,
    maxAttempts: Int = 3
): String {

    // HTTP request definition (safe to reuse between attempts)
    val request = Request.Builder()
        .url(url)
        // User-Agent header to identify the client
        .header("User-Agent", "Mozilla/5.0 (compatible; Kotlin web scraping)")
        .build()

    var lastError: Exception? = null

    // Attempt the request multiple times
    for (attempt in 1..maxAttempts) {
        try {
            // Execute the request using the shared HTTP client
            HttpClient.client.newCall(request).execute().use { response ->
                val code = response.code

                // Successful response
                if (code == 200) {
                    val body = response.body?.string().orEmpty()

                    // Guard against empty HTML responses
                    if (body.isBlank()) {
                        throw IOException("Empty response body")
                    }

                    return body
                }

                // Retry on rate limiting or server-side errors
                if (code == 429 || code in 500..599) {
                    Thread.sleep((attempt * 500L).coerceAtMost(2000L))
                    continue
                }

                // Non-retryable HTTP error
                throw IOException("HTTP $code for $url")
            }
        } catch (e: Exception) {
            // Store the last error for reporting
            lastError = e

            // Backoff delay before the next attempt
            Thread.sleep((attempt * 500L).coerceAtMost(2000L))
        }
    }

    // All retry attempts failed
    throw IOException(
        "Failed after $maxAttempts attempts: ${lastError?.message}",
        lastError
    )
}

Make your parsers null-safe and resilient

HTML changes all the time. If your scraper assumes every selector exists, it will crash the first time a price is missing or a class name changes. Prefer selectFirst, use safe calls, and set defaults that make sense.

import org.jsoup.nodes.Element

fun safeText(root: Element, css: String, defaultValue: String = ""): String {
    return root.selectFirst(css)?.text()?.trim().takeUnless { it.isNullOrBlank() } ?: defaultValue
}

fun safeAttr(root: Element, css: String, attr: String, defaultValue: String = ""): String {
    return root.selectFirst(css)?.attr(attr)?.trim().takeUnless { it.isNullOrBlank() } ?: defaultValue
}

Log what matters, not everything

When something fails, you want quick answers. Log the URL, the status code, and which selector failed. If parsing breaks, save a small HTML snippet to a file so you can debug without re-running requests.

import java.nio.file.Files
import java.nio.file.Paths

fun dumpHtmlForDebug(filename: String, html: String) {
    val path = Paths.get(filename)
    Files.write(path, html.toByteArray(Charsets.UTF_8))
    println("Saved debug HTML to: $path")
}

Fix the "relative link" and "broken URL" problem

Lots of sites return relative paths like catalogue/... or media/cache/.... If you print those as-is, your output is annoying to use. Parse with a base URL and use absUrl so you always get full links.

import org.jsoup.Jsoup

fun absoluteLinksExample(html: String, baseUrl: String) {
    val doc = Jsoup.parse(html, baseUrl)
    val productUrl = doc.selectFirst("article.product_pod h3 a")?.absUrl("href").orEmpty()
    val imageUrl = doc.selectFirst("article.product_pod .image_container img")?.absUrl("src").orEmpty()
    println("Product URL: $productUrl")
    println("Image URL: $imageUrl")
}

Know the common failure modes and what to do

These are the usual "why is my scraper dead" cases:

429 Too Many Requests: slow down, add delays between requests, rotate IPs, or use an API that handles rate limiting for you.
403 Forbidden: your headers look bot-like or your IP is blocked, so you need better request behavior, realistic headers, or proxy rotation.
Empty HTML for dynamic pages: the site relies on JavaScript, so you need a headless browser or a web scraping API like ScrapingBee.
Selectors return nothing: the page layout changed, so re-check the HTML structure and update your selectors.
HTML doesn't match what you see in the browser: sometimes the server returns a "Please enable JavaScript" / cookie-consent / bot-check page instead of the real content. Before you tweak selectors, dump the HTML to a file and open it to confirm you're scraping the actual page.

If you keep your requests cautious, your parsing null-safe, and your logs useful, debugging becomes quick instead of painful.

Scrape responsibly! Scrape public data and check the site's ToS and robots.txt; even if robots.txt isn't law, it signals what the operator expects. Avoid aggressive request rates and unnecessary load. Responsible scraping keeps your projects stable and helps avoid legal or ethical issues.

Ready to take Kotlin web scraping to the next level?

At this point, you have everything you need to build a solid Kotlin web scraping setup. You know how to fetch pages, parse HTML, and extract real data. This works great for simple sites and learning projects. When you move to real-world targets, things get trickier. JavaScript rendering, IP blocking, CAPTCHAs, and rate limits can slow you down fast. That is where using a scraping API makes sense.

ScrapingBee handles the hard parts for you so your Kotlin code stays clean and focused on data extraction.

What you get with ScrapingBee:

JavaScript rendering for modern, dynamic websites
Built-in proxy rotation and IP management
Automatic handling of common bot protection
Simple API that returns clean HTML or JSON
Works smoothly with Kotlin, Java, and any HTTP client

You can start testing right away. When you sign up at app.scrapingbee.com/account/register you get 1,000 free credits, no credit card required. That is more than enough to experiment, build a prototype, and see how it fits into your Kotlin web scraping workflow.

If you are ready to stop fighting blocks and focus on shipping scrapers that actually work, give it a shot!

Conclusion

Kotlin web scraping is a solid choice when you want clean code, strong typing, and full access to the Java ecosystem. With the right tools, you can go from a simple request to structured data in just a few clear steps.

Start small, scrape responsibly, and build up as your needs grow. Once you hit JavaScript-heavy pages or blocking issues, adding a scraping API can save you a lot of time. The important part is keeping your scraper simple, readable, and easy to maintain.

Before you go, check out these related reads:

Frequently asked questions (FAQs)

Is Kotlin good for web scraping?

Yes, Kotlin is a great option for web scraping. It runs on the JVM, gives you access to mature Java libraries, and offers cleaner syntax with built-in null safety. This makes scrapers easier to write, debug, and maintain over time.

Do Kotlin developers use Java scraping libraries?

Yes, most Kotlin developers use Java scraping libraries. Kotlin is fully interoperable with Java, so tools like Jsoup, OkHttp, Selenium, and Playwright work without issues. This gives Kotlin scrapers access to a large, proven ecosystem.

Can Kotlin scrape JavaScript websites?

Kotlin can scrape JavaScript websites, but it usually requires extra tools. Headless browsers like Playwright or HtmlUnit can execute JavaScript and return rendered HTML. APIs like ScrapingBee can also handle rendering and blocking for you.

How do I parse HTML in Kotlin?

HTML is usually parsed in Kotlin using Jsoup. You load the HTML into a Document, then query elements using CSS selectors. From there, you can extract text, attributes, and links while safely handling missing elements.

Michael Nyamande

A digital product manager by day, Michael is a tech enthusiast who is always tinkering with different technologies. His interests include web and mobile frameworks, NoCode development, and blockchain development.