If you're an investor, analyst, or developer working in finance, you should know how to scrape financial statements with Python. It's a great way to monitor current stock prices, keep a finger on the pulse of market trends, and make informed financial decisions. After all, financial markets fluctuate constantly, so you can't afford to waste time gathering financial data manually.
In this practical guide, we’ll walk you through everything you need to know about web scraping for financial statements with Python, from basic setup to advanced automation techniques. We’ll cover the essential tools, legal considerations, and step-by-step implementation that transforms raw SEC filings into structured, analyzable data.
By the end of this tutorial, you’ll have built your own financial data scraper using ScrapingBee’s powerful API, which handles the complex challenges of modern web scraping. This approach keeps your Python code simple while ensuring reliable access to the financial data you need.
Quick Answer (TL;DR)
Here is a quick start: add your ScrapingBee API key, run the snippet, and preview the income statement in a pandas DataFrame.
from scrapingbee import ScrapingBeeClient
import pandas as pd
import json
client = ScrapingBeeClient(api_key="YOUR_API_KEY")
filing_url = "https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230930.htm"
rules = {
    "income_statement": {
        # Find any table that contains the text "Net sales"
        "selector_type": "xpath",
        "selector": '//table[.//text()[contains(., "Net sales")]]',
        "type": "table",
        "output": "table_array"
    }
}

resp = client.get(
    filing_url,
    params={
        "extract_rules": json.dumps(rules),
        "render_js": False  # SEC HTML is static; JS not needed
    },
)
data = json.loads(resp.text) # -> {"income_statement": [[...headers...], [...row1...], ...]}
table = data["income_statement"]
df = pd.DataFrame(table[1:], columns=table[0])
print(df.head())
Seeing a DataFrame means you’re set. If not, check your API key and selector; SEC pages are static, so JS rendering is usually unnecessary.
What Are Financial Statements and Why Scrape Them?
Financial statements represent the core documents that publicly traded companies must file with the United States Securities and Exchange Commission (SEC), including income statements, balance sheets, and cash flow statements. These documents contain the essential data that drives investment decisions, credit analysis, and market research across the financial industry.
However, it's difficult to access this information efficiently. While the SEC’s EDGAR database contains decades of filings, manually downloading and processing these documents becomes impractical when analyzing multiple companies or tracking trends over time. This is where Python financial data scraping becomes invaluable.

Large-scale scraping projects, however, targeting sources such as Yahoo Finance, require a reliable scraping tool. I recommend ScrapingBee, which makes the process easy whether a website relies on JavaScript or deploys anti-bot measures. Check out the How to Scrape Yahoo: Step-by-Step Tutorial guide to see how it works with the Yahoo Finance page.
You should also know that financial filings on EDGAR appear as both HTML and XML documents, depending on the year and form type. When targeting a page, account for the structural changes that accumulate over time: table layouts shift, specific HTML elements get renamed, and pagination moves. Designing selectors that tolerate these changes helps your scraper retrieve data reliably even as the surface HTML evolves.
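One practical pattern is to match tables by the text they contain rather than by their position on the page. Here is a minimal sketch of such a rule; the label variants are assumptions you would adjust to the filings you target:

import json

# Match a table by the text it contains, not by its position on the page.
# The label variants below are assumptions; adjust them to your target filings.
resilient_rules = {
    "income_statement": {
        "selector_type": "xpath",
        "selector": (
            '//table[.//text()[contains(., "Net sales")] '
            'or .//text()[contains(., "Total net sales")] '
            'or .//text()[contains(., "Total revenue")]]'
        ),
        "type": "table",
        "output": "table_array"
    }
}

# Pass the rules as a JSON string via the extract_rules parameter.
params = {"extract_rules": json.dumps(resilient_rules)}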
Types of Financial Statements You Can Extract
Let's take a look at the kinds of financial data you can gather. The primary financial statements available for scraping include the Income Statement, which shows revenue, expenses, and profit over a specific period. Balance Sheets provide a snapshot of assets, liabilities, and equity at a point in time, while Cash Flow Statements track how money moves in and out of the business.
Additionally, you can extract data from Notes to Financial Statements, which often contain crucial details about accounting methods, debt obligations, and future commitments. These notes frequently appear in structured HTML tables within SEC filings, making them ideal targets for automated extraction.
Most financial filings follow consistent table patterns, with clearly defined rows for line items such as “Total Revenue” or “Net Income” and columns for different time periods. This standardization makes it possible to create robust scraping rules that work across different companies and filing periods, though you’ll occasionally encounter variations that require flexible parsing logic.
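For instance, a small helper that matches line-item labels loosely can absorb many of those variations. This is a sketch; the alias lists are assumptions you would extend per filer:

import pandas as pd

# Hypothetical alias lists: different filers label the same line item differently.
LINE_ITEM_ALIASES = {
    "revenue": ["total revenue", "net sales", "total net sales"],
    "net_income": ["net income", "net income (loss)"],
}

def find_line_item(df, item):
    """Return the first row whose label column matches a known alias."""
    labels = df.iloc[:, 0].astype(str).str.strip().str.lower()
    for alias in LINE_ITEM_ALIASES[item]:
        matches = df[labels == alias]
        if not matches.empty:
            return matches.iloc[0]
    return None  # label not found; the caller should handle this case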
That being said, you can also target other financial websites. With the right tools, financial professionals can gather data from financial news sites and scrape stock data as well.
Benefits of Scraping Over Manual Downloads
Speed represents the most obvious advantage when you scrape financial data automatically. What might take hours of manual copying and pasting can be completed in minutes through automated scripts. This efficiency becomes even more pronounced when analyzing dozens or hundreds of companies simultaneously.
Repeatability ensures consistent data collection processes. Manual data entry introduces human error, inconsistent formatting, and missed updates. Automated scraping eliminates these issues while enabling scheduled updates that keep your datasets current without ongoing manual intervention.
ScrapingBee handles the technical complexities that often derail scraping projects, including JavaScript rendering for dynamic content and proxy rotation to avoid IP blocks. This means your Python code can focus on data processing and analysis rather than wrestling with anti-bot measures and changing website structures.
Setting Up Your Python Environment
It's time to start web scraping financial data. You can kick-start the process by installing Python 3.8 or newer, which provides the stability and features needed for modern web scraping applications. Most financial data processing benefits from the enhanced performance and library compatibility found in recent Python versions.
Then, you'll need a Jupyter Notebook, which offers an ideal environment for developing and testing scraping scripts interactively. This approach proves particularly valuable when exploring new financial data sources or debugging extraction rules.
Before you move on with the installs, check out our broader web scraping resources beyond financial data. The best place to start is our Python Web Scraping guide, which covers additional libraries and advanced techniques that complement the financial-specific methods we’ll discuss here.
Installing Python and Jupyter Notebook
You can download Python from python.org and install Jupyter using the following command:
# WINDOWS (PowerShell)
py -m pip install --upgrade pip jupyter notebook ; jupyter notebook
# macOS (Terminal, with Homebrew)
brew install python && python3 -m pip install --upgrade pip jupyter notebook && jupyter notebook
# LINUX (Debian/Ubuntu)
sudo apt-get update && sudo apt-get install -y python3 python3-pip && python3 -m pip install --upgrade pip jupyter notebook && jupyter notebook
Start Jupyter by running jupyter notebook in your terminal, which opens a web interface for creating and managing your scraping projects. This environment provides the perfect balance of interactivity and reproducibility for financial data work.
Required Libraries: pandas, matplotlib, dateutil, scrapingbee
Now, it's time to install the essential libraries. Pandas handles data manipulation and CSV/Excel export, matplotlib creates visualizations, dateutil processes financial reporting dates, and ScrapingBee provides reliable web scraping capabilities.
# WINDOWS
py -m pip install --upgrade pip pandas matplotlib python-dateutil scrapingbee
# macOS/Linux
python3 -m pip install --upgrade pip pandas matplotlib python-dateutil scrapingbee
ScrapingBee’s Python SDK simplifies API integration with built-in error handling and response parsing. Your first request requires only your API key and target URL, making it accessible for beginners while offering advanced features for complex scraping scenarios.
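A minimal first request looks like this; swap in your own API key (the filing URL is the Apple 10-K used throughout this guide):

from scrapingbee import ScrapingBeeClient

client = ScrapingBeeClient(api_key="YOUR_API_KEY")

# Fetch a page and confirm the request succeeded before adding extract rules.
response = client.get("https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230930.htm")
print(response.status_code)  # 200 means the page came back
print(response.text[:200])   # a peek at the raw HTML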
Creating a Virtual Environment
Isolate your project dependencies using python -m venv financial_scraper followed by activation commands:
# WINDOWS (PowerShell)
py -m venv financial_scraper ; .\financial_scraper\Scripts\Activate.ps1
# macOS/Linux
python3 -m venv financial_scraper && source financial_scraper/bin/activate
# (optional) to leave later: deactivate
Virtual environments prevent conflicts between different web scraping projects and ensure reproducible installations across development and production environments.
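Once your libraries are installed inside the environment, freeze them so the setup can be reproduced on another machine:

# Inside the activated environment (any OS)
pip freeze > requirements.txt
# Later, to recreate the environment elsewhere
pip install -r requirements.txt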
Scraping a Single Company’s Financial Data (With ScrapingBee)
This Python scraping example demonstrates the complete workflow for extracting financial statements from SEC filings. Instead of wrestling with HTML parsing and anti-bot measures, ScrapingBee’s extract_rules feature transforms financial tables directly into a structured format that integrates seamlessly with Pandas DataFrames.
The process begins by identifying the target SEC filing URL, typically a 10-K annual report or 10-Q quarterly filing. Our API handles the HTTP request and any JavaScript rendering, then applies your extraction rules to pull specific financial tables from the document. This approach eliminates the common frustrations of changing HTML structures and blocked requests.
Here’s the complete implementation that fetches Apple’s latest 10-K filing and extracts the income statement:
from scrapingbee import ScrapingBeeClient
import pandas as pd
import json
client = ScrapingBeeClient(api_key='YOUR_API_KEY')
# Target Apple's latest 10-K filing
filing_url = 'https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230930.htm'
response = client.get(
    filing_url,
    params={
        'extract_rules': json.dumps({
            'income_statement': {
                # CSS can't match on text content, so use XPath to find
                # the table that contains "Net sales"
                'selector_type': 'xpath',
                'selector': '//table[.//text()[contains(., "Net sales")]]',
                'type': 'table',
                'output': 'table_array'
            }
        }),
        'render_js': False  # SEC filing pages are static HTML
    }
)

# The API returns the extracted data as JSON in the response body
financial_data = json.loads(response.text)['income_statement']
df = pd.DataFrame(financial_data[1:], columns=financial_data[0])
print(df.head())
That's how you launch your web scraper and collect data with our API. Now, let's look at a few more complex scenarios you'll need for data-driven decision-making.
Using ScrapingBee’s Extract Rules To Target Tables
Our API's extract_rules provide powerful selector targeting that identifies specific financial tables within complex SEC documents. The selector parameter accepts standard CSS syntax, but CSS cannot match on text content, so to locate tables containing key financial terms like “Net sales” or “Total assets” you use an XPath expression, as in the examples above. The type: 'table' specification tells the API to parse the HTML table structure into structured data.
You can force specific selector types using the selector_type parameter, choosing between CSS selectors and XPath expressions depending on your targeting needs. For most financial statements, CSS selectors prove sufficient and more readable, but XPath offers additional precision for complex document structures.
The output: 'table_array' format returns data as a two-dimensional array where the first row contains column headers and subsequent rows contain the actual financial data. This format integrates perfectly with pandas DataFrame construction, eliminating manual parsing steps.
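To make the two selector styles concrete, here is the same extraction expressed both ways; the CSS variant targets by position because CSS selectors can't match on text content (both selectors are illustrative):

import json

# XPath: match a table by the text it contains.
xpath_rule = {
    "income_statement": {
        "selector_type": "xpath",
        "selector": '//table[.//text()[contains(., "Net sales")]]',
        "type": "table",
        "output": "table_array"
    }
}

# CSS: match a table by structural position (here, the first table in the
# document). Prefer XPath when you need "contains"-style text targeting.
css_rule = {
    "income_statement": {
        "selector": "table:first-of-type",
        "type": "table",
        "output": "table_array"
    }
}

params = {"extract_rules": json.dumps(xpath_rule)}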
Setting Date Ranges With Filing Choice
Rather than using query parameters to filter dates, you select specific filing URLs that correspond to your desired time periods. Each SEC filing URL contains embedded date information, allowing precise control over which quarterly or annual reports you’re accessing.
For dynamic content that loads after the initial page render, enable JavaScript processing with the render_js: True parameter. This ensures that ScrapingBee fully loads the page content before applying your extraction rules, capturing data that might otherwise be missed.
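If you'd rather discover filing URLs programmatically than copy them by hand, the SEC publishes a per-company submissions index as JSON. The sketch below assumes the data.sec.gov layout documented at the time of writing, and the SEC asks for a descriptive User-Agent header (replace the contact details with your own):

import requests

# The SEC asks for a descriptive User-Agent identifying you (assumption:
# fill in your own name and contact email).
headers = {"User-Agent": "Your Name your.email@example.com"}

cik = "0000320193"  # Apple; CIKs are zero-padded to 10 digits
url = f"https://data.sec.gov/submissions/CIK{cik}.json"
recent = requests.get(url, headers=headers).json()["filings"]["recent"]

# Pick 10-K filings on or after a cutoff date.
for form, date, accession, doc in zip(
    recent["form"], recent["filingDate"],
    recent["accessionNumber"], recent["primaryDocument"],
):
    if form == "10-K" and date >= "2023-01-01":
        acc = accession.replace("-", "")
        filing_url = f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{acc}/{doc}"
        print(date, filing_url)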
Displaying and Understanding the DataFrame
Once you’ve created your DataFrame, use print(df.head()) to examine the first few rows and verify successful extraction. Financial data often requires column name cleanup to remove extra whitespace or special characters that interfere with analysis.
Common cleanup steps include df.columns = df.columns.str.strip() to remove whitespace and df.replace('—', 0) to convert dash symbols to zeros for numeric calculations.
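Putting those steps together, a small helper keeps the cleanup repeatable. This is a sketch; it assumes the first column holds line-item labels:

import pandas as pd

def clean_financial_table(df):
    """Basic cleanup for a scraped financial table."""
    df = df.copy()
    df.columns = df.columns.str.strip()  # remove stray whitespace
    df = df.replace('—', 0)              # dash symbols often mean zero
    # Strip $ signs and thousands separators from the value columns, then
    # convert to numbers where possible (first column assumed to be labels).
    for col in df.columns[1:]:
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(r"[$,]", "", regex=True),
            errors="coerce",
        )
    return df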
Scraping Multiple Companies at Once
Scaling your scraping operation to handle multiple companies at once requires careful planning around rate limits, error handling, and data organization. The most effective approach involves creating a list of target companies with their corresponding SEC filing URLs, then iterating through this list while implementing appropriate delays and error recovery.
This Python scraping example shows how to process multiple companies efficiently while saving individual CSV files for each organization. The key is balancing speed with reliability, ensuring that temporary network issues or missing data don’t derail your entire batch operation.
from scrapingbee import ScrapingBeeClient
import pandas as pd, json, time
client = ScrapingBeeClient(api_key="YOUR_API_KEY")
rules = {
    "financials": {
        "selector_type": "xpath",
        "selector": '//table[.//text()[contains(., "Net sales")] or .//text()[contains(., "Total net sales")]]',
        "type": "table",
        "output": "table_array"
    }
}

companies = [
    {"ticker": "AAPL",  "filing_url": "https://www.sec.gov/Archives/edgar/data/320193/000032019323000077/aapl-20230930.htm"},
    {"ticker": "MSFT",  "filing_url": "https://www.sec.gov/Archives/edgar/data/789019/000156459023013030/msft-20230630.htm"},
    {"ticker": "GOOGL", "filing_url": "https://www.sec.gov/Archives/edgar/data/1652044/000165204424000023/goog-20231231.htm"},
]

for c in companies:
    try:
        r = client.get(c["filing_url"], params={"extract_rules": json.dumps(rules)})
        data = json.loads(r.text)
        table = data.get("financials", [])
        if not table or len(table) < 2:
            print(f"No table for {c['ticker']}; skipping.")
            continue
        df = pd.DataFrame(table[1:], columns=table[0])
        df.to_csv(f"{c['ticker']}_financials.csv", index=False)
        time.sleep(1)  # be polite
    except Exception as e:
        print(f"Error processing {c['ticker']}: {e}")
For handling larger datasets efficiently, see our guide on how to Make concurrent requests in Python, which covers advanced techniques for parallel processing while respecting rate limits.
Creating a List of Tickers
Start with a simple Python list containing ticker symbols and their corresponding SEC filing URLs. You can expand this to include additional metadata like company names, sectors, or specific filing types, depending on your analysis needs.
Map each ticker to its most recent 10-K or 10-Q filing URL by checking the SEC’s EDGAR database or using automated discovery methods.
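For automated discovery, the SEC also publishes a ticker-to-CIK mapping that you can combine with the submissions index shown earlier. This sketch assumes the published file's current structure:

import requests

headers = {"User-Agent": "Your Name your.email@example.com"}

# Public ticker -> CIK mapping published by the SEC.
mapping = requests.get(
    "https://www.sec.gov/files/company_tickers.json", headers=headers
).json()

# The file is keyed by arbitrary indices; build a lookup by ticker.
cik_by_ticker = {
    entry["ticker"]: str(entry["cik_str"]).zfill(10)
    for entry in mapping.values()
}
print(cik_by_ticker["AAPL"])  # e.g. "0000320193"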
Using a For Loop To Iterate
The basic iteration pattern processes each company sequentially, applying the same extraction rules and saving results to individual files. Include a time.sleep(1) delay between requests to avoid overwhelming the target servers and respect ScrapingBee’s rate limits.
Consider implementing progress tracking with print(f"Processing {company['ticker']}...") statements to monitor batch job progress, especially when processing large company lists.
Handling Errors and Missing Data
Wrap each scraping operation in try/except blocks to handle network timeouts, missing filings, or parsing errors gracefully. Log errors with sufficient detail to enable debugging while allowing the batch process to continue with the remaining companies.
Implement fallback strategies for empty tables or missing data, such as creating placeholder entries or skipping companies with insufficient information, rather than terminating the entire process.
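A simple retry wrapper with exponential backoff covers most transient failures. This is a sketch; tune the attempt count and delays to your own rate limits:

import time

def fetch_with_retries(client, url, params, max_attempts=3):
    """Retry a ScrapingBee request with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = client.get(url, params=params)
            if response.status_code == 200:
                return response
            print(f"Attempt {attempt}: HTTP {response.status_code}")
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
        time.sleep(2 ** attempt)  # 2s, 4s, 8s ...
    return None  # caller decides how to handle a permanent failure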
Automating and Analyzing the Data
Once you’ve successfully extracted financial data, the next step involves creating automated analysis workflows that transform raw numbers into actionable insights. This process typically includes data cleaning, calculation of key financial ratios, trend analysis, and visualization of important metrics over time.
The automation aspect becomes particularly powerful when combined with scheduled execution, allowing you to keep your datasets current without manual intervention. Python’s scheduling capabilities, combined with ScrapingBee’s reliable data extraction, create a robust foundation for ongoing financial monitoring and analysis.
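For example, the third-party schedule library gives you a readable in-process scheduler (a sketch; cron or Windows Task Scheduler work just as well):

import time
import schedule  # pip install schedule

def refresh_financial_data():
    # Placeholder: call your scraping and analysis functions here.
    print("Refreshing financial data...")

# Run the job every day at 06:00.
schedule.every().day.at("06:00").do(refresh_financial_data)

while True:
    schedule.run_pending()
    time.sleep(60)  # check once a minute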
Here’s how to implement a complete analysis pipeline that processes your scraped data and generates meaningful insights:
def analyze_financial_data(df):
    # Clean and convert data types
    numeric_columns = ['Revenue', 'Net Income', 'Total Assets']
    for col in numeric_columns:
        df[col] = pd.to_numeric(df[col].str.replace(',', ''), errors='coerce')

    # Calculate key ratios
    df['Profit Margin'] = (df['Net Income'] / df['Revenue']) * 100
    df['Revenue Growth'] = df['Revenue'].pct_change() * 100
    return df
For complex scenarios involving dynamic content and advanced automation, explore our JavaScript scenario guide, which covers handling sophisticated web applications.
Saving Data to Excel or CSV
Export your processed data using pandas’ built-in functions: df.to_csv('financial_analysis.csv', index=False) for CSV format or df.to_excel('financial_analysis.xlsx', index=False) for Excel compatibility.

Excel format proves particularly useful for financial analysis since it preserves formatting and enables easy sharing with stakeholders who prefer spreadsheet applications.
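For multi-company results, pandas’ ExcelWriter can put each company on its own sheet (this requires the openpyxl package; the placeholder frames below stand in for your scraped data):

import pandas as pd

# Placeholder frames standing in for scraped results (one per ticker).
results = {
    "AAPL": pd.DataFrame({"Line item": ["Net sales"], "2023": [0]}),
    "MSFT": pd.DataFrame({"Line item": ["Revenue"], "2023": [0]}),
}

# Write each company's table to its own sheet (pip install openpyxl).
with pd.ExcelWriter("financial_analysis.xlsx") as writer:
    for ticker, frame in results.items():
        frame.to_excel(writer, sheet_name=ticker, index=False)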
Calculating Moving Averages
Implement trend analysis using pandas’ rolling window functions: df['Revenue_MA'] = df['Revenue'].rolling(window=4).mean() creates a four-quarter moving average that smooths seasonal variations.
Moving averages help identify underlying trends in financial performance by reducing the impact of quarterly fluctuations and one-time events.
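In practice you may also want min_periods so the first quarters aren't dropped, plus a growth column for comparison. Here is a sketch on placeholder quarterly figures:

import pandas as pd

# Placeholder quarterly revenue series; replace with your scraped data.
df = pd.DataFrame({"Revenue": [100, 90, 110, 120, 105, 95, 118, 130]})

df["Revenue_MA"] = df["Revenue"].rolling(window=4, min_periods=1).mean()
df["Revenue_Growth_%"] = df["Revenue"].pct_change() * 100
print(df)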
Visualizing Trends With matplotlib
Create compelling visualizations using matplotlib: plt.plot(df['Date'], df['Revenue']) generates basic line charts, while more sophisticated plots can highlight trends, comparisons, and key inflection points in financial performance.
Effective financial visualizations often combine multiple metrics on the same chart, using different y-axis scales to show relationships between revenue growth and profitability trends.
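matplotlib's twinx() handles exactly that dual-scale case. Here is a sketch on placeholder data; swap in your own DataFrame columns:

import matplotlib.pyplot as plt
import pandas as pd

# Placeholder quarterly data; replace with your scraped DataFrame.
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-01", periods=8, freq="Q"),
    "Revenue": [100, 90, 110, 120, 105, 95, 118, 130],
    "Profit Margin": [21, 20, 23, 24, 22, 21, 24, 25],
})

fig, ax_rev = plt.subplots(figsize=(10, 5))
ax_margin = ax_rev.twinx()  # second y-axis sharing the same x-axis

ax_rev.plot(df["Date"], df["Revenue"], color="tab:blue")
ax_margin.plot(df["Date"], df["Profit Margin"], color="tab:orange")

ax_rev.set_xlabel("Date")
ax_rev.set_ylabel("Revenue")
ax_margin.set_ylabel("Profit margin (%)")
plt.title("Revenue vs. profit margin")
plt.show()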
Intro to Algorithmic Trading With Scraped Data
Financial statement data provides fundamental analysis inputs for algorithmic trading strategies, particularly for long-term value investing approaches. Basic backtesting frameworks can evaluate how financial metrics correlate with future stock performance.
Keep initial algorithmic applications simple, focusing on clear relationships between financial health indicators and stock price movements rather than complex predictive models.
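As a minimal illustration, you could start by measuring how a fundamental metric correlates with subsequent returns. The figures below are purely hypothetical, and a real backtest needs point-in-time data to avoid lookahead bias:

import pandas as pd

# Hypothetical per-quarter fundamentals and next-quarter stock returns.
df = pd.DataFrame({
    "Profit Margin": [22.0, 24.5, 21.0, 25.0, 26.5, 23.0],
    "Next Quarter Return %": [3.1, 4.0, -1.2, 5.5, 6.0, 0.8],
})

# A simple first look: correlation between the metric and forward returns.
print(df["Profit Margin"].corr(df["Next Quarter Return %"]))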
A Better Way To Scrape At Scale
When your financial data needs grow beyond individual company analysis, ScrapingBee’s infrastructure becomes essential for handling large-scale operations. The platform manages rotating proxies, JavaScript rendering, and anti-bot detection automatically, allowing your Python scripts to focus on data processing rather than technical hurdles.
Our API handles the complexities that typically derail large scraping projects: IP rotation prevents blocks, browser automation manages dynamic content, and built-in retry logic ensures reliable data collection even when individual requests fail. This reliability becomes crucial when processing hundreds of companies or maintaining daily data updates.
The best part is that our Web Scraping API covers a wide range of use cases, from stock market data and market capitalization insights to financial news. It's an excellent tool for making informed, data-driven decisions.
The platform offers 1,000 free API calls to get started, providing ample opportunity to test your financial scraping workflows before committing to larger operations. This approach proves far more cost-effective than managing your own proxy infrastructure and browser automation systems.
Frequently Asked Questions (FAQs)
What are the main benefits of using Python to scrape financial statements?
Python offers powerful libraries like Pandas for data manipulation, extensive community support for financial analysis, and seamless integration with ScrapingBee’s API. The programming language’s readability makes it accessible for beginners while providing advanced capabilities for complex financial modeling and market sentiment analysis.
Which Python libraries are essential for scraping financial data?
The core libraries include Pandas for data manipulation, Requests or ScrapingBee for web scraping, Matplotlib for visualization, and dateutil for handling financial reporting dates. BeautifulSoup helps with HTML parsing when needed, while numpy supports numerical calculations for financial ratios and trend analysis.
How can I handle errors when scraping data for multiple companies?
Implement try/except blocks around each company’s scraping operation, log errors with sufficient detail for debugging, and use continue statements to process remaining companies when individual requests fail. Include retry logic for temporary network issues and validate data completeness before saving results.
What types of financial statements can I scrape with Python and ScrapingBee?
You can extract income statements, balance sheets, cash flow statements, and notes to financial statements from SEC filings. The system handles 10-K annual reports, 10-Q quarterly filings, and 8-K current reports, as well as specialized forms such as proxy statements and insider trading reports.

Kevin worked in the web scraping industry for 10 years before co-founding ScrapingBee. He is also the author of the Java Web Scraping Handbook.