How to get the file type of a URL in Python?

You can get the file type of a URL in Python via two different methods.

  1. Use the mimetypes module

The mimetypes module ships with Python's standard library and can infer the file type from the URL alone, without making a network request. It works only when the URL contains a recognizable file extension; otherwise it returns None. Here is some sample code:

import mimetypes

mimetypes.guess_type("http://example.com/file.pdf")
# Output: ('application/pdf', None)

mimetypes.guess_type("http://example.com/file")
# Output: (None, None)
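
Real-world URLs often carry a query string or fragment after the file name. As a minimal sketch, the hypothetical helper below (the name guess_from_url is not part of mimetypes) passes only the path portion of the URL to guess_type, so the guess depends on the extension alone. It also shows what the second element of the returned tuple is for: it reports an encoding such as gzip when the extension implies one.

```python
import mimetypes
from urllib.parse import urlsplit


def guess_from_url(url):
    """Guess (type, encoding) from the extension in the URL's path only."""
    # urlsplit separates the path from the query string and fragment,
    # so "?download=1" or "#page=2" cannot hide the extension.
    path = urlsplit(url).path
    return mimetypes.guess_type(path)


print(guess_from_url("http://example.com/file.pdf?download=1"))
# Output: ('application/pdf', None)

print(guess_from_url("https://example.com/archive.tar.gz"))
# Output: ('application/x-tar', 'gzip')
```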

  2. Perform a HEAD request to the URL and investigate the response headers

A HEAD request asks the server for the response headers only, so the body is never downloaded. One of those headers is Content-Type, which reports the MIME type of the resource. This is usually a good indicator of the file type of a URL, although it is only as accurate as the server's configuration. Here is some sample code for making a HEAD request and figuring out the file type:

import requests

# HEAD returns only the headers, so the body is never downloaded.
response = requests.head("https://scrapingbee.com")
print(response.headers['Content-Type'])
# Output: text/html; charset=utf-8

response = requests.head("https://practicalpython.yasoob.me/_static/images/book-cover.png")
print(response.headers['Content-Type'])
# Output: image/png
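
The Content-Type value can carry parameters such as charset, and some servers omit the header entirely. The sketch below is one possible way to handle both cases, not a definitive implementation. The helper names media_type_from_header and file_type are hypothetical. The first strips the parameters using the standard library's email.message.Message, and the second falls back to the extension-based guess when no header is present. It takes the headers as a plain mapping, so it runs without a network connection.

```python
import mimetypes
from email.message import Message
from urllib.parse import urlsplit


def media_type_from_header(value):
    """Return the bare media type from a Content-Type value."""
    message = Message()
    message["Content-Type"] = value
    # get_content_type() drops parameters and lowercases the result.
    # Note: it falls back to 'text/plain' if the value is malformed.
    return message.get_content_type()


def file_type(headers, url):
    """Prefer the Content-Type header; fall back to the URL's extension."""
    # A plain dict lookup is case-sensitive; the headers object returned
    # by requests is case-insensitive.
    value = headers.get("Content-Type")
    if value:
        return media_type_from_header(value)
    guessed, _encoding = mimetypes.guess_type(urlsplit(url).path)
    return guessed


print(file_type({"Content-Type": "text/html; charset=utf-8"}, "https://example.com/"))
# Output: text/html

print(file_type({}, "http://example.com/file.pdf"))
# Output: application/pdf
```

With requests, the first argument could be response.headers from a call such as requests.head(url, allow_redirects=True). Passing allow_redirects=True matters because requests.head does not follow redirects by default, so the headers would otherwise describe the redirect response rather than the final resource.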
