How to get the file type of a URL in Python?
You can get the file type of a URL in Python via two different methods.
- Use the mimetypes module
The mimetypes module is part of Python's standard library and can infer the file type from the URL. It relies on the file extension being present in the URL, and guess_type returns a (type, encoding) tuple. Here is some sample code:
    import mimetypes

    mimetypes.guess_type("http://example.com/file.pdf")
    # Output: ('application/pdf', None)

    mimetypes.guess_type("http://example.com/file")
    # Output: (None, None)
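Real-world URLs often carry a query string or fragment after the file name, and depending on the Python version this may confuse the extension lookup. The following is a minimal sketch, not a definitive implementation: the helper name `guess_type_from_url` is hypothetical, and it assumes that only the path component of the URL is relevant to the file type.

```python
import mimetypes
from urllib.parse import urlparse


def guess_type_from_url(url):
    """Guess a MIME type from the extension in the URL's path (hypothetical helper)."""
    # Keep only the path so that "?download=1" or "#page=2" is ignored.
    path = urlparse(url).path
    mime_type, _encoding = mimetypes.guess_type(path)
    # mime_type is None when the path has no recognizable extension.
    return mime_type


print(guess_type_from_url("http://example.com/file.pdf?download=1"))
print(guess_type_from_url("http://example.com/file"))
```

Like the plain `guess_type` call, this never contacts the server, so it only reflects what the URL claims, not what the server actually returns.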
- Perform a HEAD request to the URL and investigate the response headers
A HEAD request does not download the response body. It asks the server for the headers only, which carry metadata about the resource. An important piece of information among them is the Content-Type of the response. This can give you a very good idea of the file type of a URL. Here is some sample code for making a HEAD request and figuring out the file type:
    import requests

    response = requests.head("https://scrapingbee.com")
    print(response.headers['Content-Type'])
    # Output: text/html; charset=utf-8

    response = requests.head("https://practicalpython.yasoob.me/_static/images/book-cover.png")
    print(response.headers['Content-Type'])
    # Output: image/png
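As the first output above shows, the Content-Type value can include parameters such as `charset`, and a server may omit the header entirely. Below is a small sketch of one way to reduce the header to a bare MIME type. The function name `mime_type_from_header` is hypothetical, and the sketch assumes that the parameters after the first `;` are not needed.

```python
def mime_type_from_header(content_type):
    """Return the bare MIME type from a Content-Type header value (hypothetical helper)."""
    # A missing or empty header gives us nothing to work with.
    if not content_type:
        return None
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    return content_type.split(";", 1)[0].strip().lower()


print(mime_type_from_header("text/html; charset=utf-8"))
print(mime_type_from_header("image/png"))
print(mime_type_from_header(None))
```

With requests, this could be called as `mime_type_from_header(response.headers.get("Content-Type"))`, which avoids a `KeyError` when the header is absent. Note also that `requests.head` does not follow redirects by default, so passing `allow_redirects=True` may be needed to reach the final resource.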