How to Parse URLs in Python: A Comprehensive Guide with Examples
Parsing URLs (Uniform Resource Locators) is a common task in web development and data processing. URLs are the addresses that identify resources on the internet, such as web pages, images, and documents. Python's standard library module urllib.parse, along with third-party packages such as furl, lets developers split a URL into its components, extract specific parts, and build or modify URLs.
Introduction to URL Parsing
URL parsing involves breaking a URL into its constituent parts: the scheme (e.g., “http”), the network location or host (e.g., “www.example.com”), the path (e.g., “/page”), the query string (e.g., “key=value”), and the fragment (the part after “#”). Python offers several libraries for this, each with its own features and capabilities.
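To see every component at once, the sketch below splits a made-up URL (an illustrative assumption) that contains credentials, a port, and a fragment. It uses urlsplit, which returns the same fields as urlparse except the rarely used "params" field.

```python
from urllib.parse import urlsplit

# Illustrative URL (an assumption) exercising most components.
url = "https://user:secret@www.Example.com:8080/docs/page?key=value#section-2"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # user:secret@www.Example.com:8080
print(parts.hostname)  # www.example.com  (hostname is lowercased)
print(parts.port)      # 8080  (an int, or None when no port is given)
print(parts.username)  # user
print(parts.path)      # /docs/page
print(parts.query)     # key=value
print(parts.fragment)  # section-2
```

Note that netloc keeps the original text, while the hostname and port attributes give normalized, ready-to-use values.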
Using the urllib.parse Module
The urllib.parse module is part of the Python standard library, so it needs no installation. It provides functions to split URLs into components (urlparse, urlsplit), decode query strings (parse_qs, parse_qsl), and build or resolve URLs (urlunparse, urlencode, urljoin).
Example: Parsing a Simple URL
>>> from urllib.parse import urlparse
>>>
>>> url = "https://www.example.com/page?query=value"
>>> parsed_url = urlparse(url)
>>>
>>> print("Scheme:", parsed_url.scheme)
Scheme: https
>>> print("Netloc:", parsed_url.netloc)
Netloc: www.example.com
>>> print("Path:", parsed_url.path)
Path: /page
>>> print("Query:", parsed_url.query)
Query: query=value
>>>
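Parsing is often followed by modifying a component and reassembling the URL. A minimal sketch, reusing the URL above: the result of urlparse is a named tuple, so _replace returns a modified copy, and geturl() (or urlunparse) turns it back into a string. The new path and query values are illustrative.

```python
from urllib.parse import urlparse, urlunparse

url = "https://www.example.com/page?query=value"
parsed_url = urlparse(url)

# ParseResult is a namedtuple; _replace returns a new, modified copy.
updated = parsed_url._replace(path="/otherpage", query="query=new")

# geturl() reassembles the components into a URL string.
print(updated.geturl())    # https://www.example.com/otherpage?query=new

# urlunparse does the same from any 6-item sequence.
print(urlunparse(updated))  # https://www.example.com/otherpage?query=new
```

The original parsed_url is left unchanged, which makes it safe to derive several variants from one parsed URL.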
Example: Extracting Query Parameters
>>> from urllib.parse import parse_qs
>>>
>>> query_string = "key1=value1&key2=value2&key3=value3"
>>> query_params = parse_qs(query_string)
>>>
>>> for key, values in query_params.items():
...     print(key, ":", values)
...
key1 : ['value1']
key2 : ['value2']
key3 : ['value3']
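Each value is a list because a key may appear more than once in a query string. The sketch below (the search URL is an illustrative assumption) takes the query straight from urlparse, shows how repeated and blank keys are handled, and uses urlencode to turn the dictionary back into a query string.

```python
from urllib.parse import urlparse, parse_qs, parse_qsl, urlencode

url = "https://www.example.com/search?tag=python&tag=url&page=2&empty="
query = urlparse(url).query

# parse_qs groups repeated keys into lists; blank values are dropped by default.
params = parse_qs(query)
print(params)  # {'tag': ['python', 'url'], 'page': ['2']}

# keep_blank_values=True retains 'empty' with an empty string.
print(parse_qs(query, keep_blank_values=True)["empty"])  # ['']

# parse_qsl returns ordered (key, value) pairs instead of a dict.
print(parse_qsl(query))  # [('tag', 'python'), ('tag', 'url'), ('page', '2')]

# urlencode with doseq=True expands the lists back into repeated keys.
print(urlencode(params, doseq=True))  # tag=python&tag=url&page=2
```

All values come back as strings (page is '2', not 2), so convert types explicitly where needed.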
Advanced URL Parsing with the furl Library
The furl library is a third-party package that wraps a URL in a mutable object, which makes it convenient to parse and modify URLs, including resolving relative URLs. Install it with pip:
pip3 install furl
Example: Handling Relative URLs
>>> from furl import furl
>>>
>>> base_url = "https://www.example.com/page/"
>>> relative_url = "../otherpage"
>>> absolute_url = furl(base_url).join(relative_url)
>>>
>>> print("Absolute URL:", absolute_url.url)
Absolute URL: https://www.example.com/otherpage
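If adding a dependency is not desirable, the standard library's urljoin resolves relative references without furl. This sketch reproduces the result above and shows two other common kinds of reference; the extra targets ("/root" and the other.example URL) are illustrative.

```python
from urllib.parse import urljoin

base_url = "https://www.example.com/page/"

# ".." climbs one level up from the base's directory (/page/).
print(urljoin(base_url, "../otherpage"))  # https://www.example.com/otherpage

# A leading "/" replaces the whole path.
print(urljoin(base_url, "/root"))  # https://www.example.com/root

# An absolute URL replaces the base entirely.
print(urljoin(base_url, "https://other.example/x"))  # https://other.example/x
```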
Example: Combining and Resolving URLs
>>> from furl import furl
>>>
>>> url1 = "https://www.example.com/page"
>>> url2 = "otherdir/otherpage"
>>> combined_url = furl(url1).join(url2)
>>>
>>> print("Combined URL:", combined_url.url)
Combined URL: https://www.example.com/otherdir/otherpage
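The combined URL above no longer contains "page" because the base path has no trailing slash, so its last segment is treated like a file name and replaced. The sketch below uses the standard library's urljoin to contrast the two cases.

```python
from urllib.parse import urljoin

relative = "otherdir/otherpage"

# Without a trailing slash, the last path segment ("page") is replaced.
without_slash = urljoin("https://www.example.com/page", relative)
print(without_slash)  # https://www.example.com/otherdir/otherpage

# With a trailing slash, "page/" is kept as a directory.
with_slash = urljoin("https://www.example.com/page/", relative)
print(with_slash)  # https://www.example.com/page/otherdir/otherpage
```

When a base URL is meant to act as a directory, make sure it ends with "/" before joining.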