liburlparser for Python
Fastest domain extractor library for Python
Complete library for parsing URLs from Python and the command line
Overview
liburlparser is a domain extractor library written in C++ with Python bindings. It efficiently parses URLs and hosts into their parts (protocol, subdomain, domain, suffix, and more), which makes it useful for projects that work with large numbers of web addresses.
Key Features
- High Performance: Significantly faster than pure Python alternatives
- Intuitive Interface: Simple, easy-to-use API
- Clean Code Design: Separate `Url` and `Host` classes for organized code
- Public Suffix List Support: Handles known multi-level suffixes (e.g., "ac.ir")
- Unknown Suffix Support: Can handle unknown suffixes (e.g., "comm" in "google.comm")
- Automatic PSL Updates: Updates the public_suffix_list automatically
- Comprehensive Properties: Access all parts of URLs and hosts with simple property access
- Command Line Interface: Parse URLs directly from the command line
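To make the two suffix features above concrete, here is a toy sketch of how a Public Suffix List lookup splits a host into subdomain, domain, and suffix. It is not liburlparser's implementation: `TOY_SUFFIX_RULES` and `split_host` are hypothetical names, the rule set is a tiny made-up subset of the real list, and wildcard and exception rules are ignored. It shows why "ac.ir" has to match as one suffix, and how falling back to the last label (the PSL's default rule) still yields a domain for an unknown suffix such as "comm".

```python
# Toy public-suffix split, for illustration only (not liburlparser's code).

# Hypothetical, tiny subset of Public Suffix List rules. The real list has
# thousands of entries plus wildcard (*.) and exception (!) rules, which
# this sketch ignores.
TOY_SUFFIX_RULES = {"com", "ir", "ac.ir", "uk", "co.uk"}


def split_host(host: str) -> tuple[str, str, str]:
    """Return (subdomain, domain, suffix) using the longest matching rule.

    If no rule matches, the rightmost label is treated as the suffix (the
    PSL's default "*" rule), so an unknown suffix like "comm" still works.
    """
    labels = host.lower().rstrip(".").split(".")
    suffix_len = 1  # default rule: the rightmost label is the suffix
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in TOY_SUFFIX_RULES:
            # Candidates shrink from the whole host down to the last label,
            # so the first hit is the longest matching suffix.
            suffix_len = len(labels) - i
            break
    suffix = ".".join(labels[-suffix_len:])
    rest = labels[:-suffix_len]
    domain = rest[-1] if rest else ""
    subdomain = ".".join(rest[:-1])
    return subdomain, domain, suffix


print(split_host("mail.google.com"))  # ('mail', 'google', 'com')
print(split_host("ee.aut.ac.ir"))     # ('ee', 'aut', 'ac.ir')
print(split_host("google.comm"))      # ('', 'google', 'comm')
```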
Quick Start
```python
from liburlparser import Url, Host

# Parse a URL
url = Url("https://mail.google.com/about")
print(url.domain)    # Output: google
print(url.suffix)    # Output: com
print(url.protocol)  # Output: https

# Parse a host
host = Host("mail.google.com")
print(host.subdomain)  # Output: mail
print(host.domain)     # Output: google
print(host.suffix)     # Output: com
```
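A URL's protocol and path sit outside its host, so the host has to be separated out before it can be split into subdomain, domain, and suffix. As a rough stdlib analogy (not a description of how liburlparser works internally), `urllib.parse.urlsplit` performs that first step on the Quick Start URL:

```python
from urllib.parse import urlsplit

# Stdlib-only illustration: separate scheme (protocol), host, and path.
# The host is the part a domain/suffix split would then operate on.
parts = urlsplit("https://mail.google.com/about")

print(parts.scheme)    # https
print(parts.hostname)  # mail.google.com
print(parts.path)      # /about
```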
Command Line
```bash
python -m liburlparser --url "https://mail.google.com/about" | jq
python -m liburlparser --host "mail.google.com" | jq
```
Why liburlparser?
- Performance: Faster than every other library in the benchmarks below, and roughly 5 to 30 times faster than publicsuffix2, tldextract, and tld
- Ease of Use: Simple, intuitive API
- Comprehensive: Handles all parts of URLs and hosts
- Reliable: Built on the Public Suffix List for accurate domain extraction
Check out the Installation guide to get started, or dive into the Basic Usage documentation to learn more.
Performance Comparison
Extract From Host
Tests were run on a file containing 10 million random domains from various top-level domains:
| Library | Function | Time (s) |
|---|---|---|
| liburlparser | `liburlparser.Host` | 1.12 |
| PyDomainExtractor | `pydomainextractor.extract` | 1.50 |
| publicsuffix2 | `publicsuffix2.get_sld` | 9.92 |
| tldextract | `__call__` | 29.23 |
| tld | `tld.parse_tld` | 34.48 |
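The benchmark script behind these numbers is not included here. A minimal, hypothetical harness for this kind of measurement loads the hosts up front so file I/O is not timed, then times one parser call per host with `time.perf_counter`. The names `read_hosts` and `time_parser` and the one-host-per-line file layout are assumptions, and the stand-in parser exists only so the sketch runs without liburlparser installed.

```python
import time
from typing import Callable, Iterable


def read_hosts(path: str) -> list[str]:
    """Load one host per line from a text file (hypothetical layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def time_parser(parse: Callable[[str], object], hosts: Iterable[str]) -> tuple[int, float]:
    """Call `parse` on every host; return (hosts parsed, elapsed seconds)."""
    count = 0
    start = time.perf_counter()
    for host in hosts:
        parse(host)
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


if __name__ == "__main__":
    # Stand-in parser (plain str.split) so the sketch is runnable as-is.
    sample = ["mail.google.com", "ee.aut.ac.ir", "google.comm"] * 100_000
    n, secs = time_parser(lambda h: h.split("."), sample)
    print(f"{n} hosts in {secs:.2f}s")
```

To time the library itself, pass `liburlparser.Host` in place of the stand-in, for example `time_parser(Host, read_hosts("hosts.txt"))` with a host file of your own.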
Extract From URL
Tests were run on a file containing 1 million random URLs:
| Library | Function | Time (s) |
|---|---|---|
| liburlparser | `liburlparser.Host.from_url` | 2.10 |
| PyDomainExtractor | `pydomainextractor.extract_from_url` | 2.24 |
| publicsuffix2 | `publicsuffix2.get_sld` | 10.84 |
| tldextract | `__call__` | 36.04 |
| tld | `tld.parse_tld` | 57.87 |
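The tables give absolute times. Dividing each time by liburlparser's time for the same task gives a relative figure that makes the speed claims easier to read: by these numbers tldextract takes about 26 times as long on hosts and about 17 times as long on URLs, while PyDomainExtractor is within roughly 1.3x and 1.1x. The sketch only does arithmetic on the published numbers; `relative_to` is a hypothetical helper and no new measurement is involved.

```python
# Timings in seconds, copied from the two tables above.
host_times = {
    "liburlparser": 1.12,
    "PyDomainExtractor": 1.50,
    "publicsuffix2": 9.92,
    "tldextract": 29.23,
    "tld": 34.48,
}
url_times = {
    "liburlparser": 2.10,
    "PyDomainExtractor": 2.24,
    "publicsuffix2": 10.84,
    "tldextract": 36.04,
    "tld": 57.87,
}


def relative_to(times: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each library's time divided by the baseline's, to one decimal."""
    base = times[baseline]
    return {name: round(t / base, 1) for name, t in times.items()}


for label, times in (("host", host_times), ("url", url_times)):
    print(label, relative_to(times, "liburlparser"))
```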