liburlparser for Python¶

Fastest domain extractor library for Python

Complete library for parsing URLs with Python and Command Line

Overview¶

liburlparser is a powerful domain extractor library written in C++ with Python bindings. It provides efficient URL parsing capabilities for Python, making it a valuable tool for projects that involve working with web addresses.

Key Features¶

High Performance: Significantly faster than pure Python alternatives
Intuitive Interface: Simple, easy-to-use API
Clean Code Design: Separate Url and Host classes for organized code
Public Suffix List Support: Handles known combinatorial suffixes (e.g., "ac.ir")
Unknown Suffix Support: Can handle unknown suffixes (e.g., "comm" in "google.comm")
Automatic PSL Updates: Updates the public_suffix_list automatically
Comprehensive Properties: Access all parts of URLs and hosts with simple property access
Command Line Interface: Parse URLs directly from the command line

Quick Start¶

from liburlparser import Url, Host

# Parse a URL
url = Url("https://mail.google.com/about")
print(url.domain)  # Output: google
print(url.suffix)  # Output: com
print(url.protocol)  # Output: https

# Parse a host
host = Host("mail.google.com")
print(host.subdomain)  # Output: mail
print(host.domain)  # Output: google
print(host.suffix)  # Output: com

Command Line¶

python -m liburlparser --url "https://mail.google.com/about" | jq
python -m liburlparser --host "mail.google.com" | jq

Why liburlparser?¶

Performance: Significantly faster than other domain extraction libraries
Ease of Use: Simple, intuitive API
Comprehensive: Handles all parts of URLs and hosts
Reliable: Built on the Public Suffix List for accurate domain extraction

Check out the Installation guide to get started, or dive into the Basic Usage documentation to learn more.

Performance Comparison¶

Extract From Host¶

Tests were run on a file containing 10 million random domains from various top-level domains:

Library	Function	Time
liburlparser	liburlparser.Host	1.12s
PyDomainExtractor	pydomainextractor.extract	1.50s
publicsuffix2	publicsuffix2.get_sld	9.92s
tldextract	__call__	29.23s
tld	tld.parse_tld	34.48s

Extract From URL¶

Tests were run on a file containing 1 million random URLs:

Library	Function	Time
liburlparser	liburlparser.Host.from_url	2.10s
PyDomainExtractor	pydomainextractor.extract_from_url	2.24s
publicsuffix2	publicsuffix2.get_sld	10.84s
tldextract	__call__	36.04s
tld	tld.parse_tld	57.87s