liburlparser
The fastest domain extractor library, written in C++ with Python bindings. The first complete library for parsing URLs in C++, Python, and on the command line.
Why liburlparser?
liburlparser is a simple, lightweight, and fast library for parsing URLs and hosts that:
- Extracts components like protocol, host, fragment, query, and path
- Processes hosts to extract subdomain, domain, domain-name, and suffix
- Functions as a top-level domain extractor, correctly identifying that in "ee.aut.ac.ir", the entire "ac.ir" is the suffix, not just ".ir"
- Supports international domain names and non-ASCII characters
- Is written in C++ but provides easy-to-use interfaces for Python and the command line
- Has a practical, clean design that keeps the implementation efficient
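To make the component names in the list above concrete, here is a minimal sketch that splits a URL into the same pieces using only Python's standard library. It is not how liburlparser works internally; `split_url` and the sample URL are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Return the URL components named above, using only the standard library.

    Illustrative stand-in; this is not liburlparser's implementation.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # e.g. "https"
        "host": parts.hostname,      # e.g. "ee.aut.ac.ir"
        "path": parts.path,          # e.g. "/about"
        "query": parts.query,        # e.g. "lang=en"
        "fragment": parts.fragment,  # e.g. "id"
    }


components = split_url("https://ee.aut.ac.ir/about?lang=en#id")
for name, value in components.items():
    print(f"{name}: {value}")
```

Note that the standard library stops at the host: it cannot tell which part of "ee.aut.ac.ir" is the suffix, which is the gap a Public Suffix List based extractor fills.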
Powerful Features
Engineered for high-performance URL parsing with minimal dependencies
- Multiple Language Support: usable from Python, C++, and the shell through an intuitive, consistent interface
- Public Suffix List Support: handles known multi-label suffixes such as 'ac.ir' as well as unknown suffixes
- Clean Code Design: separate Url and Host classes for more organized and maintainable code
- Automatic Updates: the Public Suffix List is updated automatically before each build and deployment
- High Performance: outperforms other domain extraction libraries in both host and URL parsing
- Cross-Platform: works seamlessly on Windows, Linux, and macOS
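The Public Suffix List behavior mentioned in the features can be illustrated with a short sketch. It assumes a tiny hand-written set of suffixes (`SAMPLE_SUFFIXES`) in place of the real list, a hypothetical helper `split_host`, and the reading that "domain-name" means domain plus suffix; liburlparser's actual C++ implementation may differ in every detail.

```python
# Tiny hand-written sample; the real Public Suffix List is far larger
# and also has wildcard and exception rules that this sketch ignores.
SAMPLE_SUFFIXES = {"ir", "ac.ir", "com", "uk", "co.uk"}


def split_host(host: str, suffixes=SAMPLE_SUFFIXES) -> dict:
    """Split a host into subdomain, domain, suffix, and domain_name."""
    labels = host.lower().strip(".").split(".")
    if len(labels) < 2:
        # A bare label such as "localhost" has no suffix at all.
        return {"subdomain": "", "domain": labels[0], "suffix": "",
                "domain_name": labels[0]}

    # Fallback for unknown suffixes: treat only the last label as the suffix.
    suffix_start = len(labels) - 1
    # Try the longest candidate first so "ac.ir" wins over "ir".
    # Starting at 1 keeps at least one label available for the domain.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            suffix_start = i
            break

    suffix = ".".join(labels[suffix_start:])
    domain = labels[suffix_start - 1]
    return {
        "subdomain": ".".join(labels[:suffix_start - 1]),
        "domain": domain,
        "suffix": suffix,
        "domain_name": f"{domain}.{suffix}",
    }


known = split_host("ee.aut.ac.ir")
unknown = split_host("www.example.zzz")
print(known)
print(unknown)
```

The longest-match-first loop is what makes "ac.ir" the suffix of "ee.aut.ac.ir", and the fallback branch shows one plausible way to handle a suffix that is not in the list.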
Performance Comparison
liburlparser outperforms other domain extraction libraries in both host and URL parsing:
Extract From Host (10 million domains)
Library | Function | Time |
---|---|---|
liburlparser | liburlparser.Host | 1.12s |
PyDomainExtractor | pydomainextractor.extract | 1.50s |
publicsuffix2 | publicsuffix2.get_sld | 9.92s |
tldextract | __call__ | 29.23s |
tld | tld.parse_tld | 34.48s |
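The source does not describe how these timings were collected. As a rough sketch of how such a comparison could be measured, the harness below times one extractor function over a list of hosts. `time_extractor`, `naive_extract`, and the generated host list are all hypothetical; to compare real libraries, one would pass each library's extraction callable in place of the stand-in.

```python
import time


def naive_extract(host: str) -> str:
    """Hypothetical stand-in extractor: returns only the last label."""
    return host.rsplit(".", 1)[-1]


def time_extractor(extract, hosts, repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeat` full passes."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for host in hosts:
            extract(host)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic input, much smaller than the 10 million domains in the table.
hosts = [f"sub{i}.example{i % 100}.ac.ir" for i in range(100_000)]
elapsed = time_extractor(naive_extract, hosts)
print(f"{len(hosts)} hosts in {elapsed:.3f}s")
```

Taking the best of several passes reduces noise from other processes; absolute numbers depend on the machine and will not match the table.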
Installation Guide
Installation Commands
# Install from PyPI
pip install liburlparser
Basic Usage Example
Example Code (Python)
from liburlparser import Url, Host
# Parse URL
url = Url("https://ee.aut.ac.ir/#id")
print(url.suffix, url.domain, url.fragment)
# Parse host
host = Host("ee.aut.ac.ir")
print(host.domain, host.suffix)