liburlparser

A fast domain extractor library written in C++ with Python bindings. It is a complete library for parsing URLs from C++, Python, and the command line.

Why liburlparser?

liburlparser is a simple, lightweight, and fast library for parsing URLs and hosts that:

Extracts components like protocol, host, fragment, query, and path
Processes hosts to extract subdomain, domain, domain-name, and suffix
Functions as a top-level domain extractor, correctly identifying that in "ee.aut.ac.ir", the entire "ac.ir" is the suffix, not just ".ir"
Supports international domain names and non-ASCII characters
Is written in C++ but provides easy-to-use interfaces for Python and the command line
Has a practical, clean design that allows an efficient implementation
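
To make the component names in the list above concrete, here is a minimal sketch that uses only Python's standard library. It is not liburlparser's implementation or API: `split_url` and `to_ascii_host` are hypothetical helper names, and the example URL with a path and query is an assumption added for illustration.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into the components named in the list above (stdlib only)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def to_ascii_host(host: str) -> str:
    """Encode an international host name with Python's built-in IDNA codec."""
    return host.encode("idna").decode("ascii")


components = split_url("https://ee.aut.ac.ir/about?lang=en#id")
ascii_host = to_ascii_host("bücher.de")
print(components)
print(ascii_host)
```

The standard library stops at the host. Splitting the host further into subdomain, domain, and suffix needs the Public Suffix List, which is the part liburlparser adds.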

Powerful Features

Engineered for high-performance URL parsing with minimal dependencies

Multiple Language Support

Use in Python, C++, and Shell with an intuitive and consistent interface

Public Suffix List Support

Handles known combinatorial suffixes like 'ac.ir' and unknown suffixes

Clean Code Design

Separate Url and Host classes for more organized and maintainable code

Automatic Updates

The Public Suffix List is updated automatically before each build and deployment

High Performance

Outperforms other domain extraction libraries in both host and URL parsing

Cross-Platform

Seamless performance across Windows, Linux & macOS
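
The Public Suffix List behaviour described above (multi-label suffixes such as 'ac.ir', plus a fallback for unknown suffixes) can be sketched as a longest-suffix match. This is an illustrative sketch, not liburlparser's code: `KNOWN_SUFFIXES` is a tiny hypothetical stand-in for the real list, `split_host` and its result keys are made-up names that need not match liburlparser's attributes, and the real list's wildcard and exception rules are left out.

```python
# Hypothetical, tiny stand-in for the Public Suffix List (assumption for illustration).
KNOWN_SUFFIXES = {"ir", "ac.ir", "com", "uk", "co.uk"}


def split_host(host: str) -> dict:
    """Split a host into subdomain, name, and suffix using longest-suffix matching."""
    labels = host.lower().strip(".").split(".")

    # Fallback for unknown suffixes: treat the last label as the suffix.
    suffix_len = 1
    # Candidates run from longest to shortest, so the first hit is the longest match.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_SUFFIXES:
            suffix_len = len(labels) - i
            break

    suffix = ".".join(labels[-suffix_len:])
    rest = labels[:-suffix_len]
    return {
        "subdomain": ".".join(rest[:-1]),
        "name": rest[-1] if rest else "",
        "suffix": suffix,
    }


known = split_host("ee.aut.ac.ir")
unknown = split_host("www.example.notalisted")
print(known)
print(unknown)
```

Because the longest candidate is tried first, "ac.ir" wins over "ir" for "ee.aut.ac.ir", while a host whose suffix is not in the set still gets a usable split.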

Performance Comparison

liburlparser outperforms other domain extraction libraries in both host and URL parsing:

Extract From Host (10 million domains)

Library	Function	Time
liburlparser	liburlparser.Host	1.12s
PyDomainExtractor	pydomainextractor.extract	1.50s
publicsuffix2	publicsuffix2.get_sld	9.92s
tldextract	__call__	29.23s
tld	tld.parse_tld	34.48s
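
A comparison like the one in the table can be timed with a small harness. The sketch below is an assumption about how such a measurement might be set up, not the benchmark script behind the numbers above: `time_extractor` is a hypothetical helper, and `last_label` is a trivial stand-in extractor so the example runs without any third-party package.

```python
import time
from typing import Callable, Iterable


def time_extractor(extract: Callable[[str], object], hosts: Iterable[str]) -> float:
    """Return the wall-clock seconds needed to call `extract` on every host."""
    host_list = list(hosts)  # materialize first so input generation is not timed
    start = time.perf_counter()
    for host in host_list:
        extract(host)
    return time.perf_counter() - start


def last_label(host: str) -> str:
    """Stand-in extractor (assumption): returns only the final label of the host."""
    return host.rsplit(".", 1)[-1]


sample_hosts = ["ee.aut.ac.ir", "example.com"] * 1000
elapsed = time_extractor(last_label, sample_hosts)
print(f"{elapsed:.4f}s for {len(sample_hosts)} hosts")
```

To compare real libraries, pass each library's extraction callable in place of `last_label` and reuse the same host list for every run.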

Installation Guide

Installation Commands

# Install from PyPI
pip install liburlparser

Basic Usage Example

Example Code (Python)

from liburlparser import Url, Host

# Parse URL
url = Url("https://ee.aut.ac.ir/#id")
print(url.suffix, url.domain, url.fragment)

# Parse host
host = Host("ee.aut.ac.ir")
print(host.domain, host.suffix)