liburlparser

The fastest domain extractor library, written in C++ with Python bindings. The first complete library for parsing URLs from C++, Python, and the command line.

Why liburlparser?

liburlparser is a simple, lightweight, and fast library for parsing URLs and hosts that:

  • Extracts components like protocol, host, fragment, query, and path
  • Processes hosts to extract subdomain, domain, domain-name, and suffix
  • Functions as a top-level domain extractor, correctly identifying that in "ee.aut.ac.ir" the suffix is the whole of "ac.ir", not just "ir"
  • Supports international domain names and non-ASCII characters
  • Is written in C++ but provides easy-to-use interfaces for Python and the command line
  • Has a practical, clean design
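
To make the component names in the list above concrete, here is a minimal sketch that splits a URL using only Python's standard library. It does not use liburlparser and is not how the library is implemented. The helper name `split_url` and the example URL's path and query are made up for illustration.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into the components named in the list above.

    Illustrative helper built on the standard library; not liburlparser's API.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


components = split_url("https://ee.aut.ac.ir/about?lang=en#id")
for name, value in components.items():
    print(f"{name}: {value}")
```

The standard library stops at the host. Splitting the host further into subdomain, domain, and suffix requires the Public Suffix List, which is the part liburlparser adds.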

Powerful Features

Engineered for high-performance URL parsing with minimal dependencies

Multiple Language Support

Use in Python, C++, and Shell with an intuitive and consistent interface

Public Suffix List Support

Handles known multi-label suffixes such as 'ac.ir' as well as suffixes that are not on the list
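
The idea behind suffix handling can be sketched as longest-suffix matching against a set of known suffixes, with a fallback to the last label when nothing matches. The sketch below is an assumption-laden illustration, not liburlparser's implementation. `KNOWN_SUFFIXES` is a tiny hypothetical stand-in for the Public Suffix List, the function name and result keys are invented here, and the wildcard and exception rules of the real list are ignored.

```python
# Hypothetical, tiny stand-in for the Public Suffix List (the real list is far larger).
KNOWN_SUFFIXES = {"ir", "ac.ir", "com", "uk", "co.uk"}


def split_host(host: str) -> dict:
    """Split a host into subdomain, name, and suffix (illustrative key names)."""
    labels = host.lower().strip(".").split(".")

    # Fallback for unknown suffixes: treat only the last label as the suffix.
    suffix_len = 1

    # Try candidates from longest to shortest so "ac.ir" wins over "ir".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_SUFFIXES:
            suffix_len = len(labels) - i
            break

    suffix = ".".join(labels[-suffix_len:])
    rest = labels[:-suffix_len]
    return {
        "subdomain": ".".join(rest[:-1]),
        "name": rest[-1] if rest else "",
        "suffix": suffix,
    }


print(split_host("ee.aut.ac.ir"))        # known multi-label suffix
print(split_host("www.example.madeup"))  # suffix not in the set
```

Checking the longest candidate first is what keeps "ac.ir" from being cut down to "ir". The fallback branch is one simple way to treat unknown suffixes and may differ from what the library does.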

Clean Code Design

Separate Url and Host classes for more organized and maintainable code

Automatic Updates

The Public Suffix List is refreshed automatically before each build and deployment

High Performance

Outperforms other domain extraction libraries in both host and URL parsing

Cross-Platform

Runs on Windows, Linux, and macOS

Performance Comparison

liburlparser outperforms other domain extraction libraries in both host and URL parsing:

Extract From Host (10 million domains)

Library             Function                     Time
liburlparser        liburlparser.Host            1.12 s
PyDomainExtractor   pydomainextractor.extract    1.50 s
publicsuffix2       publicsuffix2.get_sld        9.92 s
tldextract          __call__                     29.23 s
tld                 tld.parse_tld                34.48 s

Installation Guide

Installation Commands

# Install from PyPI
pip install liburlparser

Basic Usage Example

Example Code (Python)

from liburlparser import Url, Host

# Parse URL
url = Url("https://ee.aut.ac.ir/#id")
print(url.suffix, url.domain, url.fragment)

# Parse host
host = Host("ee.aut.ac.ir")
print(host.domain, host.suffix)