Basic Usage

This guide covers the fundamental operations of liburlparser in Python.

Importing the Library

from liburlparser import Url, Host

Parsing URLs

The Url class is used to parse complete URLs:

# Create a URL object
url = Url("https://mail.google.com/about?q=test#section")

# Access URL components
print(url)                # <Url :'https://mail.google.com/about?q=test#section'>
print(url.protocol)       # https
print(url.host)           # <Host :'mail.google.com'>
print(url.subdomain)      # mail
print(url.domain)         # google
print(url.domain_name)    # google
print(url.suffix)         # com
print(url.port)           # 0 (default)
print(url.query)          # q=test
print(url.fragment)       # section

Parsing Hosts

The Host class is used to parse hostnames:

# Create a Host object directly
host = Host("mail.google.com")

# Access host components
print(host)               # <Host :'mail.google.com'>
print(host.subdomain)     # mail
print(host.domain)        # google
print(host.domain_name)   # google
print(host.suffix)        # com

Getting Host from URL

There are multiple ways to extract host information from a URL:

# Method 1: Get the host from a URL object
url = Url("https://mail.google.com/about")
host = url.host

# Method 2: Use the Host.from_url static method
host = Host.from_url("https://mail.google.com/about")

# Method 3: Just extract the host string (fastest)
host_str = Url.extract_host("https://mail.google.com/about")
print(host_str)  # mail.google.com

Converting to Dictionary or JSON

Both Url and Host objects can be converted to dictionaries or JSON strings:

# URL to dictionary
url = Url("https://mail.google.com/about?q=test#section")
url_dict = url.to_dict()
print(url_dict)
# Output: {'str': 'https://mail.google.com/about?q=test#section', 'protocol': 'https', 'userinfo': '', 'host': {'str': 'mail.google.com', 'subdomain': 'mail', 'domain': 'google', 'domain_name': 'google', 'suffix': 'com'}, 'port': 0, 'query': 'q=test', 'fragment': 'section'}

# URL to JSON
url_json = url.to_json()
print(url_json)
# Output: {"str": "https://mail.google.com/about?q=test#section", "protocol": "https", "userinfo": "", "host": {"str": "mail.google.com", "subdomain": "mail", "domain": "google", "domain_name": "google", "suffix": "com"}, "port": 0, "query": "q=test", "fragment": "section"}

# Host to dictionary
host = Host("mail.google.com")
host_dict = host.to_dict()
print(host_dict)
# Output: {'str': 'mail.google.com', 'subdomain': 'mail', 'domain': 'google', 'domain_name': 'google', 'suffix': 'com'}

# Host to JSON
host_json = host.to_json()
print(host_json)
# Output: {"str": "mail.google.com", "subdomain": "mail", "domain": "google", "domain_name": "google", "suffix": "com"}

Quick Domain Extraction

If you only need the domain components without creating full objects:

# From a host string
result = Host.extract("mail.google.com")
print(result)  # {'suffix': 'com', 'domain': 'google', 'subdomain': 'mail'}

# From a URL string
result = Host.extract_from_url("https://mail.google.com/about")
print(result)  # {'suffix': 'com', 'domain': 'google', 'subdomain': 'mail'}

Ignoring "www" Subdomain

You can choose to ignore the "www" subdomain when parsing:

# Default behavior
host = Host("www.example.com")
print(host.subdomain)  # www
print(host.domain)     # example

# Ignore www
host = Host("www.example.com", ignore_www=True)
print(host.subdomain)  # (empty string)
print(host.domain)     # example

# Same for URLs
url = Url("https://www.example.com/about", ignore_www=True)
print(url.subdomain)   # (empty string)
print(url.domain)      # example

Removing "www" from a Host String

host_str = Host.removeWWW("www.example.com")
print(host_str)  # example.com

Complete Example

Here's a complete example that demonstrates parsing a URL and accessing its components:

from liburlparser import Url, Host

def analyze_url(url_str):
    # Parse the URL
    url = Url(url_str)

    # Print URL components
    print(f"Full URL: {url}")
    print(f"Protocol: {url.protocol}")
    print(f"Host: {url.host}")
    print(f"Subdomain: {url.subdomain}")
    print(f"Domain: {url.domain}")
    print(f"Suffix: {url.suffix}")
    print(f"Port: {url.port}")
    print(f"Query: {url.query}")
    print(f"Fragment: {url.fragment}")

    # Convert to JSON
    print(f"JSON: {url.to_json()}")

# Test with a sample URL
analyze_url("https://mail.google.com/about?q=test#section")