Welcome to pygixml

pygixml is a high-performance XML parser for Python built on Cython and pugixml. It delivers fast parsing, full XPath 1.0 support, and a clean Pythonic API for reading, writing, and transforming XML.

New to XML? Start with What is XML? for a primer on the format, its structure, and real-world applications.

Note

Enjoy pygixml? Star the project on GitHub to support its development: https://github.com/MohammadRaziei/pygixml

Why pygixml?

Speed — pugixml is one of the fastest XML parsers available. pygixml brings that speed directly to Python:

Library        Avg Time    Speedup vs ElementTree
pygixml        0.0009 s    9.2× faster
lxml           0.0041 s    2.0× faster
ElementTree    0.0083 s    1.0× (baseline)

(Benchmark: parsing a document with 5 000 elements. See Performance for the full comparison.)
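A minimal sketch of how a comparison like this could be reproduced locally. It is not the project's benchmark script: the helpers make_document and average_time are hypothetical names, the document shape (5 000 flat item elements under one root) is an assumption, and pygixml is timed only when it is installed. Absolute numbers will vary with the machine and the document.

```python
import timeit
import xml.etree.ElementTree as ET

try:
    import pygixml  # optional: only timed when the package is installed
except ImportError:
    pygixml = None


def make_document(n_elements: int = 5000) -> str:
    """Build a synthetic XML string with n_elements <item> children (assumed shape)."""
    items = "".join(f'<item id="{i}">Item {i}</item>' for i in range(n_elements))
    return f"<root>{items}</root>"


def average_time(func, repeat: int = 20) -> float:
    """Average wall-clock seconds per call over `repeat` calls."""
    return timeit.timeit(func, number=repeat) / repeat


xml_text = make_document()

# ElementTree is always available and serves as the baseline.
timings = {"ElementTree": average_time(lambda: ET.fromstring(xml_text))}
if pygixml is not None:
    timings["pygixml"] = average_time(lambda: pygixml.parse_string(xml_text))

# Speedup is the baseline time divided by the library's time.
baseline = timings["ElementTree"]
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:<12} {seconds:.4f} s  {baseline / seconds:.1f}x vs ElementTree")
```

The speedup column in the table follows the same ratio: 0.0083 s / 0.0009 s ≈ 9.2 for pygixml and 0.0083 s / 0.0041 s ≈ 2.0 for lxml.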

Features

  • Blazing-fast parsing — up to 14× faster than ElementTree

  • Full XPath 1.0 — complete query engine with all standard functions

  • Memory efficient — zero-copy C++ memory management via pugixml

  • Pythonic API — intuitive methods and properties, not a direct C++ mirror

  • Cross-platform — Windows, Linux, macOS

  • Text extraction — recursive text gathering with configurable joins

  • XML serialization — output with custom indentation (given as a string or as an integer width)

  • Node iteration — depth-first traversal of the entire document

  • Node identity — memory-based ID for debugging and comparison

Quick Example

import pygixml

doc = pygixml.parse_string("""
<library>
    <book id="1">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
    </book>
</library>
""")

# Access elements and attributes
root = doc.root
book = root.child("book")
print(book.name)                              # → book
print(book.attribute("id").value)             # → 1
print(book.child("title").text())             # → The Great Gatsby

# XPath queries
titles = root.select_nodes("book/title")
for t in titles:
    print(t.node.text())                      # → The Great Gatsby

# Create and save
doc = pygixml.XMLDocument()
root = doc.append_child("catalog")
root.append_child("item").set_value("Hello")
doc.save_file("output.xml")

Core Classes

See the API Reference for the complete reference with every class, method, and property documented.

Class           Description
XMLDocument     Document-level operations: load, save, append-child
XMLNode         Navigate, read, and modify individual nodes
XMLAttribute    Attribute name and value access
XPathQuery      Pre-compiled XPath queries for repeated evaluation
XPathNode       Single XPath result (wraps a node or attribute)
XPathNodeSet    Collection of XPath results

Pythonic Extensions

pugixml gives pygixml its speed, but the API you actually use goes well beyond what the C++ library provides. pygixml adds several features that make working with XML from Python natural and productive:

  • text() — recursive text extraction with configurable joins. One call to gather all text content from an element and its descendants.

  • children() — iterate direct child elements only (or all descendants with recursive=True), no manual sibling walking.

  • xpath — generate an absolute XPath to any node using a custom O(depth) algorithm. Not available in pugixml natively.

  • xml — serialize a node and its subtree to a formatted XML string in one property.

  • mem_id — a unique numeric identifier for each node, ideal for caching and dictionary-based lookups.

  • to_string() — customizable XML serialization with string or integer indentation.

These are pygixml’s own contributions — you won’t find equivalents in pugixml. See the API Reference for full documentation of every member.
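The extensions listed above might be exercised roughly as follows. This is a sketch that uses only the member names given on this page, called with their default arguments; the helper tour and the sample document are hypothetical, and the exact strings returned (for example the indentation of to_string()) are not shown because they depend on the library's defaults. The import guard only lets the sketch load where pygixml is not installed.

```python
try:
    import pygixml
except ImportError:  # lets the sketch load even where pygixml is missing
    pygixml = None

XML = (
    '<library><book id="1">'
    "<title>The Great Gatsby</title>"
    "<author>F. Scott Fitzgerald</author>"
    "</book></library>"
)


def tour(xml_text):
    """Collect the output of several pygixml extensions for one <book> node."""
    doc = pygixml.parse_string(xml_text)  # keep the document alive while nodes are used
    book = doc.root.child("book")
    return {
        # Methods: called with parentheses.
        "text()": book.text(),
        "children()": [c.name for c in book.children()],
        "children(recursive=True)": [c.name for c in book.children(recursive=True)],
        "to_string()": book.to_string(),
        # Properties: accessed without parentheses.
        "title.xpath": book.child("title").xpath,
        "xml": book.xml,
    }


result = tour(XML) if pygixml is not None else {}
for label, value in result.items():
    print(f"{label}: {value!r}")
```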

Note

Properties vs Methods — pygixml uses properties for simple accessors and methods for operations that take arguments:

Properties (no parentheses): node.name, node.value, node.type, node.parent, node.next_sibling, node.previous_sibling, node.xml, node.xpath, attr.name, attr.value, attr.next_attribute, doc.root

Methods (need parentheses): node.child(name), node.first_child(), node.append_child(name), node.child_value(name), node.set_value(v), node.first_attribute(), node.attribute(name), node.select_nodes(query), node.select_node(query), node.text(), node.to_string()

XPath Support

pygixml exposes pugixml’s full XPath 1.0 engine:

  • Axes: child::, attribute::, descendant::, ancestor::

  • Predicates: book[@id='1'], book[year > 1950]

  • Functions: position(), last(), count(), sum(), string(), number(), concat(), substring(), not()

  • Operators: and, or, =, !=, <, >, +, -, *, div, mod

  • Wildcards: *, @*, node()

See XPath Support for a detailed walkthrough.
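As a brief illustration of the features listed above, the following sketch runs a few queries through select_nodes() and reads each match via its node attribute, as in the Quick Example. The sample document, the QUERIES list, and the helper run_queries are hypothetical; the import guard only lets the sketch load where pygixml is not installed.

```python
try:
    import pygixml
except ImportError:  # lets the sketch load even where pygixml is missing
    pygixml = None

XML = """
<library>
    <book id="1"><title>The Great Gatsby</title><year>1925</year></book>
    <book id="2"><title>To Kill a Mockingbird</title><year>1960</year></book>
</library>
"""

QUERIES = [
    "book[@id='1']/title",      # attribute predicate
    "book[year > 1950]/title",  # numeric comparison on a child element
    "book[last()]/title",       # positional function
    "descendant::title",        # explicit axis
]


def run_queries(xml_text, queries):
    """Map each XPath query to the text of the nodes it selects."""
    doc = pygixml.parse_string(xml_text)  # keep the document alive while nodes are used
    root = doc.root
    return {q: [match.node.text() for match in root.select_nodes(q)] for q in queries}


results = run_queries(XML, QUERIES) if pygixml is not None else {}
for query, texts in results.items():
    print(f"{query} -> {texts}")
```

Each query is evaluated relative to the root element, so relative paths such as book/title start from the children of library.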

Installation

From PyPI

pip install pygixml

From source

pip install git+https://github.com/MohammadRaziei/pygixml.git
