Welcome to pygixml

pygixml (Python Giant XML) is a high-performance Cython framework that bridges two specialized C++ engines: pugixml, which provides the in-memory DOM parser (full XPath 1.0, objectify-style dotted navigation, and dictify XML-to-dict conversion), and an inlined yxml push parser, which provides constant-memory streaming. The result is a faster alternative to lxml and xmltodict that covers the same DOM, dotted-navigation, and dict-conversion use cases and adds a constant-memory streaming layer, which is what makes pygixml suited to very large datasets.

New to XML? Start with What is XML? for a primer on the format, its structure, and real-world applications.

Note

Enjoy pygixml? Star the project on GitHub to support the development: https://github.com/MohammadRaziei/pygixml

Why pygixml?

Speed — pugixml is one of the fastest XML parsers available. pygixml brings that speed directly to Python:

Library	Avg Time	Speedup vs ElementTree
pygixml	0.0009 s	9.2× faster
lxml	0.0041 s	2.0× faster
ElementTree	0.0083 s	1.0× (baseline)

(Benchmark: parsing a document with 5 000 elements. See Performance for the full comparison.)
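
The kind of measurement behind a table like this can be sketched with the standard library alone. The harness below is only an illustration, not the project's actual benchmark script: `build_document` and `average_parse_time` are hypothetical helpers, and the document shape (5 000 flat `<item>` elements) is an assumption. It times the ElementTree baseline; any parser that takes an XML string can be passed in its place.

```python
import timeit
import xml.etree.ElementTree as ET


def build_document(n_elements: int) -> str:
    """Build a flat XML document with n_elements <item> children (assumed shape)."""
    items = "".join(f'<item id="{i}">Item {i}</item>' for i in range(n_elements))
    return f"<root>{items}</root>"


def average_parse_time(parse, xml_text: str, repeats: int = 20) -> float:
    """Return the mean wall-clock seconds per call of parse(xml_text)."""
    total = timeit.timeit(lambda: parse(xml_text), number=repeats)
    return total / repeats


xml_text = build_document(5000)
baseline = average_parse_time(ET.fromstring, xml_text)
print(f"ElementTree: {baseline:.4f} s per parse")

# To compare another library, pass its string parser instead of ET.fromstring,
# for example pygixml.parse_string (used in the Quick Example) if it is installed.
```

Absolute timings depend on the machine and the document shape, so only the ratios between libraries measured in the same run are meaningful.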

Features

Blazing-fast parsing — up to 14× faster than ElementTree
Full XPath 1.0 — complete query engine with all standard functions
Memory efficient — zero-copy C++ memory management via pugixml
Pythonic API — intuitive methods and properties, not a direct C++ mirror
objectify — lxml.objectify-style dotted navigation (root.user.name)
dictify — xmltodict-compatible XML → dict conversion
jsonify — direct XML → JSON, in memory or streamed straight to disk in constant memory (see Jsonify — XML to JSON)
Streaming — constant-memory, ElementTree-style incremental parsing for documents too big to load whole (see Streaming — Constant-Memory Parsing for Big XML)
Cross-platform — Windows, Linux, macOS
Text extraction — recursive text gathering with configurable joins
XML serialization — output with custom indentation
Node iteration — depth-first traversal of the entire document
Node identity — memory-based ID for debugging and comparison
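
The dictify and jsonify entries above refer to the xmltodict convention, in which attributes become keys prefixed with `@`, repeated child tags become lists, and mixed text is stored under `#text`. As a rough illustration of that convention (not pygixml's implementation), here is a minimal sketch using only the standard library; `element_to_dict` is a hypothetical helper name.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to an xmltodict-style value (dict, str, or None)."""
    result = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags collapse into a list.
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if not result:
        return text or None  # leaf element: plain string, or None when empty
    if text:
        result["#text"] = text
    return result


xml = (
    '<library><book id="1"><title>The Great Gatsby</title>'
    "<author>F. Scott Fitzgerald</author></book></library>"
)
root = ET.fromstring(xml)
d = {root.tag: element_to_dict(root)}
print(d["library"]["book"]["@id"])  # → 1
print(json.dumps(d))                # the same structure as JSON text
```

Going from the dict to JSON is then a single `json.dumps` call, which is the relationship between the dictify and jsonify features in spirit.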

Quick Example

import pygixml

# Keep the XML in a variable; the objectify, dictify and jsonify examples below reuse it
xml = """
<library>
    <book id="1">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
    </book>
</library>
"""
doc = pygixml.parse_string(xml)

# Low-level API
root = doc.root
book = root.child("book")
print(book.attribute("id").value)        # → 1
print(book.child("title").text())        # → The Great Gatsby

# XPath queries
titles = root.select_nodes("book/title")
for t in titles:
    print(t.node.text())                      # → The Great Gatsby

# Create and save
doc = pygixml.XMLDocument()
root = doc.append_child("catalog")
root.append_child("item").set_value("Hello")
doc.save_file("output.xml")

# objectify — dotted navigation
from pygixml import objectify
root = objectify.from_string(xml)
print(root.book.title())                 # → 'The Great Gatsby'
print(root.book.id)                      # → 1  (int)

# dictify — XML to dict
from pygixml import dictify
d = dictify.parse(xml)
print(d['library']['book']['@id'])

# jsonify — direct XML to JSON
from pygixml import jsonify
print(jsonify.dumps(xml))

# streaming — constant memory, for files too big to load whole
for book in pygixml.iterfind("library.xml", "book"):
    print(book.get("id"), book.findtext("title"))
    book.clear()
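
The streaming loop above follows the ElementTree incremental-parsing pattern: handle one completed element, then clear it so its children can be freed. For readers unfamiliar with that pattern, here is a sketch of the same idea using the standard library's `iterparse` rather than pygixml. `iter_elements` is a hypothetical helper, and the in-memory `BytesIO` stands in for a large file on disk.

```python
import io
import xml.etree.ElementTree as ET


def iter_elements(source, tag):
    """Yield each completed <tag> element from source, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Runs when the caller asks for the next item: drop the
            # element's children, text and attributes.
            elem.clear()


data = io.BytesIO(
    b"<library>"
    b'<book id="1"><title>The Great Gatsby</title></book>'
    b'<book id="2"><title>Tender Is the Night</title></book>'
    b"</library>"
)

seen = []
for book in iter_elements(data, "book"):
    seen.append((book.get("id"), book.findtext("title")))

print(seen)  # → [('1', 'The Great Gatsby'), ('2', 'Tender Is the Night')]
```

Note that the standard-library version still keeps the emptied elements attached to the root, so its memory use grows slowly with document size. That is one reason a dedicated streaming layer is useful for very large inputs.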

Core Classes

See the API Reference for the complete reference.

Class / Module	Description
`XMLDocument`	Document-level operations: load, save, append-child
`XMLNode`	Navigate, read, and modify individual nodes
`XMLAttribute`	Attribute name and value access
`XPathQuery`	Pre-compiled XPath queries for repeated evaluation
`XPathNode`	Single XPath result (wraps a node or attribute)
`XPathNodeSet`	Collection of XPath results
objectify	lxml.objectify-style dotted navigation
dictify	xmltodict-compatible XML → dict conversion
jsonify	Direct XML → JSON, in memory or streamed to disk in constant memory
streaming	`iterparse`/`iterfind` — constant-memory parsing for big XML

Pythonic Extensions

pugixml gives pygixml its speed, but the API you actually use goes well beyond what the C++ library provides:

text() — recursive text extraction with configurable joins. One call gathers all text content from an element and its descendants.
children() — iterate direct child elements only (or all descendants with recursive=True), no manual sibling walking.
xpath — generate an absolute XPath to any node using a custom O(depth) algorithm. Not available in pugixml natively.
xml — a property that serializes a node to formatted XML.
mem_id — a unique numeric identifier for each node, ideal for caching and dictionary-based lookups.
to_string() — customizable XML serialization with string or integer indentation.
objectify — navigate XML like a Python object tree.
dictify — convert XML to dict / JSON with one call.
jsonify — convert XML straight to JSON, in memory or streamed file-to-file in constant memory.
streaming — iterparse/iterfind for documents too large to ever load as a full DOM tree.
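
Two of the ideas in this list, building an absolute XPath by walking from a node up to the root and gathering descendant text with a configurable separator, can be sketched with the standard library. This is not pygixml's implementation: `absolute_xpath`, `gather_text` and the `parent_of` map are hypothetical names, and the parent map is needed only because ElementTree elements carry no parent pointers.

```python
import xml.etree.ElementTree as ET


def absolute_xpath(node, parent_of):
    """Build a path such as /a/b[2]/c[1] by walking from node up to the root."""
    steps = []
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            steps.append(node.tag)  # the root needs no position
        else:
            same_name = [c for c in parent if c.tag == node.tag]
            # XPath positions are 1-based and count same-named siblings only.
            index = next(i for i, c in enumerate(same_name, 1) if c is node)
            steps.append(f"{node.tag}[{index}]")
        node = parent
    return "/" + "/".join(reversed(steps))


def gather_text(node, join=" "):
    """Join the non-empty text of node and all its descendants."""
    parts = (text.strip() for text in node.itertext())
    return join.join(part for part in parts if part)


root = ET.fromstring(
    "<library><book><title>A</title></book><book><title>B</title></book></library>"
)
# Build a child -> parent map once so the upward walk is one lookup per level.
parent_of = {child: parent for parent in root.iter() for child in parent}

second_title = root.findall("book")[1].find("title")
print(absolute_xpath(second_title, parent_of))  # → /library/book[2]/title[1]
print(gather_text(root, join=", "))             # → A, B
```

The upward walk visits one node per level. The sibling scan used here to find each position adds work proportional to the number of siblings at that level, so this sketch makes no claim to match the cost of pygixml's own algorithm.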

Note

Properties vs Methods — pygixml uses properties for simple accessors and methods for operations that take arguments:

Properties (no parentheses): node.name, node.value, node.type, node.parent, node.next_sibling, node.previous_sibling, node.xml, node.xpath, attr.name, attr.value, attr.next_attribute, doc.root

Methods (need parentheses): node.child(name), node.first_child(), node.append_child(name), node.child_value(name), node.set_value(v), node.first_attribute(), node.attribute(name), node.select_nodes(query), node.select_node(query), node.text(), node.to_string()

XPath Support

pygixml exposes pugixml’s full XPath 1.0 engine:

Axes: child::, attribute::, descendant::, ancestor::
Predicates: book[@id='1'], book[year > 1950]
Functions: position(), last(), count(), sum(), string(), number(), concat(), substring(), not()
Operators: and, or, =, !=, <, >, +, -, *, div, mod
Wildcards: *, @*, node()
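
To make a few of these constructs concrete, the sketch below evaluates predicate and position expressions with the standard library's ElementTree, which implements only a small subset of XPath 1.0. It is an illustration of the expressions themselves, not of pygixml's API; the sample document and variable names are assumptions. A full XPath 1.0 engine evaluates `book[year > 1950]` directly, whereas ElementTree needs the comparison done in Python.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<library>
    <book id="1"><title>The Great Gatsby</title><year>1925</year></book>
    <book id="2"><title>Catch-22</title><year>1961</year></book>
</library>
""")

# Attribute predicate: the title of the book whose id is 1.
by_id = root.find("book[@id='1']/title").text

# Position function: the title of the last book.
last_title = root.find("book[last()]/title").text

# Descendant search: every <title> anywhere below the root.
all_titles = [t.text for t in root.findall(".//title")]

# ElementTree cannot evaluate book[year > 1950], so filter in Python instead.
after_1950 = [
    book.findtext("title")
    for book in root.findall("book")
    if int(book.findtext("year")) > 1950
]

print(by_id)       # → The Great Gatsby
print(last_title)  # → Catch-22
print(all_titles)  # → ['The Great Gatsby', 'Catch-22']
print(after_1950)  # → ['Catch-22']
```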

See XPath Support for a detailed walkthrough.

Installation

From PyPI

pip install pygixml

From source

pip install git+https://github.com/MohammadRaziei/pygixml.git
