Welcome to pygixml’s documentation!

pygixml is a high-performance XML parser for Python based on Cython and pugixml, providing fast XML parsing, manipulation, XPath queries, text extraction, and advanced XML processing capabilities.

Note

To use this library, you must star the project on GitHub! This helps support the development and shows appreciation for the work.

Star pygixml on GitHub: https://github.com/MohammadRaziei/pygixml

🚀 Performance

In the benchmark below (parsing a document with 5000 XML elements), pygixml parses faster than both lxml and Python’s built-in ElementTree:

Performance Comparison (5000 XML elements)

Library        Parsing Time    Speedup vs ElementTree
pygixml        0.00077s        15.9x faster
lxml           0.00407s        3.0x faster
ElementTree    0.01220s        1.0x (baseline)

[Figure: Performance comparison chart]

Key Performance Highlights

  • 15.9x faster than Python’s ElementTree for XML parsing

  • 5.3x faster than lxml for XML parsing

  • Memory efficient - uses pugixml’s optimized C++ memory management

  • Scalable performance - maintains speed advantage across different XML sizes
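
The figures above depend on hardware, library versions, and document shape. A minimal sketch for repeating the measurement locally follows. It is not the project's own benchmark script: the synthetic document layout and the helper names make_xml and best_time are assumptions made for illustration. ElementTree is timed unconditionally, and pygixml is timed only when it is installed.

```python
import timeit
import xml.etree.ElementTree as ET

try:
    import pygixml  # optional: only needed for the pygixml timing
except ImportError:
    pygixml = None


def make_xml(n_elements: int) -> str:
    """Build a synthetic document with n_elements <item> children (assumed shape)."""
    items = "".join(f'<item id="{i}">Item {i}</item>' for i in range(n_elements))
    return f"<root>{items}</root>"


def best_time(func, repeat: int = 5, number: int = 10) -> float:
    """Return the best per-call time in seconds over several repeats."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


xml_text = make_xml(5000)

# ElementTree is the baseline, as in the table above.
timings = {"ElementTree": best_time(lambda: ET.fromstring(xml_text))}
if pygixml is not None:
    timings["pygixml"] = best_time(lambda: pygixml.parse_string(xml_text))

baseline = timings["ElementTree"]
for name, seconds in timings.items():
    print(f"{name:12s} {seconds:.5f}s  {baseline / seconds:.1f}x vs ElementTree")
```

Taking the minimum over several repeats reduces noise from other processes. Absolute times will differ from the table, so compare the ratios rather than the raw seconds.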

Features

  • High Performance: 15.9x faster than Python’s ElementTree for XML parsing

  • Full XPath 1.0 Support: Complete XPath query capabilities with all standard functions

  • Memory Efficient: Uses pugixml’s optimized C++ memory management

  • Easy to Use: Pythonic API with comprehensive documentation

  • Cross-Platform: Works on Windows, Linux, and macOS

  • Text Extraction: Advanced text content extraction with recursive options

  • XML Serialization: Flexible XML output with custom indentation

  • Node Iteration: Depth-first iteration over document nodes

  • Node Comparison: Identity comparison and memory debugging

Quick Start

import pygixml

# Parse XML from string
xml_string = """
<library>
    <book id="1">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
    </book>
</library>
"""

doc = pygixml.parse_string(xml_string)
root = doc.first_child()

# Access elements
book = root.first_child()
title = book.child("title")
print(f"Title: {title.child_value()}")  # Output: Title: The Great Gatsby

# Use XPath
books = root.select_nodes("book")
print(f"Found {len(books)} books")

# Create new XML
doc = pygixml.XMLDocument()
root = doc.append_child("catalog")
product = root.append_child("product")
product.name = "product"  # Redundant here: shows the name can also be set via the property

# To add text content to an element, append a text node
text_node = product.append_child("")  # Empty name creates text node
text_node.value = "content"

Important

Element Nodes vs Text Nodes

In pugixml (and therefore pygixml), element nodes do not have values directly. Instead, they contain child text nodes that hold the text content.

# ❌ This will NOT work (element nodes don't have values):
element_node.value = "some text"

# ✅ Correct approach - use child_value() to get text content:
text_content = element_node.child_value()

# ✅ To set text content, you need to append a text node:
text_node = element_node.append_child("")  # Empty name creates text node
text_node.value = "some text"

Core Classes

  • XMLDocument: Create, parse, save XML documents

  • XMLNode: Navigate and manipulate XML nodes

  • XMLAttribute: Handle XML attributes

  • XPathQuery: Compile and execute XPath queries

  • XPathNode: Result of XPath queries (wraps nodes and attributes)

  • XPathNodeSet: Collection of XPath results

XPath Support

pygixml provides full XPath 1.0 support through pugixml’s powerful XPath engine:

Supported XPath Features

  • Node selection: //book, /library/book, book[1]

  • Attribute selection: book[@id], book[@category='fiction']

  • Boolean operations: and, or, not()

  • Comparison operators: =, !=, <, >, <=, >=

  • Mathematical operations: +, -, *, div, mod

  • Functions: position(), last(), count(), sum(), string(), number()

  • Axes: child::, attribute::, descendant::, ancestor::

  • Wildcards: *, @*, node()
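
Several of the features listed above can be tried with the calls already shown in Quick Start (parse_string, first_child, select_nodes, and len() on the result). The sketch below is illustrative only: the sample document, the query list, and the helper count_matches are assumptions, not part of pygixml. The import is guarded so the file still loads where pygixml is not installed.

```python
try:
    import pygixml
except ImportError:  # lets the file load where pygixml is not installed
    pygixml = None

# Sample data invented for this sketch.
LIBRARY_XML = """
<library>
    <book id="1" category="fiction"><title>The Great Gatsby</title><year>1925</year></book>
    <book id="2" category="fiction"><title>1984</title><year>1949</year></book>
    <book id="3" category="science"><title>Cosmos</title><year>1980</year></book>
</library>
"""

QUERIES = [
    "//book",                                     # node selection
    "/library/book[1]",                           # absolute path with position
    "book[@category='fiction']",                  # attribute predicate
    "book[year > 1930 and @category='fiction']",  # comparison and boolean operator
    "book[position() = last()]",                  # functions
    "//book/@*",                                  # attribute wildcard
    "descendant::title",                          # explicit axis
]


def count_matches(xml_text, queries):
    """Return {xpath: number of matched nodes}, evaluated from the root element."""
    doc = pygixml.parse_string(xml_text)
    root = doc.first_child()
    return {query: len(root.select_nodes(query)) for query in queries}


if pygixml is not None:
    for query, count in count_matches(LIBRARY_XML, QUERIES).items():
        print(f"{count:2d}  {query}")
```

Relative expressions such as book[...] are evaluated with the root element as the context node, while expressions starting with / or // are resolved from the document root. Under XPath 1.0 semantics, the first three queries should match 3, 1, and 2 nodes in this sample.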

Installation

From PyPI

pip install pygixml

From GitHub

pip install git+https://github.com/MohammadRaziei/pygixml.git
