API Reference

pygixml - Python wrapper for pugixml using Cython

A fast and efficient XML parser and manipulator for Python.

exception pygixml.PygiXMLError

Bases: ValueError

Raised when a pygixml operation fails.

Typical causes include malformed XML passed to parse_string() or parse_file(), or an attempt to set a name/value on a null or otherwise invalid node.

exception pygixml.PygiXMLNullNodeError

Bases: PygiXMLError

Raised when an operation that requires a valid node is called on a null node (e.g. setting attributes on an element that was never found).
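
Because PygiXMLError derives from ValueError, code that already catches ValueError also covers both pygixml exceptions. The sketch below illustrates this with a hypothetical helper, load_or_default, which is not part of pygixml and receives the parser as an argument:

```python
def load_or_default(parse, xml_text, default=None):
    """Call parse(xml_text); return default if the parser raises ValueError.

    load_or_default is a hypothetical helper, not part of pygixml.
    """
    try:
        return parse(xml_text)
    except ValueError:
        # PygiXMLError and its subclass PygiXMLNullNodeError both derive
        # from ValueError, so this clause catches them as well.
        return default
```

With pygixml installed, load_or_default(pygixml.parse_string, text) would be expected to return None for malformed input instead of raising. When the two exception types need different handling, list PygiXMLNullNodeError before PygiXMLError, since the first matching except clause wins.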

class pygixml.ParseFlags(value)

Bases: IntFlag

Bitmask of parse options for parse_string() and parse_file().

Members are combined with the bitwise OR operator (|). When no flags are supplied the parser uses ParseFlags.DEFAULT (all standard processing enabled).

Use ParseFlags.MINIMAL when you only care about element structure and want the fastest possible parse — it skips escape processing, EOL normalization, and all whitespace handling.

Example:

>>> doc = pygixml.parse_string(xml, pygixml.ParseFlags.MINIMAL)
>>> # Combine specific flags:
>>> flags = pygixml.ParseFlags.COMMENTS | pygixml.ParseFlags.CDATA
>>> doc = pygixml.parse_string(xml, flags)

See the Parse Flags section in the documentation for a complete description of each flag.

CDATA = 4
COMMENTS = 2
DECLARATION = 256
DEFAULT = 116
DOCTYPE = 512
EMBED_PCDATA = 8192
EOL = 32
ESCAPES = 16
FRAGMENT = 4096
FULL = 887
MERGE_PCDATA = 16384
MINIMAL = 0
PI = 1
TRIM_PCDATA = 2048
WCONV_ATTRIBUTE = 64
WNORM_ATTRIBUTE = 128
WS_PCDATA = 8
WS_PCDATA_SINGLE = 1024
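
The numeric values above show how the composite members are built. The sketch below uses a local stand-in enum, FlagValues (a hypothetical name that copies a subset of the listed values, not the library class), to check the arithmetic with the standard library only:

```python
from enum import IntFlag


class FlagValues(IntFlag):
    """Local stand-in copying a subset of the numeric values listed above."""
    MINIMAL = 0
    PI = 1
    COMMENTS = 2
    CDATA = 4
    WS_PCDATA = 8
    ESCAPES = 16
    EOL = 32
    WCONV_ATTRIBUTE = 64
    WNORM_ATTRIBUTE = 128
    DECLARATION = 256
    DOCTYPE = 512
    DEFAULT = 116
    FULL = 887


# DEFAULT (116) is the union of four single-bit flags.
default_bits = (FlagValues.CDATA | FlagValues.ESCAPES
                | FlagValues.EOL | FlagValues.WCONV_ATTRIBUTE)

# FULL (887) adds four more flags on top of DEFAULT.
full_bits = (default_bits | FlagValues.PI | FlagValues.COMMENTS
             | FlagValues.DECLARATION | FlagValues.DOCTYPE)

# Membership test: the COMMENTS bit is not set in DEFAULT.
default_keeps_comments = bool(FlagValues.DEFAULT & FlagValues.COMMENTS)
```

Going by these values, a caller who wants the default behaviour plus comment nodes would presumably pass ParseFlags.DEFAULT | ParseFlags.COMMENTS.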
class pygixml.XMLAttribute

Bases: object

An XML attribute on an element (e.g. id="123").

Use XMLNode.attribute() or XMLNode.first_attribute() to obtain attributes.

Example:

>>> doc = pygixml.parse_string('<root id="42" class="main"/>')
>>> root = doc.root
>>> root.attribute('id').value
'42'
set_name(name)

Change the attribute name. Returns False if the attribute is null.

set_value(value)

Change the attribute value. Returns False if the attribute is null.

name

Return the attribute name.

Returns:

str | None

next_attribute

Return the next attribute on the same element.

Returns:

The next attribute, or None if this is the last attribute.

Return type:

XMLAttribute | None

Example:

>>> attr = node.first_attribute()
>>> next_attr = attr.next_attribute
value

Return the attribute value.

Returns:

str | None
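
first_attribute() and next_attribute together form a linked-list traversal. The sketch below uses only those documented members. The helper name iter_attributes is hypothetical and not part of pygixml:

```python
def iter_attributes(node):
    """Yield (name, value) for each attribute of node.

    iter_attributes is a hypothetical helper, not part of pygixml.
    Attributes are visited in the order next_attribute returns them.
    """
    attr = node.first_attribute()
    while attr is not None:
        yield attr.name, attr.value
        attr = attr.next_attribute  # None after the last attribute
```

For an element such as <root id="42" class="main"/>, dict(iter_attributes(doc.root)) would be expected to map 'id' to '42' and 'class' to 'main'.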

class pygixml.XMLDocument

Bases: object

An XML document, providing document-level operations.

Use this class to load, create, save, and manipulate XML documents, or to access the root element and top-level children.

The most common entry point is parse_string() or parse_file(), which return an XMLDocument:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.root.child('item').text()
'value'

You can also build a document from scratch:

>>> doc = pygixml.XMLDocument()
>>> root = doc.append_child('catalog')
>>> item = root.append_child('item')
>>> item.set_value('content')

When processing many files in a loop, reuse a single document with reset() to avoid repeated allocations.

append_child(name)

Append a new child element and return it.

Parameters:

name (str) – Tag name for the new element. Pass an empty string to create a text node instead.

Returns:

The newly created element (or text node).

Return type:

XMLNode

Example:

>>> doc = pygixml.XMLDocument()
>>> root = doc.append_child('catalog')
>>> item = root.append_child('item')
>>> item.set_value('content')
child(name)

Return the first child element whose tag matches name, or None if no match is found.

Parameters:

name (str) – Element tag to look for.

Returns:

XMLNode | None

Example:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.child('root').child('item').text()
'value'
first_child()

Return the first child element, or None if the document is empty.

Returns:

XMLNode | None

Example:

>>> doc = pygixml.parse_string('<root><child/></root>')
>>> doc.first_child().name
'root'
load_file(path, options=4294967295)

Parse XML from a file and replace the current document content.

Reads and parses the file at path. Returns True on success, False if the file cannot be opened or does not contain well-formed XML.

Parameters:
  • path (str) – Path to the XML file.

  • options (ParseFlags) – Which parse flags to use. Defaults to ParseFlags.DEFAULT.

Returns:

True if loading succeeded, False otherwise.

Return type:

bool

Example:

>>> doc = pygixml.XMLDocument()
>>> doc.load_file('data.xml')
True
>>> doc.root.name
'root'
load_string(content, options=4294967295)

Parse XML from a string and replace the current document content.

Parses content and replaces whatever the document previously held. Returns True on success, False if the string is not well-formed.

Parameters:
  • content (str) – The XML source text.

  • options (ParseFlags) – Which parse flags to use. Defaults to ParseFlags.DEFAULT (full compliance). Use ParseFlags.MINIMAL for faster parsing when you don’t need escape processing, EOL normalization, or whitespace handling.

Returns:

True if parsing succeeded, False otherwise.

Return type:

bool

Example:

>>> doc = pygixml.XMLDocument()
>>> doc.load_string('<root><item>value</item></root>')
True
>>> doc.root.child('item').text()
'value'
Note

Malformed input does not raise here: this method returns False. The module-level parse_string() raises PygiXMLError for the same input.

reset()

Clear all content, returning the document to its initial empty state.

Reuses the same underlying C++ document object, avoiding reallocation overhead when processing many files in a loop.

Example:

>>> doc = pygixml.parse_string('<root>content</root>')
>>> doc.reset()
>>> doc.root  # None — document is empty
save_file(path, indent='  ')

Serialize the document and write it to a file.

Parameters:
  • path (str) – Output file path. Existing files will be overwritten.

  • indent (str) – Indentation string used for pretty-printing. Defaults to two spaces. Pass an empty string for compact output with no indentation.

Example:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.save_file('output.xml')              # 2-space indent
>>> doc.save_file('compact.xml', indent='')  # no indent
to_string(indent='  ')

Serialize the document to an XML string.

Parameters:

indent (str | int) – Indentation — either a string (e.g. '    ') or a number of spaces (e.g. 4). Defaults to two spaces.

Returns:

The serialized XML.

Return type:

str

Example:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.to_string()
'<root>\n  <item>value</item>\n</root>'
>>> doc.to_string(4)
'<root>\n    <item>value</item>\n</root>'
root

Return the root element of the document.

Equivalent to calling first_child(). Returns None if the document is empty.

Returns:

The root element, or None.

Return type:

XMLNode | None

Example:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.root.name
'root'
class pygixml.XMLNode

Bases: object

A single node in the XML tree.

Represents an element, text, comment, processing instruction, or other node type. Provides methods for navigating to related nodes (parent, children, siblings), reading and modifying content, and executing XPath queries scoped to this node.

The most commonly used members are:

Example:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> root = doc.root
>>> root.child('item').text()
'value'
static from_mem_id_unsafe(mem_id)

Reconstruct an XMLNode from its memory identifier in O(1) time.

Unlike find_mem_id(), which walks the entire tree in O(n) time to locate a node, this method performs an instant lookup.

⚠️ Warning: If the mem_id is stale (the node was deleted or the document has been freed), calling methods on the returned object may cause a segmentation fault.

Only use this when you are certain the identifier still belongs to a live node within a valid XMLDocument.

Parameters:

mem_id (int) – An identifier previously obtained from node.mem_id.

Returns:

A wrapper for the node at the given identifier.

Return type:

XMLNode

Complexity:

O(1) — direct lookup, no tree traversal. Compare with find_mem_id() which is O(n).

Example:

>>> mid = root.child('item').mem_id
>>> node = pygixml.XMLNode.from_mem_id_unsafe(mid)
>>> node.name
'item'
append_attribute(name)

Append a new attribute and return it.

Parameters:

name (str) – Attribute name.

Returns:

The newly created attribute.

Return type:

XMLAttribute

Example:

>>> root = doc.root
>>> attr = root.append_attribute('id')
>>> attr.value = '123'
append_child(name)

Append a new child element and return it.

Parameters:

name (str) – Tag name. Use an empty string to create a text node instead.

Returns:

The newly created child.

Return type:

XMLNode

Example:

>>> root = doc.root
>>> root.append_child('title').set_value('My Title')
attribute(name)

Return the attribute with the given name, or None.

Parameters:

name (str) – Attribute name.

Returns:

XMLAttribute | None

Example:

>>> doc = pygixml.parse_string('<root id="1"/>')
>>> doc.root.attribute('id').value
'1'
child(name)

Return the first child element whose tag matches name, or None.

Parameters:

name (str) – Element tag to look for.

Returns:

XMLNode | None

Example:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.root.child('item').text()
'value'
child_value(name=None)

Return the text content of a child element.

If name is given, finds the first child with that tag and returns its text. Without name, returns the direct text content of this node (i.e. text immediately inside this element, not inside a child).

Parameters:

name (str | None) – Child tag to look up, or None for direct text.

Returns:

str | None

Example:

>>> doc = pygixml.parse_string('<root><title>Book</title></root>')
>>> doc.root.child_value('title')
'Book'
children(recursive=False)

Iterate over child element nodes.

Note

This is a pygixml-specific feature. pugixml provides first_child() and next_sibling() for manual traversal, but children() offers a Pythonic one-liner for iterating direct child elements — or all descendants with recursive=True.

Text, comment, and processing-instruction nodes are skipped.

Parameters:

recursive (bool) – Yield only direct children (False, the default) or all descendants in depth-first order (True).

Yields:

XMLNode

Example:

>>> doc = pygixml.parse_string('<root><a><a1/></a><b/></root>')
>>> [c.name for c in doc.root.children()]
['a', 'b']
>>> [c.name for c in doc.root.children(True)]
['a', 'a1', 'b']
find_mem_id(mem_id)

Look up a descendant node by its memory identifier (see mem_id).

Note

This is a pygixml-specific feature. pugixml has no equivalent — pygixml walks the descendant tree in DFS order comparing node addresses until a match is found.

Returns:

XMLNode | None
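
Because find_mem_id() returns None when no descendant matches, it can serve as a checked alternative to from_mem_id_unsafe() when identifiers may be stale. A minimal sketch, assuming a hypothetical helper named resolve_all that is not part of pygixml:

```python
def resolve_all(root, mem_ids):
    """Map each mem_id to its live node under root, or None if not found.

    resolve_all is a hypothetical helper, not part of pygixml.
    """
    resolved = {}
    for mem_id in mem_ids:
        # mem_id is 0 for null nodes, so there is nothing to look up.
        resolved[mem_id] = root.find_mem_id(mem_id) if mem_id else None
    return resolved
```

Each lookup walks the tree, so resolving many identifiers this way costs one traversal per identifier.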

first_attribute()

Return the first attribute on this element, or None if it has none.

Returns:

XMLAttribute | None

Example:

>>> doc = pygixml.parse_string('<root id="1" class="main"/>')
>>> doc.root.first_attribute().name
'id'
first_child()

Return the first child element, or None.

Returns:

XMLNode | None

Example:

>>> doc = pygixml.parse_string('<root><a/><b/></root>')
>>> doc.root.first_child().name
'a'
is_null()

Return True if this node is null (i.e. was not found or is invalid).

prepend_attribute(name)

Prepend a new attribute and return it.

Parameters:

name (str) – Attribute name.

Returns:

The newly created attribute.

Return type:

XMLAttribute

Example:

>>> root = doc.root
>>> attr = root.prepend_attribute('id')
>>> attr.value = '123'
prepend_child(name)

Prepend a new child element and return it.

Parameters:

name (str) – Tag name. Use an empty string to create a text node instead.

Returns:

The newly created child.

Return type:

XMLNode

Example:

>>> root = doc.root
>>> root.prepend_child('title').set_value('My Title')
remove_attribute(attr)

Remove an attribute from this node.

Parameters:

attr (XMLAttribute) – The attribute to remove.

Returns:

True if the attribute was successfully removed, False otherwise.

Return type:

bool

Example:

>>> root = doc.root
>>> attr = root.attribute('id')
>>> root.remove_attribute(attr)
True
remove_child(node)

Remove a direct child element from this node.

Parameters:

node (XMLNode) – The child node to remove. Must be a direct child of this node.

Returns:

True if the node was successfully removed, False otherwise.

Return type:

bool

Example:

>>> child = root.child('old_item')
>>> if child:
...     root.remove_child(child)
select_node(query)

Run an XPath expression and return the first match, or None.

Parameters:

query (str) – XPath 1.0 expression.

Returns:

XPathNode | None

Example:

>>> doc = pygixml.parse_string('<root><a/><b/></root>')
>>> doc.root.select_node('b').node.name
'b'
select_nodes(query)

Run an XPath expression and return all matching nodes.

Parameters:

query (str) – XPath 1.0 expression.

Returns:

XPathNodeSet

Example:

>>> doc = pygixml.parse_string('<root><a/><b/><a/></root>')
>>> len(doc.root.select_nodes('a'))
2
set_name(name)

Change the tag name of this element.

Returns False if the node is null.

Parameters:

name (str) – New tag name.

Returns:

bool

Example:

>>> doc = pygixml.parse_string('<old/>')
>>> doc.root.set_name('new')
True
>>> doc.root.name
'new'
set_value(value)

Replace the text content of this node.

Returns False if the node is null.

Parameters:

value (str) – New text content.

Returns:

bool

Example:

>>> doc = pygixml.parse_string('<root><item>old</item></root>')
>>> doc.root.child('item').first_child().set_value('new')
True
text(recursive=True, join='\n')

Return the combined text content of this node.

Note

This is a pygixml-specific feature. pugixml provides child_value() for a single child’s text, whereas text() collects text from all descendants (or only from direct children when recursive=False) and joins the fragments with a configurable separator.

Parameters:
  • recursive (bool) – When True (default), gathers text from all descendant text and CDATA nodes. When False, returns only text that is a direct child of this element.

  • join (str) – String used to join multiple text fragments. Defaults to \n.

Returns:

str

Example:

>>> doc = pygixml.parse_string(
...     '<root><a>hello</a><b>world</b></root>')
>>> doc.root.text()
'hello\nworld'
>>> doc.root.text(join=', ')
'hello, world'
to_string(indent='  ')

Serialize this element (and its subtree) to an XML string.

Note

This is a pygixml-specific feature. pugixml can serialize to a file via save_file(), but it does not provide a method that returns the serialized XML as a Python string. pygixml implements this using an internal std::ostringstream buffer.

Parameters:

indent (str | int) – Indentation string or number of spaces. Defaults to two spaces.

Returns:

str

Example:

>>> doc = pygixml.parse_string('<root><item>val</item></root>')
>>> doc.root.child('item').to_string()
'<item>val</item>'
mem_id

A unique numeric identifier derived from the node’s internal address.

Note

This is a pygixml-specific feature. The underlying pugixml library does not expose integer node identifiers natively. pygixml provides mem_id as a safe, hashable handle for debugging, caching, and fast node reconstruction.

Returns 0 for null nodes.

Returns:

int

name

Return the tag name of this node.

For element nodes this is the element’s tag name. For text, comment, and other non-element nodes this is None.

Returns:

str | None

Example:

>>> doc = pygixml.parse_string('<root/>')
>>> doc.root.name
'root'
next_element_sibling

The next sibling that is an element node, skipping text, comment, and other non-element nodes, or None if there is no such sibling.

next_sibling

The next sibling node, or None if this is the last child.

parent

The parent element node. Returns None for the document root.

previous_element_sibling

The previous sibling that is an element node, or None if there is no such sibling.

previous_sibling

The previous sibling node, or None if this is the first child.
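
The sibling properties return None at the end of the chain, so they can drive a simple loop. The sketch below walks forward over element siblings. The helper name following_elements is hypothetical and not part of pygixml:

```python
def following_elements(node):
    """Yield every element sibling that comes after node.

    following_elements is a hypothetical helper, not part of pygixml.
    """
    sibling = node.next_element_sibling
    while sibling is not None:
        yield sibling
        sibling = sibling.next_element_sibling  # None ends the walk
```

The same shape works backwards with previous_element_sibling, or upwards with parent to collect ancestors.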

type

Return the node type as a human-readable string.

Possible values: 'element', 'pcdata', 'cdata', 'comment', 'pi', 'declaration', 'doctype', 'document', 'null'.

Returns:

str

Example:

>>> doc = pygixml.parse_string('<root>text</root>')
>>> doc.root.type
'element'
>>> doc.root.first_child().type
'pcdata'
value

Return the text content of this node.

For text, CDATA, comment, and processing-instruction nodes, returns the raw value directly.

For element nodes, this is a convenience shortcut that returns the value of the first text/CDATA child (or None if no text child exists).

Returns:

str | None

Example:

# Text node — returns raw value
>>> doc = pygixml.parse_string('<root><item>hello</item></root>')
>>> doc.root.child('item').first_child().value
'hello'

# Element node — returns first text child's value
>>> doc.root.child('item').value
'hello'
xml

Shorthand for self.to_string() — serialized XML with default two-space indentation.

Note

This is a pygixml-specific convenience property. pugixml has no equivalent.

xpath

The absolute XPath to this node (e.g. /root/item[1]/name[1]).

Note

This is a pygixml-specific feature. pugixml does not provide XPath generation natively — pygixml implements a custom O(depth) algorithm that walks from the node up to the root, counting same-name siblings to produce accurate positional predicates.

Returns an empty string if the node is not an element.

Returns:

str
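
The walk described in the note can be illustrated without pygixml. The sketch below is not the library's implementation. It uses a hypothetical stand-in class, Element, and assumes the root step carries no positional predicate, as in the /root/item[1]/name[1] example:

```python
class Element:
    """Hypothetical stand-in for an element: a tag, a parent, and children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def absolute_xpath(node):
    """Build an absolute path by walking from node up to the root."""
    steps = []
    while node is not None:
        parent = node.parent
        if parent is None:
            steps.append(node.tag)  # root step: no positional predicate
        else:
            # Count same-name siblings up to and including this node.
            position = 0
            for sibling in parent.children:
                if sibling.tag == node.tag:
                    position += 1
                if sibling is node:
                    break
            steps.append(f"{node.tag}[{position}]")
        node = parent
    return "/" + "/".join(reversed(steps))


root = Element("root")
first = Element("item", root)
second = Element("item", root)
name = Element("name", second)
path = absolute_xpath(name)  # "/root/item[2]/name[1]"
```

The positional predicate is 1-based, matching XPath 1.0, and counts only siblings that share the node's tag.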

class pygixml.XPathNode

Bases: object

A single result from an XPath query.

Wraps either an XMLNode (.node) or an XMLAttribute (.attribute). One of these properties will be None depending on what the query matched.

Example:

>>> doc = pygixml.parse_string('<root><item id="1">value</item></root>')
>>> result = doc.root.select_node('//item')
>>> result.node.name
'item'
attribute

The matched attribute, or None if the query matched an element instead.

node

The matched element, or None if the query matched an attribute instead.

parent

The parent of the matched node (None for attributes or the document root).
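
Since exactly one of .node and .attribute is set, callers that accept either kind of match need a small branch. A minimal sketch, assuming a hypothetical helper named matched_object that is not part of pygixml:

```python
def matched_object(result):
    """Return the XMLAttribute or XMLNode wrapped by an XPathNode.

    matched_object is a hypothetical helper, not part of pygixml.
    Returns None when result is None (the query matched nothing).
    """
    if result is None:
        return None
    if result.attribute is not None:
        return result.attribute  # query matched an attribute
    return result.node  # query matched a node
```

For a query such as '//item/@id', the returned object would be expected to be an XMLAttribute, while '//item' would yield an XMLNode.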

class pygixml.XPathNodeSet

Bases: object

A collection of XPathNode results from an XPath query.

Supports len(), indexing (node_set[0]), and iteration.

Example:

>>> doc = pygixml.parse_string('<root><item>1</item><item>2</item></root>')
>>> nodes = doc.root.select_nodes('//item')
>>> len(nodes)
2
>>> nodes[0].node.text()
'1'
class pygixml.XPathQuery

Bases: object

A compiled XPath 1.0 query.

Compiling a query once and re-using it is faster than calling select_nodes() repeatedly, because the expression is parsed only once.

Example:

>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> query = pygixml.XPathQuery('//item')
>>> query.evaluate_node(doc.root).node.text()
'value'
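
The compile-once pattern pays off when one query runs against many context nodes. The sketch below uses only evaluate_node_set(), iteration over the result, and XMLNode.text() as documented here. The helper name texts_for is hypothetical:

```python
def texts_for(query, context_nodes):
    """Evaluate one compiled query against several context nodes.

    texts_for is a hypothetical helper; query is expected to be a
    pygixml.XPathQuery and each context node an XMLNode. It assumes the
    query matches nodes, not attributes, so match.node is never None.
    """
    texts = []
    for context in context_nodes:
        for match in query.evaluate_node_set(context):
            texts.append(match.node.text())
    return texts
```

A typical call would build query = pygixml.XPathQuery('//item') once and pass the root of each parsed document in context_nodes.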
evaluate_boolean(context_node)

Evaluate the query and return a boolean result.

Parameters:

context_node (XMLNode) – Node to evaluate the query against

Returns:

Boolean result of the XPath query

Return type:

bool

Example:

>>> query = pygixml.XPathQuery('count(//item) > 0')
>>> has_items = query.evaluate_boolean(doc.first_child())
>>> print(has_items)
True
evaluate_node(context_node)

Evaluate the query and return the first matching node.

Parameters:

context_node (XMLNode) – Node to evaluate the query against

Returns:

The first matching XPath node, or None if there are no matches.

Return type:

XPathNode | None

Example:

>>> query = pygixml.XPathQuery('//item')
>>> node = query.evaluate_node(doc.first_child())
>>> print(node.node.text())
evaluate_node_set(context_node)

Evaluate the query and return the matching node set.

Parameters:

context_node (XMLNode) – Node to evaluate the query against

Returns:

Set of matching XPath nodes

Return type:

XPathNodeSet

Example:

>>> query = pygixml.XPathQuery('//item')
>>> nodes = query.evaluate_node_set(doc.first_child())
>>> for node in nodes:
...     print(node.node.text())
evaluate_number(context_node)

Evaluate the query and return a numeric result.

Parameters:

context_node (XMLNode) – Node to evaluate the query against

Returns:

Numeric result of the XPath query

Return type:

float

Example:

>>> query = pygixml.XPathQuery('count(//item)')
>>> count = query.evaluate_number(doc.first_child())
>>> print(count)
2.0
evaluate_string(context_node)

Evaluate the query and return a string result.

Parameters:

context_node (XMLNode) – Node to evaluate the query against

Returns:

The string result of the XPath query, or None if it is empty.

Return type:

str | None

Example:

>>> query = pygixml.XPathQuery('//item[1]/text()')
>>> text = query.evaluate_string(doc.first_child())
>>> print(text)
value
pygixml.parse_file(file_path, options=4294967295)

Parse XML from a file and return an XMLDocument.

Parameters:
  • file_path (str) – Path to XML file

  • options (ParseFlags, optional) – Parse flags (default: ParseFlags.DEFAULT). Combine flags with bitwise OR. Use ParseFlags.MINIMAL for fastest parsing when you don’t need comments, CDATA, or escape processing.

Returns:

Parsed XML document

Return type:

XMLDocument

Raises:

PygiXMLError – If parsing fails

Example:

>>> import pygixml
>>> doc = pygixml.parse_file('data.xml')
>>> doc = pygixml.parse_file('data.xml', pygixml.ParseFlags.MINIMAL)
pygixml.parse_string(xml_string, options=4294967295)

Parse XML from a string and return an XMLDocument.

Parameters:
  • xml_string (str) – XML content as string

  • options (ParseFlags, optional) – Parse flags (default: ParseFlags.DEFAULT). Combine flags with bitwise OR. Use ParseFlags.MINIMAL for fastest parsing when you don’t need comments, CDATA, or escape processing.

Returns:

Parsed XML document

Return type:

XMLDocument

Raises:

PygiXMLError – If parsing fails

Example:

>>> import pygixml
>>> doc = pygixml.parse_string('<root>content</root>')
>>> doc = pygixml.parse_string(xml, pygixml.ParseFlags.MINIMAL)