API Reference¶
pygixml - Python wrapper for pugixml using Cython
A fast and efficient XML parser and manipulator for Python.
- exception pygixml.PygiXMLError¶
Bases: ValueError
Raised when a pygixml operation fails.
Typical causes include malformed XML passed to parse_string() or parse_file(), or an attempt to set a name/value on a null or otherwise invalid node.
- exception pygixml.PygiXMLNullNodeError¶
Bases: PygiXMLError
Raised when an operation that requires a valid node is called on a null node (e.g. setting attributes on an element that was never found).
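Because PygiXMLNullNodeError derives from PygiXMLError, which in turn derives from ValueError, a single handler for PygiXMLError covers both. A minimal sketch, assuming pygixml is installed; parse_or_none is a hypothetical helper name, not part of the library:

```python
def parse_or_none(xml_text):
    """Return the parsed document, or None if xml_text is malformed.

    parse_or_none is a hypothetical helper, not part of pygixml.
    """
    import pygixml  # third-party package; imported lazily

    try:
        return pygixml.parse_string(xml_text)
    except pygixml.PygiXMLError:
        # Also catches PygiXMLNullNodeError, which is a subclass.
        return None
```

Catching ValueError would work as well, but it would also swallow unrelated ValueErrors raised by the caller's own code.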
- class pygixml.ParseFlags(value)¶
Bases: IntFlag
Bitmask of parse options for parse_string() and parse_file().
Members are combined with the bitwise OR operator (|). When no flags are supplied, the parser uses ParseFlags.DEFAULT (all standard processing enabled).
Use ParseFlags.MINIMAL when you only care about element structure and want the fastest possible parse: it skips escape processing, EOL normalization, and all whitespace handling.
Example:
>>> doc = pygixml.parse_string(xml, pygixml.ParseFlags.MINIMAL)
>>> # Combine specific flags:
>>> flags = pygixml.ParseFlags.COMMENTS | pygixml.ParseFlags.CDATA
>>> doc = pygixml.parse_string(xml, flags)
See the Parse Flags section in the documentation for a complete description of each flag.
- CDATA = 4¶
- COMMENTS = 2¶
- DECLARATION = 256¶
- DEFAULT = 116¶
- DOCTYPE = 512¶
- EMBED_PCDATA = 8192¶
- EOL = 32¶
- ESCAPES = 16¶
- FRAGMENT = 4096¶
- FULL = 887¶
- MERGE_PCDATA = 16384¶
- MINIMAL = 0¶
- PI = 1¶
- TRIM_PCDATA = 2048¶
- WCONV_ATTRIBUTE = 64¶
- WNORM_ATTRIBUTE = 128¶
- WS_PCDATA = 8¶
- WS_PCDATA_SINGLE = 1024¶
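The composite members can be checked against the single-bit members. The sketch below copies a few of the values listed above into a stand-in enum.IntFlag (named Flags here so that it runs without pygixml) and assumes that DEFAULT and FULL are composed the same way as pugixml's parse_default and parse_full:

```python
from enum import IntFlag


class Flags(IntFlag):
    """Stand-in that copies values from the ParseFlags listing above."""
    PI = 1
    COMMENTS = 2
    CDATA = 4
    ESCAPES = 16
    EOL = 32
    WCONV_ATTRIBUTE = 64
    DECLARATION = 256
    DOCTYPE = 512


# Assumed composition, following pugixml's parse_default and parse_full.
default = Flags.CDATA | Flags.ESCAPES | Flags.EOL | Flags.WCONV_ATTRIBUTE
full = default | Flags.PI | Flags.COMMENTS | Flags.DECLARATION | Flags.DOCTYPE

print(int(default))  # 116, matching DEFAULT
print(int(full))     # 887, matching FULL

# Membership test: is a given bit switched on in a mask?
print(bool(default & Flags.COMMENTS))  # False: comments are not in DEFAULT
```

The same `mask & flag` test applies to real ParseFlags values, since the class is an IntFlag.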
- class pygixml.XMLAttribute¶
Bases: object
An XML attribute on an element (e.g. id="123").
Use XMLNode.attribute() or XMLNode.first_attribute() to obtain attributes.
Example:
>>> doc = pygixml.parse_string('<root id="42" class="main"/>')
>>> root = doc.root
>>> root.attribute('id').value
'42'
- set_name(name)¶
Change the attribute name. Returns False if the attribute is null.
- set_value(value)¶
Change the attribute value. Returns False if the attribute is null.
- name¶
Return the attribute name.
- Returns:
str | None
- next_attribute¶
Return the next attribute on the same element.
- Returns:
The next attribute, or None if this is the last attribute.
- Return type:
XMLAttribute | None
Example
>>> attr = node.first_attribute()
>>> next_attr = attr.next_attribute
- value¶
Return the attribute value.
- Returns:
str | None
- class pygixml.XMLDocument¶
Bases: object
An XML document, providing document-level operations.
Use this class to load, create, save, and manipulate XML documents, or to access the root element and top-level children.
The most common entry point is parse_string() or parse_file(), both of which return an XMLDocument:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.root.child('item').text()
'value'
You can also build a document from scratch:
>>> doc = pygixml.XMLDocument()
>>> root = doc.append_child('catalog')
>>> item = root.append_child('item')
>>> item.set_value('content')
When processing many files in a loop, reuse a single document with reset() to avoid repeated allocations.
- append_child(name)¶
Append a new child element and return it.
- Parameters:
name (str) – Tag name for the new element. Pass an empty string to create a text node instead.
- Returns:
The newly created element (or text node).
- Return type:
XMLNode
Example:
>>> doc = pygixml.XMLDocument()
>>> root = doc.append_child('catalog')
>>> item = root.append_child('item')
>>> item.set_value('content')
- child(name)¶
Return the first child element whose tag matches name, or None if no match is found.
- Parameters:
name (str) – Element tag to look for.
- Returns:
XMLNode | None
Example:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.child('root').child('item').text()
'value'
- first_child()¶
Return the first child element, or None if the document is empty.
- Returns:
XMLNode | None
Example:
>>> doc = pygixml.parse_string('<root><child/></root>')
>>> doc.first_child().name
'root'
- load_file(path, options=4294967295)¶
Parse XML from a file and replace the current document content.
Reads and parses the file at path. Returns True on success, False if the file cannot be opened or does not contain well-formed XML.
- Parameters:
path (str) – Path to the XML file.
options (ParseFlags) – Which parse flags to use. Defaults to ParseFlags.DEFAULT.
- Returns:
True if loading succeeded, False otherwise.
- Return type:
bool
Example:
>>> doc = pygixml.XMLDocument()
>>> doc.load_file('data.xml')
True
>>> doc.root.name
'root'
- load_string(content, options=4294967295)¶
Parse XML from a string and replace the current document content.
Returns True on success, False if the string is not well-formed.
- Parameters:
content (str) – The XML source text.
options (ParseFlags) – Which parse flags to use. Defaults to ParseFlags.DEFAULT (full compliance). Use ParseFlags.MINIMAL for faster parsing when you don’t need escape processing, EOL normalization, or whitespace handling.
- Returns:
True if parsing succeeded, False otherwise.
- Return type:
bool
Example:
>>> doc = pygixml.XMLDocument()
>>> doc.load_string('<root><item>value</item></root>')
True
>>> doc.root.child('item').text()
'value'
- Raises:
PygiXMLError – Raised by parse_string() when the input is not well-formed XML; load_string() itself returns False instead.
- reset()¶
Clear all content, returning the document to its initial empty state.
Reuses the same underlying C++ document object, avoiding reallocation overhead when processing many files in a loop.
Example:
>>> doc = pygixml.parse_string('<root>content</root>')
>>> doc.reset()
>>> doc.root  # None: document is empty
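The reuse pattern described above can be sketched as a loop that loads each file into the same document and clears it afterwards. The helper name collect_root_names and its signature are hypothetical; doc is assumed to behave like an XMLDocument as documented here:

```python
def collect_root_names(doc, paths):
    """Return the root tag of every file in paths that loads successfully.

    doc is expected to behave like pygixml.XMLDocument; the function
    name and signature are hypothetical.
    """
    names = []
    for path in paths:
        if doc.load_file(path):   # False: unreadable or malformed file
            names.append(doc.root.name)
        doc.reset()               # drop the content, keep the document object
    return names


# Typical call, assuming pygixml is installed:
#   import pygixml
#   collect_root_names(pygixml.XMLDocument(), ['a.xml', 'b.xml'])
```

Since load_file() already replaces the previous content, the explicit reset() mainly ensures that nothing from a failed load lingers between iterations.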
- save_file(path, indent='  ')¶
Serialize the document and write it to a file.
- Parameters:
path (str) – Output file path. Existing files will be overwritten.
indent (str) – Indentation string used for pretty-printing. Defaults to two spaces. Pass an empty string for compact output with no indentation.
Example:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.save_file('output.xml')  # 2-space indent
>>> doc.save_file('compact.xml', indent='')  # no indent
- to_string(indent='  ')¶
Serialize the document to an XML string.
- Parameters:
indent (str | int) – Indentation, given either as a string (e.g. '  ') or as a number of spaces (e.g. 4). Defaults to two spaces.
- Returns:
The serialized XML.
- Return type:
str
Example:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.to_string()
'<root>\n  <item>value</item>\n</root>'
>>> doc.to_string(4)
'<root>\n    <item>value</item>\n</root>'
- root¶
Return the root element of the document.
Equivalent to calling first_child(). Returns None if the document is empty.
- Returns:
The root element, or None.
- Return type:
XMLNode | None
Example:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.root.name
'root'
- class pygixml.XMLNode¶
Bases: object
A single node in the XML tree.
Represents an element, text, comment, processing instruction, or other node type. Provides methods for navigating to related nodes (parent, children, siblings), reading and modifying content, and executing XPath queries scoped to this node.
The most commonly used members are:
- child(): first child with a given tag
- children(): iterate direct child elements
- text(): combined text content
- select_nodes() / select_node(): XPath selection
- xml: serialized XML of this node and its subtree
Example:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> root = doc.root
>>> root.child('item').text()
'value'
- static from_mem_id_unsafe(mem_id)¶
Reconstruct an XMLNode from its memory identifier in O(1) time.
Unlike find_mem_id(), which walks the entire tree in O(n) time to locate a node, this method performs an instant lookup.
⚠️ Warning: If the mem_id is stale (the node was deleted or the document has been freed), calling methods on the returned object may cause a segmentation fault.
Only use this when you are certain the identifier still belongs to a live node within a valid XMLDocument.
- Parameters:
mem_id (int) – An identifier previously obtained from node.mem_id.
- Returns:
A wrapper for the node at the given identifier.
- Return type:
XMLNode
- Complexity:
O(1): direct lookup, no tree traversal. Compare with find_mem_id(), which is O(n).
Example:
>>> mid = root.child('item').mem_id
>>> node = pygixml.XMLNode.from_mem_id_unsafe(mid)
>>> node.name
'item'
- append_attribute(name)¶
Append a new attribute and return it.
- Parameters:
name (str) – Attribute name.
- Returns:
The newly created attribute.
- Return type:
XMLAttribute
Example:
>>> root = doc.root
>>> attr = root.append_attribute('id')
>>> attr.value = '123'
- append_child(name)¶
Append a new child element and return it.
- Parameters:
name (str) – Tag name. Use an empty string to create a text node instead.
- Returns:
The newly created child.
- Return type:
XMLNode
Example:
>>> root = doc.root
>>> root.append_child('title').set_value('My Title')
- attribute(name)¶
Return the attribute with the given name, or None.
- Parameters:
name (str) – Attribute name.
- Returns:
XMLAttribute | None
Example:
>>> doc = pygixml.parse_string('<root id="1"/>')
>>> doc.root.attribute('id').value
'1'
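Because attribute() returns None for a missing attribute, chaining .value directly fails on absent names. A small guard is sketched below; attribute_or_default is a hypothetical helper, and node is assumed to behave like an XMLNode as documented here:

```python
def attribute_or_default(node, name, default=None):
    """Return node's attribute value, or default if the attribute is missing.

    Hypothetical helper; node is expected to behave like pygixml.XMLNode.
    """
    attr = node.attribute(name)   # None when no such attribute exists
    if attr is None:
        return default
    return attr.value
```

For example, `attribute_or_default(doc.root, 'lang', 'en')` would fall back to 'en' when the root element carries no lang attribute.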
- child(name)¶
Return the first child element whose tag matches name, or None.
- Parameters:
name (str) – Element tag to look for.
- Returns:
XMLNode | None
Example:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> doc.root.child('item').text()
'value'
- child_value(name=None)¶
Return the text content of a child element.
If name is given, finds the first child with that tag and returns its text. Without name, returns the direct text content of this node (i.e. text immediately inside this element, not inside a child).
- Parameters:
name (str | None) – Child tag to look up, or None for direct text.
- Returns:
str | None
Example:
>>> doc = pygixml.parse_string('<root><title>Book</title></root>')
>>> doc.root.child_value('title')
'Book'
- children(recursive=False)¶
Iterate over child element nodes.
Note
This is a pygixml-specific feature. pugixml provides first_child() and next_sibling() for manual traversal, but children() offers a Pythonic one-liner for iterating direct child elements, or all descendants with recursive=True.
Text, comment, and processing-instruction nodes are skipped.
- Parameters:
recursive (bool) – Yield only direct children (False, the default) or all descendants in depth-first order (True).
- Yields:
XMLNode
Example:
>>> doc = pygixml.parse_string('<root><a><a1/></a><b/></root>')
>>> [c.name for c in doc.root.children()]
['a', 'b']
>>> [c.name for c in doc.root.children(True)]
['a', 'a1', 'b']
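The depth-first order used by recursive=True can be illustrated without pygixml. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in; iter_children is a hypothetical name and is not how pygixml implements children():

```python
import xml.etree.ElementTree as ET


def iter_children(element, recursive=False):
    """Yield child elements; with recursive=True, all descendants (pre-order)."""
    for child in element:
        yield child
        if recursive:
            # Visit the child's own subtree before moving to its next sibling.
            yield from iter_children(child, recursive=True)


root = ET.fromstring('<root><a><a1/></a><b/></root>')
print([c.tag for c in iter_children(root)])        # ['a', 'b']
print([c.tag for c in iter_children(root, True)])  # ['a', 'a1', 'b']
```

The output order matches the documented example: a parent is yielded first, then its descendants, then its next sibling.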
- find_mem_id(mem_id)¶
Look up a descendant node by its memory identifier (see mem_id).
Note
This is a pygixml-specific feature. pugixml has no equivalent: pygixml walks the descendant tree in DFS order, comparing node addresses until a match is found.
- Returns:
XMLNode | None
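The linear search described in the note can be sketched with the standard library, using Python's id() as a stand-in for mem_id. This is an illustration of the idea only; find_by_identity is a hypothetical name and the code is not pygixml's implementation:

```python
import xml.etree.ElementTree as ET


def find_by_identity(root, target_id):
    """Return the descendant whose id() equals target_id, or None."""
    for element in root.iter():          # document order, i.e. depth-first
        if element is not root and id(element) == target_id:
            return element
    return None


root = ET.fromstring('<root><a><a1/></a><b/></root>')
wanted = root.find('a/a1')
print(find_by_identity(root, id(wanted)) is wanted)  # True
print(find_by_identity(root, -1))                    # None
```

Every lookup may visit the whole subtree, which is why the search is O(n) while a direct address lookup is O(1).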
- first_attribute()¶
Return the first attribute on this element, or None if it has none.
- Returns:
XMLAttribute | None
Example:
>>> doc = pygixml.parse_string('<root id="1" class="main"/>')
>>> doc.root.first_attribute().name
'id'
- first_child()¶
Return the first child element, or None.
- Returns:
XMLNode | None
Example:
>>> doc = pygixml.parse_string('<root><a/><b/></root>')
>>> doc.root.first_child().name
'a'
- is_null()¶
Return True if this node is null (i.e. was not found or is invalid).
- prepend_attribute(name)¶
Prepend a new attribute and return it.
- Parameters:
name (str) – Attribute name.
- Returns:
The newly created attribute.
- Return type:
XMLAttribute
Example:
>>> root = doc.root
>>> attr = root.prepend_attribute('id')
>>> attr.value = '123'
- prepend_child(name)¶
Prepend a new child element and return it.
- Parameters:
name (str) – Tag name. Use an empty string to create a text node instead.
- Returns:
The newly created child.
- Return type:
XMLNode
Example:
>>> root = doc.root
>>> root.prepend_child('title').set_value('My Title')
- remove_attribute(attr)¶
Remove an attribute from this node.
- Parameters:
attr (XMLAttribute) – The attribute to remove.
- Returns:
True if the attribute was successfully removed, False otherwise.
- Return type:
bool
Example:
>>> root = doc.root
>>> attr = root.attribute('id')
>>> root.remove_attribute(attr)
True
- remove_child(node)¶
Remove a direct child element from this node.
- Parameters:
node (XMLNode) – The child node to remove. Must be a direct child of this node.
- Returns:
True if the node was successfully removed, False otherwise.
- Return type:
bool
Example:
>>> child = root.child('old_item')
>>> if child is not None:
...     root.remove_child(child)
- select_node(query)¶
Run an XPath expression and return the first match, or None.
- Parameters:
query (str) – XPath 1.0 expression.
- Returns:
XPathNode | None
Example:
>>> doc = pygixml.parse_string('<root><a/><b/></root>')
>>> doc.root.select_node('b').node.name
'b'
- select_nodes(query)¶
Run an XPath expression and return all matching nodes.
- Parameters:
query (str) – XPath 1.0 expression.
- Returns:
XPathNodeSet
Example:
>>> doc = pygixml.parse_string('<root><a/><b/><a/></root>')
>>> len(doc.root.select_nodes('a'))
2
- set_name(name)¶
Change the tag name of this element.
Returns False if the node is null.
- Parameters:
name (str) – New tag name.
- Returns:
bool
Example:
>>> doc = pygixml.parse_string('<old/>')
>>> doc.root.set_name('new')
True
>>> doc.root.name
'new'
- set_value(value)¶
Replace the text content of this node.
Returns False if the node is null.
- Parameters:
value (str) – New text content.
- Returns:
bool
Example:
>>> doc = pygixml.parse_string('<root><item>old</item></root>')
>>> doc.root.child('item').first_child().set_value('new')
True
- text(recursive=True, join='\n')¶
Return the combined text content of this node.
Note
This is a pygixml-specific feature. pugixml provides child_value() for a single child’s text, but text() collects text from all descendants (or, with recursive=False, only from direct children) and joins the fragments with a configurable separator.
- Parameters:
recursive (bool) – When True (default), gathers text from all descendant text and CDATA nodes. When False, returns only text that is a direct child of this element.
join (str) – String used to join multiple text fragments. Defaults to \n.
- Returns:
str
Example:
>>> doc = pygixml.parse_string(
...     '<root><a>hello</a><b>world</b></root>')
>>> doc.root.text()
'hello\nworld'
>>> doc.root.text(join=', ')
'hello, world'
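The collect-and-join behaviour described above can be sketched with the standard library's xml.dom.minidom, which also exposes text and CDATA nodes. The names collect_text and _fragments are hypothetical, and the code is an approximation of the documented semantics rather than pygixml's implementation:

```python
from xml.dom import Node, minidom


def _fragments(node, recursive):
    """Yield text and CDATA data found under node, in document order."""
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            yield child.data
        elif recursive and child.nodeType == Node.ELEMENT_NODE:
            yield from _fragments(child, recursive)


def collect_text(node, recursive=True, join='\n'):
    """Join the text fragments under node with the given separator."""
    return join.join(_fragments(node, recursive))


root = minidom.parseString('<root><a>hello</a><b>world</b></root>').documentElement
print(repr(collect_text(root)))                   # 'hello\nworld'
print(repr(collect_text(root, join=', ')))        # 'hello, world'
print(repr(collect_text(root, recursive=False)))  # '' (root has no direct text)
```

With recursive=False the root yields an empty string because its text lives inside the a and b children, not directly under root.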
- to_string(indent='  ')¶
Serialize this element (and its subtree) to an XML string.
Note
This is a pygixml-specific feature. pugixml can serialize to a file via save_file(), but it does not provide a method that returns the serialized XML as a Python string. pygixml implements this using an internal std::ostringstream buffer.
- Parameters:
indent (str | int) – Indentation string or number of spaces. Defaults to two spaces.
- Returns:
str
Example:
>>> doc = pygixml.parse_string('<root><item>val</item></root>')
>>> doc.root.child('item').to_string()
'<item>val</item>'
- mem_id¶
A unique numeric identifier derived from the node’s internal address.
Note
This is a pygixml-specific feature. The underlying pugixml library does not expose integer node identifiers natively. pygixml provides mem_id as a safe, hashable handle for debugging, caching, and fast node reconstruction.
Returns 0 for null nodes.
- Returns:
int
- name¶
Return the tag name of this node.
For element nodes this is the element’s tag name. For text, comment, and other non-element nodes this is None.
- Returns:
str | None
Example:
>>> doc = pygixml.parse_string('<root/>')
>>> doc.root.name
'root'
- next_element_sibling¶
The next sibling that is an element node, skipping text, comment, and other non-element nodes, or None if there is no such sibling.
- next_sibling¶
The next sibling node, or None if this is the last child.
- parent¶
The parent element node. Returns None for the document root.
- previous_element_sibling¶
The previous sibling that is an element node, or None if there is no such sibling.
- previous_sibling¶
The previous sibling node, or None if this is the first child.
- type¶
Return the node type as a human-readable string.
Possible values: 'element', 'pcdata', 'cdata', 'comment', 'pi', 'declaration', 'doctype', 'document', 'null'.
- Returns:
str
Example:
>>> doc = pygixml.parse_string('<root>text</root>')
>>> doc.root.type
'element'
>>> doc.root.first_child().type
'pcdata'
- value¶
Return the text content of this node.
For text, CDATA, comment, and processing-instruction nodes, returns the raw value directly.
For element nodes, this is a convenience shortcut that returns the value of the first text/CDATA child (or None if no text child exists).
- Returns:
str | None
Example:
>>> # Text node: returns the raw value
>>> doc = pygixml.parse_string('<root><item>hello</item></root>')
>>> doc.root.child('item').first_child().value
'hello'
>>> # Element node: returns the first text child's value
>>> doc.root.child('item').value
'hello'
- xml¶
Shorthand for self.to_string(): serialized XML with default two-space indentation.
Note
This is a pygixml-specific convenience property. pugixml has no equivalent.
- xpath¶
The absolute XPath to this node (e.g. /root/item[1]/name[1]).
Note
This is a pygixml-specific feature. pugixml does not provide XPath generation natively; pygixml implements a custom O(depth) algorithm that walks from the node up to the root, counting same-name siblings to produce accurate positional predicates.
Returns an empty string if the node is not an element.
- Returns:
str
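The walk-to-root idea in the note can be sketched with the standard library. ElementTree elements have no parent pointers, so this stand-in first builds a child-to-parent map (an O(n) step that a tree with parent links would not need). The name absolute_path is hypothetical, and the code is not pygixml's implementation:

```python
import xml.etree.ElementTree as ET


def absolute_path(root, target):
    """Build a path such as /root/item[2]/name[1] for target."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # 1-based position among siblings that share this tag.
        position = 1
        for sibling in parent:
            if sibling is node:
                break
            if sibling.tag == node.tag:
                position += 1
        steps.append(f'{node.tag}[{position}]')
        node = parent
    steps.append(root.tag)   # the root step carries no predicate
    return '/' + '/'.join(reversed(steps))


root = ET.fromstring('<root><item><name/></item><item><name/><name/></item></root>')
second_item = root.findall('item')[1]
print(absolute_path(root, second_item.findall('name')[1]))  # /root/item[2]/name[2]
print(absolute_path(root, root))                            # /root
```

Counting only siblings with the same tag is what makes the positional predicate valid XPath: item[2] means the second item child, not the second child overall.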
- class pygixml.XPathNode¶
Bases: object
A single result from an XPath query.
Wraps either an XMLNode (.node) or an XMLAttribute (.attribute). One of these properties will be None depending on what the query matched.
Example:
>>> doc = pygixml.parse_string('<root><item id="1">value</item></root>')
>>> result = doc.root.select_node('//item')
>>> result.node.name
'item'
- attribute¶
The matched attribute, or None if the query matched an element instead.
- node¶
The matched element, or None if the query matched an attribute instead.
- parent¶
The parent of the matched node (None for attributes or the document root).
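Since exactly one of .node and .attribute is expected to be set, code that handles arbitrary XPath results usually branches on which one is not None. A minimal sketch; describe_match is a hypothetical helper, and result is assumed to behave like an XPathNode (or to be None when select_node() found nothing):

```python
def describe_match(result):
    """Summarise one XPath result as a (kind, name) pair.

    Hypothetical helper; result is expected to behave like pygixml.XPathNode.
    """
    if result is None:                   # select_node() found nothing
        return ('none', None)
    if result.node is not None:          # query matched a node
        return ('node', result.node.name)
    if result.attribute is not None:     # query matched an attribute
        return ('attribute', result.attribute.name)
    return ('none', None)
```

For instance, a query such as `//item/@id` would be expected to take the attribute branch, while `//item` takes the node branch.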
- class pygixml.XPathNodeSet¶
Bases: object
A collection of XPathNode results from an XPath query.
Supports len(), indexing (node_set[0]), and iteration.
Example:
>>> doc = pygixml.parse_string('<root><item>1</item><item>2</item></root>')
>>> nodes = doc.root.select_nodes('//item')
>>> len(nodes)
2
>>> nodes[0].node.text()
'1'
- class pygixml.XPathQuery¶
Bases: object
A compiled XPath 1.0 query.
Compiling a query once and re-using it is faster than calling select_nodes() repeatedly, because the expression is parsed only once.
Example:
>>> doc = pygixml.parse_string('<root><item>value</item></root>')
>>> query = pygixml.XPathQuery('//item')
>>> query.evaluate_node(doc.root).node.text()
'value'
- evaluate_boolean(context_node)¶
Evaluate query and return boolean result.
- Parameters:
context_node (XMLNode) – Node to evaluate the query against
- Returns:
Boolean result of the XPath query
- Return type:
bool
Example
>>> query = pygixml.XPathQuery('count(//item) > 0')
>>> has_items = query.evaluate_boolean(doc.first_child())
>>> print(has_items)
True
- evaluate_node(context_node)¶
Evaluate query and return first node.
- Parameters:
context_node (XMLNode) – Node to evaluate the query against
- Returns:
First matching XPath node or None if no matches
- Return type:
XPathNode | None
Example
>>> query = pygixml.XPathQuery('//item')
>>> node = query.evaluate_node(doc.first_child())
>>> print(node.node.text())
- evaluate_node_set(context_node)¶
Evaluate query and return node set.
- Parameters:
context_node (XMLNode) – Node to evaluate the query against
- Returns:
Set of matching XPath nodes
- Return type:
XPathNodeSet
Example
>>> query = pygixml.XPathQuery('//item')
>>> nodes = query.evaluate_node_set(doc.first_child())
>>> for node in nodes:
...     print(node.node.text())
- evaluate_number(context_node)¶
Evaluate query and return numeric result.
- Parameters:
context_node (XMLNode) – Node to evaluate the query against
- Returns:
Numeric result of the XPath query
- Return type:
float
Example
>>> query = pygixml.XPathQuery('count(//item)')
>>> count = query.evaluate_number(doc.first_child())
>>> print(count)
2.0
- evaluate_string(context_node)¶
Evaluate query and return string result.
- Parameters:
context_node (XMLNode) – Node to evaluate the query against
- Returns:
String result of the XPath query or None if empty
- Return type:
str | None
Example
>>> query = pygixml.XPathQuery('//item[1]/text()')
>>> text = query.evaluate_string(doc.first_child())
>>> print(text)
value
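The compile-once pattern pays off when the same expression runs against many documents. A minimal sketch, assuming pygixml is installed; count_items_per_document is a hypothetical helper name:

```python
def count_items_per_document(xml_texts):
    """Return the number of <item> elements in each XML string.

    Hypothetical helper; assumes the pygixml package is installed.
    """
    import pygixml  # third-party package; imported lazily

    query = pygixml.XPathQuery('count(//item)')   # expression parsed once
    counts = []
    for text in xml_texts:
        doc = pygixml.parse_string(text)          # keep doc alive while querying
        counts.append(query.evaluate_number(doc.root))
    return counts
```

Only the evaluate_number() call runs per document; the expression itself is not re-parsed inside the loop.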
- pygixml.parse_file(file_path, options=4294967295)¶
Parse XML from file and return XMLDocument.
- Parameters:
file_path (str) – Path to XML file
options (ParseFlags, optional) – Parse flags (default: ParseFlags.DEFAULT). Combine flags with bitwise OR. Use ParseFlags.MINIMAL for fastest parsing when you don’t need comments, CDATA, or escape processing.
- Returns:
Parsed XML document
- Return type:
XMLDocument
- Raises:
PygiXMLError – If parsing fails
Example
>>> import pygixml
>>> doc = pygixml.parse_file('data.xml')
>>> doc = pygixml.parse_file('data.xml', pygixml.ParseFlags.MINIMAL)
- pygixml.parse_string(xml_string, options=4294967295)¶
Parse XML from string and return XMLDocument.
- Parameters:
xml_string (str) – XML content as string
options (ParseFlags, optional) – Parse flags (default: ParseFlags.DEFAULT). Combine flags with bitwise OR. Use ParseFlags.MINIMAL for fastest parsing when you don’t need comments, CDATA, or escape processing.
- Returns:
Parsed XML document
- Return type:
XMLDocument
- Raises:
PygiXMLError – If parsing fails
Example
>>> import pygixml
>>> doc = pygixml.parse_string('<root>content</root>')
>>> doc = pygixml.parse_string(xml, pygixml.ParseFlags.MINIMAL)