Built-in 'ElementTree' module

There are many Python modules that can parse XML and store the data in Python objects. For this example, we will use the built-in "ElementTree" module, which ships with the standard library as xml.etree.ElementTree. It provides functions to read and manipulate XML documents. We will be working with the example from lesson 2:

<?xml version="1.0" encoding="UTF-8"?>

  <interface id="1">
    <description>VLAN20</description>
  </interface>
  <interface id="2">
    <description>VLAN20</description>
  </interface>
  <interface id="3">
    <description>VLAN20</description>
  </interface>

Before we start, the file must be saved as interfaces.xml in our Python working directory. Then we import the ElementTree library. It's common practice to use the alias ET:

import xml.etree.ElementTree as ET

Parsing XML Data

In the XML file provided, there is a basic collection of interfaces. Each interface has a different IP address and other configuration parameters. The main goal in this lesson will be to read and understand the file with Python.

First, we need to read the file with ElementTree.

xml = ET.parse('interfaces.xml')
root = xml.getroot()
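
The same two steps can be tried without a file on disk. The following is a minimal sketch, not the lesson's own code: the sample string and the root tag name `interfaces` are assumptions made for illustration, since the excerpt above does not show the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the root tag name "interfaces" is an assumption.
SAMPLE_XML = """<interfaces>
  <interface id="1">
    <description>VLAN20</description>
  </interface>
  <interface id="2">
    <description>VLAN20</description>
  </interface>
</interfaces>"""

# fromstring() parses text and returns the root Element directly,
# whereas parse() reads a file and returns an ElementTree wrapper
# whose root is reached through getroot().
root = ET.fromstring(SAMPLE_XML)

print(root.tag)   # tag name of the root element
print(len(root))  # number of direct child elements
```

Working from a string like this is convenient for experiments; for real files, ET.parse() followed by getroot() as shown above remains the usual route.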

Then, if we print the object we can see that we have an ‘ElementTree’ object at a specified memory address:

>>> print(xml)
<xml.etree.ElementTree.ElementTree object at 0x0364AEE0>

Once loaded as a Python object, the data can be read or manipulated with the built-in methods and attributes available. We can list them using the built-in ‘dir()’ function (the exact list depends on the Python version):

>>> print(dir(xml))
['__class__', '__delattr__', '__dict__',
 '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__',
 '__getattribute__', '__gt__',
 '__hash__', '__init__', '__init_subclass__',
 '__le__', '__lt__',
 '__module__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__',
 '__repr__', '__setattr__', '__sizeof__',
 '__str__', '__subclasshook__', '__weakref__',
 '_root', '_setroot', 'find', 'findall', 'findtext',
 'getiterator', 'getroot', 'iter', 'iterfind',
 'parse', 'write', 'write_c14n']
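
Most of that list is double-underscore machinery shared by every Python object. As a small sketch, the names we actually care about can be filtered out; the empty `interfaces` document used here is a hypothetical stand-in so the snippet runs without a file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in tree so the example needs no file on disk.
tree = ET.ElementTree(ET.fromstring("<interfaces/>"))

# Keep only names that do not start with an underscore,
# which leaves the public methods of the ElementTree object.
public_names = [name for name in dir(tree) if not name.startswith("_")]

print(public_names)
```

The result should include the search methods used in the rest of this lesson, such as 'iterfind' and 'findtext'.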

Let's use the 'iterfind()' method, which returns a generator that we can iterate over in a for loop.

>>> for item in xml.iterfind('interface'):
...     print(item)
...
<Element 'interface' at 0x03D265C8>
<Element 'interface' at 0x03D47CD0>
<Element 'interface' at 0x03D47E60>
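
Printing an Element only shows its tag and memory address. A short sketch of how to look inside each one follows; the sample string and its root tag `interfaces` are assumptions for illustration, not the lesson's actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the root tag "interfaces" is an assumption.
SAMPLE_XML = """<interfaces>
  <interface id="1"><description>VLAN20</description></interface>
  <interface id="2"><description>VLAN20</description></interface>
  <interface id="3"><description>VLAN20</description></interface>
</interfaces>"""

root = ET.fromstring(SAMPLE_XML)

interface_ids = []
for item in root.iterfind("interface"):
    # item.tag is the element name, item.attrib is a dict of its attributes,
    # and item.get() reads a single attribute as a string.
    print(item.tag, item.get("id"), item.attrib)
    interface_ids.append(item.get("id"))
```

Attribute values always come back as strings, so the ids here are "1", "2" and "3" rather than integers.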

We can see that we have several ‘interface’ objects stored at various memory addresses. We can extract the information from these objects using the ‘findtext()’ method, which returns the text of the first matching child element, or None if the element has no such child. Let’s extract the information in the ‘address’ tags:

>>> for item in xml.iterfind('interface'):
...     print(item.findtext('address'))

Let's take the interface name as well:

>>> for item in xml.iterfind('interface'):
...     print(item.findtext('name'), item.findtext('address'))
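
The loop above can be wrapped into a small helper that collects the values instead of printing them. This is only a sketch under stated assumptions: the helper name `summarize_interfaces`, the root tag `interfaces`, the interface names, and the 192.0.2.x documentation addresses are all hypothetical and not taken from the lesson's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: tag values and the root name are invented for illustration.
SAMPLE_XML = """<interfaces>
  <interface id="1">
    <name>GigabitEthernet0/1</name>
    <address>192.0.2.1</address>
  </interface>
  <interface id="2">
    <name>GigabitEthernet0/2</name>
  </interface>
</interfaces>"""


def summarize_interfaces(root):
    """Return one dict per <interface> child of root (hypothetical helper)."""
    rows = []
    for item in root.iterfind("interface"):
        rows.append({
            "id": item.get("id"),
            # default= is returned when the child element is missing,
            # instead of the usual None.
            "name": item.findtext("name", default="n/a"),
            "address": item.findtext("address", default="n/a"),
        })
    return rows


rows = summarize_interfaces(ET.fromstring(SAMPLE_XML))
for row in rows:
    print(row["id"], row["name"], row["address"])
```

Passing a default keeps the output readable when an interface lacks one of the tags, as the second interface does here.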

To summarize, in this lesson we saw how to parse XML data using the built-in 'ElementTree' library in Python. We saw how to use the ‘iterfind()’ method to get a generator that we can iterate over in a for loop. We also showed how to read the text of child elements using the ‘findtext()’ method.