What is XML?

Extensible Markup Language (XML) is a markup language, similar to HTML, that defines a set of rules for encoding information as text in a format that is both human-readable and machine-readable. As a markup language, it uses user-defined tags to create elements within a document, so XML files contain descriptive words rather than typical programming syntax. That is why XML is called a self-descriptive language.

Let's break down all parts of XML's name and see why it is called Extensible Markup Language.

  • Extensible - XML is extensible because it lets users define their own tags and values. It also allows the user to declare the document's character encoding and to describe how the document should be fetched and displayed.
  • Markup - XML is built around tags, which mark the start and end of elements. It is very similar to HTML but, as we said earlier, it is more extensible and customizable than HTML. An XML document is a very flexible construct: elements can be nested and extended as deeply as the user needs.
  • Language - XML is a meta-language. It allows users to build other languages on top of it, such as RSS (RDF Site Summary), WML (Wireless Markup Language), and XSL (Extensible Stylesheet Language).
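To make the "extensible" point concrete, here is a minimal Python sketch that builds a small document out of made-up tags using the standard library's xml.etree.ElementTree module. The tag names (interfaces, interface, name, speed) are assumptions chosen for illustration; XML itself does not predefine any of them.

```python
import xml.etree.ElementTree as ET

# These tag names are invented for the example; XML does not predefine them.
root = ET.Element("interfaces")
interface = ET.SubElement(root, "interface", id="1")
ET.SubElement(interface, "name").text = "GigabitEthernet0/0"
ET.SubElement(interface, "speed").text = "1000"

# Serialize the element tree to a text string.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Any other vocabulary would work the same way, as long as every opening tag has a matching closing tag and the elements are properly nested.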

Why do we need XML?

Many people wonder why we need XML when we already have a very popular markup language in HTML. The answer is straightforward: HTML is designed to be consumed by web browsers, not to be read as data by humans. HTML uses a predefined set of tags and a predefined document structure, and it cannot be extended with user-defined constructs. XML, on the other hand, is specifically designed to be human-readable and extensible to fit the user's needs. Let's look at the following example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
 <html xmlns="http://www.w3.org/1999/xhtml"> 
 <head> 
 <title>Interface Configuration</title> 
 <meta http-equiv="Content-Type" 
    content="text/html; charset=utf-8" /> 
 </head> 
 <body> 
 <h1>GigabitEthernet0/0</h1> 
 <h2>Description: Link to Router 1</h2> 
 <p>IPaddress:192.168.1.1</p> 
 <p><b>Mask: 255.255.255.0</b></p> 
 <p><strong>Speed:1000</strong></p> 
 ... 
 <h1>GigabitEthernet0/1</h1> 
 <h2>Description: Link to Router 3</h2> 
 <p>IPaddress:192.168.5.1</p> 
 <p><b>Mask: 255.255.255.0</b></p> 
 <p><strong>Speed:100</strong></p> 
 ... 
 <h1>GigabitEthernet0/2</h1> 
 <h2>Description: Link to Router 4</h2> 
 <p>IPaddress:192.168.43.1</p> 
 <p><b>Mask: 255.255.255.192</b></p> 
 <p><strong>Speed:10</strong></p> 
 </body> 
 </html>

Take a good look at the above example. A human can certainly read this document and make sense of the content, but the tags describe how the text should look rather than what the data means. As a result, the data is not easy to pick out by eye and not easy to parse with a programming language. Let's look at another example of similar content but documented with XML instead of HTML.

<?xml version="1.0" encoding="UTF-8"?>
<interfaces>

  <interface id="1">
    <name>GigabitEthernet0/0</name>
    <description>Link to Router 1</description>
    <address>192.168.1.1</address>
    <mask>255.255.255.0</mask>
    <speed>1000</speed>
  </interface>

  <interface id="2">
    <name>GigabitEthernet0/1</name>
    <description>Link to Router 3</description>
    <address>192.168.2.1</address>
    <mask>255.255.255.0</mask>
    <speed>100</speed>
  </interface>

  <interface id="3">
    <name>GigabitEthernet0/2</name>
    <description>Link to Router 4</description>
    <address>192.168.43.1</address>
    <mask>255.255.255.192</mask>
    <speed>100</speed>
  </interface>

</interfaces>

Note how much simpler it is to read and make sense of the data in this document. Also, think about how much easier it is to parse the data with a programming language. With XML, you only need to tell the program that all data enclosed between the <interface> and </interface> tags belongs to that interface.
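As a rough illustration of how little code that takes, the following Python sketch parses a shortened copy of the interfaces document with the standard library's xml.etree.ElementTree module. The dictionary layout and variable names are assumptions made for this example, not part of XML itself.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the interfaces document shown above.
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<interfaces>
  <interface id="1">
    <name>GigabitEthernet0/0</name>
    <address>192.168.1.1</address>
    <mask>255.255.255.0</mask>
    <speed>1000</speed>
  </interface>
  <interface id="2">
    <name>GigabitEthernet0/1</name>
    <address>192.168.2.1</address>
    <mask>255.255.255.0</mask>
    <speed>100</speed>
  </interface>
</interfaces>"""

root = ET.fromstring(xml_text)

# Everything between <interface> and </interface> belongs to one interface.
interfaces = []
for element in root.findall("interface"):
    interfaces.append({
        "id": element.get("id"),                  # attribute on the tag
        "name": element.findtext("name"),         # text of a child element
        "address": element.findtext("address"),
        "mask": element.findtext("mask"),
        "speed": int(element.findtext("speed")),  # convert text to a number
    })

for entry in interfaces:
    print(entry["name"], entry["address"], entry["speed"])
```

Getting the same values out of the HTML version would mean guessing which heading or paragraph holds which field, and stripping labels such as "Mask:" from the text.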

XML does NOT do anything

It is important to understand that the XML below does NOT do anything on its own. It is just information wrapped in tags that follow a predefined set of rules.

<?xml version="1.0" encoding="UTF-8"?>
<interfaces>

  <interface id="1">
    <name>GigabitEthernet0/0/1</name>
    <description>VLAN20</description>
    <address>10.1.1.1</address>
    <mask>255.255.255.0</mask>
    <MTU>1400</MTU>
    <duplex>full</duplex>
    <speed>1000</speed>
  </interface>

  <interface id="2">
    <name>GigabitEthernet0/0/2</name>
    <description>VLAN20</description>
    <address>192.168.1.1</address>
    <mask>255.255.255.128</mask>
    <MTU>1500</MTU>
    <duplex>full</duplex>
    <speed>1000</speed>
  </interface>

  <interface id="3">
    <name>GigabitEthernet0/0/3</name>
    <description>VLAN20</description>
    <address>172.16.5.1</address>
    <mask>255.255.255.192</mask>
    <MTU>1514</MTU>
    <duplex>full</duplex>
    <speed>100</speed>
  </interface>

</interfaces>

This XML data can be read and processed by programs written in almost any programming language. It can be modified and stored in a local file. It can be sent over the network. But on its own, it is just plain text.
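The sketch below shows one way a program might act on such data: it changes a value, stores the document in a local file, and produces bytes that could be sent over a network. It uses Python's standard xml.etree.ElementTree module; the file name interfaces.xml and the temporary directory are assumptions made for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# A shortened copy of the document above.
xml_text = """<interfaces>
  <interface id="1">
    <name>GigabitEthernet0/0/1</name>
    <MTU>1400</MTU>
  </interface>
</interfaces>"""

root = ET.fromstring(xml_text)

# Modify: change the MTU of the interface whose id attribute is "1".
mtu = root.find("./interface[@id='1']/MTU")
mtu.text = "1500"

# Store: write the document to a file (hypothetical name, temp directory).
path = os.path.join(tempfile.mkdtemp(), "interfaces.xml")
ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)

# Send: the same document as bytes, ready to hand to a socket or HTTP client.
payload = ET.tostring(root, encoding="utf-8")

# Read the file back to confirm the change was saved.
reloaded = ET.parse(path).getroot()
new_mtu = reloaded.findtext("./interface[@id='1']/MTU")
print(new_mtu)  # prints 1500
```

In every step the program does the work. The XML only carries the data.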

XML is an Open Standard

XML is stored in a plain text format. This provides a software- and hardware-independent way of storing, transporting, and sharing data.

Because it is an open standard, XML is widely adopted and supported across many popular applications and web browsers. XML-based document formats such as Office Open XML and OpenDocument are also supported by Microsoft Office, OpenOffice, and Google Docs.