Humbug
Bah! Humbug!

Parse Simple XML Files using Bash – Extract Name Value Pairs and Attributes

Pratik Sinha | July 31, 2010

I have written a simple routine, parseXML, that parses simple XML files and extracts unique name-value pairs and their attributes. The script finds every XML tag of the form <abc arg1="hello">xyz</abc> and dynamically creates bash variables that hold the values of the elements and of their attributes. It is a good solution if you don’t want to use XPath for simple XML files. However, you will need xmllint installed on your system to use the script. Here’s a sample script that uses the parseXML function:

#!/bin/bash
xmlFile=$1

function parseXML() {
  elemList=( $(cat $xmlFile | tr '\n' ' ' | XMLLINT_INDENT="" xmllint --format - | /bin/grep -e "</.*>$" | while read line; do \
    echo $line | sed -e 's/^.*<\///' | cut -d '>' -f 1; \
  done) )

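  # The last closing tag in the list belongs to the root element. Its name,
  # followed by an underscore, is prepended to every generated variable name
  # (despite being stored in a variable called "suffix", it acts as a prefix).
  totalNoOfTags=${#elemList[@]}; ((totalNoOfTags--))
  suffix=$(echo ${elemList[$totalNoOfTags]} | tr -d '</>')
  suffix="${suffix}_"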

  for (( i = 0 ; i < ${#elemList[@]} ; i++ )); do
    elem=${elemList[$i]}
    elemLine=$(cat $xmlFile | tr '\n' ' ' | XMLLINT_INDENT="" xmllint --format - | /bin/grep "</$elem>")
    echo $elemLine | grep -e "^</[^ ]*>$" 1>/dev/null 2>&1
    if [ "0" = "$?" ]; then
      continue
    fi
    elemVal=$(echo $elemLine | tr '\011' '\040'| sed -e 's/^[ ]*//' -e 's/^<.*>\([^<].*\)<.*>$/\1/' | sed -e 's/^[ ]*//' | sed -e 's/[ ]*$//')
    xmlElem="${suffix}$(echo $elem | sed 's/-/_/g')"
    eval ${xmlElem}=`echo -ne \""${elemVal}"\"`
    attrList=($(cat $xmlFile | tr '\n' ' ' | XMLLINT_INDENT="" xmllint --format - | /bin/grep "</$elem>" | tr '\011' '\040' | sed -e 's/^[ ]*//' | cut -d '>' -f 1  | sed -e 's/^<[^ ]*//' | tr "'" '"' | tr '"' '\n'  | tr '=' '\n' | sed -e 's/^[ ]*//' | sed '/^$/d' | tr '\011' '\040' | tr ' ' '>'))
    for (( j = 0 ; j < ${#attrList[@]} ; j++ )); do
      attr=${attrList[$j]}
      ((j++))
      attrVal=$(echo ${attrList[$j]} | tr '>' ' ')
      attrName=`echo -ne ${xmlElem}_${attr}`
      eval ${attrName}=`echo -ne \""${attrVal}"\"`
    done
  done
}

parseXML
echo "$status_xyz |  $status_abc |  $status_pqr" # Variables for each XML element
echo "$status_xyz_arg1 |  $status_abc_arg2 |  $status_pqr_arg3 | $status_pqr_arg4" # Variables for each XML attribute
echo ""

# All the variables that were produced by the parseXML function
set | /bin/grep -e "^$suffix"

The XML File used for the above script example is:

<?xml version="1.0"?>
<status>
  <xyz arg1="1"> a </xyz>
  <abc arg2="2"> p </abc>
  <pqr arg3="3" arg4="a phrase"> x </pqr>
</status>

The root tag, which in this case is “status”, is used as a prefix for all variables (the script stores it in a variable named suffix). Once the XML file is passed to the script, parseXML dynamically creates the variables $status_xyz, $status_abc, $status_pqr, $status_xyz_arg1, $status_abc_arg2, $status_pqr_arg3 and $status_pqr_arg4.
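
The naming scheme can be tried on its own. The following is only a minimal sketch and is not taken from the script above: set_prefixed_var is a hypothetical helper, and it uses printf -v rather than eval so that the value is never re-evaluated as shell code.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): assigns VALUE to a
# variable named PREFIX_NAME, replacing hyphens so the name is valid in bash.
set_prefixed_var() {
  local prefix=$1 name=$2 value=$3
  local var="${prefix}_${name//-/_}"
  # Refuse names that are not valid bash identifiers.
  if [[ ! $var =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
    echo "skipping invalid variable name: $var" >&2
    return 1
  fi
  # printf -v stores the value without evaluating it as shell code.
  printf -v "$var" '%s' "$value"
}

set_prefixed_var status xyz "a"
set_prefixed_var status pqr_arg4 "a phrase"
set_prefixed_var status last-update "today"

echo "$status_xyz | $status_pqr_arg4 | $status_last_update"
# prints: a | a phrase | today
```

The element name last-update is made up for the example; it only shows how a hyphen is turned into an underscore, as the script does with sed.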

The output when the script is run with the XML file as an argument is:

$ bash parseXML.sh test.xml
a |  p |  x
1 |  2 |  3 | a phrase

status_abc=p
status_abc_arg2=2
status_pqr=x
status_pqr_arg3=3
status_pqr_arg4='a phrase'
status_xyz=a
status_xyz_arg1=1

This script won’t work for XML files with duplicate element names, like the one below. Every <test> element maps to the same variable name, so the elements cannot be told apart.

<?xml version="1.0"?>
<status>
  <test arg1="1"> a </test>
  <test arg2="2"> p </test>
  <test arg3="3" arg4="a phrase"> x </test>
</status>
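
One possible way around this limitation is to store repeated elements in a bash array instead of a single variable. The sketch below is an assumption-laden illustration, not an extension of parseXML: collect_elements is a hypothetical function, it expects one <name ...>text</name> element per line (roughly what xmllint --format produces for a file like this one), and it reads the XML from a here-document so that it runs without xmllint.

```shell
#!/bin/bash
# Hypothetical sketch: collect the text of repeated elements into one bash
# array per element name (status_test[0], status_test[1], ...).
# Assumes each input line holds at most one <name ...>text</name> element.
collect_elements() {
  local prefix=$1 line name text var
  local re='^[[:space:]]*<([A-Za-z_][A-Za-z0-9_-]*)[^>]*>([^<]*)</([A-Za-z_][A-Za-z0-9_-]*)>[[:space:]]*$'
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue
    # The opening and closing tag names must match.
    [[ ${BASH_REMATCH[1]} == "${BASH_REMATCH[3]}" ]] || continue
    name=${BASH_REMATCH[1]//-/_}
    text=${BASH_REMATCH[2]}
    # Trim leading and trailing whitespace from the element text.
    text=${text#"${text%%[![:space:]]*}"}
    text=${text%"${text##*[![:space:]]}"}
    var="${prefix}_${name}"
    # Only evaluate names that are valid identifiers; the text itself is
    # referenced as \$text, so it is never executed as code.
    [[ $var =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    eval "$var+=(\"\$text\")"
  done
}

collect_elements status <<'EOF'
<?xml version="1.0"?>
<status>
  <test arg1="1"> a </test>
  <test arg2="2"> p </test>
  <test arg3="3" arg4="a phrase"> x </test>
</status>
EOF

echo "${#status_test[@]} values: ${status_test[*]}"
# prints: 3 values: a p x
```

The function reads from a here-document rather than a pipe because a pipe would run the loop in a subshell and the arrays would be lost when it exits.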

The script also cannot extract the attributes of elements that have no text content (character data) of their own. For example, given the file below, it won’t create variables for <test arg1="1">. It will only create the variables for <test1 arg2="2">abc</test1>.

<?xml version="1.0"?>
<status>
  <test arg1="1">
    <test1 arg2="2">abc</test1>
  </test>
</status>
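
Attributes can be read from the opening tag alone, which sidesteps the dependency on element text. The following is a rough sketch under stated assumptions rather than a fix for parseXML: parse_attrs is a hypothetical function, it handles only double-quoted attribute values, and it expects tag and attribute names that become valid bash identifiers once hyphens are replaced.

```shell
#!/bin/bash
# Hypothetical sketch: read attributes from an opening tag, whether or not the
# element has text of its own. Assumes double-quoted attribute values.
parse_attrs() {
  local prefix=$1 tag=$2 elem rest var
  local tag_re='^[[:space:]]*<([A-Za-z_][A-Za-z0-9_-]*)(.*)$'
  local attr_re='^[[:space:]]+([A-Za-z_][A-Za-z0-9_-]*)="([^"]*)"(.*)$'
  [[ $tag =~ $tag_re ]] || return 1
  elem=${BASH_REMATCH[1]//-/_}
  rest=${BASH_REMATCH[2]}
  # Peel off one name="value" pair per iteration.
  while [[ $rest =~ $attr_re ]]; do
    var="${prefix}_${elem}_${BASH_REMATCH[1]//-/_}"
    printf -v "$var" '%s' "${BASH_REMATCH[2]}"
    rest=${BASH_REMATCH[3]}
  done
}

parse_attrs status '<test arg1="1">'
parse_attrs status '    <test1 arg2="2">abc</test1>'

echo "$status_test_arg1 | $status_test1_arg2"
# prints: 1 | 2
```

Each call handles a single line, so a caller would still need to feed it the file line by line, for example from the output of xmllint --format.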

Posted in Code-Snippets | Tagged bash, xml | 5 Responses

  1. Chris Hunter
    489 days ago

    I think you meant the root tag is used as a *prefix*. But otherwise a well-written article; it really helped me make a configurable backup script.

    • Pratik Sinha replied:
      486 days ago

      Oh yes, it should be prefix!

  2. bhavya
    170 days ago

    Hi, I was wondering if someone can provide a script to reverse this procedure (to generate the same XML format as above from the same text format)?

  3. Kenneth Jacob
    46 days ago

    This script is really helpful. But when I tried to run it, it didn’t work at first because the XMLLINT_INDENT value was an empty string instead of a single space. Maybe the HTML formatting causes the single space to disappear.

    • Pratik Sinha replied:
      46 days ago

      Hi Kenneth, XMLLINT_INDENT is supposed to be empty, because I don’t want the XML to be indented. I checked the script again and it works for me with XMLLINT_INDENT as an empty string. I remember some corner cases where the indentation was causing an issue.