I've been working on a way to parse an XML document and convert it into a 
python dictionary.  I want to maintain the hierarchy of the XML.  Here is the 
sample XML I have been working on:

<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>

This is my first stab at this:

#!/usr/bin/env python

from lxml import etree

def generateKey(element):
    if element.attrib:
        key = (element.tag, element.attrib)
    else:
        key = element.tag
    return key  

class parseXML(object):
    def __init__(self, xmlFile = 'test.xml'):
        self.xmlFile = xmlFile
        
    def parse(self):
        doc = etree.parse(self.xmlFile)
        root = doc.getroot()
        key = generateKey(root)
        dictA = {}
        for r in root.getchildren():
            keyR = generateKey(r)
            if r.text:
                dictA[keyR] = r.text
            if r.getchildren():
                dictA[keyR] = r.getchildren()
                
        newDict = {}
        newDict[key] = dictA
        return newDict
                        
if __name__ == "__main__":
    px = parseXML()
    newDict = px.parse()
    print newDict
        
This is the output:
163>./parseXML.py
{'collection': {('comic', {'number': '62', 'title': 'Sandman'}): [<Element 
writer at -482193f4>, <Element penciller at -482193cc>, <Element penciller at 
-482193a4>]}}

The script doesn't descend all of the way down because I'm not sure how to hand 
a XML document that may have multiple layers.  Advice anyone?  Would this be a 
job for recursion?

Thanks!
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to