I've been working on a way to parse an XML document and convert it into a Python dictionary while preserving the XML's hierarchy. Here is the sample XML I have been working with:

<collection>
  <comic title="Sandman" number='62'>
    <writer>Neil Gaiman</writer>
    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>

This is my first stab at this:

#!/usr/bin/env python
from lxml import etree

def generateKey(element):
    # Use (tag, attributes) as the key when the element has attributes,
    # otherwise just the tag name.
    if element.attrib:
        key = (element.tag, element.attrib)
    else:
        key = element.tag
    return key

class parseXML(object):
    def __init__(self, xmlFile = 'test.xml'):
        self.xmlFile = xmlFile

    def parse(self):
        doc = etree.parse(self.xmlFile)
        root = doc.getroot()
        key = generateKey(root)
        dictA = {}
        # Only the root's direct children are visited; deeper levels
        # are stored as raw Element objects.
        for r in root.getchildren():
            keyR = generateKey(r)
            if r.text:
                dictA[keyR] = r.text
            if r.getchildren():
                dictA[keyR] = r.getchildren()

        newDict = {}
        newDict[key] = dictA
        return newDict

if __name__ == "__main__":
    px = parseXML()
    newDict = px.parse()
    print newDict

This is the output:

163>./parseXML.py
{'collection': {('comic', {'number': '62', 'title': 'Sandman'}): [<Element writer at -482193f4>, <Element penciller at -482193cc>, <Element penciller at -482193a4>]}}

The script doesn't descend all the way down because I'm not sure how to handle an XML document that may have multiple layers. Any advice? Would this be a job for recursion?

Thanks!
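
Recursion is a natural fit for this: a function can convert one element and then call itself on each child, so the depth of the document doesn't matter. Below is a minimal sketch, not a definitive implementation. It assumes the standard library's xml.etree.ElementTree in place of lxml (the calls used here exist in both), and element_to_dict is a hypothetical helper name that is not part of the script above. It keeps children in a list instead of keying them by tag, so the two penciller elements don't overwrite each other.

```python
import pprint
import xml.etree.ElementTree as etree  # assumption: stdlib stand-in for lxml.etree

# The sample document from the post, inlined so the sketch is self-contained.
SAMPLE = """<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller pages="1-9,18-24">Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""


def element_to_dict(element):
    """Recursively convert an element and all of its descendants to a dict.

    Hypothetical helper: the layout (tag / attrib / text / children) is one
    possible choice, not the only way to represent the hierarchy.
    """
    node = {"tag": element.tag}
    if element.attrib:
        node["attrib"] = dict(element.attrib)
    # Ignore whitespace-only text such as the indentation between tags.
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    # The recursive step: each child is converted by the same function.
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


if __name__ == "__main__":
    root = etree.fromstring(SAMPLE)
    pprint.pprint(element_to_dict(root))
```

To read from a file as the original script does, etree.parse('test.xml').getroot() could be passed to element_to_dict instead of the parsed string.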