On Friday, October 10, 2014 8:21:17 AM UTC-4, Peter Otten wrote:
> David Jobes wrote:
>
> > I was given a badly formatted XML file that I need to convert to a
> > CSV file:
>
> There are no "badly formatted" XML files, only valid and invalid ones.
> Fortunately, the following looks like the beginning of a valid one.
>
> > http://exslt.org/dynamic";>
> > 0001-0001-0001-0001-0027
> > 27
> > 2
> > 0027: IP Options: Record Route (RR)
> > Network_equip
> > 10
> > ip
> > 100741885
> > 2001-0752,1999-1339,1999-0986
> > 870
> >
> > I have been able to load and read the file line by line,
>
> XML has no notion of lines, so don't do that. Instead, let a parser
> make sense of the document structure.
>
> > but once I get to the r line and try to process each c (column), that is
> > where it blows up. I need to be able to split the lines and place each
> > r (row) on a single line of the CSV.
> >
> > I have a list set up for each of the headers based on the col name field;
> > I just haven't been able to format it properly.
>
> Here's a simple script using ElementTree to introduce you to basic XML
> handling with Python's stdlib. If you are lucky it might even work ;)
>
> import csv
> import sys
> from xml.etree import ElementTree
>
> SOURCEFILE = "xml_to_csv.xml"
>
> tree = ElementTree.parse(SOURCEFILE)
> table = tree.find("table")
> # Header row: one entry per <column name="...">
> column_names = [c.attrib["name"] for c in table.findall("column")]
> writer = csv.writer(sys.stdout)
> writer.writerow(column_names)
> # One CSV line per <r> (row), one field per <c> (cell)
> for row in table.find("data").findall("r"):
>     writer.writerow([field.text for field in row.findall("c")])

That did it, thank you, and in a lot fewer lines of code than I had; I was
trying to use strings and regex. I will read up more on the xml.etree stuff.
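
For reference, here is a self-contained sketch of the same approach that can
be run without the original file. The XML markup of the sample did not survive
in this thread, so the document below is an assumption: the element names
table, column, data, r and c come from the script above, while the root
element name, the column names and the second row are made up for
illustration. The helper name xml_to_csv is likewise hypothetical.

```python
import csv
import io
from xml.etree import ElementTree

# Assumed layout: element names follow the script above; the root element,
# the column names and the second row are invented for this example.
SAMPLE = """\
<report>
  <table>
    <column name="signature"/>
    <column name="id"/>
    <column name="description"/>
    <data>
      <r><c>0001-0001-0001-0001-0027</c><c>27</c><c>0027: IP Options: Record Route (RR)</c></r>
      <r><c>example-id</c><c>28</c><c/></r>
    </data>
  </table>
</report>
"""


def xml_to_csv(xml_text):
    """Return the table found in xml_text as CSV text, header row first."""
    root = ElementTree.fromstring(xml_text)
    table = root.find("table")
    if table is None:
        raise ValueError("no <table> element found")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row: the name attribute of each <column>.
    writer.writerow([col.get("name", "") for col in table.findall("column")])
    for row in table.findall("data/r"):
        # An empty <c/> has text None; write it as an empty field.
        writer.writerow([cell.text or "" for cell in row.findall("c")])
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE), end="")
```

Because the parser works on the document structure rather than on lines, the
same input collapsed onto a single line should produce the same CSV.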
--
https://mail.python.org/mailman/listinfo/python-list