On 11/11/2012 12:01 AM, Marc wrote:
> Hello,
>
> I am trying to parse a text file with a structure that looks like:
>
> [record: Some text about the record]

So the record delimiter starts with a left bracket, in first column? And all lines within the record are indented? Use this fact.

>     Attribute 1 = Attribute 1 text
>     Attribute 3 = Attribute 3 text
>     Attribute 4 = Attribute 4 text
>     Attribute 7 = Attribute 7 text
>
> [record: Some text about the record]
>     Attribute 1 = Attribute 1 text
>     Attribute 2 = Attribute 2 text
>     Attribute 3 = Attribute 3 text
>     Attribute 4 = Attribute 4 text
>     Attribute 5 = Attribute 5 text
>     Attribute 6 = Attribute 6 text
>
> [record: Some text about the record]
>     Attribute 2 = Attribute 2 text
>     Attribute 3 = Attribute 3 text
>     Attribute 7 = Attribute 7 text
>     Attribute 8 = Attribute 8 text
>
> Etc. for many hundreds of records
>
> I am looking to create output that looks like:
>
> Attribute 1 text | Attribute 3 text
> Attribute 1 text | Attribute 3 text
> Blank | Attribute 3 text
>
> Treating each record as a record with its associated lines is the holy grail
> for which I am searching, yet I seem to only be coming up with dead parrots.
> It should be simple, but the answer is eluding me and Google has not been
> helpful.
>
> Pathetic thing is that I do this with Python and XML all the time, but I
> can't seem to figure out a simple text file. I'm missing something simple,
> I'm sure. Here's the most I have gotten to work (poorly) so far - it gets
> me the correct data, but not in the correct format because the file is being
> handled sequentially, not by record - it's not even close, but I thought I'd
> include it here:
>
> for line in infile:
>     while line != '\n':
>         Attribute1 = 'Blank'
>         Attribute3 = 'Blank'
>         line = line.lstrip('\t')
>         line = line.rstrip('\n')
>         LineElements = line.split('=')
>         if LineElements[0] == 'Attribute1 ':
>             Attribute1=LineElements[1]
>         if LineElements[0] == 'Attribute3 ':
>             Attribute3=LineElements[1]
>         print("%s | %s\n" % (Attribute1, Attribute3))
>
> Is there a library or example I could be looking at for this?
> I use lxml for xml, but I don't think it will work for this - at least the
> way I tried did not.

I don't think any existing library will fit your format, unless you happen to be very lucky.

What you probably want is to write a generator function that gives you a record at a time. It'll take a file object (infile) and it'll yield a list of lines. Then your main loop would be something like:

    for record in records(infile):
        attrib1 = attrib2 = ""
        for line in record:
            line = line.strip()
            line_elements = line.split("=")
            # etc.
        # here you print out the attrib1/2 as appropriate

I'll leave you to write the records() generator. But the next() method will probably play a part.

-- 
DaveA
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
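
For anyone picking this thread up later, here is one possible sketch of the records() generator that was left as an exercise above. It is not necessarily what was intended: it uses a plain for loop with an accumulator list rather than next(), and it assumes that every record header starts with "[" in the first column and that blank lines carry no data. The helper names attributes() and format_row(), and the SAMPLE text, are made up for illustration.

```python
def records(lines):
    """Yield one record at a time as a list of stripped lines, header first."""
    current = []
    for line in lines:
        if line.startswith("["):
            # A new header closes the record collected so far.
            if current:
                yield current
            current = [line.strip()]
        elif current and line.strip():
            # Non-blank line inside a record; blank lines are skipped.
            current.append(line.strip())
    # The last record has no following header, so emit it here.
    if current:
        yield current


def attributes(record):
    """Turn the 'name = value' lines of a record into a dict."""
    attrs = {}
    for line in record[1:]:  # record[0] is the "[record: ...]" header
        name, sep, value = line.partition("=")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def format_row(attrs, wanted=("Attribute 1", "Attribute 3")):
    """Join the wanted attributes with ' | ', using 'Blank' for missing ones."""
    return " | ".join(attrs.get(name, "Blank") for name in wanted)


# Hypothetical sample data modelled on the layout in the question.
SAMPLE = """\
[record: Some text about the record]
    Attribute 1 = Attribute 1 text
    Attribute 3 = Attribute 3 text

[record: Some text about the record]
    Attribute 1 = Attribute 1 text
    Attribute 2 = Attribute 2 text
    Attribute 3 = Attribute 3 text

[record: Some text about the record]
    Attribute 2 = Attribute 2 text
    Attribute 3 = Attribute 3 text
    Attribute 7 = Attribute 7 text
"""

for record in records(SAMPLE.splitlines(keepends=True)):
    print(format_row(attributes(record)))
```

Since records() only iterates over whatever it is given, an open file object can be passed in place of the sample lines, as in the main loop above (for record in records(infile)).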