| I think I would split this into three phases:
| - collect the data into groups of HFR
| - process each group by rearranging, renumbering, reporting errors
| - output the processed groups
| 
| One potential problem is to resynchronize to the next group when
| there is a sequence error. If there is always a blank line between
| groups it is easy. Otherwise maybe just assume an H is the start of a
| group.   
| 

Hmm...so Alan could first split the data on the "|H" values. Each chunk *should* 
contain an |F and an |R, so the next step would be to break these HFR groups 
into pieces and check that all the pieces are there, and perhaps, if not, 
print the incomplete groups to an error file for review.
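
Something along these lines might do for the split-and-check step. It is only a 
rough sketch: I am assuming the markers appear literally as "|H", "|F" and "|R" 
in the text, and the sample data and the function names (split_groups, 
missing_markers) are made up for illustration, since I have not seen the real file.

```python
def split_groups(text):
    """Split raw text into chunks that each start at a '|H' marker."""
    chunks = text.split('|H')
    # chunks[0] is whatever came before the first '|H', so skip it
    # and put the marker back on the front of every other chunk.
    return ['|H' + chunk for chunk in chunks[1:]]


def missing_markers(group):
    """Return the expected markers that do not appear in this group."""
    return [marker for marker in ('|F', '|R') if marker not in group]


# Invented sample data: the second group has no |F line.
sample = ("|H head one\n|F field one\n|R ref one\n"
          "|H head two\n|R ref two\n")

groups = split_groups(sample)
good = [g for g in groups if not missing_markers(g)]
bad = [(g, missing_markers(g)) for g in groups if missing_markers(g)]
```

The entries in bad (each group paired with what it lacks) are what could be 
written to the error file for review, and good carries on to the next phase.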

Alan, regarding the extraction of the parentheticals, what have you tried? One 
suggestion for this aspect is to get rid of the line breaks in the |H chunk and 
then you won't have the problem of a broken parenthetical. For example,

######
>>> multiLines = '''This (as you
... can see) is multilined.'''
>>> multiLines.splitlines()
['This (as you', 'can see) is multilined.']
>>> ' '.join(multiLines.splitlines())
'This (as you can see) is multilined.'
>>> # the above is one line and much easier to handle now.
######
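
Once the chunk is a single line, pulling out the parentheticals could look 
something like this. Again just a sketch: extract_parentheticals is a name I 
made up, and the pattern assumes the parentheses are never nested, which may 
not hold for the real data.

```python
import re


def extract_parentheticals(chunk):
    """Return the text found inside each pair of parentheses in chunk."""
    # Join the lines first so a parenthetical split across lines is whole.
    one_line = ' '.join(chunk.splitlines())
    # Match '(' ... ')' with no parentheses in between (non-nested only).
    return re.findall(r'\(([^()]*)\)', one_line)


multiLines = '''This (as you
can see) is multilined.'''
found = extract_parentheticals(multiLines)
print(found)  # ['as you can see']
```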

How are you reading the data in from the file?

/c
_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
