Smile and Kent, the logic looks good so far. However, how do we move the (...) in |H to the end of |R, before the next |H?
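Here is a minimal sketch of one way to answer this. The real file layout was never posted, so it assumes three things. Each group starts with a line beginning `|H`. The `|F` and `|R` records follow on their own lines. `|R` is the last record before the next `|H`. The sample data and the names `split_groups` and `move_parenthetical` are hypothetical. The sketch uses Smile's idea of joining the |H lines first, so that a parenthetical broken across lines becomes whole. It then cuts the first (...) out of |H and appends it to the end of |R.

```python
import re

# Hypothetical sample: the real data format was not posted.
SAMPLE = """|H Header one (note spanning
two lines) more header text
|F field data one
|R result one
|H Header two (short note)
|F field data two
|R result two
"""

# A parenthetical without nested parentheses, plus any whitespace before it.
PAREN = re.compile(r"\s*(\([^()]*\))")

def split_groups(text):
    """Split the text into groups, each starting at a line that begins with |H."""
    parts = re.split(r"(?m)^(?=\|H)", text)  # zero-width split needs Python 3.7+
    return [p for p in parts if p.strip()]

def move_parenthetical(group):
    """Move the first (...) in the |H record to the end of the |R record."""
    h_part, sep, tail = group.partition("\n|F")
    if not sep or "\n|R" not in tail:
        return group  # incomplete group: leave it for error reporting
    # Collapse line breaks (and runs of spaces) so a split parenthetical is whole.
    h_text = " ".join(h_part.split())
    match = PAREN.search(h_text)
    if match is None:
        return group  # no parenthetical to move
    h_text = h_text[:match.start()] + h_text[match.end():]
    # |R is assumed to be the last record, so its end is the end of the group.
    return h_text + sep + tail.rstrip("\n") + " " + match.group(1) + "\n"

for group in split_groups(SAMPLE):
    print(move_parenthetical(group), end="")
```

In this sample, the note that is split over two lines in the first |H ends up as `(note spanning two lines)` at the end of the first |R line. The simple regular expression does not handle nested parentheses.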
Much respect,
AD

Exceptional team: I like and agree with all of your logic (I have no choice! Smile, you are more advanced than me).

Kent said:

I think I would split this into three phases:
- collect the data into groups of HFR
- process each group by rearranging, renumbering, reporting errors
- output the processed groups

One potential problem is resynchronizing to the next group when there is a sequence error. If there is always a blank line between groups it is easy. Otherwise, maybe just assume an H is the start of a group.

And Smile addressed Kent's concern by saying:

Hmm...so Alan could first split the data on the "|H" values. These *should* contain an |F and an |R, so the next step would be to break these HFR groups into pieces and check that all the pieces are there, and, if not, perhaps print those to an error file for review.

Alan, regarding the extraction of the parentheticals, what have you tried? One suggestion for this aspect is to get rid of the line breaks in the |H chunk, and then you won't have the problem of a broken parenthetical. For example,

######
>>> multiLines = '''This (as you
... can see) is multilined.'''
>>> multiLines.splitlines()
['This (as you', 'can see) is multilined.']
>>> ' '.join(multiLines.splitlines())
'This (as you can see) is multilined.'
>>> # the above is one line and much easier to handle now.
######

>How are you reading the data in from the file?

I use a 150-line Python script. I do not mind emailing it to you directly, so that I do not confuse these cleaning tasks; just say yes.

Much respect,
AD

_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor
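Kent's three phases and Smile's split-on-|H check could look roughly like this minimal sketch. It assumes that every line starting with `|H` begins a new group, which is Kent's fallback for resynchronizing. It also assumes that a complete group has exactly one |H, |F and |R record in that order, and that continuation lines do not start with `|`. The function names and sample data are hypothetical. The rearranging and renumbering from phase 2 are left out because the record layout was not posted.

```python
import io
import re

# Hypothetical sample: the second group is missing its |F record.
DATA = """|H first header
|F first fields
|R first result
|H second header
|R result with no F record
|H third header
|F third fields
|R third result
"""

def collect_groups(text):
    """Phase 1: split the text into groups, treating every |H line as a group start."""
    parts = re.split(r"(?m)^(?=\|H)", text)  # zero-width split needs Python 3.7+
    return [p for p in parts if p.strip()]

def record_tags(group):
    """List the record letters (H, F, R, ...) of the lines that start with '|'."""
    return [line[1] for line in group.splitlines() if line.startswith("|") and len(line) > 1]

def process(groups):
    """Phase 2: keep complete H-F-R groups and set the rest aside for review."""
    good, errors = [], []
    for group in groups:
        if record_tags(group) == ["H", "F", "R"]:
            good.append(group)
        else:
            errors.append(group)
    return good, errors

def output(good, errors, out, err):
    """Phase 3: write good groups to one stream and problem groups to another."""
    out.writelines(good)
    err.writelines(errors)

good, errors = process(collect_groups(DATA))
out, err = io.StringIO(), io.StringIO()
output(good, errors, out, err)
print(out.getvalue(), end="")
print("--- needs review ---")
print(err.getvalue(), end="")
```

You could pass real file objects opened with `open(..., "w")` instead of `io.StringIO`. That would write the problem groups to an error file for review, as Smile suggested.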
