On Dec 8, 2004, at 14:42, Jesse Noller wrote:
Hello,
I'm trying to do some text processing with Python on a fairly large text file (actually XML, but I am handling it as plain text, as all I need to do is find/replace/move). I am having problems identifying two lines in the text file, removing everything in between those two lines (but not the two lines themselves), and then writing the file back (I know the file I/O part).
Okay, here are some hints: you need to identify when you enter a block (at <foo>) and when you leave it (at </foo>), keeping in mind that both may happen on the same line (e.g. <foo>blah</foo>). The rest is trivial.
The rest of your message is included as spoiler space in case you want to find the solution by yourself -- however, a short program that does the job is included at the end of this message. It prints the resulting file to standard output, for added flexibility: if you want the result in a file, just redirect stdout (python blah.py > out.txt).
Oh, one last thing: don't use readlines(), it uses up a lot of memory (especially with big files), and you don't need it since you're reading the file sequentially. Use the file iterator instead.
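To illustrate the point about the file iterator, here is a minimal sketch. The helper name count_matching_lines and the throwaway demo file are made up for this example; they are not part of your problem.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count the lines containing `needle`, reading one line at a time."""
    count = 0
    with open(path) as source:
        # Iterating over the file object yields one line per step,
        # so the whole file is never loaded into a list.
        for line in source:
            if needle in line:
                count += 1
    return count


# Small self-contained demo on a temporary file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("<foo>\nsome text\n</foo>\n")
    demo_path = tmp.name

demo_count = count_matching_lines(demo_path, "foo")
os.remove(demo_path)
```

With readlines() the loop body would look the same, but the list of all lines would be built before the first iteration starts.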
I'm trying to do this with the re module - the two tags look like:
<foo> ... a bunch of text (~1500 lines) ... </foo>
I need to identify the first tag, and the second, and unconditionally strip out everything in between those two tags, making it look like:
<foo> </foo>
I'm familiar with using read/readlines to pull the file into memory and alter the contents via string.replace(str, newstr) but I am not sure where to begin with this other than the typical open/readlines.
I'd start with something like:
import re

re1 = re.compile(r'^<foo>')
re2 = re.compile(r'^</foo>')

f = open('foobar.txt', 'r')
for line in f.readlines():
    match = re.match(re1, line)
But I'm lost after this point really, as I can identify the two lines, but I am not sure how to do the processing.
thank you
-jesse
_______________________________________________
Tutor maillist - [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor
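#!/usr/bin/env python3

import re

reStart = re.compile(r'^\s*<foo>')
reEnd = re.compile(r'</foo>\s*$')

inBlock = False

with open('foobar.txt') as fileSource:
    for line in fileSource:
        # A closing tag ends the block; the tag line itself is kept.
        if reEnd.search(line):
            inBlock = False
        if not inBlock:
            # The line already ends with a newline, so don't add another.
            print(line, end='')
        # An opening tag starts the block, unless it closes on the same line.
        if reStart.match(line) and not reEnd.search(line):
            inBlock = True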
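If the tags can share a line with other text, a slightly longer variant may be worth considering. This is only a sketch under a few assumptions: the names START, END and strip_foo_block are made up for illustration, at most one <foo> block starts on any line, and blocks are not nested. It keeps both tags and drops whatever sits between them, even when they are on the same line.

```python
import re

# Hypothetical names; only the tag name "foo" comes from the thread.
START = re.compile(r'<foo>')
END = re.compile(r'</foo>')


def strip_foo_block(lines):
    """Yield lines with everything between <foo> and </foo> removed.

    Text up to and including <foo>, and from </foo> onward, is kept.
    """
    in_block = False
    for line in lines:
        if in_block:
            end = END.search(line)
            if end is not None:
                # Leave the block, keeping the closing tag and what follows.
                yield line[end.start():]
                in_block = False
            continue
        start = START.search(line)
        if start is None:
            # Ordinary line outside any block: pass it through unchanged.
            yield line
            continue
        end = END.search(line, start.end())
        if end is not None:
            # Both tags on one line: cut out only the text between them.
            yield line[:start.end()] + line[end.start():]
        else:
            # Block opens here: keep the text up to the opening tag.
            yield line[:start.end()] + '\n'
            in_block = True
```

To run it over a file, something like sys.stdout.writelines(strip_foo_block(open('foobar.txt'))) should do, since the generator accepts any iterable of lines.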
-- Max
maxnoel_fr at yahoo dot fr -- ICQ #85274019
"Look at you hacker... A pathetic creature of meat and bone, panting and sweating as you run through my corridors... How can you challenge a perfect, immortal machine?"