Hello,

I'm trying to do some text processing with Python on a fairly large
text file (actually XML, but I am handling it as plain text, as all I
need to do is find/replace/move). I am having trouble identifying two
lines in the file, removing everything between them (but not the two
lines themselves), and then writing the file back (I know the file IO
part).

I'm trying to do this with the re module. The two tags look like:

<foo>
    ...
    a bunch of text (~1500 lines)
    ...
</foo>

I need to identify the first tag, and the second, and unconditionally
strip out everything in between those two tags, making it look like:

<foo>
</foo>

I'm familiar with using read/readlines to pull the file into memory
and alter the contents via string.replace(str, newstr) but I am not
sure where to begin with this other than the typical open/readlines.

I'd start with something like:

import re

# Raw strings; <, > and / need no escaping in a regex.
re1 = re.compile(r'^<foo>')
re2 = re.compile(r'^</foo>')

f = open('foobar.txt', 'r')
for line in f.readlines():
    match = re1.match(line)

But I'm lost after this point really, as I can identify the two lines,
but I am not sure how to do the processing.
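
The closest I can picture is a flag-based pass over the lines, something
like the rough sketch below. It is only a guess at the approach: it
assumes each tag starts its own line and that the tags are never nested,
and the name strip_between and the sample text are made up for
illustration.

```python
import re

# Same patterns as above: each tag is assumed to start its own line.
OPEN_TAG = re.compile(r'^<foo>')
CLOSE_TAG = re.compile(r'^</foo>')

def strip_between(lines):
    """Return the lines with everything between <foo> and </foo> dropped.

    The two tag lines themselves are kept. Hypothetical helper name.
    """
    kept = []
    inside = False  # True while we are between the two tags
    for line in lines:
        if inside:
            if CLOSE_TAG.match(line):
                inside = False
                kept.append(line)  # keep the closing tag
            # any other line inside the block is skipped
        else:
            kept.append(line)  # also keeps the opening tag line
            if OPEN_TAG.match(line):
                inside = True
    return kept

# Made-up sample standing in for the real file contents.
sample = ["<doc>\n", "<foo>\n", "    some text\n", "    more text\n",
          "</foo>\n", "</doc>\n"]
result = strip_between(sample)
print("".join(result), end="")
```

With the real file I would presumably pass in f.readlines() and write
the returned list back with writelines(), but I don't know whether this
is the sensible way to go about it.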

thank you
-jesse
_______________________________________________
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor
