I would use a loop with a flag that tracks whether you are inside the <foo> block. If the tags are always <foo> and </foo>, each on a line by itself, you don't need a regular expression:

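lines = []
appending = True  # False while we are inside the <foo> block
with open('foobar.txt', 'r') as f:
    for line in f:
        if appending:
            lines.append(line)
            # Keep the opening tag itself, then stop collecting
            if line.strip() == '<foo>':
                appending = False
        elif line.strip() == '</foo>':
            # Keep the closing tag and resume collecting
            appending = True
            lines.append(line)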

At the end of this loop, lines will hold exactly the lines you want to write back.
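
If it helps, here is a rough sketch of the same flag loop wrapped in a function, plus the write-back step. The names strip_foo_block and rewrite_file are just ones I made up for illustration, and it still assumes each tag sits on a line by itself:

```python
def strip_foo_block(lines, open_tag='<foo>', close_tag='</foo>'):
    """Return the lines with everything between the two tag lines dropped.

    The tag lines themselves are kept. Assumes each tag is on a line
    by itself (surrounding whitespace is ignored).
    """
    kept = []
    appending = True  # False while we are between the tags
    for line in lines:
        if appending:
            kept.append(line)
            if line.strip() == open_tag:
                appending = False
        elif line.strip() == close_tag:
            appending = True
            kept.append(line)
    return kept


def rewrite_file(path):
    # Read everything first, then reopen for writing, so the file is
    # never being read and truncated at the same time.
    with open(path, 'r') as f:
        kept = strip_foo_block(f)
    with open(path, 'w') as f:
        f.writelines(kept)
```

Calling rewrite_file('foobar.txt') would then replace the file in place, so try it on a copy first.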

Kent

Jesse Noller wrote:
Hello,

I'm trying to do some text processing with python on a fairly large
text file (actually, XML, but I am handling it as plaintext as all I
need to do is find/replace/move) and I am having problems with trying
to identify two lines in the text file, and remove everything in
between those two lines (but not the two lines) and then write the
file back (I know the file IO part).

I'm trying to do this with the re module - the two tags look like:

<foo>
    ...
    a bunch of text (~1500 lines)
    ...
</foo>

I need to identify the first tag, and the second, and unconditionally
strip out everything in between those two tags, making it look like:

<foo>
</foo>

I'm familiar with using read/readlines to pull the file into memory
and alter the contents via string.replace(str, newstr) but I am not
sure where to begin with this other than the typical open/readlines.

I'd start with something like:

import re

re1 = re.compile(r'^<foo>')
re2 = re.compile(r'^</foo>')

f = open('foobar.txt', 'r')
for line in f.readlines():
    match = re1.match(line)

But I'm lost after this point really, as I can identify the two lines,
but I am not sure how to do the processing.

thank you
-jesse
_______________________________________________
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor
