Hi,

I have a file containing a long list of records, roughly in the format

[EMAIL PROTECTED]

So, for example:

[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
....

What I would like to do is run a regular expression against this and
wind up with:

[EMAIL PROTECTED]@[EMAIL PROTECTED]@data4
[EMAIL PROTECTED]
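For reference, the transformation being asked for (join consecutive records that share a key) can be sketched without regular expressions using `itertools.groupby`. This is a swapped-in technique, not the regex approach from the question. It assumes each line is `key@data` with no `@` in the key. The `key1@data1` sample lines and the `merge_records` helper are hypothetical, because the real records were obscured in the archive.

```python
from itertools import groupby

# Hypothetical sample data: assumes each line is "key@data" and the key
# itself contains no "@".
text = """key1@data1
key1@data2
key1@data3
key1@data4
key2@data5"""

def merge_records(text):
    """Join consecutive lines sharing a key into one key@d1@d2@... line."""
    # Split each line once, on the first "@", into [key, data].
    pairs = [line.split("@", 1) for line in text.splitlines()]
    merged = []
    # groupby only groups *adjacent* items with the same key, which matches
    # the "consecutive records" case in the question.
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        merged.append("@".join([key] + [data for _, data in group]))
    return "\n".join(merged)

print(merge_records(text))
# key1@data1@data2@data3@data4
# key2@data5
```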

So I ran the following regex against the string:

re.compile(r'([EMAIL PROTECTED])@(.*)\n\1@(.*)').sub(r'\1\2\3', string)

and I wound up with:

[EMAIL PROTECTED]@data2
[EMAIL PROTECTED]@data4
[EMAIL PROTECTED]
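Getting only pairs is expected behavior: `re.sub` replaces non-overlapping matches, scanning left to right, and never revisits text that a previous match already consumed. The following is a minimal sketch, assuming `key@data` records. The `([^@\n]+)` group is a hypothetical stand-in for the key group that was lost from the original pattern, and `@` separators are added to the replacement to match the output shown.

```python
import re

# Hypothetical "key@data" records (the real data was obscured).
text = "key1@data1\nkey1@data2\nkey1@data3\nkey1@data4\nkey2@data5"

# Assumed stand-in pattern: a key at the start of a line, its data, then a
# following line with the same key.
pattern = re.compile(r'^([^@\n]+)@(.*)\n\1@(.*)', re.MULTILINE)

# Each match consumes two lines, and the scan resumes after the match,
# so only pairs are merged in a single pass.
once = pattern.sub(r'\1@\2@\3', text)
print(once)
# key1@data1@data2
# key1@data3@data4
# key2@data5
```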

So, my questions are:
(1) Is there any way to get a single regular expression to handle
overlapping matches so that I get what I want in one call?
(2) Is there any way (without comparing the before and after strings) to
know if a re.sub(...) call did anything?

I suppose I could do something like:

pattern = re.compile(r'([EMAIL PROTECTED])@(.*)\n\1@(.*)')

while pattern.search(string):
    string = pattern.sub(r'\1\2\3', string)

but I would like to avoid the explicit loop if possible...
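On (2): `re.subn()` (and the compiled pattern's `subn()` method) returns a tuple of the new string and the number of substitutions made, so a count of zero tells you nothing changed. The following sketch builds the loop on that. The pattern and data are hypothetical, as before, and `@` separators are assumed in the replacement.

```python
import re

# Hypothetical "key@data" records and an assumed stand-in pattern.
text = "key1@data1\nkey1@data2\nkey1@data3\nkey1@data4\nkey2@data5"
pattern = re.compile(r'^([^@\n]+)@(.*)\n\1@(.*)', re.MULTILINE)

# subn returns (new_string, number_of_substitutions), so the count says
# whether anything changed without a separate search() or a string compare.
passes = 0
while True:
    text, n = pattern.subn(r'\1@\2@\3', text)
    if n == 0:
        break
    passes += 1

print(text)
print(passes)
```

With this data the loop makes two merging passes before a third pass reports zero substitutions.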

Actually, should I be able to do something like that? When I execute it
in my debugger, my string gets really funky, as if the re is losing
track of what the groups are, and I end up with a single very long
string rather than what I expect.


Any help on this would be appreciated.

-jdc




_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
