On 02/03/2011 07:47 PM, Karim wrote:
On 02/03/2011 02:15 PM, Peter Otten wrote:
Karim wrote:
I am trying to substitute a '""' pattern with '\"\"', namely to escape two
consecutive double quotes:
* In the Python interpreter:
$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> expression = ' "" '
>>> re.subn(r'([^\\])?"', r'\1\\"', expression)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group
But if I remove '?' I get the following:
>>> re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)
Only one substitution... But this is not the same regex. And
count=2 does nothing. By default all occurrences should be substituted.
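A note on why only one substitution happens here: matches found by re never overlap, and without the '?' the pattern has to consume the character in front of each quote. The following is only a minimal sketch, assuming the same ' "" ' input as above; the helper name match_spans is made up for illustration and is not part of the re module.

```python
import re

def match_spans(pattern, text):
    """Return the (start, end) span of every non-overlapping match."""
    return [m.span() for m in re.finditer(pattern, text)]

expression = ' "" '

# Without '?', the group must consume one non-backslash character,
# so the first match covers the leading space plus the first quote.
# A match starting at the second quote would need another quote
# right after it, but a space follows, so there is no second match.
required = match_spans(r'([^\\])"', expression)

# With '?' inside the group, the group may match the empty string,
# so the second quote can be matched on its own.
optional = match_spans(r'([^\\]?)"', expression)
```

Comparing the two lists of spans shows where the second quote gets lost: it is never the end of a match in the first pattern, because the character before it was already consumed.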
* On Linux, using my good old sed command, it works with my '?'
(0-1 match):
$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
\"\"
Indeed, what's the matter with the re module!?
You should really fix the problem with your email program first;
It is a Thunderbird issue with bold type (it appears as stars), but I
don't know how to fix it yet.
afterwards
it's probably a good idea to try and explain your goal clearly, in plain
English.
I already did (cf. the earlier mails in this thread). To summarize: I pass
the expression string to a Tcl command, which delimits strings with double
quotes only.
So I get an error with nested double quotes => that's the key problem.
Yes. What Steven said ;)
Now to your question as stated: if you want to escape two consecutive
double quotes, that can be done with
s = s.replace('""', r'\"\"')
I have already done that as a workaround, but I have to add another
replacement before it to cover all the other cases.
I want to make the original command work so that I can drop the workaround.
but that's probably *not* what you want. Assuming you want to escape two
consecutive double quotes and make sure that the first one isn't already
escaped,
You hit it !:-)
this is my attempt:
>>> def sub(m):
...     s = m.group()
...     return r'\"\"' if s == '""' else s
...
>>> print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" \\" \"')
That is not what I want. I want to escape any " that is not
already escaped.
The sed regex 's/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have
been writing regexes on Unix for 15 years).
To me the equivalent Python regex looks buggy: r'([^\\])?"', r'\1\\"'
Why is '?' not accepted? It means a character that is not a backslash,
with 0 or 1 occurrence. This is quite simple.
I am a poor tradesman but I don't deny the evidence.
Recall:
>>> re.subn(r'([^\\])?"', r'\1\\"', expression)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group
Found the solution: '?' needs to be inside the parentheses (the saved
pattern), because when it is outside, the group may not take part in the
match, and then the saved match argument '\1' may not exist.
>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)
(' \\"\\" ', 2)
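The difference between the two patterns can be seen directly on the match object. This is only a sketch, using a lone quote as a hypothetical input: with '?' outside the parentheses the group takes no part in the match and its value is None, whereas with '?' inside it takes part and matches the empty string. On Python 2.7 it is that None which leads to the "unmatched group" error when the template expands \1; as far as I know, Python 3.5 and later substitute an empty string instead.

```python
import re

# '?' outside the group: when nothing precedes the quote, the group
# does not participate in the match at all.
outside = re.search(r'([^\\])?"', '"')

# '?' inside the group: the group always participates, possibly
# matching the empty string.
inside = re.search(r'([^\\]?)"', '"')

outside_value = outside.group(1)  # None: the group did not participate
inside_value = inside.group(1)    # '': the group matched an empty string
```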
The Unix sed command is more permissive: sed 's/\([^\\]\)\?"/\1\\"/g'
works because '?' can be outside the parentheses (the saved pattern, escaped
for sed). There \1 does not seem to cause an issue when the group matches
nothing. Perhaps it is created only when a match occurs.
MORAL:
1) Python's behaviour is logical and I must understand what I am doing with it.
2) sed is a fantastic tool because it copes with a missing match value.
3) I am a really poor tradesman
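For the stated goal (escape any " that is not already escaped), a negative lookbehind may be a simpler alternative, since it avoids the capturing group altogether. This is a sketch under that assumption, not the solution found above; the function name escape_unescaped_quotes is made up for illustration.

```python
import re

def escape_unescaped_quotes(text):
    """Put a backslash before every '"' not directly preceded by one."""
    # (?<!\\) is a negative lookbehind: it checks the previous character
    # without consuming it, so adjacent quotes are each matched.
    return re.sub(r'(?<!\\)"', r'\\"', text)

escaped = escape_unescaped_quotes(' "" ')
```

One caveat: a quote that follows an escaped backslash (\\") is treated here as already escaped, which may or may not be what Tcl expects.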
Regards
Karim
\\\"" \\\"\" \"" \"\" \\\" \\" \"
Compare that with
$ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
\\\"\" \\"\" \"\" \"\" \\\\" \\\" \\"
Concerning the exception and the discrepancy between sed and Python's
re, I suggest that you ask again on comp.lang.python, aka the python-list
mailing list, where at least one regex guru will read it.
Peter
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor