On 02/03/2011 07:47 PM, Karim wrote:
On 02/03/2011 02:15 PM, Peter Otten wrote:
Karim wrote:

I am trying to substitute a '""' pattern with '\"\"', namely to escape 2
consecutive double quotes:

     * *In Python interpreter:*

$ python
Python 2.7.1rc1 (r271rc1:86455, Nov 16 2010, 21:53:40)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> expression = ' "" '
>>> re.subn(r'([^\\])?"', r'\1\\"', expression)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

But if I remove '?' I get the following:

>>>  re.subn(r'([^\\])"', r'\1\\"', expression)
(' \\"" ', 1)

Only one substitution... _But this is not the same regex._ And
count=2 does nothing. By default all occurrences should be substituted.
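
A short sketch of why only one substitution happens here (written for Python 3; the variable names are only illustrative). Matches found by re never overlap, and the first match already consumes the space plus the first quote, so the second quote has no unconsumed non-backslash character in front of it. The count argument is only an upper limit, so count=2 cannot create a second match.

```python
import re

expression = ' "" '

# Every non-overlapping match of "one non-backslash character, then a quote".
spans = [m.span() for m in re.finditer(r'([^\\])"', expression)]
print(spans)   # [(0, 2)]

# count is a maximum number of substitutions, not a target.
result = re.subn(r'([^\\])"', r'\1\\"', expression, count=2)
print(result)  # (' \\"" ', 1)
```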

* *On linux using my good old sed command, it is working with my '?'
       (0-1 match):*

$ echo ' "" ' | sed 's/\([^\\]\)\?"/\1\\"/g'
   \"\"

*Indeed what's the matter with RE module!?*
You should really fix the problem with your email program first;
It is a Thunderbird issue with bold type (it shows up as stars), but I don't know how to fix it yet.
  afterwards
it's probably a good idea to try and explain your goal clearly, in plain
English.

I already did (cf. the earlier mails in this thread). But to summarize: I pass the expression string to a TCL command, which delimits strings with double quotes only.
Indeed I get an error with nested double quotes => that's the key problem.
Yes. What Steven said ;)

Now to your question as stated: if you want to escape two consecutive double
quotes that can be done with

s = s.replace('""', r'\"\"')
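
One thing to watch with a replacement like this: in an ordinary Python string literal, \" is just an escaped quote, so '\"\"' is the same two-character string as '""'. A raw string keeps the backslashes. A minimal check (the names plain, raw and s are only illustrative):

```python
plain = '\"\"'   # \" is an escape for ", so this is just two quote characters
raw = r'\"\"'    # raw string: backslash, quote, backslash, quote

print(plain == '""')   # True
print(len(raw))        # 4

s = 'a "" b'
print(s.replace('""', raw))   # a \"\" b
```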

I have already done it as a workaround, but I have to add another replacement beforehand to cover all the other cases.
I want to make the original command work so that I can drop the workaround.


but that's probably *not* what you want. Assuming you want to escape two
consecutive double quotes and make sure that the first one isn't already
escaped,

You hit it !:-)

this is my attempt:

>>> import re
>>> def sub(m):
...     s = m.group()
...     return r'\"\"' if s == '""' else s
...
>>> print re.compile(r'[\\].|""').sub(sub, r'\\\"" \\"" \"" "" \\\" \\" \"')

That is not the thing I want. I want to escape any " that is not already escaped. The sed regex '/\([^\\]\)\?"/\1\\"/g' is exactly what I need (I have been writing regexes on Unix for 15 years).

For me the equivalent Python regex is buggy: r'([^\\])?"', r'\1\\"'
'?' is not accepted. Why? It means a character which is not a backslash, with 0 or 1 occurrence. This is quite simple.
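
For the stated goal (escape any " that is not already escaped), a negative lookbehind is one possible alternative to the capturing group. This is a different technique from the sed expression, not a translation of it, and the function name below is made up for illustration. It only handles the simple case: a quote that follows an escaped backslash (\\") is still treated as already escaped.

```python
import re

def escape_unescaped_quotes(text):
    """Put a backslash before each " not directly preceded by a backslash."""
    # (?<!\\) is a zero-width check, so no character is consumed before the quote.
    return re.sub(r'(?<!\\)"', r'\\"', text)

print(repr(escape_unescaped_quotes(' "" ')))
print(repr(escape_unescaped_quotes(r'say \"hi\"')))
```

Because the lookbehind consumes nothing, two adjacent quotes are both matched, which is the case the pattern without '?' misses.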

I am a poor tradesman but I don't deny evidence.

Recall:

>>> re.subn(r'([^\\])?"', r'\1\\"', expression)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 162, in subn
    return _compile(pattern, flags).subn(repl, string, count)
  File "/home/karim/build/python/install/lib/python2.7/re.py", line 278, in filter
    return sre_parse.expand_template(template, match)
  File "/home/karim/build/python/install/lib/python2.7/sre_parse.py", line 787, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group


Found the solution: '?' needs to be inside the parentheses (the saved pattern), because when it is outside, the group may not take part in the match at all,
and then '\1' has no value to substitute.

>>> re.subn(r'([^\\]?)"', r'\1\\"', expression)

(' \\"\\" ', 2)

The Unix sed command is more permissive: sed 's/\([^\\]\)\?"/\1\\"/g' works with '?' outside the parentheses (the saved pattern, escaped for sed). \1 seems not to cause any issue there. Perhaps it is created only when a match occurs.
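
The difference between the two Python patterns can be seen directly on the match object. This is a small sketch in Python 3 (the names outside, inside and result are only illustrative). In the Python 2.7 session above, a group that did not take part in the match is what triggers "unmatched group" in the replacement template; as far as I know, Python 3.5 and later substitute an empty string instead of raising.

```python
import re

# '?' outside the parentheses: when only a quote is matched,
# group 1 does not take part in the match at all.
outside = re.match(r'([^\\])?"', '"')
print(outside.group(1))        # None

# '?' inside the parentheses: group 1 always takes part,
# possibly matching the empty string.
inside = re.match(r'([^\\]?)"', '"')
print(repr(inside.group(1)))   # ''

# With the group always defined, \1 is safe to use in the replacement.
result = re.subn(r'([^\\]?)"', r'\1\\"', ' "" ')
print(result)                  # (' \\"\\" ', 2)
```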

MORAL:

1) Python's behaviour is logical and I must understand what I am doing with it.
2) sed is a fantastic tool because it copes with a missing match value.
3) I am a really poor tradesman

Regards
Karim



\\\"" \\\"\" \"" \"\" \\\" \\" \"

Compare that with

$ echo '\\\"" \\"" \"" "" \\\" \\" \"' | sed 's/\([^\\]\)\?"/\1\\"/g'
\\\"\" \\"\" \"\" \"\" \\\\" \\\" \\"

Concerning the exception and the discrepancy between sed and python's re, I
suggest that you ask it again on comp.lang.python aka the python-list
mailing list where at least one regex guru will read it.

Peter

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
