On 06/07/2008 5:37 PM, (Ted Harding) wrote:
On 06-Jul-08 21:17:04, Duncan Murdoch wrote:
I'm trying to write a gsub() call that takes a string and escapes all
the unescaped quote marks in it. So the string
\"
would be left unchanged, but
\\"
would be changed to
\\\"
because the double backslash doesn't act as an escape for the quote;
the first backslash just escapes the second. I have the usual problems of
writing regular expressions involving backslashes which make
everything I write completely unreadable, so I'm going to change
the problem for this post: I will define E to be the escape
character, and q to be the quote; the gsub() call would leave
Eq
unchanged, but would change
EEq
to EEEq, etc.
The expression I have come up with after this change is
gsub( "((^|[^E])(EE)*)q", "\\1Eq", x)
i.e. "(start of line, or non-escape, followed by an even number of
escapes), all of which we call expression 1, followed by a quote,
is replaced by expression 1 followed by an escape and a quote".
This works sometimes, but not always:
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "Eq")
[1] "Eq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "EEq")
[1] "EEEq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qaq")
[1] "EqaEq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qq")
[1] "qEq"
Notice that in the final example, the first quote doesn't get escaped.
Why not?
I think (without having done the "experimental diagnostics")
that it's because in "qq" the first q matches (^|[^E]) because
it matches [^E] (i.e. is a "non-escape"); since it is followed
by q, it is the second q which gets the escape. Possibly you
need to include "^q" as an additional alternative match at the
start of the line.
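One way to check this diagnosis is to make the captured group visible in the replacement. The sketch below reuses the pattern from the original post; the square brackets in the replacement are only an illustrative marker, not part of any intended fix.

```r
# Pattern from the original post: E is the escape, q is the quote.
pat <- "((^|[^E])(EE)*)q"

# Wrap group 1 in brackets to see what it captured.
marked <- gsub(pat, "[\\1]Eq", "qq")
print(marked)

# The original replacement, for comparison.
plain <- gsub(pat, "\\1Eq", "qq")
print(plain)   # "qEq", as reported in the original post
```

If the diagnosis is right, the first q shows up inside the brackets: it was consumed as the "non-escape" character in front of the second q, so it is never available to be matched as a quote itself.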
Thanks, that sounds right, but now I can't see how to fix it. Is there
syntax to say: match A only if it follows B, but don't match the B part?
Duncan Murdoch
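On the syntax question: Perl-style regular expressions (perl = TRUE in gsub()) have a lookbehind assertion, (?<=B)A, which matches A only when it is preceded by B and leaves B out of the match. As far as I know the lookbehind has to be of bounded length, so it cannot express "an even number of escapes" directly. One possible workaround, which is not from this thread, is to let the pattern consume each escape together with the character it escapes and then discard that match with the PCRE backtracking verbs (*SKIP)(*F), so that only unescaped quotes are left to replace. The name escape_quotes below is hypothetical, and the whole thing is a sketch rather than a tested solution for every input.

```r
# Lookbehind: match q only when preceded by a, without consuming the a.
gsub("(?<=a)q", "Eq", "aqbq", perl = TRUE)

# Hypothetical helper: escape every q that is not already escaped by E.
# "E." consumes an escape plus the character it escapes; (*SKIP)(*F)
# then rejects that match and resumes scanning after it, so the
# second alternative only ever sees unescaped quotes.
escape_quotes <- function(x) {
  gsub("E.(*SKIP)(*F)|q", "Eq", x, perl = TRUE)
}

escape_quotes(c("Eq", "EEq", "qaq", "qq"))
```

With the four test strings from the original post, this is meant to leave "Eq" alone and return "EEEq", "EqaEq" and "EqEq" for the others. Because the escape and the character after it are consumed as a pair, no lookbehind is needed, and the first q in "qq" is no longer swallowed by a preceding [^E].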