I'm trying to write a gsub() call that takes a string and escapes all the unescaped quote marks in it. So the string

\"

would be left unchanged, but

\\"

would be changed to

\\\"

because the double backslash doesn't act as an escape for the quote: the first backslash just escapes the second. I have the usual problem that regular expressions involving backslashes make everything I write completely unreadable, so I'm going to change the problem for this post. I will define E to be the escape character and q to be the quote; the gsub() call should leave

Eq

unchanged, but would change

EEq

to EEEq, etc.
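
The E/q stand-ins can be mapped to and from the real characters mechanically. The following is only a sketch: the helper names to_toy() and to_real() are made up for illustration, and it assumes the strings contain no literal E or q of their own.

```r
# Hypothetical helpers (not part of the original post) for switching between
# the real characters (backslash, double quote) and the stand-ins E and q.
# Assumption: the input has no literal "E" or "q" that must be preserved.
to_toy  <- function(x) chartr("\\\"", "Eq", x)   # backslash -> E, quote -> q
to_real <- function(x) chartr("Eq", "\\\"", x)   # E -> backslash, q -> quote

# cat() shows the characters themselves rather than R's escaped form.
cat(to_real("EEq"), "\n")
cat(to_toy("\\\\\""), "\n")
```

With helpers like these, a pattern can be developed on the readable E/q form and the result translated back afterwards.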

The expression I have come up with after this change is

gsub( "((^|[^E])(EE)*)q", "\\1Eq", x)

i.e. "(start of line, or non-escape, followed by an even number of escapes), all of which we call expression 1, followed by a quote, is replaced by expression 1 followed by an escape and a quote".

This works sometimes, but not always:

> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "Eq")
[1] "Eq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "EEq")
[1] "EEEq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qaq")
[1] "EqaEq"
> gsub( "((^|[^E])(EE)*)q", "\\1Eq", "qq")
[1] "qEq"

Notice that in the final example, the first quote doesn't get escaped. Why not?
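
One possible reading of the last result, offered as a guess rather than a confirmed diagnosis: the pattern consumes the character in front of the quote, and matches cannot overlap. In "qq" the output is consistent with a single match covering both characters, with [^E] matching the first q, so the first quote is used up as context and never treated as a quote itself. A sketch of one way around this, assuming perl = TRUE is acceptable, is to check the preceding character with a lookbehind, which does not consume it. The function name escape_quotes() is made up for illustration.

```r
# Toy alphabet from the post: E = escape character, q = quote.
# Assumption: perl = TRUE is acceptable, so PCRE lookbehind is available.
escape_quotes <- function(x) {
  # (?<!E)    : the run of escapes must not itself be preceded by an escape;
  #             this is zero-width, so the preceding character is not consumed
  # ((?:EE)*) : an even number of escapes (possibly zero), kept as group 1
  # q         : the unescaped quote
  gsub("(?<!E)((?:EE)*)q", "\\1Eq", x, perl = TRUE)
}

escape_quotes(c("Eq", "EEq", "qaq", "qq"))
```

Because the lookbehind only inspects the previous character, two adjacent quotes can each start their own match.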

Duncan Murdoch

