Hi,
I was recently unfortunate enough to have to use regular expressions to sort out some data in R. I'm working on a data file which contains taxonomic data of bacteria in hierarchical order.
A sample of this file can be generated using:

tax.data <- read.table(header=F, con <- textConnection('
G9SS7BA01D15EC  Bacteria(100)    Cyanobacteria(84)    unclassified
G9SS7BA01C9UIR Bacteria(100) Proteobacteria(94) Alphaproteobacteria(89)
G9SS7BA01CM00D Bacteria(100) Proteobacteria(99) Alphaproteobacteria(99)
'))
close(con)

What I am trying to do is remove the parentheses and the number inside them (which could contain a decimal point). I assumed that the following command would do it, but instead I got an error:

tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\(.*\)','',x)))
Error: '\(' is an unrecognized escape in character string starting "\("

It doesn't matter whether I use perl = TRUE or not.
To solve it I need to double the backslash ('\\') before the opening and the closing parenthesis:

tax.data <- as.data.frame(apply(tax.data, 2, function(x) gsub('\\(.*\\)','',x)))
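
For reference, here is a minimal sketch of the same double-backslash form applied to a plain character vector instead of the whole data frame. The vector `taxa` and its values (including the 99.5) are made up for illustration, and the character class `[0-9.]+` is my own narrowing of `.*` so that only digits and decimal points between the parentheses are matched:

```r
# Hypothetical sample values mirroring the taxonomy columns above
taxa <- c("Bacteria(100)", "Cyanobacteria(84)",
          "Proteobacteria(99.5)", "unclassified")

# The backslashes are doubled in the R string literal; the character
# class matches one or more digits or decimal points inside the parentheses
cleaned <- gsub("\\([0-9.]+\\)", "", taxa)

print(cleaned)
```

Entries without a parenthesised number, such as "unclassified", pass through unchanged.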

This yields the desired result, but I wonder why the doubled backslash is needed.
No other regular expression system I'm used to (e.g. Perl, shell) works like that.

I'm using R 2.14 (and also R 2.10), and I get the same results on Ubuntu and Windows XP.

I'd appreciate any explanation.

Thanks in advance,
baffled Roey

--
Dr. Roey Angel

Max-Planck-Institute for Terrestrial Microbiology
Karl-von-Frisch-Strasse 10
D-35043 Marburg, Germany

Office: +49 (0)6421/178-832
Mobile: +49 (0)176/612-785-88

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
