You can always convert to lower case afterwards, probably on a shorter vector. You did not indicate that you needed that conversion; it looked as if you did it only for the sake of the regular expression.
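A minimal sketch of one way to read that suggestion, building on the `canonicalize.language` quoted below: do the `sub` and the length checks on the mixed-case input, then call `tolower` only on the elements that survive as short codes. The function name `canonicalize.language.lower` and the sample input are hypothetical. Accepting an upper-case "C" is an assumption based on the original lower-casing everything first. Whether this is actually faster has not been measured here.

```r
# Sketch only: canonicalize first, lower-case last.
# The function name is hypothetical, not from the original thread.
canonicalize.language.lower <- function(s) {
  long <- nchar(s) == 5
  # Reduce "en_US" / "en-GB" style codes to their two-letter prefix.
  # [[:alpha:]] matches both cases, so no tolower is needed beforehand.
  s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$", "\\1", s[long])
  # Keep two-character codes and "c"/"C" (assumption: "C" should count too);
  # everything else becomes "unknown".
  keep <- nchar(s) == 2 | s %in% c("c", "C")
  s[!keep] <- "unknown"
  # tolower now only touches the short codes that were kept.
  s[keep] <- tolower(s[keep])
  s
}

canonicalize.language.lower(c("EN", "en_US", "fr-FR", "C", "english", "de"))
# returns c("en", "en", "fr", "c", "unknown", "de")
```

Missing values are not handled specially in this sketch; if the input can contain `NA`, that case would need its own rule.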
On Fri, Sep 14, 2012 at 3:13 PM, Sam Steingold <s...@gnu.org> wrote:
>> * jim holtman <wubyg...@tznvy.pbz> [2012-09-14 13:10:37 -0400]:
>>
>> more than half the time is in 'tolower' and 'nchar', so it is not all
>> 'sub's problem.
>
> aha, thanks!
>
>> This version runs a little faster since it does not need the 'tolower':
>>
>> canonicalize.language <- function (s) {
>>   # s <- tolower(s)
>>   long <- nchar(s) == 5
>>   s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$","\\1",s[long])
>>   s[nchar(s) != 2 & s != "c"] <- "unknown"
>>   s
>> }
>
> but it does not convert "EN" to "en", so it is not good for my purposes.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
> http://www.childpsy.net/ http://thereligionofpeace.com http://mideasttruth.com
> http://iris.org.il http://honestreporting.com http://memri.org
> Life is like Tetris: failures accumulate, successes fade.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.