> * jim holtman <wubyg...@tznvy.pbz> [2012-09-14 13:10:37 -0400]:
>
> more than half the time is in 'tolower' and 'nchar', so it is not all
> 'sub's problem.

aha, thanks!

> This version runs a little faster since it does not need the 'tolower':
>
> canonicalize.language <- function (s) {
>   # s <- tolower(s)
>   long <- nchar(s) == 5
>   s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$","\\1",s[long])
>   s[nchar(s) != 2 & s != "c"] <- "unknown"
>   s
> }

but it does not convert "EN" to "en", so it is not good for my purposes.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
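
One possible way to keep the lowercasing while paying less for 'tolower' and 'nchar' is to run them on the distinct values only and then map the results back. This is a sketch, not something proposed in the thread: the name canonicalize.language.unique is hypothetical, it assumes the input is a character vector without NAs, and it has not been timed here. Any gain would depend on how many duplicates the input contains.

```r
# Hypothetical variant (not from the thread): canonicalize only the
# distinct strings, then expand back to the original length.
canonicalize.language.unique <- function (s) {
  u <- unique(s)       # distinct input strings
  idx <- match(s, u)   # position of each element of s within u
  v <- tolower(u)      # so that "EN" becomes "en"
  long <- nchar(v) == 5
  # "xx_yy" or "xx-yy" is reduced to "xx"
  v[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$", "\\1", v[long])
  # anything that is neither a 2-letter code nor "c" is unknown
  v[nchar(v) != 2 & v != "c"] <- "unknown"
  v[idx]
}

canonicalize.language.unique(c("EN", "en_US", "pt-BR", "c", "english"))
```

With many repeated codes, 'unique' and 'match' replace most of the string work with hashing; with mostly distinct strings they are just overhead, so the original function may well be the faster one.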