Re: [R] please comment on my function

2012-09-14 Thread jim holtman

You can always convert to lower case afterwards, probably on a shorter vector. You did not indicate that you needed that conversion; it only looked like you did it for the regular expression.

On Fri, Sep 14, 2012 at 3:13 PM, Sam Steingold wrote:
>> * jim holtman [2012-09-14 13:10:37 -0400]:
>>
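A minimal sketch of that idea, assuming the goal is a two-letter language code or "c": match case-insensitively first, then lower-case only the elements that survive. The function name `canonicalize.sketch`, the `unknown` argument and its default label are hypothetical and are not taken from the original post; inputs are assumed to contain no NA values.

```r
# Hypothetical variant for illustration only; not the code from the thread.
canonicalize.sketch <- function(s, unknown = "unknown") {
  # Strip a region suffix such as "-US" or "_fr" without lower-casing first.
  long <- nchar(s) == 5
  s[long] <- sub("^([a-z]{2})[-_][a-z]{2}$", "\\1", s[long],
                 ignore.case = TRUE)
  # Keep two-character codes and the "C" locale; everything else is unknown.
  keep <- nchar(s) == 2 | s %in% c("c", "C")
  s[!keep] <- unknown
  # tolower() now runs only on the kept elements.
  s[keep] <- tolower(s[keep])
  s
}

canonicalize.sketch(c("en-US", "FR_fr", "C", "xyz", "de"))
```

Whether this is actually faster depends on how many elements are discarded before `tolower` runs, so it is worth profiling on real data.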

Re: [R] please comment on my function

2012-09-14 Thread Sam Steingold

> * jim holtman [2012-09-14 13:10:37 -0400]:
>
> more than half the time is in 'tolower' and 'nchar', so it is not all
> 'sub's problem.

aha, thanks!

> This version runs a little faster since it does not need the 'tolower':
>
> canonicalize.language <- function (s) {
>   # s <- tolower(s)
>   lo

Re: [R] please comment on my function

2012-09-14 Thread jim holtman

First thing to do is to run Rprof and see where the time is going; here it is from my computer:

        self.time self.pct total.time total.pct
tolower      4.42    39.46       4.42     39.46
sub          3.56    31.79       3.56     31.79
nchar
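For readers who have not used it, the profiling step above can be sketched roughly as follows. The test vector `x`, the repetition count and the sampling interval are made up for illustration, and the three calls simply mirror the functions named in the profile rather than the exact code that was timed. This assumes an R build with profiling support.

```r
# Made-up test data; the original poster's input is not shown in the thread.
x <- rep(c("en-US", "fr_FR", "C", "de"), 50000)

prof.file <- tempfile(fileext = ".out")
Rprof(prof.file, interval = 0.005)   # start the sampling profiler
for (i in 1:10) {                    # repeat so enough samples are collected
  y <- tolower(x)
  y <- sub("^([a-z]{2})[-_][a-z]{2}$", "\\1", y)
  n <- nchar(y)
}
Rprof(NULL)                          # stop profiling

prof <- summaryRprof(prof.file)
head(prof$by.self)                   # time attributed to each function itself
```

The `by.self` table has the same columns as the output quoted above (`self.time`, `self.pct`, `total.time`, `total.pct`); the numbers will differ by machine and input.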

[R] please comment on my function

2012-09-14 Thread Sam Steingold

this function is supposed to canonicalize the language:
--8<---cut here---start->8---
canonicalize.language <- function (s) {
  s <- tolower(s)
  long <- nchar(s) == 5
  s[long] <- sub("^([a-z]{2})[-_][a-z]{2}$","\\1",s[long])
  s[nchar(s) != 2 & s != "c"] <- "u