# Thanks again to everyone who provided suggestions.
# I was curious about which approach would be the fastest, so I did a little benchmarking.
# My approach was by far the worst :)
# The approach suggested by Duncan Murdoch and Peter Langfelder, based on
# indexing, was by far the fastest (~66 times faster than using nested
# ifelse()). All the details can be found below for those who are interested.
# I found it interesting that the variant by Peter Langfelder was somewhat
# slower, given that the only difference was explicitly defining the class in
# the index. What is the speed cost for this: O(n) or O(1)?

# I have one additional question. I would have guessed that initializing an
# empty vector of the right size would have sped up the subsequent operation
# (filling that vector), but it does not seem to have much of an effect.
# Any thoughts? i.e. using

N <- 6000000  # number of observations
# elevation <- rep(NA, length(Population))  # This does not really speed things up much.

#####

Population <- gl(n = 6, k = 5, length = N,
                 labels = c("Ga", "CO", "CN", "KO", "Ng", "Mw"))

# You would like to assign a particular value to each level of Population (in
# this case the elevation at which they were collected), using a vectorized
# approach (for speed... pretend this was a big data set).

# Just to make a vector of the right size, to speed up filling it.
# In practice it does not seem to speed things up.
elevation <- rep(NA, length(Population))

# My original approach
system.time(
  elevation <- ifelse(Population == "CO", 2169,
               ifelse(Population == "CN", 1121,
               ifelse(Population == "Ga", 500,
               ifelse(Population == "KO", 2500,
               ifelse(Population == "Mw", 625,
               ifelse(Population == "Ng", 300, NA))))))
)  # elapsed ~12s... by far the slowest approach!

# Suggestions

# Peter Langfelder
values <- c(500, 2169, 1121, 2500, 300, 625)
system.time(
  elevation.PL <- values[as.numeric(factor(Population))]
)  # ~0.85s
# Values need to be in the order in which the levels of the factor are sorted,
# i.e. with
Pop2 <- rep(c("Ga", "CO", "CN", "Ng", "KO", "Mw"), 10)
# levels(factor(Pop2)) come out in sorted order, not the order above, so
# these values would not work.
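The ordering caveat above can be sidestepped by naming the values and letting R line them up with the factor levels. The following is only a minimal sketch on a small toy factor; the names `pop`, `elev_by_code`, `lookup`, and `elev` are made up for illustration and do not come from the thread.

```r
# Toy factor whose level order (Ga, CO, CN, KO, Ng, Mw) is not alphabetical,
# mirroring the benchmark data but much smaller.
pop <- gl(n = 6, k = 5, length = 60,
          labels = c("Ga", "CO", "CN", "KO", "Ng", "Mw"))

# Named lookup vector (hypothetical name); the order of the entries does not matter.
elev_by_code <- c(CN = 1121, CO = 2169, Ga = 500, KO = 2500, Mw = 625, Ng = 300)

# Reorder the values once to match levels(pop), then index by integer code.
lookup <- unname(elev_by_code[levels(pop)])
elev <- lookup[as.integer(pop)]

# Cross-check against indexing by label, which is also safe for factors.
stopifnot(identical(elev, unname(elev_by_code[as.character(pop)])))
```

The reordering touches only the six levels, so the per-observation work is still a plain integer index, which is presumably why this family of approaches timed so well.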
# or
codeToElev <- data.frame(codes = c("CO", "CN", "Ga", "KO", "Mw", "Ng"),
                         elev = c(2169, 1121, 500, 2500, 625, 300))
system.time(
  elevation.PL.2 <- codeToElev$elev[match(Population, codeToElev$codes)]
)  # ~0.5s elapsed

# Duncan Murdoch suggested:
# In a case like this, often indexing is clearer than ifelse. For example,
results <- c(CN = 1121, CO = 2169, Ga = 500, KO = 2500, Mw = 625, Ng = 300)
system.time(
  elevation.DM <- results[as.character(Population)]
)  # 0.181s elapsed (timed as results[Population])
# One followup: don't do this (results[Population]) if Population is a factor.
# It will index by the numeric values rather than the labels. You only get the
# same answer when the labels in "results" are in the same order as the factor
# levels, and you won't in general. The levels here are Ga, CO, CN, KO, Ng, Mw,
# hence the as.character() above.
# Generally vector indexing of atomic vectors and matrices is very fast;
# indexing of data frames is much slower, so if speed is an issue, avoid them.

# Jorge Ivan Velez suggests looking at recode in the car package.
library(car)
system.time(
  elevation.JIV <- recode(Population,
    " 'CN'=1121; 'CO'=2169; 'Ga' = 500; 'KO' = 2500; 'Mw' = 625; 'Ng' = 300 ",
    as.factor.result = FALSE)
)  # ~3.5s elapsed

# David Winsemius suggests
system.time(
  elevation.DW <- (Population == "CO") * 2169 +
                  (Population == "CN") * 1121 +
                  (Population == "Ga") * 500 +
                  (Population == "KO") * 2500 +
                  (Population == "Mw") * 625 +
                  (Population == "Ng") * 300
)  # ~3.2s elapsed

# Jeff Newmiller suggested using merge... not implemented.

# Dennis Murphy suggested switch. I have not gotten it working as a single
# call, because switch() takes one value rather than a vector; applying it
# element by element runs (not timed):
elevation.DMy <- vapply(as.character(Population), function(p)
  switch(p, "CO" = 2169, "CN" = 1121, "Ga" = 500,
            "KO" = 2500, "Mw" = 625, "Ng" = 300),
  numeric(1), USE.NAMES = FALSE)

On 26 May 2010 01:25, Ian Dworkin <idwor...@msu.edu> wrote:
> # This is more about trying to find a more efficient way to code some
> simple vectorized computations using ifelse().
>
> # Say you have some vector representing a factor with a number of
> levels (6 in this case), representing the location that samples were
> collected.
>
> Population <- gl( n=6, k=5,length=120, labels =c("CO", "CN","Ga","KO",
> "Mw", "Ng"))
>
>
> # You would like to assign a particular value to each level of
> population (in this case the elevation at which they were collected).
> In a vectorized approach (for speed... pretend this was a big data
> set..)
>
> elevation <- ifelse(Population=="CO", 2169,
>        ifelse(Population=="CN", 1121,
>        ifelse(Population=="Ga", 500,
>        ifelse(Population=="KO", 2500,
>        ifelse(Population=="Mw", 625,
>        ifelse(Population=="Ng", 300, NA ))))))
>
> # Which is fine, but is a pain to write...
>
> # So I was trying to think about how to vectorize directly. i.e use
> vectors within the test, and for return values for T and F
>
> elevation.take.2 <- ifelse(Population==c("CO", "CN", "Ga", "KO",
> "Mw", "Ng"), c(2169, 1121, 500, 2500, 625, 300), c(NA, NA, NA, NA, NA,
> NA))
>
> # It makes sense to me why this does not work (elevation.take.2), but
> I am not sure how to get it to work. Any suggestions? I suspect it
> involves a trick using "any" or "||" or something, but I can't seem to
> work it out.
>
>
> # Thanks in advance!
>
> # Ian Dworkin
> # idwor...@msu.edu
>

--
Ian Dworkin
Assistant Professor
Department of Zoology
Program in Ecology, Evolutionary Biology & Behaviour
Program in Genetics
Michigan State University
office (517) 432-6733
lab (517) 432-6730
idwor...@msu.edu
https://www.msu.edu/~idworkin/