Hi R-helpers,

Does anyone know why wrapping the condition in which() makes row selection from a data frame faster than indexing with the logical vector directly? Doesn't which() technically add another conversion/function call on top of the logical selection? Here is a reproducible example with a slight difference in timing.
# Surrogate data - the timing here isn't interesting
urltext <- paste("https://drive.google.com/",
                 "uc?id=1AZ-s1EgZXs4M_XF3YYEaKjjMMvRQ7",
                 "-h8&export=download", sep="")
download.file(url=urltext, destfile="tempfile.csv")  # download file first
dat <- read.csv("tempfile.csv", stringsAsFactors = FALSE,
                header=TRUE, nrows=2.5e6)  # read the file; 'nrows' is a slight overestimate
dat <- dat[,1:3]  # select just the first 3 columns
head(dat, 10)     # print the first 10 rows

# Select using which() as the final step: ~90 ms total time on my MacBook Air
system.time(head(dat[which(dat$gender2=="other"), ]), gcFirst=TRUE)

# Select skipping which(): ~130 ms total time
system.time(head(dat[dat$gender2=="other", ]), gcFirst=TRUE)

Now I would think that the second one, without which(), would be more efficient. However, every time I run these, the first version, with which(), is faster by about 20 ms of system time and 20 ms of user time. Does anyone know why this is?

Cheers!
Keith

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
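A self-contained sketch for anyone who cannot download the file. It builds hypothetical surrogate data (the column names id, age and gender2, the category labels, and the 5% NA rate are all assumptions, not the data from the post) and compares the two forms of row selection. One difference it shows for certain: where gender2 is NA, dat$gender2=="other" is also NA, and logical indexing returns an all-NA row for each such position, while which() keeps only the TRUE positions and so drops them. The two calls are therefore not guaranteed to return the same rows. Whether this accounts for part of the timing gap is a guess worth checking, not a verified explanation.

```r
# Hypothetical surrogate data standing in for the downloaded CSV (assumption):
# 2.5 million rows, with some NA values in the gender2 column.
set.seed(1)
n <- 2.5e6
dat <- data.frame(
  id      = seq_len(n),
  age     = sample(18:90, n, replace = TRUE),
  gender2 = sample(c("female", "male", "other", NA), n, replace = TRUE,
                   prob = c(0.45, 0.45, 0.05, 0.05)),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE/FALSE, and NA wherever gender2 is NA
is_other <- dat$gender2 == "other"

# Logical indexing: each NA in the index yields a row of all NAs
by_logical <- dat[is_other, ]

# which() returns the integer positions of the TRUE entries only
by_which <- dat[which(is_other), ]

nrow(by_logical)            # includes the extra all-NA rows
nrow(by_which)              # only the rows where gender2 is "other"
sum(is.na(by_logical$id))   # how many all-NA rows logical indexing added

# Time each form on its own (results will vary by machine)
system.time(dat[is_other, ], gcFirst = TRUE)
system.time(dat[which(is_other), ], gcFirst = TRUE)
```

If gender2 has no missing values, nrow(by_logical) and nrow(by_which) should match, and any remaining timing difference would have to come from how `[.data.frame` handles a logical versus an integer row index.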