In total I found 8 different way to calculate quantile in very a large data.table. I share below their performances for future reference. Tests 1, 7 and 8 were the fastest I found.
Best, Camilo library(data.table) v <- data.table(x=runif(10000),x2 = runif(10000), x3=runif(10000),x4=runif(10000)) #fastest Sys.time()->StartTEST1 t(v[, apply(v,1,quantile,probs =c(.1,.9,.5),na.rm=TRUE)] ) Sys.time()->EndTEST1 Sys.time()->StartTEST2 v[, quantile(.SD,probs =c(.1,.9,.5)), by = 1:nrow(v)] Sys.time()->EndTEST2 Sys.time()->StartTEST3 v[, c("L","H","M"):=quantile(.SD,probs =c(.1,.9,.5)), by = 1:nrow(v)] Sys.time()->EndTEST3 v v[, c("L","H","M"):=NULL] v[,Names:=rownames(v)] setkey(v,Names) Sys.time()->StartTEST4 v[, c("L","H","M"):=quantile(.SD,probs =c(.1,.9,.5)), by = Names] Sys.time()->EndTEST4 v v[, c("L","H","M"):=NULL] Sys.time()->StartTEST5 v[, as.list(quantile(.SD,c(.1,.90,.5),na.rm=TRUE)), by=Names] Sys.time()->EndTEST5 Sys.time()->StartTEST6 v[, as.list(quantile(.SD,c(.1,.90,.5),na.rm=TRUE)), by=Names,.SDcols=1:4] Sys.time()->EndTEST6 Sys.time()->StartTEST7 v[, as.list(quantile(c(x , x2, x3, x4 ),c(.1,.90,.5),na.rm=TRUE)), by=Names] Sys.time()->EndTEST7 # melting the database and doing quantily by summary. This is the second fastest, which is ironic given that the database has to be melted first library(reshape2) Sys.time()->StartTEST8 vs<-melt(v) vs[, as.list(quantile(value,c(.1,.90,.5),na.rm=TRUE)), by=Names] Sys.time()->EndTEST8 EndTEST1-StartTEST1 EndTEST2-StartTEST2 EndTEST3-StartTEST3 EndTEST4-StartTEST4 EndTEST5-StartTEST5 EndTEST6-StartTEST6 EndTEST7-StartTEST7 EndTEST8-StartTEST8 ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.