Herve Pages <[EMAIL PROTECTED]> writes:
> So apparently here extracting with dat[i, ] is 300 times faster than
> extracting with dat[key, ] !
>
>> system.time(for (i in 1:100) dat["1", ])
>    user  system elapsed
>  12.680   0.396  13.075
>
>> system.time(for (i in 1:100) dat[1, ])
>    user  system elapsed
>   0.060   0.076   0.137
>
> Good to know!

I think what you are seeing here has to do with the space-efficient
storage of row.names of a data.frame.  The example data you are
working with has no specified row names, so they get stored in a
compact fashion:

    mat <- matrix(rep(paste(letters, collapse=""), 5*300000), ncol=5)
    dat <- as.data.frame(mat)

    > typeof(attr(dat, "row.names"))
    [1] "integer"

In the call to [.data.frame when i is character, the appropriate index
is found using pmatch, and this requires that the row names be
converted to character.  So in a loop, you end up converting the
integer vector to a character vector at each iteration.

If you assign character row names, things will be a bit faster:

    # before
    system.time(for (i in 1:25) dat["2", ])
       user  system elapsed
      9.337   0.404  10.731

    # this looks funny, but has the desired result
    rownames(dat) <- rownames(dat)
    typeof(attr(dat, "row.names"))

    # after
    system.time(for (i in 1:25) dat["2", ])
       user  system elapsed
      0.343   0.226   0.608

And you probably would have seen this if you had looked at the
profiling data:

    Rprof()
    for (i in 1:25) dat["2", ]
    Rprof(NULL)
    summaryRprof()

+ seth

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
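
P.S. For anyone who wants to check the row-name storage point without building the 300000-row example, here is a minimal sketch on a tiny data frame. The data and the variable names (before, after, labels) are made up for illustration, and the reported types are what I would expect from the behaviour described in this thread; other R versions may differ.

```r
# Small data frame with automatic (unspecified) row names.
# The columns here are arbitrary illustration data.
dat <- data.frame(x = 1:5, y = letters[1:5])

# With automatic row names, the attribute is reported as integer.
before <- typeof(attr(dat, "row.names"))

# Re-assigning the row names to themselves stores them as character.
rownames(dat) <- rownames(dat)
after <- typeof(attr(dat, "row.names"))

# The row labels themselves do not change, so lookups by key
# such as dat["2", ] refer to the same rows as before.
labels <- rownames(dat)

print(c(before = before, after = after))
```

The point of the self-assignment is only to pay the integer-to-character conversion once, up front, rather than on every character lookup.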