I've looked at how to speed up the transpose function in R (ie, t(X)). The existing code does the work with loops like the following:
for (i = 0; i < len; i++) REAL(r)[i] = REAL(a)[(i / ncol) + (i % ncol) * nrow]; It seems a bit optimistic to expect a compiler to produce good code from this. I've re-written these loops as follows: for (i = 0, j = 0; i<len; i += 1, j += nrow) { if (j>=len) j -= (len-1); REAL(r)[i] = REAL(a)[j]; } The resulting improvement is sometimes dramatic. Here's a test program: M <- matrix(seq(0,1,12000),200,60) print(system.time({for (i in 1:10000) S <- t(M)})) print(system.time({for (i in 1:10000) R <- t(S)})) v <- seq(0,2,12000) print(system.time({for (i in 1:100000) u <- t(v)})) print(system.time({for (i in 1:100000) w <- t(u)})) Here are the times on an Intel Linux system: R version 2.11.1: Modified version: user system elapsed user system elapsed 1.190 0.040 1.232 0.610 0.010 0.619 user system elapsed user system elapsed 1.200 0.020 1.226 0.610 0.000 0.616 user system elapsed user system elapsed 0.800 0.010 0.813 0.750 0.000 0.752 user system elapsed user system elapsed 0.910 0.010 0.921 0.860 0.000 0.864 Here are the times on a SPARC Solaris system: R version 2.11.1: Modified version: user system elapsed user system elapsed 18.643 0.041 18.685 2.994 0.039 3.033 user system elapsed user system elapsed 18.574 0.041 18.616 3.123 0.039 3.163 user system elapsed user system elapsed 3.803 0.271 4.075 3.868 0.296 4.163 user system elapsed user system elapsed 4.184 0.273 4.457 4.238 0.302 4.540 So with the modification, transpose for a 60x200 or 200x60 matrix is about a factor of two faster on the Intel system, and a factor of six faster on the SPARC system. There is little or no gain from the modification when transposing a row or column vector, however. (I think it must be that on these machines multiplies and divides do not take constant time, but are faster when, for instance, dividing by 1.) I've appended below the new version of the modified part of the do_transpose function in src/main/array.c. Radford Neal ---------------------------------------------------------------------- PROTECT(r = allocVector(TYPEOF(a), len)); switch (TYPEOF(a)) { case LGLSXP: case INTSXP: for (i = 0, j = 0; i<len; i += 1, j += nrow) { if (j>=len) j -= (len-1); INTEGER(r)[i] = INTEGER(a)[j]; } case REALSXP: for (i = 0, j = 0; i<len; i += 1, j += nrow) { if (j>=len) j -= (len-1); REAL(r)[i] = REAL(a)[j]; } break; case CPLXSXP: for (i = 0, j = 0; i<len; i += 1, j += nrow) { if (j>=len) j -= (len-1); COMPLEX(r)[i] = COMPLEX(a)[j]; } break; case STRSXP: for (i = 0, j = 0; i<len; i += 1, j += nrow) { if (j>=len) j -= (len-1); SET_STRING_ELT(r, i, STRING_ELT(a,j)); } break; case VECSXP: for (i = 0, j = 0; i<len; i += 1, j += nrow) { if (j>=len) j -= (len-1); SET_VECTOR_ELT(r, i, VECTOR_ELT(a,j)); } break; case RAWSXP: for (i = 0, j = 0; i<len; i += 1, j += nrow) { if (j>=len) j -= (len-1); RAW(r)[i] = RAW(a)[j]; } break; default: UNPROTECT(1); goto not_matrix; } ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel