Hi,
This patch divides by the known element size instead of by the size extracted from the array descriptor itself. The known sizes of the intrinsic types are powers of two, so the compiler can usually replace the division with a simple shift. This should save about 20 cycles per calculation. I'll go through the rest of the library to look for other places where this applies. Regression-tested with no new failures. OK for the branch?
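A minimal sketch of the idea, assuming a simplified descriptor. `toy_desc`, `count_elems_generic` and `count_elems_int64` are hypothetical names for illustration and are not libgfortran's actual descriptor or routines. In the generic version the divisor is only known at run time, so a real integer division is needed. In the type-specific version the divisor is a compile-time constant power of two, which the compiler can strength-reduce to a shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor; the real libgfortran
   descriptor has a different layout. */
typedef struct {
  void *base_addr;
  size_t elem_size;   /* element size in bytes, known only at run time */
} toy_desc;

/* Generic version: the divisor comes from the descriptor, so the
   compiler has to emit a hardware integer division. */
static ptrdiff_t
count_elems_generic (const toy_desc *d, ptrdiff_t byte_span)
{
  return byte_span / (ptrdiff_t) d->elem_size;
}

/* Type-specific version (assumed here to handle 8-byte integers):
   sizeof (int64_t) is a compile-time constant power of two, so the
   compiler can turn the division into a shift, plus a small fixup
   because the operand is signed and may be negative. */
static ptrdiff_t
count_elems_int64 (ptrdiff_t byte_span)
{
  return byte_span / (ptrdiff_t) sizeof (int64_t);
}
```

Both functions return the same result for 8-byte elements; only the second lets the compiler avoid the division instruction.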
Full patch at http://gcc.gnu.org/ml/fortran/2012-03/msg00120.html

Ping?

Thomas