http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118
Harald Anlauf <anlauf at gmx dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |anlauf at gmx dot de

--- Comment #5 from Harald Anlauf <anlauf at gmx dot de> 2012-03-01 19:54:08 UTC ---
(In reply to comment #4)
> Additionally, as written before (comment 2), a reasonably well written DO loop
> should always be as fast as or faster than a FORALL. The definition of FORALL
> does not allow for good optimization in the general case.

Do not forget that FORALL statements are subject to constraints that do not
apply to DO loops: all assignments must be independent of each other, which
guarantees that they can be vectorized.

> I did a quick run with six compilers. Result: The FORALL construct was between
> 3.2 and 5.25 times slower than the DO loop. Thus, other compilers do not handle
> it better, either.

I tried SunStudio 12 on i686:

  Time of operation was 11.831321 seconds
  Time of operation was 12.235342 seconds

and on x86_64 (AMD Barcelona):

  Time of operation was  8.715117 seconds
  Time of operation was 10.525522 seconds

So a small slowdown.

Then I tried NEC's sxf90 rev.441 for SX-9 at -Chopt:

  Time of operation was  4.187261 seconds
  Time of operation was  1.259775 seconds

Whoops! After looking into the transformation listing and instrumenting the
code, it looks like the DO loop is poorly optimized, producing lots of
so-called bank conflicts. Reducing optimization to -Cvopt, I get:

  Time of operation was  1.185673 seconds
  Time of operation was  1.271729 seconds

That looks reasonable. So yes, FORALL is in practice slightly slower
(almost always... ;-)
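For readers unfamiliar with the semantic difference being discussed, here is a
minimal sketch (not the benchmark used in this PR) contrasting a FORALL
assignment with an equivalent DO loop; the array names and sizes are made up
for illustration only:

  program forall_vs_do
    implicit none
    integer, parameter :: n = 1000
    real :: a(n), b(n)
    integer :: i

    call random_number(b)

    ! FORALL: the standard requires the per-index assignments to be
    ! independent, so the compiler is free to reorder or vectorize them.
    forall (i = 1:n)
       a(i) = 2.0 * b(i)
    end forall

    ! Equivalent DO loop: same result here, but the loop has sequential
    ! semantics, so the compiler must prove on its own that reordering
    ! is safe before vectorizing.
    do i = 1, n
       a(i) = 2.0 * b(i)
    end do

    print *, sum(a)
  end program forall_vs_do

In a trivial case like this both forms should generate the same code; the
slowdowns reported above come from more complex FORALL bodies, where gfortran
(and other compilers) fall back to creating temporaries instead of exploiting
the guaranteed independence.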