------- Comment #2 from rguenth at gcc dot gnu dot org 2009-07-03 18:55 -------
Try -march=pentium-m -mtune=generic. Pentium-M never received any special tuning of its own (it is tuned the same as pentium-pro). The same goes for -march=i686, btw, but i686 does not have SSE, so it is likely vectorization and/or prefetching that slows down your case 3.

Try disabling prefetching.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40644
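
For anyone wanting to try the suggestions above side by side, here is a minimal sketch that only builds the compiler command lines to time. The source file name, the -O3 baseline, the output names and the helper `build_variants` are all assumptions for illustration, not taken from the report. -fno-prefetch-loop-arrays and -fno-tree-vectorize are the GCC options that turn off loop-array prefetching and the tree vectorizer; whether either pass is actually active in the reporter's build is not confirmed here. On a 64-bit host, -m32 is likely needed as well, since pentium-m is a 32-bit target.

```python
# Assumed baseline: the optimization level used in the report is not shown,
# so -O3 here is a placeholder.
BASE = ["gcc", "-O3"]


def build_variants(source="test.c"):
    """Return {label: argv} for each flag combination to compare.

    'source' and the 'bench-<label>' output names are hypothetical.
    """
    variants = {
        # The slow configuration as described in the comment.
        "pentium-m": ["-march=pentium-m"],
        # Suggested: keep the instruction set, use generic tuning.
        "pentium-m+generic": ["-march=pentium-m", "-mtune=generic"],
        # Suggested: disable prefetching.
        "no-prefetch": ["-march=pentium-m", "-fno-prefetch-loop-arrays"],
        # Extra check for the vectorization hypothesis.
        "no-vectorize": ["-march=pentium-m", "-fno-tree-vectorize"],
    }
    return {
        label: BASE + flags + [source, "-o", "bench-" + label]
        for label, flags in variants.items()
    }


if __name__ == "__main__":
    # Print the commands; running and timing them is left to the reader.
    for label, argv in build_variants().items():
        print(" ".join(argv))
```

If the "no-prefetch" or "no-vectorize" binary runs noticeably faster than the plain "pentium-m" one, that would point at the corresponding pass as the cause of the slowdown.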