https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88533
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org

--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Strange.

I ran the code (with the data) a few times on my Ryzen 7 home system.
Here are some timings (run 10 times):

$ gfortran -O3 -ftree-vectorize -g csc_test.f90
$ for a in 1 2 3 4 5 6 7 8 9 10; do ./a.out; done
CPU time [s]: 1.20
CPU time [s]: 2.52
CPU time [s]: 2.53
CPU time [s]: 2.53
CPU time [s]: 2.53
CPU time [s]: 2.53
CPU time [s]: 2.53
CPU time [s]: 1.18
CPU time [s]: 2.49
CPU time [s]: 2.53

$ gfortran -O3 -ftree-vectorize -fcheck=bounds -g csc_test.f90
$ for a in 1 2 3 4 5 6 7 8 9 10; do ./a.out; done
CPU time [s]: 1.28
CPU time [s]: 2.62
CPU time [s]: 2.62
CPU time [s]: 2.60
CPU time [s]: 2.59
CPU time [s]: 2.60
CPU time [s]: 2.60
CPU time [s]: 2.63
CPU time [s]: 2.65
CPU time [s]: 2.57

What strikes me is that I hardly see any slowdown from bounds checking,
and that some runs (only a few) are far faster than others.

Is it possible that the data size of the problem is just at the edge of
cache size, so that (depending on what else happens on the system) it is
possible to either get a lot of cache misses or not?

(I made sure to always seed the random number generator with the same
values).
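
To put numbers on the two observations above, here is a small Python sketch that separates the quoted timings into fast and slow runs and compares the slow runs of both builds. The 2.0 s threshold and the helper name split_modes are assumptions chosen for illustration; they are not part of the bug report or of csc_test.f90.

```python
from statistics import median

# CPU times in seconds, copied from the two runs quoted above.
plain = [1.20, 2.52, 2.53, 2.53, 2.53, 2.53, 2.53, 1.18, 2.49, 2.53]
checked = [1.28, 2.62, 2.62, 2.60, 2.59, 2.60, 2.60, 2.63, 2.65, 2.57]


def split_modes(times, threshold=2.0):
    """Split runs into (fast, slow) groups.

    The threshold is a hand-picked assumption that sits between the
    two clusters visible in the quoted timings.
    """
    fast = [t for t in times if t < threshold]
    slow = [t for t in times if t >= threshold]
    return fast, slow


plain_fast, plain_slow = split_modes(plain)
checked_fast, checked_slow = split_modes(checked)

# Compare like with like: slow runs with bounds checking against
# slow runs without it, using the median to damp outliers.
overhead = median(checked_slow) / median(plain_slow) - 1.0

print(f"fast runs: {len(plain_fast)} of {len(plain)} without checks, "
      f"{len(checked_fast)} of {len(checked)} with checks")
print(f"bounds-check overhead on slow runs: {overhead:.1%}")
```

With only ten runs per build this says nothing about why the fast runs occur; it only shows that the bounds-checking cost is small compared with the gap between the two clusters.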