https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110972
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
-fatigue2.f90:1077:93: optimized: basic block part vectorized using 32 byte vectors
+fatigue2.f90:1002:56: optimized: basic block part vectorized using 32 byte vectors
+fatigue2.f90:1058:93: optimized: basic block part vectorized using 32 byte vectors

OTOH that's a bit imprecise, the - is bogus I think, but the + are real, :1058
is also loop vectorized.  The body is very large, perf shows

Samples: 279K of event 'cycles', Event count (approx.): 316851940248
Overhead  Samples  Command   Shared Object  Symbol
  30.84%    86412  fatigue2  fatigue2       [.] MAIN__
  24.26%    67530  fatigue   fatigue        [.] MAIN__
  20.05%    56167  fatigue2  fatigue2       [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.isra.0
  18.73%    52127  fatigue   fatigue        [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.isra.0
   1.96%     5459  fatigue   fatigue        [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.isra.0
   1.84%     5140  fatigue2  fatigue2       [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.isra.0
   0.83%     2308  fatigue   fatigue        [.] __memset_avx2_unaligned_erms
   0.60%     1693  fatigue2  fatigue2       [.] __memset_avx2_unaligned_erms
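
For reading the profile, it can help to total the samples per binary and to compare the two binaries symbol by symbol. The following is only a rough sketch, not part of the original comment: the sample counts are copied from the perf rows quoted above, the helper names (`samples_per_command`, `ratio_per_symbol`) are made up for illustration, and rows that perf did not list here are simply not counted.

```python
from collections import defaultdict

HOOKE = "__perdida_m_MOD_generalized_hookes_law"

# (samples, command, symbol), copied from the perf rows quoted above.
# Only the listed rows are included; anything perf cut off is missing.
PERF_ROWS = [
    (86412, "fatigue2", "MAIN__"),
    (67530, "fatigue", "MAIN__"),
    (56167, "fatigue2", HOOKE + ".constprop.0.isra.0"),
    (52127, "fatigue", HOOKE + ".constprop.0.isra.0"),
    (5459, "fatigue", HOOKE + ".constprop.1.isra.0"),
    (5140, "fatigue2", HOOKE + ".constprop.1.isra.0"),
    (2308, "fatigue", "__memset_avx2_unaligned_erms"),
    (1693, "fatigue2", "__memset_avx2_unaligned_erms"),
]


def samples_per_command(rows):
    """Sum the sample counts of all listed rows for each command."""
    totals = defaultdict(int)
    for samples, command, _symbol in rows:
        totals[command] += samples
    return dict(totals)


def ratio_per_symbol(rows, numerator="fatigue2", denominator="fatigue"):
    """Return samples(numerator) / samples(denominator) for each symbol.

    Symbols that do not appear for both commands are skipped.
    """
    by_symbol = defaultdict(dict)
    for samples, command, symbol in rows:
        by_symbol[symbol][command] = samples
    return {
        symbol: counts[numerator] / counts[denominator]
        for symbol, counts in by_symbol.items()
        if numerator in counts and denominator in counts
    }


if __name__ == "__main__":
    for command, total in sorted(samples_per_command(PERF_ROWS).items()):
        print(f"{command:9s} {total:7d} samples")
    for symbol, ratio in ratio_per_symbol(PERF_ROWS).items():
        print(f"{ratio:5.2f}x  {symbol}")
```

On the rows listed, this gives 149412 samples for fatigue2 against 127424 for fatigue, and most of the difference is in MAIN__ (86412 vs. 67530 samples).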