https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62220
Bug ID: 62220 Summary: missed optimization wrt module for loop variable Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: drepper.fsp+rhbz at gmail dot com Created attachment 33376 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33376&action=edit program to measure difference of the original and proposed optimized code I've optimized some prominent code bases by hand to achieve significant speedup but this optimization could have been performed by gcc. The problem is that some program authors underestimate the price of integer module operations. I attaching a test program (x86-64) below and the time difference I see about 1500%. The general pattern is: loop index variable I with upper limit L for (I = <?>; I < L; ++I) inside loop use I % M where M is loop in-variant e.g.: var[I % M] This could be optimized to compute LL = L - L % M loop index variable I with upper limit LL; nested second loop J = <?> % M for (I = <?>; I < LL; ) { for (; J < M; ++I, ++J) { ... loop body ... ... e.g., var[J] ... instead of var[I % M] } J = 0; }