------- Comment #10 from steven at gcc dot gnu dot org 2006-02-19 13:41 -------
I modified the test case a bit to make it easier to understand what is going on:

void
do_sort (int *lst, int cnt)
{
  int i, j, k;

  for (i = 0; i < cnt - 1; i++)
    {
      for (j = i + 1; j < cnt; j++)
        {
          int lsti = lst[i];
          int lstj = lst[j];
          if (lsti > lstj)
            {
              lst[i] = lstj;
              lst[j] = lsti;
            }
        }
    }
}

This gives two very different inner loops:

GCC 4.0:
.L6:
        movl    -4(%esi), %ecx
        movl    (%edx), %eax
        cmpl    %eax, %ecx
        jle     .L7
        movl    %eax, -4(%esi)
        movl    %ecx, (%edx)
.L7:
        addl    $1, %ebx
        addl    $4, %edx
        cmpl    %edi, %ebx
        jne     .L6

GCC 4.1:
.L6:
        movl    8(%ebp), %ebx
        movl    -4(%ebx,%eax,4), %ebx
        movl    %ebx, -20(%ebp)
        movl    4(%ecx), %esi
        movl    %esi, -24(%ebp)
        cmpl    %esi, %ebx
        jle     .L7
        movl    8(%ebp), %ebx
        movl    %esi, -4(%ebx,%eax,4)
        movl    -20(%ebp), %esi
        movl    %esi, 4(%ecx)
.L7:
        addl    $1, -28(%ebp)
        addl    $4, %ecx
        cmpl    -28(%ebp), %edi
        jg      .L6

So there are two problems:
- The addressing modes are different.  This is due to the TARGET_MEM_REF
  stuff that Zdenek added.
- We need at least one register more apparently, judging from the extra
  stack moves.

Interestingly, if I change the test case to:

void
do_sort (int *lst, int cnt)
{
  int i, j, k;

  for (i = 0; i < cnt - 1; i++)
    {
      for (j = 0/*i + 1*/; j < cnt; j++)
        {
          int lsti = lst[i];
          int lstj = lst[j];
          if (lsti > lstj)
            {
              lst[i] = lstj;
              lst[j] = lsti;
            }
        }
    }
}

then the code produced by GCC 4.1 is 20% faster than what GCC 4.0 makes of it.

Zdenek, this really looks like one for you...

-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|steven at gcc dot gnu dot   |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26290
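
For anyone who wants to reproduce the comparison, here is a rough sketch of a timing harness around the first test case. It is not the harness used for the numbers in this comment. The helpers is_sorted and time_do_sort are hypothetical names made up for illustration, and the sketch assumes cnt > 0 and that clock() has enough resolution for the chosen array size.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* The first test case from this comment (unused variable k dropped). */
void do_sort(int *lst, int cnt)
{
    int i, j;

    for (i = 0; i < cnt - 1; i++) {
        for (j = i + 1; j < cnt; j++) {
            int lsti = lst[i];
            int lstj = lst[j];
            if (lsti > lstj) {
                lst[i] = lstj;
                lst[j] = lsti;
            }
        }
    }
}

/* Hypothetical helper: returns 1 if lst[0..cnt-1] is non-decreasing. */
int is_sorted(const int *lst, int cnt)
{
    int i;

    for (i = 1; i < cnt; i++)
        if (lst[i - 1] > lst[i])
            return 0;
    return 1;
}

/* Hypothetical helper: fills an array with cnt pseudo-random values,
   sorts it with do_sort, and returns the CPU time spent in do_sort in
   seconds.  Returns -1.0 if the allocation fails.  Assumes cnt > 0. */
double time_do_sort(int cnt, unsigned seed)
{
    int *lst = malloc((size_t)cnt * sizeof *lst);
    clock_t start, end;
    int i;

    if (lst == NULL)
        return -1.0;

    srand(seed);
    for (i = 0; i < cnt; i++)
        lst[i] = rand();

    start = clock();
    do_sort(lst, cnt);
    end = clock();

    /* Sanity check so a miscompiled loop does not go unnoticed. */
    assert(is_sorted(lst, cnt));

    free(lst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building this once with each compiler at the same optimization level and calling time_do_sort with the same cnt and seed should give comparable numbers. The array size needs to be large enough that the quadratic inner loop dominates the run time.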