------- Comment #19 from hubicka at gcc dot gnu dot org  2008-02-06 16:56 -------
Created an attachment (id=15108)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15108&action=view)
Complete continue heuristic patch

Hi,
this is the complete patch. With this patch we produce a profile sane enough
that the internal loops are no longer marked cold. I will probably benchmark
it tomorrow (I want to wait so that the FP changes show up separately).

It fixes the offline copy of longest_match, so we no longer keep one of the IV
variables on the stack:

.L15:
        movzbl  2(%edx), %eax
        leal    2(%edx), %esi
        cmpb    2(%ecx), %al
        jne     .L8
        movzbl  3(%edx), %eax
        leal    3(%edx), %esi
        cmpb    3(%ecx), %al
        jne     .L8
        movzbl  4(%edx), %eax
        leal    4(%edx), %esi
        cmpb    4(%ecx), %al
        jne     .L8
        movzbl  5(%edx), %eax
        leal    5(%edx), %esi
        cmpb    5(%ecx), %al
        jne     .L8
        movzbl  6(%edx), %eax
        leal    6(%edx), %esi
        cmpb    6(%ecx), %al
        jne     .L8
        movzbl  7(%edx), %eax
        leal    7(%edx), %esi
        cmpb    7(%ecx), %al
        jne     .L8
        leal    8(%ecx), %eax
        movl    %eax, %ecx
        movzbl  8(%edx), %eax
        cmpb    (%ecx), %al
        leal    8(%edx), %ebx
        movl    %ebx, %esi
        jne     .L8
        cmpl    %ebx, -20(%ebp)
        jbe     .L8
        movl    %ebx, %edx
        movzbl  1(%edx), %eax
        leal    1(%edx), %esi
        cmpb    1(%ecx), %al
        je      .L15

Ironically, this can further widen the gap between -O2 and -O3, since the
inline copy in deflate was always allocated reasonably. Deflate codegen
changes quite a lot, and because the function body is big I will wait for
benchmarks before trying to analyze further.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
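
For readers mapping the listing back to source: the .L15 loop compares bytes at
offsets 1 through 8 of two pointers and then checks an end bound, which looks
like an eight-way unrolled byte-comparison loop of the kind used in
longest_match. The following is only a sketch under that assumption, not the
actual gzip source; the function name match_run and its return convention are
hypothetical and chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not from gzip): counts how far scan advances
 * while the bytes at scan and match keep agreeing.
 *
 * Assumptions of this sketch:
 *  - byte 0 of scan/match is already known to match, so comparison
 *    starts at offset 1;
 *  - both buffers stay readable for at least 8 bytes past strend,
 *    because the bound is tested only once per 8 comparisons.
 */
size_t match_run(const unsigned char *scan,
                 const unsigned char *match,
                 const unsigned char *strend)
{
    const unsigned char *start = scan;

    do {
        /* empty body: all the work happens in the loop condition */
    } while (*++scan == *++match && *++scan == *++match &&
             *++scan == *++match && *++scan == *++match &&
             *++scan == *++match && *++scan == *++match &&
             *++scan == *++match && *++scan == *++match &&
             scan < strend);

    /* Offset of the first mismatching byte, or the first multiple of 8
     * at or beyond strend - start if no mismatch was seen. */
    return (size_t)(scan - start);
}
```

In this form scan and match are the two induction variables; the listing keeps
them in %edx and %ecx, with the end bound read from -20(%ebp).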