------- Comment #19 from hubicka at gcc dot gnu dot org  2008-02-06 16:56 -------
Created an attachment (id=15108)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15108&action=view)
Complete continue heuristic patch

Hi,
this is the complete patch.  With this patch we produce a profile sane enough
that the internal loops are not marked cold.  I will probably benchmark it
tomorrow (I want to wait for the FP changes to show up separately).
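
To make the loop shape concrete, here is a minimal sketch of an outer do-while
loop whose body rejects candidates early with `continue` before entering an
inner loop.  It is not code from the patch or from gzip: the function name,
parameters and the simplified rejection test are all hypothetical.  The point
is only that if the branch to `continue` is estimated as nearly always taken,
the inner loop ends up with a very low estimated frequency.

```c
#include <stddef.h>

/* Hypothetical illustration only; none of these names come from the patch.
   Assumes every candidate offset is smaller than pos.  */
static size_t best_match_sketch(const unsigned char *text, size_t text_len,
                                const size_t *candidates, size_t n_candidates,
                                size_t pos)
{
    size_t best = 0;
    size_t i = 0;

    if (n_candidates == 0)
        return 0;

    do {
        size_t cand = candidates[i];
        size_t len = 0;

        /* Cheap rejection.  In a do-while loop, `continue` jumps to the
           loop condition, so the inner loop below is skipped.  */
        if (text[cand] != text[pos])
            continue;

        /* Inner loop: reached only when the `continue` is not taken.  */
        while (pos + len < text_len && text[cand + len] == text[pos + len])
            len++;

        if (len > best)
            best = len;
    } while (++i < n_candidates);

    return best;
}
```

The real longest_match uses a different rejection test; the sketch keeps only
the control flow that matters for the profile estimate.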

It fixes the offline copy of longest_match, so we no longer have one of the
induction variables on the stack:
.L15:
        movzbl  2(%edx), %eax
        leal    2(%edx), %esi
        cmpb    2(%ecx), %al
        jne     .L8
        movzbl  3(%edx), %eax
        leal    3(%edx), %esi
        cmpb    3(%ecx), %al
        jne     .L8
        movzbl  4(%edx), %eax
        leal    4(%edx), %esi
        cmpb    4(%ecx), %al
        jne     .L8
        movzbl  5(%edx), %eax
        leal    5(%edx), %esi
        cmpb    5(%ecx), %al
        jne     .L8
        movzbl  6(%edx), %eax
        leal    6(%edx), %esi
        cmpb    6(%ecx), %al
        jne     .L8
        movzbl  7(%edx), %eax
        leal    7(%edx), %esi
        cmpb    7(%ecx), %al
        jne     .L8
        leal    8(%ecx), %eax
        movl    %eax, %ecx
        movzbl  8(%edx), %eax
        cmpb    (%ecx), %al
        leal    8(%edx), %ebx
        movl    %ebx, %esi
        jne     .L8
        cmpl    %ebx, -20(%ebp)
        jbe     .L8
        movl    %ebx, %edx
        movzbl  1(%edx), %eax
        leal    1(%edx), %esi
        cmpb    1(%ecx), %al
        je      .L15
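
For orientation, the listing above appears to be the unrolled byte comparison
loop of longest_match.  Below is a minimal C sketch of a loop with that shape,
modeled on the well-known gzip/zlib idiom.  The function name and signature
are hypothetical, and this is not necessarily the exact source that was
compiled.  It does eight compares per iteration followed by a single bounds
check, which corresponds to the offsets 1 through 8 and the final cmpl/jbe
pair in the assembly.  The caller is assumed to guarantee that byte 0 already
matches and that both buffers are readable through strend + 7.

```c
#include <stddef.h>

/* Hypothetical helper, not taken verbatim from deflate.c or from the patch.
   Returns the offset of the first mismatching byte, or the offset at which
   the bounds check stopped the loop.  */
static size_t match_length_sketch(const unsigned char *scan,
                                  const unsigned char *match,
                                  const unsigned char *strend)
{
    const unsigned char *start = scan;

    /* Eight compares per iteration, then one comparison against strend.
       Both pointers are pre-incremented, so the first compare is at
       offset 1.  */
    do {
    } while (*++scan == *++match && *++scan == *++match &&
             *++scan == *++match && *++scan == *++match &&
             *++scan == *++match && *++scan == *++match &&
             *++scan == *++match && *++scan == *++match &&
             scan < strend);

    return (size_t)(scan - start);
}
```

Both scan and match advance on every compare, so they are the induction
variables the register allocator has to keep live across the whole loop.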

Ironically, this can further widen the gap between -O2 and -O3, since the
inline copy in deflate was always allocated reasonably.
The deflate codegen changes quite a lot, and because the function body is big
I will wait for benchmarks before trying to analyze it further.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33761
