http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60539

chrbr at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chrbr at gcc dot gnu.org

--- Comment #1 from chrbr at gcc dot gnu.org ---
Yes and no: it's not really ignored, it comes from the prob_likely value
tuning. Setting it to REG_BR_PROB_BASE restores the loop alignment, but it also
changes the code ordering, because the byte-at-a-time code chunk becomes less
likely.

So we get worse code:

    mov    r4,r0
    tst    #3,r0
    mov    r4,r1
    bt    .L10
.L6:
    mov.b    @r1+,r2
    tst    r2,r2
    bf    .L6
    mov    r1,r0
    rts    
    subc    r4,r0
    .align 1
.L10:
    mov    #0,r3
.L4:
    mov.l    @r1+,r2
    cmp/str    r3,r2
    bf    .L4
    bra    .L6
    add    #-4,r1
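
For readers less used to SH assembly, the control flow of the listing above
corresponds roughly to the C sketch below. This is only an illustration, not
the code in the SH backend: the names strlen_sketch and has_zero_byte are
hypothetical, and a portable bit trick stands in for cmp/str with a zero
register. It also assumes that reading a whole aligned word, possibly up to 3
bytes past the terminator, is harmless, which holds for the hardware but is
not strictly valid C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of w is zero.
   Stands in for "cmp/str r3,r2" with r3 == 0. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical name; mirrors the layout of the listing, not GCC's expander. */
size_t strlen_sketch(const char *s)
{
    const char *p = s;

    if (((uintptr_t)p & 3) == 0) {       /* tst #3,r0 ; bt .L10 */
        uint32_t w;
        do {                             /* .L4: word-at-a-time loop */
            memcpy(&w, p, sizeof w);     /* mov.l @r1+,r2 */
            p += 4;
        } while (!has_zero_byte(w));     /* cmp/str r3,r2 ; bf .L4 */
        p -= 4;                          /* add #-4,r1 (delay slot of bra .L6) */
    }

    while (*p++ != '\0')                 /* .L6: byte-at-a-time loop */
        ;

    return (size_t)(p - s) - 1;          /* subc r4,r0 with T set */
}
```

After the word loop finds a word containing the terminator, the byte loop
rescans that word, so it runs at most 4 times on the aligned path, while an
unaligned string goes through the byte loop only.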

The problem is that .L14 is reached from both the word-at-a-time path and the
byte-at-a-time path, and I was not able to find a tuning value that favors
both, given that the word loop iteration count is probably small ("strings are
generally not so big") and that the byte loop runs fewer than 4 iterations, so
introducing a .align here can be a cost.

I did try to introduce an UNSPECV_ALIGN here, but I could not measure any speed
improvement (nor any small negative impact) on my board. Anyway, any
interesting benchmarking or tuning result here is welcome, and so is a
pathological case.
