http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60539
chrbr at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chrbr at gcc dot gnu.org

--- Comment #1 from chrbr at gcc dot gnu.org ---
Yes and no: it's not really ignored, it comes from the tuning of the
prob_likely value. Setting it to REG_BR_PROB_BASE restores the loop
alignment, but it also changes the code ordering for the byte-at-a-time
chunk, which becomes less likely. So we get worse code:

        mov     r4,r0
        tst     #3,r0
        mov     r4,r1
        bt      .L10
.L6:
        mov.b   @r1+,r2
        tst     r2,r2
        bf      .L6
        mov     r1,r0
        rts
        subc    r4,r0
        .align 1
.L10:
        mov     #0,r3
.L4:
        mov.l   @r1+,r2
        cmp/str r3,r2
        bf      .L4
        bra     .L6
        add     #-4,r1

The problem is that .L14 is reached from both the word-at-a-time path and
the byte-at-a-time path, and I was not able to find a tuning value that
favors both. The word loop probably iterates only a few times (strings are
generally not so big) and the byte loop iterates fewer than 4 times, so
introducing a .align here can be a cost.

I did try to introduce an UNSPECV_ALIGN here, but without measuring any
speed improvement (or only a small negative impact) on my board.

Anyway, any benchmarking or tuning results here would be interesting, and
a pathological case would be welcome.
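For readers who do not read SH assembly, the control flow of the listing above can be sketched in C roughly as follows. This is only an illustration, not the code GCC expands: the names sketch_strlen and has_zero_byte are hypothetical, and the bit trick in has_zero_byte merely stands in for what cmp/str tests when r3 is 0. Like the assembly, the word loop may read up to 3 bytes past the terminating NUL within the same aligned word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of w is zero (stand-in for "cmp/str r3,r2", r3 == 0). */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical sketch of the two-loop structure; not the GCC expansion. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);

    if (((uintptr_t)p & 3) == 0) {      /* tst #3,r0 ; bt .L10 */
        for (;;) {                      /* .L4: word-at-a-time loop */
            uint32_t w;
            memcpy(&w, p, sizeof w);    /* mov.l @r1+,r2 */
            if (has_zero_byte(w))
                break;                  /* fall out with p at the word holding NUL */
            p += 4;
        }
    }

    while (*p != '\0')                  /* .L6: byte-at-a-time loop, < 4 iterations */
        p++;                            /* after a word loop; whole string otherwise */

    return (size_t)(p - s);
}
```

Both entry paths end in the same byte loop, which is why a single alignment or probability choice at that label affects the aligned and the unaligned case at once.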