https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82682
--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We now generate:

.L3:
        movzbl  (%edx), %esi
        addl    $2, %edx
        addl    $1, %ecx
        movzbl  -1(%edx), %eax
        movl    %esi, %ebx
        imull   $38470, %eax, %eax
        movzbl  %bl, %esi
        imull   $19595, %esi, %esi
        addl    %esi, %eax
        sarl    $16, %eax
        movb    %al, -1(%ecx)
        cmpl    %edi, %edx
        jne     .L3

while from older gcc I get

.L3:
        movzbl  (%ecx,%edx,2), %eax
        movzbl  1(%ecx,%edx,2), %edi
        imull   $19595, %eax, %eax
        imull   $38470, %edi, %edi
        addl    %edi, %eax
        sarl    $16, %eax
        movb    %al, (%esi,%edx)
        addl    $1, %edx
        cmpl    %edx, %ebx
        jne     .L3
.L1:

There is clearly a missed optimization on movzbl %bl, %esi, because the value
is already zero-extended by the first movzbl. I wonder how this can be
triggered by the move cost changes; perhaps a regalloc difference?

Jakub, is it easy for you to get .s files from r253958 and from just before?
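
For reference, the loop both listings appear to implement can be sketched in C as below. This is a reconstruction read off the older assembly, not the testcase from this PR: the function name weighted_pairs, the parameter names, and the types are assumptions made only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction from the assembly above; the name,
   signature and types are assumptions, not the PR's actual testcase. */
void weighted_pairs(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Each byte load corresponds to one movzbl, which already
           zero-extends the value to 32 bits. */
        int a = src[2 * i];      /* movzbl  (%ecx,%edx,2)  */
        int b = src[2 * i + 1];  /* movzbl  1(%ecx,%edx,2) */

        /* Two imull, one addl, then sarl $16 and a byte store. */
        dst[i] = (uint8_t)((a * 19595 + b * 38470) >> 16);
    }
}
```

In this form each source byte needs exactly one zero-extension, so the extra movl %esi, %ebx / movzbl %bl, %esi pair in the new code does no useful work.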