https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102840
Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-10-19
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
I believe this test case is poorly written, and not correctly testing the
original issue in PR target/22076 which concerned suboptimal moving of
arguments via memory (fixed by prohibiting reload using mmx registers).

Prior to my patch, with -m32 -O2 -fomit-frame-pointer -mmmx -mno-sse2, GCC
generated:

test:
        movq    .LC1, %mm0
        paddb   .LC0, %mm0
        movq    %mm0, x
        ret
.x:
        .zero   8
.LC0:
        .byte   1
        .byte   2
        .byte   3
        .byte   4
        .byte   5
        .byte   6
        .byte   7
        .byte   8
.LC1:
        .byte   11
        .byte   22
        .byte   33
        .byte   44
        .byte   55
        .byte   66
        .byte   77
        .byte   88

which indeed doesn't use movl, and requires two movq.

After my patch, we now generate the much more efficient (dare I say optimal):

test:
        movl    $807671820, %eax
        movl    $1616136252, %edx
        movl    %eax, x
        movl    %edx, x+4
        ret

which has evaluated the _mm_add_pi8 at compile-time, and effectively memsets x
to the correct value in the minimum possible number of cycles. In fact, failing
to evaluate this at compile-time is a regression since v4.1 (according to
godbolt).

[p.s. I predict other platforms might also notice changes in their testsuites,
as the middle-end now generates more efficient instruction sequences].
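
The two movl immediates can be checked by hand. The following is a minimal Python sketch, not part of the original report, that redoes the fold: it adds the .LC1 and .LC0 bytes lane by lane with wraparound, as paddb does, and reads the 8-byte result as two little-endian 32-bit words, matching the stores to x and x+4. The function name fold_paddb is hypothetical and chosen only for illustration.

```python
import struct

def fold_paddb(a, b):
    """Add two 8-byte vectors lane by lane, wrapping modulo 256 as paddb does,
    and return the result as two little-endian 32-bit words (low, high).

    fold_paddb is a hypothetical helper for illustration, not a GCC routine.
    """
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected two 8-byte vectors")
    summed = bytes((x + y) & 0xFF for x, y in zip(a, b))
    # x86 is little-endian: the first four bytes form the word stored at x,
    # the last four form the word stored at x+4.
    return struct.unpack("<II", summed)

# Constant pool contents as listed in the pre-patch assembly.
LC0 = [1, 2, 3, 4, 5, 6, 7, 8]
LC1 = [11, 22, 33, 44, 55, 66, 77, 88]

low, high = fold_paddb(LC1, LC0)
print(low, high)  # 807671820 1616136252
```

The printed values agree with the immediates loaded into %eax and %edx, which is consistent with the addition having been evaluated at compile time.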