https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117717
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-11-20

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.L4:
        movq    (%rax), %xmm0
        pshufd  $0xe5, %xmm0, %xmm1
        movd    %xmm0, %edx
        movd    %xmm1, %ecx
        cmpl    %edx, %ecx
        jnb     .L3
        pshufd  $225, %xmm0, %xmm0
        movl    $1, %edi
        movq    %xmm0, (%rax)
.L3:
        addq    $4, %rax
        cmpq    %rsi, %rax
        jne     .L4

vs. the scalar code:

.L4:
        movl    4(%rax), %edx
        movl    (%rax), %ecx
        cmpl    %ecx, %edx
        jnb     .L3
        movl    %ecx, 4(%rax)
        movl    $1, %edi
        movl    %edx, (%rax)
.L3:
        addq    $4, %rax
        cmpq    %rsi, %rax
        jne     .L4

Note that the aarch64 cost model rejects the vectorization.

The x86_64 (trunk) cost model says:
/app/example.cpp:10:26: note: Cost model analysis for part in loop 2:
  Vector cost: 44
  Scalar cost: 48

While aarch64 says:
/app/example.cpp:10:26: note: Cost model analysis for part in loop 2:
  Vector cost: 12
  Scalar cost: 4
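The source of example.cpp is not quoted in this comment. Judging only from the listings (4-byte elements, an unsigned `jnb` compare of adjacent elements, a flag set to 1 when they are swapped), the loop looks like one pass of a bubble sort over unsigned ints. The following is a hypothetical C++ reconstruction under that assumption; `bubble_pass` and `bubble_sort` are illustrative names, not taken from the bug.

```cpp
#include <cstddef>

// Hypothetical reconstruction, NOT the actual example.cpp from the bug:
// one bubble-sort pass over unsigned ints. Each iteration loads the
// adjacent pair a[i], a[i+1]; if they are out of order it swaps them
// and records that a swap happened (the `movl $1, %edi` in both listings).
bool bubble_pass(unsigned *a, std::size_t n) {
    bool swapped = false;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        unsigned lo = a[i];      // scalar: movl (%rax), %ecx   vector: movd %xmm0, %edx
        unsigned hi = a[i + 1];  // scalar: movl 4(%rax), %edx  vector: movd %xmm1, %ecx
        if (hi < lo) {           // unsigned compare; jnb skips the swap when hi >= lo
            a[i] = hi;           // these two adjacent stores are the pair that
            a[i + 1] = lo;       // x86_64 turns into pshufd $225 + one 8-byte movq
            swapped = true;
        }
    }
    return swapped;
}

// Repeat passes until a pass makes no swap (hypothetical outer loop).
void bubble_sort(unsigned *a, std::size_t n) {
    while (bubble_pass(a, n)) {
    }
}
```

In this reading, the two adjacent loads and the two adjacent stores form the two-lane group that basic-block SLP vectorizes on x86_64, where the reported vector cost (44) is below the scalar cost (48). On aarch64 the vector cost (12) exceeds the scalar cost (4), so the same group is left scalar.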