[Bug target/117717] SLP of bubble sort is slower than without SLP

pinskia at gcc dot gnu.org via Gcc-bugs Wed, 20 Nov 2024 10:24:17 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117717


Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-11-20

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
With SLP, x86_64 generates:
.L4:
  movq (%rax), %xmm0
  pshufd $0xe5, %xmm0, %xmm1
  movd %xmm0, %edx
  movd %xmm1, %ecx
  cmpl %edx, %ecx
  jnb .L3
  pshufd $225, %xmm0, %xmm0
  movl $1, %edi
  movq %xmm0, (%rax)
.L3:
  addq $4, %rax
  cmpq %rsi, %rax
  jne .L4


vs. the scalar loop without SLP:
.L4:
  movl 4(%rax), %edx
  movl (%rax), %ecx
  cmpl %ecx, %edx
  jnb .L3
  movl %ecx, 4(%rax)
  movl $1, %edi
  movl %edx, (%rax)
.L3:
  addq $4, %rax
  cmpq %rsi, %rax
  jne .L4
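
For reference, both loops correspond to source of roughly the following shape. The original test case is not quoted in this comment, so this is only a hypothetical reconstruction from the assembly: it assumes unsigned int elements (the jnb is an unsigned comparison) and a "swapped" flag (the movl $1, %edi). The names bubble_pass and bubble_sort are illustrative, not taken from the bug report.

```cpp
#include <cstddef>

// Hypothetical reconstruction, not the test case from the bug report.
// One pass of bubble sort: compare each adjacent pair and swap if out of order.
// Returns true if at least one swap happened.
bool bubble_pass(unsigned *a, std::size_t n) {
    bool swapped = false;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        if (a[i + 1] < a[i]) {      // unsigned compare, cf. cmpl/jnb
            unsigned tmp = a[i];    // the two loads and two stores of
            a[i] = a[i + 1];        // adjacent elements are what SLP
            a[i + 1] = tmp;         // turns into one 64-bit load/store
            swapped = true;         // cf. movl $1, %edi
        }
    }
    return swapped;
}

// Repeat passes until a pass makes no swap.
void bubble_sort(unsigned *a, std::size_t n) {
    while (bubble_pass(a, n)) {
    }
}
```

Note that each iteration advances by one element (addq $4, %rax), so consecutive iterations touch overlapping pairs.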

Note that the aarch64 cost model rejects the vectorization.

The x86_64 cost model (on trunk) rates the vector code as slightly cheaper than the scalar code, so SLP is applied:
/app/example.cpp:10:26: note: Cost model analysis for part in loop 2:
  Vector cost: 44
  Scalar cost: 48

The aarch64 cost model rates the vector code as three times more expensive than the scalar code:
/app/example.cpp:10:26: note: Cost model analysis for part in loop 2:
  Vector cost: 12
  Scalar cost: 4
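
The accept/reject outcome in the two dumps can be illustrated with a very small sketch. This is not GCC's implementation; SlpCost and slp_profitable are hypothetical names, and the sketch assumes the decision reduces to a strict comparison of the two printed numbers, whereas the real logic in the vectorizer is more involved.

```cpp
// Hypothetical illustration only, not code from GCC.
struct SlpCost {
    int vector_cost;  // "Vector cost" line from the dump
    int scalar_cost;  // "Scalar cost" line from the dump
};

// Assumption: vectorize only when the vector cost is strictly lower.
bool slp_profitable(const SlpCost &c) {
    return c.vector_cost < c.scalar_cost;
}
```

With the numbers above, slp_profitable(SlpCost{44, 48}) is true for x86_64 and slp_profitable(SlpCost{12, 4}) is false for aarch64, matching the observed behaviour.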
