[Bug target/110023] 10% performance drop on important benchmark after r247544.

d_vampile at 163 dot com via Gcc-bugs Tue, 30 May 2023 07:46:47 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110023


--- Comment #2 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Andrew Pinski from comment #1)
> This is almost definitely an aarch64 cost model issue ...

Do you mean that the vectorization cost model for the underlying hardware
causes the no-peeling policy to be chosen after r247544? If so, why does loop
peeling result in a performance improvement?
As I understand it, the following code is a very standard loop that
vectorizes effectively:
    for (j=0; j<STREAM_ARRAY_SIZE; j++)
        c[j] = a[j]+b[j];
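
To make the question concrete, here is a rough hand-written sketch of what "peeling for alignment" means for this loop. It is only an illustration, not the code GCC generates before or after r247544. The names add_plain, add_peeled and peel_count are hypothetical, the 16-byte boundary is assumed from the 128-bit Advanced SIMD registers on aarch64, and the two-element inner loop merely stands in for one vector operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128-bit vectors, i.e. two doubles per vector. */
#define VEC_BYTES 16
#define VEC_ELEMS (VEC_BYTES / sizeof(double))

/* The loop as written in the benchmark: no peeling. */
void add_plain(double *c, const double *a, const double *b, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}

/* Number of scalar iterations to run before the store to c[]
 * reaches a VEC_BYTES boundary (hypothetical helper). */
size_t peel_count(const double *c, size_t n)
{
    size_t misalign = (size_t)((uintptr_t)c % VEC_BYTES);
    size_t peel = misalign ? (VEC_BYTES - misalign) / sizeof(double) : 0;
    return peel < n ? peel : n;
}

/* The same computation with the loop peeled by hand. */
void add_peeled(double *c, const double *a, const double *b, size_t n)
{
    size_t j = 0;
    size_t peel = peel_count(c, n);

    /* Prologue: scalar iterations until c + j is aligned. */
    for (; j < peel; j++)
        c[j] = a[j] + b[j];

    /* Main loop: one "vector" of VEC_ELEMS elements per iteration;
     * the stores to c[] now start on an aligned address. */
    for (; j + VEC_ELEMS <= n; j += VEC_ELEMS) {
        c[j]     = a[j]     + b[j];
        c[j + 1] = a[j + 1] + b[j + 1];
    }

    /* Epilogue: leftover scalar iterations. */
    for (; j < n; j++)
        c[j] = a[j] + b[j];
}
```

Both functions compute the same result. The peeled form only changes where the main loop starts, so whether it pays off depends on how expensive unaligned vector accesses are on the target compared with the extra prologue and epilogue code.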
