https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Blocks| |53947
Component|target |tree-optimization
Last reconfirmed| |2024-02-26
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that we fail to SLP vectorize this (at -O3 we unroll the inner loop):
t.c:4:20: note: ==> examining statement: _34 = *_33;
t.c:4:20: missed: peeling for gaps insufficient for access
t.c:5:51: missed: not vectorized: relevant stmt not supported: _34 = *_33;
t.c:4:20: note: removing SLP instance operations starting from: *_29 = _35;
t.c:4:20: missed: unsupported SLP instances
which is because 'factor[i]' is treated as a vector load:
t.c:4:20: note: node 0x687f730 (max_nunits=4, refcnt=2) const vector(4) double
t.c:4:20: note: op template: _34 = *_33;
t.c:4:20: note: stmt 0 _34 = *_33;
t.c:4:20: note: stmt 1 _34 = *_33;
t.c:4:20: note: stmt 2 _34 = *_33;
t.c:4:20: note: stmt 3 _34 = *_33;
t.c:4:20: note: load permutation { 0 0 0 0 }
and we don't anticipate that we can do this with a load-and-splat (I'm not even
sure we would eventually do that).
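To make the dump concrete, here is a sketch (an illustration, not GCC's implementation) of two ways to build the vector(4) double node for 'factor[i]', written with GCC vector extensions. Our reading of the dump, which is an assumption: treating 'factor[i]' as a vector load with load permutation { 0 0 0 0 } means loading four contiguous elements and using only lane 0, which over-reads past the end of factor[] for the last iterations; that is presumably why peeling for gaps is reported as insufficient. A load-and-splat reads only factor[i] and broadcasts it. The names v4df, v4di, load_then_permute and load_and_splat are hypothetical; __builtin_shuffle is GCC-specific.

```c
#include <string.h>

/* Hypothetical types: four doubles (like "vector(4) double" in the dump)
   and a matching integer vector used as a shuffle mask. */
typedef double v4df __attribute__((vector_size(32)));
typedef long long v4di __attribute__((vector_size(32)));

/* (a) Vector load of factor[i] .. factor[i+3], then permutation { 0 0 0 0 }.
   Only lane 0 is used, yet four elements are read, so near the end of
   factor[] this would access memory beyond the last needed element. */
static v4df load_then_permute(const double *factor, int i)
{
    v4df v;
    memcpy(&v, factor + i, sizeof v);              /* reads factor[i..i+3] */
    return __builtin_shuffle(v, (v4di){ 0, 0, 0, 0 }); /* every lane = lane 0 */
}

/* (b) Load-and-splat: a single scalar load broadcast to all four lanes
   (on x86 with AVX this corresponds to vbroadcastsd). */
static v4df load_and_splat(const double *factor, int i)
{
    double s = factor[i];                          /* reads only factor[i] */
    return (v4df){ s, s, s, s };
}
```

Both produce the same vector; the difference is how much memory they touch, which is what matters for the last iteration of the loop.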
I think we might have a duplicate bugreport for this issue.
Note that with GCC 13 we refuse to SLP because
t.c:4:20: missed: Build SLP failed: not grouped load _35 = *_34;
You can help GCC by doing
void rescale_x4(double* __restrict data, const double * __restrict factor, int n)
{
  for (int i=0; i<n; i++) {
    /* Block unrolling of the inner loop.  */
#pragma GCC unroll 0
    for (int k=0; k<4; k++) data[4*i+k] *= factor[i];
  }
}
which will get you
rescale_x4:
.LFB0:
        .cfi_startproc
        testl   %edx, %edx
        jle     .L5
        movslq  %edx, %rdx
        salq    $5, %rdx
        leaq    (%rdi,%rdx), %rax
        .p2align 4,,10
        .p2align 3
.L3:
        vbroadcastsd    (%rsi), %ymm0
        addq    $32, %rdi
        addq    $8, %rsi
        vmulpd  -32(%rdi), %ymm0, %ymm0
        vmovupd %ymm0, -32(%rdi)
        cmpq    %rdi, %rax
        jne     .L3
        vzeroupper
.L5:
        ret
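For comparison, a portable C sketch of what the .L3 loop does per outer iteration, written with GCC vector extensions rather than AVX intrinsics: broadcast factor[i], multiply the four contiguous doubles data[4*i .. 4*i+3], and store them back. The helper names rescale_x4_ref and rescale_x4_vec and the v4df type are hypothetical, and the instruction names in the comments are only an approximate mapping. A scalar reference with the same semantics as rescale_x4 is included so the two can be checked against each other.

```c
#include <string.h>

/* Hypothetical vector type: four doubles, one ymm register's worth. */
typedef double v4df __attribute__((vector_size(32)));

/* Scalar reference with the same semantics as rescale_x4. */
static void rescale_x4_ref(double *restrict data,
                           const double *restrict factor, int n)
{
    for (int i = 0; i < n; i++)
        for (int k = 0; k < 4; k++)
            data[4 * i + k] *= factor[i];
}

/* Hand-written counterpart of the .L3 loop: one iteration per i. */
static void rescale_x4_vec(double *restrict data,
                           const double *restrict factor, int n)
{
    for (int i = 0; i < n; i++) {
        double s = factor[i];
        v4df f = { s, s, s, s };               /* splat, like vbroadcastsd */
        v4df d;
        memcpy(&d, data + 4 * i, sizeof d);    /* unaligned load of 4 doubles */
        d *= f;                                /* lane-wise multiply, like vmulpd */
        memcpy(data + 4 * i, &d, sizeof d);    /* unaligned store, like vmovupd */
    }
}
```

Each element goes through exactly one multiplication in both versions, so the results compare equal bit for bit.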
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations