https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116475
Bug ID: 116475
Summary: autovect: may be optimized for min/max
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: syq at gcc dot gnu.org
Target Milestone: ---
Suppose we need the minimum of 8 floats in an array.
We might write code like this:
float min(float *x) {
    float ret = x[0];
    for (int i = 0; i < 8; i++) { // from 0 in this line
        ret = ret < x[i] ? ret : x[i];
    }
    return ret;
}
If we compile it with
aarch64-linux-gnu-gcc -O3 -ffast-math -S xx.c
we get:
ldp q0, q1, [x0]
ld1r {v31.4s}, [x0] # <-- not needed
fminnm v31.4s, v1.4s, v31.4s # <-- not needed
fminnm v0.4s, v31.4s, v0.4s
fminnmv s0, v0.4s
ret
Alternatively, we can start the loop at index 1:
float min(float *x) {
    float ret = x[0];
    for (int i = 1; i < 8; i++) { // from 1 in this line
        ret = ret < x[i] ? ret : x[i];
    }
    return ret;
}
The generated code is even worse:
ldr q31, [x0, 4]
ld1r {v30.4s}, [x0]
ldp s0, s29, [x0, 20]
fminnm v31.4s, v31.4s, v30.4s
ldr s30, [x0, 28]
fminnm s0, s0, s29
fminnmv s31, v31.4s
fminnm s31, s30, s31
fminnm s0, s0, s31
ret
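For reference, here is a minimal sketch of why the extra instructions look redundant. Both loops above compute the minimum of x[0..7], so the initial value x[0] does not need to be broadcast into its own vector. The function min_pairwise below is a hypothetical hand-written form (not something GCC emits): it takes a lane-wise minimum of the two 4-element halves and then reduces horizontally, which corresponds roughly to one fminnm plus one fminnmv. It assumes there are no NaNs in the input, as -ffast-math permits.

```c
/* Variant from the report: seeded with x[0], loop starts at 0. */
float min_from0(const float *x) {
    float ret = x[0];
    for (int i = 0; i < 8; i++)
        ret = ret < x[i] ? ret : x[i];
    return ret;
}

/* Variant from the report: seeded with x[0], loop starts at 1. */
float min_from1(const float *x) {
    float ret = x[0];
    for (int i = 1; i < 8; i++)
        ret = ret < x[i] ? ret : x[i];
    return ret;
}

/* Hypothetical reference form (name and shape are assumptions):
 * lane-wise min of x[0..3] and x[4..7], then a horizontal min
 * over the four resulting lanes. Assumes no NaN inputs. */
float min_pairwise(const float *x) {
    float lane[4];
    for (int i = 0; i < 4; i++)
        lane[i] = x[i] < x[i + 4] ? x[i] : x[i + 4];

    float lo = lane[0] < lane[1] ? lane[0] : lane[1];
    float hi = lane[2] < lane[3] ? lane[2] : lane[3];
    return lo < hi ? lo : hi;
}
```

For NaN-free inputs all three functions return the same value, so ideally both loop forms would compile to the same short sequence.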