https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121949
Bug ID: 121949
Summary: Missed shift vectorization when IV value has a different datatype
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Blocks: 53947
Target Milestone: ---
I think the solution to this is probably the same as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119860#c1 but I'm filing it as a
separate ticket so there is something to test with.
The following example:
void f1(long long word, long long* acc)
{
    for (long long row = 0; row < 64; ++row)
    {
        if (word & (1ull << row)) {
            acc[row] += row;
        }
    }
}

void f2(long long word, long long* acc)
{
    for (int row = 0; row < 64; ++row)
    {
        if (word & (1ull << row)) {
            acc[row] += row;
        }
    }
}
compiled with -O3 -march=armv8-a+sve, is vectorized for f1 but not for f2.
This is because the shift amount "row" is 32 bits wide, but the datatype of the
shift is 64 bits.
It seems the vectorizer doesn't support increasing the VF and simply extending
the value to 64 bits in this case, and instead refuses to vectorize.
While the optimal solution may be to just extend row to a 64-bit IV, it's
unclear why we don't support unpacking in this case.
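As a rough source-level illustration of what "simply extending the value to
64 bits" would mean, the sketch below keeps the int IV from f2 but widens the
shift amount explicitly before the shift. The names f2_widened and amt are
hypothetical and only for illustration. Whether any given GCC version
vectorizes this form has not been checked here; the point is only that the
widening preserves the semantics of f2.

```c
/* Original testcase from the report: 32-bit IV used as the shift amount
   of a 64-bit shift. */
void f2(long long word, long long* acc)
{
    for (int row = 0; row < 64; ++row)
    {
        if (word & (1ull << row)) {
            acc[row] += row;
        }
    }
}

/* Hypothetical variant (not from the report): the IV stays int, but the
   shift amount is zero-extended to 64 bits so that both shift operands
   have the same width.  row is always in [0, 63], so the extension does
   not change the result. */
void f2_widened(long long word, long long* acc)
{
    for (int row = 0; row < 64; ++row)
    {
        unsigned long long amt = (unsigned long long) row;
        if (word & (1ull << amt)) {
            acc[row] += row;
        }
    }
}
```

Both functions update acc identically for any word, so the extension is a
legal transformation for this loop.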
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations