https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80015
Bug ID: 80015
Summary: auto vectorization leaves scalar code even if it is
unreachable
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vanyacpp at gmail dot com
Target Milestone: ---
Consider these two versions of dot_product:
#include <cstdlib>

float dot_product(float const* a,
                  float const* b,
                  size_t n)
{
    a = (float const*)__builtin_assume_aligned(a, 16);
    b = (float const*)__builtin_assume_aligned(b, 16);

    if ((n % 4) != 0)
        return 0.; // (1)
        // __builtin_unreachable(); // (2)

    float result = 0.f;
    for (size_t i = 0; i != n; ++i)
        result += a[i] * b[i];

    return result;
}
The code should be compiled with the flags -O3 -ffast-math.
In version (1), the function returns 0. when n is not a multiple of 4; in
version (2), __builtin_unreachable() is invoked instead. Version (2) is
optimized to the point where only packed operations are used. In version (1),
scalar operations are still emitted, even though the loop is only reached when
n is a multiple of 4, so the scalar remainder code is unreachable.
The expected behavior is that gcc should not emit scalar operations in either
version.
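
To make the comparison easier to reproduce, the two versions can be written out
as separate functions and compiled in one translation unit, so the generated
assembly for both can be inspected side by side. This is only a sketch: the
names dot_product_return and dot_product_unreachable are made up for
illustration and do not appear in the original test case.

```cpp
#include <cstddef>

// Version (1): early return when n is not a multiple of 4.
float dot_product_return(float const* a, float const* b, std::size_t n)
{
    a = (float const*)__builtin_assume_aligned(a, 16);
    b = (float const*)__builtin_assume_aligned(b, 16);

    if ((n % 4) != 0)
        return 0.f;

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}

// Version (2): the caller promises that n is a multiple of 4.
// Calling this with any other n is undefined behavior.
float dot_product_unreachable(float const* a, float const* b, std::size_t n)
{
    a = (float const*)__builtin_assume_aligned(a, 16);
    b = (float const*)__builtin_assume_aligned(b, 16);

    if ((n % 4) != 0)
        __builtin_unreachable();

    float result = 0.f;
    for (std::size_t i = 0; i != n; ++i)
        result += a[i] * b[i];
    return result;
}
```

Compiling this with g++ -O3 -ffast-math -S and comparing the two function
bodies should show whether scalar instructions remain in the first one.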