https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122148

            Bug ID: 122148
           Summary: Cannot vectorize max-min vector product loops
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: manu at gcc dot gnu.org
  Target Milestone: ---

GCC fails to vectorize either of the loops here with 

gcc -O3 -march=x86-64-v4 -fopt-info-vec-missed test.c

```
static inline double
get_min_ratio(const double * restrict p, unsigned dim,
              const double * restrict w)
{
    double min_ratio = p[0] * w[0];
    for (unsigned k = 1; k < dim; k++) {
        double ratio = p[k] * w[k];
        if (min_ratio > ratio)
            min_ratio = ratio;
    }
    return min_ratio;        
}

double
get_expected_value(const double * restrict points, unsigned dim,
                   unsigned npoints, const double * restrict w)
{
    // points >= 0 && w >=0 so max_s_w cannot be < 0.
    double max_s_w = 0;
    for (unsigned i = 0; i < npoints; i++) {
        const double * restrict p = points + i * dim;
        double min_ratio = get_min_ratio(p, dim, w);
        if (max_s_w < min_ratio)
            max_s_w = min_ratio;
    }
    return max_s_w;
}
```

<source>:19:28: missed: couldn't vectorize loop
<source>:19:28: missed: not vectorized: unsupported control flow in loop.
<source>:5:28: missed: couldn't vectorize loop
<source>:14:1: missed: not vectorized: unsupported use in stmt.

See https://godbolt.org/z/hb8KKfM3W

I have tried replacing the "if" with "?:", but it doesn't change anything (not
that it should).
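For reference, the "?:" variant of the inner reduction would look roughly like the sketch below. This is a reconstruction of the change described above, not code taken from the report, and the name get_min_ratio_ternary is hypothetical.

```c
/* Hypothetical "?:" form of get_min_ratio; same comparison, written as a
   conditional expression instead of an if statement. */
static inline double
get_min_ratio_ternary(const double * restrict p, unsigned dim,
                      const double * restrict w)
{
    double min_ratio = p[0] * w[0];
    for (unsigned k = 1; k < dim; k++) {
        double ratio = p[k] * w[k];
        /* Keep the smaller value; identical semantics to the if version. */
        min_ratio = (min_ratio > ratio) ? ratio : min_ratio;
    }
    return min_ratio;
}
```

Both forms express the same comparison, so it is unsurprising that the vectorizer treats them alike.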

gcc -O3 -march=x86-64-v4 -fopt-info-vec-optimized-missed -ffinite-math-only -funsafe-math-optimizations

is able to vectorize the inner loop, but the "missed" output of the first
command gave no hint that these flags were needed.
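One plausible reason the extra flags matter (an assumption, not something confirmed in this report) is that a comparison-based min reduction is order-sensitive when a NaN is present, so splitting it into per-lane partial minima can change the result. The sketch below models that with a hypothetical two-lane version; min_sequential and min_two_lanes are illustrative names and do not describe what GCC actually emits.

```c
#include <math.h>
#include <stddef.h>

/* Same comparison-based reduction as the inner loop in the report,
   without the multiplication. Requires n >= 1. */
static double min_sequential(const double *x, size_t n)
{
    double m = x[0];
    for (size_t k = 1; k < n; k++)
        if (m > x[k])
            m = x[k];
    return m;
}

/* Hypothetical model of a 2-lane vectorized reduction: each lane keeps
   its own partial minimum and the lanes are combined at the end.
   Assumes n is even and n >= 2. */
static double min_two_lanes(const double *x, size_t n)
{
    double lane0 = x[0], lane1 = x[1];
    for (size_t k = 2; k + 1 < n; k += 2) {
        if (lane0 > x[k])
            lane0 = x[k];
        if (lane1 > x[k + 1])
            lane1 = x[k + 1];
    }
    /* Final cross-lane combine, using the same comparison. */
    return lane0 > lane1 ? lane1 : lane0;
}
```

With the input {3.0, NAN, 2.0, 1.0}, the sequential loop skips the NaN and returns 1.0, while the two-lane model leaves the NaN stuck in lane 1 and returns 2.0. For inputs without NaNs the two agree, which is consistent with -ffinite-math-only making the transformation legal.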

<source>:19:28: missed: couldn't vectorize loop
<source>:19:28: missed: not vectorized: unsupported control flow in loop.
<source>:5:28: optimized: loop vectorized using 64 byte vectors and unroll factor 8
<source>:5:28: optimized: epilogue loop vectorized using 32 byte vectors and unroll factor 4
