https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93440

            Bug ID: 93440
           Summary: scalar unrolled loop makes vectorized code unreachable
           Product: gcc
           Version: 9.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ikonomisma at googlemail dot com
  Target Milestone: ---

Created attachment 47711
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47711&action=edit
generated assembly code, showing unreachable SIMD vector code for
transform_reduce

With gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC) on x86-64, "-O3
-std=gnu++17 -march=core-avx2" emits both SIMD vectorized code and an unrolled
loop for "std::transform_reduce". The unrolled loop prevents the SIMD
vectorized code from executing for reasonable vector sizes. In contrast,
"std::inner_product" produces reachable vectorized code.

Minimal reproducing C++ code:

    #include <vector>
    #include <algorithm>
    #include <functional>  // std::plus, std::multiplies
    #include <iterator>    // std::cbegin, std::cend
    #include <numeric>

    auto workingvector(std::vector<int> const& a, std::vector<int> const& b)
    {
      return std::inner_product(cbegin(a), cend(a), cbegin(b), 0,
                                std::plus<>{}, std::multiplies<>{});
    }

    auto brokenvector(std::vector<int> const& a, std::vector<int> const& b)
    {
      return std::transform_reduce(cbegin(a), cend(a), cbegin(b), 0,
                                   std::plus<>{}, std::multiplies<>{});
    }
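
To make clear that the report concerns code generation only, not results, here is a minimal sketch that checks both functions return the same value. The two functions are repeated so the snippet is self-contained. The helper name same_result and the test data are hypothetical and not part of the original report.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Same bodies as the two functions in the report.
int workingvector(std::vector<int> const& a, std::vector<int> const& b)
{
  return std::inner_product(cbegin(a), cend(a), cbegin(b), 0,
                            std::plus<>{}, std::multiplies<>{});
}

int brokenvector(std::vector<int> const& a, std::vector<int> const& b)
{
  return std::transform_reduce(cbegin(a), cend(a), cbegin(b), 0,
                               std::plus<>{}, std::multiplies<>{});
}

// Hypothetical helper: builds two vectors of length n (1, 2, ... and 2, 3, ...)
// and reports whether both functions compute the same dot product.
bool same_result(std::size_t n)
{
  std::vector<int> a(n), b(n);
  std::iota(a.begin(), a.end(), 1);
  std::iota(b.begin(), b.end(), 2);
  return workingvector(a, b) == brokenvector(a, b);
}
```

Calling same_result from a small main() for a few lengths (for example 0, 7, 8, and 1000) should return true each time. The difference described in this report shows up only in the emitted assembly, which can be inspected by compiling with the flags quoted above plus -S.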



Details:
The generated assembly for "transform_reduce" checks for short vectors with
a signed comparison, so the vectorized code is *technically* reachable (but
only for vector sizes that are infeasible in a 64-bit address space). Whenever
the vector is large enough for the unrolled scalar loop, that loop processes
the entire vector, so the SIMD vector code never executes.
