https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93440
Bug ID: 93440
Summary: scalar unrolled loop makes vectorized code unreachable
Product: gcc
Version: 9.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ikonomisma at googlemail dot com
Target Milestone: ---

Created attachment 47711
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47711&action=edit
generated assembly code, showing unreachable SIMD vector code for transform_reduce

With gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC) on x86-64, "-O3 -std=gnu++17 -march=core-avx2" emits both SIMD vectorized code and an unrolled scalar loop for "std::transform_reduce". The unrolled loop prevents the SIMD vectorized code from executing for any reasonable vector size. In contrast, "std::inner_product" produces reachable vectorized code.

Minimal reproducing C++ code:

    #include <vector>
    #include <algorithm>
    #include <functional>  // std::plus, std::multiplies
    #include <numeric>

    auto workingvector(std::vector<int> const& a, std::vector<int> const& b) {
        return std::inner_product(cbegin(a), cend(a), cbegin(b), 0,
                                  std::plus<>{}, std::multiplies<>{});
    }

    auto brokenvector(std::vector<int> const& a, std::vector<int> const& b) {
        return std::transform_reduce(cbegin(a), cend(a), cbegin(b), 0,
                                     std::plus<>{}, std::multiplies<>{});
    }

Details:

The generated assembly for "transform_reduce" checks for short vectors with a signed comparison, so the vectorized code is *technically* reachable (only for vector sizes that are completely infeasible in a 64-bit address space). If the vector is large enough for the unrolled scalar loop, the scalar loop processes the entire vector and never allows the SIMD vector code to execute.
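
Since the report is about code generation rather than wrong results, it may help to first confirm that both calls compute the same value, so that any timing difference can be attributed to the scalar versus SIMD path. The following is only a sketch of such a check. The names via_inner_product, via_transform_reduce, make_input and check_agreement are hypothetical and not part of the report, and the input pattern is an arbitrary choice that keeps the sums well inside the range of int.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Same computation as "workingvector" in the report.
int via_inner_product(std::vector<int> const& a, std::vector<int> const& b) {
    return std::inner_product(a.cbegin(), a.cend(), b.cbegin(), 0,
                              std::plus<>{}, std::multiplies<>{});
}

// Same computation as "brokenvector" in the report.
int via_transform_reduce(std::vector<int> const& a, std::vector<int> const& b) {
    return std::transform_reduce(a.cbegin(), a.cend(), b.cbegin(), 0,
                                 std::plus<>{}, std::multiplies<>{});
}

// Hypothetical helper: n values in [0, 6], so each product is at most 36
// and the sum cannot overflow int for the sizes used below.
std::vector<int> make_input(std::size_t n, std::size_t offset) {
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>((i + offset) % 7);
    return v;
}

// Hypothetical helper: asserts that both functions agree for several
// lengths, including lengths that are not a multiple of a SIMD width.
void check_agreement() {
    for (std::size_t n : {0u, 1u, 7u, 8u, 9u, 31u, 1000u, 100003u}) {
        auto a = make_input(n, 1);
        auto b = make_input(n, 3);
        assert(via_inner_product(a, b) == via_transform_reduce(a, b));
    }
}
```

Calling check_agreement() from main and compiling with the flags from the report should pass silently. Adding -S to those flags (for example "g++ -O3 -std=gnu++17 -march=core-avx2 -S") writes the assembly, where the two functions can then be compared by hand.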