https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111894
Bug ID: 111894
Summary: Missed vectorization opportunity
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: Mark_B53 at yahoo dot com
Target Milestone: ---

Consider the following code that uses views to implement a two dimensional `iota`:

#include <array>
#include <concepts>
#include <cstddef>
#include <ranges>
#include <tuple>

template<std::integral T>
std::ranges::range auto iota2D(T xbound, T ybound) {
    // Map a flat index to a (row, column) pair.
    auto fn = [=](T idx) { return std::tuple{idx / ybound, idx % ybound}; };
    return std::views::iota(T{0}, xbound * ybound) | std::views::transform(fn);
}

constexpr std::size_t N = 20;
std::array<std::array<int, N>, N> data;

__attribute__((noinline)) void init1() {
    for (auto i : std::views::iota(std::size_t{}, N)) {
        for (auto j : std::views::iota(std::size_t{}, N)) {
            data[i][j] = 123;
        }
    }
}

__attribute__((noinline)) void init2() {
    for (auto [i, j] : iota2D(N, N)) {
        data[i][j] = 123;
    }
}

Using gcc 13.2 with -O3, we see that the code using a nested loop is nicely vectorized:

init1():
        movdqa  xmm0, XMMWORD PTR .LC0[rip]
        mov     eax, OFFSET FLAT:data
.L2:
        movaps  XMMWORD PTR [rax], xmm0
        add     rax, 80
        movaps  XMMWORD PTR [rax-64], xmm0
        movaps  XMMWORD PTR [rax-48], xmm0
        movaps  XMMWORD PTR [rax-32], xmm0
        movaps  XMMWORD PTR [rax-16], xmm0
        cmp     rax, OFFSET FLAT:data+1600
        jne     .L2
        ret

The code using iota2D is not vectorized:

init2():
        xor     eax, eax
.L6:
        mov     DWORD PTR data[0+rax*4], 123
        add     rax, 1
        cmp     rax, 400
        jne     .L6
        ret

Although GCC 13 produces much higher quality assembly than previous versions (the division and modulo are folded into a single flat index), it still fails to vectorize the loop: init2 stores one int per iteration, while init1 stores a full row of 20 ints with five 16-byte stores.
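
The init2 assembly addresses `data[0+rax*4]` with a single counter running to 400, which suggests GCC already recognizes that the (idx / ybound, idx % ybound) pair is just the row-major flat index. The sketch below checks that identity at compile time and adds a ranges-free version of the same loop. The names `split_index`, `mapping_is_row_major` and `init2_plain` are hypothetical and not part of the report; whether `init2_plain` gets vectorized is not claimed here and would need to be checked separately.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>

constexpr std::size_t N = 20;
std::array<std::array<int, N>, N> data;

// Hypothetical helper: the same index mapping as the lambda inside iota2D.
constexpr auto split_index(std::size_t idx, std::size_t ybound) {
    assert(ybound != 0);  // guard the division below
    return std::tuple{idx / ybound, idx % ybound};
}

// Returns true if every (i, j) produced by split_index addresses flat
// element idx of an N x N row-major array, i.e. i * N + j == idx.
constexpr bool mapping_is_row_major() {
    for (std::size_t idx = 0; idx < N * N; ++idx) {
        auto [i, j] = split_index(idx, N);
        if (i * N + j != idx) return false;
    }
    return true;
}
static_assert(mapping_is_row_major(), "iota2D mapping is a flat row-major walk");

// Hypothetical reduced variant of init2 without <ranges>: one flat counter,
// split into row and column on every iteration.
__attribute__((noinline)) void init2_plain() {
    for (std::size_t idx = 0; idx < N * N; ++idx) {
        auto [i, j] = split_index(idx, N);
        data[i][j] = 123;
    }
}
```

Compiling this variant next to init1 and init2 with -O3 (and, if useful, -fopt-info-vec-missed) may help tell whether the missed vectorization comes from the ranges machinery or from the div/mod indexing itself.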