https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

            Bug ID: 108487
           Summary: ~20-30x slowdown in populating std::vector from
                    std::ranges::iota_view
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Mark_B53 at yahoo dot com
  Target Milestone: ---

Using -std=c++20 -O3, comparing gcc 12.2 vs. gcc 10.3:
 * fn2 is 20-30x slower on gcc 12.2 (i.e. it takes 20-30 times the gcc 10.3 run time)
 * fn1 is ~20% slower on gcc 12.2 

This test was run on a 52-core Intel Xeon Gold 6278C CPU.  Tests on
www.godbolt.org directionally align with these findings.  It seems the slowdown
was introduced in 10.4 & 11.1.  The trunk has identical performance to 12.2.

#include <vector>
#include <ranges>
#include <ctime>
#include <iostream>

__attribute__((noinline)) std::vector<int> fn1(int n)
{
    auto v = std::vector<int>(n);
    for(int i = 0; i != n; ++i)
        v[i] = i;
    return v;
}

__attribute__((noinline)) std::vector<int> fn2(int n)
{
    auto rng = std::ranges::iota_view{0, n};
    return std::vector<int>{rng.begin(), rng.end()};
}

int main() {
    int n = 100'000;
    int times = 100'000;

    auto t0 = std::clock();
    for (int i = 0; i < times; ++i)
        fn1(n);            
    auto t1 = std::clock();
    for (int i = 0; i < times; ++i)
        fn2(n);            
    auto t2 = std::clock();
    std::cout << t1 - t0 << '\n';
    std::cout << t2 - t1 << '\n';
    return 0;
}

P.S. A 20% slowdown for such a common way of populating a vector is still
significant IMO, but I am not sure it qualifies as a bug, so I did not file a
separate report for the 'fn1' slowdown.
