https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization
           Keywords|                            |needs-bisection
            Summary|~20-30x slowdown in         |[10/11/12/13 Regression]
                   |populating std::vector from |~20-30x slowdown in
                   |std::ranges::iota_view      |populating std::vector from
                   |                            |std::ranges::iota_view
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Regarding fn1, would you mind re-running the test on your Xeon CPU with fn2
removed from the source code and -falign-loops=32 added to the gcc command
line? For fn1, the assembly of the inner loop should be identical, so I think
the 20% difference you were seeing may result from different loop alignment
with respect to the 32-byte fetch boundary.

Also, please note that the cloud instances backing godbolt.org have different
CPUs, so timing results from different runs are not directly comparable.

Regarding fn2, this may partially be a library issue: compiling source
preprocessed with gcc-10.4 (and thus containing its libstdc++ headers) using
gcc-10.2 also exhibits the problem. The inner loop becomes significantly more
complicated. Bisecting should be helpful.
