http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

--- Comment #30 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-02 08:07:15 UTC ---
Created attachment 26809
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26809
pr50182.C

Even the reduced testcase is orders of magnitude longer than would be
desirable for analysis.  I've tried to reduce it to just the templates that are
actually needed (and it can be measured just with time); does this reflect the
slowdowns you are seeing?  The next step in reducing it would be to remove all
the template mess, instantiate it by hand, and perhaps also inline by hand.
There is no reason why we shouldn't end up with just one loop with all the
statements in it.  On this reduced testcase, on an Intel i7-2600 CPU with -O3,
-DFAST_VER/-DNOINLINE don't seem to make any difference, but 4.6 is measurably
faster than 4.7.
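
To illustrate what "instantiate by hand and inline by hand" could look like,
here is a minimal sketch.  It is not taken from pr50182.C; the names (apply,
Add, Mul, templated, by_hand) are hypothetical stand-ins, and it assumes the
arrays do not alias.  The templated form runs one loop per operation, while the
hand-written form is a single loop with all the statements in it.

```cpp
#include <cstddef>

// Hypothetical templated helper, standing in for the kind of code a
// template-heavy testcase might contain (not from pr50182.C).
template <typename T, typename Op>
void apply(T *dst, const T *src, std::size_t n, Op op) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = op(dst[i], src[i]);
}

struct Add { float operator()(float a, float b) const { return a + b; } };
struct Mul { float operator()(float a, float b) const { return a * b; } };

// Template version: two instantiations, two separate loops.
void templated(float *d, const float *a, const float *b, std::size_t n) {
    apply(d, a, n, Add());
    apply(d, b, n, Mul());
}

// Instantiated and inlined by hand: one loop with all the statements.
// Equivalent to templated() as long as d, a and b do not overlap.
void by_hand(float *d, const float *a, const float *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = d[i] + a[i];
        d[i] = d[i] * b[i];
    }
}
```

A testcase in the second form is much easier to compare across compiler
versions, since the dumps no longer depend on inlining decisions.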

In any case, this is way too late for 4.7.
