https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 9 Jan 2019, wschmidt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
>
> Bill Schmidt <wschmidt at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|WAITING                     |UNCONFIRMED
>      Ever confirmed|1                           |0
>
> --- Comment #2 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
> Hi Richard -- This was reported to us internally.  The performance of this
> test case on a P8 server indicates that disabling complete unrolling and
> applying unroll-and-jam could produce about a 1.5x speedup.  I am going to
> have our performance team verify that this is the case using just the
> options that Li Jia used; the original report modified the source to
> provide the results of unroll-and-jam since the reporter didn't know how
> to disable cunrolli.  I'll post the results here when we have them.

Note for cases like this it would be nice to extend our set of
loop pragmas so you could say

#pragma GCC loop unroll-and-jam [factor]

on the outer loop which should then disable unrolling of the inner.
If source modification is possible, that is.  Using
-fdisable-tree-cunrolli isn't meant to be a "production thing"
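
For readers unfamiliar with the transformation, here is a minimal sketch of what unroll-and-jam by a factor of 2 does to a loop nest, written out by hand in C. The kernel, the array sizes N and M, and all function names are hypothetical and are not taken from the test case in this bug. The pragma spelling above is only a proposal in this comment, so it appears in the code solely as a comment marking where it would go.

```c
#include <assert.h>
#include <stddef.h>

#define N 8  /* outer trip count (hypothetical; even, so no remainder loop is needed) */
#define M 4  /* inner trip count (hypothetical; short enough to be a complete-unrolling candidate) */

/* Original nest: one pass over the short inner loop per outer iteration. */
static void nest_plain(double out[N], double a[N][M], const double b[M])
{
    /* The proposed "#pragma GCC loop unroll-and-jam 2" would go here. */
    for (size_t i = 0; i < N; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < M; j++)
            sum += a[i][j] * b[j];
        out[i] = sum;
    }
}

/* Hand-written unroll-and-jam by 2: the outer loop is unrolled twice and
   the two resulting copies of the inner loop are fused ("jammed") into one. */
static void nest_unroll_and_jam(double out[N], double a[N][M], const double b[M])
{
    for (size_t i = 0; i < N; i += 2) {
        double sum0 = 0.0, sum1 = 0.0;
        for (size_t j = 0; j < M; j++) {
            double bj = b[j];          /* loaded once, used for both rows */
            sum0 += a[i][j] * bj;
            sum1 += a[i + 1][j] * bj;
        }
        out[i] = sum0;
        out[i + 1] = sum1;
    }
}

/* Runs both versions on the same small input, asserts that every row
   matches, and returns the number of rows checked. */
static size_t check_versions_agree(void)
{
    double a[N][M], b[M], r1[N], r2[N];
    for (size_t j = 0; j < M; j++)
        b[j] = (double)(j + 1);
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            a[i][j] = (double)(i * M + j);
    nest_plain(r1, a, b);
    nest_unroll_and_jam(r2, a, b);
    for (size_t i = 0; i < N; i++)
        assert(r1[i] == r2[i]);
    return N;
}
```

The jammed version shares each load of b[j] between two rows, which is where the benefit typically comes from. If the inner loop has already been completely unrolled by an earlier pass, there is no inner loop left to jam, which is why the ordering against cunrolli matters here.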