https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #38 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Joost VandeVondele from comment #37)
> (In reply to Joost VandeVondele from comment #36)
> > #pragma GCC optimize ( "-Ofast -fvariable-expansion-in-unroller
> > -funroll-loops" )
> 
> and really beneficial for larger matrices would be 
> 
> -floop-nest-optimize
> 
> in particular the blocking (it would be an additional motivation for PR14741
> and work on graphite in general), don't know if one can give the parameter
> for the blocking. In principle the loop-nest-optimization, together with the
> -Ofast (and ideally -march=native, which we can't have in libgfortran, I
> assume) would yield near peak performance.

The algorithm that Jerry implemented already does very nice unrolling
and blocking.  I doubt that gcc's own loop optimizations can add to that.

Regarding -march=native, that could really be an improvement,
especially with -mavx.  I wonder if it is possible to have
architecture-specific versions of library functions?  We could
select the right routine depending on the -march flag.  Worth
a question on the gcc list, probably (but definitely _not_ a
prerequisite for this going into gcc 7).
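
To make the idea concrete, here is a rough sketch of one way it could
look.  Note that it uses run-time dispatch on the CPU rather than
selection by the -march flag, so it is a swapped-in technique, not
what libgfortran does.  All names (matmul_generic, matmul_avx,
select_matmul, matmul_dispatch) are hypothetical.  It assumes
column-major storage, GCC's target attribute and the
__builtin_cpu_supports builtin on x86; elsewhere it falls back to
the generic kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Common signature for all kernels: c(m,n) = a(m,k) * b(k,n),
   all matrices stored column-major and contiguous.  */
typedef void (*matmul_fn) (size_t m, size_t n, size_t k,
                           const double *a, const double *b, double *c);

/* Shared loop nest, so that each kernel differs only in the
   instruction set the compiler may use for it.  */
#define MATMUL_BODY                                         \
  for (size_t j = 0; j < n; j++)                            \
    {                                                       \
      for (size_t i = 0; i < m; i++)                        \
        c[i + j * m] = 0.0;                                 \
      for (size_t l = 0; l < k; l++)                        \
        {                                                   \
          const double blj = b[l + j * k];                  \
          for (size_t i = 0; i < m; i++)                    \
            c[i + j * m] += a[i + l * m] * blj;             \
        }                                                   \
    }

/* Baseline kernel, compiled with the default target flags.  */
static void
matmul_generic (size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c)
{
  MATMUL_BODY
}

#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
#define HAVE_X86_DISPATCH 1
/* Same source, but the compiler is allowed to emit AVX code here.  */
__attribute__ ((target ("avx")))
static void
matmul_avx (size_t m, size_t n, size_t k,
            const double *a, const double *b, double *c)
{
  MATMUL_BODY
}
#endif

/* Pick a kernel based on what the running CPU supports.  */
static matmul_fn
select_matmul (void)
{
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("avx"))
    return matmul_avx;
#endif
  return matmul_generic;
}

/* Entry point a caller would use; the selection is cached.  */
void
matmul_dispatch (size_t m, size_t n, size_t k,
                 const double *a, const double *b, double *c)
{
  static matmul_fn impl;
  assert (a != NULL && b != NULL && c != NULL);
  if (impl == NULL)
    impl = select_matmul ();
  impl (m, n, k, a, b, c);
}
```

A caller would only ever see matmul_dispatch; which kernel runs is
decided on first use.  Whether the AVX variant is actually faster
depends on what the vectorizer makes of the loop nest.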

Of course, we _could_ also try to bring blocking to the inline
version (PR 66189), risking insanity for the implementer :-)
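
For reference, this is roughly what blocking means for a matrix
multiply, as a minimal sketch.  It is not Jerry's algorithm and not
the Netlib code; the name matmul_blocked and the tile size BLOCK = 64
are assumptions made up for illustration, and the layout is assumed
to be column-major and contiguous.

```c
#include <stddef.h>

/* Assumed tile edge; a real implementation would tune this
   to the cache sizes of the target.  */
enum { BLOCK = 64 };

static size_t
min_sz (size_t x, size_t y)
{
  return x < y ? x : y;
}

/* c(m,n) = a(m,k) * b(k,n), column-major, computed tile by tile so
   that the parts of a, b and c in use stay in cache.  */
void
matmul_blocked (size_t m, size_t n, size_t k,
                const double *a, const double *b, double *c)
{
  for (size_t idx = 0; idx < m * n; idx++)
    c[idx] = 0.0;

  for (size_t jj = 0; jj < n; jj += BLOCK)
    {
      const size_t jmax = min_sz (jj + BLOCK, n);
      for (size_t ll = 0; ll < k; ll += BLOCK)
        {
          const size_t lmax = min_sz (ll + BLOCK, k);
          for (size_t ii = 0; ii < m; ii += BLOCK)
            {
              const size_t imax = min_sz (ii + BLOCK, m);
              /* Multiply one tile of a by one tile of b and
                 accumulate into the matching tile of c.  */
              for (size_t j = jj; j < jmax; j++)
                for (size_t l = ll; l < lmax; l++)
                  {
                    const double blj = b[l + j * k];
                    for (size_t i = ii; i < imax; i++)
                      c[i + j * m] += a[i + l * m] * blj;
                  }
            }
        }
    }
}
```

Doing this in the front end for the inline version would mean
generating the three outer tile loops plus the edge handling for
every inlined matmul, which is where the insanity comes in.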

Jerry, what Netlib code were you basing your code on?
