[Bug tree-optimization/51179] poor vectorization on interlagos.

Joost.VandeVondele at mat dot ethz.ch Tue, 22 Nov 2011 10:35:24 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51179


--- Comment #5 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> 
2011-11-22 18:34:48 UTC ---
(In reply to comment #3)
> is IMHO just a matter whether graphite can -floop-interchange this or not.
> If you swap manually the l and j for lines, the generated code looks better,
> though for some reason we unroll even the l loop which increases register
> pressure too much.

Unfortunately, the issue is not just loop ordering or loop unrolling. I have a
code generator that systematically tries all possible loop orderings and all
possible unroll factors. For this testcase (matrix sizes 4,10,10) the best Cray
output (this one) runs at 10.8 Gflops. The best gcc-compiled version
(smm_dnn_4_10_10_1_1_10_2) runs at 4.7 Gflops. I attach the code I use for
testing.
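
To give a rough idea of the search space involved, here is a minimal sketch of
such an enumeration. It is not the attached generator: the loop names i, j, l,
the assignment of the extents 4,10,10 to them, and the restriction of unroll
factors to divisors of each extent are all assumptions made for illustration.

```python
from itertools import permutations, product

# Hypothetical mapping of the sizes 4,10,10 onto three loops (an assumption).
EXTENTS = {"i": 4, "j": 10, "l": 10}


def divisors(n):
    """Return all positive divisors of n, used here as candidate unroll factors."""
    return [d for d in range(1, n + 1) if n % d == 0]


def variants(extents):
    """Yield (loop order, unroll factors) for every candidate kernel variant."""
    names = sorted(extents)
    choices = [divisors(extents[name]) for name in names]
    for order in permutations(names):
        for factors in product(*choices):
            yield order, dict(zip(names, factors))


all_variants = list(variants(EXTENTS))

# One multiply and one add per (i, j, l) triple, whichever loop has extent 4.
flops_per_call = 2 * 4 * 10 * 10

if __name__ == "__main__":
    print(len(all_variants), "variants,", flops_per_call, "flops per call")
```

Under these assumptions there are 288 variants (6 orderings times 3*4*4 unroll
combinations), each doing 800 flops per call, so timing every variant
exhaustively is cheap.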
