https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600

--- Comment #8 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Created attachment 36887
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36887&action=edit
A faster version

I took the example code found in
http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm/ where the register-based
vector computations are coded explicitly against the SSE registers, and
converted it to use the built-in GCC vector extensions.  I had to experiment a
little to find equivalents for some of the operations in the original code.
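
As an illustration of the kind of conversion described above, here is a minimal
sketch; it is not the code in the attachment.  The names v2df, load2, splat2 and
update_2x1 are hypothetical, and the sketch assumes column-major storage with
two doubles per 16-byte vector.  The helpers stand in for the unaligned-load and
load-and-duplicate SSE intrinsics, and the multiply and add are written with
ordinary operators on the vector type.

```c
#include <assert.h>
#include <string.h>

/* Two doubles packed into one 16-byte vector (GCC vector extension). */
typedef double v2df __attribute__((vector_size(16)));

/* Load two doubles from possibly unaligned memory.
   Hypothetical helper, standing in for an unaligned SSE load. */
static v2df load2(const double *p)
{
    v2df v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Copy one scalar into both lanes.
   Hypothetical helper, standing in for a load-and-duplicate. */
static v2df splat2(double x)
{
    v2df v = { x, x };
    return v;
}

/* Update a 2x1 piece of C:
     c[0..1] += sum over p of A(0..1, p) * b[p]
   where A is column-major with leading dimension lda. */
void update_2x1(int k, const double *a, int lda, const double *b, double *c)
{
    assert(k >= 0 && lda >= 2);

    v2df acc = { 0.0, 0.0 };
    for (int p = 0; p < k; p++)
        acc += load2(a + (size_t)p * lda) * splat2(b[p]);  /* element-wise */

    /* Vector lanes can be read with ordinary subscripts. */
    c[0] += acc[0];
    c[1] += acc[1];
}
```

The compiler picks the actual instructions from the target flags, so the same
source can be built for SSE2 or for wider units without source changes.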

With only -O2 and -march=native I am getting good results.  I still need to
roll this into the other test program to confirm that the gflops are being
computed correctly.  The Diff value compares the result against the naive
reference implementation to check that the computation is correct.
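
For reference, the two reported columns could be computed along these lines.
This is a sketch under assumptions, not the test driver itself: it assumes the
usual count of 2*m*n*k floating-point operations for a matrix multiply and takes
Diff to be the largest absolute element-wise difference from the reference
result.  The function names gemm_gflops and max_abs_diff are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Gflops for an (m x k) * (k x n) multiply that took `seconds`,
   assuming one multiply and one add per inner-loop step. */
double gemm_gflops(int m, int n, int k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * m * n * k * 1.0e-9 / seconds;
}

/* Largest absolute element-wise difference between two m x n
   column-major matrices sharing the leading dimension ld. */
double max_abs_diff(int m, int n, const double *x, const double *y, int ld)
{
    double worst = 0.0;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double d = x[(size_t)j * ld + i] - y[(size_t)j * ld + i];
            if (d < 0.0)
                d = -d;
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

Under these assumptions a Diff that stays near rounding level while growing
slowly with the matrix size is what one would expect from a different summation
order rather than from an indexing error.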

MY_MMult = [
Size: 40, Gflops: 1.828571e+00, Diff: 2.664535e-15 
Size: 80, Gflops: 3.696751e+00, Diff: 7.105427e-15 
Size: 120, Gflops: 4.051583e+00, Diff: 1.065814e-14 
Size: 160, Gflops: 4.015686e+00, Diff: 1.421085e-14 
Size: 200, Gflops: 4.029212e+00, Diff: 2.131628e-14 
Size: 240, Gflops: 3.972414e+00, Diff: 2.486900e-14 
Size: 280, Gflops: 3.881188e+00, Diff: 2.842171e-14 
Size: 320, Gflops: 3.872371e+00, Diff: 3.552714e-14 
Size: 360, Gflops: 3.887676e+00, Diff: 4.973799e-14 
Size: 400, Gflops: 3.862052e+00, Diff: 4.973799e-14 
Size: 440, Gflops: 3.886575e+00, Diff: 4.973799e-14 
Size: 480, Gflops: 3.910124e+00, Diff: 6.039613e-14 
Size: 520, Gflops: 3.863706e+00, Diff: 6.394885e-14 
Size: 560, Gflops: 3.976947e+00, Diff: 6.750156e-14 
Size: 600, Gflops: 4.002631e+00, Diff: 7.460699e-14 
Size: 640, Gflops: 3.992507e+00, Diff: 8.171241e-14 
Size: 680, Gflops: 3.964570e+00, Diff: 9.237056e-14 
Size: 720, Gflops: 3.973661e+00, Diff: 1.101341e-13 
Size: 760, Gflops: 3.982346e+00, Diff: 1.065814e-13 
Size: 800, Gflops: 3.869291e+00, Diff: 8.881784e-14 
Size: 840, Gflops: 3.936271e+00, Diff: 1.065814e-13 
Size: 880, Gflops: 3.931259e+00, Diff: 1.030287e-13 
Size: 920, Gflops: 3.912907e+00, Diff: 1.207923e-13 
Size: 960, Gflops: 3.938391e+00, Diff: 1.278977e-13 
Size: 1000, Gflops: 3.945754e+00, Diff: 1.421085e-13
