https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80616

            Bug ID: 80616
           Summary: Slow vector multiply compared to icc
           Product: gcc
           Version: 6.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nightstrike at gmail dot com
  Target Milestone: ---

Created attachment 41308
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41308&action=edit
Testcase

If I compile the following:

float vmul(int N, float A[N], float B[N]) {
    float total = 0.0f;
    for (int i = 0; i < N; ++i)
        total += A[i] * B[i];
    return total;
}

For 4k elements on bdver2, GCC gives me a time of 7us and icc gives me 1us,
using -O3 -march=native for gcc and -O3 -xHost for icc.

$ gcc v.c -lrt -O3 -march=native -save-temps
$ ./a.out
Val: 4772697023455277613056.000000
Time: 0.000007

$ icc v.c -lrt -O3 -xHost -save-temps
$ ./a.out
Val: 4772700964104951562240.000000
Time: 0.000001

I have attached the small source file and will add the asm in a follow-up.
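
For readers without the attachment, the measurement can be sketched roughly as
follows. This is not the attached testcase: the helper names time_vmul and
elapsed_seconds, the fill values, and the choice of
clock_gettime(CLOCK_MONOTONIC) are assumptions made for illustration (the -lrt
on the command lines only suggests a POSIX clock call). It times a single call
to vmul, so the result is subject to timer resolution and cache warm-up.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Dot product from the report, unchanged. */
float vmul(int N, float A[N], float B[N]) {
    float total = 0.0f;
    for (int i = 0; i < N; ++i)
        total += A[i] * B[i];
    return total;
}

/* Seconds elapsed between two timespec readings. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical harness: fills two n-element arrays, times one call to
 * vmul, stores the sum in *val and returns the elapsed time in seconds.
 * Returns -1.0 if allocation fails. */
double time_vmul(int n, float *val) {
    assert(n > 0);
    float *a = malloc((size_t)n * sizeof *a);
    float *b = malloc((size_t)n * sizeof *b);
    if (!a || !b) {
        free(a);
        free(b);
        return -1.0;
    }
    /* Assumed input data; the attachment may fill the arrays differently. */
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = (float)(n - i);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *val = vmul(n, a, b);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(a);
    free(b);
    return elapsed_seconds(t0, t1);
}
```

A main that calls time_vmul(4096, &val) and prints val and the returned time
with "%f" would produce output in the same "Val:" / "Time:" shape as above,
though not the same numbers.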
