https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80616
Bug ID: 80616
Summary: Slow vector multiply compared to icc
Product: gcc
Version: 6.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: nightstrike at gmail dot com
Target Milestone: ---
Created attachment 41308
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41308&action=edit
Testcase
If I compile the following:
float vmul(int N, float A[N], float B[N]) {
    float total = 0.0f;
    for (int i = 0; i < N; ++i)
        total += A[i] * B[i];
    return total;
}
For 4k elements on bdver2, GCC gives me a time of 7us and icc gives me 1us,
using -O3 -march=native for GCC and -O3 -xHost for icc.
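The attached testcase is not reproduced here. As a rough sketch of how such a measurement can be taken, assuming clock_gettime (which would fit the -lrt on the command lines below) and arbitrary input data; run_benchmark and elapsed are hypothetical names, not taken from the attachment:

```c
/* Hypothetical timing harness; the attached testcase may differ. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Kernel from the report. */
float vmul(int N, float A[N], float B[N]) {
    float total = 0.0f;
    for (int i = 0; i < N; ++i)
        total += A[i] * B[i];
    return total;
}

/* Seconds between two timespec readings. */
static double elapsed(struct timespec t0, struct timespec t1) {
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

/* Fill two n-element arrays, time a single vmul call, and print the
   result in the "Val:" / "Time:" format used in this report.
   Returns the elapsed seconds, or -1.0 if allocation fails. */
double run_benchmark(int n) {
    float *a = malloc((size_t)n * sizeof *a);
    float *b = malloc((size_t)n * sizeof *b);
    if (!a || !b) {
        free(a);
        free(b);
        return -1.0;
    }
    for (int i = 0; i < n; ++i) {   /* arbitrary data, an assumption */
        a[i] = (float)i;
        b[i] = (float)(n - i);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    float val = vmul(n, a, b);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = elapsed(t0, t1);
    printf("Val: %f\nTime: %f\n", val, secs);
    free(a);
    free(b);
    return secs;
}
```

Calling run_benchmark(4096) from main prints a Val: line and a Time: line. A single timed call is noisy at microsecond scale, so repeating the call and taking the minimum would likely give a steadier number.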
$ gcc v.c -lrt -O3 -march=native -save-temps
$ ./a.out
Val: 4772697023455277613056.000000
Time: 0.000007
$ icc v.c -lrt -O3 -xHost -save-temps
$ ./a.out
Val: 4772700964104951562240.000000
Time: 0.000001
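The two Val results differ by a relative amount of just under 1e-6, which suggests the two binaries do not accumulate the products in the same order. GCC only reorders a floating-point reduction like this one when reassociation is permitted (for example with -ffast-math or -Ofast), whereas icc's default floating-point model allows it. Whether that accounts for the gap here is unconfirmed until the asm is compared. As an illustration only, a reassociated reduction with independent partial sums might look like the sketch below; vmul_lanes and LANES are hypothetical, and the lane count of 8 assumes one 256-bit register of floats:

```c
/* Number of independent partial sums.
   Assumption: 8 floats, i.e. one 256-bit AVX register. */
enum { LANES = 8 };

/* Dot product with LANES independent partial sums (hypothetical helper).
   The summation order differs from the scalar loop in vmul, so the float
   result may differ in its low-order digits. */
float vmul_lanes(int n, const float *a, const float *b) {
    float acc[LANES] = {0.0f};
    int i = 0;

    /* Main loop: each lane accumulates every LANES-th product. */
    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; ++l)
            acc[l] += a[i + l] * b[i + l];

    /* Combine the partial sums. */
    float total = 0.0f;
    for (int l = 0; l < LANES; ++l)
        total += acc[l];

    /* Remainder elements when n is not a multiple of LANES. */
    for (; i < n; ++i)
        total += a[i] * b[i];

    return total;
}
```

The inner loop has no dependency between lanes, which is the property a vectorizer needs to keep one partial sum per vector element.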
The small test source is attached; I will add the asm in a follow-up.