https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80616
Bug ID: 80616
Summary: Slow vector multiply compared to icc
Product: gcc
Version: 6.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: nightstrike at gmail dot com
Target Milestone: ---

Created attachment 41308
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41308&action=edit
Testcase

If I compile the following:

float vmul(int N, float A[N], float B[N]) {
    float total = 0.0f;
    for (int i = 0; i < N; ++i)
        total += A[i] * B[i];
    return total;
}

GCC gives me a time of 7us and icc gives me 1us for 4k elements using
-O3 -march=native on bdver2.

$ gcc v.c -lrt -O3 -march=native -save-temps
$ ./a.out
Val: 4772697023455277613056.000000
Time: 0.000007

$ icc v.c -lrt -O3 -xHost -save-temps
$ ./a.out
Val: 4772700964104951562240.000000
Time: 0.000001

Attached small source code, will add asm in followup.
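
The two Val lines above differ, which is what one would expect if the two compilers sum the products in a different order. The report does not state the cause, so treat the following as a sketch of one plausible explanation rather than a diagnosis: GCC at -O3 keeps the strict left-to-right order of a float reduction unless reassociation is allowed (for example with -ffast-math, or -fassociative-math together with -fno-signed-zeros and -fno-trapping-math), while icc's default floating-point model permits reassociation. The names dot_strict, dot_lanes and LANES are made up for illustration; dot_lanes only imitates by hand the shape an 8-wide vectorized reduction might take.

```c
#include <assert.h>
#include <stddef.h>

/* Strict left-to-right reduction, same summation order as vmul() in the
   report. */
float dot_strict(size_t n, const float *a, const float *b)
{
    assert(n == 0 || (a != NULL && b != NULL));
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += a[i] * b[i];
    return total;
}

/* Hypothetical reassociated variant: LANES independent partial sums that
   are combined at the end. The width 8 is an assumption, not taken from
   the report. */
#define LANES 8

float dot_lanes(size_t n, const float *a, const float *b)
{
    assert(n == 0 || (a != NULL && b != NULL));
    float partial[LANES] = {0.0f};
    size_t i = 0;

    /* Main loop: each lane accumulates every LANES-th product. */
    for (; i + LANES <= n; i += LANES)
        for (size_t k = 0; k < LANES; ++k)
            partial[k] += a[i + k] * b[i + k];

    /* Combine the lanes, then handle the leftover elements in order. */
    float total = 0.0f;
    for (size_t k = 0; k < LANES; ++k)
        total += partial[k];
    for (; i < n; ++i)
        total += a[i] * b[i];
    return total;
}
```

On a target that evaluates float arithmetic in IEEE single precision, feeding both functions 16 elements with A = {1e8, 1, 1, 1, 1, 1, 1, 1, -1e8, 1, 1, 1, 1, 1, 1, 1} and B all ones gives 7 from dot_strict and 14 from dot_lanes, because the strict order loses each 1 added to 1e8. A difference in the low digits like the one in the Val lines is the same effect on a larger input. If reassociation is the reason here, rebuilding the testcase with -ffast-math added should narrow the gap; that has not been checked against the attachment.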