https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596

--- Comment #18 from Mateusz Guzik <mjguzik at gmail dot com> ---
Ok, I see.

I think I also see the discrepancy here.

When you bench "libcall", you are going to glibc with SIMD-enabled routines.

In contrast, the kernel avoids SIMD for performance reasons and instead will
only do regular stores *or* rep movs/stos for these operations.

But this also means that your "libcall is faster" results won't hold in the
kernel, where you have to assume the thing is handled with the rep prefix.
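
To make the two non-SIMD options concrete, here is a rough sketch of what
"regular stores" and "rep stos" look like for zeroing a buffer. This is not
kernel code: the function names are made up for illustration, the inline asm
assumes x86-64 with a GNU-compatible compiler, and other targets fall back to
a byte loop. Also note the compiler may turn the plain-store loop back into a
memset call unless something like -fno-tree-loop-distribute-patterns is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero n bytes using the rep prefix (rep stosb). */
void zero_rep_stosb(void *dst, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* RDI = destination, RCX = byte count, AL = fill value (0). */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(0)
                     : "memory");
#else
    /* Portable fallback so the sketch still builds on other targets. */
    unsigned char *p = dst;
    while (n--)
        *p++ = 0;
#endif
}

/* Hypothetical helper: zero n bytes using regular integer stores only. */
void zero_plain_stores(void *dst, size_t n)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;

    /* 8 bytes at a time; a fixed-size memcpy is the portable way to
     * express an unaligned 8-byte store. */
    while (n >= 8) {
        memcpy(p, &zero, sizeof zero);
        p += 8;
        n -= 8;
    }
    /* Byte-sized tail. */
    while (n--)
        *p++ = 0;
}
```

Neither variant touches SIMD registers, which is the constraint that matters
here; which one wins at a given size is exactly what needs to be measured.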

Perhaps gcc could take -mno-sse into consideration when deciding when to punt
to libcall?

I'm going to provide results after I regain access to the hw. I'm going to
whack sizes above 512 as they don't add any value and also remove libcall as
that won't be a valid test for my purpose. Instead I'm going to add finer
granularity for sizes < 512.
