http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041

--- Comment #20 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #18)
> I think it is a bad idea to introduce the IFUNC into libgcc_s, because then
> while you speed up the few users of this builtin, you slow down all users of
> libgcc_s (pretty much all C++ programs and lots of C programs), because they
> will need to resolve the ifunc.

I assume the cost only applies to programs that use the builtin at least once,
no? At least LD_DEBUG seems to say so. I have no idea how expensive ifunc
resolution is, though, so OK: we are back to only considering the non-table
version. (By the way, shouldn't these builtins behave like C99 inline functions,
so that we could sometimes inline them at -O3, which might also enable
vectorization? Or maybe they already do and I just didn't test hard enough.)
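For illustration, here is what an inlinable "non-table" popcount might look like if it were available as a C99 `static inline` function. This assumes the non-table version means the usual branch-free SWAR bit count, which is an assumption since the libgcc code itself is not shown here. `popcount64_swar` and `popcount_array` are hypothetical names. Once the body is visible to the compiler, -O3 may inline it into loops like `popcount_array` and, on some targets, vectorize them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table-free 64-bit popcount (classic SWAR method).
   static inline lets the compiler inline it at each call site. */
static inline int popcount64_swar(uint64_t x)
{
    /* Each 2-bit field now holds the number of set bits in it. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Add adjacent 2-bit counts into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Add adjacent 4-bit counts into 8-bit fields. */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* Multiplying sums all byte counts into the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Hypothetical hot loop: with popcount64_swar inlined, GCC at -O3
   may be able to vectorize this on some targets. */
static uint64_t popcount_array(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)popcount64_swar(v[i]);
    return total;
}
```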


(In reply to Cristian Rodríguez from comment #19)
> Hold on... Apparently I used ambiguous language in my comment. Adding ifuncs
> to libgcc* was not my real suggestion; it was to EMIT such IFUNCs in the
> resulting final user code when the target environment allows it: one
> generic, one hardware/arch specific.
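A minimal sketch of "one generic, one arch-specific" IFUNC emitted in user code, written by hand with GCC's `ifunc` attribute. It assumes x86-64, an ELF target and glibc; the `#if` falls back to the generic version elsewhere. All function names are hypothetical. The resolver calls `__builtin_cpu_init()` first because it can run before constructors.

```c
#include <stdint.h>

/* Generic variant: portable, works on any CPU. */
static int popcount64_generic(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

#if defined(__x86_64__) && defined(__ELF__) && defined(__GNUC__) && defined(__GLIBC__)
typedef int popcount_fn(uint64_t);

/* Arch-specific variant: compiled with the POPCNT instruction enabled. */
__attribute__((target("popcnt")))
static int popcount64_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Resolver: run once by the dynamic loader when popcount64 is bound. */
static popcount_fn *resolve_popcount64(void)
{
    __builtin_cpu_init(); /* needed: may run before constructors */
    return __builtin_cpu_supports("popcnt") ? popcount64_hw : popcount64_generic;
}

/* Public symbol whose implementation is chosen by the resolver. */
int popcount64(uint64_t x) __attribute__((ifunc("resolve_popcount64")));
#else
/* No IFUNC support assumed here: always use the generic variant. */
int popcount64(uint64_t x)
{
    return popcount64_generic(x);
}
#endif
```

Note that this only versions the single call. Each call still goes through the PLT, and nothing outside `popcount64` is specialized.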

Not sure that's much better. Ideally we'd clone the hot loop that uses it and
propagate the versioning to the whole loop, not just the instruction, but I
don't think we have any code for that. Then again, if gcc saw the full code,
if (__builtin_cpu_supports("popcnt")) _mm_popcnt_u64(x); else { call lib }, it
might already manage to clone the loop.
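To show what versioning the whole hot loop means, as opposed to versioning the single instruction, here is a hand-written sketch. The same loop is compiled twice, once with `target("popcnt")`, and `__builtin_cpu_supports` is checked once outside the loop. The names are hypothetical. The sketch uses `__builtin_popcountll` rather than `_mm_popcnt_u64` to avoid depending on `<immintrin.h>`. It assumes a GCC-compatible compiler on x86-64 for the fast path.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic clone: without POPCNT enabled, the builtin may be compiled
   to a call into libgcc. */
static uint64_t count_bits_generic(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(v[i]);
    return total;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* POPCNT clone of the same loop: the builtin can be expanded inline
   to the instruction, so the whole loop body is specialized. */
__attribute__((target("popcnt")))
static uint64_t count_bits_popcnt(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)__builtin_popcountll(v[i]);
    return total;
}
#endif

/* Dispatch once, outside the hot loop, instead of once per element. */
uint64_t count_bits(const uint64_t *v, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("popcnt"))
        return count_bits_popcnt(v, n);
#endif
    return count_bits_generic(v, n);
}
```

Newer GCC releases also offer `__attribute__((target_clones("popcnt", "default")))`, which generates such clones plus an IFUNC dispatcher automatically. Availability depends on the compiler version and target.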
