* Aaron Sawdey:

> I've previously posted a patch to add vector/vsx inline expansion of
> strcmp/strncmp for the power8/power9 processors. Here are some of the
> other items I have in the pipeline that I hope to get into gcc9:
>
> * vector/vsx support for inline expansion of memcmp to non-loop code.
>   This improves performance of small memcmp.
> * vector/vsx support for inline expansion of memcmp to loop code. This
>   will close the performance gap for lengths of about 128-512 bytes
>   by making the loop code closer to the performance of the library
>   memcmp.
> * generate inline expansion to a loop for strcmp/strncmp. This closes
>   another performance gap because strcmp/strncmp vector/vsx code
>   currently generated is lots faster than the library call but we
>   only generate comparison of 64 bytes to avoid exploding code size.
>   Similar code in a loop would be compact and allow inline comparison
>   of maybe the first 512 bytes inline before dumping to the library
>   function.
>
> If anyone has any other input on the inline expansion work I've been
> doing for the rs6000 target, please let me know.

The inline expansion of strcmp is problematic for valgrind:

  <https://bugs.kde.org/show_bug.cgi?id=386945>

We currently see around 0.5 KiB of instructions for each call to
strcmp.  I find it hard to believe that this improves general system
performance except in micro-benchmarks.
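
For reference, the scheme under discussion (compare a bounded prefix
inline, then hand off to the library) can be sketched in plain C roughly
as below. This is only an illustration, not the code GCC emits: the
real expansion uses vector/vsx instructions, and the function name
bounded_inline_strcmp and the INLINE_LIMIT constant are hypothetical
names chosen for the sketch. The value 64 is taken from the quoted
message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit for the sketch; the quoted message mentions that
   only 64 bytes are currently compared inline. */
#define INLINE_LIMIT 64

/* Compare up to INLINE_LIMIT bytes "inline", then fall back to the
   library strcmp for the remainder. Byte-at-a-time for clarity; the
   actual expansion would use wide loads. */
static int bounded_inline_strcmp(const char *a, const char *b)
{
    for (size_t i = 0; i < INLINE_LIMIT; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ca != cb)
            return ca < cb ? -1 : 1;   /* first differing byte decides */
        if (ca == '\0')
            return 0;                  /* both strings ended together */
    }
    /* The first INLINE_LIMIT bytes are equal and contain no NUL, so both
       pointers can safely be advanced past them. */
    return strcmp(a + INLINE_LIMIT, b + INLINE_LIMIT);
}
```

Every call site that gets such an expansion carries its own copy of the
comparison code, which is where the per-call code size cost comes from.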
