* Aaron Sawdey:

> I've previously posted a patch to add vector/vsx inline expansion of
> strcmp/strncmp for the power8/power9 processors. Here are some of the
> other items I have in the pipeline that I hope to get into gcc9:
>
> * vector/vsx support for inline expansion of memcmp to non-loop code.
>   This improves performance of small memcmp.
> * vector/vsx support for inline expansion of memcmp to loop code. This
>   will close the performance gap for lengths of about 128-512 bytes
>   by making the loop code closer to the performance of the library
>   memcmp.
> * generate inline expansion to a loop for strcmp/strncmp. This closes
>   another performance gap because strcmp/strncmp vector/vsx code
>   currently generated is lots faster than the library call but we
>   only generate comparison of 64 bytes to avoid exploding code size.
>   Similar code in a loop would be compact and allow inline comparison
>   of maybe the first 512 bytes inline before dumping to the library
>   function.
>
> If anyone has any other input on the inline expansion work I've been
> doing for the rs6000 target, please let me know.

The inline expansion of strcmp is problematic for valgrind:

  <https://bugs.kde.org/show_bug.cgi?id=386945>

We currently see around 0.5 KiB of instructions for each call to
strcmp. I find it hard to believe that this improves general system
performance except in micro-benchmarks.
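
To make the trade-off concrete, here is a rough scalar sketch in C of the scheme described in the quoted mail: compare a bounded prefix in place, then hand the remainder to the library. It is only an illustration, not what GCC emits. The real expansion uses vector/vsx instructions rather than a byte loop, and the names `INLINE_LIMIT` and `strcmp_inline_then_libc` are made up for this example (the 64 is taken from the figure in the quoted text).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit, mirroring the 64 bytes mentioned in the quoted mail. */
#define INLINE_LIMIT 64

/* Scalar stand-in for the inline expansion: compare at most INLINE_LIMIT
   bytes in place, then fall back to the library strcmp for the rest. */
static int strcmp_inline_then_libc(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < INLINE_LIMIT; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];

        if (ca != cb)
            return ca < cb ? -1 : 1;   /* first differing byte decides */
        if (ca == '\0')
            return 0;                  /* both strings ended together */
    }

    /* The first INLINE_LIMIT bytes are equal and contain no terminator,
       so both pointers can safely be advanced past them. */
    return strcmp(a + INLINE_LIMIT, b + INLINE_LIMIT);
}
```

If the prefix comparison is unrolled at every call site instead of looping, its size is paid once per call to strcmp, which is the code-size cost at issue here.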