https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66741
Andrew Pinski changed:
What|Removed |Added
Severity|normal |enhancement
--- Comment #7 from Bernhard Reutner-Fischer ---
folding tolower (and toupper while at it) gives:
for i in 0 1 2;do
gcc -o tolower_strcpy-$i tolower_strcpy-$i.c -Ofast -W -Wall -Wextra -pedantic -DMAIN -msse4.2
done
/tmp/inp is 200MB of random binary data
--- Comment #6 from Bernhard Reutner-Fischer ---
Created attachment 35942
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35942&action=edit
Variant demonstrating strcpy+tolower fused loop, vectorized, SSE4.x
Code like this should be emitted
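For readers without the attachment at hand, here is a rough sketch of what a vectorized lowercase copy can look like. It is not the attached code: it uses plain SSE2 compares instead of the SSE4.x string instructions, it works on a known length rather than scanning for the terminator, and it assumes ASCII in the "C" locale. The name tolower_memcpy is made up for illustration.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from the attachment): copy n bytes from s to d,
   lowercasing ASCII 'A'..'Z'.  Assumes the "C" locale.  */
void
tolower_memcpy (char *d, const char *s, size_t n)
{
  size_t i = 0;
#if defined(__SSE2__)
  const __m128i below_A = _mm_set1_epi8 ('A' - 1);
  const __m128i above_Z = _mm_set1_epi8 ('Z' + 1);
  const __m128i bit5 = _mm_set1_epi8 (0x20);
  /* Process 16 bytes per iteration while a full vector remains.  */
  for (; i + 16 <= n; i += 16)
    {
      __m128i v = _mm_loadu_si128 ((const __m128i *) (s + i));
      /* Signed compares suffice: bytes >= 0x80 are negative and
         therefore fail the "v > 'A' - 1" test.  */
      __m128i is_upper = _mm_and_si128 (_mm_cmpgt_epi8 (v, below_A),
                                        _mm_cmpgt_epi8 (above_Z, v));
      /* Set bit 5 only in the lanes that hold 'A'..'Z'.  */
      v = _mm_or_si128 (v, _mm_and_si128 (is_upper, bit5));
      _mm_storeu_si128 ((__m128i *) (d + i), v);
    }
#endif
  /* Scalar tail; also the whole loop when SSE2 is unavailable.  */
  for (; i < n; i++)
    {
      unsigned char c = (unsigned char) s[i];
      d[i] = (char) ((unsigned char) (c - 'A') < 26u ? c | 0x20 : c);
    }
}
```

Calling tolower_memcpy (dst, src, strlen (src) + 1) gives strcpy-like behaviour at the cost of a second pass over the source; a truly fused loop would have to detect the terminator inside the vector loop, which this sketch deliberately leaves out.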
--- Comment #5 from Bernhard Reutner-Fischer ---
Created attachment 35941
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35941&action=edit
Variant perusing builtins
This is (essentially) the motivating real-world example
--- Comment #4 from Bernhard Reutner-Fischer ---
Created attachment 35940
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35940&action=edit
Manually expanded variant
expanding builtins early should arrive at that
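To make "manually expanded" concrete without opening the attachment, something along these lines is presumably meant. This is only a sketch, not the attached file: it assumes ASCII and the "C" locale (the real tolower is locale-dependent, which is part of why the expansion is not trivial), and the names ascii_tolower and tolower_strcpy are picked for illustration.

```c
/* Hypothetical expansion of tolower for ASCII in the "C" locale.  */
static inline unsigned char
ascii_tolower (unsigned char c)
{
  /* c - 'A' wraps around for c < 'A' once narrowed to unsigned char,
     so a single unsigned compare checks both range bounds.  */
  return (unsigned char) (c - 'A') < 26u ? (unsigned char) (c | 0x20) : c;
}

/* Fused copy and lowercase: one pass over s, terminator included.
   Returns d, like strcpy.  */
char *
tolower_strcpy (char *d, const char *s)
{
  char *ret = d;
  unsigned char c;

  do
    {
      c = (unsigned char) *s++;
      *d++ = (char) ascii_tolower (c);
    }
  while (c != '\0');
  return ret;
}
```

With the call expanded to a compare and an OR, the loop body has no function call left, which is what gives later passes a chance to optimize it further.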
--- Comment #3 from Richard Biener ---
I don't see where we inline-expand __builtin_tolower at all.
Richard Biener changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
--- Comment #1 from Bernhard Reutner-Fischer ---
i.e. maybe something more along the lines of
$ cat <
#include
#include
void
sse_tolower_strcpy (char *d, const char *s)
{
__m128i ranges =
_mm_setr_epi8 ('A', 'Z', 0, 0, 0, 0, 0, 0