https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78869
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |5.4.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2016-12-20
          Component|c                           |tree-optimization
     Ever confirmed|0                           |1
            Summary|Strange __builtin_memcpy    |[6/7 Regression] Strange
                   |optimisations               |__builtin_memcpy
                   |                            |optimisations
   Target Milestone|---                         |6.3
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's SRA which handles
memcpy_test (char * ptr)
{
  char * D.1764;
  char temp[64];
  try
    {
      D.1764 = ptr + 64;
      MEM[(char * {ref-all})&temp] = MEM[(char * {ref-all})D.1764];
      MEM[(char * {ref-all})ptr] = MEM[(char * {ref-all})&temp];
by "scalarizing" temp byte-wise:
  <bb 2>:
  MEM[(char * {ref-all})&temp] = MEM[(char * {ref-all})ptr_1(D) + 64B];
  temp$0_2 = MEM[(char * {ref-all})ptr_1(D) + 64B];
  temp$1_7 = MEM[(char * {ref-all})ptr_1(D) + 65B];
  ...
  MEM[(char * {ref-all})ptr_1(D)] = temp$0_2;
  MEM[(char * {ref-all})ptr_1(D) + 1B] = temp$1_7;
  MEM[(char * {ref-all})ptr_1(D) + 2B] = temp$2_8;
  MEM[(char * {ref-all})ptr_1(D) + 3B] = temp$3_9;
  ...
  return;
Ugh. It looks like memcpy folding doesn't handle
  D.1765 = ptr + 64;
  __builtin_memcpy (&temp[0], D.1765, 64);
  __builtin_memcpy (ptr, &temp[0], 64);
so we're just "lucky" there. This is a regression from GCC 5; the SRA
capability was added in GCC 6.
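
For reference, a reduced C test case consistent with the dumps above might
look like the sketch below. It is reconstructed from the GIMPLE, so it is an
assumption about the reporter's source rather than a copy of it, and
check_memcpy_test is a hypothetical helper added only to confirm that the
copy through the temporary behaves as expected.

```c
#include <assert.h>
#include <string.h>

/* Reconstructed from the GIMPLE dump; not necessarily the reporter's
   exact source.  Copies the upper 64 bytes of a 128-byte buffer over
   the lower 64 bytes through a local temporary.  */
void
memcpy_test (char *ptr)
{
  char temp[64];

  __builtin_memcpy (temp, ptr + 64, 64);
  __builtin_memcpy (ptr, temp, 64);
}

/* Hypothetical helper (not part of the report): fills a buffer with a
   known pattern, runs memcpy_test, and checks the result.
   Returns 1 on success.  */
int
check_memcpy_test (void)
{
  char buf[128];
  int i;

  for (i = 0; i < 128; i++)
    buf[i] = (char) i;

  memcpy_test (buf);

  /* The lower half must now equal the upper half.  */
  assert (memcmp (buf, buf + 64, 64) == 0);

  /* The upper half must be unchanged.  */
  for (i = 64; i < 128; i++)
    assert (buf[i] == (char) i);

  return 1;
}
```

Compiling this with something like -O2 -fdump-tree-esra or -fdump-tree-sra
should show whether temp gets scalarized, and -fno-tree-sra can be used for
comparison; which of the two SRA passes does the scalarization here is not
stated above.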