https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78869

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |5.4.0
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2016-12-20
          Component|c                           |tree-optimization
     Ever confirmed|0                           |1
            Summary|Strange __builtin_memcpy    |[6/7 Regression] Strange
                   |optimisations               |__builtin_memcpy
                   |                            |optimisations
   Target Milestone|---                         |6.3

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's SRA which handles

memcpy_test (char * ptr)
{
  char * D.1764;
  char temp[64];

  try
    {
      D.1764 = ptr + 64;
      MEM[(char * {ref-all})&temp] = MEM[(char * {ref-all})D.1764];
      MEM[(char * {ref-all})ptr] = MEM[(char * {ref-all})&temp];

by "scalarizing" temp byte-wise:

  <bb 2>:
  MEM[(char * {ref-all})&temp] = MEM[(char * {ref-all})ptr_1(D) + 64B];
  temp$0_2 = MEM[(char * {ref-all})ptr_1(D) + 64B];
  temp$1_7 = MEM[(char * {ref-all})ptr_1(D) + 65B];
...
  MEM[(char * {ref-all})ptr_1(D)] = temp$0_2;
  MEM[(char * {ref-all})ptr_1(D) + 1B] = temp$1_7;
  MEM[(char * {ref-all})ptr_1(D) + 2B] = temp$2_8;
  MEM[(char * {ref-all})ptr_1(D) + 3B] = temp$3_9;
...
  return;

ugh.  Looks like memcpy folding doesn't handle

      D.1765 = ptr + 64;
      __builtin_memcpy (&temp[0], D.1765, 64);
      __builtin_memcpy (ptr, &temp[0], 64);

so we're just "lucky" there.  This is a regression from GCC 5; the SRA
capability was added in GCC 6.
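
For reference, a minimal sketch of C source that should lower to the two
__builtin_memcpy calls shown above. It is reconstructed from the dumps, not
copied from the report, so the reporter's actual testcase may differ; only the
function name, the 64-byte temporary and the two copies come from the dumps.

```c
/* Hypothetical reconstruction of the testcase from the GIMPLE dumps;
   the source attached to the bug report may look different.  */
void memcpy_test (char *ptr)
{
  char temp[64];

  /* Copy bytes 64..127 of the buffer into the local temporary ...  */
  __builtin_memcpy (temp, ptr + 64, 64);

  /* ... then copy the temporary over bytes 0..63.  */
  __builtin_memcpy (ptr, temp, 64);
}
```

Compiling this with optimization enabled and -fdump-tree-all should make it
possible to compare the dumps before and after SRA. Which optimization level
triggers the byte-wise scalarization is an assumption here; the comment does
not state it.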
