https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

            Bug ID: 109821
           Summary: vect: Different output with -O2 -ftree-loop-vectorize
                    compared to -O2
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yinyuefengyi at gmail dot com
  Target Milestone: ---

This test code is meant to produce a repeating pattern that differs from what
memcpy or memmove would give. It produces different results with
-O2 -ftree-loop-vectorize than with plain -O2. Is this a vectorizer bug, i.e.
is it missing a check that the gap op - src is larger than the vector mode
size (here, vectorize only if op - src > 16)?

copy.cpp:

#include <stdio.h>
#include <cstdint>
#include <stdlib.h>

#define UNALIGNED_LOAD64(_p) (*reinterpret_cast<const uint64_t *>(_p))
#define UNALIGNED_STORE64(_p, _val) (*reinterpret_cast<uint64_t *>(_p) = (_val))

__attribute__((__noinline__))
static void IncrementalCopyFastPath(const char* src, char* op, int len) {
    while (op - src < 8) {
        UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
        len -= op - src;
        op += op - src;
    }
    while (len > 0) {
        UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
        src += 8;
        op += 8;
        len -= 8;
    }
}

int main ()
{
  char src[] = "123456789abcdefghijklmnopqrstu";
  char *op = src+12;
  char * dst = op;
  IncrementalCopyFastPath (src, op, 36);
  int i = 0;
  while (i < 36)
    {printf("%x ", *(dst+i)), i++;}
  printf("\n");
  return 0;
}


$ gcc copy.cpp -O2 -o a.out.good
$ ./a.out.good
30 31 32 33 34 35 36 37 38 39 61 62 30 31 32 33 34 35 36 37 38 39 61 62 30 31
32 33 34 35 36 37 38 39 61 62
$ gcc copy.cpp -O2 -ftree-loop-vectorize  -o a.out.bad
$ ./a.out.bad
30 31 32 33 34 35 36 37 38 39 61 62 63 64 65 66 34 35 36 37 38 39 61 62 63 64
65 66 73 74 75 76 38 39 61 62


gimple after t.vect:

IncrementalCopyFastPath.constprop (const char * src, char * op)
{
...
  <bb 2> [local count: 118111600]:
  _4 = src_8(D) + 8;
  if (_4 != op_9(D))    // <=  the check should be op_9 > src_8 + 16 here?
    goto <bb 16>; [80.00%]
  else
    goto <bb 10>; [20.00%]
...
}
