https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 16 May 2018, juneyoung.lee at sf dot snu.ac.kr wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800
> 
> --- Comment #2 from Juneyoung Lee <juneyoung.lee at sf dot snu.ac.kr> ---
> If address is not adjacent - you can reorder the definition of A and B and try
> again.
> ```
>   int A[4], B[4];
> ```
> 
> If changing
> 
>     if (c1 == c2)
>       // Always true if p and q have same integer representation.
>       arr3[i] = arr1[i];
>     else
>       arr3[i] = arr2[i];
> 
> to
> 
>     if (c1 == c2)
>       // Always true if p and q have same integer representation.
>       arr3[i] = arr2[i];
>     else
>       arr3[i] = arr1[i];
> 
> removed miscompilation, I think this implies that the former one is folded 
> into
> `arr[3] = arr2[i]` (which is incorrect, because arr2 is equivalent to `q`) but
> the latter one is folded into `arr[3] = arr1[i]`.
> 
> This explains the generated assembly as well: Compiling store_10_to_p with -O3
> generates
> 
> ```
> store_10_to_p(int*, int*):
>   mov DWORD PTR [rdi], 1
>   mov DWORD PTR [rsi], 10
>   ret
> ```
> 
> meaning that it is actually trying to store 10 to *q, which is UB if the
> function is inlined. (B[4] is out-of-bounds)

Ok, so the sequence of optimization is as follows.  First the
comparison is narrowed so we compare arr1[i] and arr2[i] directly.
Then we sink the store and get

  _10 = arr1[i_9];
  _11 = arr2[i_9];
  if (_10 == _11)
    goto <bb 5>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 4> [58.67%]:

  <bb 5> [88.89%]:
  # cstore_29 = PHI <_10(3), _11(4)>
  arr3[i_9] = cstore_29;

then this is pattern matched to just

  _1 = arr1[i_4];
  _2 = arr2[i_4];
  arr3[i_4] = _2;

which is correct since if _10 == _11 we can as well store _11
instead of _10 to arr3[i_9].  This is where things go "wrong"
and a "bug" similar to PR61502 arises.  We propagate conditional
equivalences.

Then alias analysis comes along and while it originally
correctly computed arr3 to "point" to both A and B it now
only sees the provenance on B and both stores can be
re-ordered by scheduling.  We expand to RTL from
(-fdump-tree-optimized-alias-uid):

  <bb 2> [11.11%]:
  # USE = nonlocal null { D.2675 D.2676 } (escaped)
  # CLB = nonlocal null { D.2675 D.2676 } (escaped)
  printf ("%p %p\n", &AD.2675, &BD.2676[4]);
  # RANGE [0, 18446744073709551615] NONZERO 18446744073709551600
  _8 = (long unsigned intD.10) &BD.2676[4];
  MEM[(charD.7 * {ref-all})&arr2D.2693] = _8;
  _27 = MEM[(charD.7 * {ref-all})&arr2D.2693];
  MEM[(charD.7 * {ref-all})&arr3D.2694] = _27;
  _14 = MEM[(charD.7 * {ref-all})&arr3D.2694];
  # PT = null { D.2676 } (escaped)
  r_15 = (intD.6 *) _14;
  MEM[(intD.6 *)&AD.2675] = 1;
  *r_15 = 10;
  arr2D.2693 ={v} {CLOBBER};
  arr3D.2694 ={v} {CLOBBER};
  # USE = nonlocal null { D.2675 D.2676 } (escaped)
  # CLB = nonlocal null { D.2675 D.2676 } (escaped)
  printf ("%d\n", 1);
  AD.2675 ={v} {CLOBBER};
  BD.2676 ={v} {CLOBBER};
  return 0;

as you can see r_15 is computed to point to BD.2676 only.  The
"bug" reproduces with -O2 as well if you declare store_10_to_p
static inline.

Reply via email to