https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> --- On Wed, 16 May 2018, juneyoung.lee at sf dot snu.ac.kr wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85800 > > --- Comment #2 from Juneyoung Lee <juneyoung.lee at sf dot snu.ac.kr> --- > If address is not adjacent - you can reorder the definition of A and B and try > again. > ``` > int A[4], B[4]; > ``` > > If changing > > if (c1 == c2) > // Always true if p and q have same integer representation. > arr3[i] = arr1[i]; > else > arr3[i] = arr2[i]; > > to > > if (c1 == c2) > // Always true if p and q have same integer representation. > arr3[i] = arr2[i]; > else > arr3[i] = arr1[i]; > > removed miscompilation, I think this implies that the former one is folded > into > `arr[3] = arr2[i]` (which is incorrect, because arr2 is equivalent to `q`) but > the latter one is folded into `arr[3] = arr1[i]`. > > This explains the generated assembly as well: Compiling store_10_to_p with -O3 > generates > > ``` > store_10_to_p(int*, int*): > mov DWORD PTR [rdi], 1 > mov DWORD PTR [rsi], 10 > ret > ``` > > meaning that it is actually trying to store 10 to *q, which is UB if the > function is inlined. (B[4] is out-of-bounds) Ok, so the sequence of optimization is as follows. First the comparison is narrowed so we compare arr1[i] and arr2[i] directly. Then we sink the store and get _10 = arr1[i_9]; _11 = arr2[i_9]; if (_10 == _11) goto <bb 5>; [34.00%] else goto <bb 4>; [66.00%] <bb 4> [58.67%]: <bb 5> [88.89%]: # cstore_29 = PHI <_10(3), _11(4)> arr3[i_9] = cstore_29; then this is pattern matched to just _1 = arr1[i_4]; _2 = arr2[i_4]; arr3[i_4] = _2; which is correct since if _10 == _11 we can as well store _11 instead of _10 to arr3[i_9]. This is where things go "wrong" and a "bug" similar to PR61502 arises. We propagate conditional equivalences. Then alias analysis comes along and while it originally correctly computed arr3 to "point" to both A and B it now only sees the provenance on B and both stores can be re-ordered by scheduling. We expand to RTL from (-fdump-tree-optimized-alias-uid): <bb 2> [11.11%]: # USE = nonlocal null { D.2675 D.2676 } (escaped) # CLB = nonlocal null { D.2675 D.2676 } (escaped) printf ("%p %p\n", &AD.2675, &BD.2676[4]); # RANGE [0, 18446744073709551615] NONZERO 18446744073709551600 _8 = (long unsigned intD.10) &BD.2676[4]; MEM[(charD.7 * {ref-all})&arr2D.2693] = _8; _27 = MEM[(charD.7 * {ref-all})&arr2D.2693]; MEM[(charD.7 * {ref-all})&arr3D.2694] = _27; _14 = MEM[(charD.7 * {ref-all})&arr3D.2694]; # PT = null { D.2676 } (escaped) r_15 = (intD.6 *) _14; MEM[(intD.6 *)&AD.2675] = 1; *r_15 = 10; arr2D.2693 ={v} {CLOBBER}; arr3D.2694 ={v} {CLOBBER}; # USE = nonlocal null { D.2675 D.2676 } (escaped) # CLB = nonlocal null { D.2675 D.2676 } (escaped) printf ("%d\n", 1); AD.2675 ={v} {CLOBBER}; BD.2676 ={v} {CLOBBER}; return 0; as you can see r_15 is computed to point to BD.2676 only. The "bug" reproduces with -O2 as well if you declare store_10_to_p static inline.