jakub at gcc dot gnu.org via Gcc-bugs Thu, 30 Jan 2025 03:50:49 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117498


--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This looks like a lim4 bug to me (looking at #c2 with -O3).
Before lim4 we have:
  <bb 2> [local count: 14598063]:
  c.0_16 ={v} c;
  if (c.0_16 == 0B)
    goto <bb 4>; [18.09%]
  else
    goto <bb 3>; [81.91%]

  <bb 3> [local count: 11957273]:

  <bb 4> [local count: 14598063]:
  # _17 = PHI <1(3), -1(2)>
  # prephitmp_26 = PHI <1(3), 255(2)>
  f = _17;
  n = 0;
  d.2_25 = d;
  if (d.2_25 <= 3)
    goto <bb 10>; [89.00%]
  else
    goto <bb 9>; [11.00%]

  <bb 10> [local count: 12992276]:
  ivtmp.140_38 = (unsigned int) d.2_25;
  goto <bb 7>; [100.00%]
...
  <bb 7> [local count: 118111600]:
  # ivtmp.140_45 = PHI <ivtmp.140_73(12), ivtmp.140_38(10)>
  if (prephitmp_26 != 1)
    goto <bb 29>; [89.00%]
  else
    goto <bb 6>; [11.00%]
Only if prephitmp_26 is not 1 (in which case it must be 255) do we branch to
bb 29, which holds the vectorized, unrolled loop doing 251 iterations and
(correctly) invoking UB in that case.
But prephitmp_26 is actually 1 at runtime, so there is no UB.
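As a reading aid, the guard structure in the dump above can be paraphrased in C. This is a hypothetical sketch, not GCC source: the function names are invented, and c is modelled as a plain pointer because the dump only shows it compared against 0B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical paraphrase of the GIMPLE, not GCC source.
   bb 2: if c is NULL, jump straight to bb 4, else go through bb 3.
   bb 4: # prephitmp_26 = PHI <1(3), 255(2)>  */
static unsigned char prephitmp_26_value(const void *c)
{
    return c == NULL ? 255 : 1;
}

/* bb 7: the vectorized loop in bb 29 is reached only when
   prephitmp_26 != 1, i.e. only on the c == NULL path. */
static bool reaches_bb29(const void *c)
{
    return prephitmp_26_value(c) != 1;
}
```

Per the dump, prephitmp_26 is 1 exactly when control came through bb 3 (c non-NULL), so a run where it is 1 never enters bb 29.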
However, lim4 decides to hoist one of the vector loads from bb 29 into bb 10,
and that is not correct: before lim4 the load was executed (ahead of the UB)
only when prephitmp_26 != 1, whereas now it is executed unconditionally:
   <bb 10> [local count: 12992276]:
   ivtmp.140_38 = (unsigned int) d.2_25;
+  vect__12.117_6 = MEM <vector(16) char> [(char *)&n];
+  g__lsm.141_333 = _30(D);
+  g__lsm_flag.142_315 = 0;
   goto <bb 7>; [100.00%]

   <bb 29> [local count: 105119324]:
-  vect__12.117_6 = MEM <vector(16) char> [(char *)&n];
   vect__12.118_21 = MEM <vector(16) char> [(char *)&n + 16B];
   vect__12.119_107 = MEM <vector(16) char> [(char *)&n + 32B];
   vect__12.120_109 = MEM <vector(16) char> [(char *)&n + 48B];
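The general rule being broken is that a load which may trap can only be moved to a point where it executes on the same paths as before, unless it is known not to trap. The hypothetical C sketch below contrasts the two shapes; a counting function stands in for the faulting vector load, and none of the names are GCC internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a load that may trap; it only counts
   how often it runs so the two shapes can be compared safely. */
static int loads_executed;

static int possibly_trapping_load(const int *p)
{
    loads_executed++;
    return *p;
}

/* Shape before lim4: the load lives in the guarded block (bb 29). */
static int guarded(bool take_vector_path, const int *p)
{
    int v = 0;
    if (take_vector_path)
        v = possibly_trapping_load(p);
    return v;
}

/* Shape after lim4: the load sits in the preheader (bb 10) and
   executes even when the guard is false. */
static int hoisted(bool take_vector_path, const int *p)
{
    int tmp = possibly_trapping_load(p);
    int v = 0;
    if (take_vector_path)
        v = tmp;
    return v;
}
```

Both functions return the same value, but only the guarded one skips the load when the guard is false. In the actual dump, the hoisted load is vect__12.117_6 = MEM <vector(16) char> [(char *)&n], now sitting in bb 10.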
The crash is on exactly that load:
=> 0x000000000040105d <+61>:    movdqa 0x7(%rsp),%xmm4
because %rsp+7 is obviously not 16-byte aligned. n has plain char type, so
being only 1-byte aligned is perfectly fine for it; movdqa, however, requires
a 16-byte aligned memory operand.
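The alignment arithmetic can be checked with a small C sketch. It assumes a 16-byte aligned base standing in for %rsp (the actual %rsp value at the fault is not shown in the comment) and places a char at offset 7, mirroring 0x7(%rsp); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* movdqa requires its memory operand to be 16-byte aligned;
   a misaligned address faults, which shows up here as the crash. */
static bool is_aligned_16(const void *p)
{
    return (uintptr_t)p % 16 == 0;
}

/* Hypothetical frame: a 16-byte aligned base standing in for %rsp,
   with a 1-byte char object at offset 7, as in 0x7(%rsp). */
static unsigned misalignment_of_char_at_7(void)
{
    alignas(16) unsigned char frame[32];
    const unsigned char *n = frame + 7; /* valid for char: alignment 1 */
    return (unsigned)((uintptr_t)n % 16);
}
```

A char object only promises 1-byte alignment, so nothing in the source guarantees the 16-byte alignment that movdqa assumes.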

[Bug middle-end/117498] [13/14/15 regression] Miscompile with -O3 and -O0/1/2 since r13-1268-g8c99e307b20c502
