https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Last reconfirmed| |2023-07-05 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> --- (In reply to Sergei Trofimovich from comment #7) > (In reply to Richard Biener from comment #6) > > Turning off PRE also allows it to pass but I guess it's the same as with > > SRA. > > -fno-strict-aliasing also helps, -fno-ipa-modref doesn't. I suspect > > bisection > > isn't going to help much, but maybe it points to a small set of changes on > > the testcase itself. > > Bisect landed on the r12-4790-g4b3a325f07aceb : > > $ git bisect bad > 4b3a325f07acebf47e82de227ce1d5ba62f5bcae is the first bad commit > commit 4b3a325f07acebf47e82de227ce1d5ba62f5bcae > Author: Aldy Hernandez <al...@redhat.com> > Date: Thu Oct 28 15:35:21 2021 +0200 > > Remove VRP threader passes in exchange for better threading pre-VRP. After this commit the bug still appears with -fdisable-tree-threadfull1 -fdisable-tree-threadfull2 and before the commit it appears with -fdisable-tree-vrp-thread1 -fdisable-tree-vrp-thread2 so it seems removing the threading passes after VRP exposes the issue. Looking at the source and the IL of grow() I think the issue is that loop invariant motion hoists _134 and _104 in _31 = Visited.Small; if (_31 != 0) goto <bb 5>; [50.00%] else goto <bb 4>; [50.00%] <bb 4> [local count: 621159684]: _145 = MEM[(struct LargeRep *)&Visited + 8B].Slots; _34 = MEM[(struct LargeRep *)&Visited + 8B].Capacity; if (_34 != 0) goto <bb 5>; [0.00%] else goto <bb 9>; [0.00%] <bb 5> [local count: 955630290]: # _134 = PHI <_34(4), 2(3)> # _104 = PHI <_145(4), &Visited.u.i(3)> <bb 6> [local count: 8933211531]: out of the inner first loop which I think is OK even though it makes the _145 and _34 reads unconditional. Then PRE hoists those further, but correctly identifying Visited.Small = 0; _94 = operator new [] (1024); Visited.u.o.Slots = _94; Visited.u.o.Capacity = 128; as where they change. Interestingly I can disable (almost) all passes after tree PRE and still get the failure. One change PRE does is - _18 = MEM[(struct V *)&Visited + 8B].v; - if (_18 != 790526) - goto <bb 26>; [66.00%] + _196 = (long unsigned int) prephitmp_173; + if (prephitmp_173 != 790526B) + goto <bb 22>; [66.00%] where it replaces the read with _94 (or EmptyKey as from entry). That's phishy. It's the != EmptyKey check in grow when copying inline slots into temporary storage. So on 4b3a325f07acebf47e82de227ce1d5ba62f5bcae^ (one before the bisected rev.) it fails with g++ t.C -O2 -fdisable-tree-vrp-thread1 -fdump-tree-pre-details -fdump-tree-all -fno-tree-tail-merge -fno-tree-vectorize -fno-tree-loop-optimize -fdisable-tree-sink1 -fdisable-tree-dse3 -fdisable-tree-dce4 -fdisable-tree-fre5 -fdisable-tree-dom3 -fdisable-tree-thread3 -fdisable-tree-thread4 -fdisable-tree-vrp2 -fdisable-tree-vrp-thread2 -fdisable-tree-dse5 -fdisable-tree-cddce3 -fdisable-tree-sink2 -fdisable-tree-store-merging -fdisable-tree-dce7 -fdisable-tree-modref2 -fdisable-tree-local-pure-const2 -fdisable-tree-split-paths -fdisable-tree-ccp4 -fdisable-tree-slsr -fdisable-tree-reassoc2 -fdisable-tree-switchlower1 -fdisable-tree-forwprop4 -fdisable-tree-phiopt4 -fno-gcse -fno-dse -fno-gcse-after-reload -fdbg-cnt=treepre_insert:1-3:16-21:24-24 -fno-tree-partial-pre So it goes that the above load is CSEd to the LIM inserted load plus a conversion from pointer to unsigned long: Value numbering stmt = _145 = MEM[(struct LargeRep *)&Visited + 8B].Slots; Setting value number of _145 to _145 (changed) ... Value numbering stmt = _18 = MEM[(struct V *)&Visited + 8B].v; Inserting name _47 for expression (long unsigned int) _145 Setting value number of _18 to _47 (changed) we seem to disregard TBAA when CSEing two loads without an intervening possibly aliasing stores. But this then causes PRE (well, that's a long-long-standing issue...) to record the later load to EXP_GEN as exp_gen[25] := { {component_ref<Slots>,mem_ref<8B>,addr_expr<&Visited>}@.MEM_129 (0084), _18 (0031) } so to PRE it appears that we do _18 = (long unsigned int) MEM[(struct LargeRep *)&Visited + 8B].Slots; and thus the wrong ref is used for disambiguation which means we disambiguate against TheSlot_172->v = Item$f_20; which eventually stores to that slot. *sigh*