https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103721
Andrew Macleod <amacleod at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jeffreyalaw at gmail dot com

--- Comment #3 from Andrew Macleod <amacleod at redhat dot com> ---
After the initial loop tweaking, the IL that the threader sees in
v.c.111t.threadfull1 is:

;;   basic block 2, loop depth 0
  goto <bb 10>; [100.00%]

;;   basic block 3, loop depth 1
  ipos.0_2 = ipos;
  if (ipos.0_2 != 0)
    goto <bb 6>; [50.00%]
  else
    goto <bb 4>; [50.00%]

;;   basic block 4, loop depth 1

;;   basic block 6, loop depth 1
  # searchVolume_11 = PHI <1(4), 0(3)>
  # currentVolume_10 = PHI <searchVolume_5(4), searchVolume_5(3)>

;;   basic block 10, loop depth 1
  # searchVolume_5 = PHI <searchVolume_11(6), 1111(2)>
  # currentVolume_6 = PHI <currentVolume_10(6), 0(2)>
  _7 = searchVolume_5 != currentVolume_6;
  _8 = searchVolume_5 != 0;
  _9 = _7 & _8;
  if (_9 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 7>; [11.00%]

It looks to me like it decides to thread 2->10, which means it turns bb2 into
something like:

  # searchVolume_5 = 1111
  # currentVolume_6 = 0
  _7 = searchVolume_5 != currentVolume_6;   // folds to 1
  _8 = searchVolume_5 != 0;                 // folds to 1
  _9 = _7 & _8;                             // folds to 1
  if (_9 != 0)                              // folds to goto bb3
    goto <bb 3>; [89.00%]
  else
    goto <bb 7>; [11.00%]

And then it updates the PHIs in BB10 to not have an edge from bb2:
(note I am doing this by hand, not actually renaming any ssa_names.)

;;   basic block 10, loop depth 1
  # searchVolume_5 = PHI <searchVolume_11(6)>
  # currentVolume_6 = PHI <currentVolume_10(6)>
  _7 = searchVolume_5 != currentVolume_6;
  _8 = searchVolume_5 != 0;
  _9 = _7 & _8;
  if (_9 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 7>; [11.00%]

The problem would seem to be that when we thread 2->10, we are actually
peeling off an iteration of the loop.
The PHIs in BB6:

;;   basic block 6, loop depth 1
  # searchVolume_11 = PHI <1(4), 0(3)>
  # currentVolume_10 = PHI <searchVolume_5(4), searchVolume_5(3)>

I think currentVolume_10 is picking up searchVolume_5 calculated from the
threaded entry point, which is the constant 1111... and we are "losing" the
information that it could also be the value of searchVolume_11 from the
previous iteration.

Threading is out of my wheelhouse, but it's not clear to me how you could even
update the PHI nodes properly if you try to thread that path... And it's
starting to give me a headache thinking about it :-)

It seems that there needs to be a new PHI inserted in BB3 which sets
  searchVolume_5 = PHI <1111(2), searchVolume_11(10)>
or something to that effect. Something is missing anyway.
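
To make the suspected failure concrete, here is a rough C sketch of the loop. It is a hand translation of the IL quoted in this comment, not the original v.c and not GCC's actual output. The function names and the MAX_ITERATIONS cap are hypothetical and exist only for illustration. The first function follows the PHIs as written. The second models the hypothesis that currentVolume_10 always sees the constant 1111 from the threaded entry.

```c
/* Hypothetical cap so that the model of the bad transformation
   terminates instead of spinning forever. */
enum { MAX_ITERATIONS = 100 };

/* Hand translation of the IL before threading (an assumption, not v.c).
   Returns how many times the loop body (bb3/bb4/bb6) runs,
   or -1 if the cap is reached. */
int iterations_original(int ipos)
{
    int searchVolume = 1111;   /* PHI argument on edge 2->10 */
    int currentVolume = 0;     /* PHI argument on edge 2->10 */
    int n = 0;

    /* bb10: _9 = (searchVolume_5 != currentVolume_6) & (searchVolume_5 != 0) */
    while (searchVolume != currentVolume && searchVolume != 0) {
        if (n == MAX_ITERATIONS)
            return -1;
        currentVolume = searchVolume;         /* currentVolume_10 = searchVolume_5 */
        searchVolume = (ipos != 0) ? 0 : 1;   /* searchVolume_11 = PHI <1(4), 0(3)> */
        n++;
    }
    return n;
}

/* Model of the suspected problem: currentVolume_10 always picks up the
   constant 1111 from the peeled entry and never the value carried
   around the back edge. */
int iterations_stale_phi(int ipos)
{
    int searchVolume = 1111;
    int currentVolume = 0;
    int n = 0;

    while (searchVolume != currentVolume && searchVolume != 0) {
        if (n == MAX_ITERATIONS)
            return -1;
        currentVolume = 1111;                 /* stale value, ignores searchVolume_11 */
        searchVolume = (ipos != 0) ? 0 : 1;
        n++;
    }
    return n;
}
```

In this model the two versions agree when ipos is nonzero, since searchVolume becomes 0 and the loop exits either way. When ipos is 0, the hand-translated loop stops once currentVolume catches up with searchVolume, while the stale-PHI version never sees them become equal and only stops at the cap. Whether the threader really produces this shape is exactly the open question above.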