https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80520
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #9)
> So AFAICT there's two issues that need to be addressed.  PRE and split-paths.
>
> First up is PRE.  Compile the sample code from c#5/c#6 with -O3
> -fno-split-paths
>
>
> Prior to PRE we have:
>
>   if (_16 != 0)
>     goto <bb 5>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
>
>   <bb 4> [local count: 531502203]:
>
>   <bb 5> [local count: 1063004407]:
>   # iftmp.0_19 = PHI <2567483615(3), 0(4)>
>   _17 = _15 ^ iftmp.0_19;
>
> That's actually reasonably good.  While it's not a conditional move in the
> gimple.  It's in a form will be easy for the RTL optimizers to handle and
> generate a suitable cmov if we just left it alone on x86_64.
>
>
> PRE (correctly) identifies that it can reduce the number of expression
> evaluations on the path traversing bb3->bb5 by hoisting the XOR with the
> non-zero constant into BB4 resulting in:
>
>   if (_16 != 0)
>     goto <bb 4>; [50.00%]
>   else
>     goto <bb 5>; [50.00%]
>
>   <bb 4> [local count: 531502203]:
>   _52 = _15 ^ 2567483615;
>
>   <bb 5> [local count: 1063004407]:
>   # iftmp.0_19 = PHI <2567483615(4), 0(3)>
>   # prephitmp_53 = PHI <_52(4), _15(3)>
>
> That's correct, but far from ideal.
>
>
> So the second issue is split-paths.  There's actually two problems to deal
> with in split-paths.
>
>
> As it stands today this is what we see in split-paths (as a result of the
> PRE de-optimization):
>
>
>   <bb 3>
>   [ ... ]
>   if (_20 != 0)
>     goto <bb 5>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
>
>   <bb 4> [local count: 531502203]:
>   _18 = _25 ^ 2567483615;
>
>   <bb 5> [local count: 1063004407]:
>   # prephitmp_49 = PHI <_25(3), _18(4)>
>   _2 = (void *) ivtmp.8_30;
>   MEM[base: _2, offset: 0B] = prephitmp_49;
>   ivtmp.8_29 = ivtmp.8_30 + 8;
>   if (ivtmp.8_29 != _6)
>     goto <bb 3>; [98.99%]
>   else
>     goto <bb 6>; [1.01%]
>
> split-paths should try not to muck it up further.  Note that we can probably
> identify this half-diamond pretty easily.  bb3 dominates bb4.  bb4 has a
> single statement that feeds a PHI in bb5.  That's a very likely
> if-conversion candidate so split-paths ought to leave it alone.
>
> If we were to fix PRE then split-paths would be presented with something
> like this:
>
>   <bb3>
>   [ ... ]
>   if (_47 != 0)
>     goto <bb 4>; [50.00%]
>   else
>     goto <bb 5>; [50.00%]
>
>   <bb 4> [local count: 531502203]:
>
>   <bb 5> [local count: 1063004407]:
>   # iftmp.0_48 = PHI <2567483615(3), 0(4)>
>   _49 = _18 ^ iftmp.0_48;
>
> ISTM that when either of the blocks in question (bb3 bb4) has *no*
> statements, with a single pred that is the other block then split-blocks
> definitely should leave it alone as well.
>
> So, to summarize.
>
> 1. PRE mucks things up a bit.
> 2. split-paths makes it worse
>
> I've got a prototype patch that implements the two improvements to keep
> split-paths from making things worse.  That will improve things, but to
> really do a good job we'll have to either do something about PRE or have a
> pass after PRE undo PRE's deoptimization.

I don't think it's per-se a PRE "deoptimization", it's simply what PRE is
supposed to do.  The result should be quite optimal given we should be able
to coalesce _52 and _15 and thus require no edge copies into bb 5.

  if (_16 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 5>; [50.00%]

  <bb 4> [local count: 531502203]:
  _52 = _15 ^ 2567483615;

  <bb 5> [local count: 1063004407]:
  # prephitmp_53 = PHI <_52(4), _15(3)>

with a "reasonable" CPU we'd end up with sth like

  test r1, p1             // _16 != 0 into predicate reg p1
  xor r2, 2567483615, p1  // xor predicated with p1

now of course path splitting ruins things here (I never ever liked that
pass, its very low-level transform is more suitable for RTL).  And if
we'd get conditional execution modeled "properly" in GIMPLE we could
if-convert there, preventing path-splitting from mucking up things here...
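
For reference, the two GIMPLE shapes discussed above can be written back as C roughly as follows.  This is only a sketch, not the sample code from c#5/c#6 (which is not reproduced in this comment): the function names, the uint64_t type (guessed from the 8-byte ivtmp stride in the dump) and the check_same helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constant taken from the dumps above; everything else is hypothetical. */
#define K UINT64_C(2567483615)

/* Shape before PRE: pick the constant or 0, then one unconditional XOR.
   Mirrors  iftmp.0_19 = PHI <K, 0>;  _17 = _15 ^ iftmp.0_19;  */
static uint64_t xor_select(uint64_t x, uint64_t cond)
{
    uint64_t tmp = (cond != 0) ? K : 0;
    return x ^ tmp;
}

/* Shape after PRE: the XOR lives only on the taken path and a PHI
   merges the results.  Mirrors  _52 = _15 ^ K;  PHI <_52, _15>.  */
static uint64_t xor_hoisted(uint64_t x, uint64_t cond)
{
    uint64_t r = x;          /* value flowing in on the fall-through edge */
    if (cond != 0)
        r = x ^ K;           /* single statement in the half-diamond */
    return r;
}

/* Assumed helper: both shapes must produce the same value. */
static void check_same(uint64_t x, uint64_t cond)
{
    assert(xor_select(x, cond) == xor_hoisted(x, cond));
}
```

Both functions compute the same value for every input, which is why PRE's rewrite is correct; the open question in this bug is only which shape the later passes (split-paths, RTL if-conversion) handle better.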