On Mon, May 18, 2026 at 1:16 AM Guoce Feng via Gcc <[email protected]> wrote: > > Hi, > > I am investigating an optimization opportunity in GCC related to > factoring common pure operations out of control-flow joins. > > Consider code of this general form: > > if (c1 <= 0.0) > distance = sqrt(expr1); > else if (c2 <= c1) > distance = sqrt(expr2); > else > distance = sqrt(expr3); > > > After control-flow simplification / if-conversion, this can effectively > lead to a form resembling: > > distance = select(cond1, > sqrt(expr1), > select(cond2, sqrt(expr2), sqrt(expr3))); > > > For vectorized code, this may become "three vector sqrt operations plus > blends", while a more profitable form would be: > > merged_expr = select(cond1, > expr1, > select(cond2, expr2, expr3)); > distance = sqrt(merged_expr); > > > That is, transform a conceptual pattern like > > phi(sqrt(a), sqrt(b), sqrt(c)) > > > into > > sqrt(phi(a, b, c)) > > > when the operation is safe to factor out and doing so is profitable. > > I noticed that LLVM has a related transformation in SimplifyCFG under > the sink-common-insts machinery. In GCC, I am trying to understand what > the most appropriate place would be for a similar optimization. > > Two possible implementation directions seem plausible: > > 1. Extend the existing PHI factoring logic in > tree-ssa-phiopt.cc, especially around > factor_out_conditional_operation, to also handle suitable > side-effect-free builtins / calls such as sqrt under the > appropriate semantic restrictions.
So part of the problem with the above case at -Ofast is really https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123113 . I have a patch set which is mostly approved but I need to double check on the order of basic-blocks being looked at. The patches are: https://inbox.sourceware.org/gcc-patches/[email protected]/ https://inbox.sourceware.org/gcc-patches/[email protected]/ https://patchwork.sourceware.org/project/gcc/patch/[email protected]/ In that order. (I can't seen to find the 3rd patch on inbox for some reason). With those patches we get: ``` <bb 2> [local count: 1073741824]: if (c1_2(D) <= 0.0) goto <bb 3>; [41.00%] else goto <bb 4>; [59.00%] <bb 3> [local count: 440234144]: distance_7 = __builtin_sqrt (expr1_6(D)); [tail call] goto <bb 7>; [100.00%] <bb 4> [local count: 633507680]: if (c1_2(D) >= c2_3(D)) goto <bb 5>; [50.00%] else goto <bb 6>; [50.00%] <bb 5> [local count: 316753840]: <bb 6> [local count: 633507680]: # _8 = PHI <expr3_4(D)(4), expr2_5(D)(5)> distance_10 = __builtin_sqrt (_8); [tail call] <bb 7> [local count: 1073741824]: # distance_1 = PHI <distance_10(6), distance_7(3)> return distance_1; ``` Extending factor_* to handle the above could be done but I am not sure phiopt is the correct place. Note there is code in the ifcvt that does a similar thing as factor_* in phiopt which then does handle the rest of it and for: ``` double f(double *expr1, double *expr2, double *expr3, double *c1, double *c2) { double t = 0.0; for(int i =0 ;i < 1024; i++) { double distance = 0.0; if (c1[i] <= 0.0) distance = __builtin_sqrt(expr1[i]); else if (c2[i] <= c1[i]) distance = __builtin_sqrt(expr2[i]); else distance = __builtin_sqrt(expr3[i]); t+=distance; } return t; } ``` With -mavx512f (since I was too lazy to figure out what expr* was and too lazy to test non-x86_64 [since that is what I have built]) we get: ``` _7 = _34 ? _14 : _12; _63 = _42 ? _6 : _7; distance_17 = __builtin_sqrt (_63); t_25 = distance_17 + t_29; ``` In ifcvt. Which I assume is exactly what you were expecting. > > 2. Add a more general "common instruction sinking" transformation to > tree-ssa-sink.cc, analogous in spirit to the existing common-store > sinking support, but for a restricted class of pure scalar > computations. > > My current intuition is that the PHIOPT direction may be a better fit > for this specific pattern, because the optimization is fundamentally > about factoring a common operation across PHI/merge structure rather > than ordinary code sinking. However, I would appreciate guidance from > the GCC community on the preferred design. > > I am especially interested in feedback on: > > * Whether tree-ssa-phiopt.cc is indeed the right place for this form of > transformation. > > * Whether handling builtins such as sqrt is acceptable under the > relevant floating-point / errno / exception semantics. > > * Whether this should be restricted to cases where the operation is > known to be side-effect-free and safe to move. > > * Whether profitability should be guided primarily by code size, > vectorization opportunities, or target-independent cost heuristics. > > * Whether there is existing work or a PR in this area that I should > build upon. See above. > > A motivating reduced example comes from a geometric distance kernel in > which three branch-local sqrt computations could ideally be turned into > one sqrt after value merging, improving the shape of the vectorized code. See above. Thanks, Andrea > > If this direction seems reasonable, I am interested in preparing a > prototype patch and test cases. > > Thanks, > Guoce Feng
