On Mon, May 18, 2026 at 1:16 AM Guoce Feng via Gcc <[email protected]> wrote:
>
> Hi,
>
> I am investigating an optimization opportunity in GCC related to
> factoring common pure operations out of control-flow joins.
>
> Consider code of this general form:
>
> if (c1 <= 0.0)
>   distance = sqrt(expr1);
> else if (c2 <= c1)
>   distance = sqrt(expr2);
> else
>   distance = sqrt(expr3);
>
>
> After control-flow simplification / if-conversion, this can effectively
> lead to a form resembling:
>
> distance = select(cond1,
>                   sqrt(expr1),
>                   select(cond2, sqrt(expr2), sqrt(expr3)));
>
>
> For vectorized code, this may become "three vector sqrt operations plus
> blends", while a more profitable form would be:
>
> merged_expr = select(cond1,
>                      expr1,
>                      select(cond2, expr2, expr3));
> distance = sqrt(merged_expr);
>
>
> That is, transform a conceptual pattern like
>
> phi(sqrt(a), sqrt(b), sqrt(c))
>
>
> into
>
> sqrt(phi(a, b, c))
>
>
> when the operation is safe to factor out and doing so is profitable.
>
> I noticed that LLVM has a related transformation in SimplifyCFG under
> the sink-common-insts machinery. In GCC, I am trying to understand what
> the most appropriate place would be for a similar optimization.
>
> Two possible implementation directions seem plausible:
>
>   1.  Extend the existing PHI factoring logic in
> tree-ssa-phiopt.cc, especially around
> factor_out_conditional_operation, to also handle suitable
> side-effect-free builtins / calls such as sqrt under the
> appropriate semantic restrictions.

So part of the problem with the above case at -Ofast is really
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123113 .
I have a patch set which is mostly approved but I need to double check
on the order of basic-blocks being looked at.
The patches are:
https://inbox.sourceware.org/gcc-patches/[email protected]/
https://inbox.sourceware.org/gcc-patches/[email protected]/
https://patchwork.sourceware.org/project/gcc/patch/[email protected]/

In that order. (I can't seen to find the 3rd patch on inbox for some reason).
With those patches we get:
```
  <bb 2> [local count: 1073741824]:
  if (c1_2(D) <= 0.0)
    goto <bb 3>; [41.00%]
  else
    goto <bb 4>; [59.00%]

  <bb 3> [local count: 440234144]:
  distance_7 = __builtin_sqrt (expr1_6(D)); [tail call]
  goto <bb 7>; [100.00%]

  <bb 4> [local count: 633507680]:
  if (c1_2(D) >= c2_3(D))
    goto <bb 5>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 5> [local count: 316753840]:

  <bb 6> [local count: 633507680]:
  # _8 = PHI <expr3_4(D)(4), expr2_5(D)(5)>
  distance_10 = __builtin_sqrt (_8); [tail call]

  <bb 7> [local count: 1073741824]:
  # distance_1 = PHI <distance_10(6), distance_7(3)>
  return distance_1;
```

Extending factor_* to handle the above could be done but I am not sure
phiopt is the correct place.
Note there is code in the ifcvt that does a similar thing as factor_*
in phiopt which then does handle the rest of it and for:
```
double f(double *expr1, double *expr2, double *expr3, double *c1, double *c2)
{
double t = 0.0;
for(int i =0 ;i < 1024; i++)
{
  double distance = 0.0;
if (c1[i] <= 0.0)
  distance = __builtin_sqrt(expr1[i]);
else if (c2[i] <= c1[i])
  distance = __builtin_sqrt(expr2[i]);
else
  distance = __builtin_sqrt(expr3[i]);
t+=distance;
}
  return t;
}

```

With -mavx512f (since I was too lazy to figure out what expr* was and
too lazy to test non-x86_64 [since that is what I have built]) we get:
```
  _7 = _34 ? _14 : _12;
  _63 = _42 ? _6 : _7;
  distance_17 = __builtin_sqrt (_63);
  t_25 = distance_17 + t_29;
```
In ifcvt.

Which I assume is exactly what you were expecting.


>
>   2.  Add a more general "common instruction sinking" transformation to
> tree-ssa-sink.cc, analogous in spirit to the existing common-store
> sinking support, but for a restricted class of pure scalar
> computations.
>
> My current intuition is that the PHIOPT direction may be a better fit
> for this specific pattern, because the optimization is fundamentally
> about factoring a common operation across PHI/merge structure rather
> than ordinary code sinking. However, I would appreciate guidance from
> the GCC community on the preferred design.
>
> I am especially interested in feedback on:
>
>   *   Whether tree-ssa-phiopt.cc is indeed the right place for this form of
> transformation.
>
>   *   Whether handling builtins such as sqrt is acceptable under the
> relevant floating-point / errno / exception semantics.
>
>   *   Whether this should be restricted to cases where the operation is
> known to be side-effect-free and safe to move.
>
>   *   Whether profitability should be guided primarily by code size,
> vectorization opportunities, or target-independent cost heuristics.
>
>   *   Whether there is existing work or a PR in this area that I should
> build upon.

See above.

>
> A motivating reduced example comes from a geometric distance kernel in
> which three branch-local sqrt computations could ideally be turned into
> one sqrt after value merging, improving the shape of the vectorized code.
See above.

Thanks,
Andrea

>
> If this direction seems reasonable, I am interested in preparing a
> prototype patch and test cases.
>
> Thanks,
> Guoce Feng

Reply via email to