Hi,

I am investigating an optimization opportunity in GCC related to
factoring common pure operations out of control-flow joins.

Consider code of this general form:

if (c1 <= 0.0)
  distance = sqrt(expr1);
else if (c2 <= c1)
  distance = sqrt(expr2);
else
  distance = sqrt(expr3);


After control-flow simplification / if-conversion, this can effectively
lead to a form resembling:

distance = select(cond1,
                  sqrt(expr1),
                  select(cond2, sqrt(expr2), sqrt(expr3)));


For vectorized code, this may become "three vector sqrt operations plus
blends", while a more profitable form would be:

merged_expr = select(cond1,
                     expr1,
                     select(cond2, expr2, expr3));
distance = sqrt(merged_expr);


That is, transform a conceptual pattern like

phi(sqrt(a), sqrt(b), sqrt(c))


into

sqrt(phi(a, b, c))


when the operation is safe to factor out and doing so is profitable.

I noticed that LLVM has a related transformation in SimplifyCFG under
the sink-common-insts machinery. In GCC, I am trying to understand what
the most appropriate place would be for a similar optimization.

Two possible implementation directions seem plausible:

  1.  Extend the existing PHI factoring logic in
tree-ssa-phiopt.cc, especially around
factor_out_conditional_operation, to also handle suitable
side-effect-free builtins / calls such as sqrt under the
appropriate semantic restrictions.

  2.  Add a more general "common instruction sinking" transformation to
tree-ssa-sink.cc, analogous in spirit to the existing common-store
sinking support, but for a restricted class of pure scalar
computations.

My current intuition is that the PHIOPT direction may be a better fit
for this specific pattern, because the optimization is fundamentally
about factoring a common operation across PHI/merge structure rather
than ordinary code sinking. However, I would appreciate guidance from
the GCC community on the preferred design.

I am especially interested in feedback on:

  *   Whether tree-ssa-phiopt.cc is indeed the right place for this form of
transformation.

  *   Whether handling builtins such as sqrt is acceptable under the
relevant floating-point / errno / exception semantics.

  *   Whether this should be restricted to cases where the operation is
known to be side-effect-free and safe to move.

  *   Whether profitability should be guided primarily by code size,
vectorization opportunities, or target-independent cost heuristics.

  *   Whether there is existing work or a PR in this area that I should
build upon.

A motivating reduced example comes from a geometric distance kernel in
which three branch-local sqrt computations could ideally be turned into
one sqrt after value merging, improving the shape of the vectorized code.

If this direction seems reasonable, I am interested in preparing a
prototype patch and test cases.

Thanks,
Guoce Feng

Reply via email to