Hi,
I am investigating an optimization opportunity in GCC related to
factoring common pure operations out of control-flow joins.
Consider code of this general form:
if (c1 <= 0.0)
distance = sqrt(expr1);
else if (c2 <= c1)
distance = sqrt(expr2);
else
distance = sqrt(expr3);
After control-flow simplification / if-conversion, this can effectively
lead to a form resembling:
distance = select(cond1,
sqrt(expr1),
select(cond2, sqrt(expr2), sqrt(expr3)));
For vectorized code, this may become "three vector sqrt operations plus
blends", while a more profitable form would be:
merged_expr = select(cond1,
expr1,
select(cond2, expr2, expr3));
distance = sqrt(merged_expr);
That is, transform a conceptual pattern like
phi(sqrt(a), sqrt(b), sqrt(c))
into
sqrt(phi(a, b, c))
when the operation is safe to factor out and doing so is profitable.
I noticed that LLVM has a related transformation in SimplifyCFG under
the sink-common-insts machinery. In GCC, I am trying to understand what
the most appropriate place would be for a similar optimization.
Two possible implementation directions seem plausible:
1. Extend the existing PHI factoring logic in
tree-ssa-phiopt.cc, especially around
factor_out_conditional_operation, to also handle suitable
side-effect-free builtins / calls such as sqrt under the
appropriate semantic restrictions.
2. Add a more general "common instruction sinking" transformation to
tree-ssa-sink.cc, analogous in spirit to the existing common-store
sinking support, but for a restricted class of pure scalar
computations.
My current intuition is that the PHIOPT direction may be a better fit
for this specific pattern, because the optimization is fundamentally
about factoring a common operation across PHI/merge structure rather
than ordinary code sinking. However, I would appreciate guidance from
the GCC community on the preferred design.
I am especially interested in feedback on:
* Whether tree-ssa-phiopt.cc is indeed the right place for this form of
transformation.
* Whether handling builtins such as sqrt is acceptable under the
relevant floating-point / errno / exception semantics.
* Whether this should be restricted to cases where the operation is
known to be side-effect-free and safe to move.
* Whether profitability should be guided primarily by code size,
vectorization opportunities, or target-independent cost heuristics.
* Whether there is existing work or a PR in this area that I should
build upon.
A motivating reduced example comes from a geometric distance kernel in
which three branch-local sqrt computations could ideally be turned into
one sqrt after value merging, improving the shape of the vectorized code.
If this direction seems reasonable, I am interested in preparing a
prototype patch and test cases.
Thanks,
Guoce Feng