Ping (https://gcc.gnu.org/pipermail/gcc/2024-August/244625.html).
On 22/08/24 11:32, Dhruv Chawla via Gcc wrote:
External email: Use caution opening links or attachments
* Table Of Contents *
- Introduction
- Motivating Test Cases
- Proposed Solution
- Other Options
- Existing Solutions
- Primary Optimizations To Be Improved
- Front-End Attribute For Recording Return-Value Information
- Future Work
- References
- Questions
* Introduction *
Return jump functions offer a way of modeling return values of a function in
terms of its formal parameters, similar to how regular jump functions can be
used to model the actual parameters of a callee at a callsite in terms of the
formal parameters of the caller.
By discovering this information and propagating it across the program callgraph
in LTO mode, various transformations like value-range propagation, sibling-call
and chaining-call optimization can be augmented to take advantage of it and
potentially perform better optimizations.
Currently, ipa-cp models parameters using jump functions, propagates the values
of those jump functions by iterating to a fixed-point using lattices, and
clones functions by taking decisions from those lattice values.
We propose extending ipa-cp with the analysis and propagation of return jump
functions, starting with the basic case of a direct return of the form
"return x;" where "x" is any formal parameter of the function.
* Motivating Test Cases *
* Optimizing tail calls:
- PR92867
char buf[128] = { 1 };
char *foo (__SIZE_TYPE__ n)
{
return __builtin_memset (buf, ' ', n);
}
char *bar (__SIZE_TYPE__ n)
{
__builtin_memset (buf, ' ', n);
return buf;
}
Here, the call to __builtin_memset in foo gets successfully converted to a
tail-call, however the call in bar cannot be optimized to a tail-call. This is
an optimization that LLVM is able to perform.
Link to compiler explorer: https://godbolt.org/z/E81axqWTb
- PR67797
#include
void *my_func(void *s, size_t n)
{
memset(s, 0, n);
return s;
}
This is a similar case to the above example. LLVM is able to optimize the call
to memset to be a tail-call as well.
Link to compiler explorer: https://godbolt.org/z/YzjGc59f8
* Optimizing chaining calls:
#include
void f (int x, int y)
{
std::cout << x;
std::cout << y;
}
This function can be optimized to the following:
#include
void f (int x, int y)
{
std::cout << x << y;
}
LLVM is able to perform this optimization as well:
https://godbolt.org/z/MW75on1o5
* Proposed Solution *
* Extending IPA-CP:
1. Add return jump function data structures
- This involves updating ipa_node_params to contain information regarding the
return statements of the function, namely the lattice and the jump function
describing the return value, where both use existing data structures.
- The ipa_node_params class is reused to avoid introducing a new class and a
corresponding function_summary type, though it is trivial to add if deemed
a better solution. The ipa_return_value_summary class may also be a good
place for this information, however using it would involve moving it from
ipa-prop.cc to ipa-prop.h.
- Additionally, ipa_edge_args is updated to track whether or not it is a
callgraph edge originating from a return statement. This enables the
propagation of information in the WPA phase.
2. LGEN
- Set up return jump functions for each function similar to the parameter
jump function. When it cannot be conclusively determined that one formal
parameter is always returned (such as conditionally returning either of
two), the jump function is marked as invalid.
- This involves looking through phi nodes when return statements of
conditional branches are merged into a single exit block. Note that
returning a call expression does not count towards invalidating the
information for that function.
3. WPA
- Implement return value information propagation. It is not possible to do
this in the existing propagate_constants_topo function, because it iterates
in reverse postorder i.e. from caller to callee. However return value
propagation needs to be done from callee to caller, thus there is a need to
iterate in a postorder fashion.
- The lattice values are initialized using the jump functions computed in the
LGEN phase. These values are then propagated over the callgraph.
- The existing lattices are reused, with three possible values like usual.
The possible values are:
return_lattice -> { TOP, param_decl, BOTTOM }
where TOP and BOTTOM are defined as usual. param_decl refers to the tree of
either the parameter declaration or its type, whichever is available. The
meet operator is defined as usual for TOP and BOTTOM. When both are
param_decl, the following meet operation is defined:
meet(x, y) = x if x == y, BOTTOM otherwise
- Finally, nodes to which no information could be