On 24/03/2025 03:28, Yangyu Chen wrote:
On 24 Mar 2025, at 11:03, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
Two brief comments, since I'm on holiday until 31st but happened to notice this
patch anyway.
On Mon, Mar 24, 2025 at 02:19:21AM +0800, Yangyu Chen wrote:
This behavior does not ensure that if any higher priority callee version
were selected at runtime, then a higher priority caller version would have
been eligible for selection. But this is hard to solve due to comparing
the priority of different versions of the caller may not be meaningful.
I've discussed this problem with Alfie (added to CC), who is currently working
on FMV improvements for AArch64 and hoping to clean up the backend/middle end
interface as well. I think we agreed that it would be good to add a target
hook to compare versions,
Actually, we already have a target hook for comparing the priority
of mv clones called: TARGET_COMPARE_VERSION_PRIORITY.
The problem is that if we don’t have the same target list for both
caller and callee, then such optimization might be meaningless.
For example, we know "arch=+v,+zbb", "arch=+v" and "arch=+zbb" can
speed up vector rotation right operations on RISC-V. But "v" does
not imply "zbb" extension. We have a callee for all these targets.
And we have a caller that has only "arch=+v" and "arch=+zbb". Then
running on a CPU with only "v" extension but no Zbb. In this case,
the callee with "arch+v" is the best target for this operation on
this CPU. But "arch=+v,+zbb" might be better for some platforms.
Here are some tradeoffs: if this function call is very frequent,
and zbb is only used for small-sized scalar data epilog, then calling
from PLT rather than inline the "v" version of callee might have
more performance penalty.
Yes agreed, this is part of the issue. There is also the issue that
callers may be in different TU's where only some of the versions are
available so the list of possible versions are different.
My point is that, if it only happens in the same compilation unit,
we should let the developer know this optimization and have the
same target list for both the caller and the callee. Sometimes, a
function call will introduce more performance penalties than using
a default target. And if we don't have this optimization, the callee
cannot be inlined.
I agree there are performance implications and cases where we would
rather just inline and use a sub-optimal version. But we cannot be
certain this will just be used for performance reasons, and selecting
the wrong version or inconsistent versions across different callers
could be a soundness issue if used for functional reasons.
I imagine a situation where there is some extension with specific
setup/take down code, such as setting/clearing a system register. If
then we have two functions called setup/takedown which we use FMV to
specify for our architecture, it would then be possible that different
versions were taken for the setup vs take down.
In my opinion, the idea of FMV is to allow users to specify that
functions will be dispatched and the most specific one taken. I think to
select the wrong version is a breaking of that promise.
I think we can make this optimisation smarter to do the inlining when we
can determine at compile time what the correct version is. I believe
that logic should be something like:
- When in the TU where the default is implemented (so we have visibility
of all versions)
- When the caller target features + CLI enabled features is a superset
of a version AND when there is no version for which these features are a
subset.
- Then we can call directly.
I think this would catch a lot of cases, especially as I think it is
unlikely in practice to have lots of versions with overlapping target
feature lists.
I'm still open to hearing some more detailed and effective strategies
for addressing this issue.
at which point the hard problem can more easily be
solved for all targets. I don't know what progress Alfie has made towards this
so far, but I think we're aiming to get these improvements into GCC 16.
As Andrew says I am working on a refactor of the FMV logic, and will
post my next version soon.
On which note: I presume this patch is proposed for GCC 16 Stage 1?
Yes.
Thanks,
Yangyu Chen