> On 24 Mar 2025, at 11:03, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
> 
> Two brief comments, since I'm on holiday until 31st but happened to notice
> this patch anyway.
> 
> On Mon, Mar 24, 2025 at 02:19:21AM +0800, Yangyu Chen wrote:
>> This behavior does not ensure that if any higher priority callee version
>> were selected at runtime, then a higher priority caller version would have
>> been eligible for selection. But this is hard to solve due to comparing
>> the priority of different versions of the caller may not be meaningful.
> 
> I've discussed this problem with Alfie (added to CC), who is currently working
> on FMV improvements for AArch64 and hoping to clean up the backend/middle end
> interface as well.  I think we agreed that it would be good to add a target
> hook to compare versions,

Actually, we already have a target hook for comparing the priority
of mv clones: TARGET_COMPARE_VERSION_PRIORITY.

The problem is that if we don’t have the same target list for both
the caller and the callee, then such an optimization might be
meaningless.

For example, we know that "arch=+v,+zbb", "arch=+v" and "arch=+zbb"
can each speed up vector rotate-right operations on RISC-V, but "v"
does not imply the "zbb" extension. Suppose the callee has versions
for all three targets, the caller has only "arch=+v" and "arch=+zbb",
and the program runs on a CPU with the "v" extension but without Zbb.
In this case, the callee version with "arch=+v" is the best one for
this operation on this CPU, while "arch=+v,+zbb" might be better on
other platforms. There are tradeoffs here: if this function call is
very frequent, and zbb is only used for a small scalar epilogue, then
calling through the PLT rather than inlining the "v" version of the
callee might carry the larger performance penalty.
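
To make the scenario concrete, here is a toy model of version
selection in plain C. It is only a sketch: the feature bits, the
struct, select_version() and the priority numbers are all made up for
illustration and are not GCC's internal representation or the real
RISC-V priority rules.

```c
#include <stddef.h>

/* Hypothetical feature bits, for illustration only. */
enum { FEAT_V = 1u << 0, FEAT_ZBB = 1u << 1 };

struct version {
    const char *name;   /* target string as written in the attribute */
    unsigned required;  /* features this clone needs at runtime */
    int priority;       /* higher wins; the ordering is an assumption */
};

/* Return the index of the highest-priority version whose required
   features are all present in `cpu`, or -1 if none is eligible. */
static int select_version(const struct version *v, size_t n, unsigned cpu)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if ((v[i].required & ~cpu) != 0)
            continue;  /* the CPU lacks a required feature */
        if (best < 0 || v[i].priority > v[best].priority)
            best = (int)i;
    }
    return best;
}

/* The callee has a clone for every target in the example. */
static const struct version callee_versions[] = {
    { "arch=+v,+zbb", FEAT_V | FEAT_ZBB, 3 },
    { "arch=+v",      FEAT_V,            2 },
    { "arch=+zbb",    FEAT_ZBB,          1 },
    { "default",      0,                 0 },
};

/* The caller lacks the combined "arch=+v,+zbb" clone. */
static const struct version caller_versions[] = {
    { "arch=+v",   FEAT_V,   2 },
    { "arch=+zbb", FEAT_ZBB, 1 },
    { "default",   0,        0 },
};
```

Under these assumed priorities, the caller resolves to its "arch=+v"
clone both on a V-only CPU and on a V+Zbb CPU, but the callee resolves
to "arch=+v" on the former and "arch=+v,+zbb" on the latter. So the
caller's selected clone alone does not tell us which callee version
would have been chosen at runtime.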

My point is that, if this only happens within the same compilation
unit, we should let the developer know about this optimization and
ask them to use the same target list for both the caller and the
callee. Sometimes a function call introduces a larger performance
penalty than using the default target. And without this optimization,
the callee cannot be inlined.

I'm still open to hearing some more detailed and effective strategies
for addressing this issue.

> at which point the hard problem can more easily be
> solved for all targets.  I don't know what progress Alfie has made towards
> this so far, but I think we're aiming to get these improvements into GCC 16.
> 
> On which note: I presume this patch is proposed for GCC 16 Stage 1?

Yes.

Thanks,
Yangyu Chen
