Re: [PATCH] RISC-V: Allow redirecting function calls to the same FMV clone

Yangyu Chen Mon, 24 Mar 2025 03:31:33 -0700

> On 24 Mar 2025, at 16:28, Alfie Richards <alfie.richa...@arm.com> wrote:
> 
> On 24/03/2025 03:28, Yangyu Chen wrote:
>>> On 24 Mar 2025, at 11:03, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
>>> 
>>> Two brief comments, since I'm on holiday until 31st but happened to notice 
>>> this
>>> patch anyway.
>>> 
>>> On Mon, Mar 24, 2025 at 02:19:21AM +0800, Yangyu Chen wrote:
>>>> This behavior does not ensure that if any higher priority callee version
>>>> were selected at runtime, then a higher priority caller version would have
>>>> been eligible for selection. But this is hard to solve due to comparing
>>>> the priority of different versions of the caller may not be meaningful.
>>> 
>>> I've discussed this problem with Alfie (added to CC), who is currently 
>>> working
>>> on FMV improvements for AArch64 and hoping to clean up the backend/middle 
>>> end
>>> interface as well.  I think we agreed that it would be good to add a target
>>> hook to compare versions,
>> Actually, we already have a target hook for comparing the priority
>> of mv clones called: TARGET_COMPARE_VERSION_PRIORITY.
>> The problem is that if we don’t have the same target list for both
>> caller and callee, then such optimization might be meaningless.
>> For example, we know "arch=+v,+zbb", "arch=+v" and "arch=+zbb" can
>> speed up vector rotation right operations on RISC-V. But "v" does
>> not imply "zbb" extension. We have a callee for all these targets.
>> And we have a caller that has only "arch=+v" and "arch=+zbb". Then
>> running on a CPU with only "v" extension but no Zbb. In this case,
>> the callee with "arch+v" is the best target for this operation on
>> this CPU. But "arch=+v,+zbb" might be better for some platforms.
>> Here are some tradeoffs: if this function call is very frequent,
>> and zbb is only used for small-sized scalar data epilog, then calling
>> from PLT rather than inline the "v" version of callee might have
>> more performance penalty.
> 
> Yes agreed, this is part of the issue. There is also the issue that callers 
> may be in different TU's where only some of the versions are available so the 
> list of possible versions are different.
>> My point is that, if it only happens in the same compilation unit,
>> we should let the developer know this optimization and have the
>> same target list for both the caller and the callee. Sometimes, a
>> function call will introduce more performance penalties than using
>> a default target. And if we don't have this optimization, the callee
>> cannot be inlined.
> 
> I agree there are performance implications and cases where we would rather 
> just inline and use a sub-optimal version. But we cannot be certain this will 
> just be used for performance reasons, and selecting the wrong version or 
> inconsistent versions across different callers could be a soundness issue if 
> used for functional reasons.
> 
> I imagine a situation where there is some extension with specific setup/take 
> down code, such as setting/clearing a system register. If then we have two 
> functions called setup/takedown which we use FMV to specify for our 
> architecture, it would then be possible that different versions were taken 
> for the setup vs take down.
> 
> In my opinion, the idea of FMV is to allow users to specify that functions 
> will be dispatched and the most specific one taken. I think to select the 
> wrong version is a breaking of that promise.
> 
> I think we can make this optimisation smarter to do the inlining when we can 
> determine at compile time what the correct version is. I believe that logic 
> should be something like:
> 
> - When in the TU where the default is implemented (so we have visibility of 
> all versions)
> - When the caller target features + CLI enabled features is a superset of a 
> version AND when there is no version for which these features are a
> subset.
> - Then we can call directly.
> 

Agreed. This relaxed my proposal where directly calling from caller
== callee in features to caller's features is a superset of callee's.

> I think this would catch a lot of cases, especially as I think it is unlikely 
> in practice to have lots of versions with overlapping target feature lists.
> 
>> I'm still open to hearing some more detailed and effective strategies
>> for addressing this issue.
>>> at which point the hard problem can more easily be
>>> solved for all targets.  I don't know what progress Alfie has made towards 
>>> this
>>> so far, but I think we're aiming to get these improvements into GCC 16.
> 
> As Andrew says I am working on a refactor of the FMV logic, and will post my 
> next version soon.
> 

I'm looking forward to your patches. I'm willing to implement that
hook for RISC-V FMV.

Thanks,
Yangyu Chen

>>> 
>>> On which note: I presume this patch is proposed for GCC 16 Stage 1?
>> Yes.
>> Thanks,
>> Yangyu Chen
Re: [PATCH] RISC-V: Allow redirecting function calls to the same FMV clone

Reply via email to