> On 24 Mar 2025, at 16:28, Alfie Richards <alfie.richa...@arm.com> wrote:
>
> On 24/03/2025 03:28, Yangyu Chen wrote:
>>> On 24 Mar 2025, at 11:03, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
>>>
>>> Two brief comments, since I'm on holiday until 31st but happened to notice
>>> this
>>> patch anyway.
>>>
>>> On Mon, Mar 24, 2025 at 02:19:21AM +0800, Yangyu Chen wrote:
>>>> This behavior does not ensure that if any higher priority callee version
>>>> were selected at runtime, then a higher priority caller version would have
>>>> been eligible for selection. But this is hard to solve due to comparing
>>>> the priority of different versions of the caller may not be meaningful.
>>>
>>> I've discussed this problem with Alfie (added to CC), who is currently
>>> working
>>> on FMV improvements for AArch64 and hoping to clean up the backend/middle
>>> end
>>> interface as well. I think we agreed that it would be good to add a target
>>> hook to compare versions,
>> Actually, we already have a target hook for comparing the priority
>> of mv clones called: TARGET_COMPARE_VERSION_PRIORITY.
>> The problem is that if we don’t have the same target list for both
>> caller and callee, then such optimization might be meaningless.
>> For example, we know "arch=+v,+zbb", "arch=+v" and "arch=+zbb" can
>> speed up vector rotation right operations on RISC-V. But "v" does
>> not imply "zbb" extension. We have a callee for all these targets.
>> And we have a caller that has only "arch=+v" and "arch=+zbb". Then
>> running on a CPU with only "v" extension but no Zbb. In this case,
>> the callee with "arch+v" is the best target for this operation on
>> this CPU. But "arch=+v,+zbb" might be better for some platforms.
>> Here are some tradeoffs: if this function call is very frequent,
>> and zbb is only used for small-sized scalar data epilog, then calling
>> from PLT rather than inline the "v" version of callee might have
>> more performance penalty.
>
> Yes agreed, this is part of the issue. There is also the issue that callers
> may be in different TU's where only some of the versions are available so the
> list of possible versions are different.
>> My point is that, if it only happens in the same compilation unit,
>> we should let the developer know this optimization and have the
>> same target list for both the caller and the callee. Sometimes, a
>> function call will introduce more performance penalties than using
>> a default target. And if we don't have this optimization, the callee
>> cannot be inlined.
>
> I agree there are performance implications and cases where we would rather
> just inline and use a sub-optimal version. But we cannot be certain this will
> just be used for performance reasons, and selecting the wrong version or
> inconsistent versions across different callers could be a soundness issue if
> used for functional reasons.
>
> I imagine a situation where there is some extension with specific setup/take
> down code, such as setting/clearing a system register. If then we have two
> functions called setup/takedown which we use FMV to specify for our
> architecture, it would then be possible that different versions were taken
> for the setup vs take down.
>
> In my opinion, the idea of FMV is to allow users to specify that functions
> will be dispatched and the most specific one taken. I think to select the
> wrong version is a breaking of that promise.
>
> I think we can make this optimisation smarter to do the inlining when we can
> determine at compile time what the correct version is. I believe that logic
> should be something like:
>
> - When in the TU where the default is implemented (so we have visibility of
> all versions)
> - When the caller target features + CLI enabled features is a superset of a
> version AND when there is no version for which these features are a
> subset.
> - Then we can call directly.
>
Agreed. This relaxed my proposal where directly calling from caller
== callee in features to caller's features is a superset of callee's.
> I think this would catch a lot of cases, especially as I think it is unlikely
> in practice to have lots of versions with overlapping target feature lists.
>
>> I'm still open to hearing some more detailed and effective strategies
>> for addressing this issue.
>>> at which point the hard problem can more easily be
>>> solved for all targets. I don't know what progress Alfie has made towards
>>> this
>>> so far, but I think we're aiming to get these improvements into GCC 16.
>
> As Andrew says I am working on a refactor of the FMV logic, and will post my
> next version soon.
>
I'm looking forward to your patches. I'm willing to implement that
hook for RISC-V FMV.
Thanks,
Yangyu Chen
>>>
>>> On which note: I presume this patch is proposed for GCC 16 Stage 1?
>> Yes.
>> Thanks,
>> Yangyu Chen