Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

Richard Sandiford via Gcc-patches Sat, 23 Oct 2021 03:40:00 -0700

Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> I'm still a bit sceptical about treating the high-part cost as lower.
>> ISTM that the subreg cases are the ones that are truly “free” and any
>> others should have a normal cost.  So if CSE handled the subreg case
>> itself (to model how the rtx would actually be generated) then aarch64
>> code would have to do less work.  I imagine that will be true for other
>> targets as well.
>
> I guess the main problem is that CSE lacks context because it's not
> until after combine that the high part becomes truly "free" when pushed
> into a high operation.


Yeah.  And the aarch64 code is just being asked to cost the operation
it's given, which could for example come from an existing
aarch64_simd_mov_from_<mode>high.  I think we should try to ensure that
an aarch64_simd_mov_from_<mode>high followed by some arithmetic on the
result is more expensive than the fused operation (when fusing is
possible).

An analogy might be: if the cost code is given:

  (add (reg X) (reg Y))

then, at some later point, the (reg X) might be replaced with a
multiplication, in which case we'd have a MADD operation and the
addition is effectively free.  Something similar would happen if
(reg X) became a shift by a small amount on newer cores, although
I guess then you could argue either that the cost of the add
disappears or that the cost of the shift disappears.

But we shouldn't count ADD as free on the basis that it could be
combined with a multiplication or shift in future.  We have to cost
what we're given.  I think the same thing applies to the high part.
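
As a toy sketch of that argument (this is not GCC code: the toy_* names
and the cost values 1 and 4 are made up purely for illustration), a cost
function can charge a plain ADD its normal price and only treat the add
as free when the multiplication is actually present in the expression
it is asked to cost:

```c
/* Toy cost model, not GCC code.  All names and numbers are hypothetical.  */
enum toy_code { TOY_REG, TOY_ADD, TOY_MULT };

struct toy_rtx {
  enum toy_code code;
  const struct toy_rtx *op0;  /* NULL for TOY_REG */
  const struct toy_rtx *op1;  /* NULL for TOY_REG */
};

/* Cost the expression exactly as given, without guessing what a later
   pass might substitute into its operands.  */
int
toy_cost (const struct toy_rtx *x)
{
  switch (x->code)
    {
    case TOY_REG:
      return 0;
    case TOY_MULT:
      return 4 + toy_cost (x->op0) + toy_cost (x->op1);
    case TOY_ADD:
      if (x->op0->code == TOY_MULT)
        /* Fused MADD-like form: pay for the multiply, the add is free.  */
        return toy_cost (x->op0) + toy_cost (x->op1);
      /* Plain add of whatever we were given: normal price.  */
      return 1 + toy_cost (x->op0) + toy_cost (x->op1);
    }
  return 0;
}
```

With these numbers, (add (reg X) (reg Y)) costs 1, while
(add (mult (reg X) (reg Y)) (reg Z)) costs 4, which is less than the 5
for a separate multiply followed by an add.  The fused form wins without
the standalone ADD ever being costed as free.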

Here we're trying to prevent cse1 from replacing a DUP (lane) with
a MOVI by saying that the DUP is strictly cheaper than the MOVI.
I don't think that's really true though, and the cost tables in the
patch say that DUP is more expensive (rather than less expensive)
than MOVI.

Also, if I've understood correctly, it looks like we'd be relying
on the vget_high of a constant remaining unfolded until RTL cse1.
I think it's likely in future that we'd try to fold vget_high
at the gimple level instead, since that could expose more
optimisations of a different kind.  The gimple optimisers would
then fold vget_high(constant) in a similar way to how cse1 does now.
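
For reference, here is a rough plain-C model of what folding
vget_high(constant) amounts to.  The toy_* types and function are
hypothetical stand-ins for int16x8_t, int16x4_t and vget_high_s16, so
that the sketch does not depend on arm_neon.h; it only models the value
computed, not how GCC represents or folds it:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for int16x8_t and int16x4_t.  */
typedef struct { int16_t lane[8]; } toy_v8;
typedef struct { int16_t lane[4]; } toy_v4;

/* Models the value of vget_high_s16: lanes 4..7 of the input.  */
toy_v4
toy_get_high (toy_v8 v)
{
  toy_v4 r;
  memcpy (r.lane, &v.lane[4], sizeof r.lane);
  return r;
}
```

If the argument is the constant { 1, 2, 3, 4, 5, 6, 7, 8 }, the result
is known at compile time to be { 5, 6, 7, 8 }, so a folder (at either
the gimple or the RTL level) can replace the call with that smaller
constant.  Once that has happened there is no high-part extraction left
in the code for later passes to see.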

So perhaps we should continue to allow the vget_high(constant)
to be folded in cse1 and come up with some way of coping with
the folded form.

Thanks,
Richard
