https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #8)
> > vec_unpacks_hi_v4sf creates an uninitialized (reg:V4SF 853), I guess it may
> > confuse LRA to allocate a mem for it.
> 
> For simple case
> void
> foo (double* a, float* b, int n)
> {
>     for (int i = 0; i != n; i++)
>       a[i] = b[i];
> }
> 
> RA works ok, there's no extra spill there.

Yeah, it needs enough register pressure to not have the extra reg here.

I think the proposed patch in comment#7 might be good on its own as it
avoids a false dependence on prior register contents (if not optimizing
for size).
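
As a rough source-level illustration of the false-dependence point (a
sketch with SSE2 intrinsics, not the patch from comment#7; the helper name
is made up, and zeroing is only one assumed way to give the temporary a
defined value; what the compiler actually emits may differ):

```c
#include <emmintrin.h>  /* SSE2: _mm_movehl_ps, _mm_setzero_ps, _mm_cvtps_pd */

/* Hypothetical helper: widen the two high floats of V to doubles.
   _mm_movehl_ps (a, b) yields { b[2], b[3], a[2], a[3] }, so the upper
   half of the result comes from A.  Passing a zeroed A means the result
   does not depend on whatever an unrelated register held before, at the
   cost of one extra zeroing instruction (hence "if not optimizing for
   size").  */
static inline __m128d
unpack_hi_zeroed (__m128 v)
{
  __m128 hi = _mm_movehl_ps (_mm_setzero_ps (), v);
  /* Only the low two floats of HI are converted.  */
  return _mm_cvtps_pd (hi);
}
```

For an input holding { 1, 2, 3, 4 } from low to high element, the result
holds the doubles 3 and 4.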

It does fix the benchmark regression as well.

I do wonder about the usefulness of the memory alternative on the
sse_movhlps pattern though; there's the sse_storehps pattern, which
also models the store part more precisely as V2SFmode.  Is
sse_movhlps_exp ever invoked with a memory destination?

That said, if the memory alternative stays we might want to mark it with '$'
so it's never chosen when the memory operand would then need a reload?
