https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #8)
> > vec_unpacks_hi_v4sf create an uninitialized (reg:V4SF 853), I guess it may
> > confuse LRA to allocate a mem for it.
>
> For simple case
> void
> foo (double* a, float* b, int n)
> {
>   for (int i = 0; i != n; i++)
>     a[i] = b[i];
> }
>
> RA works ok, there's no extra spill there.

Yeah, it needs enough register pressure to not have the extra reg here.

I think the proposed patch in comment #7 might be good on its own, as it
avoids a false dependence on prior register contents (if not optimizing
for size). It does fix the benchmark regression as well.

I do wonder about the usefulness of the memory alternative on the
sse_movhlps pattern though; there's the sse_storehps pattern, which also
models the store part more precisely as V2SFmode. Is sse_movhlps_exp
ever invoked with a memory destination?

That said, if the memory alternative stays, we might want to mark it with
'$' so it's never chosen when the memory operand would need a reload?
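
To illustrate the false dependence mentioned above, here is a minimal scalar
model, assuming the documented movhlps semantics (low half of the destination
is written from the high half of the source, high half of the destination is
left untouched) and assuming the unpack is done as movhlps followed by
cvtps2pd. The types and the model_* functions are made up for illustration;
this is not GCC code and not the patch from comment #7.

```c
#include <assert.h>

/* Scalar stand-ins for a V4SF and a V2DF register (hypothetical types).  */
typedef struct { float f[4]; } v4sf;
typedef struct { double d[2]; } v2df;

/* Model of "movhlps dest, src": dest[0..1] = src[2..3], while dest[2..3]
   keep whatever the destination held before.  The old value of DEST is
   therefore an input of the instruction.  */
static v4sf
model_movhlps (v4sf dest, v4sf src)
{
  dest.f[0] = src.f[2];
  dest.f[1] = src.f[3];
  return dest;
}

/* Model of "cvtps2pd": widen the low two floats to doubles and ignore
   the high two.  */
static v2df
model_cvtps2pd (v4sf src)
{
  v2df r = { { (double) src.f[0], (double) src.f[1] } };
  return r;
}

/* Model of unpacking the high half of SRC to doubles.  STALE plays the
   role of the uninitialized temporary: its contents never reach the
   result, but movhlps still has to read it.  */
static v2df
model_unpacks_hi (v4sf src, v4sf stale)
{
  v4sf tmp = model_movhlps (stale, src);
  assert (tmp.f[2] == stale.f[2] && tmp.f[3] == stale.f[3]);
  return model_cvtps2pd (tmp);
}
```

Calling model_unpacks_hi with two different STALE values gives the same
result, which is the point: the dependence on the temporary's prior contents
is real for the hardware and for the register allocator, but irrelevant to
the value computed.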