https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154
--- Comment #31 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #28)
> (In reply to Richard Biener from comment #26)
> > This is the powers of simplify_subreg I guess. We're lucky it doesn't do
> > this to arbitrary arithmetic.
> >
> > So we need to really change all defs we introduce to vector modes instead of
> > making our live easy and using paradoxical subregs all over the place.
>
> No, IMO IRA should be "fixed" to avoid stack temporary and (based on some
> cost metric) use direct move for paradoxical subregs.
The problem is this cost computation:
  /* Moves between SSE and integer units are expensive.  */
  if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))

    /* ??? By keeping returned value relatively high, we limit the number
       of moves between integer and SSE registers for all targets.
       Additionally, high value prevents problem with x86_modes_tieable_p(),
       where integer modes in SSE registers are not tieable
       because of missing QImode and HImode moves to, from or between
       MMX/SSE registers.  */
    return MAX (8, SSE_CLASS_P (class1)
                   ? ix86_cost->hard_register.sse_to_integer
                   : ix86_cost->hard_register.integer_to_sse);
Because of the MAX, the cost of a move between SSE and integer units is
never less than 8, even when the target's cost table gives a lower value.
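
To make the effect of the clamp concrete, here is a minimal stand-alone
sketch. It is a simplified model, not the GCC source: struct move_costs and
clamped_move_cost are hypothetical names that stand in for the
ix86_cost->hard_register fields and the return expression quoted above.

```c
/* Simplified, stand-alone model of the clamp; NOT the GCC source.
   struct move_costs and clamped_move_cost are hypothetical names.  */
struct move_costs
{
  int sse_to_integer;   /* stands in for hard_register.sse_to_integer */
  int integer_to_sse;   /* stands in for hard_register.integer_to_sse */
};

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Cost of a move that crosses the SSE/integer boundary.  FROM_SSE is
   nonzero when the source register is an SSE register.  The result is
   never below 8, whatever the table holds.  */
static int
clamped_move_cost (const struct move_costs *c, int from_sse)
{
  return MAX (8, from_sse ? c->sse_to_integer : c->integer_to_sse);
}
```

With a table that sets both fields to 2, clamped_move_cost still returns 8,
so tuning the table below that value cannot make a direct move look cheaper
to the register allocator. Table values above 8 pass through unchanged.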