[Bug rtl-optimization/44141] Redundant loads and stores generated for AMD bdver1 target

venkataramanan.kumar at amd dot com Wed, 28 Mar 2012 03:34:39 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141


--- Comment #13 from Venkataramanan <venkataramanan.kumar at amd dot com> 
2012-03-28 10:32:31 UTC ---
(In reply to comment #12)
> Having a vector mode changing subreg on the LHS of an instruction is a very
> common issue in the i386 backend, and unfortunately e.g. means that lots of
> insns can't be combined or simplified.  I wonder if the expansion sometimes
> shouldn't use a non-subregged temporary as lhs and add a move from subreg of
> the temporary to the desired destination.

For unaligned moves in V2DFmode, the expander currently converts both operands
to V4SFmode, as shown below:

            if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
              {
                op0 = gen_lowpart (V4SFmode, op0);
                op1 = gen_lowpart (V4SFmode, op1);
                emit_insn (gen_sse_movups (op0, op1));
                return;
              }

Do you mean that this conversion is not needed here?


> BTW, if with TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL ps is more desirable than
> pd for movs, then perhaps it would be better to add a mode attr similar to
> ssemodesuffix, which would emit pd or ps for V2DFmode depending on
> TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL (see e.g. i128 mode attr).

Yes, with TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL we want to generate movups
instead of movupd.
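
To make the intended selection concrete, here is a minimal standalone sketch
in C. It is only a model of the decision, not i386 backend code: the enum
vec_mode, the function unaligned_move_mnemonic and the packed_single_optimal
parameter are hypothetical names that stand in for the machine mode and for
TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two vector modes discussed here.  */
enum vec_mode { MODE_V4SF, MODE_V2DF };

/* Return the unaligned move mnemonic we would like to see emitted.
   packed_single_optimal models TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL.  */
static const char *
unaligned_move_mnemonic (enum vec_mode mode, bool packed_single_optimal)
{
  /* V2DF uses the pd form only when the ps form is not preferred.  */
  if (mode == MODE_V2DF && !packed_single_optimal)
    return "movupd";

  /* V4SF always uses ps; V2DF uses ps when the tuning flag is set.  */
  return "movups";
}
```

A mode attr along the lines suggested above could presumably encode the same
choice in the insn pattern itself, so that the operands keep V2DFmode and no
subreg is needed on the LHS.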
