> we are going to have some AMD CPU with AVX2 support soon, the question is
> if it will prefer 256-bit vmovups/vmovupd/vmovdqu or split, but even
> if it will prefer split, the question is if like bdver{1,2,3} it will
> be X86_TUNE_AVX128_OPTIMAL, because if yes, then how 256-bit unaligned
> loads/stores are handled is much less important there. Ganesh?
256-bit is friendly on bdver4.
However, 256-bit unaligned stores are microcoded, which we would like to avoid,
so we require 128-bit MOVUPS there.
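
To illustrate why this matters for the quoted patch, here is a minimal sketch.
It is not the actual i386.c code: the struct, its fields, and the function
names are made up for illustration, and the bdver4-like values simply encode
the statement above (256-bit loads fine, unaligned 256-bit stores split).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU tuning record.  The fields only model the
   TARGET_AVX2 / TARGET_AVX256_SPLIT_UNALIGNED_{LOAD,STORE} macros;
   they are not GCC's real data structures.  */
struct tune_sketch
{
  bool has_avx2;
  bool split_unaligned_load;
  bool split_unaligned_store;
};

/* Condition as written in the quoted patch: AVX2 overrides the flag.  */
static bool
patched_split_store (const struct tune_sketch *t)
{
  return !t->has_avx2 && t->split_unaligned_store;
}

/* Alternative: the per-CPU tuning flag alone decides.  */
static bool
flag_only_split_store (const struct tune_sketch *t)
{
  return t->split_unaligned_store;
}

/* Assumed bdver4-like tuning: AVX2 present, stores still want the split.  */
static const struct tune_sketch bdver4_like = { true, false, true };

/* Shows that the two conditions disagree for such a CPU.  */
static void
sketch_self_check (void)
{
  assert (!patched_split_store (&bdver4_like));
  assert (flag_only_split_store (&bdver4_like));
}
```

With the `!TARGET_AVX2 &&` guard, a CPU like this would get 256-bit unaligned
stores even though its tuning asks for the split, so the decision is better
left to the per-CPU flag.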
-----Original Message-----
From: Jakub Jelinek
Sent: Tuesday, November 12, 2013 3:57 PM
To: Jan Hubicka
Cc: H.J. Lu; Vladimir Makarov; GCC Patches; Uros Bizjak; Richard Henderson;
Gopalasubramanian, Ganesh
Subject: Re: Honnor ix86_accumulate_outgoing_args again
On Tue, Nov 12, 2013 at 11:05:45AM +0100, Jan Hubicka wrote:
> > @@ -16576,7 +16576,7 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
> >
> > if (MEM_P (op1))
> > {
> > - if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
> > + if (!TARGET_AVX2 && TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
> > {
> > rtx r = gen_reg_rtx (mode);
> > m = adjust_address (op1, mode, 0);
> > @@ -16596,7 +16596,7 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1)
> > }
> > else if (MEM_P (op0))
> > {
> > - if (TARGET_AVX256_SPLIT_UNALIGNED_STORE)
> > + if (!TARGET_AVX2 && TARGET_AVX256_SPLIT_UNALIGNED_STORE)
>
> I would add explanation comment on those two.
Looking at http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01235.html
we are going to have some AMD CPU with AVX2 support soon, the question is
if it will prefer 256-bit vmovups/vmovupd/vmovdqu or split, but even
if it will prefer split, the question is if like bdver{1,2,3} it will
be X86_TUNE_AVX128_OPTIMAL, because if yes, then how 256-bit unaligned
loads/stores are handled is much less important there. Ganesh?
> Shall we also disable argument accumulation for cores? It seems we won't
> solve the IRA issues, right?
You mean LRA issues here, right? If you are starting to use
no-accumulate-outgoing-args much more often than in the past, I think
the problem that LRA forces a frame pointer in that case is much more
important now (or has that been fixed in the meantime?). Vlad?
Jakub