On Fri, Jan 3, 2014 at 9:59 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> This is an attempt to port my recent
> http://gcc.gnu.org/viewcvs?rev=204219&root=gcc&view=rev
> http://gcc.gnu.org/viewcvs?rev=205663&root=gcc&view=rev
> http://gcc.gnu.org/viewcvs?rev=206090&root=gcc&view=rev
> changes also to AVX512F.  The motivation is to get:
>
> #include <immintrin.h>
>
> __m512i
> foo (void *x, void *y)
> {
>   __m512i a = _mm512_loadu_si512 (x);
>   __m512i b = _mm512_loadu_si512 (y);
>   return _mm512_add_epi32 (a, b);
> }
>
> to use one of the unaligned memories directly as an operand to the vpaddd
> instruction.  The first hunk is needed so that we don't regress on, say:
>
> #include <immintrin.h>
>
> __m512i z;
>
> __m512i
> foo (void *x, void *y, int k)
> {
>   __m512i a = _mm512_mask_loadu_epi32 (z, k, x);
>   __m512i b = _mm512_mask_loadu_epi32 (z, k, y);
>   return _mm512_add_epi32 (a, b);
> }
>
> __m512i
> bar (void *x, void *y, int k)
> {
>   __m512i a = _mm512_maskz_loadu_epi32 (k, x);
>   __m512i b = _mm512_maskz_loadu_epi32 (k, y);
>   return _mm512_add_epi32 (a, b);
> }
>
> Does it matter which of vmovdqu32 vs. vmovdqu64 is used if no
> masking/zeroing is performed (i.e. vmovdqu32 (%rax), %zmm0 vs.
> vmovdqu64 (%rax), %zmm0) for performance reasons (i.e. isn't there some
> reinterpretation penalty)?
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-01-03  Jakub Jelinek  <ja...@redhat.com>
>
> 	* config/i386/sse.md (avx512f_load<mode>_mask): Emit vmovup{s,d}
> 	or vmovdqu* for misaligned_operand.
> 	(<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
> 	<sse2_avx_avx512f>_loaddqu<mode><mask_name>): Handle <mask_applied>.
> 	* config/i386/i386.c (ix86_expand_special_args_builtin): Set
> 	aligned_mem for AVX512F masked aligned load and store builtins
> 	and for non-temporal moves.
>
> 	* gcc.target/i386/avx512f-vmovdqu32-1.c: Allow vmovdqu64 instead
> 	of vmovdqu32.

Taking into account Kirill's comment, the patch is OK, although I find
it a bit strange in [1] that

void f2 (int *__restrict e, int *__restrict f)
{
  int i;
  for (i = 0; i < 1024; i++)
    e[i] = f[i];
}

results in

	vmovdqu64	(%rsi,%rax), %zmm0
	vmovdqu32	%zmm0, (%rdi,%rax)

Shouldn't these two move insns be the same?

[1] http://gcc.gnu.org/ml/gcc/2014-01/msg00015.html

Thanks,
Uros.
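
For reference, the kind of code the patch aims to produce for the first
foo above is roughly the following (an illustrative sketch assuming the
x86-64 SysV ABI, not actual compiler output): one unaligned load stays a
vmovdqu* instruction, and the other memory reference is folded directly
into vpaddd, which as an EVEX-encoded instruction tolerates unaligned
memory operands:

foo:
	vmovdqu64	(%rdi), %zmm0		# a = unaligned load of *x
	vpaddd	(%rsi), %zmm0, %zmm0	# *y folded in as an unaligned memory operand
	ret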
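
On the vmovdqu32 vs. vmovdqu64 question: architecturally the element
size is only observable when a mask is applied, since an unmasked
512-bit move transfers the same 64 bytes either way (whether there is a
microarchitectural reinterpretation penalty is exactly Jakub's open
question).  A minimal sketch, with made-up function names, of where the
distinction does and does not matter:

#include <immintrin.h>

/* Unmasked: any full 512-bit unaligned move works; vmovdqu32
   vs. vmovdqu64 makes no architectural difference here.  */
__m512i
load_unmasked (void *p)
{
  return _mm512_loadu_si512 (p);
}

/* Masked: k selects 32-bit lanes, so the element size is observable
   and vmovdqu32 must be kept.  */
__m512i
load_masked_dword (__m512i src, __mmask16 k, void *p)
{
  return _mm512_mask_loadu_epi32 (src, k, p);
}

/* Masked: k selects 64-bit lanes, so vmovdqu64 must be kept.  */
__m512i
load_masked_qword (__m512i src, __mmask8 k, void *p)
{
  return _mm512_mask_loadu_epi64 (src, k, p);
}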