On 11/01/2013 10:19 PM, Kirill Yukhin wrote: > Hello Richard, > > On 21 Oct 16:01, Richard Henderson wrote: >> Error on V16SF. Probably better to fill this out. > Thanks, fixed. > >> Better to just use <sseinsnmode> here, as it's a compile-time constant. > Fixed. > >>> +(define_insn "avx512f_store<mode>_mask" >> >> Likewise. > Fixed. > >> Nested vec_merge? That seems... odd to say the least. >> How in the world does this get matched? > Moved to separate patch. > >>> +(define_insn "*avx512f_loads<mode>_mask" >> >> Likewise. > Moved to separate patch. > >>> +(define_insn "avx512f_stores<mode>_mask" >> This seems similar, though of course it's an extract. >> I still can't imagine how it could be used. > Separate patch. > >>> -(define_insn "rcp14<mode>" >>> +(define_insn "<mask_codefor>rcp14<mode><mask_name>" >> >> What, this name isn't used for non-masked anymore? > Bogus original pattern. Introduced by me here: > svn+ssh://gcc.gnu.org/svn/gcc/trunk@203609 > This (and subseqent patterns you mention) are not used > by name. In general case for every built-in whether it is > masked or not we use masked version of built-in. E.g.: > extern __inline __m512d > __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > _mm512_rcp14_pd (__m512d __A) > { > return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A, > (__v8df) > _mm512_setzero_pd (), > (__mmask8) -1); > } > Then, while expanding, if we can prove that mask is const -1 we remove > redundant vec_merge getting non-masked variant of the built-in. > Same holds for all inssn you mentioned > > >>> -(define_insn "srcp14<mode>" >> Likewise. These changes don't belong in this patch. > Ditto. > >>> -(define_insn "rsqrt14<mode>" >> Likewise. > Ditto. > >>> - (match_operand:FMAMODE 3 "nonimmediate_operand" " v,vm, 0,xm,x")))] >>> + (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x") >>> + (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,v,vm,x,m") >>> + (match_operand:FMAMODE 3 "nonimmediate_operand" "v,vm,0,xm,x")))] >> >> Unrelated changes. Repeated throughout the fma patterns. > Reverted. > >>> +(define_insn "*fmai_fmadd_<mode>_maskz" >>> + [(set (match_operand:VF_128 0 "register_operand" "=v,v") >>> + (vec_merge:VF_128 >>> + (vec_merge:VF_128 >>> + (fma:VF_128 >>> + (match_operand:VF_128 1 "nonimmediate_operand" "0,0") >>> + (match_operand:VF_128 2 "nonimmediate_operand" "vm,v") >>> + (match_operand:VF_128 3 "nonimmediate_operand" "v,vm")) >>> + (match_operand:VF_128 4 "const0_operand") >>> + (match_operand:QI 5 "register_operand" "k,k")) >>> + (match_dup 1) >>> + (const_int 1)))] >>> + "TARGET_AVX512F" >>> + "@ >>> + vfmadd132<ssescalarmodesuffix>\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, >>> %2} >>> + vfmadd213<ssescalarmodesuffix>\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, >>> %3}" >>> + [(set_attr "type" "ssemuladd") >>> + (set_attr "mode" "<MODE>")]) >> >> These seem like useless patterns. If they're for builtins, >> then they seem like useless builtins. See above. > Moved to dedicated patch. > >>> @@ -3686,8 +4328,8 @@ >>> (set_attr "athlon_decode" "vector,double,*") >>> (set_attr "amdfam10_decode" "vector,double,*") >>> (set_attr "bdver1_decode" "direct,direct,*") >>> - (set_attr "btver2_decode" "double,double,double") >>> (set_attr "prefix" "orig,orig,vex") >>> + (set_attr "btver2_decode" "double,double,double") >>> (set_attr "mode" "SF")]) >> >> Unrelated changes. > Ugh, reverted. > >>> +(define_expand "vec_unpacku_float_hi_v16si" >>> + [(match_operand:V8DF 0 "register_operand") >>> + (match_operand:V16SI 1 "register_operand")] >>> + "TARGET_AVX512F" >>> +{ >>> + REAL_VALUE_TYPE TWO32r; >>> + rtx k, x, tmp[4]; >>> + >>> + real_ldexp (&TWO32r, &dconst1, 32); >>> + x = const_double_from_real_value (TWO32r, DFmode); >>> + >>> + tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode)); >>> + tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x)); >>> + tmp[2] = gen_reg_rtx (V8DFmode); >>> + tmp[3] = gen_reg_rtx (V8SImode); >>> + k = gen_reg_rtx (QImode); >>> + >>> + emit_insn (gen_vec_extract_hi_v16si (tmp[3], operands[1])); >>> + emit_insn (gen_floatv8siv8df2 (tmp[2], tmp[3])); >>> + emit_insn (gen_rtx_SET (VOIDmode, k, >>> + gen_rtx_LT (QImode, tmp[2], tmp[0]))); >>> + emit_insn (gen_addv8df3_mask (tmp[2], tmp[2], tmp[1], tmp[2], k)); >>> + emit_move_insn (operands[0], tmp[2]); >>> + DONE; >>> +}) >> >> Separate patch. And this is too complicated, since vcvtudq2pd exists. > Moved to [5/8] Extend hooks. > >> Non-masked name change again. > See above. >>> +(define_insn "<mask_codefor>avx512f_unpcklps512<mask_name>" >> >> Ditto. > Above. > >>> +(define_insn "<mask_codefor>avx512f_movshdup512<mask_name>" >> >> Ditto. > Above. > >>> +(define_insn "<mask_codefor>avx512f_movsldup512<mask_name>" >> >> Ditto. > Above. > > Updated patch in the bottom. > > Bootstrapped. > > Coould you pls take a look?
This is ok now. r~