On Thu, Jun 6, 2019 at 7:54 AM Hongtao Liu <crazy...@gmail.com> wrote: > > Hi Uros and all: > This patch is about to enable support for AVX512_VP2INTERSECT which will > be in Willow Cove. There are two instructions for AVX512_VP2INTERSECT: > VP2INTERSECTD and VP2INTERSECTQ. More details please refer to > https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf > > Bootstrap is ok, and no regressions for i386/x86-64 testsuite. > > Changelog: > > gcc/ > +2019-06-06 Hongtao Liu <hongtao....@intel.com> > + H.J. Lu <hongjiu...@intel.com> > + Olga Makhotina <olga.makhot...@intel.com> > + > + * common/config/i386/i386-common.c > + (OPTION_MASK_ISA_AVX512VP2INTERSECT_SET, > + OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET): New macros. > + (OPTION_MASK_ISA2_AVX512F_UNSET): Add > + OPTION_MASK_ISA_AVX512VP2INTERSECT_UNSET. > + (ix86_handle_option): Handle -mavx512vp2intersect. > + * config/i386/avx512vp2intersectintrin.h: New. > + * config/i386/avx512vp2intersectvlintrin.h: New. > + * config/i386/cpuid.h (bit_AVX512VP2INTERSECT): New. > + * config/i386/driver-i386.c (host_detect_local_cpu): Detect > + AVX512VP2INTERSECT. > + * config/i386/i386-builtin-types.def: Add new types. > + * config/i386/i386-builtin.def: Add new builtins. > + * config/i386/i386-builtins.c: (enum processor_features): Add > + F_AVX512VP2INTERSECT. > + (static const _isa_names_table isa_names_table): Ditto. > + * config/i386/i386-c.c (ix86_target_macros_internal): Define > + __AVX512VP2INTERSECT__. > + * config/i386/i386-expand.c (ix86_expand_builtin): Expand > + IX86_BUILTIN_2INTERSECTD512, IX86_BUILTIN_2INTERSECTQ512, > + IX86_BUILTIN_2INTERSECTD256, IX86_BUILTIN_2INTERSECTQ256, > + IX86_BUILTIN_2INTERSECTD128, IX86_BUILTIN_2INTERSECTQ128. > + * config/i386/i386-modes.def (P2QI, P2HI): New modes. > + * config/i386/i386-options.c (ix86_target_string): Add > + -mavx512vp2intersect. > + (ix86_option_override_internal): Handle AVX512VP2INTERSECT. > + * config/i386/i386.c (ix86_hard_regno_nregs): Allocate two regs for > + P2HImode and P2QImode. > + (ix86_hard_regno_mode_ok): Register pair only starts at even hardreg > + number for P2QImode and P2HImode. > + * config/i386/i386.h (TARGET_AVX512VP2INTERSECT, > + TARGET_AVX512VP2INTERSECT_P): New. > + (PTA_AVX512VP2INTERSECT): Ditto. > + * config/i386/i386.opt: Add -mavx512vp2intersect. > + * config/i386/immintrin.h: Include avx512vp2intersectintrin.h and > + avx512vp2intersectvlintrin.h. > + * config/i386/sse.md (define_c_enum "unspec"): Add UNSPEC_VP2INTERSECT. > + (define_mode_iterator VI48_AVX512VP2VL): New. > + (avx512vp2intersect_2intersect<mode>, > + avx512vp2intersect_2intersectv16si): New define_insn patterns. > + (*vec_extractp2hi, *vec_extractp2qi): New define_insn_and_split > + patterns. > + * config.gcc: Add avx512vp2intersectvlintrin.h and > + avx512vp2intersectintrin.h to extra_headers. > + * doc/invoke.texi: Document -mavx512vp2intersect. > + > > gcc/testsuite/ > +2019-06-06 Hongtao Liu <hongtao....@intel.com> > + Olga Makhotina <olga.makhot...@intel.com> > + > + * gcc.target/i386/avx512-check.h: Handle bit_AVX512VP2INTERSECT. > + * gcc.target/i386/avx512vp2intersect-2intersect-1a.c: New test. > + * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Likewise. > + * gcc.target/i386/avx512vp2intersect-2intersectvl-1a.c: Likewise. > + * gcc.target/i386/avx512vp2intersect-2intersectvl-1b.c: Likewise. > + * gcc.target/i386/sse-12.c: Add -mavx512vp2intersect. > + * gcc.target/i386/sse-13.c: Likewsie. > + * gcc.target/i386/sse-14.c: Likewise. > + * gcc.target/i386/sse-22.c: Likewise. > + * gcc.target/i386/sse-23.c: Likewise. > + * g++.dg/other/i386-2.C: Likewise. > + * g++.dg/other/i386-3.C: Likewise. > +
+ case OPT_mavx512vp2intersect: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA_AVX512VP2INTERSECT_SET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA_AVX512VP2INTERSECT_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET; + } some space/tab mixup here. +(define_mode_iterator VI48_AVX512VP2VL + [V8DI + (V4DI "TARGET_AVX512VL") (V2DI "TARGET_AVX512VL") + (V8SI "TARGET_AVX512VL") (V4SI "TARGET_AVX512VL")]) also here (or maybe a vertical alignment issue). + op2 = copy_to_reg (op2); + op3 = copy_to_reg (op3); The predicate says that this one can be memory operand as well. I suggest you use if (!insn_data[icode].operand[X].predicate (opX, modeX)) opX = copy_to_mode_reg (modeX, opX); This would also handle eventual VOIDmode vector 0 operand. + + op4 = gen_reg_rtx (mode4); + emit_insn (GEN_FCN (icode) (op4, op2, op3)); + mode0 = GET_MODE_INNER (GET_MODE (op4)); + pat = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, GEN_INT (0))); + pat2 = gen_rtx_VEC_SELECT (mode0, op4, pat); + emit_move_insn (gen_rtx_MEM (mode0, op0), pat2); + pat = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, GEN_INT (1))); + pat2 = gen_rtx_VEC_SELECT (mode0, op4, pat); + emit_move_insn (gen_rtx_MEM (mode0, op1), pat2); + You should probably emit a subreg here (using simplify_gen_subreg) and leave to the register allocator to emit correct hard register out of a register pair. Using this approach, *vec_extractp2hi and *vec_extractp2hi should not be necessary anymore; RA will reduce the subreg RTX to a movqi/movhi by itself. Uros.