So in the end I seem to be able to combine AVX & AVX512, arriving at the following, which passes basic testing. I will now look into teaching the vectorizer the required "promotion" to handle mask_gather_loadv4dfv4si and mask_gather_loadv4sfv4di.
Meanwhile, do you see any hole in the below? If not I'll do mask_scatter_store accordingly (that will be simpler since AVX doesn't have scatter).

Thanks,
Richard.

>From 706d3ac96b384eaad249cc83ec542ec643e21a4c Mon Sep 17 00:00:00 2001
From: Richard Biener <rguent...@suse.de>
Date: Tue, 17 Aug 2021 15:14:38 +0200
Subject: [PATCH] move x86 to use gather/scatter internal functions
To: gcc-patches@gcc.gnu.org

This moves the x86 backend to use standard pattern names for
mask_gather_load and mask_scatter_store rather than using the
builtin_{gather,scatter} target hooks.  The patch starts with
mask_gather_load covering AVX and AVX512 and omits gather_load,
which doesn't match an actual instruction.  The vectorizer will
appropriately set up an all-ones mask when necessary.  The
vectorizer still lacks support for unpacking, thus mixed-size
gathers like mask_gather_loadv4dfv4si.

2021-08-17  Richard Biener  <rguent...@suse.de>

	* config/i386/sse.md (vec_gather_idxsi, vec_gather_idxdi):
	New lower-case mode attributes for VEC_GATHER_IDX{SI,DI}.
	(VEC_GATHER_MODEX): New mode iterator.
	(mask_gather_load<mode><vec_gather_idxsi>): New expander.
	(mask_gather_load<mode><vec_gather_idxdi>): Likewise.
---
 gcc/config/i386/sse.md | 53 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 13889687793..6ae826c268a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -23640,12 +23640,22 @@
    (V2DF "V4SI") (V4DF "V4SI") (V8DF "V8SI")
    (V4SI "V4SI") (V8SI "V8SI") (V16SI "V16SI")
    (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")])
+(define_mode_attr vec_gather_idxsi
+  [(V2DI "v4si") (V4DI "v4si") (V8DI "v8si")
+   (V2DF "v4si") (V4DF "v4si") (V8DF "v8si")
+   (V4SI "v4si") (V8SI "v8si") (V16SI "v16si")
+   (V4SF "v4si") (V8SF "v8si") (V16SF "v16si")])

 (define_mode_attr VEC_GATHER_IDXDI
   [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
    (V2DF "V2DI") (V4DF "V4DI") (V8DF "V8DI")
    (V4SI "V2DI") (V8SI "V4DI") (V16SI "V8DI")
    (V4SF "V2DI") (V8SF "V4DI") (V16SF "V8DI")])
+(define_mode_attr vec_gather_idxdi
+  [(V2DI "v2di") (V4DI "v4di") (V8DI "v8di")
+   (V2DF "v2di") (V4DF "v4di") (V8DF "v8di")
+   (V4SI "v2di") (V8SI "v4di") (V16SI "v8di")
+   (V4SF "v2di") (V8SF "v4di") (V16SF "v8di")])

 (define_mode_attr VEC_GATHER_SRCDI
   [(V2DI "V2DI") (V4DI "V4DI") (V8DI "V8DI")
@@ -23653,6 +23663,30 @@
    (V4SI "V4SI") (V8SI "V4SI") (V16SI "V8SI")
    (V4SF "V4SF") (V8SF "V4SF") (V16SF "V8SF")])

+(define_mode_iterator VEC_GATHER_MODEX
+  [V2DI V2DF V4DI V4DF V4SI V4SF V8SI V8SF
+   (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
+   (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
+
+(define_expand "mask_gather_load<mode><vec_gather_idxsi>"
+  [(match_operand:VEC_GATHER_MODEX 0 "register_operand")
+   (match_operand 1 "vsib_address_operand")
+   (match_operand:<VEC_GATHER_IDXSI> 2 "register_operand")
+   (match_operand:SI 3 "const0_operand")
+   (match_operand:SI 4 "const1248_operand")
+   (match_operand 5 "register_operand")]
+  "TARGET_AVX2 && TARGET_USE_GATHER"
+{
+  rtx scratch = gen_reg_rtx (<MODE>mode);
+  emit_move_insn (scratch, CONST0_RTX(<MODE>mode));
+  if (GET_MODE_SIZE (<MODE>mode) != 32 && !TARGET_AVX512VL)
+    operands[5] = gen_lowpart_SUBREG (<MODE>mode, operands[5]);
+  emit_insn (gen_<avx2_avx512>_gathersi<mode> (operands[0], scratch,
+					       operands[1], operands[2],
+					       operands[5], operands[4]));
+  DONE;
+})
+
 (define_expand "avx2_gathersi<mode>"
   [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
		   (unspec:VEC_GATHER_MODE
@@ -23714,6 +23748,25 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])

+(define_expand "mask_gather_load<mode><vec_gather_idxdi>"
+  [(match_operand:VEC_GATHER_MODEX 0 "register_operand")
+   (match_operand 1 "vsib_address_operand")
+   (match_operand:<VEC_GATHER_IDXDI> 2 "register_operand")
+   (match_operand:SI 3 "const0_operand")
+   (match_operand:SI 4 "const1248_operand")
+   (match_operand 5 "register_operand")]
+  "TARGET_AVX2 && TARGET_USE_GATHER"
+{
+  rtx scratch = gen_reg_rtx (<MODE>mode);
+  emit_move_insn (scratch, CONST0_RTX(<MODE>mode));
+  if (GET_MODE_SIZE (<MODE>mode) != 32 && !TARGET_AVX512VL)
+    operands[5] = gen_lowpart_SUBREG (<VEC_GATHER_SRCDI>mode, operands[5]);
+  emit_insn (gen_<avx2_avx512>_gatherdi<mode> (operands[0], scratch,
+					       operands[1], operands[2],
+					       operands[5], operands[4]));
+  DONE;
+})
+
 (define_expand "avx2_gatherdi<mode>"
   [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand")
		   (unspec:VEC_GATHER_MODE
-- 
2.31.1