On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> "H.J. Lu" <hjl.to...@gmail.com> writes: > >> > 1. Update vec_duplicate to allow to fail so that backend can only allow > >> > broadcasting an integer constant to a vector when broadcast instruction > >> > is available. This can be used by memset expander to avoid vec_duplicate > >> > when loading from constant pool is more efficient. > >> > >> I don't see any changes in target-independent code though, other than > >> the doc update. It's still the case that (existing) uses of > >> vec_duplicate_optab do not allow it to fail. > > > > I have a followup patch set on > > > > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast > > > > to use it to expand memset with vector broadcast: > > > > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3 > > > > For SSE2 which doesn't have vector broadcast, the constant vector broadcast > > expander returns FAIL and load from constant pool will be used. > > Hmm, but as Jeff and I mentioned in the earlier replies, > vec_duplicate_optab shouldn't be used for constants. Constants > should go via the move expanders instead. > > In a previous message I suggested: > > … would it work to change: > > /* Try using vec_duplicate_optab for uniform vectors. */ > if (!TREE_SIDE_EFFECTS (exp) > && VECTOR_MODE_P (mode) > && eltmode == GET_MODE_INNER (mode) > && ((icode = optab_handler (vec_duplicate_optab, mode)) > != CODE_FOR_nothing) > && (elt = uniform_vector_p (exp))) > > to something like: > > /* Try using vec_duplicate_optab for uniform vectors. */ > if (!TREE_SIDE_EFFECTS (exp) > && VECTOR_MODE_P (mode) > && eltmode == GET_MODE_INNER (mode) > && (elt = uniform_vector_p (exp))) > { > if (TREE_CODE (elt) == INTEGER_CST > || TREE_CODE (elt) == POLY_INT_CST > || TREE_CODE (elt) == REAL_CST > || TREE_CODE (elt) == FIXED_CST) > { > rtx src = gen_const_vec_duplicate (mode, expand_normal > (node)); > emit_move_insn (target, src); > break; > } > … > } > > if that code was the source of the constant operand. If we're adding a > new use of vec_duplicate_optab then that should be similarly protected > against constant operands. >
Your comments apply to my initial vec_duplicate patch that caused the gcc.dg/pr100239.c failure. It has been fixed by commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515 Author: Richard Biener <rguent...@suse.de> Date: Mon Jun 7 20:08:13 2021 +0200 middle-end/100951 - make sure to generate VECTOR_CST in lowering When vector lowering creates piecewise ops make sure to create VECTOR_CSTs instead of CONSTRUCTORs when possible. The problem I am running into now is in my memset vector broadcast patch. In order to optimize vector broadcast for memset, I need to generate a pseudo register for __builtin_memset (ops, 3, 38); only when vector broadcast is available: rtx target = nullptr; unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode); machine_mode vector_mode; if (!mode_for_vector (QImode, nunits).exists (&vector_mode)) gcc_unreachable (); enum insn_code icode = optab_handler (vec_duplicate_optab, vector_mode); if (icode != CODE_FOR_nothing) { rtx reg = targetm.gen_memset_scratch_rtx (vector_mode); class expand_operand ops[2]; create_output_operand (&ops[0], reg, vector_mode); create_input_operand (&ops[1], data, QImode); if (maybe_expand_insn (icode, 2, ops)) { if (!rtx_equal_p (reg, ops[0].value)) emit_move_insn (reg, ops[0].value); target = lowpart_subreg (mode, reg, vector_mode); } } return target; <<< Return nullptr to load from constant pool. -- H.J.