Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Mon, 9 Sep 2019 at 22:06, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>>
>> On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>> >
>> > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > > With patch, the only following FAIL remains for aarch64-sve.exp:
>> > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
>> > > scan-assembler-times \\tmovprfx\\t 6
>> > > which now contains 14.
>> > > Should I adjust the test, assuming the change isn't a regression ?
>> >
>> > Well, it is kind-of a regression, but it really just means that the
>> > integer code is now consistent with the floating-point code in having
>> > an unnecessary MOVPRFX. So I think adjusting the count is fine.
>> > Presumably any future fix for the existing redundant MOVPRFXs will
>> > apply to the new ones as well.
>> >
>> > The patch looks good to me, just some very minor nits:
>> >
>> > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
>> > >
>> > > /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
>> > > contain a sequence of NVECTORS masks that each control a vector of
>> > > type
>> > > - VECTYPE. */
>> > > + VECTYPE. SCALAR_MASK if non-null, represents the mask used for
>> > > corresponding
>> > > + load/store stmt. */
>> >
>> > Should be two spaces between sentences. Maybe:
>> >
>> > VECTYPE. If SCALAR_MASK is nonnull, the fully-masked loop would AND
>> > these vector masks with the vector version of SCALAR_MASK. */
>> >
>> > since the mask isn't necessarily for a load or store statement.
>> >
>> > > [...]
>> > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree,
>> > > tree, stmt_vec_info,
>> > > says how the load or store is going to be implemented and GROUP_SIZE
>> > > is the number of load or store statements in the containing group.
>> > > If the access is a gather load or scatter store, GS_INFO describes
>> > > - its arguments.
>> > > + its arguments. SCALAR_MASK is the scalar mask used for corresponding
>> > > + load or store stmt.
>> >
>> > Maybe:
>> >
>> > its arguments. If the load or store is conditional, SCALAR_MASK is the
>> > condition under which it occurs.
>> >
>> > since SCALAR_MASK can be null here too.
>> >
>> > > [...]
>> > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info,
>> > > gimple_stmt_iterator *gsi,
>> > > /* Handle cond expr. */
>> > > for (j = 0; j < ncopies; j++)
>> > > {
>> > > + tree loop_mask = NULL_TREE;
>> > > + bool swap_cond_operands = false;
>> > > +
>> > > + if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>> > > + {
>> > > + scalar_cond_masked_key cond (cond_expr, ncopies);
>> > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > + {
>> > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
>> > > vectype, j);
>> > > + }
>> > > + else
>> > > + {
>> > > + cond.code = invert_tree_comparison (cond.code,
>> > > + HONOR_NANS (TREE_TYPE
>> > > (cond.op0)));
>> >
>> > Long line.
>> > Maybe just split it out into a separate assignment:
>> >
>> > bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
>> > cond.code = invert_tree_comparison (cond.code, honor_nans);
>> >
>> > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > + {
>> > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
>> > > vectype, j);
>> >
>> > Long line here too.
>> >
>> > > [...]
>> > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info
>> > > stmt_info, gimple_stmt_iterator *gsi,
>> > > }
>> > > }
>> > > }
>> > > +
>> > > + if (loop_mask)
>> > > + {
>> > > + if (COMPARISON_CLASS_P (vec_compare))
>> > > + {
>> > > + tree tmp = make_ssa_name (vec_cmp_type);
>> > > + gassign *g = gimple_build_assign (tmp,
>> > > + TREE_CODE
>> > > (vec_compare),
>> > > + TREE_OPERAND
>> > > (vec_compare, 0),
>> > > + TREE_OPERAND
>> > > (vec_compare, 1));
>> >
>> > Two long lines.
>> >
>> > > + vect_finish_stmt_generation (stmt_info, g, gsi);
>> > > + vec_compare = tmp;
>> > > + }
>> > > +
>> > > + tree tmp2 = make_ssa_name (vec_cmp_type);
>> > > + gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
>> > > vec_compare, loop_mask);
>> >
>> > Long line here too.
>> >
>> > > [...]
>> > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
>> > > index dc181524744..c4b2d8e8647 100644
>> > > --- a/gcc/tree-vectorizer.c
>> > > +++ b/gcc/tree-vectorizer.c
>> > > @@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context
>> > > *ctxt)
>> > > {
>> > > return new pass_ipa_increase_alignment (ctxt);
>> > > }
>> > > +
>> > > +/* If code(T) is comparison op or def of comparison stmt,
>> > > + extract it's operands.
>> > > + Else return <NE_EXPR, T, 0>. */
>> > > +
>> > > +void
>> > > +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
>> > > +{
>> > > + if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
>> > > + {
>> > > + this->code = TREE_CODE (t);
>> > > + this->op0 = TREE_OPERAND (t, 0);
>> > > + this->op1 = TREE_OPERAND (t, 1);
>> > > + return;
>> > > + }
>> > > +
>> > > + if (TREE_CODE (t) == SSA_NAME)
>> > > + {
>> > > + gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t));
>> > > + if (stmt)
>> > > + {
>> >
>> > Might as well do this as:
>> >
>> > if (TREE_CODE (t) == SSA_NAME)
>> > if (gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t)))
>> > {
>> >
>> > The patch (as hoped) introduces some XPASSes:
>> >
>> > XPASS: gcc.target/aarch64/sve/cond_cnot_2.c scan-assembler-not \\tsel\\t
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 252
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 180
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
>> >
>> > Could you remove the associated xfails (and comments above them where
>> > appropriate)?
>> >
>> > OK with those changes from my POV, but please give Richi a day or so
>> > to object.
>> >
>> > Thanks for doing this.
>> Thanks for the suggestions, I have updated the patch accordingly.
>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
>> Richi, does the patch look OK to you ?
> Hi,
> Bootstrap+test passes for x86_64-unknown-linux-gnu and aarch64-linux-gnu.
> On x86_64, there's a "strange" failure of c-c++-common/builtins.c, log shows:
>
> /home/prathamesh.kulkarni/gnu-toolchain/gcc/pr86753-v2-3/gcc/gcc/test
> FAIL: c-c++-common/builtins.c -Wc++-compat (test for excess errors)
> Excess errors:
> /home/prathamesh.kulkarni/gnu-toolchain/gcc/pr86753-v2-3/gcc/gcc/test
>
> Which shouldn't really happen since the test doesn't seem relevant to the patch,
> and only passes -O2 which shouldn't enable the vectorizer ? Manually
> testing it results in PASS with:
> make check-gcc RUNTESTFLAGS="dg.exp=builtins.c"
> Would it be OK to ignore the FAIL during reg-test ?
Looks like the lines got truncated by the cut-&-paste.  What was the excess error?

Richard
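
An illustrative, self-contained C++ sketch of the idea behind the quoted
vectorizable_condition hunks, for readers skimming the thread: record which
scalar conditions already mask a conditional load or store, and when the
condition of a COND_EXPR (or its inverse) matches one of them, reuse the loop
mask by ANDing it into the vector comparison, swapping the THEN/ELSE values in
the inverted case.  This is not GCC code: CmpCode, ScalarCond and masked_set
are invented stand-ins, the comparison codes are reduced to a handful, the NaN
handling the patch gets from invert_tree_comparison/HONOR_NANS is skipped, and
so is the canonicalization get_cond_ops_from_tree performs for bare SSA
booleans (treating T as T != 0).

#include <iostream>
#include <set>
#include <string>
#include <tuple>

/* Reduced stand-in for tree comparison codes.  */
enum class CmpCode { LT, LE, GT, GE, EQ, NE };

/* Inverse comparison.  The real patch uses invert_tree_comparison with
   HONOR_NANS, which can fail for FP comparisons; ignored here.  */
static CmpCode
invert (CmpCode c)
{
  switch (c)
    {
    case CmpCode::LT: return CmpCode::GE;
    case CmpCode::LE: return CmpCode::GT;
    case CmpCode::GT: return CmpCode::LE;
    case CmpCode::GE: return CmpCode::LT;
    case CmpCode::EQ: return CmpCode::NE;
    case CmpCode::NE: return CmpCode::EQ;
    }
  return c;
}

/* Stand-in for scalar_cond_masked_key: a comparison code plus its two
   operands, spelled as strings for the example.  */
struct ScalarCond
{
  CmpCode code;
  std::string op0, op1;

  bool
  operator< (const ScalarCond &other) const
  {
    return std::tie (code, op0, op1)
           < std::tie (other.code, other.op0, other.op1);
  }
};

int
main ()
{
  /* Conditions already recorded as masks of conditional loads/stores
     (the role played by loop_vinfo->scalar_cond_masked_set).  */
  std::set<ScalarCond> masked_set;
  masked_set.insert ({CmpCode::LT, "a[i]", "b[i]"});

  /* Condition of the COND_EXPR being vectorized.  */
  ScalarCond cond {CmpCode::GE, "a[i]", "b[i]"};

  bool use_loop_mask = false;
  bool swap_cond_operands = false;

  if (masked_set.count (cond))
    /* Same condition as a masked access: AND the vector comparison
       with the loop mask so both share one predicate.  */
    use_loop_mask = true;
  else
    {
      cond.code = invert (cond.code);
      if (masked_set.count (cond))
        {
          /* The inverse condition masks an access: use the inverted
             comparison and swap the THEN/ELSE values instead.  */
          use_loop_mask = true;
          swap_cond_operands = true;
        }
    }

  std::cout << "use loop mask: " << use_loop_mask
            << ", swap operands: " << swap_cond_operands << "\n";
  return 0;
}

Compiled as C++11 and run, the example prints "use loop mask: 1, swap
operands: 1", i.e. it takes the same inverted-condition path that sets
swap_cond_operands in the quoted hunk.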