On November 21, 2019 8:02:14 PM GMT+01:00, Richard Sandiford <richard.sandif...@arm.com> wrote: >Richard Biener <rguent...@suse.de> writes: >> We're doing quite some useless work here and in the case we >> actually manage to "vectorize" it, we've done a no-op (bb-slp-42.c). > >The point of that test was that we could share a vector load at b with >four constructors, so it's not really a no-op: > > https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01240.html
Ah, so it becomes a cost issue. The testcase wasn't too verbose at that... >Is this because x86 prefers to load-and-duplicate four individual >elements rather than load them all at once? Yeah, it can splat from a memory operand and thereby avoid the dependence on a single load plus eventually the STLF penalty if timely earlier stores were not vector. So I thought that's always more profitable than a vector load and then four shufflings. On x86 shuffling resources are also limited while the splat can be done plenty in parallel (even combined with other vector ops with AVX 512, saving a register). Richard. >Thanks, >Richard > >> >> It also refactors the routine a bit and only dumps about >"vectorizable" >> CTORs when we actually analyze the SLP tree (when all CTOR elements >> were internally defined). >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu. >> >> Richard. >> >> 2019-11-20 Richard Biener <rguent...@suse.de> >> >> * tree-vect-slp.c (vect_analyze_slp_instance): Dump >> constructors we are actually analyzing. >> (vect_slp_check_for_constructors): Do not vectorize uniform >> constuctors, do not dump here. >> >> * gcc.dg/vect/bb-slp-42.c: Adjust. >> >> Index: gcc/tree-vect-slp.c >> =================================================================== >> --- gcc/tree-vect-slp.c (revision 278484) >> +++ gcc/tree-vect-slp.c (working copy) >> @@ -2183,6 +2183,10 @@ vect_analyze_slp_instance (vec_info *vin >> else >> return false; >> } >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "Analyzing vectorizable constructor: %G\n", >> + stmt_info->stmt); >> } >> else >> { >> @@ -3123,31 +3127,22 @@ vect_slp_check_for_constructors (bb_vec_ >> gimple_stmt_iterator gsi; >> >> for (gsi = bb_vinfo->region_begin; >> - gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next >(&gsi)) >> + gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next >(&gsi)) >> { >> - gimple *stmt = gsi_stmt (gsi); >> - >> - if (is_gimple_assign (stmt) >> - && gimple_assign_rhs_code (stmt) == CONSTRUCTOR >> - && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME >> - && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == >VECTOR_TYPE) >> - { >> - tree rhs = gimple_assign_rhs1 (stmt); >> - >> - if (CONSTRUCTOR_NELTS (rhs) == 0) >> - continue; >> - >> - poly_uint64 subparts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)); >> + gassign *stmt = dyn_cast <gassign *> (gsi_stmt (gsi)); >> + if (!stmt || gimple_assign_rhs_code (stmt) != CONSTRUCTOR) >> + continue; >> >> - if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs))) >> - continue; >> + tree rhs = gimple_assign_rhs1 (stmt); >> + if (!VECTOR_TYPE_P (TREE_TYPE (rhs)) >> + || maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)), >> + CONSTRUCTOR_NELTS (rhs)) >> + || VECTOR_TYPE_P (TREE_TYPE (CONSTRUCTOR_ELT (rhs, 0)->value)) >> + || uniform_vector_p (rhs)) >> + continue; >> >> - if (dump_enabled_p ()) >> - dump_printf_loc (MSG_NOTE, vect_location, >> - "Found vectorizable constructor: %G\n", stmt); >> - stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt); >> - BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info); >> - } >> + stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt); >> + BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info); >> } >> } >> >> Index: gcc/testsuite/gcc.dg/vect/bb-slp-42.c >> =================================================================== >> --- gcc/testsuite/gcc.dg/vect/bb-slp-42.c (revision 278484) >> +++ gcc/testsuite/gcc.dg/vect/bb-slp-42.c (working copy) >> @@ -44,6 +44,5 @@ main () >> >> } >> >> -/* See that we vectorize an SLP instance. */ >> -/* { dg-final { scan-tree-dump "Found vectorizable constructor" >"slp1" { target { ! vect_fully_masked } } } } */ >> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 >"slp1" { target { ! vect_fully_masked } } } } */ >> +/* See that we do not try to vectorize the uniform CTORs. */ >> +/* { dg-final { scan-tree-dump-not "Analyzing vectorizable >constructor" "slp1" } } */