On Mon, 18 Nov 2019, Richard Biener wrote: > On Sat, 16 Nov 2019, Richard Sandiford wrote: > > > If stmt analysis failed for an SLP node, r278246 tried building the > > node from scalars instead. By that point we've already processed child > > nodes (usually successfully) and might have made some of them the lead > > nodes for their stmt list. This is why we couldn't free the child nodes > > when deciding to build from scalars: > > > > /* Don't remove and free the child nodes here, since they could be > > referenced by other structures. The analysis and scheduling phases > > (need to) ignore child nodes of anything that isn't vect_internal_def. > > */ > > > > The problem in this PR is that we (correctly) don't process the unused > > child nodes again during the scheduling phase, which means that those > > nodes won't become the leader again. So some other node with same stmts > > becomes the leader instead. However, any internal-def child nodes of > > this new leader won't have been processed during the analysis phase, > > because we also (correctly) skip child nodes of non-leader nodes. > > > > We talked on IRC about fixing this by sharing nodes between SLP > > instances, so that it becomes a "proper" graph. But that seems > > like it could throw up some problems too, so I wanted to fix the > > PR in a less invasive way first. > > > > This patch therefore records the leader nodes during the analysis > > phase and reuses that choice during scheduling. When scheduling > > a non-leader node, we first try to schedule the leader node and > > then reuse its vector stmts. At the moment, doing this means we > > need to know the leader node's SLP instance, so the patch records > > that in the SLP node too. > > > > While there, I noticed that the two-operation handling returned > > early without restoring the original def types. That's really > > an independent bug fix. > > Can you split that out please? > > For the rest I prefer the following which I am now fully testing > on x86_64-unknown-linux-gnu (vect.exp testing went fine). As > discussed on IRC this makes the SLP graph nodes shared between > SLP instances (which it has in fact been, just "virtual" via > all the leader computations). So SLP instances are now just > the collection of entries into the directed SLP graph. > > I'm also throwing this on a SPEC build test.
Bootstrap & regtest went fine. SPEC building reveals a latent CTOR vectorization issue fixed by the pach below. Re-testing the combined patch now. Richard. 2019-11-18 Richard Biener <rguent...@suse.de> * tree-vect-slp.c (vect_analyze_slp_instance): When a CTOR was vectorized with just external refs fail. * gcc.dg/vect/vect-ctor-1.c: New testcase. Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c (revision 278389) +++ gcc/tree-vect-slp.c (working copy) @@ -2260,6 +2253,18 @@ vect_analyze_slp_instance (vec_info *vin matches[group_size / const_max_nunits * const_max_nunits] = false; vect_free_slp_tree (node, false); } + else if (constructor + && SLP_TREE_DEF_TYPE (node) != vect_internal_def) + { + /* CONSTRUCTOR vectorization relies on a vector stmt being + generated, that doesn't work for fully external ones. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Build SLP failed: CONSTRUCTOR of external " + "or constant elements\n"); + vect_free_slp_tree (node, false); + return false; + } else { /* Create a new SLP instance. */ Index: gcc/testsuite/gcc.dg/vect/vect-ctor-1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-ctor-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-ctor-1.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3" } */ +/* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */ + +typedef struct { + unsigned short mprr_2[5][16][16]; +} ImageParameters; +int s[16][2]; +void intrapred_luma_16x16(ImageParameters *img, int s0) +{ + for (int j=0; j < 16; j++) + for (int i=0; i < 16; i++) + { + img->mprr_2[1 ][j][i]=s[j][1]; + img->mprr_2[2 ][j][i]=s0; + } +}