On Mon, 18 Nov 2019, Richard Biener wrote:

> On Sat, 16 Nov 2019, Richard Sandiford wrote:
> 
> > If stmt analysis failed for an SLP node, r278246 tried building the
> > node from scalars instead.  By that point we've already processed child
> > nodes (usually successfully) and might have made some of them the lead
> > nodes for their stmt list.  This is why we couldn't free the child nodes
> > when deciding to build from scalars:
> > 
> >   /* Don't remove and free the child nodes here, since they could be
> >      referenced by other structures.  The analysis and scheduling phases
> >      (need to) ignore child nodes of anything that isn't vect_internal_def. 
> >  */
> > 
> > The problem in this PR is that we (correctly) don't process the unused
> > child nodes again during the scheduling phase, which means that those
> > nodes won't become the leader again.  So some other node with same stmts
> > becomes the leader instead.  However, any internal-def child nodes of
> > this new leader won't have been processed during the analysis phase,
> > because we also (correctly) skip child nodes of non-leader nodes.
> > 
> > We talked on IRC about fixing this by sharing nodes between SLP
> > instances, so that it becomes a "proper" graph.  But that seems
> > like it could throw up some problems too, so I wanted to fix the
> > PR in a less invasive way first.
> > 
> > This patch therefore records the leader nodes during the analysis
> > phase and reuses that choice during scheduling.  When scheduling
> > a non-leader node, we first try to schedule the leader node and
> > then reuse its vector stmts.  At the moment, doing this means we
> > need to know the leader node's SLP instance, so the patch records
> > that in the SLP node too.
> > 
> > While there, I noticed that the two-operation handling returned
> > early without restoring the original def types.  That's really
> > an independent bug fix.
> 
> Can you split that out please?
> 
> For the rest I prefer the following which I am now fully testing
> on x86_64-unknown-linux-gnu (vect.exp testing went fine).  As
> discussed on IRC this makes the SLP graph nodes shared between
> SLP instances (which it has in fact been, just "virtual" via
> all the leader computations).  So SLP instances are now just
> the collection of entries into the directed SLP graph.
> 
> I'm also throwing this on a SPEC build test.

Bootstrap & regtest went fine.  SPEC building reveals a latent
CTOR vectorization issue fixed by the pach below.

Re-testing the combined patch now.

Richard.

2019-11-18  Richard Biener  <rguent...@suse.de>

        * tree-vect-slp.c (vect_analyze_slp_instance): When a CTOR
        was vectorized with just external refs fail.

        * gcc.dg/vect/vect-ctor-1.c: New testcase.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 278389)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2260,6 +2253,18 @@ vect_analyze_slp_instance (vec_info *vin
          matches[group_size / const_max_nunits * const_max_nunits] = false;
          vect_free_slp_tree (node, false);
        }
+      else if (constructor
+              && SLP_TREE_DEF_TYPE (node) != vect_internal_def)
+       {
+         /* CONSTRUCTOR vectorization relies on a vector stmt being
+            generated, that doesn't work for fully external ones.  */
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                            "Build SLP failed: CONSTRUCTOR of external "
+                            "or constant elements\n");
+         vect_free_slp_tree (node, false);
+         return false;
+       }
       else
        {
          /* Create a new SLP instance.  */
Index: gcc/testsuite/gcc.dg/vect/vect-ctor-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-ctor-1.c     (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-ctor-1.c     (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
+
+typedef struct {
+    unsigned short mprr_2[5][16][16];
+} ImageParameters;
+int s[16][2];
+void intrapred_luma_16x16(ImageParameters *img, int s0)
+{
+  for (int j=0; j < 16; j++)
+    for (int i=0; i < 16; i++)
+      {
+       img->mprr_2[1 ][j][i]=s[j][1];
+       img->mprr_2[2 ][j][i]=s0;
+      }
+}

Reply via email to