Hi Richard,

> > [...]
> > 3) So I abandoned vec-patterns and instead tried to do it in
> > tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree
> > is created.  Matching the SLP tree is quite simple and getting it to
> > emit the right SLP tree was simple enough, except that at this point
> > all data references and loads have already been calculated.
> 
> (3) seems like the way to go.  Can you explain in more detail why it didn't
> work?  The SLP tree after matching should look something like this:
> 
>   REALPART_EXPR <*_10> = _4;
>   IMAGPART_EXPR <*_10> = _13;
> 
>   _4 = .COMPLEX_ADD_ROT_90 (_12, _8)
>   _13 = .COMPLEX_ADD_ROT_90 (_22, _6)
> 
>   _12 = REALPART_EXPR <*_3>;
>   _22 = IMAGPART_EXPR <*_3>;
> 
>   _8 = REALPART_EXPR <*_5>;
>   _6 = IMAGPART_EXPR <*_5>;
> 
> The operands to the individual .COMPLEX_ADD_ROT_90s aren't the
> operands that actually determine the associated scalar result, but that's
> bound to be the case with something that includes an internal permute.  All
> we're trying to describe is an operation that does the right thing when
> vectorised.
> 
> If you didn't have the .COMPLEX_ADD_ROT_90 and just fell back on mixed
> two-operator SLP, the final node would be in the opposite order:
> 
>   _6 = IMAGPART_EXPR <*_5>;
>   _8 = REALPART_EXPR <*_5>;
> 
> So if you're doing the matching after building the initial tree, you'd need to
> swap the statements in that node so that _8 comes first and cancel the
> associated load permute.  If you're doing the matching on the fly while
> building the SLP tree then the subnodes should start out in the right order.
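
For reference, the ordering point above can be modelled in plain C.  This is
only a sketch under assumptions: it takes .COMPLEX_ADD_ROT_90 to compute
a + i*b on {real, imag} pairs, and the function names (add_rot90,
two_operator) are made up for illustration rather than taken from GCC.

```c
#include <assert.h>

/* Fused view: both operand nodes stay in load order {re, im}; the
   operation itself does the internal swap (out = a + i*b).  */
void add_rot90(const double a[2], const double b[2], double out[2])
{
    out[0] = a[0] - b[1];   /* real lane: Re(a) - Im(b) */
    out[1] = a[1] + b[0];   /* imag lane: Im(a) + Re(b) */
}

/* Two-operator fallback: lane 0 is a MINUS, lane 1 is a PLUS, so the
   second operand node must be presented as {im, re}, which corresponds
   to a load permute of {1, 0}.  */
void two_operator(const double a[2], const double b[2], double out[2])
{
    const int perm[2] = {1, 0};
    const double b_perm[2] = { b[perm[0]], b[perm[1]] };

    out[0] = a[0] - b_perm[0];
    out[1] = a[1] + b_perm[1];
}

/* Both formulations must agree lane by lane.  */
void check_same(const double a[2], const double b[2])
{
    double fused[2], fallback[2];

    add_rot90(a, b, fused);
    two_operator(a, b, fallback);
    assert(fused[0] == fallback[0]);
    assert(fused[1] == fallback[1]);
}
```

The only difference between the two is where the swap lives: in the fallback
it is a permute on the load node, in the fused form it is part of the
operation, which is why the load node can stay in memory order.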

Ah, I hadn't tried it this way because in the SLP version I had originally
started by looking at the complex fma, which has a considerably longer match
pattern:

  _3 = c_14(D) + _2;
  _11 = REALPART_EXPR <*_3>;
  _21 = IMAGPART_EXPR <*_3>;
  _5 = a_15(D) + _2;
  _22 = REALPART_EXPR <*_5>;
  _12 = IMAGPART_EXPR <*_5>;
  _7 = b_16(D) + _2;
  _19 = REALPART_EXPR <*_7>;
  _20 = IMAGPART_EXPR <*_7>;
  _25 = _19 * _22;
  _26 = _12 * _20;
  _27 = _20 * _22;
  _28 = _12 * _19;
  _29 = _25 - _26;
  _30 = _27 + _28;
  _31 = _11 + _29;
  _32 = _21 + _30;
  REALPART_EXPR <*_3> = _31;
  IMAGPART_EXPR <*_3> = _32;

So in this case I should replace _31 and _32, right?  But I can't remove the
other statements, otherwise it'll complain later about the missing references.
I could replace _31 and _32 with something that uses all the variables I
need; however, when I tried this previously in vect-patterns there was a block
on built-in functions with more than 4 arguments (and currently 3 is the limit
for built-in functions in the def file as well).  I don't know if the same
limitation applies if I replace it in SLP.

The complex fma basically creates this vector:

 b⋅c - e⋅f + l 
 b⋅e + c⋅f + n

so I'd need 5 parameters, and then I'm guessing the other expressions would be
removed by DCE at some point?
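
To make the parameter count concrete, here is a minimal C sketch of the two
lanes.  The letters follow the vector above; reading it as
(l + n*i) + (b + f*i) * (c + e*i) is my interpretation, and the function and
type names are hypothetical, not anything in GCC.

```c
#include <assert.h>

/* Lane 0 (real part): four multiplicands plus the accumulator l,
   i.e. 5 inputs.  */
double fma_real(double b, double c, double e, double f, double l)
{
    return b * c - e * f + l;
}

/* Lane 1 (imaginary part): the same four multiplicands plus the
   accumulator n, again 5 inputs.  */
double fma_imag(double b, double c, double e, double f, double n)
{
    return b * e + c * f + n;
}

/* Reference: a plain complex multiply-accumulate on {re, im} pairs,
   used only to check the two lane formulas.  */
struct cplx { double re, im; };

struct cplx cplx_fma(struct cplx acc, struct cplx x, struct cplx y)
{
    struct cplx r;

    r.re = acc.re + (x.re * y.re - x.im * y.im);
    r.im = acc.im + (x.re * y.im + x.im * y.re);
    return r;
}
```

Each scalar lane reads five values, while the vector form only needs three
{re, im} operands (accumulator and two multiplicands).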

Kind Regards,
Tamar

> 
> Thanks,
> Richard
