Hi Richard,

> > [...]
> >
> > 3) So I abandoned vec-patterns and instead tried to do it in
> > tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree
> > is created.  Matching the SLP tree is quite simple and getting it to
> > emit the right SLP tree was simple enough, except that at this point
> > all data references and loads have already been calculated.
>
> (3) seems like the way to go.  Can you explain in more detail why it didn't
> work?  The SLP tree after matching should look something like this:
>
>   REALPART_EXPR <*_10> = _4;
>   IMAGPART_EXPR <*_10> = _13;
>
>   _4 = .COMPLEX_ADD_ROT_90 (_12, _8)
>   _13 = .COMPLEX_ADD_ROT_90 (_22, _6)
>
>   _12 = REALPART_EXPR <*_3>;
>   _22 = IMAGPART_EXPR <*_3>;
>
>   _8 = REALPART_EXPR <*_5>;
>   _6 = IMAGPART_EXPR <*_5>;
>
> The operands to the individual .COMPLEX_ADD_ROT_90s aren't the
> operands that actually determine the associated scalar result, but that's
> bound to be the case with something that includes an internal permute.  All
> we're trying to describe is an operation that does the right thing when
> vectorised.
>
> If you didn't have the .COMPLEX_ADD_ROT_90 and just fell back on mixed
> two-operator SLP, the final node would be in the opposite order:
>
>   _6 = IMAGPART_EXPR <*_5>;
>   _8 = REALPART_EXPR <*_5>;
>
> So if you're doing the matching after building the initial tree, you'd need to
> swap the statements in that node so that _8 comes first and cancel the
> associated load permute.  If you're doing the matching on the fly while
> building the SLP tree then the subnodes should start out in the right order.

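
For reference, here is a minimal C sketch of the kind of scalar loop that I'd expect to give rise to the tree above.  The function name and the interleaved (real, imag) float layout are my own assumptions for illustration, not something taken from the patch or the testsuite:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): out[i] = a[i] + i * b[i].
   Each complex number is stored as an interleaved (real, imag) pair,
   so every array holds 2 * n floats.  */
void
complex_add_rot90 (const float *a, const float *b, float *out, size_t n)
{
  assert (a && b && out);
  for (size_t i = 0; i < n; i++)
    {
      float a_re = a[2 * i], a_im = a[2 * i + 1];
      float b_re = b[2 * i], b_im = b[2 * i + 1];

      /* Multiplying b by i maps (b_re, b_im) to (-b_im, b_re), so the
	 real lane reads b's imaginary part and vice versa.  */
      out[2 * i] = a_re - b_im;
      out[2 * i + 1] = a_im + b_re;
    }
}
```

The real lane consuming b_im and the imaginary lane consuming b_re is the cross-lane use that shows up as the swapped _6/_8 load node in the plain two-operator SLP form.
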
Ah, I hadn't tried it this way because in the SLP version I had originally
started by looking at the complex fma, which would have a considerably longer
match pattern:

  _3 = c_14(D) + _2;
  _11 = REALPART_EXPR <*_3>;
  _21 = IMAGPART_EXPR <*_3>;
  _5 = a_15(D) + _2;
  _22 = REALPART_EXPR <*_5>;
  _12 = IMAGPART_EXPR <*_5>;
  _7 = b_16(D) + _2;
  _19 = REALPART_EXPR <*_7>;
  _20 = IMAGPART_EXPR <*_7>;
  _25 = _19 * _22;
  _26 = _12 * _20;
  _27 = _20 * _22;
  _28 = _12 * _19;
  _29 = _25 - _26;
  _30 = _27 + _28;
  _31 = _11 + _29;
  _32 = _21 + _30;
  REALPART_EXPR <*_3> = _31;
  IMAGPART_EXPR <*_3> = _32;

So in this case I should replace _31 and _32, right?  But I can't remove the
other statements, otherwise it'll complain later about the missing references.

I could replace _31 and _32 with something using all the variables I would
need.  However, when I tried this previously in vect-patterns there was a block
on built-in functions with more than 4 arguments (and currently 3 is the limit
for built-in functions in the def file as well).  I don't know if that same
limitation is in place if I replace it in SLP.

The complex add basically creates this vector

  b⋅c - e⋅f + l
  b⋅e + c⋅f + n

so I'd need 5 parameters, and then I'm guessing the other expressions would be
removed by DCE at some point?

Kind Regards,
Tamar

>
> Thanks,
> Richard
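
P.S. To make the fma dump easier to follow, here is a minimal C sketch of the scalar computation it corresponds to, as far as I can read it from the statements (c += a * b on complex values).  The function name and the interleaved (real, imag) float layout are assumptions for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): c[i] += a[i] * b[i].
   Each complex number is stored as an interleaved (real, imag) pair,
   so every array holds 2 * n floats.  */
void
complex_fma (const float *a, const float *b, float *c, size_t n)
{
  assert (a && b && c);
  for (size_t i = 0; i < n; i++)
    {
      float a_re = a[2 * i], a_im = a[2 * i + 1];	/* _22, _12 */
      float b_re = b[2 * i], b_im = b[2 * i + 1];	/* _19, _20 */

      float re = b_re * a_re - a_im * b_im;	/* _29 = _25 - _26 */
      float im = b_im * a_re + a_im * b_re;	/* _30 = _27 + _28 */

      c[2 * i] += re;				/* _31 = _11 + _29 */
      c[2 * i + 1] += im;			/* _32 = _21 + _30 */
    }
}
```

Each of the two stored values depends on both parts of a and b plus one part of c, which is why a single replacement call per lane ends up needing so many operands.
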