Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> I am trying to add support to the auto-vectorizer for complex operations where
> a target has instructions for.
>
> The instructions I have are only available as vector instructions. The
> operations are complex addition with a rotation or complex fmla with a
> rotation for half floats, floats and doubles.
>
> They expect the complex number to be broken down and stored in vectors as
> real/img parts. GCC already does this first part when it lowers complex
> numbers very early on in tree, so that's good.
>
> As a simple example, I am trying to get GCC to emit an internal function
> .FCOMPLEX_ADD_ROT_90 (Complex addition with a 90* rotation)
> when the target supports it.
>
> my C example is:
>
> void f90 (double complex a[N], double complex b[N], double complex c[N])
> {
>   for (int i=0; i < N; i++)
>     c[i] = a[i] + b[i] * I;
> }
>
> Which in tree looks like
>
>   _3 = a_15(D) + _2;
>   _12 = REALPART_EXPR <*_3>;
>   _22 = IMAGPART_EXPR <*_3>;
>   _5 = b_16(D) + _2;
>   _6 = IMAGPART_EXPR <*_5>;
>   _8 = REALPART_EXPR <*_5>;
>   _10 = c_17(D) + _2;
>   _4 = _12 - _6;
>   _13 = _8 + _22;
>   REALPART_EXPR <*_10> = _4;
>   IMAGPART_EXPR <*_10> = _13;
>
> [...]
>
> 3) So I abandoned vec-patterns and instead tried to do it in
>    tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree
>    is created. Matching the SLP tree is quite simple and getting it to
>    emit the right SLP tree was simple enough,except that at this point
>    all data references and loads have already been calculated.

(3) seems like the way to go.  Can you explain in more detail why it
didn't work?  The SLP tree after matching should look something like
this:

  REALPART_EXPR <*_10> = _4;
  IMAGPART_EXPR <*_10> = _13;

  _4 = .COMPLEX_ADD_ROT_90 (_12, _8)
  _13 = .COMPLEX_ADD_ROT_90 (_22, _6)

  _12 = REALPART_EXPR <*_3>;
  _22 = IMAGPART_EXPR <*_3>;

  _8 = REALPART_EXPR <*_5>;
  _6 = IMAGPART_EXPR <*_5>;

The operands to the individual .COMPLEX_ADD_ROT_90s aren't the
operands that actually determine the associated scalar result, but
that's bound to be the case with something that includes an internal
permute.  All we're trying to describe is an operation that does the
right thing when vectorised.

If you didn't have the .COMPLEX_ADD_ROT_90 and just fell back on mixed
two-operator SLP, the final node would be in the opposite order:

  _6 = IMAGPART_EXPR <*_5>;
  _8 = REALPART_EXPR <*_5>;

So if you're doing the matching after building the initial tree, you'd
need to swap the statements in that node so that _8 comes first and
cancel the associated load permute.  If you're doing the matching on
the fly while building the SLP tree then the subnodes should start out
in the right order.

Thanks,
Richard
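
To make the point about operands and the internal permute concrete, here
is a minimal C sketch.  It assumes the vector operation works on
interleaved (re, im) lanes and computes re = a.re - b.im and
im = a.im + b.re, which is what the tree dump above implies
(_4 = _12 - _6, _13 = _8 + _22).  The function names are hypothetical;
this models the arithmetic only and is not GCC code.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical scalar model of a complex add with 90-degree rotation.
   a, b and out hold n_complex values as interleaved (re, im) pairs.
   Assumed semantics: out.re = a.re - b.im, out.im = a.im + b.re.
   Note that lane 2*i reads b[2*i + 1] and lane 2*i + 1 reads b[2*i]:
   the swap of b's halves is the "internal permute".  */
static void
complex_add_rot90_model (const double *a, const double *b,
                         double *out, size_t n_complex)
{
  assert (a && b && out);
  for (size_t i = 0; i < n_complex; i++)
    {
      out[2 * i] = a[2 * i] - b[2 * i + 1];
      out[2 * i + 1] = a[2 * i + 1] + b[2 * i];
    }
}

/* Return 1 if the model agrees with the scalar loop body
   c[i] = a[i] + b[i] * I from f90, and 0 otherwise.  */
static int
rot90_model_matches_reference (void)
{
  enum { N = 4 };
  double complex a[N], b[N], expect[N], got[N];

  for (int i = 0; i < N; i++)
    {
      a[i] = (i + 1.0) + (2.0 * i) * I;
      b[i] = (0.5 * i) - (3.0 + i) * I;
      expect[i] = a[i] + b[i] * I;
    }

  /* A double complex has the same layout as double[2] (re, then im),
     so the arrays can be viewed as interleaved doubles.  */
  complex_add_rot90_model ((const double *) a, (const double *) b,
                           (double *) got, N);

  for (int i = 0; i < N; i++)
    if (creal (got[i]) != creal (expect[i])
        || cimag (got[i]) != cimag (expect[i]))
      return 0;
  return 1;
}
```

In this model the operands are passed in their natural load order
(re first, then im, for both a and b), which matches the node order
_8, _6 with no load permute left on the b side.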