Hi All,

I am trying to add support to the auto-vectorizer for complex operations for which a target has dedicated instructions.
The instructions I have are only available as vector instructions. The operations are complex addition with a rotation and complex FMLA with a rotation, for half floats, floats and doubles. They expect the complex numbers to be broken down and stored in vectors as real/imaginary parts. GCC already does this first part when it lowers complex numbers very early on in tree, so that's good.

As a simple example, I am trying to get GCC to emit an internal function .FCOMPLEX_ADD_ROT_90 (complex addition with a 90-degree rotation) when the target supports it. My C example is:

    void f90 (double complex a[N], double complex b[N], double complex c[N])
    {
      for (int i=0; i < N; i++)
        c[i] = a[i] + b[i] * I;
    }

which in tree, after some rewriting from match.pd, looks like:

    _3 = a_15(D) + _2;
    _12 = REALPART_EXPR <*_3>;
    _22 = IMAGPART_EXPR <*_3>;
    _5 = b_16(D) + _2;
    _6 = IMAGPART_EXPR <*_5>;
    _8 = REALPART_EXPR <*_5>;
    _10 = c_17(D) + _2;
    _4 = _12 - _6;
    _13 = _8 + _22;
    REALPART_EXPR <*_10> = _4;
    IMAGPART_EXPR <*_10> = _13;

What I'm after is for it to get rewritten as something like:

    _3 = a_15(D) + _2;
    _5 = b_16(D) + _2;
    _10 = c_17(D) + _2;
    *_10 = .FCOMPLEX_ADD_ROT_90 (*_5, *_3)

1) My first attempt to do this was in tree-vect-patterns.c, as just another vectorizer pattern. The first problem is that I need to match a pair of statements

    REALPART_EXPR <*_10> = _4;
    IMAGPART_EXPR <*_10> = _13;

and not just a single one. This I can solve by getting the gsi for the statement being inspected and walking back up the tree to find the other statement of the pair. This works, but I am then blocked because the vectorizer (quite reasonably) doesn't know what to do when the statement is already a vector stmt. So it bails out and rejects the pattern substitution.

2) I thought about introducing two internal functions that would be treated as a pair to match against later, but I am not sure this would work.
The problem with generating the two internal functions, or with doing the whole matching in combine (the vectorizer will always vectorize this, so I could match the add and sub in a pattern later), is that I need to prevent the vectorizer from treating the data as a compound structure and have it treat it as a normal vector instead. In AArch64 terms, I want to stop it from using ld2 (load multiple 2-element structures) and have it use ld1 loads (load multiple single-element structures) instead. In certain cases (rotations) it also thinks it has a permute and inserts a rotate in there, which is also not desired.

3) So I abandoned the vectorizer-pattern approach and instead tried to do it in tree-vect-slp.c, in vect_analyze_slp_instance, just after the SLP tree is created. Matching the SLP tree is quite simple, and getting it to emit the right SLP tree was simple enough, except that at this point all data references and loads have already been calculated. That left me with the very painful process of removing the loads and reconstructing all this information by hand. I kept hitting more and more things I needed to recreate manually, which does not feel like the right approach. If I just add a new stmt and leave the existing ones in place, the new stmt ends up being silently ignored. My guess is that this is because the statement has no data reference to anything.

Any suggestions on what the right approach would be, and one that would be acceptable for upstreaming?

Thanks,
Tamar
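
For reference, the scalar semantics that the example above implies for .FCOMPLEX_ADD_ROT_90 can be sketched in plain C. This is only an illustration and not GCC code: the names f90_ref and f90_lowered and the value N = 4 are assumptions made for the sketch. The first function is the original loop, the second mirrors the lowered real/imaginary arithmetic from the tree dump (_4 = _12 - _6 and _13 = _8 + _22).

```c
#include <complex.h>

#define N 4  /* Assumed array length, chosen only for this sketch.  */

/* Reference form: the loop from the example, using C99 complex
   arithmetic.  Multiplying by I rotates b[i] by 90 degrees.  */
void
f90_ref (const double complex a[N], const double complex b[N],
         double complex c[N])
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i] * I;
}

/* Lowered form: the same operation written on the separate real and
   imaginary parts, as in the tree dump.  */
void
f90_lowered (const double complex a[N], const double complex b[N],
             double complex c[N])
{
  for (int i = 0; i < N; i++)
    {
      double re = creal (a[i]) - cimag (b[i]);  /* _4  = _12 - _6   */
      double im = creal (b[i]) + cimag (a[i]);  /* _13 = _8  + _22  */
      c[i] = re + im * I;
    }
}
```

Both functions should produce the same results for finite inputs, so the pair of statements that a pattern needs to recognise is the subtract feeding the real part and the add feeding the imaginary part of the same store.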