------- Comment #9 from rguenther at suse dot de 2009-07-20 12:55 -------
Subject: Re: Vectorization of complex types, vectorization of sincos missing

On Mon, 20 Jul 2009, irar at il dot ibm dot com wrote:

> ------- Comment #7 from irar at il dot ibm dot com 2009-07-20 11:18 -------
> AFAIU, querying for the component type of complex type is not difficult to
> implement.
> I think, that loop-based vectorization is preferable here, so we should stay
> with vectorization factor of 2 for doubles.
>
> The next problem is to vectorize
> D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>;
> and
> D.1612_6 = REALPART_EXPR <sincostmp.1_1>;
>
> Currently, we support only loads and stores with IMAGPART/REALPART_EXPR,
> vectorizing them as strided accesses, with extract odd and even operations for
> loads. So, we will have to support interleaving of non-memory variables.
>
> Does __builtin_cexpi have a vector implementation? If so, does it return two
> vectors?

No, currently cexpi doesn't have a vectorized version. We could add
an internal builtin for that that takes a vector as argument and returns
a vector with complex components. And lower this during expansion
to a suitable available form (eventually just two calls).

> If not, I guess, we need something like:
>
> sincostmp.1 = __builtin_cexpi (xd[i]);
> sincostmp.2 = __builtin_cexpi (xd[i+1]);
> v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2);
> v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2);
> sf[i:i+1] = v1;
> cf[i:i+1] = v2;
> i = i + 2;

Yes, that was my initial idea.

> Or we can use the two vectors from vectorized __builtin_cexpi as parameters of
> extract operations.
> Does that make sense?

Yes, I think so. With a vectorized builtin we'd have

  v0 = xd[i:i+1];
  sincostmp.1 = __builtin_vect_cexpi (v0);
  v1 = VEC_EXTRACT_EVEN (sincostmp.1[0], sincostmp.1[1]);
  v2 = VEC_EXTRACT_ODD (sincostmp.1[0], sincostmp.1[1]);
  sf[i:i+1] = v1;
  cf[i:i+1] = v2;
  i = i + 2;

where sincostmp.1[0] would select the lower half of a V4DF and
sincostmp.1[1] the upper half of a V4DF. But that's probably
more difficult as we'd have both V2DF and V4DF in the IL.

Richard.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770