------- Comment #9 from rguenther at suse dot de  2009-07-20 12:55 -------
Subject: Re:  Vectorization of complex types,
 vectorization of sincos missing

On Mon, 20 Jul 2009, irar at il dot ibm dot com wrote:

> 
> 
> ------- Comment #7 from irar at il dot ibm dot com  2009-07-20 11:18 -------
> AFAIU, querying for the component type of a complex type is not difficult to
> implement. 
> I think loop-based vectorization is preferable here, so we should stay
> with a vectorization factor of 2 for doubles.
> 
> The next problem is to vectorize 
>   D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>;
> and
>   D.1612_6 = REALPART_EXPR <sincostmp.1_1>;
> 
> Currently, we support only loads and stores with IMAGPART/REALPART_EXPR,
> vectorizing them as strided accesses, with extract odd and even operations for
> loads. So, we will have to support interleaving of non-memory variables. 
> 
> Does __builtin_cexpi have a vector implementation? If so, does it return two
> vectors?

No, currently cexpi doesn't have a vectorized version.  We could add
an internal builtin for it that takes a vector as argument and
returns a vector with complex components, and lower this during expansion
to a suitable available form (possibly just two scalar calls).

> If not, I guess, we need something like:
> 
>   sincostmp.1 = __builtin_cexpi (xd[i]);
>   sincostmp.2 = __builtin_cexpi (xd[i+1]);
>   v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2);
>   v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2);
>   sf[i:i+1] = v1;
>   cf[i:i+1] = v2;
>   i = i + 2;

Yes, that was my initial idea.
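
To make the lane selection concrete, here is a rough C sketch of what the
two extract operations would compute once the scalar cexpi results are
available.  The function name and the array interface are made up for
illustration and are not GCC internals.  It assumes the results are viewed
in the usual (real, imag) order, so the even lanes are the real parts
(cosines) and the odd lanes the imaginary parts (sines).

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop body after unrolling by two.
   z[] holds the scalar cexpi results, one per input element.  */
static void store_even_odd(const double complex *z, size_t n,
                           double *even_out, double *odd_out)
{
    assert(n % 2 == 0);   /* vectorization factor of 2 */
    for (size_t i = 0; i < n; i += 2) {
        /* Interleaved lanes of the pair:
           { re(z[i]), im(z[i]), re(z[i+1]), im(z[i+1]) }  */
        even_out[i]     = creal(z[i]);      /* lane 0 */
        even_out[i + 1] = creal(z[i + 1]);  /* lane 2 */
        odd_out[i]      = cimag(z[i]);      /* lane 1 */
        odd_out[i + 1]  = cimag(z[i + 1]);  /* lane 3 */
    }
}
```

Each iteration corresponds to one pair of extract operations followed by
the two vector stores.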

> Or we can use the two vectors from vectorized __builtin_cexpi as parameters of
> extract operations.
> Does that make sense?

Yes, I think so.  With a vectorized builtin we'd have

  v0 = xd[i:i+1];
  sincostmp.1 = __builtin_vect_cexpi (v0);
  v1 = VEC_EXTRACT_EVEN (sincostmp.1[0], sincostmp.1[1]);
  v2 = VEC_EXTRACT_ODD (sincostmp.1[0], sincostmp.1[1]);
  sf[i:i+1] = v1;
  cf[i:i+1] = v2;
  i = i + 2;

where sincostmp.1[0] would select the lower half of a V4DF and
sincostmp.1[1] the upper half of a V4DF.  But that's probably
more difficult as we'd have both V2DF and V4DF in the IL.
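
As a sketch of the half selection, the following C models the V4DF result
as a plain array of four doubles laid out as { re0, im0, re1, im1 }.  That
layout, and the helper names extract_lanes and split_v4df, are assumptions
for illustration only; nothing like this exists in GCC under these names.

```c
#include <assert.h>
#include <stddef.h>

/* Even (odd == 0) or odd (odd != 0) lanes of the concatenation of two
   n-lane vectors lo and hi; out receives n lanes.  */
static void extract_lanes(const double *lo, const double *hi, size_t n,
                          int odd, double *out)
{
    assert(lo != NULL && hi != NULL && out != NULL);
    for (size_t k = 0; k < n; k++) {
        size_t src = 2 * k + (odd ? 1 : 0);   /* index into concat(lo, hi) */
        out[k] = src < n ? lo[src] : hi[src - n];
    }
}

/* Split one 4-lane result into its two 2-lane halves and deinterleave.  */
static void split_v4df(const double v4[4], double v1[2], double v2[2])
{
    const double *lo = &v4[0];   /* plays the role of sincostmp.1[0] */
    const double *hi = &v4[2];   /* plays the role of sincostmp.1[1] */
    extract_lanes(lo, hi, 2, 0, v1);   /* even lanes: re0, re1 */
    extract_lanes(lo, hi, 2, 1, v2);   /* odd lanes:  im0, im1 */
}
```

The two halves are exactly the V2DF operands the extract operations would
need, which is where the mix of V2DF and V4DF comes from.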

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770
