On Wednesday 16 November 2005 14:35, Dorit Naishlos wrote:
> We're going to commit to autovect-branch vectorization support for
> non-unit-stride accesses.
> We'd like to suggest a few new tree-codes/optabs in order to express the
> extraction and merging of elements from/to vectors.

> Background:
>       The new functionality is going to allow us to vectorize computations
> with strides that are a power-of-2, like in the example below, in which the
> real and imaginary parts are interleaved, and therefore each of the
> data-refs accesses data with stride 2:
> 
>   for (i = 0; i < n; i++) {
>      tmp_re = in[2*i] * coefs[2*i] - in[2*i+1] * coefs[2*i+1];
>      tmp_im = in[2*i] * coefs[2*i+1] + in[2*i+1] * coefs[2*i];
>      out[2*i] = tmp_re;
>      out[2*i+1] = tmp_im;
>   }
> 
> In general, for a vectorization factor (VF) of 4, we're going to:
> 
> (1) load this data from memory:
>       vec_in1 = [re0,im0,re1,im1] = vload &in
>       vec_in2 = [re2,im2,re3,im3] = vload &in[VF]
>       (and similarly for the coefs array)
> 
> and then, because we're doing different operations on the odd and even
> elements, we need to
> (2) arrange them into separate vectors:
>       vec_in_re = [re0,re1,re2,re3] = extract_even (vec_in1, vec_in2)
>       vec_in_im = [im0,im1,im2,im3] = extract_odd (vec_in1, vec_in2)
>       (and similarly for the coefs array)

Have you considered targets that support interleaved load/store instructions?
I'm not sure whether any existing targets support this, but within the next
year there will be targets that can perform steps (1) and (2) with a single
interleaved-load instruction.

Paul
