On Wednesday 16 November 2005 14:35, Dorit Naishlos wrote:
> We're going to commit to autovect-branch vectorization support for
> non-unit-stride accesses.
> We'd like to suggest a few new tree-codes/optabs in order to express the
> extraction and merging of elements from/to vectors.
> Background:
> The new functionality is going to allow us to vectorize computations
> with strides that are a power-of-2, like in the example below, in which the
> real and imaginary parts are interleaved, and therefore each of the
> data-refs accesses data with stride 2:
>
> for (i = 0; i < n; i++) {
>   tmp_re = in[2*i] * coefs[2*i] - in[2*i+1] * coefs[2*i+1];
>   tmp_im = in[2*i] * coefs[2*i+1] + in[2*i+1] * coefs[2*i];
>   out[2*i] = tmp_re;
>   out[2*i+1] = tmp_im;
> }
>
> What is generally going to happen is that, for a VF=4, we're going to:
>
> (1) load this data from memory:
>     vec_in1 = [re0,im0,re1,im1] = vload &in
>     vec_in2 = [re2,im2,re3,im3] = vload &in[VF]
>     (and similarly for the coefs array)
>
> and then, because we're doing different operations on the odd and even
> elements, we need to
> (2) arrange them into separate vectors:
>     vec_in_re = [re0,re1,re2,re3] = extract_even (vec_in1, vec_in2)
>     vec_in_im = [im0,im1,im2,im3] = extract_odd (vec_in1, vec_in2)
>     (and similarly for the coefs array)

Have you considered targets that support interleaved load/store instructions?
I'm not sure if this is supported by existing targets, but in the next year
there will be targets that can perform steps 1+2 in a single load-interleaved
instruction.

Paul