Sorry for the delay in my response; I was sick last week.
>
> I've been spending this week playing around with various representations
> of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the
> best representation would be to use built-in functions.
>
> One concern in the original discussion was that the optimisers might
> move the original MEM_REFs away from the call. I don't think that's
> a problem though. For loads, we can simply treat the whole of the
> accessed memory as an array, and pass the array by value. If we do that,
> then the call would just look like:
>
>__builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])
>
> (where, despite the C notation, the MEM_REF accesses the whole of elem[N]).
> It is of course possible in principle for the tree optimisers to replace
> this MEM_REF with another, equivalent, one, but that's OK semantically.
> It isn't possible for the optimisers to replace it with something like
> an SSA name, because arrays can't be stored in gimple registers.
>
> __builtin_load_lanes would then be used like this:
>
>combined_vectors = __builtin_load_lanes (...);
>vector1 = ...extract first vector from combined_vectors...
>vector2 = ...extract second vector from combined_vectors...
>
>
This looks good from the vectorizer's point of view.
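To check my understanding of the semantics, here is a scalar C model of the vld2 case. The names (load_lanes_x2, mem_u32x4 and so on) are hypothetical. Because C can't pass a bare array by value, the accessed elem[4] is wrapped in a struct as a stand-in for the MEM_REF argument. The de-interleaving rule is my reading of what vld2 does, not taken from a spec:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for a vld2 of four uint32_t values into two
   2-lane vectors.  Arrays are wrapped in structs so that they can be
   passed and returned by value, standing in for the elem[N] MEM_REF. */
typedef struct { uint32_t elem[4]; } mem_u32x4;    /* accessed memory   */
typedef struct { uint32_t lane[2]; } vec_u32x2;    /* one vector        */
typedef struct { vec_u32x2 val[2]; } vec_u32x2_x2; /* combined_vectors  */

/* The accessed memory is exactly as large as the combined result. */
static_assert (sizeof (mem_u32x4) == sizeof (vec_u32x2_x2),
               "memory and combined vectors have the same size");

/* Scalar model of __builtin_load_lanes for N = 2: memory element i
   goes to lane i / 2 of vector i % 2, i.e. the data is de-interleaved. */
static vec_u32x2_x2 load_lanes_x2 (mem_u32x4 ref)
{
  vec_u32x2_x2 combined;
  for (int i = 0; i < 4; i++)
    combined.val[i % 2].lane[i / 2] = ref.elem[i];
  return combined;
}
```

Extracting vector1 and vector2 then amounts to reading combined.val[0] and combined.val[1]. With the input {1, 2, 3, 4}, vector1 holds {1, 3} and vector2 holds {2, 4}.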
> So combined_vectors only exists for load and extract operations.
> The question then is: what type should it have? (At this point I'm
> just talking about types, not modes.) The main possibilities seemed to be:
>
> 1. an integer type
>
> Pros
>* Gimple registers can store integers.
>
> Cons
>* As Julian points out, GCC doesn't really support integer types
> that are wider than 2 HOST_WIDE_INTs. It would be good to
> remove that restriction, but it might be a lot of work, and it
> isn't something we'd want to take on as part of this project.
>
>* We're not really using the type as an integer.
>
>* The combination of the integer type and the __builtin_load_lanes
> array argument wouldn't be enough to determine the correct
> load operation. __builtin_load_lanes would need something
> like a vector count (N => vldN) argument as well.
>
> 2. a combined vector type
>
> Pros
>* Gimple registers can store vectors.
>
> Cons
>* For vld3, this would mean creating vector types with non-power-
> of-two vectors. GCC doesn't support those yet, and you get
> ICEs as soon as you try to use them. (Remember that this is
> all about types, not modes.)
>
> It _might_ be interesting to implement this support, but as
> above, it would be a lot of work. It also raises some semantic
> questions, such as: what is the alignment of the new vectors?
> Which leads to...
>
>* The alignment of the type would be strange. E.g. suppose
> we're loading N*2 uint32_ts into N vectors of 2 elements each.
> The types and alignments would be:
>
>N=2 uint32x4_t, alignment 16
>N=3 uint32x6_t, alignment 8 (if we follow the convention for modes)
>N=4 uint32x8_t, alignment 32
>
> We don't need alignments greater than 8 in our intended use;
> 16 and 32 are overkill.
>
>* We're not really using the type as a single vector,
> but as a collection of vectors.
>
>* The combination of the vector type and the __builtin_load_lanes
> array argument wouldn't be enough to determine the correct
> load operation. __builtin_load_lanes would need something
> like a vector count (N => vldN) argument as well.
>
> 3. an array of vectors type
>
> Pros
>* No support for new GCC features (large integers or non-power-of-two
> vectors) is needed.
>
>* The alignment of the type would be taken from the alignment of the
> individual vectors, which is correct.
>
>* It accurately reflects how the loaded value is going to be used.
>
>* The type uniquely identifies the correct load operation,
> without need for additional arguments. (This is minor.)
>
> Cons
>* Gimple registers can't store array values.
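To make the alignment argument concrete, here is a compile-time check. It uses GCC's generic vector extension as a stand-in for uint32x2_t so that it builds off-target; v2u32 and the v2u32_xN names are hypothetical. Arrays of vectors keep the 8-byte alignment of a single vector for every N, while their size simply grows by 8 bytes per vector:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for NEON's uint32x2_t via GCC's generic vector extension
   (an assumption; the real type would come from arm_neon.h). */
typedef uint32_t v2u32 __attribute__ ((vector_size (8)));

/* Option 3: arrays of N vectors, one per vldN result. */
typedef v2u32 v2u32_x2[2];
typedef v2u32 v2u32_x3[3];
typedef v2u32 v2u32_x4[4];

/* An array always takes the alignment of its element type, so every
   variant keeps the alignment of a single vector. */
static_assert (_Alignof (v2u32_x2) == _Alignof (v2u32), "N=2 alignment");
static_assert (_Alignof (v2u32_x3) == _Alignof (v2u32), "N=3 alignment");
static_assert (_Alignof (v2u32_x4) == _Alignof (v2u32), "N=4 alignment");

/* Sizes grow linearly with N; no padding is introduced. */
static_assert (sizeof (v2u32_x3) == 3 * sizeof (v2u32), "N=3 size");

/* By contrast, option 2 for N=3 would need something like
     typedef uint32_t v6u32 __attribute__ ((vector_size (24)));
   which GCC rejects, since generic vectors need a power-of-two
   number of elements. */
```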
>
> So I think the only disadvantage of using an array of vectors is that the
> result can never be a gimple register. But that isn't much of a disadvantage
> really; the things we care about are the individual vectors, which can
> of course be treated as gimple registers. I think our tracking of memory
> values is good enough for combined_vectors to be treated as such
> (even though, with the back-end changes we talked about earlier,
> they will actually be stored in RTL registers).
I agree that an array of vectors seems to be the best option here.
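The "type identifies the load" point can be illustrated with a small scalar sketch. The same uint32_t[4] argument and the same 128-bit result size fit both a vld1 of one 4-lane vector and a vld2 of two 2-lane vectors (uint32x4_t in the option 2 table). The two produce different lane layouts, so a 128-bit integer or uint32x4_t result type can't tell them apart. With arrays of vectors, the count is part of the type. load_lanes_flat, v2u32 and VECTOR_COUNT are hypothetical names, and the de-interleaving rule is my reading of vldN:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vldN: element i of the accessed memory
   goes to lane i / n_vectors of vector i % n_vectors.  The result is
   written vector by vector into out. */
static void load_lanes_flat (const uint32_t *src, uint32_t *out,
                             size_t n_vectors, size_t lanes)
{
  assert (n_vectors >= 1 && n_vectors <= 4);
  for (size_t i = 0; i < n_vectors * lanes; i++)
    out[(i % n_vectors) * lanes + i / n_vectors] = src[i];
}

/* Stand-in for NEON's uint32x2_t, using GCC's generic vector
   extension so that the sketch compiles off-target (an assumption). */
typedef uint32_t v2u32 __attribute__ ((vector_size (8)));

/* Option 3 result types: the vector count is part of the type...  */
typedef v2u32 v2u32_x2[2];
typedef v2u32 v2u32_x3[3];
typedef v2u32 v2u32_x4[4];

/* ...so N, and hence vldN, can be recovered from the type alone. */
#define VECTOR_COUNT(T) (sizeof (T) / sizeof (v2u32))
```

With memory {0, 1, 2, 3}, the one-vector load yields {0, 1, 2, 3}, whereas the two-vector load yields {0, 2} and {1, 3}, even though both results occupy 16 bytes.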
>
> So how about the following functions? (Forgive the Pascal-like syntax.)
>
> __builtin_load_lanes (REF : array N*M of X)
>