Sorry for the delay in my response; I was sick last week.
>
> I've been spending this week playing around with various representations
> of the v{ld,st}{1,2,3,4}{,_lane} operations. I agree with Ira that the
> best representation would be to use built-in functions.
>
> One concern in the original discussion was that the optimisers might
> move the original MEM_REFs away from the call. I don't think that's
> a problem though. For loads, we can simply treat the whole of the
> accessed memory as an array, and pass the array by value. If we do that,
> then the call would just look like:
>
> __builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])
>
> (where, despite the C notation, the MEM_REF accesses the whole of
> elem[N]).
> It is of course possible in principle for the tree optimisers to replace
> this MEM_REF with another, equivalent, one, but that's OK semantically.
> It isn't possible for the optimisers to replace it with something like
> an SSA name, because arrays can't be stored in gimple registers.
>
> __builtin_load_lanes would then be used like this:
>
> combined_vectors = __builtin_load_lanes (...);
> vector1 = ...extract first vector from combined_vectors...
> vector2 = ...extract second vector from combined_vectors...
> ....
>
This looks good from the vectorizer's point of view.
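
For reference, a minimal plain-C sketch of the load-then-extract pattern quoted above, assuming the usual de-interleaving behaviour of vldN (here N = 3 vectors of M = 2 uint32_t lanes). The names model_vec, model_vec_array and model_load_lanes are hypothetical stand-ins for illustration only; they are not the proposed builtin or any existing GCC interface.

```c
#include <stdint.h>

/* Assumed shape for illustration: vld3 on vectors of 2 x uint32_t. */
enum { N = 3, M = 2 };

/* Hypothetical stand-ins for "vector M of X" and "array N of vector M of X". */
typedef struct { uint32_t lane[M]; } model_vec;
typedef struct { model_vec val[N]; } model_vec_array;

/* Model of a de-interleaving load: memory element j*N + i
   becomes lane j of vector i. */
static model_vec_array model_load_lanes(const uint32_t mem[N * M])
{
    model_vec_array r;
    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            r.val[i].lane[j] = mem[j * N + i];
    return r;
}
```

In this model, "extract first vector from combined_vectors" is simply r.val[0], which is the only way the combined value is ever consumed.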
> So combined_vectors only exists for load and extract operations.
> The question then is: what type should it have? (At this point I'm
> just talking about types, not modes.) The main possibilities seemed to
> be:
>
> 1. an integer type
>
> Pros
> * Gimple registers can store integers.
>
> Cons
> * As Julian points out, GCC doesn't really support integer types
> that are wider than 2 HOST_WIDE_INTs. It would be good to
> remove that restriction, but it might be a lot of work, and it
> isn't something we'd want to take on as part of this project.
>
> * We're not really using the type as an integer.
>
> * The combination of the integer type and the __builtin_load_lanes
> array argument wouldn't be enough to determine the correct
> load operation. __builtin_load_lanes would need something
> like a vector count (N => vldN) argument as well.
>
> 2. a combined vector type
>
> Pros
> * Gimple registers can store vectors.
>
> Cons
> * For vld3, this would mean creating vector types with non-power-
> of-two vectors. GCC doesn't support those yet, and you get
> ICEs as soon as you try to use them. (Remember that this is
> all about types, not modes.)
>
> It _might_ be interesting to implement this support, but as
> above, it would be a lot of work. It also raises some semantic
> questions, such as: what is the alignment of the new vectors?
> Which leads to...
>
> * The alignment of the type would be strange. E.g. suppose
> we're loading N*2 uint32_ts into N vectors of 2 elements each.
> The types and alignments would be:
>
> N=2 uint32x4_t, alignment 16
> N=3 uint32x6_t, alignment 8 (if we follow the convention for
> modes)
> N=4 uint32x8_t, alignment 32
>
> We don't need alignments greater than 8 in our intended use;
> 16 and 32 are overkill.
>
> * We're not really using the type as a single vector,
> but as a collection of vectors.
>
> * The combination of the vector type and the __builtin_load_lanes
> array argument wouldn't be enough to determine the correct
> load operation. __builtin_load_lanes would need something
> like a vector count (N => vldN) argument as well.
>
> 3. an array of vectors type
>
> Pros
> * No support for new GCC features (large integers or
> non-power-of-two vectors) is needed.
>
> * The alignment of the type would be taken from the alignment of
> the individual vectors, which is correct.
>
> * It accurately reflects how the loaded value is going to be used.
>
> * The type uniquely identifies the correct load operation,
> without need for additional arguments. (This is minor.)
>
> Cons
> * Gimple registers can't store array values.
>
> So I think the only disadvantage of using an array of vectors is that the
> result can never be a gimple register. But that isn't much of a disadvantage
> really; the things we care about are the individual vectors, which can
> of course be treated as gimple registers. I think our tracking of memory
> values is good enough for combined_vectors to be treated as such
> (even though, with the back-end changes we talked about earlier,
> they will actually be stored in RTL registers).
I agree that an array of vectors seems to be the best option here.
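
The alignment point can be checked from user code with GCC's vector extension. This is only a sketch, assuming a 64-bit vector of two uint32_t elements and N = 3; the typedef and function names are made up for the example. In C11, _Alignof applied to an array type yields the alignment of the element type, so the array of vectors inherits the alignment of a single vector.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC vector extension: a 64-bit vector of two uint32_t elements. */
typedef uint32_t model_u32x2 __attribute__((vector_size(8)));

/* Option 3 from the list above: "array N of vector M of X", with N = 3. */
typedef model_u32x2 model_u32x2x3[3];

/* Alignment of one vector. */
static size_t vector_alignment(void) { return _Alignof(model_u32x2); }

/* Alignment of the array of vectors: same as its element type. */
static size_t array_alignment(void) { return _Alignof(model_u32x2x3); }

/* Size of the array: exactly three vectors, with no padding. */
static size_t array_size(void) { return sizeof(model_u32x2x3); }
```

No six-element vector type is needed to describe the vld3 result here, which is the main attraction of this option.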
>
> So how about the following functions? (Forgive the pascally syntax.)
>
> __builtin_load_lanes (REF : array N*M of X)
> returns array N of vector M of X
> maps to vldN
> in practice, the result would be used in assignments of the form:
> vectorX = ARRAY_REF <result, X>
>
> __builtin_store_lanes (VECTORS : array N of vector M of X)
> returns array N*M of X
> maps to vstN
> in practice, the argument would be populated by assignments of the
> form:
> vectorX = ARRAY_REF <result, X>
>
> __builtin_load_lane (REF : array N of X,
> VECTORS : array N of vector M of X,
> LANE : integer)
> returns array N of vector M of X
> maps to vldN_lane
>
> __builtin_store_lane (VECTORS : array N of vector M of X,
> LANE : integer)
> returns array N of X
> maps to vstN_lane
>
How do you distinguish between "multiple structures" and "single structure
to all lanes"?
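
To make the question concrete, here is a plain-C sketch of the two behaviours I mean, assuming N = 2 vectors of 4 uint32_t lanes. model_load_lane follows the quoted __builtin_load_lane signature; model_load_dup is a hypothetical name for the "single structure to all lanes" form (vldN_dup in the NEON intrinsics), which has no counterpart in the list above.

```c
#include <stdint.h>

/* Assumed shape for illustration: 2 vectors of 4 x uint32_t. */
enum { NV = 2, NL = 4 };

/* Hypothetical stand-ins for a vector and an array of vectors. */
typedef struct { uint32_t lane[NL]; } lane_vec;
typedef struct { lane_vec val[NV]; } lane_vec_array;

/* "Single structure to one lane" (vldN_lane): element i of memory
   replaces one chosen lane of vector i; other lanes are preserved. */
static lane_vec_array model_load_lane(const uint32_t mem[NV],
                                      lane_vec_array vectors, int lane)
{
    for (int i = 0; i < NV; i++)
        vectors.val[i].lane[lane] = mem[i];
    return vectors;
}

/* "Single structure to all lanes" (vldN_dup): element i of memory
   is replicated into every lane of vector i. */
static lane_vec_array model_load_dup(const uint32_t mem[NV])
{
    lane_vec_array r;
    for (int i = 0; i < NV; i++)
        for (int j = 0; j < NL; j++)
            r.val[i].lane[j] = mem[i];
    return r;
}
```

Both forms read the same N elements from memory, so the array argument alone does not tell them apart.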
> Note that each operation can be expanded independently. The expansion
> doesn't rely on preceding or following statements.
>
> I've hacked up the prototype below as a proof of concept. It includes
> changes to the C parser to allow these functions to be created in the
> original source code. This is throw-away code though; it would never
> be submitted.
>
> I've also included a simple test case and the output I get from it.
> The output looks pretty good; there's not even the stray VMOV that
> I saw with the intrinsics earlier in the week.
>
> (Note that if you'd like to try this yourself, you'll need the patch
> I posted on Monday as well.)
>
> What do you think? Obviously this discussion needs to move to gcc@ at
> some point, but I wanted to make sure this was vaguely sane first.
Good idea.
Ira
>
> Richard
>
> [attachment "lane-functions.patch" deleted by Ira Rosen/Haifa/IBM]
> [attachment "test.c" deleted by Ira Rosen/Haifa/IBM] [attachment
> "test.s" deleted by Ira Rosen/Haifa/IBM]
> _______________________________________________
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-toolchain