Re: Representing interleaving and lane load/stores at the tree level

2011-03-06 Thread Ira Rosen



Sorry for the delay in my response; I was sick last week.

>
> I've been spending this week playing around with various representations
> of the v{ld,st}{1,2,3,4}{,_lane} operations.  I agree with Ira that the
> best representation would be to use built-in functions.
>
> One concern in the original discussion was that the optimisers might
> move the original MEM_REFs away from the call.  I don't think that's
> a problem though.  For loads, we can simply treat the whole of the
> accessed memory as an array, and pass the array by value.  If we do that,
> then the call would just look like:
>
>     __builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])
>
> (where, despite the C notation, the MEM_REF accesses the whole of
> elem[N]).
> It is of course possible in principle for the tree optimisers to replace
> this MEM_REF with another, equivalent, one, but that's OK semantically.
> It isn't possible for the optimisers to replace it with something like
> an SSA name, because arrays can't be stored in gimple registers.
>
> __builtin_load_lanes would then be used like this:
>
>     combined_vectors = __builtin_load_lanes (...);
>     vector1 = ...extract first vector from combined_vectors...
>     vector2 = ...extract second vector from combined_vectors...
>
>

This looks good from the vectorizer point of view.
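
To make the quoted usage concrete, here is a minimal C model of the load-and-extract pattern for the vld2 case. It is only a sketch of the intended semantics, assuming the usual vldN de-interleaving (memory element i goes to vector i % N); the names load_lanes_model, vec_t and combined_t are hypothetical and are not part of any proposed GCC interface. The array of vectors is wrapped in a struct only because plain C cannot return arrays by value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: models a vld2 of two 2-lane vectors. */
enum { NVEC = 2, LANES = 2 };

/* One "vector" of LANES elements. */
typedef struct { uint32_t lane[LANES]; } vec_t;

/* The "array of vectors" result, wrapped so it can be returned by value. */
typedef struct { vec_t v[NVEC]; } combined_t;

/* Model of a lane load: memory element i lands in vector i % NVEC,
   lane i / NVEC, i.e. the interleaved input is de-interleaved. */
static combined_t load_lanes_model(const uint32_t mem[NVEC * LANES])
{
    combined_t out;
    assert(mem != NULL);
    for (size_t i = 0; i < (size_t)(NVEC * LANES); i++)
        out.v[i % NVEC].lane[i / NVEC] = mem[i];
    return out;
}
```

In this model, "extract first vector" is simply out.v[0] and "extract second vector" is out.v[1]; only those individual vectors would need to live in gimple registers.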

> So combined_vectors only exists for load and extract operations.
> The question then is: what type should it have?  (At this point I'm
> just talking about types, not modes.)  The main possibilities seemed to
> be:
>
> 1. an integer type
>
>      Pros
>        * Gimple registers can store integers.
>
>      Cons
>        * As Julian points out, GCC doesn't really support integer types
>          that are wider than 2 HOST_WIDE_INTs.  It would be good to
>          remove that restriction, but it might be a lot of work, and it
>          isn't something we'd want to take on as part of this project.
>
>        * We're not really using the type as an integer.
>
>        * The combination of the integer type and the __builtin_load_lanes
>          array argument wouldn't be enough to determine the correct
>          load operation.  __builtin_load_lanes would need something
>          like a vector count (N => vldN) argument as well.
>
> 2. a combined vector type
>
>      Pros
>        * Gimple registers can store vectors.
>
>      Cons
>        * For vld3, this would mean creating vector types with non-power-
>          of-two vectors.  GCC doesn't support those yet, and you get
>          ICEs as soon as you try to use them.  (Remember that this is
>          all about types, not modes.)
>
>          It _might_ be interesting to implement this support, but as
>          above, it would be a lot of work.  It also raises some semantic
>          questions, such as: what is the alignment of the new vectors?
>          Which leads to...
>
>        * The alignment of the type would be strange.  E.g. suppose
>          we're loading N*2 uint32_ts into N vectors of 2 elements each.
>          The types and alignments would be:
>
>            N=2 uint32x4_t, alignment 16
>            N=3 uint32x6_t, alignment 8 (if we follow the convention for modes)
>            N=4 uint32x8_t, alignment 32
>
>          We don't need alignments greater than 8 in our intended use;
>          16 and 32 are overkill.
>
>        * We're not really using the type as a single vector,
>          but as a collection of vectors.
>
>        * The combination of the vector type and the __builtin_load_lanes
>          array argument wouldn't be enough to determine the correct
>          load operation.  __builtin_load_lanes would need something
>          like a vector count (N => vldN) argument as well.
>
> 3. an array of vectors type
>
>      Pros
>        * No support for new GCC features (large integers or
>          non-power-of-two vectors) is needed.
>
>        * The alignment of the type would be taken from the alignment of
>          the individual vectors, which is correct.
>
>        * It accurately reflects how the loaded value is going to be used.
>
>        * The type uniquely identifies the correct load operation,
>          without need for additional arguments.  (This is minor.)
>
>      Cons
>        * Gimple registers can't store array values.
>
> So I think the only disadvantage of using an array of vectors is that the
> result can never be a gimple register.  But that isn't much of a
> disadvantage really; the things we care about are the individual
> vectors, which can
> of course be treated as gimple registers.  I think our tracking of memory
> values is good enough for combined_vectors to be treated as such
> (even though, with the back-end changes we talked about earlier,
> they will actually be stored in RTL registers).

I agree that an array of vectors seems to be the best option here.
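
The alignment point for option 3 can be checked with a small C sketch. It assumes the GCC/Clang vector_size extension and uses hypothetical type names (u32x2, u32x2x3) rather than anything from the proposal. It relies only on the C rule that an array type has the alignment of its element type, so it makes no claim about the absolute alignment on a given target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang vector extension: a vector of two uint32_t (8 bytes). */
typedef uint32_t u32x2 __attribute__((vector_size(8)));

/* Option 3 for N=3: an array of three such vectors. */
typedef u32x2 u32x2x3[3];

/* Alignment of the array-of-vectors type. */
static size_t array_of_vectors_align(void) { return _Alignof(u32x2x3); }

/* Alignment of one individual vector. */
static size_t single_vector_align(void) { return _Alignof(u32x2); }

/* Total size of the array-of-vectors type. */
static size_t array_of_vectors_size(void) { return sizeof(u32x2x3); }
```

The array type inherits the alignment of a single vector and is exactly three vectors in size, so the N=3 case needs neither a non-power-of-two vector type nor an alignment larger than that of one vector.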


>
> So how about the following functions?  (Forgive the pascally syntax.)
>
> __builtin_load_lanes (REF : array N*M of X)
>  

[ACTIVITY] Feb.28 -- Mar.06

2011-03-06 Thread Chung-Lin Tang
Last week:
* Launchpad #711819 / PR47719: ARM minipool ICE. Followed up on
discussion with Bernd and Ramana. Later posted discussion results on
gcc-patches, where Richard Earnshaw took it over with a final fix.

* Coremark ARMv7/v6 regressions: mostly pinpointed the exact cases where
RTL simplification fails to optimize away ZERO_EXTEND expressions. Still
working on how to enhance it.

* TW Public Holiday on Feb.28 (Mon), was off for one day.

This week:
* Try to turn Coremark regression investigation into code form.
* Other GCC issues.

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain