Re: Representing interleaving and lane load/stores at the tree level

Ira Rosen Sun, 06 Mar 2011 01:20:39 -0800

Sorry for the delay in my response; I was sick last week.

>
> I've been spending this week playing around with various representations
> of the v{ld,st}{1,2,3,4}{,_lane} operations.  I agree with Ira that the
> best representation would be to use built-in functions.
>
> One concern in the original discussion was that the optimisers might
> move the original MEM_REFs away from the call.  I don't think that's
> a problem though.  For loads, we can simply treat the whole of the
> accessed memory as an array, and pass the array by value.  If we do that,
> then the call would just look like:
>
>    __builtin_load_lanes (MEM_REF[(elem[N] *)ADDR])
>
> (where, despite the C notation, the MEM_REF accesses the whole of
> elem[N]).
> It is of course possible in principle for the tree optimisers to replace
> this MEM_REF with another, equivalent, one, but that's OK semantically.
> It isn't possible for the optimisers to replace it with something like
> an SSA name, because arrays can't be stored in gimple registers.
>
> __builtin_load_lanes would then be used like this:
>
>    combined_vectors = __builtin_load_lanes (...);
>    vector1 = ...extract first vector from combined_vectors...
>    vector2 = ...extract second vector from combined_vectors...
>    ....
>

This looks good from the vectorizer point of view.

> So combined_vectors only exists for load and extract operations.
> The question then is: what type should it have?  (At this point I'm
> just talking about types, not modes.)  The main possibilities seemed
> to be:
>
> 1. an integer type
>
>      Pros
>        * Gimple registers can store integers.
>
>      Cons
>        * As Julian points out, GCC doesn't really support integer types
>          that are wider than 2 HOST_WIDE_INTs.  It would be good to
>          remove that restriction, but it might be a lot of work, and it
>          isn't something we'd want to take on as part of this project.
>
>        * We're not really using the type as an integer.
>
>        * The combination of the integer type and the __builtin_load_lanes
>          array argument wouldn't be enough to determine the correct
>          load operation.  __builtin_load_lanes would need something
>          like a vector count (N => vldN) argument as well.
>
> 2. a combined vector type
>
>      Pros
>        * Gimple registers can store vectors.
>
>      Cons
>        * For vld3, this would mean creating vector types with non-power-
>          of-two vectors.  GCC doesn't support those yet, and you get
>          ICEs as soon as you try to use them.  (Remember that this is
>          all about types, not modes.)
>
>          It _might_ be interesting to implement this support, but as
>          above, it would be a lot of work.  It also raises some semantic
>          questions, such as: what is the alignment of the new vectors?
>          Which leads to...
>
>        * The alignment of the type would be strange.  E.g. suppose
>          we're loading N*2 uint32_ts into N vectors of 2 elements each.
>          The types and alignments would be:
>
>            N=2 uint32x4_t, alignment 16
>            N=3 uint32x6_t, alignment 8 (if we follow the convention for modes)
>            N=4 uint32x8_t, alignment 32
>
>          We don't need alignments greater than 8 in our intended use;
>          16 and 32 are overkill.
>
>        * We're not really using the type as a single vector,
>          but as a collection of vectors.
>
>        * The combination of the vector type and the __builtin_load_lanes
>          array argument wouldn't be enough to determine the correct
>          load operation.  __builtin_load_lanes would need something
>          like a vector count (N => vldN) argument as well.
>
> 3. an array of vectors type
>
>      Pros
>        * No support for new GCC features (large integers or
>          non-power-of-two vectors) is needed.
>
>        * The alignment of the type would be taken from the alignment of
>          the individual vectors, which is correct.
>
>        * It accurately reflects how the loaded value is going to be used.
>
>        * The type uniquely identifies the correct load operation,
>          without need for additional arguments.  (This is minor.)
>
>      Cons
>        * Gimple registers can't store array values.
>
> So I think the only disadvantage of using an array of vectors is that the
> result can never be a gimple register.  But that isn't much of a
> disadvantage
> really; the things we care about are the individual vectors, which can
> of course be treated as gimple registers.  I think our tracking of memory
> values is good enough for combined_vectors to be treated as such
> (even though, with the back-end changes we talked about earlier,
> they will actually be stored in RTL registers).

I agree that an array of vectors seems to be the best option here.


>
> So how about the following functions?  (Forgive the pascally syntax.)
>
>     __builtin_load_lanes (REF : array N*M of X)
>       returns array N of vector M of X
>       maps to vldN
>       in practice, the result would be used in assignments of the form:
>         vectorX = ARRAY_REF <result, X>
>
>     __builtin_store_lanes (VECTORS : array N of vector M of X)
>       returns array N*M of X
>       maps to vstN
>       in practice, the argument would be populated by assignments of the form:
>         ARRAY_REF <VECTORS, X> = vectorX
>
>     __builtin_load_lane (REF : array N of X,
>           VECTORS : array N of vector M of X,
>           LANE : integer)
>       returns array N of vector M of X
>       maps to vldN_lane
>
>     __builtin_store_lane (VECTORS : array N of vector M of X,
>            LANE : integer)
>       returns array N of X
>       maps to vstN_lane
>

How do you distinguish between "multiple structures" and "single structure
to all lanes"?

> Note that each operation can be expanded independently.  The expansion
> doesn't rely on preceding or following statements.
>
> I've hacked up the prototype below as a proof of concept.  It includes
> changes to the C parser to allow these functions to be created in the
> original source code.  This is throw-away code though; it would never
> be submitted.
>
> I've also included a simple test case and the output I get from it.
> The output looks pretty good; there's not even the stray VMOV that
> I saw with the intrinsics earlier in the week.
>
> (Note that if you'd like to try this yourself, you'll need the patch
> I posted on Monday as well.)
>
> What do you think?  Obviously this discussion needs to move to gcc@ at
> some point,

Good idea.

Ira

> but I wanted to make sure this was vaguely sane first.
>
> Richard
>
> [attachment "lane-functions.patch" deleted by Ira Rosen/Haifa/IBM]
> [attachment "test.c" deleted by Ira Rosen/Haifa/IBM]
> [attachment "test.s" deleted by Ira Rosen/Haifa/IBM]
> _______________________________________________
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-toolchain

