From: Richard Biener <[email protected]>
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
<[email protected]> wrote:
>
> Hi,
>
> Packed Boolean Vectors
> ----------------------
>
> I'd like to propose a feature addition to GNU Vector extensions to add packed
> boolean vectors (PBV). This has been discussed in the past here[1] and a
> variant has
> been implemented in Clang recently[2].
>
> With predication features being added to vector architectures (SVE, MVE, AVX),
> it is a useful feature to have to model predication on targets. This could
> find its use in intrinsics or just used as is as a GNU vector extension being
> mapped to underlying target features. For example, the packed boolean vector
> could directly map to a predicate register on SVE.
>
> Also, this new packed boolean type GNU extension can be used with SVE ACLE
> intrinsics to replace a fixed-length svbool_t.
>
> Here are a few options to represent the packed boolean vector type.
The GIMPLE frontend uses a new 'vector_mask' attribute:
typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));
it get's you a vector type that's the appropriate (dependent on the
target) vector
mask type for the vector data type (v8si in this case).
Thanks Richard.
Having had a quick look at the implementation, it does seem to tick the boxes.
I must admit I haven't dug deep, but if the target hook allows the mask to be
defined in way that is target-friendly (and I don't know how much effort it will
be to migrate the attribute to more front-ends), it should do the job nicely.
Let me go back and dig a bit deeper and get back with questions if any.
Thanks,
Tejas.
> 1. __attribute__((vector_size (n))) where n represents bytes
>
> typedef bool vbool __attribute__ ((vector_size (1)));
>
> In this approach, the shape of the boolean vector is unclear. IoW, it is not
> clear if each bit in 'n' controls a byte or an element. On targets
> like SVE, it would be natural to have each bit control a byte of the target
> vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX,
> each
> bit would control one element/lane on the target vector(therefore resulting
> in a
> 'packed' layout with all significant bits at the LSB).
>
> 2. __attribute__((vector_size (n))) where n represents num of lanes
>
> typedef int v4si __attribute__ ((vector_size (4 * sizeof (int)));
> typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / sizeof
> (v4si){0}[0])));
>
> Here the 'n' in the vector_size attribute represents the number of bits that
> is needed to represent a vector quantity. In this case, this packed boolean
> vector can represent upto 'n' vector lanes. The size of the type is
> rounded up the nearest byte. For example, the sizeof v4bi in the above
> example is 1.
>
> In this approach, because of the nature of the representation, the n bits
> required
> to represent the n lanes of the vector are packed at the LSB. This does not
> naturally
> align with the SVE approach of each bit representing a byte of the target
> vector
> and PBV therefore having an 'unpacked' layout.
>
> More importantly, another drawback here is that the change in units for
> vector_size
> might be confusing to programmers. The units will have to be interpreted
> based on the
> base type of the typedef. It does not offer any flexibility in terms of the
> layout of
> the bool vector - it is fixed.
>
> 3. Combination of 1 and 2.
>
> Combining the best of 1 and 2, we can introduce extra parameters to
> vector_size that will
> unambiguously represent the layout of the PBV. Consider
>
> typedef bool vbool __attribute__((vector_size (s, n[, w])));
>
> where 's' is size in bytes, 'n' is the number of lanes and an optional 3rd
> parameter 'w'
> is the number of bits of the PBV that represents a lane of the target vector.
> 'w' would
> allow a target to force a certain layout of the PBV.
>
> The 2-parameter form of vector_size allows the target to have an
> implementation-defined layout of the PBV. The target is free to choose the 'w'
> if it is not specified to mirror the target layout of predicate registers. For
> eg. AVX would choose 'w' as 1 and SVE would choose s*8/n.
>
> As an example, to represent the result of a comparison on 2 int16x8_t, we'd
> need
> 8 lanes of boolean which could be represented by
>
> typedef bool v8b __attribute__ ((vector_size (2, 8)));
>
> SVE would implement v8b layout to make every 2nd bit significant i.e. w == 2
>
> and AVX would choose a layout where all 8 consecutive bits packed at LSB would
> be significant i.e. w == 1.
>
> This scheme would accomodate more than 1 target to effectively represent
> vector
> bools that mirror the target properties.
>
> 4. A new attribite
>
> This is based on a suggestion from Richard S in [3]. The idea is to introduce
> a new
> attribute to define the PBV and make it general enough to
>
> * represent all targets flexibly (SVE, AVX etc)
> * represent sub-byte length predicates
> * have no change in units of vector_size/no new vector_size signature
> * not have the number of bytes constrain representation
>
> If we call the new attribute 'bool_vec' (for lack of a better name), consider
>
> typedef bool vbool __attribute__((bool_vec (n[, w])))
>
> where 'n' represents number of lanes/elements and the optional 'w' is
> bits-per-lane.
>
> If 'w' is not specified, it and bytes-per-predicate are
> implementation-defined based on target.
> If 'w' is specified, sizeof (vbool) will be ceil (n*w/8).
>
> 5. Behaviour of the packed vector boolean type.
>
> Taking the example of one of the options above, following is an illustration
> of it's behavior
>
> * ABI
>
> New ABI rules will need to be defined for this type - eg alignment, PCS,
> mangling etc
>
> * Initialization:
>
> Packed Boolean Vectors(PBV) can be initialized like so:
>
> typedef bool v4bi __attribute__ ((vector_size (2, 4, 4)));
> v4bi p = {false, true, false, false};
>
> Each value in the initizlizer constant is of type bool. The lowest numbered
> element in the const array corresponds to the LSbit of p, element 1 is
> assigned to bit 4 etc.
>
> p is effectively a 2-byte bitmask with value 0x0010
>
> With a different layout
>
> typedef bool v4bi __attribute__ ((vector_size (2, 4, 1)));
> v4bi p = {false, true, false, false};
>
> p is effectively a 2-byte bitmask with value 0x0002
>
> * Operations:
>
> Packed Boolean Vectors support the following operations:
> . unary ~
> . unary !
> . binary&,|andˆ
> . assignments &=, |= and ˆ=
> . comparisons <, <=, ==, !=, >= and >
> . Ternary operator ?:
>
> Operations are defined as applied to the individual elements i.e the bits
> that are significant in the PBV. Whether the PBVs are treated as bitmasks
> or otherwise is implementation-defined.
>
> Insignificant bits could affect results of comparisons or ternary operators.
> In such cases, it is implementation defined how the unused bits are treated.
>
> . Subscript operator []
>
> For the subscript operator, the packed boolean vector acts like a array of
> elements - the first or the 0th indexed element being the LSbit of the PBV.
> Subscript operator yields a scalar boolean value.
> For example:
>
> typedef bool v8b __attribute__ ((vector_size (2, 8, 2)));
>
> // Subscript operator result yields a boolean value.
> // x[3] is the 7th LSbit and x[1] is the 3rd LSbit of x.
> bool foo (v8b p, int n) { p[3] = true; return p[1]; }
>
> Out of bounds access: OOB access can be determined at compile time given the
> strong typing of the PBVs.
>
> PBV does not support address of operator(&) for elements of PBVs.
>
> . Implicit conversion from integer vectors to PBVs
>
> We would like to support the output of comparison operations to be PBVs.
> This
> requires us to define the implicit conversion from an integer vector to PBV
> as the result of vector comparisons are integer vectors.
>
> To define this operation:
>
> bool_vector = vector <cmpop> vector
>
> There is no change in how vector <cmpop> vector behavior i.e. this
> comparison
> would still produce an int_vector type as it does now.
>
> temp_int_vec = vector <cmpop> vector
> bool_vec = temp_int_vec // Implicit conversion from int_vec to bool_vec
>
> The implicit conversion from int_vec to bool I'd define simply to be:
>
> bool_vec[n] = (_Bool) int_vec[n]
>
> where the C11 standard rules apply
> 6.3.1.2 Boolean type When any scalar value is converted to _Bool, the
> result
> is 0 if the value compares equal to 0; otherwise, the result is 1.
>
>
> [1] https://lists.llvm.org/pipermail/cfe-dev/2020-May/065434.html
> [2] https://reviews.llvm.org/D88905
> [3] https://reviews.llvm.org/D81083
>
> Thoughts?
>
> Thanks,
> Tejas.