Re: [RFC] Proposal to support Packed Boolean Vector masks.

Richard Biener Wed, 17 Jul 2024 06:43:34 -0700

On Wed, Jul 17, 2024 at 3:17 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Biener <richard.guent...@gmail.com> writes:
> > On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod <tejas.bela...@arm.com> wrote:
> >>
> >> On 7/17/24 4:36 PM, Richard Biener wrote:
> >> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod <tejas.bela...@arm.com> wrote:
> >> >>
> >> >> On 7/15/24 6:05 PM, Richard Biener wrote:
> >> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod <tejas.bela...@arm.com> wrote:
> >> >>>>
> >> >>>> On 7/15/24 12:16 PM, Tejas Belagod wrote:
> >> >>>>> On 7/12/24 6:40 PM, Richard Biener wrote:
> >> >>>>>> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek <ja...@redhat.com> wrote:
> >> >>>>>>>
> >> >>>>>>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> >> >>>>>>>> Padding is only an issue for very small vectors; the obvious choice is
> >> >>>>>>>> to disallow vector types that would require any padding.  I can hardly
> >> >>>>>>>> see where those are faster than using a vector of up to 4 char elements.
> >> >>>>>>>> Problematic are 1-bit elements with 4-, 2- or 1-element vectors,
> >> >>>>>>>> 2-bit elements with 2- or 1-element vectors, and 4-bit elements with
> >> >>>>>>>> 1-element vectors.
> >> >>>>>>>
> >> >>>>>>> I'd really like to avoid having to support something like
> >> >>>>>>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
> >> >>>>>>> Vectors of _BitInt(2) up to, say, the size of long long could be acceptable.
> >> >>>>>>
> >> >>>>>> I'd disallow _BitInt(n) with n >= 8; it should just be the syntactic
> >> >>>>>> way to say the element should have n (< 8) bits.
> >> >>>>>>
> >> >>>>>>>> I have no idea what the stance on supporting _BitInt in C++ is,
> >> >>>>>>>> but diverging support (or even semantics) of the vector extension
> >> >>>>>>>> between C and C++ is most certainly undesirable.
> >> >>>>>>>
> >> >>>>>>> I believe Clang supports it in C++ as well as C.  GCC doesn't, and Jason
> >> >>>>>>> didn't look favorably on _BitInt support in C++, so at least until
> >> >>>>>>> something like that is standardized in C++ the answer is probably no.
> >> >>>>>>
> >> >>>>>> OK, I think that rules out _BitInt use here, so while bool is then natural
> >> >>>>>> for 1-bit elements, for 2-bit and 4-bit elements we'd have to specify the
> >> >>>>>> number of bits explicitly.  There is signed_bool_precision, but like
> >> >>>>>> vector_mask its use is restricted to the GIMPLE frontend because its
> >> >>>>>> interaction with the rest of the language isn't defined.
> >> >>>>>>
> >> >>>>>
> >> >>>>> Thanks for all the suggestions - really insightful (to me) discussions.
> >> >>>>>
> >> >>>>> Yeah, _BitInt seemed best placed for this, but not having C++ support
> >> >>>>> is definitely a blocker.  But as you say, in the absence of _BitInt,
> >> >>>>> bool becomes the natural choice for bit sizes 1, 2 and 4.  One way to
> >> >>>>> specify non-1-bit widths could be overloading vector_size.
> >> >>>>>
> >> >>>>> Also, I think overloading GIMPLE's vector_mask takes us into the
> >> >>>>> earlier-discussed territory of what it should actually mean: having it
> >> >>>>> mean the target truth type in GIMPLE but a generic vector extension in
> >> >>>>> the FE will probably confuse GCC developers more than users.
> >> >>>>>
> >> >>>>>> That said, we're mixing two things here: the desire to have "proper"
> >> >>>>>> svbool (fix: declare it in the backend) and the desire to have "packed"
> >> >>>>>> bit-precision vectors (for whatever actual reason) as part of the
> >> >>>>>> GCC vector extension.
> >> >>>>>>
> >> >>>>>
> >> >>>>> If we leave lane disambiguation of svbool to the backend, the value I
> >> >>>>> see in supporting 1-, 2- and 4-bit sizes is: 1) a first step towards
> >> >>>>> possibly supporting _BitInt(N) vectors in the future; 2) a way for
> >> >>>>> targets to define their intrinsics' bool vector types using GNU
> >> >>>>> extensions; 3) feature parity with Clang's ext_vector_type?
> >> >>>>>
> >> >>>>> I believe the primary motivation for Clang to support ext_vector_type
> >> >>>>> was to have a way to define target intrinsics' vector bool type using
> >> >>>>> vector extensions.
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>> Interestingly, Clang seems to support
> >> >>>>
> >> >>>> typedef struct {
> >> >>>>        _Bool i:1;
> >> >>>> } STR;
> >> >>>>
> >> >>>> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) * 4))) vec;
> >> >>>>
> >> >>>>
> >> >>>> int foo (vec b) {
> >> >>>>       return sizeof b;
> >> >>>> }
> >> >>>>
> >> >>>> I can't find documentation about how it is implemented, but I suspect
> >> >>>> the vector is constructed as an array STR[], i.e. possibly with each
> >> >>>> bit-element padded to a byte boundary.  Also, I can't seem to apply
> >> >>>> many operations other than sizeof.
> >> >>>>
> >> >>>> I don't know if we've tried to support such cases in GNU in the past?
> >> >>>
> >> >>> Why should we do that?  It doesn't make much sense.
> >> >>>
> >> >>> Single-bit vectors are what _BitInt was invented for.
> >> >>
> >> >> Forgive me if I'm misunderstanding - I'm trying to figure out how
> >> >> _BitInts can be made to have single-bit generic vector semantics. For
> >> >> example, if I want to initialize a _BitInt as a vector, I can't do:
> >> >>
> >> >>    _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};
> >> >>
> >> >> as 'a' expects a scalar initialization.
> >> >>
> >> >> Or if I want to convert an int vector to a bit vector, I can't do
> >> >>
> >> >>     v4si_p = v4si_a > v4si_b;
> >> >>     _BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));
> >> >>
> >> >> Also, conditionals on _BitInt have scalar semantics:
> >> >>
> >> >>     // a and b are _BitInt (4), but they behave as scalars.
> >> >>     _BitInt (4) p = a && b;
> >> >>
> >> >> Also, I can't do things like
> >> >>
> >> >>     typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt (2)) * 4)));
> >> >>
> >> >> to force it to behave as a vector because _BitInt is disallowed here.
> >> >>
> >> >
> >> > All I'm trying to say is that when people want to use vector<bool> as
> >> > a large packed bitfield, they can now use _BitInt instead, albeit
> >> > with a different (but portable) API.
> >> > I don't see single-bit element vectors as something especially
> >> > useful with a "vector API".  What's the use case?  (Similarly
> >> > for two- and four-bit elements, with or without padding.)
> >> >
> >>
> >> I'm trying to figure out whether, if we had a portable (generic) way to
> >> represent predicate vectors (e.g. _BitInts) in the front end, and had
> >> rules (or a vector API?) for casting from integer vectors acting as bools
> >> to _BitInts, it would be more efficient to lower to target predicate modes
> >> (VNx16BI etc. on targets that support n-bit predicate modes).  It could
> >> also possibly interoperate with target intrinsics better than integer
> >> bool vectors.
> >
> > No, we don't have an existing way to represent predicate vectors.  And no,
> > I don't think there's good evidence that supporting one within the realm
> > of GCC's generic vector extension is necessary.  But there's plenty of
> > doubt that a portable and performant way of doing this is possible.
>
> We'd like to be able to support things like:
>
>   svbool_t x, y, z;
>   x &= y | ~z;
>   y[0] = z[1];
>
> etc.  And, for fixed-size variants of svbool_t, we'd like to support:
>
>   fixed_svbool_t x = { 1, 0, 1, 0 }; // + implicit zeros
>
> The hope was that we could do that as a two-step process:
>
> - add a generic way of representing packed boolean vectors
> - inherit that generic support for the SVE ACLE types
>
> It seemed unlikely that adding SVE ACLE support directly to the frontends
> would be acceptable.  (E.g. direct target support in frontends was rejected
> for Altivec IIRC.)
>
> _BitInt doesn't seem like a good replacement since, like Tejas said,
> it doesn't support vector-style initialisation and indexing, and it
> isn't part of C++.  The last one is a killer for us, since so much
> intrinsics code is written in C++ using abstraction layers.
>
> Also, things like __builtin_shuffle and __builtin_convertvector should be
> supported for vector booleans, but wouldn't (I guess) be natural
> operations on _BitInt.
>
> std::experimental::simd does support indexing of mask types, which
> suggests that there is some demand for it.
>
> At the moment, the implementation of that for SVE has to convert to an
> integer vector, index that, and convert back to a bool:
>
> template <>
>   struct __sve_mask_type<2>
>   {
>     ...
>     typedef svuint16_t __sve_mask_vector_type
>     __attribute__((arm_sve_vector_bits(__ARM_FEATURE_SVE_BITS)));
>     ...
>     inline static bool
>     __sve_mask_get(type __active_mask, size_t __i)
>     { return __sve_mask_vector_type(svdup_u16_z(__active_mask, 1))[__i] != 0;}
>     ...
>   };
>
> It would be nice if it could just use:
>
>     inline static bool
>     __sve_mask_get(type __active_mask, size_t __i)
>     { return __active_mask[__i * 2]; }
>
> without the round trip through uint16_ts.
>
> Even better would be if __sve_mask_type<2> could use a 2-bits-per-element
> GNU-style boolean vector, so that the compiler has a better view of what's
> actually happening.  But for me, the main point was to design the extension
> so that multi-bit elements could be added later, rather than being a
> requirement from day 1.


I would start by declaring svbool in the backend and making the vector syntax
work with that, thus avoiding giving users a way to create "generic" vector
bools, precisely because for those we would need to sit down and design
interoperability.

Richard.

>
> Thanks,
> Richard
