Re: [RFC] Proposal to support Packed Boolean Vector masks.

Richard Sandiford Wed, 10 Jul 2024 03:44:14 -0700

Tejas Belagod <tejas.bela...@arm.com> writes:
> On 7/10/24 2:38 PM, Richard Biener wrote:
>> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod <tejas.bela...@arm.com> wrote:
>>>
>>> On 7/9/24 4:22 PM, Richard Biener wrote:
>>>> On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod <tejas.bela...@arm.com> 
>>>> wrote:
>>>>>
>>>>> On 7/8/24 4:45 PM, Richard Biener wrote:
>>>>>> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod <tejas.bela...@arm.com> 
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Sorry to have dropped the ball on
>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
>>>>>>> here I've tried to pick it up again and write up a strawman proposal for
>>>>>>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tejas.
>>>>>>>
>>>>>>> Motivation
>>>>>>> ----------
>>>>>>>
>>>>>>> The idea of packed boolean vectors came about when we wanted to support
>>>>>>> C/C++ operators on SVE ACLE types. The current vector boolean type that
>>>>>>> ACLE specifies does not adequately disambiguate vector lane sizes which
>>>>>>> they were derived off of. Consider this simple, albeit unrealistic, 
>>>>>>> example:
>>>>>>>
>>>>>>>       bool foo (svint32_t a, svint32_t b)
>>>>>>>       {
>>>>>>>         svbool_t p = a > b;
>>>>>>>
>>>>>>>         // Here p[2] is not the same as a[2] > b[2].
>>>>>>>         return p[2];
>>>>>>>       }
>>>>>>>
>>>>>>> In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
>>>>>>> does not return the bool value corresponding to a[i] > b[i]. This
>>>>>>> necessitates a 'typed' vector boolean value that unambiguously
>>>>>>> represents results of operations
>>>>>>> of the same type.
>>>>>>>
>>>>>>> __attribute__((vector_mask))
>>>>>>> -----------------------------
>>>>>>>
>>>>>>> Note: If interested in historical discussions refer to:
>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
>>>>>>>
>>>>>>> We define this new attribute which when applied to a base data vector
>>>>>>> produces a new boolean vector type that represents a boolean type that
>>>>>>> is produced as a result of operations on the corresponding base vector
>>>>>>> type. The following is the syntax.
>>>>>>>
>>>>>>>       typedef int v8si __attribute__((vector_size (8 * sizeof (int)));
>>>>>>>       typedef v8si v8sib __attribute__((vector_mask));
>>>>>>>
>>>>>>> Here the 'base' data vector type is v8si or a vector of 8 integers.
>>>>>>>
>>>>>>> Rules
>>>>>>>
>>>>>>> • The layout/size of the boolean vector type is implementation-defined
>>>>>>> for its base data vector type.
>>>>>>>
>>>>>>> • Two boolean vector types who's base data vector types have same number
>>>>>>> of elements and lane-width have the same layout and size.
>>>>>>>
>>>>>>> • Consequently, two boolean vectors who's base data vector types have
>>>>>>> different number of elements or different lane-size have different 
>>>>>>> layouts.
>>>>>>>
>>>>>>> This aligns with gnu vector extensions that generate integer vectors as
>>>>>>> a result of comparisons - "The result of the comparison is a vector of
>>>>>>> the same width and number of elements as the comparison operands with a
>>>>>>> signed integral element type." according to
>>>>>>>        https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
>>>>>>
>>>>>> Without having the time to re-review this all in detail I think the GNU
>>>>>> vector extension does not expose the result of the comparison as the
>>>>>> machine would produce it but instead a comparison "decays" to
>>>>>> a conditional:
>>>>>>
>>>>>> typedef int v4si __attribute__((vector_size(16)));
>>>>>>
>>>>>> v4si a;
>>>>>> v4si b;
>>>>>>
>>>>>> void foo()
>>>>>> {
>>>>>>      auto r = a < b;
>>>>>> }
>>>>>>
>>>>>> produces, with C23:
>>>>>>
>>>>>>      vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
>>>>>> 0, 0, 0 } > ;
>>>>>>
>>>>>> In fact on x86_64 with AVX and AVX512 you have two different "machine
>>>>>> produced" mask types and the above could either produce a AVX mask with
>>>>>> 32bit elements or a AVX512 mask with 1bit elements.
>>>>>>
>>>>>> Not exposing "native" mask types requires the compiler optimizing 
>>>>>> subsequent
>>>>>> uses and makes generic vectors difficult to combine with for example 
>>>>>> AVX512
>>>>>> intrinsics (where masks are just 'int').  Across an ABI boundary it's 
>>>>>> also
>>>>>> even more difficult to optimize mask transitions.
>>>>>>
>>>>>> But it at least allows portable code and it does not suffer from users 
>>>>>> trying to
>>>>>> expose machine representations of masks as input to generic vector code
>>>>>> with all the problems of constant folding not only requiring 
>>>>>> self-consistent
>>>>>> code within the compiler but compatibility with user produced constant 
>>>>>> masks.
>>>>>>
>>>>>> That said, I somewhat question the need to expose the target mask layout
>>>>>> to users for GCCs generic vector extension.
>>>>>>
>>>>>
>>>>> Thanks for your feedback.
>>>>>
>>>>> IIUC, I can imagine how having a GNU vector extension exposing the
>>>>> target vector mask layout can pose a challenge - maybe making it a
>>>>> generic GNU vector extension was too ambitious. I wonder if there's
>>>>> value in pursuing these alternate paths?
>>>>>
>>>>> 1. Can implementing this extension in a 'generic' way i.e. possibly not
>>>>> implement it with a target mask, but just a generic int vector, still
>>>>> maintain the consistency of GNU predicate vectors within the compiler? I
>>>>> know it may not seem very different from how boolean vectors are
>>>>> currently implemented (as in your above example), but, having the
>>>>> __attribute__((vector_mask)) as a 'property' of the object makes it
>>>>> useful to optimize its uses to target predicates in subsequent stages of
>>>>> the compiler.
>>>>>
>>>>> 2. Restricting __attribute__((vector_mask)) to apply only to target
>>>>> intrinsic types? Eg.
>>>>>
>>>>> On SVE something like:
>>>>> typedef svint16_t svpred16_t __attribute__((vector_mask)); // OK.
>>>>>
>>>>> On AVX, something like:
>>>>> typedef __m256i __mask32 __attribute__((vector_mask)); // OK - though
>>>>> this would require more fine-grained defn of lane-size to mask-bits 
>>>>> mapping.
>>>>
>>>> I think the target should be able to register builtin types already which
>>>> intrinsics could use.  There is already the vector_mask attribute but only
>>>> for GIMPLE and it has the same limitation of querying the target for the
>>>> actual mode being used - for AVX vs AVX512 one might be able to
>>>> combine this with a mode attribute.  Not sure if on arm you can parse
>>>> __attribute__((mode("Vx4BI4"))) or how the modes are called.
>>>>
>>>> But when you are talking about intrinsics I'd really suggest to leave the
>>>> type creation to the target rather than trying to do a typedef in a header?
>>>>
>>>
>>> Yeah, thinking about this a bit more, makes sense to keep intrinsic type
>>> creation in the target realm.
>>>
>>> Just to clarify if I understand your point about exposing masks' machine
>>> representations, would representing vector_mask types using opaque
>>> types/modes have the same challenges with compatibility with generic
>>> vector constants as it essentially would be a parallel type system, and
>>> would be unaffected by constant-folding etc due to their opacity? I ask
>>> because opacity might give the representation the flexibility of
>>> 'decaying' to a type based on the context it is used in.
>> 
>> I also thought about using an opaque type but I wonder if it really suits
>> here?  
>
> Sorry, yes using opaque type was your idea from last year's thread - I 
> merely reiterated it here. :-)
>
> Or would the target then need to decay a mask[i] into something
>> that's later recognizable?
>> 
>
> I think that would depend on the usage, wouldn't it - it could lower 
> down to target insn(s) based on how whether, for eg, its used as a test 
> or read as a scalar value?
>
>
>> So I guess the answer is you'd have to try.
>
> Thanks for your feedback so far - much appreciated. If it helps, I will 
> try to write up a prototype to test the idea - might help clear the mist 
> further.


Just to note that one of the original motivations (that applies more
to option 3 from last year's proposal) was to add support for general
packed vector boolean types to the GNU vector extension, as a feature
independent of the target's "native" format(s).  Clang already supports
this via ext_vector_type and it seemed like there might be value in
providing something similar for the GNU extensions.

Thanks,
Richard

Re: [RFC] Proposal to support Packed Boolean Vector masks.

Reply via email to