Richard Biener <richard.guent...@gmail.com> writes:
> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>> <richard.guent...@gmail.com> wrote:
>> >
>> >
>> >
>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford 
>> > > <richard.sandif...@arm.com>:
>> > >
>> > > Richard Biener <richard.guent...@gmail.com> writes:
>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>> > >>> <richard.guent...@gmail.com> wrote:
>> > >>>
>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao....@intel.com> wrote:
>> > >>>>
>> > >>>> for the testcase in the PR115406, here is part of the dump.
>> > >>>>
>> > >>>>  char D.4882;
>> > >>>>  vector(1) <signed-boolean:8> _1;
>> > >>>>  vector(1) signed char _2;
>> > >>>>  char _5;
>> > >>>>
>> > >>>>  <bb 2> :
>> > >>>>  _1 = { -1 };
>> > >>>>
>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of 
>> > >>>> dest
>> > >>>> with each vector elemnet. But i think the bit setting should only 
>> > >>>> apply for
>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>> > >>>> differently from vector<1> <signed-boolean:16>?
>> > >>>>
>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> > >>>> Ok for trunk?
>> > >>>
>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>> > >>> code should work for 8 bit
>> > >>> entities as well, it seems we only set the LSB of each element in the
>> > >>> "mask".  ISTR that SVE
>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
>> > >>> maybe that's why
>> > >>> <= BITS_PER_UNIT.
>> > >
>> > > Yeah.
>> >
>> > So is it necessary that only one bit is set for SVE?

TBH I can't remember now.  It matches what SVE instructions produce, and
lines up with the associated RTL code (which at the time was SVE-specific).
But when dealing with multibyte elements, upper predicate element bits
are ignored on read, so matching the instructions might not matter.

>> > >>> So maybe instead of just setting one bit in
>> > >>>
>> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>> > >>>
>> > >>> we should set elt_bits bits, aka (without testing)
>> > >>>
>> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>> > >>> % BITS_PER_UNIT);
>> > >>>
>> > >>> ?
>> > >>
>> > >> Alternatively
>> > >>
>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>> > >>
>> > >> should be amended with
>> > >>
>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
>> > >
>> > > How about:
>> > >
>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>> > >    {
>> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
>> > >
>> > > ?
>> >
>> > Note the path is also necessary for avx512 and gcn mask modes which are 
>> > integer modes.
>> >
>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
>> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
>> > > would be stable.)
>> >
>> > That’s a good question and also related to GCC vector extension which can 
>> > result in both BLKmode and integer modes to be used.  But I’m not sure how 
>> > we expose masks to the middle end here.  A too large vector bool could be 
>> > lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of 
>> > BLKmode vectors and make sure to never assign integer modes to vector 
>> > bools (if the target didn’t specify that mode)?
>> >
>> > I guess some test coverage would be nice here.
>>
>> To continue on that, we do not currently have a way to capture a
>> vector comparison output
>> but the C++ frontend has vector ?:
>>
>> typedef int v8si __attribute__((vector_size(32)));
>>
>> void foo (v8si *a, v8si *b, v8si *c)
>> {
>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
>> }
>>
>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
>> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
>> then even with AVX512 enabled I see <signed-boolean:32> here and
>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
>> just a missed optimization).  But note that to decompose this into two
>> AVX512 vectors the temporary would have to change from <signed-boolean:32>
>> elements to <signed-boolean:1>.
>>
>> The not supported vector bool types have BLKmode sofar.
>>
>> But for example on i?86-linux with -mno-sse (like -march=i586) for
>>
>> typedef short v2hi __attribute__((vector_size(4)));
>>
>> void foo (v2hi *a, v2hi *b, v2hi *c)
>> {
>>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
>> }
>>
>> we get a SImode vector <signed-boolean:16> as I feared.  That means
>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
>> between SVE (bool for a 8byte data vector) and emulated vectors
>> ("word-mode" vectors; for 1byte data vectors).
>>
>> And without knowing that SVE would have used VnBImode given that
>> AVX512 uses an integer mode.
>>
>> Aside from the too large vector and AVX512 issue above I think we can use
>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
>> can assert the mode is a scalar integer mode (AVX512 or GCN)?
>
> So like the attached?

Can you give me a few days to see whether swapping to -1 : 0 masks
works for SVE?  I guess it would make things easier in the long run.

Thanks,
Richard

Reply via email to