Richard Biener <richard.guent...@gmail.com> writes:
> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>> <richard.guent...@gmail.com> wrote:
>> >
>> >
>> >
>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford
>> > > <richard.sandif...@arm.com>:
>> > >
>> > > Richard Biener <richard.guent...@gmail.com> writes:
>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>> > >>> <richard.guent...@gmail.com> wrote:
>> > >>>
>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao....@intel.com> wrote:
>> > >>>>
>> > >>>> for the testcase in the PR115406, here is part of the dump.
>> > >>>>
>> > >>>> char D.4882;
>> > >>>> vector(1) <signed-boolean:8> _1;
>> > >>>> vector(1) signed char _2;
>> > >>>> char _5;
>> > >>>>
>> > >>>> <bb 2> :
>> > >>>> _1 = { -1 };
>> > >>>>
>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of
>> > >>>> dest
>> > >>>> with each vector elemnet. But i think the bit setting should only
>> > >>>> apply for
>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>> > >>>> differently from vector<1> <signed-boolean:16>?
>> > >>>>
>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> > >>>> Ok for trunk?
>> > >>>
>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>> > >>> code should work for 8 bit
>> > >>> entities as well, it seems we only set the LSB of each element in the
>> > >>> "mask". ISTR that SVE
>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
>> > >>> maybe that's why
>> > >>> <= BITS_PER_UNIT.
>> > >
>> > > Yeah.
>> >
>> > So is it necessary that only one bit is set for SVE?

TBH I can't remember now.  It matches what SVE instructions produce, and
lines up with the associated RTL code (which at the time was SVE-specific).
But when dealing with multibyte elements, upper predicate element bits are
ignored on read, so matching the instructions might not matter.

>> > >>> So maybe instead of just setting one bit in
>> > >>>
>> > >>> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>> > >>>
>> > >>> we should set elt_bits bits, aka (without testing)
>> > >>>
>> > >>> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>> > >>> % BITS_PER_UNIT);
>> > >>>
>> > >>> ?
>> > >>
>> > >> Alternatively
>> > >>
>> > >> if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>> > >> && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>> > >>
>> > >> should be amended with
>> > >>
>> > >> && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
>> > >
>> > > How about:
>> > >
>> > > if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>> > > {
>> > > gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
>> > >
>> > > ?
>> >
>> > Note the path is also necessary for avx512 and gcn mask modes which are
>> > integer modes.
>> >
>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
>> > > since it can change with the target attribute? (At least TYPE_MODE_RAW
>> > > would be stable.)
>> >
>> > That’s a good question and also related to GCC vector extension which can
>> > result in both BLKmode and integer modes to be used. But I’m not sure how
>> > we expose masks to the middle end here. A too large vector bool could be
>> > lowered to AVX512 mode. Maybe we should simply reject interpret/encode of
>> > BLKmode vectors and make sure to never assign integer modes to vector
>> > bools (if the target didn’t specify that mode)?
>> >
>> > I guess some test coverage would be nice here.
>>
>> To continue on that, we do not currently have a way to capture a
>> vector comparison output
>> but the C++ frontend has vector ?:
>>
>> typedef int v8si __attribute__((vector_size(32)));
>>
>> void foo (v8si *a, v8si *b, v8si *c)
>> {
>> *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
>> }
>>
>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
>> that becomes <singed-boolean:1>. When we enlarge the vector to size 128
>> then even with AVX512 enabled I see <signed-boolean:32> here and
>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
>> just a missed optimization). But note that to decompose this into two
>> AVX512 vectors the temporary would have to change from <signed-boolean:32>
>> elements to <signed-boolean:1>.
>>
>> The not supported vector bool types have BLKmode sofar.
>>
>> But for example on i?86-linux with -mno-sse (like -march=i586) for
>>
>> typedef short v2hi __attribute__((vector_size(4)));
>>
>> void foo (v2hi *a, v2hi *b, v2hi *c)
>> {
>> *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
>> }
>>
>> we get a SImode vector <signed-boolean:16> as I feared. That means
>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
>> between SVE (bool for a 8byte data vector) and emulated vectors
>> ("word-mode" vectors; for 1byte data vectors).
>>
>> And without knowing that SVE would have used VnBImode given that
>> AVX512 uses an integer mode.
>>
>> Aside from the too large vector and AVX512 issue above I think we can use
>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
>> can assert the mode is a scalar integer mode (AVX512 or GCN)?
>
> So like the attached?
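
To make the two conventions concrete, here's a rough standalone sketch of
the byte patterns involved.  It is only an illustration: the function name
and buffers are made up, BITS_PER_UNIT is hard-coded to 8, and it is not
the real encode routine or the attached patch.

  #include <stdio.h>

  #define BITS_PER_UNIT 8

  /* Pack NELTS boolean elements, each ELT_BITS wide, into BUF.
     ALL_BITS_P == 0 models the current <= BITS_PER_UNIT behaviour
     (only the LSB of each element is set); ALL_BITS_P == 1 models
     the "set elt_bits bits" alternative quoted above.  Note the
     parentheses in (1u << elt_bits) - 1: the additive - binds
     tighter than <<, so they are needed for an all-ones mask.  */
  static void
  encode_bool_vector (unsigned char *buf, unsigned int nelts,
                      unsigned int elt_bits, int all_bits_p)
  {
    for (unsigned int i = 0; i < nelts; i++)
      {
        unsigned int bit = i * elt_bits;
        unsigned int mask = all_bits_p ? (1u << elt_bits) - 1 : 1u;
        buf[bit / BITS_PER_UNIT] |= mask << (bit % BITS_PER_UNIT);
      }
  }

  int
  main (void)
  {
    /* vector(1) <signed-boolean:8> with value { -1 }, as in the PR.  */
    unsigned char lsb_only[1] = { 0 }, all_bits[1] = { 0 };
    encode_bool_vector (lsb_only, 1, 8, 0);
    encode_bool_vector (all_bits, 1, 8, 1);
    /* Prints 0x1 vs 0xff; as noted in the quoted report, a
       vector(1) <signed-boolean:16> constant instead ends up with
       all bits set, which is the inconsistency at issue.  */
    printf ("lsb-only: %#x  all-bits: %#x\n", lsb_only[0], all_bits[0]);
    return 0;
  }
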
Can you give me a few days to see whether swapping to -1 : 0 masks works
for SVE?  I guess it would make things easier in the long run.

Thanks,
Richard