On Tue, Jun 25, 2019 at 7:55 AM Jeff Law <l...@redhat.com> wrote:
>
> On 6/25/19 8:34 AM, H.J. Lu wrote:
> > On Tue, Jun 25, 2019 at 12:58 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >>
> >> On 6/25/19, Hongtao Liu <crazy...@gmail.com> wrote:
> >>> On Sat, Jun 22, 2019 at 3:38 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> >>>>
> >>>> On Fri, Jun 21, 2019 at 8:38 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >>>>
> >>>>>>>>>>>>>>>>> +/* Register pair. */
> >>>>>>>>>>>>>>>>> +VECTOR_MODES_WITH_PREFIX (P, INT, 2); /* P2QI */
> >>>>>>>>>>>>>>>>> +VECTOR_MODES_WITH_PREFIX (P, INT, 4); /* P2HI P4QI */
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> INT_MODE (P2QI, 16);
> >>>>>>>>>>>>>>>>> INT_MODE (P2HI, 32);
> >>>>>>>> Why does P2QI need 16 bytes rather than 2 bytes?
> >>>>>>>> Same question for P2HI.
> >>>>>>>
> >>>>>>> Because we made a mistake. It should be 2 and 4, since these
> >>>>>>> arguments are bytes, not bits.
> >>>>>> Then it will run into an internal compiler error when building libgcc.
> >>>>>> I'm still investigating it.
> >>>>>
> >>>>> I don't think we can have 2 integer modes with the same number of bytes
> >>>>> since it breaks things like
> >>>>>
> >>>>> scalar_int_mode wider_mode = GET_MODE_WIDER_MODE (mode).require ();
> >>>>>
> >>>>> We can get
> >>>>>
> >>>>> (gdb) p mode
> >>>>> $2 = {m_mode = E_SImode}
> >>>>> (gdb) p wider_mode
> >>>>> $3 = {m_mode = E_P2HImode}
> >>>>> (gdb)
> >>>>>
> >>>>> Neither middle-end nor backend support it.
> >>>>
> >>>> Ouch... It looks like we hit a limitation of the middle end (which should
> >>>> at least warn/error out if two modes of the same width are declared).
> >>>>
> >>>> OTOH, we can't solve this problem by using two HI/QImode registers,
> >>>> since a consecutive register pair has to be allocated. It is also not
> >>>> possible to overload the existing SI/HImode modes with different
> >>>> requirements w.r.t. register pair allocation (e.g. sometimes the whole
> >>>> register is allocated, and sometimes a register pair is allocated).
> >>>>
> >>>> I think we have to invent something like SPECIAL_INT_MODE, which would
> >>>> avoid mode promotion functionality (basically, it should not be listed
> >>>> in mode_wider and similar arrays). This would prevent mode promotion
> >>>> issues, while still allowing a mode that has the same width as an
> >>>> existing mode, but with special properties.
> >>>>
> >>>> I'm adding Jeff and Jakub to the discussion about SPECIAL_INT_MODE.
> >>>>
> >>>> Uros.
> >>>
> >>> Patch from H.J. using PARTIAL_INT_MODE fixed this issue.
> >>>
> >>> +/* Register pair. */
> >>> +PARTIAL_INT_MODE (HI, 16, P2QI);
> >>> +PARTIAL_INT_MODE (SI, 32, P2HI);
> >>> +
> >>
> >> I don't think this approach is correct (the mode is not partial), and
> >> it could work by chance. The documentation is very brief on the
> >> details of the different mode types, so let's ask middle-end and RTL
> >> experts.
> >>
> >
> > It is used by the powerpc backend for a similar purpose:
> >
> > /* Replacement for TImode that only is allowed in GPRs. We also use
> > PTImode
> > for quad memory atomic operations to force getting an even/odd register
> > combination. */
> > PARTIAL_INT_MODE (TI, 128, PTI);
> The partial modes were designed to handle things like targets with
> register sizes that aren't 2**n bits in size. A port can certainly
> support something like SImode and PSImode side by side and they can have
> the same underlying size.
>
> Essentially the partial modes represent a mode where the compiler does
> not necessarily know the exact size, but instead knows a maximum size of
> the object. You'll have to define suitable movXX patterns and any other
> operations you might want to perform. The compiler will generally not
> convert between the partial mode and any other modes without an explicit
> conversion (again it can't because it doesn't know how big the partial
> mode really is).

This is all we need here. We generate an instruction to set a P2HI/P2QI
register and immediately extract it to HI/QI registers. No other operations
in P2HI/P2QI modes are generated or needed.

[hjl@gnu-cfl-1 vp2intersect]$ cat 2.i
typedef int __v16si __attribute__ ((__vector_size__ (64)));
typedef unsigned char __mmask8;
typedef unsigned short __mmask16;

__mmask16
foo (__v16si x, __v16si y, __mmask16 *b)
{
  __mmask16 a;
  __builtin_ia32_2intersectd512 (&a, b, x, y);
  return a;
}
[hjl@gnu-cfl-1 vp2intersect]$ make 2.s
/export/build/gnu/tools-build/gcc-intel/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-intel/build-x86_64-linux/gcc/ -mavx512vp2intersect -O2 -S 2.i
[hjl@gnu-cfl-1 vp2intersect]$ cat 2.s
        .file   "2.i"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        vp2intersectd   %zmm1, %zmm0, %k0
        kmovw   %k0, %eax
        kmovw   %k1, (%rdi)
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 10.0.0 20190620 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 vp2intersect]$

> I don't see anything inherently wrong with using the partial modes, but
> we need to be aware that they're not stressed all that hard and we could
> well run into under-specified cases and missed optimizations.
> Jeff

-- 
H.J.
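
The wider-mode problem discussed in this thread can be sketched with a toy model. The C below is not GCC's genmodes code: the `toy_*` names, the table layout, and the rule "the wider mode is simply the next table entry of the same class" are all assumptions made for illustration. It only aims to show why a second 4-byte mode declared in the full integer class can turn up as the "wider" mode of SI, as in the gdb output quoted above, while the same mode placed in a separate partial-integer class stays out of that chain.

```c
/* Toy model only; NOT GCC's implementation.  Every toy_* name is hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_class { TOY_INT, TOY_PARTIAL_INT };

struct toy_mode
{
  const char *name;     /* e.g. "SI" */
  enum toy_class cls;   /* which class chain the mode belongs to */
  unsigned bytes;       /* size in bytes */
};

#define TOY_COUNT(a) (sizeof (a) / sizeof ((a)[0]))

/* P2HI declared as a full integer mode with the same size as SI.  */
const struct toy_mode toy_full[] = {
  { "QI", TOY_INT, 1 }, { "HI", TOY_INT, 2 }, { "SI", TOY_INT, 4 },
  { "P2HI", TOY_INT, 4 }, { "DI", TOY_INT, 8 },
};

/* P2HI declared in a separate partial-integer class instead.  */
const struct toy_mode toy_partial[] = {
  { "QI", TOY_INT, 1 }, { "HI", TOY_INT, 2 }, { "SI", TOY_INT, 4 },
  { "P2HI", TOY_PARTIAL_INT, 4 }, { "DI", TOY_INT, 8 },
};

/* Assumed rule: the table is sorted by size, and the "wider" mode of FROM
   is the next later entry with the same class.  Returns NULL if FROM is
   unknown or has no successor in its class.  */
const struct toy_mode *
toy_wider (const struct toy_mode *table, size_t n, const char *from)
{
  size_t i, j;

  assert (table != NULL && from != NULL);
  for (i = 0; i < n; i++)
    if (strcmp (table[i].name, from) == 0)
      {
        for (j = i + 1; j < n; j++)
          if (table[j].cls == table[i].cls)
            return &table[j];
        return NULL;
      }
  return NULL;
}

/* Convenience wrapper: name of the wider mode, or "none".  */
const char *
toy_wider_name (const struct toy_mode *table, size_t n, const char *from)
{
  const struct toy_mode *m = toy_wider (table, n, from);
  return m ? m->name : "none";
}
```

Under these assumptions, `toy_wider_name (toy_full, TOY_COUNT (toy_full), "SI")` yields "P2HI", a mode that is not actually wider than SI, whereas the same query against `toy_partial` yields "DI" and P2HI itself has no wider mode. Whether the real mode tables behave exactly this way is something the thread leaves to the middle-end and RTL experts.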