On Tue, Jun 25, 2019 at 7:55 AM Jeff Law <l...@redhat.com> wrote:
>
> On 6/25/19 8:34 AM, H.J. Lu wrote:
> > On Tue, Jun 25, 2019 at 12:58 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >>
> >> On 6/25/19, Hongtao Liu <crazy...@gmail.com> wrote:
> >>> On Sat, Jun 22, 2019 at 3:38 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> >>>>
> >>>> On Fri, Jun 21, 2019 at 8:38 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >>>>
> >>>>>>>>>>>>>>>>> +/* Register pair.  */
> >>>>>>>>>>>>>>>>> +VECTOR_MODES_WITH_PREFIX (P, INT, 2); /* P2QI */
> >>>>>>>>>>>>>>>>> +VECTOR_MODES_WITH_PREFIX (P, INT, 4); /* P2HI P4QI */
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I think
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> INT_MODE (P2QI, 16);
> >>>>>>>>>>>>>>>>> INT_MODE (P2HI, 32);
> >>>>>>>> Why does P2QI need 16 bytes rather than 2 bytes?
> >>>>>>>> Same question with P2HI.
> >>>>>>>
> >>>>>>> Because we made a mistake. It should be 2 and 4, since these
> >>>>>>> arguments are bytes, not bits.
> >>>>>> Then it will run into an internal compiler error when building libgcc.
> >>>>>> I'm still investigating it.
> >>>>>
> >>>>> I don't think we can have 2 integer modes with the same number of bytes,
> >>>>> since it breaks things like
> >>>>>
> >>>>> scalar_int_mode wider_mode = GET_MODE_WIDER_MODE (mode).require ();
> >>>>>
> >>>>> We can get
> >>>>>
> >>>>> (gdb) p mode
> >>>>> $2 = {m_mode = E_SImode}
> >>>>> (gdb) p wider_mode
> >>>>> $3 = {m_mode = E_P2HImode}
> >>>>> (gdb)
> >>>>>
> >>>>> Neither the middle-end nor the backend supports it.
> >>>>
> >>>> Ouch... It looks like we hit the limitation of the middle end (which should
> >>>> at least warn/error out if two modes of the same width are declared).
> >>>>
> >>>> OTOH, we can't solve this problem by using two HI/QImode registers,
> >>>> since a consecutive register pair has to be allocated. It is also not
> >>>> possible to overload existing SI/HImode mode with different
> >>>> requirements w.r.t register pair allocation (e.g. sometimes the whole
> >>>> register is allocated, and sometimes a register pair is allocated).
> >>>>
> >>>> I think we have to invent something like SPECIAL_INT_MODE, which would
> >>>> avoid mode promotion functionality (basically, it should not be listed
> >>>> in mode_wider and similar arrays). This would prevent mode promotion
> >>>> issues, while still allowing a mode that has the same width as an
> >>>> existing mode, but with special properties.
> >>>>
> >>>> I'm adding Jeff and Jakub to the discussion about SPECIAL_INT_MODE.
> >>>>
> >>>> Uros.
> >>>
> >>> H.J.'s patch using PARTIAL_INT_MODE fixed this issue.
> >>>
> >>> +/* Register pair.  */
> >>> +PARTIAL_INT_MODE (HI, 16, P2QI);
> >>> +PARTIAL_INT_MODE (SI, 32, P2HI);
> >>> +
> >>
> >> I don't think this approach is correct (the mode is not partial), and
> >> it could work by chance. The documentation is very brief on the
> >> details of different mode types, so let's ask middle-end and RTL
> >> experts.
> >>
> >
> > It is used by the powerpc backend for a similar purpose:
> >
> > /* Replacement for TImode that only is allowed in GPRs.  We also use PTImode
> >    for quad memory atomic operations to force getting an even/odd register
> >    combination.  */
> > PARTIAL_INT_MODE (TI, 128, PTI);
> The partial modes were designed to handle things like targets with
> register sizes that aren't 2**n bits in size.  A port can certainly
> support something like SImode and PSImode side by side and they can have
> the same underlying size.
>
> Essentially the partial modes represent a mode where the compiler does
> not necessarily know the exact size, but instead knows a maximum size of
> the object.  You'll have to define suitable movXX patterns and any other
> operations you might want to perform.  The compiler will generally not
> convert between the partial mode and any other modes without an explicit
> conversion (again it can't because it doesn't know how big the partial
> mode really is).

This is exactly what we need here.  We generate an instruction to set a
P2HI/P2QI register and immediately extract it to HI/QI registers.  No other
operations in P2HI/P2QI modes are generated or needed.
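Because a P2HI/P2QI value occupies two consecutive mask registers, the register allocator has to find a free pair rather than a single register. The function below is a minimal, hypothetical sketch of that constraint (alloc_mask_pair is an illustrative name, not a GCC function). It assumes, by analogy with the even/odd combination mentioned in the powerpc PTImode comment, that the pair must start at an even-numbered register among k0..k7.

```c
/* Hypothetical sketch: model the 8 AVX-512 mask registers k0..k7 as a
   bitmap in which bit r is set when kr is free.  A pair value (such as
   a P2HI result) needs two consecutive registers kN and kN+1; this
   sketch additionally ASSUMES that N must be even.  Returns N for the
   first free aligned pair, or -1 when none is available.  */
int
alloc_mask_pair (unsigned free_mask)
{
  for (int r = 0; r < 8; r += 2)
    {
      unsigned pair = 3u << r;   /* bits for kr and kr+1 */
      if ((free_mask & pair) == pair)
        return r;
    }
  return -1;
}
```

For example, with only k0 busy (free_mask 0xFE), k1/k2 are free and consecutive but not even-aligned, so the sketch returns 2 (k2/k3); with only k1 and k2 free (0x06) it returns -1, even though two adjacent registers are free.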

[hjl@gnu-cfl-1 vp2intersect]$ cat 2.i
typedef int __v16si __attribute__ ((__vector_size__ (64)));

typedef unsigned char  __mmask8;
typedef unsigned short __mmask16;

__mmask16
foo (__v16si x, __v16si y, __mmask16 *b)
{
  __mmask16 a;
  __builtin_ia32_2intersectd512 (&a, b, x, y);
  return a;
}
[hjl@gnu-cfl-1 vp2intersect]$ make 2.s
/export/build/gnu/tools-build/gcc-intel/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-intel/build-x86_64-linux/gcc/
-mavx512vp2intersect -O2 -S 2.i
[hjl@gnu-cfl-1 vp2intersect]$ cat 2.s
	.file "2.i"
	.text
	.p2align 4
	.globl foo
	.type foo, @function
foo:
.LFB0:
	.cfi_startproc
	vp2intersectd %zmm1, %zmm0, %k0
	kmovw %k0, %eax
	kmovw %k1, (%rdi)
	ret
	.cfi_endproc
.LFE0:
	.size foo, .-foo
	.ident "GCC: (GNU) 10.0.0 20190620 (experimental)"
	.section .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 vp2intersect]$
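For reference, the two masks that the pair register carries can be modelled in plain C. This is a hedged scalar sketch (p2intersectd_ref is a hypothetical name, not a GCC or Intel API). It assumes the operand order seen in the generated code above: the first mask (returned via %k0) flags lanes of x that occur anywhere in y, and the second mask (stored via %k1) flags lanes of y that occur anywhere in x.

```c
#include <stdint.h>

/* Hypothetical scalar model (a sketch, not GCC or Intel code) of a
   16-lane dword intersection producing two masks:
     bit i of *m0 is set when x[i] equals some y[j];
     bit j of *m1 is set when y[j] equals some x[i].  */
void
p2intersectd_ref (const int32_t x[16], const int32_t y[16],
                  uint16_t *m0, uint16_t *m1)
{
  uint16_t a = 0, b = 0;
  for (int i = 0; i < 16; i++)
    for (int j = 0; j < 16; j++)
      if (x[i] == y[j])
        {
          a |= (uint16_t) (1u << i);   /* lane i of x has a match */
          b |= (uint16_t) (1u << j);   /* lane j of y has a match */
        }
  *m0 = a;
  *m1 = b;
}
```

In terms of foo, a would correspond to m0 and *b to m1; the instruction produces both at once, which is why a single pair-mode result followed by two HI extractions fits it naturally.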


> I don't see anything inherently wrong with using the partial modes, but
> we need to be aware that they're not stressed all that hard and we could
> well run into under-specified cases and missed optimizations.
> Jeff
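The GET_MODE_WIDER_MODE failure from earlier in the thread, and why moving P2QI/P2HI into a separate partial-integer class avoids it, can be pictured with a much-simplified model. It assumes, as a simplification and not a description of genmodes, that each mode's wider mode is the next entry of the same mode class in a size-sorted table; wider_mode and both tables are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified model of per-class wider-mode chains.
   It is not how genmodes works; it only assumes that a mode's "wider"
   mode is the next entry of the same class in a table sorted by size.  */
enum mode_class { CLASS_INT, CLASS_PARTIAL_INT };

struct mode_info
{
  const char *name;
  enum mode_class cls;
  int bytes;                  /* size; the tables below are sorted by it */
};

/* Return the name of the next mode after NAME in the same class, or
   NULL if NAME is unknown or already the widest of its class.  */
const char *
wider_mode (const struct mode_info *table, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (table[i].name, name) == 0)
      {
        for (size_t j = i + 1; j < n; j++)
          if (table[j].cls == table[i].cls)
            return table[j].name;
        return NULL;
      }
  return NULL;
}

/* P2HI declared as an ordinary 4-byte integer mode, next to SI.  */
const struct mode_info modes_as_int[] = {
  { "QI", CLASS_INT, 1 }, { "HI", CLASS_INT, 2 }, { "SI", CLASS_INT, 4 },
  { "P2HI", CLASS_INT, 4 }, { "DI", CLASS_INT, 8 },
};

/* P2HI declared in a separate partial-integer class instead.  */
const struct mode_info modes_as_partial[] = {
  { "QI", CLASS_INT, 1 }, { "HI", CLASS_INT, 2 }, { "SI", CLASS_INT, 4 },
  { "P2HI", CLASS_PARTIAL_INT, 4 }, { "DI", CLASS_INT, 8 },
};
```

With the first table, the wider mode of SI comes out as P2HI, mirroring the gdb session quoted in the thread; with the second, it is DI, because the partial mode sits in its own class.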



-- 
H.J.
