On Wed, Jun 23, 2021 at 4:50 PM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, Jun 23, 2021 at 3:59 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Mon, Jun 21, 2021 at 10:08 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > On Mon, Jun 21, 2021 at 3:28 PM Uros Bizjak via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > On Mon, Jun 21, 2021 at 6:56 AM liuhongt <hongtao....@intel.com> wrote:
> > > > >
> > > > > The avx512 supports bitwise operations with mask registers, but the
> > > > > throughput of those instructions is much lower than that of the
> > > > > corresponding gpr version, so we would additionally disparages
> > > > > slightly the mask register alternative for bitwise operations in the
> > > > > LRA.
> > > >
> > > > This was the reason for UNSPEC tagged instructions with mask
> > > > registers, used mainly for builtins.
> > > >
> > > > Also, care should be taken if we want mask registers to be used under
> > > > GPR pressure, or it is better to spill GPR registers. In the past,
> > > > DImode values caused a lot of problems with MMX registers on x86-64,
> > > > but we were able to hand-tune the allocator in the way you propose.
> > > >
> > > > Let's try the proposed approach to see what happens.
> > > >
> > > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, allocate
> > > > > MASK_REGS first since it has already been disparaged.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         PR target/101142
> > > > >         * config/i386/i386.md: (*anddi_1): Disparage slightly the mask
> > > > >         register alternative.
> > > > >         (*and<mode>_1): Ditto.
> > > > >         (*andqi_1): Ditto.
> > > > >         (*andn<mode>_1): Ditto.
> > > > >         (*<code><mode>_1): Ditto.
> > > > >         (*<code>qi_1): Ditto.
> > > > >         (*one_cmpl<mode>2_1): Ditto.
> > > > >         (*one_cmplsi2_1_zext): Ditto.
> > > > >         (*one_cmplqi2_1): Ditto.
> > > > >         * config/i386/i386.c (x86_order_regs_for_local_alloc): Change
> > > > >         the order of mask registers to be before general registers.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         PR target/101142
> > > > >         * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > > > >         * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > > > >         * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > > > >         * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
> > > >
> > > > OK with a comment addition, see inline.
> > > >
> > > > Thanks,
> > > > Uros.
> > > >
> > > > > ---
> > > > >  gcc/config/i386/i386.c                        |  8 +-
> > > > >  gcc/config/i386/i386.md                       | 20 ++---
> > > > >  .../gcc.target/i386/spill_to_mask-1.c         | 89 
> > > > > +++++++++++++------
> > > > >  .../gcc.target/i386/spill_to_mask-2.c         | 11 ++-
> > > > >  .../gcc.target/i386/spill_to_mask-3.c         | 11 ++-
> > > > >  .../gcc.target/i386/spill_to_mask-4.c         | 11 ++-
> > > > >  6 files changed, 91 insertions(+), 59 deletions(-)
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > index a61255857ff..a651853ca3b 100644
> > > > > --- a/gcc/config/i386/i386.c
> > > > > +++ b/gcc/config/i386/i386.c
> > > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > > > >     int pos = 0;
> > > > >     int i;
> > > > >
> > > > > +   /* Mask register.  */
> > > > > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > > > > +     reg_alloc_order [pos++] = i;
> > > >
> > > > Please add a comment why mask registers should come first.
> > > Thanks for the review, this is the patch i'm check in.
> >
> > This patch again caused unwanted mask instructions with -m32 in cpuid
> > check code, e.g.:
> >
> > Running target unix/-m32
> > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> > FAIL: gcc.target/i386/pr96814.c execution test
> >
> > Debugging pr96814 failure:
> >
> >   0x0804921d <+66>:    mov    %edx,%ecx
> >   0x0804921f <+68>:    cpuid
> > => 0x08049221 <+70>:    kmovd  %edx,%k0
> >   0x08049225 <+74>:    mov    %eax,-0x8(%ebp)
> >   0x08049228 <+77>:    mov    %ebx,-0xc(%ebp)
> >
> > It looks to me that putting mask registers in front of GPR is looking
> So it's not functionality but performance issue here, under 32-bit
> mode there are only 8 gprs which result in higher register pressure,
> and for this we do have mask->integer and integer->mask cost, with
> -mtune=bdver1 where cost of kmovd is quite high(16, 20 /*
> mask->integer and integer->mask moves */), there's no mask
> instructions in cpuid.
> I guess we can adjust mask->integer and integer->mask for 32-bit mode
> to avoid such a situation?
I notice the default option is O0, with -O there's no mask instructions.
IMHO, We don't need to change cost unless there's -O2 cases where mask
instructions regress performance here.
> > for trouble, since apparently disparaging mask registers does not
> > offset register preference of the new order.
> Hmm, w/o order preference,it will fail bitwise_mask_op-3.c.(costs of
> gpr and mask register are equal here) and according to gcc documents,
> “?k" is minimal increase in cost of the mask register(one unit more
> costly).
> The cost model is always difficult in balancing these, as you said
> fine tune cost model is always looking for trouble.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

Reply via email to