On Wed, Jun 23, 2021 at 5:55 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Wed, Jun 23, 2021 at 11:41 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Wed, Jun 23, 2021 at 11:32 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > > > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, > > > > > > > > allocate > > > > > > > > MASK_REGS first since it has already been disparaged. > > > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > > > PR target/101142 > > > > > > > > * config/i386/i386.md: (*anddi_1): Disparage slightly > > > > > > > > the mask > > > > > > > > register alternative. > > > > > > > > (*and<mode>_1): Ditto. > > > > > > > > (*andqi_1): Ditto. > > > > > > > > (*andn<mode>_1): Ditto. > > > > > > > > (*<code><mode>_1): Ditto. > > > > > > > > (*<code>qi_1): Ditto. > > > > > > > > (*one_cmpl<mode>2_1): Ditto. > > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > * config/i386/i386.c (x86_order_regs_for_local_alloc): > > > > > > > > Change > > > > > > > > the order of mask registers to be before general > > > > > > > > registers. > > > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > > > > > PR target/101142 > > > > > > > > * gcc.target/i386/spill_to_mask-1.c: Adjust testcase. > > > > > > > > * gcc.target/i386/spill_to_mask-2.c: Adjust testcase. > > > > > > > > * gcc.target/i386/spill_to_mask-3.c: Adjust testcase. > > > > > > > > * gcc.target/i386/spill_to_mask-4.c: Adjust testcase. > > > > > > > > > > > > > > OK with a comment addition, see inline. > > > > > > > > > > > > > > Thanks, > > > > > > > Uros. > > > > > > > > > > > > > > > --- > > > > > > > > gcc/config/i386/i386.c | 8 +- > > > > > > > > gcc/config/i386/i386.md | 20 ++--- > > > > > > > > .../gcc.target/i386/spill_to_mask-1.c | 89 > > > > > > > > +++++++++++++------ > > > > > > > > .../gcc.target/i386/spill_to_mask-2.c | 11 ++- > > > > > > > > .../gcc.target/i386/spill_to_mask-3.c | 11 ++- > > > > > > > > .../gcc.target/i386/spill_to_mask-4.c | 11 ++- > > > > > > > > 6 files changed, 91 insertions(+), 59 deletions(-) > > > > > > > > > > > > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > > > > > > > > index a61255857ff..a651853ca3b 100644 > > > > > > > > --- a/gcc/config/i386/i386.c > > > > > > > > +++ b/gcc/config/i386/i386.c > > > > > > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void) > > > > > > > > int pos = 0; > > > > > > > > int i; > > > > > > > > > > > > > > > > + /* Mask register. */ > > > > > > > > + for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++) > > > > > > > > + reg_alloc_order [pos++] = i; > > > > > > > > > > > > > > Please add a comment why mask registers should come first. > > > > > > Thanks for the review, this is the patch i'm check in. > > > > > > > > > > This patch again caused unwanted mask instructions with -m32 in cpuid > > > > > check code, e.g.: > > > > > > > > > > Running target unix/-m32 > > > > > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test > > > > > FAIL: gcc.target/i386/pr96814.c execution test > > > > > > > > > > Debugging pr96814 failure: > > > > > > > > > > 0x0804921d <+66>: mov %edx,%ecx > > > > > 0x0804921f <+68>: cpuid > > > > > => 0x08049221 <+70>: kmovd %edx,%k0 > > > > > 0x08049225 <+74>: mov %eax,-0x8(%ebp) > > > > > 0x08049228 <+77>: mov %ebx,-0xc(%ebp) > > > > > > > > > > It looks to me that putting mask registers in front of GPR is looking > > > > So it's not functionality but performance issue here, under 32-bit > > > > mode there are only 8 gprs which result in higher register pressure, > > > > and for this we do have mask->integer and integer->mask cost, with > > > > -mtune=bdver1 where cost of kmovd is quite high(16, 20 /* > > > > mask->integer and integer->mask moves */), there's no mask > > > > instructions in cpuid. > > > > I guess we can adjust mask->integer and integer->mask for 32-bit mode > > > > to avoid such a situation? > > > I notice the default option is O0, with -O there's no mask instructions. > > > IMHO, We don't need to change cost unless there's -O2 cases where mask > > > instructions regress performance here. > > > > No, this reasoning is not acceptable. The compiled code will SIGILL on > > targets where unsupported mask registers are involved, so GPR should > > always have priority compared to mask registers. Based on these > > findings, x86_order_regs_for_local_alloc change should be reverted, > > and register move costs compensated elsewhere. > > Longterm, I think that introducing vector BImode and VxBI vectors > should solve this issue. This would make a clear cut between mask and > GPR operations. I think that generic vector infrastructure supports > BImode vectors, so using e.g. "a & b" instead of builtins should work > well, too. > > This approach would also avoid costly moves between register sets. > Agreeing with the long term part, I've opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101185 to record this regression.
> Uros. -- BR, Hongtao