This patch fixes a subtle bug in the depths of GCC's synth_mult,
where the middle-end queries whether (how well) the target supports
widening and highpart multiplications by calling targetm.rtx_costs.
The code in init_expmed and init_expmed_one_mode iterates over various
RTL patterns querying the cost of each.  To avoid generating & garbage
collecting too much junk, it reuses the same RTL over and over, but
adjusting the modes between each call.

Alas this reuse of state is a little fragile, and at some point a
change to init_expmed_one_conv has resulted in the state (mode of
a register) being changed, but not reset before being used again.

The fallout is that GCC currently queries the backend for the cost
of non-sense RTL such as:

(mult:HI (zero_extend:HI (reg:TI 82))
    (zero_extend:HI (reg:TI 82)))

and

(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
        (zero_extend:HI (reg:TI 82)))
    (const_int 8 [0x8]))

The fix is to set the mode of the register back to its assumed
state, as (reg:QI 82) in the above patterns makes much more sense.

Using the old software engineering/defensive programming maxim of
"why fix a bug just once, if it can be fixed in multiple places",
this patch both restores the original value in init_expmed_one_conv,
and also sets it to the expected value in init_expmed_one_mode.
This should hopefully signal the need to be careful of invariants for
anyone modifying this code in future.

Alas things are rarely simple...  Fixing this obviously incorrect
logic causes a failure of gcc.target/i386/pr71321.c that tests for
a particular expansion from synth_mult.  The issue here is that this
test is checking for a specific multiplication expansion, when it
should really be checking that we don't generate the inefficient
"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
reported in the PR target/71321.  Now that we use correct costs,
GCC uses a multiply instruction matching icc, LLVM and GCC prior
to v4.8.  I've even microbenchmarked this function (over several
minutes) with (disappointingly) no difference in timings.  Three
dependent leas has 3-cycle latency, exactly the same as a widening
byte multiply (on Haswell), so the shorter code splits the tie.
[I have a follow-up patch that may improve things further].

Before:
        movzbl  %dil, %eax
        leal    (%rax,%rax,4), %edx
        leal    (%rax,%rdx,8), %edx
        leal    (%rdx,%rdx,4), %edx
        shrw    $11, %dx
        leal    (%rdx,%rdx,4), %eax
        ...

After:
        movl    $-51, %edx
        movl    %edx, %eax
        mulb    %dil
        movl    %eax, %edx
        shrw    $11, %dx
        leal    (%rdx,%rdx,4), %eax
        ...


This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.
Ok for mainline?


2020-08-09  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * expmed.c (init_expmed_one_conv): Restore all->reg's mode.
        (init_expmed_one_mode): Set all->reg to desired mode.

gcc/testsuite/ChangeLog
        PR target/71321
        * gcc.target/i386/pr71321.c: Check that the code doesn't use
        the 4B zero displacement lea, not that it uses lea.

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3d2d234..d34f0fb 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -155,6 +155,8 @@ init_expmed_one_conv (struct init_expmed_rtl *all, 
scalar_int_mode to_mode,
   PUT_MODE (all->reg, from_mode);
   set_convert_cost (to_mode, from_mode, speed,
                    set_src_cost (which, to_mode, speed));
+  /* Restore all->reg's mode.  */
+  PUT_MODE (all->reg, to_mode);
 }
 
 static void
@@ -229,6 +231,7 @@ init_expmed_one_mode (struct init_expmed_rtl *all,
       if (GET_MODE_CLASS (int_mode_to) == MODE_INT
          && GET_MODE_WIDER_MODE (int_mode_to).exists (&wider_mode))
        {
+         PUT_MODE (all->reg, mode);
          PUT_MODE (all->zext, wider_mode);
          PUT_MODE (all->wide_mult, wider_mode);
          PUT_MODE (all->wide_lshr, wider_mode);
diff --git a/gcc/testsuite/gcc.target/i386/pr71321.c 
b/gcc/testsuite/gcc.target/i386/pr71321.c
index 86ad812..24d144b 100644
--- a/gcc/testsuite/gcc.target/i386/pr71321.c
+++ b/gcc/testsuite/gcc.target/i386/pr71321.c
@@ -12,5 +12,4 @@ unsigned cvt_to_2digit_ascii(uint8_t i)
 {
   return cvt_to_2digit(i, 10) + 0x0a3030;
 }
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 
} } */
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 
} } */
+/* { dg-final { scan-assembler-not "lea.*0" } } */

Reply via email to