This patch fixes a subtle bug in the depths of GCC's synth_mult,
where the middle-end queries whether (how well) the target supports
widening and highpart multiplications by calling targetm.rtx_costs.
The code in init_expmed and init_expmed_one_mode iterates over various
RTL patterns querying the cost of each. To avoid generating & garbage
collecting too much junk, it reuses the same RTL over and over, but
adjusting the modes between each call.
Alas this reuse of state is a little fragile, and at some point a
change to init_expmed_one_conv has resulted in the state (mode of
a register) being changed, but not reset before being used again.
The fallout is that GCC currently queries the backend for the cost
of non-sense RTL such as:
(mult:HI (zero_extend:HI (reg:TI 82))
(zero_extend:HI (reg:TI 82)))
and
(lshiftrt:HI (mult:HI (zero_extend:HI (reg:TI 82))
(zero_extend:HI (reg:TI 82)))
(const_int 8 [0x8]))
The fix is to set the mode of the register back to its assumed
state, as (reg:QI 82) in the above patterns makes much more sense.
Using the old software engineering/defensive programming maxim of
"why fix a bug just once, if it can be fixed in multiple places",
this patch both restores the original value in init_expmed_one_conv,
and also sets it to the expected value in init_expmed_one_mode.
This should hopefully signal the need to be careful of invariants for
anyone modifying this code in future.
Alas things are rarely simple... Fixing this obviously incorrect
logic causes a failure of gcc.target/i386/pr71321.c that tests for
a particular expansion from synth_mult. The issue here is that this
test is checking for a specific multiplication expansion, when it
should really be checking that we don't generate the inefficient
"leal 0(,%rax,4), %edx" forms that were produced in GCC v6, as
reported in the PR target/71321. Now that we use correct costs,
GCC uses a multiply instruction matching icc, LLVM and GCC prior
to v4.8. I've even microbenchmarked this function (over several
minutes) with (disappointingly) no difference in timings. Three
dependent leas has 3-cycle latency, exactly the same as a widening
byte multiply (on Haswell), so the shorter code splits the tie.
[I have a follow-up patch that may improve things further].
Before:
movzbl %dil, %eax
leal (%rax,%rax,4), %edx
leal (%rax,%rdx,8), %edx
leal (%rdx,%rdx,4), %edx
shrw $11, %dx
leal (%rdx,%rdx,4), %eax
...
After:
movl $-51, %edx
movl %edx, %eax
mulb %dil
movl %eax, %edx
shrw $11, %dx
leal (%rdx,%rdx,4), %eax
...
This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures.
Ok for mainline?
2020-08-09 Roger Sayle <[email protected]>
gcc/ChangeLog
* expmed.c (init_expmed_one_conv): Restore all->reg's mode.
(init_expmed_one_mode): Set all->reg to desired mode.
gcc/testsuite/ChangeLog
PR target/71321
* gcc.target/i386/pr71321.c: Check that the code doesn't use
the 4B zero displacement lea, not that it uses lea.
Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3d2d234..d34f0fb 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -155,6 +155,8 @@ init_expmed_one_conv (struct init_expmed_rtl *all,
scalar_int_mode to_mode,
PUT_MODE (all->reg, from_mode);
set_convert_cost (to_mode, from_mode, speed,
set_src_cost (which, to_mode, speed));
+ /* Restore all->reg's mode. */
+ PUT_MODE (all->reg, to_mode);
}
static void
@@ -229,6 +231,7 @@ init_expmed_one_mode (struct init_expmed_rtl *all,
if (GET_MODE_CLASS (int_mode_to) == MODE_INT
&& GET_MODE_WIDER_MODE (int_mode_to).exists (&wider_mode))
{
+ PUT_MODE (all->reg, mode);
PUT_MODE (all->zext, wider_mode);
PUT_MODE (all->wide_mult, wider_mode);
PUT_MODE (all->wide_lshr, wider_mode);
diff --git a/gcc/testsuite/gcc.target/i386/pr71321.c
b/gcc/testsuite/gcc.target/i386/pr71321.c
index 86ad812..24d144b 100644
--- a/gcc/testsuite/gcc.target/i386/pr71321.c
+++ b/gcc/testsuite/gcc.target/i386/pr71321.c
@@ -12,5 +12,4 @@ unsigned cvt_to_2digit_ascii(uint8_t i)
{
return cvt_to_2digit(i, 10) + 0x0a3030;
}
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3
} } */
-/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1
} } */
+/* { dg-final { scan-assembler-not "lea.*0" } } */