3 with a range for (unsigned)x that does not have the sign bit set seems to produce much worse code

jakub at gcc dot gnu.org via Gcc-bugs Fri, 29 Nov 2024 08:40:50 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115910


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-11-29
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So it is comparing cost of uns_insns:
(insn 15 0 16 (set (reg:DI 103)
        (zero_extend:DI (reg/v:SI 101 [ x ]))) "pr115910.c":4:23 -1
     (nil))

(insn 16 15 17 (set (reg:DI 105)
        (const_int 2863311531 [0xaaaaaaab])) "pr115910.c":4:23 -1
     (nil))

(insn 17 16 18 (parallel [
            (set (reg:DI 104)
                (mult:DI (reg:DI 103)
                    (reg:DI 105)))
            (clobber (reg:CC 17 flags))
        ]) "pr115910.c":4:23 -1
     (nil))

(insn 18 17 19 (parallel [
            (set (reg:DI 106)
                (lshiftrt:DI (reg:DI 104)
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "pr115910.c":4:23 -1
     (nil))

(insn 19 18 0 (parallel [
            (set (reg:SI 102)
                (lshiftrt:SI (subreg:SI (reg:DI 106) 0)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) "pr115910.c":4:23 -1
     (expr_list:REG_EQUAL (udiv:SI (reg/v:SI 101 [ x ])
            (const_int 3 [0x3]))
        (nil)))
vs. sgn_insns:
(insn 20 0 21 (set (reg:DI 107)
        (sign_extend:DI (reg/v:SI 101 [ x ]))) "pr115910.c":4:23 -1
     (nil))

(insn 21 20 22 (parallel [
            (set (reg:DI 108)
                (mult:DI (reg:DI 107)
                    (const_int 1431655766 [0x55555556])))
            (clobber (reg:CC 17 flags))
        ]) "pr115910.c":4:23 -1
     (nil))

(insn 22 21 23 (parallel [
            (set (reg:DI 109)
                (lshiftrt:DI (reg:DI 108)
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "pr115910.c":4:23 -1
     (nil))

(insn 23 22 24 (parallel [
            (set (reg:SI 110)
                (ashiftrt:SI (reg/v:SI 101 [ x ])
                    (const_int 31 [0x1f])))
            (clobber (reg:CC 17 flags))
        ]) "pr115910.c":4:23 -1
     (nil))

(insn 24 23 0 (parallel [
            (set (reg:SI 102)
                (minus:SI (subreg:SI (reg:DI 109) 0)
                    (reg:SI 110)))
            (clobber (reg:CC 17 flags))
        ]) "pr115910.c":4:23 -1
     (expr_list:REG_EQUAL (div:SI (reg/v:SI 101 [ x ])
            (const_int 3 [0x3]))
        (nil)))

Now, when we need to load 0xaaaaaaab into DImode reg, we certainly don't need
movabsq, just movl will do.  So, I'd say both the comment and the value are
wrong:
      if (x86_64_immediate_operand (x, VOIDmode))
        *total = 0;
     else
        /* movabsq is slightly more expensive than a simple instruction. */
        *total = COSTS_N_INSNS (1) + 1;
(note, else is misindented).  Perhaps we want
      else if (x86_64_zext_immediate_operand (x, VOIDmode))
        /* movl is expensive like normal instruction.  */
        *total = COSTS_N_INSNS (1);
      else
        /* movabsq is slightly more expensive than a simple instruction. */
        *total = COSTS_N_INSNS (1) + 1;
?  Or perhaps for speed even make the zext load slightly cheaper?
Note, just COSTS_N_INSNS (1) wouldn't fix it this testcase, uns cost is 30 and
sgn cost is 28.

COSTS_N_INSNS (1) is 4, and the costs we get for the unsigned sequence are
1 (zero extend, 9 (the movabsq which is just movl), 12 (mul), 4 (one right
shift), 4 (another), while for the signed sequence are
4 (sign extend), 12 (mul), 4 (logical right shift), 4 (arithmetic right shift),
4 (minus).
>From this POV, I'd say the movl definitely shouldn't have higher cost than 4
aka COSTS_N_INSNS (1), it certainly isn't slower than subtraction or right
shift.
At least when it is the sole SET_SRC of a simple insn which sets the
destination register.  Now, the way rtx_cost works is for most rtxes like
CONST_INT, it sets
total to factor * COSTS_N_INSNS (1) aka 4 and then let's the backend override
it.
The reason we get 9 is that the above spot sets *total to COSTS_N_INSNS (1) + 1
and then
21839             if (CONSTANT_P (SET_SRC (x)))
21840               /* Constant costs assume a base value of COSTS_N_INSNS (1)
and add
21841                  a small value, possibly zero for cheap constants.  */
21842               src_cost += COSTS_N_INSNS (1);
in the SET handling adds 4 to that.
So dunno how to best express that x86_64_zext_immediate_operand (x, VOIDmode)
constants when in just a SET cost the same as normal cheap instructions but if
it appears as operand of something more complex, it isn't for free but will
most likely need another cheap instructions to load the constant.  If we get
the movl cost to 4, then
it will be 1+4+12+4+4=25 vs. 4+12+4+4+4=28 and we'll pick the cheaper sequence.

And another thing, wonder why we emit the 2 shifts, first DImode shift by 32
and then SImode shift by 1, that surely can be combined into a single DImode
shift by 33.  Combine will do it, but wonder if we could improve it already
during expansion, such that the costing would see that.

Note, for movabsq proper perhaps the cost of 2 cheap instructions plus epsilon
aka COSTS_N_INSNS (2) + 1 is right, dunno.  For -Os costs it is roughly twice
as large as movl.

[Bug target/115910] [15 Regression] ((unsigned)x)/3 with a range for (unsigned)x that does not have the sign bit set seems to produce much worse code

Reply via email to