https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115910
Jakub Jelinek <jakub at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Last reconfirmed| |2024-11-29 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> --- So it is comparing cost of uns_insns: (insn 15 0 16 (set (reg:DI 103) (zero_extend:DI (reg/v:SI 101 [ x ]))) "pr115910.c":4:23 -1 (nil)) (insn 16 15 17 (set (reg:DI 105) (const_int 2863311531 [0xaaaaaaab])) "pr115910.c":4:23 -1 (nil)) (insn 17 16 18 (parallel [ (set (reg:DI 104) (mult:DI (reg:DI 103) (reg:DI 105))) (clobber (reg:CC 17 flags)) ]) "pr115910.c":4:23 -1 (nil)) (insn 18 17 19 (parallel [ (set (reg:DI 106) (lshiftrt:DI (reg:DI 104) (const_int 32 [0x20]))) (clobber (reg:CC 17 flags)) ]) "pr115910.c":4:23 -1 (nil)) (insn 19 18 0 (parallel [ (set (reg:SI 102) (lshiftrt:SI (subreg:SI (reg:DI 106) 0) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ]) "pr115910.c":4:23 -1 (expr_list:REG_EQUAL (udiv:SI (reg/v:SI 101 [ x ]) (const_int 3 [0x3])) (nil))) vs. sgn_insns: (insn 20 0 21 (set (reg:DI 107) (sign_extend:DI (reg/v:SI 101 [ x ]))) "pr115910.c":4:23 -1 (nil)) (insn 21 20 22 (parallel [ (set (reg:DI 108) (mult:DI (reg:DI 107) (const_int 1431655766 [0x55555556]))) (clobber (reg:CC 17 flags)) ]) "pr115910.c":4:23 -1 (nil)) (insn 22 21 23 (parallel [ (set (reg:DI 109) (lshiftrt:DI (reg:DI 108) (const_int 32 [0x20]))) (clobber (reg:CC 17 flags)) ]) "pr115910.c":4:23 -1 (nil)) (insn 23 22 24 (parallel [ (set (reg:SI 110) (ashiftrt:SI (reg/v:SI 101 [ x ]) (const_int 31 [0x1f]))) (clobber (reg:CC 17 flags)) ]) "pr115910.c":4:23 -1 (nil)) (insn 24 23 0 (parallel [ (set (reg:SI 102) (minus:SI (subreg:SI (reg:DI 109) 0) (reg:SI 110))) (clobber (reg:CC 17 flags)) ]) "pr115910.c":4:23 -1 (expr_list:REG_EQUAL (div:SI (reg/v:SI 101 [ x ]) (const_int 3 [0x3])) (nil))) Now, when we need to load 0xaaaaaaab into DImode reg, we certainly don't need movabsq, just movl will do. So, I'd say both the comment and the value are wrong: if (x86_64_immediate_operand (x, VOIDmode)) *total = 0; else /* movabsq is slightly more expensive than a simple instruction. */ *total = COSTS_N_INSNS (1) + 1; (note, else is misindented). Perhaps we want else if (x86_64_zext_immediate_operand (x, VOIDmode)) /* movl is expensive like normal instruction. */ *total = COSTS_N_INSNS (1); else /* movabsq is slightly more expensive than a simple instruction. */ *total = COSTS_N_INSNS (1) + 1; ? Or perhaps for speed even make the zext load slightly cheaper? Note, just COSTS_N_INSNS (1) wouldn't fix it this testcase, uns cost is 30 and sgn cost is 28. COSTS_N_INSNS (1) is 4, and the costs we get for the unsigned sequence are 1 (zero extend, 9 (the movabsq which is just movl), 12 (mul), 4 (one right shift), 4 (another), while for the signed sequence are 4 (sign extend), 12 (mul), 4 (logical right shift), 4 (arithmetic right shift), 4 (minus). >From this POV, I'd say the movl definitely shouldn't have higher cost than 4 aka COSTS_N_INSNS (1), it certainly isn't slower than subtraction or right shift. At least when it is the sole SET_SRC of a simple insn which sets the destination register. Now, the way rtx_cost works is for most rtxes like CONST_INT, it sets total to factor * COSTS_N_INSNS (1) aka 4 and then let's the backend override it. The reason we get 9 is that the above spot sets *total to COSTS_N_INSNS (1) + 1 and then 21839 if (CONSTANT_P (SET_SRC (x))) 21840 /* Constant costs assume a base value of COSTS_N_INSNS (1) and add 21841 a small value, possibly zero for cheap constants. */ 21842 src_cost += COSTS_N_INSNS (1); in the SET handling adds 4 to that. So dunno how to best express that x86_64_zext_immediate_operand (x, VOIDmode) constants when in just a SET cost the same as normal cheap instructions but if it appears as operand of something more complex, it isn't for free but will most likely need another cheap instructions to load the constant. If we get the movl cost to 4, then it will be 1+4+12+4+4=25 vs. 4+12+4+4+4=28 and we'll pick the cheaper sequence. And another thing, wonder why we emit the 2 shifts, first DImode shift by 32 and then SImode shift by 1, that surely can be combined into a single DImode shift by 33. Combine will do it, but wonder if we could improve it already during expansion, such that the costing would see that. Note, for movabsq proper perhaps the cost of 2 cheap instructions plus epsilon aka COSTS_N_INSNS (2) + 1 is right, dunno. For -Os costs it is roughly twice as large as movl.