Hi there. The architecture I'm working on is a 32-bit, word-based machine with a 16x16 -> 32 unsigned multiply. For some reason the combine stage is converting the umulhisi3 into a mulsi3, and I'm not sure how to track this down.
The test code is part of an alpha blend:
#include <stdint.h>

void blend(uint8_t* sb, uint8_t* db)
{
    uint16_t ia = 256 - *sb;   /* inverse alpha, 1..256 */
    uint16_t d = *db;
    *db = ((d * ia) >> 8) + *sb;
}
I've defined both multiplies in the .md file:
(define_insn "umulhisi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (mult:SI (zero_extend:SI
                   (match_operand:HI 1 "register_operand" "%r"))
                 (zero_extend:SI
                   (match_operand:HI 2 "register_operand" "r"))))]
  ""
  ...
(define_insn "mulsi3"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (mult:SI (match_operand:SI 1 "register_operand" "%r")
                 (match_operand:SI 2 "register_operand" "r")))]
  ""
  ...
Compiling with -O gives the following in
umul.157r.outof_cfglayout, just before the combine stage:
---
(insn 3 6 4 2 umul.c:16 (set (reg/v/f:SI 28 [ sb ])
(reg:SI 0 R10 [ sb ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 0
R10 [ sb ])
(nil)))
(insn 4 3 5 2 umul.c:16 (set (reg/v/f:SI 29 [ db ])
(reg:SI 1 R11 [ db ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 1
R11 [ db ])
(nil)))
(note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 5 9 2 umul.c:17 (set (reg:SI 26 [ D.1217 ])
(zero_extend:SI (mem:QI (reg/v/f:SI 28 [ sb ]) [0 S1 A8]))) 27
{zero_extendqisi2} (expr_list:REG_DEAD (reg/v/f:SI 28 [ sb ])
(nil)))
(insn 9 8 10 2 umul.c:20 (set (reg:HI 30)
(const_int 256 [0x100])) 1 {movhi_insn} (nil))
(insn 10 9 11 2 umul.c:20 (set (reg:SI 31)
(minus:SI (subreg:SI (reg:HI 30) 0)
(reg:SI 26 [ D.1217 ]))) 12 {subsi3} (expr_list:REG_DEAD (reg:HI 30)
(nil)))
(insn 11 10 12 2 umul.c:20 (set (reg:SI 33)
(zero_extend:SI (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8]))) 27
{zero_extendqisi2} (nil))
(insn 12 11 13 2 umul.c:20 (set (reg:HI 32)
(subreg:HI (reg:SI 33) 0)) 1 {movhi_insn} (expr_list:REG_DEAD
(reg:SI 33)
(nil)))
(insn 13 12 14 2 umul.c:20 (set (reg:SI 34)
(mult:SI (zero_extend:SI (reg:HI 32))
(zero_extend:SI (subreg:HI (reg:SI 31) 0)))) 14
{umulhisi3} (expr_list:REG_DEAD (reg:HI 32)
(expr_list:REG_DEAD (reg:SI 31)
(nil))))
(insn 14 13 15 2 umul.c:20 (set (reg:SI 35)
(ashiftrt:SI (reg:SI 34)
(const_int 8 [0x8]))) 21 {ashrsi3_const}
(expr_list:REG_DEAD (reg:SI 34)
(nil)))
(insn 15 14 16 2 umul.c:20 (set (reg:QI 36)
(subreg:QI (reg:SI 35) 0)) 0 {movqi_insn} (expr_list:REG_DEAD
(reg:SI 35)
(nil)))
(insn 16 15 17 2 umul.c:20 (set (reg:SI 37)
(plus:SI (reg:SI 26 [ D.1217 ])
(subreg:SI (reg:QI 36) 0))) 11 {addsi3}
(expr_list:REG_DEAD (reg:QI 36)
(expr_list:REG_DEAD (reg:SI 26 [ D.1217 ])
(nil))))
(insn 17 16 0 2 umul.c:20 (set (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8])
(subreg:QI (reg:SI 37) 0)) 0 {movqi_insn} (expr_list:REG_DEAD
(reg:SI 37)
(expr_list:REG_DEAD (reg/v/f:SI 29 [ db ])
(nil))))
---
The umulhisi3 has been correctly found and used at this stage. In the
following combine stage however, it gets converted into a mulsi3. The
.combine dump is attached.
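
As a sanity check on the arithmetic, here is a minimal sketch in C of why either pattern is a legal match for the multiply in blend(). It assumes the operand ranges implied by the test code (d in 0..255, ia in 1..256). The helper names model_umulhisi3, model_mulsi3 and count_mismatches are hypothetical: they only model the two patterns on the host and are not part of the port.

```c
#include <stdint.h>

/* Hypothetical host-side model of umulhisi3:
   zero-extend two 16-bit operands, then multiply to a 32-bit result. */
static uint32_t model_umulhisi3(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Hypothetical host-side model of mulsi3:
   32x32 multiply keeping the low 32 bits. */
static uint32_t model_mulsi3(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Count the operand pairs that blend() can produce for which the
   two models disagree. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;

    for (uint32_t s = 0; s < 256; s++) {
        uint16_t ia = (uint16_t)(256 - s);      /* 1..256 */

        for (uint32_t d = 0; d < 256; d++) {
            if (model_umulhisi3((uint16_t)d, ia) != model_mulsi3(d, ia))
                mismatches++;
        }
    }
    return mismatches;
}
```

Both operands fit in 16 bits, so the two models agree for every pair and count_mismatches() returns 0. That would suggest the substitution is a question of which pattern combine prefers, not of correctness.
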
The xtensa port is the closest match I can find, as it is 32-bit, word-based,
and has a umulhisi3 pattern. It correctly keeps the 16-bit multiply.
Some other test cases like:
uint32_t mul(uint16_t a, uint16_t b)
{
    return a*b;
}
come through fine. It might be something to do with the memory access.
How does the combine stage work? It looks like it could get multiple
potential matches for a set of RTLs. Does it use some type of costing
function to pick between them? Can I tell combine that a umulhisi3 is
cheaper than a mulsi3?
Thanks for the earlier help on the post reload split to use the
accumulator - it's working well.
-- Michael
Attachment: umul.i.159r.combine
