I have a fictitious machine which has a word size of 8 bits but can handle
16-bit adds and 16-bit mov's. I am trying to build the most efficient support
for handling an addsi3 insn. My problem is that if I try to split the
addsi3 insn into a couple of addhi3 insns (using a define_expand template), the
compiler appears to ignore this declaration and proceeds to implement addsi3 as
a bunch of addqi's along with some carry propagation rtx's, i.e. the compiler
defaults to the word size of the machine and I can't seem to override this.
I could let it create its long list of addqi's etc. and then use
some insn-combining method such as a peephole optimizer, but this seems really
inefficient to me, especially when I can explicitly state how a larger insn
should be split.
If I use the following addsi3 template:
(define_insn "addsi3"
  [(set (match_operand:SI 0 "general_operand" "=g")
        (plus:SI (match_operand:SI 1 "general_operand" "g")
                 (match_operand:SI 2 "general_operand" "g")))]
  ""
  "addsi3 %1 %2 %0 ;(%1 plus %2)->%0")
I can observe addsi3 being used in the assembly output of my test case.
If I use:
(define_expand "addsi"
  [(set (match_operand:SI 0 "general_operand" "=g")
        (plus:SI (match_operand:SI 1 "general_operand" "g")
                 (match_operand:SI 2 "general_operand" "g")))]
  ""
  "{
    emit_insn (gen_addhi3 (custom_subword (operands[0], 0, SImode),
                           custom_subword (operands[1], 0, SImode),
                           custom_subword (operands[2], 0, SImode)));
    emit_insn (gen_addhi3 (custom_subword (operands[0], 1, SImode),
                           custom_subword (operands[1], 1, SImode),
                           custom_subword (operands[2], 1, SImode)));
    DONE;
  }")
the output becomes a mess of addqi, cmpqi, and branches.
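For reference, here is a minimal sketch in plain C of the arithmetic that a split of one 32-bit add into two 16-bit adds has to reproduce. It is only an illustration under stated assumptions: the function name add32_via_16 is hypothetical, it is not part of the machine description, and it says nothing about which half custom_subword returns for index 0 or 1. The point it shows is that the add of the high halves also needs the carry out of the add of the low halves.

```c
#include <stdint.h>

/* Hypothetical helper (not from the machine description): add two 32-bit
   values using only 16-bit additions plus an explicit carry. */
uint32_t add32_via_16(uint32_t a, uint32_t b)
{
    /* Split each operand into its low and high 16-bit halves. */
    uint16_t a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)(b & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16);

    /* First 16-bit add: low halves, truncated to 16 bits. */
    uint16_t lo = (uint16_t)(a_lo + b_lo);

    /* If the truncated sum is smaller than an operand, the add wrapped,
       so there is a carry out of the low half. */
    uint16_t carry = (uint16_t)(lo < a_lo);

    /* Second 16-bit add: high halves plus the carry from the low add. */
    uint16_t hi = (uint16_t)(a_hi + b_hi + carry);

    /* Reassemble the 32-bit result. */
    return ((uint32_t)hi << 16) | lo;
}
```

For example, add32_via_16(0x0001FFFF, 1) yields 0x00020000: the low add wraps to 0x0000 and the carry bumps the high half from 0x0001 to 0x0002.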
Any help would be great.
Thanks
Marty