Splitting an immediate using define_expand
Hi,

I am trying to separate move immediates that require more than 16 bits into two instructions that set the high and low 16 bits of a register respectively. Here is my define_expand:

    (define_expand "movsi"
      [(set (match_operand:SI 0 "nonimmediate_operand" "")
            (match_operand:SI 1 "general_operand" ""))]
      ""
      {
        if(GET_CODE(operands[0]) != REG){
          operands[1] = force_reg(SImode, operands[1]);
        }
        else if(CONSTANT_P(operands[1])
                && (-0x8000 > INTVAL(operands[1])
                    || INTVAL(operands[1]) > 0x7FFF)){
          emit_move_insn(gen_rtx_SUBREG(HImode, operands[0], 0),
                         gen_rtx_TRUNCATE(HImode, operands[1]));
          emit_move_insn(gen_rtx_SUBREG(HImode, operands[0], 2),
                         gen_rtx_TRUNCATE(HImode,
                                          gen_rtx_LSHIFTRT(SImode, operands[1], 16)));
          DONE;
        }
      }
    )

and here are the patterns I intend to match:

    (define_insn "movsi_subreg0"
      [(set (subreg:HI (match_operand:SI 0 "register_operand" "=r") 0)
            (truncate:HI (match_operand:SI 1 "immediate_operand" "i")))]
      ""
      "%0.l = #LO(%1)"
    )

    (define_insn "movsi_subreg1"
      [(set (subreg:HI (match_operand:SI 0 "register_operand" "=r") 2)
            (truncate:HI (lshiftrt:SI (match_operand:SI 1 "immediate_operand" "i")
                                      (const_int 16))))]
      ""
      "%0.h = #HI(%1)"
    )

When I try to build, the cross compiler gives the following error:

    ./crtstuff.c: In function `__do_global_dtors_aux':
    ./crtstuff.c:261: internal compiler error: in expand_simple_unop, at optabs.c:2291

Why does this not work? I think the truncate expression is the problem. Is this not the way to use truncate?

Thank you,
Charles J. Tabony
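For reference, the arithmetic that the two instructions are meant to perform can be modelled in plain C. This is only a sketch of the intended split, not GCC code. The helper names needs_split, lo16, hi16 and join16 are made up for illustration, and the range test mirrors the one written in the define_expand above.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when v does not fit in a signed 16-bit immediate.
   Same range test as in the define_expand (hypothetical helper). */
static int needs_split(int32_t v)
{
    return v < -0x8000 || v > 0x7FFF;
}

/* Value intended for "%0.l = #LO(%1)": the low 16 bits. */
static uint16_t lo16(int32_t v)
{
    return (uint16_t)((uint32_t)v & 0xFFFFu);
}

/* Value intended for "%0.h = #HI(%1)": the high 16 bits. */
static uint16_t hi16(int32_t v)
{
    return (uint16_t)((uint32_t)v >> 16);
}

/* Register contents after both halves have been written. */
static uint32_t join16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}
```

For example, 0x12345678 needs the split, with hi16 giving 0x1234 and lo16 giving 0x5678, while 0x7FFF and -0x8000 still fit in a single 16-bit immediate move.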
scheduling insn on none
Hi,

I am trying to add instruction scheduling to a machine description. I added everything I think I need and the .dfa looks right to me, but when I compile with -fsched-verbose=10 I get something that looks like this:

    ;; ==
    ;; -- basic block 0 from 9 to 83 -- before reload
    ;; ==
    ;; --- forward dependences:
    ;; --- Region Dependences --- b 0 bb 0
    ;;   insn  code    bb   dep  prio  cost  blockage  units
    ;;   -- --- -
    ;;      9    17     0     0    12     1    0 - 0   none : 83 10
    ;;     10    62     0     1    11     1    0 - 0   none : 83 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12
    ;;     12    36     0     1    10     1    0 - 0   none : 83 13
    ;;     13    17     0     2     9     1    0 - 0   none : 83 33
    ;;     14    36     0     1    10     1    0 - 0   none : 83 15
    ;;     15    17     0     2     9     1    0 - 0   none : 83 33
    ;;     16    36     0     1    10     1    0 - 0   none : 83 17
    (snip)
    ;; Ready list after queue_to_ready: 19 32 17 15 13 30 28 26 24 22 20
    ;; Ready list after ready_sort: 32 19 17 15 13 30 28 26 24 22 20
    ;; Ready list (t = 6): 32 19 17 15 13 30 28 26 24 22 20
    ;; --> scheduling insn <<<20>>> on unit none
    ;; dependences resolved: insn 21 into queue with cost=1
    ;; Ready-->Q: insn 21: queued for 1 cycles.
    ;; Ready list (t = 6): 32 19 17 15 13 30 28 26 24 22
    ;; Q-->Ready: insn 21: moving to ready without stalls
    ;; Ready list after queue_to_ready: 21 32 19 17 15 13 30 28 26 24 22
    ;; Ready list after ready_sort: 32 21 19 17 15 13 30 28 26 24 22
    ;; Ready list (t = 7): 32 21 19 17 15 13 30 28 26 24 22
    ;; --> scheduling insn <<<22>>> on unit none
    ;; dependences resolved: insn 23 into queue with cost=1
    ;; Ready-->Q: insn 23: queued for 1 cycles.
    ;; Ready list (t = 7): 32 21 19 17 15 13 30 28 26 24
    ;; Q-->Ready: insn 23: moving to ready without stalls
    (snip)

What does it mean by "unit none"? Has anyone else encountered this problem? What more information do you think you would need to diagnose it? The port I am working on came from gcc 3.4.2.

Thank you,
Charles
splitting load immediates using high and lo_sum
Hi,

I am working on a port for a processor that has 32 bit registers but can only load 16 bit immediates. I have tried several ways to split moves with larger immediates into two RTL insns. One is using a define_expand:

-code---
(define_expand "movsi"
  [(set (match_operand:SI 0 "nonimmediate_operand" "")
        (match_operand:SI 1 "general_operand" ""))]
  ""
  {
    if (GET_CODE(operands[0]) != REG) {
      operands[1] = force_reg(SImode, operands[1]);
    }
    else if(s17to32_const_int_operand(operands[1], GET_MODE(operands[1]))){
      emit_move_insn(operands[0],
                     gen_rtx_HIGH(GET_MODE(operands[1]), operands[1]));
      emit_move_insn(operands[0],
                     gen_rtx_LO_SUM(GET_MODE(operands[1]), operands[0],
                                    operands[1]));
      DONE;
    }
  })
/code---

With the corresponding define_insns:

-code---
(define_insn "movsi_high"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (high:SI (match_operand:SI 1 "immediate_operand" "i")))]
  ""
  "%0.h = #HI(%1)")

(define_insn "movsi_lo_sum"
  [(set (match_operand:SI 0 "register_operand" "+r")
        (lo_sum:SI (match_dup 0)
                   (match_operand:SI 1 "immediate_operand" "i")))]
  ""
  "%0.l = #LO(%1)")
/code---

but using this method, I get the following error:

-error---
./libgcc2.c:470: error: unrecognizable insn:
(insn 100 99 86 0 ./libgcc2.c:464 (set (reg:SI 10 r10)
        (lo_sum (reg:SI 10 r10)
            (const_int 65536 [0x1]))) -1 (nil)
    (nil))
./libgcc2.c:470: internal compiler error: in extract_insn, at recog.c:2083
/error---

Why would that RTL not match my movsi_lo_sum define_insn?

I also tried using a define_split:

-code---
(define_split
  [(set (match_operand:SI 0 "register_operand" "")
        (match_operand:SI 1 "s17to32_const_int_operand" ""))]
  "reload_completed"
  [(set (match_dup 0)
        (high:SI (match_dup 1)))
   (set (match_dup 0)
        (lo_sum:SI (match_dup 0)
                   (match_dup 1)))]
  "")
/code---

along with the same define_insns, but then I get the following error:

-error---
./crtstuff.c:288: error: insn does not satisfy its constraints:
(insn 103 12 11 0 (set (reg:SI 0 r0)
        (symbol_ref/u:SI ("*.LC0") [flags 0x2])) 18 {movsi_real} (nil)
    (nil))
./crtstuff.c:288: internal compiler error: in reload_cse_simplify_operands, at postreload.c:378
/error---

In other words, that RTL never matches my define_split, even though I placed it before the more general movsi define_insn and s17to32_const_int_operand should return 1 for a symbol_ref.

Do you have any idea why either of these attempts does not work? Which method do you think is better?

In case you were wondering, here is the code for s17to32_const_int_operand. I modified the function int_2word_operand from the frv port.

-code---
int
s17to32_const_int_operand(rtx op, enum machine_mode mode ATTRIBUTE_UNUSED)
{
  HOST_WIDE_INT value;
  REAL_VALUE_TYPE rv;
  long l;

  switch (GET_CODE (op))
    {
    default:
      break;

    case LABEL_REF:
    case SYMBOL_REF:
    case CONST:
      return 1;

    case CONST_INT:
      return ! IN_RANGE_P (INTVAL (op), -0x8000, 0x7FFF);

    case CONST_DOUBLE:
      if (GET_MODE (op) == SFmode)
        {
          REAL_VALUE_FROM_CONST_DOUBLE (rv, op);
          REAL_VALUE_TO_TARGET_SINGLE (rv, l);
          value = l;
          return ! IN_RANGE_P (value, -0x8000, 0x7FFF);
        }
      else if (GET_MODE (op) == VOIDmode)
        {
          value = CONST_DOUBLE_LOW (op);
          return ! IN_RANGE_P (value, -0x8000, 0x7FFF);
        }
      break;
    }

  return 0;
}
/code---

Thank you,
Charles J. Tabony
RE: splitting load immediates using high and lo_sum
> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
>
> On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:
>
> > Hi,
> >
> > I am working on a port for a processor that has 32 bit registers but
> > can only load 16 bit immediates.
> >   ""
> >   "%0.h = #HI(%1)")
>
> What are the semantics of this?  Low bits zeroed, or untouched?
> If the former, your semantics are identical to Sparc; look at that.

The low bits are untouched. However, I would expect the compiler to always follow setting the high bits with setting the low bits. The point of splitting them is that I want the insn setting the high bits to be scheduled in a vliw packet with the preceding insns and the insn setting the low bits to be scheduled with the following insns whenever possible. The processor cannot execute both instructions in the same cycle.

-Charles
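The "untouched" semantics described in this reply can be modelled in plain C. This is a sketch rather than the processor's specification: the register is modelled as a uint32_t, and the function names write_high_half and write_low_half are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "R.h = #HI(c)": replace the high 16 bits of the register
   and leave the low 16 bits as they were. */
static uint32_t write_high_half(uint32_t reg, uint32_t c)
{
    return (c & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}

/* Model of "R.l = #LO(c)": replace the low 16 bits of the register
   and leave the high 16 bits as they were. */
static uint32_t write_low_half(uint32_t reg, uint32_t c)
{
    return (reg & 0xFFFF0000u) | (c & 0x0000FFFFu);
}
```

In this model the two writes can be applied in either order and the register ends up holding the full constant. After only one of them, the register still holds half of its previous contents.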
RE: splitting load immediates using high and lo_sum
> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
>
> On Jul 21, 2005, at 5:04 PM, Tabony, Charles wrote:
>
> >> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
> >>
> >> On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:
> >>
> >>> Hi,
> >>>
> >>> I am working on a port for a processor that has 32 bit registers but
> >>> can only load 16 bit immediates.
> >>>   ""
> >>>   "%0.h = #HI(%1)")
> >>
> >> What are the semantics of this?  Low bits zeroed, or untouched?
> >> If the former, your semantics are identical to Sparc; look at that.
> >
> > The low bits are untouched.  However, I would expect the compiler to
> > always follow setting the high bits with setting the low bits.
>
> OK, if you're willing to accept that limitation (your architecture could
> handle putting the LO first, which Sparc can't) then Sparc is still a
> good model to look at.  What it does should work for you.

Aha! I looked at the SPARC code and distilled it down to what I needed. The difference is that it sets the mode of the high and lo_sum expressions to the mode of operand 0, while I was setting it to the mode of operand 1. Now mine works great.

Thank you,
Charles J. Tabony
RE: splitting load immediates using high and lo_sum
> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
>
> On Jul 21, 2005, at 5:04 PM, Tabony, Charles wrote:
>
> >> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
> >>
> >> On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:
> >>
> >>> Hi,
> >>>
> >>> I am working on a port for a processor that has 32 bit registers but
> >>> can only load 16 bit immediates.
> >>>   ""
> >>>   "%0.h = #HI(%1)")
> >>
> >> What are the semantics of this?  Low bits zeroed, or untouched?
> >> If the former, your semantics are identical to Sparc; look at that.
> >
> > The low bits are untouched.  However, I would expect the compiler to
> > always follow setting the high bits with setting the low bits.
>
> OK, if you're willing to accept that limitation (your architecture could
> handle putting the LO first, which Sparc can't) then Sparc is still a
> good model to look at.  What it does should work for you.

Earlier I was able to successfully split load immediates into high and lo_sum insns, and that has worked great as far as scheduling goes. However, I noticed that now, instead of loading the address of a constant such as a string, compiled programs will load the address of a constant that is the address of that string and then dereference it. My guess is that this is caused by the constant in the high/lo_sum pair being hidden from CSE.

I looked at the way SPARC and MIPS handle the problem, but I don't think that will work for me. If I understand correctly, they split the move into a load immediate that has the lower bits cleared, corresponding to a sethi or lui instruction, and an ior immediate. The semantics of the instructions I am working with, "R0.H = #HI(CONSTANT)" and "R0.L = #LO(CONSTANT)", are that the half of the register not being set is unmodified.

Since I cannot use an ior immediate like SPARC and MIPS, how can I split move immediate insns so that they can be efficiently scheduled but still eliminate the unnecessary indirection?

Also, does the method used by SPARC and MIPS work for symbols?

Thank you,
Charles
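The SPARC/MIPS-style sequence described above can be modelled in plain C for comparison. This is a sketch under a simplifying assumption: a 16/16 split is used, as with MIPS lui/ori, whereas SPARC's sethi sets the upper 22 bits. The function names are hypothetical and this is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* First insn (lui/sethi style): load the high part with the low half
   cleared.  The result depends only on the constant. */
static uint32_t load_high_clear_low(uint32_t c)
{
    return c & 0xFFFF0000u;
}

/* Second insn (ior immediate): OR the low half of the constant into
   the register produced by the first insn. */
static uint32_t or_low(uint32_t reg, uint32_t c)
{
    return reg | (c & 0x0000FFFFu);
}

/* The full two-insn sequence. */
static uint32_t load_const(uint32_t c)
{
    return or_low(load_high_clear_low(c), c);
}
```

In this model the value after the first instruction is a function of the constant alone. With the half-register writes described in the post, the value after the first instruction also depends on what the register held before.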
using recog_data.operand in ASM_OUTPUT_OPCODE
Hi,

I am trying to use recog_data.operand in ASM_OUTPUT_OPCODE to access the operands of the current insn for printing, as the documentation for ASM_OUTPUT_OPCODE suggests. However, this does not work for printing inline assembly because asm insns are never matched. How can I distinguish recognized from unrecognized insns in ASM_OUTPUT_OPCODE?

Thank you,
Charles