Splitting an immediate using define_expand

2005-07-14 Thread Tabony, Charles
Hi,

I am trying to separate move immediates that require more than 16 bits
into two instructions that set the high and low 16 bits of a register
respectively.  Here is my define_expand:

(define_expand "movsi"
  [(set (match_operand:SI 0 "nonimmediate_operand" "")
(match_operand:SI 1 "general_operand" ""))]
  ""
  {
if(GET_CODE(operands[0]) != REG){
  operands[1] = force_reg(SImode, operands[1]);
}
else if(CONSTANT_P(operands[1])
&& (-0x8000 > INTVAL(operands[1]) || INTVAL(operands[1]) >
0x7FFF)){
  emit_move_insn(gen_rtx_SUBREG(HImode, operands[0], 0),
 gen_rtx_TRUNCATE(HImode, operands[1]));
  emit_move_insn(gen_rtx_SUBREG(HImode, operands[0], 2),
 gen_rtx_TRUNCATE(HImode, gen_rtx_LSHIFTRT(SImode,
 
operands[1],
   16)));
  DONE;
}
  }
)

and here are the patterns I intend to match:

(define_insn "movsi_subreg0"
  [(set (subreg:HI (match_operand:SI 0 "register_operand" "=r") 0)
(truncate:HI (match_operand:SI 1 "immediate_operand" "i")))]
  ""
  "%0.l = #LO(%1)"
)

(define_insn "movsi_subreg1"
  [(set (subreg:HI (match_operand:SI 0 "register_operand" "=r") 2)
(truncate:HI (lshiftrt:SI (match_operand:SI 1
"immediate_operand" "i")
  (const_int 16]
  ""
  "%0.h = #HI(%1)"
)

When I try to build, the cross compiler gives the following error:

./crtstuff.c: In function `__do_global_dtors_aux':
./crtstuff.c:261: internal compiler error: in expand_simple_unop, at
optabs.c:2291

Why does this not work?  I think the truncate expression is the problem.
Is this not the way to use truncate?

Thank you,
Charles J. Tabony



scheduling insn on none

2005-07-15 Thread Tabony, Charles
Hi,

I am trying to add instruction scheduling to a machine description.  I
added everything I think I need and the .dfa looks right to me, but when
I compile with -fsched-verbose=10 I get something that looks like this:

;;   ==
;;   -- basic block 0 from 9 to 83 -- before reload
;;   ==

;;   --- forward dependences:  

;;   --- Region Dependences --- b 0 bb 0 
;;  insn  codebb   dep  prio  cost   blockage units
;;    --   ---        -
;;917 0 012 10 -  0   none  : 83 10 
;;   1062 0 111 10 -  0   none  : 83 33
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 
;;   1236 0 110 10 -  0   none  : 83 13 
;;   1317 0 2 9 10 -  0   none  : 83 33 
;;   1436 0 110 10 -  0   none  : 83 15 
;;   1517 0 2 9 10 -  0   none  : 83 33 
;;   1636 0 110 10 -  0   none  : 83 17

(snip)

;;  Ready list after queue_to_ready:19  32  17  15  13
30  28  26  24  22  20
;;  Ready list after ready_sort:32  19  17  15  13  30
28  26  24  22  20
;;  Ready list (t =  6):32  19  17  15  13  30  28  26  24  22
20
;;  --> scheduling insn <<<20>>> on unit none
;;  dependences resolved: insn 21 into queue with cost=1
;;  Ready-->Q: insn 21: queued for 1 cycles.
;;  Ready list (t =  6):32  19  17  15  13  30  28  26  24  22
;;  Q-->Ready: insn 21: moving to ready without stalls
;;  Ready list after queue_to_ready:21  32  19  17  15
13  30  28  26  24  22
;;  Ready list after ready_sort:32  21  19  17  15  13
30  28  26  24  22
;;  Ready list (t =  7):32  21  19  17  15  13  30  28  26  24
22
;;  --> scheduling insn <<<22>>> on unit none
;;  dependences resolved: insn 23 into queue with cost=1
;;  Ready-->Q: insn 23: queued for 1 cycles.
;;  Ready list (t =  7):32  21  19  17  15  13  30  28  26  24
;;  Q-->Ready: insn 23: moving to ready without stalls

(snip)

What does it mean by "unit none"?  Has anyone else encountered this
problem?  What more information do you think you would need to diagnose
it?  The port I am working on came from gcc 3.4.2.

Thank you,
Charles



splitting load immediates using high and lo_sum

2005-07-21 Thread Tabony, Charles
Hi,

I am working on a port for a processor that has 32 bit registers but can
only load 16 bit immediates.  I have tried several ways to split moves
with larger immediates into two RTL insns.  One is using a
define_expand:

-code---
(define_expand "movsi"
  [(set (match_operand:SI 0 "nonimmediate_operand" "")
(match_operand:SI 1 "general_operand" ""))]
  ""
  {
if (GET_CODE(operands[0]) != REG) {
  operands[1] = force_reg(SImode, operands[1]);
}
else if(s17to32_const_int_operand(operands[1],
GET_MODE(operands[1]))){
  emit_move_insn(operands[0],
 gen_rtx_HIGH(GET_MODE(operands[1]), operands[1]));
  emit_move_insn(operands[0],
 gen_rtx_LO_SUM(GET_MODE(operands[1]),
operands[0], operands[1]));
  DONE;
}
  })
/code---

With the corresponding define_insns:

-code---
(define_insn "movsi_high"
  [(set (match_operand:SI 0 "register_operand" "=r")
(high:SI (match_operand:SI 1 "immediate_operand" "i")))]
  ""
  "%0.h = #HI(%1)")

(define_insn "movsi_lo_sum"
  [(set (match_operand:SI 0 "register_operand" "+r")
(lo_sum:SI (match_dup 0)
   (match_operand:SI 1 "immediate_operand" "i")))]
  ""
  "%0.l = #LO(%1)")
/code---

but using this method, I get the following error:

-error---
./libgcc2.c:470: error: unrecognizable insn:
(insn 100 99 86 0 ./libgcc2.c:464 (set (reg:SI 10 r10)
(lo_sum (reg:SI 10 r10)
(const_int 65536 [0x1]))) -1 (nil)
(nil))
./libgcc2.c:470: internal compiler error: in extract_insn, at
recog.c:2083
/error---

Why would that RTL not match my movsi_lo_sum define_insn?

I also tried using a define_split:

-code---
(define_split
  [(set (match_operand:SI 0 "register_operand" "")
(match_operand:SI 1 "s17to32_const_int_operand" ""))]
  "reload_completed"
  [(set (match_dup 0)
(high:SI (match_dup 1)))
   (set (match_dup 0)
(lo_sum:SI (match_dup 0)
(match_dup 1)))]
  "")
/code---

along with the same define_insns, but then I get the following error:

-error---
./crtstuff.c:288: error: insn does not satisfy its constraints:
(insn 103 12 11 0 (set (reg:SI 0 r0)
(symbol_ref/u:SI ("*.LC0") [flags 0x2])) 18 {movsi_real} (nil)
(nil))
./crtstuff.c:288: internal compiler error: in
reload_cse_simplify_operands, at postreload.c:378
/error---

In other words, that RTL never matches my define_split, even though I
placed it before the more general movsi define_insn and
s17to32_const_int_operand should return 1 for a symbol_ref.

Do you have any idea why either of these attempts do not work?  Which
method do you think is better?  In case you were wondering, here is the
code for s17to32_const_int_operand.  I modified the function
int_2word_operand from the frv port.

-code---
int s17to32_const_int_operand(rtx op, enum machine_mode mode
ATTRIBUTE_UNUSED)
{
  HOST_WIDE_INT value;
  REAL_VALUE_TYPE rv;
  long l;

  switch (GET_CODE (op))
{
default:
  break;

case LABEL_REF:
case SYMBOL_REF:
case CONST:
  return 1;

case CONST_INT:
  return ! IN_RANGE_P (INTVAL (op), -0x8000, 0x7FFF);

case CONST_DOUBLE:
  if (GET_MODE (op) == SFmode)
{
  REAL_VALUE_FROM_CONST_DOUBLE (rv, op);
  REAL_VALUE_TO_TARGET_SINGLE (rv, l);
  value = l;
  return ! IN_RANGE_P (value, -0x8000, 0x7FFF);
}
  else if (GET_MODE (op) == VOIDmode)
{
  value = CONST_DOUBLE_LOW (op);
  return ! IN_RANGE_P (value, -0x8000, 0x7FFF);
}
  break;
}

  return 0;
}
/code---

Thank you,
Charles J. Tabony



RE: splitting load immediates using high and lo_sum

2005-07-21 Thread Tabony, Charles
> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
> 
> On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:
> 
> > Hi,
> >
> > I am working on a port for a processor that has 32 bit registers but
> > can
> > only load 16 bit immediates.
> >   ""
> >   "%0.h = #HI(%1)")
> 
> What are the semantics of this?  Low bits zeroed, or untouched?
> If the former, your semantics are identical to Sparc; look at that.

The low bits are untouched.  However, I would expect the compiler to
always follow setting the high bits with setting the low bits.  The
point of splitting them is that I want the insn setting the high bits to
be scheduled in a vliw packet with the preceding insns and the insn
setting the low bits to be scheduled with the following insns whenever
possible.  The processor cannot execute both instructions in the same
cycle.

-Charles



RE: splitting load immediates using high and lo_sum

2005-07-21 Thread Tabony, Charles
> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
> 
> On Jul 21, 2005, at 5:04 PM, Tabony, Charles wrote:
> 
> >> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
> >>
> >> On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:
> >>
> >>> Hi,
> >>>
> >>> I am working on a port for a processor that has 32 bit registers
but
> >>> can
> >>> only load 16 bit immediates.
> >>>   ""
> >>>   "%0.h = #HI(%1)")
> >>
> >> What are the semantics of this?  Low bits zeroed, or untouched?
> >> If the former, your semantics are identical to Sparc; look at that.
> >
> > The low bits are untouched.  However, I would expect the compiler to
> > always follow setting the high bits with setting the low bits.
> 
> OK, if you're willing to accept that limitation (your architecture
could
> handle putting the LO first, which Sparc can't) then Sparc is still a
> good model to look at.  What it does should work for you.

Aha!  I looked at the SPARC code and distilled it down to what I needed
and the difference is that it sets the mode of the high and lo_sum
expressions to the mode of operand 0, while I was setting it to the mode
of operand 1.  Now mine works great.

Thank you,
Charles J. Tabony



RE: splitting load immediates using high and lo_sum

2005-08-02 Thread Tabony, Charles
> From: Dale Johannesen [mailto:[EMAIL PROTECTED] 
> 
> On Jul 21, 2005, at 5:04 PM, Tabony, Charles wrote:
> 
> >> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
> >>
> >> On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:
> >>
> >>> Hi,
> >>>
> >>> I am working on a port for a processor that has 32 bit 
> registers but
> >>> can
> >>> only load 16 bit immediates.
> >>>   ""
> >>>   "%0.h = #HI(%1)")
> >>
> >> What are the semantics of this?  Low bits zeroed, or untouched?
> >> If the former, your semantics are identical to Sparc; look at that.
> >
> > The low bits are untouched.  However, I would expect the compiler to
> > always follow setting the high bits with setting the low bits.
> 
> OK, if you're willing to accept that limitation (your 
> architecture could
> handle putting the LO first, which Sparc can't) then Sparc is still a
> good model to look at.  What it does should work for you.

Earlier I was able to successfully split load immediates into high and
lo_sum insns, and that has worked great as far as scheduling.  However,
I noticed that now instead of loading the address of a constant such as
a string, compiled programs will load the address of a constant that is
the address of that string and then dereference it.  My guess is that
this is caused by the constant in the high/lo_sum pair being hidden from
CSE.

I looked at the way SPARC and MIPS handle the problem, but I don't think
that will work for me.  If I understand correctly, they split the move
into a load immediate that has the lower bits cleared, corresponding to
a sethi or lui instruction, and an ior immediate.  The semantics of the
instructions I am working with, "R0.H = #HI(CONSTANT)" and "R0.L =
#LO(CONSTANT)" are that the half of the register not being set is
unmodified.

Since I can not use an ior immediate like SPARC and MIPS, how can I
split move immediate insns so that they can be effeciently scheduled but
still eliminate the unnecessary indirection?  Also, does the method used
by SPARC and MIPS work for symbols?

Thank you,
Charles


using recog_data.operand in ASM_OUTPUT_OPCODE

2005-08-04 Thread Tabony, Charles
Hi,

I am trying to use recog_data.operand in ASM_OUTPUT_OPCODE to access the
operands of the current insn for printing as the documentation for
ASM_OUTPUT_OPCODE suggests.  However, this does not work for printing
inline assembly because asm insns are never matched.  How can I
distinguish recognized from unrecognized insns in ASM_OUTPUT_OPCODE?

Thank you,
Charles