moving v16sf reg with multiple sub-regs

2005-02-19 Thread Dylan Cuthbert
Hi there,

I have implemented a move of a v16sf type like this because it is held by 4 
v4sf registers:

--- snip ---

(define_expand "movv16sf"
  [(set (match_operand:V16SF 0 "nonimmediate_operand" "")
        (match_operand:V16SF 1 "general_operand" ""))]
  ""
  "
  if ((reload_in_progress | reload_completed) == 0
      && !register_operand (operands[0], V16SFmode)
      && !nonmemory_operand (operands[1], V16SFmode))
    operands[1] = force_reg (V16SFmode, operands[1]);

  move_v16sf (operands);
  DONE;
  ")

--- end snip ---


and in the config's .c file:


--- snip ---

void
move_v16sf (operands)
     rtx operands[];
{
  rtx op0 = operands[0];
  rtx op1 = operands[1];
  enum rtx_code code0 = GET_CODE (operands[0]);
  enum rtx_code code1 = GET_CODE (operands[1]);
  int subreg_offset0 = 0;
  int subreg_offset1 = 0;
  enum delay_type delay = DELAY_NONE;

  if (code0 == REG)
    {
      int regno0 = REGNO (op0) + subreg_offset0;

      if (code1 == REG)
        {
          int regno1 = REGNO (op1) + subreg_offset1;

          /* Just in case, don't do anything for assigning a register
             to itself, unless we are filling a delay slot.  */
          if (regno0 == regno1 && set_nomacro == 0)
            return;

          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 0),
                          gen_rtx_SUBREG (V4SFmode, op1, 0));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 16),
                          gen_rtx_SUBREG (V4SFmode, op1, 16));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 32),
                          gen_rtx_SUBREG (V4SFmode, op1, 32));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 48),
                          gen_rtx_SUBREG (V4SFmode, op1, 48));
        }
      else if (code1 == MEM)
        {
          rtx src_reg;

          src_reg = copy_addr_to_reg (XEXP (op1, 0));

          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 0),
                          gen_rtx_MEM (V4SFmode, src_reg));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 16),
                          gen_rtx_MEM (V4SFmode,
                                       plus_constant (src_reg, 16)));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 32),
                          gen_rtx_MEM (V4SFmode,
                                       plus_constant (src_reg, 32)));
          emit_move_insn (gen_rtx_SUBREG (V4SFmode, op0, 48),
                          gen_rtx_MEM (V4SFmode,
                                       plus_constant (src_reg, 48)));
        }
    }
  else if (code0 == MEM)
    {
      if (code1 == REG)
        {
          rtx dest_reg;

          dest_reg = copy_addr_to_reg (XEXP (op0, 0));

          emit_move_insn (gen_rtx_MEM (V4SFmode, dest_reg),
                          gen_rtx_SUBREG (V4SFmode, op1, 0));
          emit_move_insn (gen_rtx_MEM (V4SFmode,
                                       plus_constant (dest_reg, 16)),
                          gen_rtx_SUBREG (V4SFmode, op1, 16));
          emit_move_insn (gen_rtx_MEM (V4SFmode,
                                       plus_constant (dest_reg, 32)),
                          gen_rtx_SUBREG (V4SFmode, op1, 32));
          emit_move_insn (gen_rtx_MEM (V4SFmode,
                                       plus_constant (dest_reg, 48)),
                          gen_rtx_SUBREG (V4SFmode, op1, 48));
        }
    }
}
--- end snip ---


This works OK, but it produces inefficient code.  Here is some sample source 
code:

--- snip ---

typedef int v4 __attribute__((mode(V4SF)));
typedef int m4 __attribute__((mode(V16SF)));

v4 vec1, vec2;
m4 frog;

int main( int argc, char* argv[] )
{
 m4 blob;

 asm( "some_instruction %0,%1,%2,%3"
      : "=&j" (blob) : "j" (vec1), "j" (vec2), "j" (frog) );
 asm( "some_instruction2 %0,%1" : "=&j" (frog) : "j" (blob) );

 return 0;
}

--- end snip ---

where j is the register class for v4sf and v16sf types.
This produces a move of the v16sf type between the two asm instructions 
when it doesn't need to.  Does anyone have any ideas why this move isn't 
eliminated?

 #APP
some_instruction r10,r22,r20,r00
 #NO_APP
move r00,r10
move r01,r11
move r02,r12
move r03,r13
 #APP
some_instruction2 r10, r00


r10 doesn't need to be preserved (it isn't written out), but it seems to be 
making a copy anyway.  Worse, if "blob" is defined in global space like 
"frog", then it also writes r10 out to memory when it shouldn't.


Any ideas appreciated.

Regards

Re: moving v16sf reg with multiple sub-regs

2005-02-21 Thread Dylan Cuthbert
Further investigation.

If I remove the define_expand for movv16sf and throw in a dummy define_insn 
that supports reg->reg, mem->reg and reg->mem, then the redundant move is 
optimized away.

But of course, the store, load and move each use 4 instructions, so this 
produces inefficient code.

Any idea how I can get the same removal of redundant temporaries and still 
get the multiple instructions for each operation interspersed nicely?

Dylan

Re: moving v16sf reg with multiple sub-regs

2005-02-22 Thread Dylan Cuthbert
Hi there,
The assembler instructions themselves unfortunately don't allow the target to 
be the same as the source, so removing the '&' is difficult.  (If I enforce 
the same thing without the '&' of inline asm, by using builtins and building 
the expression manually to generate a new reg rtx when the dest and source 
are the same, do you think it will optimize better?)

However, I don't see why it isn't eliminating the move that is generated 
once it realises that the temporary source is discarded.  It seems to do 
this OK if it is just a define_insn with raw multi-line assembly, but I 
can't use multi-line assembly, as it destroys the optimizations that occur 
when sub-registers are accessed individually, i.e. if I overwrite the second 
v4sf in a v16sf type, gcc nicely gets rid of the move of that particular 
sub-register when it copies the entire v16sf around - something I was quite 
impressed by.

Regards
Dylan

"James E Wilson" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Dylan Cuthbert wrote:
>> asm( "some_instruction %0,%1,%2,%3" : "=&j" (blob): "j" (vec1), "j"
>> (vec2), "j" (frog) );
>> asm( "some_instruction2 %0,%1" : "=&j" (frog) : "j" (blob) );
>
> It is the goal of the register allocator to use as few registers as
> possible, which means that we will try to use the same register for input
> and output here.  Until we get to reload, where we see the early clobber
> (&), and then are forced to add a copy so that the instruction has
> separate input and output registers.
>
> Early clobbers are bad.  Don't ever use them unless you have to.  Just
> because the instruction operates on pieces of the input does not mean & is
> necessary.  You only add the & if the input and output operands must be in
> different non-overlapping registers.
>
> This is just a guess.  Try compiling with -da and looking at the register
> assignments in the .lreg and .greg files, and also at what reload did.  It
> is possible that there could be something else going on.
> --
> Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com
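
To make the rule about '&' concrete, here is a minimal sketch of one case 
that needs the early clobber and one that doesn't.  It is an illustration 
only: it uses ordinary x86 instructions and the generic "r" constraint 
rather than the "j" class and some_instruction from this thread, the 
function names are made up, and on non-x86 targets it falls back to plain C 
so that it still builds.

```c
#include <assert.h>

/* Computes a + 2*b.  The template writes %0 in its first instruction and
   only reads %2 afterwards, so the output must not share a register with
   an input: this is the case where '&' is required.  */
static int
add_twice (int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
  int out;
  __asm__ ("movl %1, %0\n\t"   /* out = a   (output written early)  */
           "addl %2, %0\n\t"   /* out += b  (input read afterwards) */
           "addl %2, %0"       /* out += b  */
           : "=&r" (out)
           : "r" (a), "r" (b)
           : "cc");
  return out;
#else
  /* Portable fallback so the sketch builds on other targets.  */
  return a + 2 * b;
#endif
}

/* Computes a + b with a single instruction.  Both inputs are consumed by
   the same instruction that produces the output, so the allocator may
   reuse an input register for the output and no '&' is needed.  */
static int
add_pair (int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
  int out;
  __asm__ ("leal (%1,%2), %0" : "=r" (out) : "r" (a), "r" (b));
  return out;
#else
  return a + b;
#endif
}

/* Usage: both variants give the arithmetic result whichever registers
   the compiler picks.  */
static void
check_examples (void)
{
  assert (add_twice (3, 4) == 11);
  assert (add_pair (3, 4) == 7);
}
```

Dropping the '&' from add_twice would let the compiler place out in the 
same register as b, in which case the first movl would destroy b before it 
is read.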



Re: moving v16sf reg with multiple sub-regs

2005-02-22 Thread Dylan Cuthbert
One reason that occurred to me is that I am issuing the v16sf move as four 
subreg v4sf moves.

One thing I get is "variable may not be initialised" warnings:

v16sf   test;
test = _builtin_matrix_mul( left, right );
return test;

Is there some way I can flag the moves to say that they are moving the v16sf 
as a "whole", so it doesn't need to be initialised, and hence avoid the 
warning?

Dylan



Re: moving v16sf reg with multiple sub-regs

2005-02-22 Thread Dylan Cuthbert
Brilliant!
This got rid of the warnings, *and* got rid of the spurious move I was 
getting.  Thanks for the advice!

The problem with the spurious move was in the move from memory to the 
v16sf register: because it was done with subregs, the compiler thought it 
still had to preserve the previous (uninitialised) value, and so couldn't 
optimize the move out completely because of the parameter aliasing on those 
asm instructions.
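
The effect described above can be pictured with a toy liveness scan.  This 
is a stand-alone model, not GCC's dataflow code: the enum and function 
names are made up, and it assumes (a) that a write through a subreg leaves 
the rest of the register's old value alive and (b) that the no-conflict 
block marks the whole destination as clobbered ahead of the four partial 
writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of how one insn touches a single V16SF pseudo.  */
enum access
{
  PARTIAL_SET,    /* write through a subreg; old value not killed  */
  WHOLE_CLOBBER,  /* whole register declared dead at this point    */
  WHOLE_USE       /* whole register read                           */
};

/* Scan the sequence backwards and report whether the value the register
   held on entry can still reach a use, i.e. whether the register looks
   "used before it was fully set".  */
static bool
live_on_entry (const enum access *insns, size_t n)
{
  bool live = false;
  for (size_t i = n; i-- > 0;)
    {
      switch (insns[i])
        {
        case WHOLE_USE:
          live = true;
          break;
        case WHOLE_CLOBBER:
          live = false;
          break;
        case PARTIAL_SET:
          /* Does not kill the old value, so liveness passes through.  */
          break;
        }
    }
  return live;
}

/* Four bare subreg moves followed by a use of the whole register.  */
static const enum access four_subreg_moves[] =
  { PARTIAL_SET, PARTIAL_SET, PARTIAL_SET, PARTIAL_SET, WHOLE_USE };

/* The same moves with a whole-register clobber in front of them.  */
static const enum access with_leading_clobber[] =
  { WHOLE_CLOBBER, PARTIAL_SET, PARTIAL_SET, PARTIAL_SET, PARTIAL_SET,
    WHOLE_USE };
```

In this model the first sequence leaves the old value live on entry, which 
matches both the "may not be initialised" warning and the move that could 
not be removed, while the second does not.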

For the last parameter (equiv) of emit_no_conflict_block I am putting in 
"gen_rtx_SET ( V16SFmode, op0, op1 )", does this seem correct to you?

Much thanks
Dylan

"James E Wilson" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Dylan Cuthbert wrote:
>> Is there someway I can flag the moves to say that is moving the v16sf 
>> "whole" so it doesn't need to be initialised and hence avoid the warning?
>
> See emit_no_conflict_block in optabs.c.
> --
> Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com



Re: moving v16sf reg with multiple sub-regs

2005-02-22 Thread Dylan Cuthbert
Ah, ok, sorry about that, I read it as being the equivalent of the whole 
operation.

I'll throw op1 in there, thanks again.
Dylan

"James E Wilson" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Dylan Cuthbert wrote:
>> For the last parameter (equiv) of emit_no_conflict_block I am putting in 
>> "gen_rtx_SET ( V16SFmode, op0, op1 )", does this seem correct to you?
>
> This is supposed to be the value of op0 after the no conflict block.  So 
> it should just be op1.
> --
> Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com




Re: moving v16sf reg with multiple sub-regs

2005-02-23 Thread Dylan Cuthbert
Thanks for the info...
I was worried about the aliasing problems - adjust_address fits the bill 
perfectly.

The information for simplify_gen_subreg is a little sparse; what does it do 
differently from gen_rtx_SUBREG?

Regards
Dylan

"Richard Sandiford" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> "Dylan Cuthbert" <[EMAIL PROTECTED]> writes:
>>    emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 0  ), gen_rtx_MEM(
>> V4SFmode, src_reg ) );
>>    emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 16 ), gen_rtx_MEM(
>> V4SFmode, plus_constant( src_reg, 16 ) ) );
>>    emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 32 ), gen_rtx_MEM(
>> V4SFmode, plus_constant( src_reg, 32 ) ) );
>>    emit_move_insn( gen_rtx_SUBREG (V4SFmode, op0, 48 ), gen_rtx_MEM(
>> V4SFmode, plus_constant( src_reg, 48 ) ) );
>
> Note that generating MEMs like this is a bad idea because it discards
> alias information.  It's better to use functions like adjust_address
> instead.
>
> It's probably also better to use simplify_gen_subreg instead of
> gen_rtx_SUBREG.
>
> Richard



Re: moving v16sf reg with multiple sub-regs

2005-02-23 Thread Dylan Cuthbert
Ok, I think I found out why gen_subreg crashes here: (with gcc 3.3.3)
 if (byte % GET_MODE_SIZE (outermode)
 || byte >= GET_MODE_SIZE (innermode))
   abort ();
This check doesn't seem right to me  ;-)
I'll see what's in the latest cvs for this function.
Regards
Dylan



Re: moving v16sf reg with multiple sub-regs

2005-02-23 Thread Dylan Cuthbert
I tried simplify_gen_subreg but it crashes with a compiler error.  Maybe 
because V4SF isn't really thought of as a subreg of a V16SF at the moment? 
I am using gcc 3.3.3 right now so it might be just that it works in a later 
version of the compiler?



Re: moving v16sf reg with multiple sub-regs

2005-02-23 Thread Dylan Cuthbert
Unless in gcc-world  outermode has the meaning of innermode? (and vice 
versa)

which.. from looking at some other source... perhaps it does..  :-/
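
The naming question can be checked with a stand-alone copy of the condition 
quoted from gcc 3.3.3.  This is only a sketch: the helper name and the size 
constants are made up for illustration, and it assumes that "outermode" is 
the mode the subreg is read as (V4SF, 16 bytes) while "innermode" is the 
mode of the register being wrapped (V16SF, 64 bytes).

```c
#include <assert.h>
#include <stdbool.h>

/* Mode sizes in bytes: a V4SF is four 4-byte floats, a V16SF sixteen.  */
enum { V4SF_SIZE = 16, V16SF_SIZE = 64 };

/* Returns true where the quoted check would call abort ().  */
static bool
subreg_offset_rejected (unsigned outer_size, unsigned inner_size,
                        unsigned byte)
{
  return byte % outer_size != 0 || byte >= inner_size;
}

/* Usage: the four offsets used in move_v16sf under each reading.  */
static void
check_offsets (void)
{
  unsigned byte;

  /* Outer = V4SF, inner = V16SF: every offset passes the check.  */
  for (byte = 0; byte < V16SF_SIZE; byte += V4SF_SIZE)
    assert (!subreg_offset_rejected (V4SF_SIZE, V16SF_SIZE, byte));

  /* With the two sizes swapped, only offset 0 gets through.  */
  assert (!subreg_offset_rejected (V16SF_SIZE, V4SF_SIZE, 0));
  for (byte = V4SF_SIZE; byte < V16SF_SIZE; byte += V4SF_SIZE)
    assert (subreg_offset_rejected (V16SF_SIZE, V4SF_SIZE, byte));
}
```

Under that reading the check is consistent with the offsets 0, 16, 32 and 
48, and it is the swapped reading that would abort on every offset except 0.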