Re: GCC47 movmem breaks RA, GCC46 RA is fine

2012-04-27 Thread Richard Guenther
On Thu, Apr 26, 2012 at 6:16 PM, Paulo J. Matos  wrote:
> Hi,
>
> I am facing a problem with the GCC47 register allocation and my movmemqi.
> GCC46 dealt very well with the problem but GCC47 keeps throwing at me
> register spill failures.
>
> My backend has very few registers. 3 chip registers in total (class
> CHIP_REGS), one of them (XL) is used for memory references (class ADDR_REGS)
> and the other two (AL, AH) are for normal use (DATA_REGS), so CHIP_REGS =
> ADDR_REGS U DATA_REGS.
>
> There are a couple of other memory mapped registers, but all loads and
> stores go through CHIP_REGS.
>
> My chip has a block copy instruction which needs source address in XL,
> destination address in AH and count in AL. My movmemqi is similar to
> movmemsi in rx.
>
> (define_expand "movmemqi"
>  [(use (match_operand:BLK 0 "memory_operand"))
>   (use (match_operand:BLK 1 "memory_operand"))
>   (use (match_operand:QI 2 "general_operand"))
>   (use (match_operand:QI 3 "general_operand"))]
>  ""
> {
>    rtx dst_addr = XEXP(operands[0], 0);
>    rtx src_addr = XEXP(operands[1], 0);
>    rtx dst_reg = gen_rtx_REG(QImode, RAH);
>    rtx src_reg = gen_rtx_REG(QImode, RXL);
>    rtx cnt_reg = gen_rtx_REG(QImode, RAL);
>
>    emit_move_insn(cnt_reg, operands[2]);
>
>    if(GET_CODE(dst_addr) == PLUS)
>    {
>        emit_move_insn(dst_reg, XEXP(dst_addr, 0));
>        emit_insn(gen_addqi3(dst_reg, dst_reg, XEXP(dst_addr, 1)));
>    }
>    else
>        emit_move_insn(dst_reg, dst_addr);
>
>    if(GET_CODE(src_addr) == PLUS)
>    {
>        emit_move_insn(src_reg, XEXP(src_addr, 0));
>        emit_insn(gen_addqi3(src_reg, src_reg, XEXP(src_addr, 1)));
>    }
>    else
>        emit_move_insn(src_reg, src_addr);
>
>    emit_insn(gen_bc2());
>
>    DONE;
> })
>
> (define_insn "bc2"
>  [(set (reg:QI RAL) (const_int 0))
>   (set (mem:BLK (reg:QI RAH)) (mem:BLK (reg:QI RXL)))
>   (set (reg:QI RXL) (plus:QI (reg:QI RXL) (reg:QI RAL)))
>   (set (reg:QI RAH) (plus:QI (reg:QI RAH) (reg:QI RAL)))]
>  ""
>  "bc2")
>
> The parallel in bc2 sets up what the bc2 chip instruction modifies: it
> copies the block at XL to AH, moves XL to point to the end of the source
> block and AH to the end of the destination block, and sets AL to 0.
>
> The C code
> int **
> t25 (int *d, int **s)
> {
>  memcpy (d, *s, 16);
>  return s;
> }
>
> turns into the following after asmcons (-Os passed in):
> (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>
> (insn 2 5 3 2 (parallel [
>            (set (reg/v/f:QI 22 [ d ])
>                (reg:QI 1 AL [ d ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:3 6 {*movqi}
>     (expr_list:REG_DEAD (reg:QI 1 AL [ d ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (insn 3 2 4 2 (parallel [
>            (set (reg/v/f:QI 23 [ s ])
>                (reg:QI 0 AH [ s ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:3 6 {*movqi}
>     (expr_list:REG_DEAD (reg:QI 0 AH [ s ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
>
> (insn 7 4 8 2 (parallel [
>            (set (reg/f:QI 24 [ *s_1(D) ])
>                (mem/f:QI (reg/v/f:QI 23 [ s ]) [2 *s_1(D)+0 S1 A16]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_UNUSED (reg:CC 13 CC)
>        (nil)))
>
> (insn 8 7 9 2 (parallel [
>            (set (reg:QI 1 AL)
>                (const_int 16 [0x10]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_UNUSED (reg:CC 13 CC)
>        (nil)))
>
> (insn 9 8 10 2 (parallel [
>            (set (reg:QI 0 AH)
>                (reg/v/f:QI 22 [ d ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_DEAD (reg/v/f:QI 22 [ d ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (insn 10 9 11 2 (parallel [
>            (set (reg:QI 3 X)
>                (reg/f:QI 24 [ *s_1(D) ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:4 6 {*movqi}
>     (expr_list:REG_DEAD (reg/f:QI 24 [ *s_1(D) ])
>        (expr_list:REG_UNUSED (reg:CC 13 CC)
>            (nil))))
>
> (insn 11 10 16 2 (parallel [
>            (set (reg:QI 1 AL)
>                (const_int 0 [0]))
>            (set (mem:BLK (reg:QI 0 AH) [0 A16])
>                (mem:BLK (reg:QI 3 X) [0 A16]))
>            (set (reg:QI 3 X)
>                (plus:QI (reg:QI 3 X)
>                    (reg:QI 1 AL)))
>            (set (reg:QI 0 AH)
>                (plus:QI (reg:QI 0 AH)
>                    (reg:QI 1 AL)))
>        ]) memcpy.i:4 21 {bc2}
>     (expr_list:REG_UNUSED (reg:QI 3 X)
>        (expr_list:REG_UNUSED (reg:QI 1 AL)
>            (expr_list:REG_UNUSED (reg:QI 0 AH)
>                (nil)))))
>
> (insn 16 11 19 2 (parallel [
>            (set (reg/i:QI 1 AL)
>                (reg/v/f:QI 23 [ s ]))
>            (clobber (reg:CC 13 CC))
>        ]) memcpy.i:6 6 {*movqi}
>     (expr_list:REG_DEAD (reg/v/f:QI 23 [ s ])

Re: locating unsigned type for non-standard precision

2012-04-27 Thread Georg-Johann Lay
Richard Guenther wrote:
> [PR c/51527]
> 
> I think the fix would be sth like
> 
> Index: gcc/convert.c
> ===
> --- gcc/convert.c   (revision 186871)
> +++ gcc/convert.c   (working copy)
> @@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr
>(Otherwise would recurse infinitely in convert.  */
> if (TYPE_PRECISION (typex) != inprec)
>   {
> +   tree otypex = typex;
> /* Don't do unsigned arithmetic where signed was wanted,
>or vice versa.
>Exception: if both of the original operands were
> @@ -806,10 +807,11 @@ convert_to_integer (tree type, tree expr
>   typex = unsigned_type_for (typex);
> else
>   typex = signed_type_for (typex);
> -   return convert (type,
> -   fold_build2 (ex_form, typex,
> -convert (typex, arg0),
> -convert (typex, arg1)));
> +   if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex))
> + return convert (type,
> + fold_build2 (ex_form, typex,
> +  convert (typex, arg0),
> +  convert (typex, arg1)));
>   }
>   }
>   }

Thanks for the patch.

I bootstrapped and regression-tested on i686-pc-linux-gnu.

If it's ok with you I'd go ahead and install it.

And maybe Peter could tell if it also fixes the issue on his platform.

Johann


Re: locating unsigned type for non-standard precision

2012-04-27 Thread Richard Guenther
On Fri, Apr 27, 2012 at 11:29 AM, Georg-Johann Lay  wrote:
> Richard Guenther wrote:
>> [PR c/51527]
>>
>> I think the fix would be sth like
>>
>> Index: gcc/convert.c
>> ===
>> --- gcc/convert.c       (revision 186871)
>> +++ gcc/convert.c       (working copy)
>> @@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr
>>                    (Otherwise would recurse infinitely in convert.  */
>>                 if (TYPE_PRECISION (typex) != inprec)
>>                   {
>> +                   tree otypex = typex;
>>                     /* Don't do unsigned arithmetic where signed was wanted,
>>                        or vice versa.
>>                        Exception: if both of the original operands were
>> @@ -806,10 +807,11 @@ convert_to_integer (tree type, tree expr
>>                       typex = unsigned_type_for (typex);
>>                     else
>>                       typex = signed_type_for (typex);
>> -                   return convert (type,
>> -                                   fold_build2 (ex_form, typex,
>> -                                                convert (typex, arg0),
>> -                                                convert (typex, arg1)));
>> +                   if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex))
>> +                     return convert (type,
>> +                                     fold_build2 (ex_form, typex,
>> +                                                  convert (typex, arg0),
>> +                                                  convert (typex, arg1)));
>>                   }
>>               }
>>           }
>
> Thanks for the patch.
>
> I bootstrapped and regression-tested on i686-pc-linux-gnu.
>
> If it's ok with you I'd go ahead and install it.
>
> And maybe Peter could tell if it also fixes the issue on his platform.

It's not necessary on the trunk btw, but it's ok for the 4.7 branch.

Thanks,
Richard.

> Johann


Re: GCC47 movmem breaks RA, GCC46 RA is fine

2012-04-27 Thread Paulo J. Matos

On 27/04/12 09:21, Richard Guenther wrote:


This differs from what GCC47 does and seems to work better.
I would like help on how to best handle this situation under GCC47.


Not provide movmem which looks like open-coded and not in any way
"optimized"?



Thanks Richard, however I don't understand your comment.
GCC46 outputs for this problem:
$t25:
enterl  #H'0002
st  AL,@H'fff9
st  AH,@H'fff8
ld  X,@$XAP_AH
ld  X,@(0,X)
ld  AL,#H'0010
ld  AH,@H'fff9
bc2
ld  AL,@H'fff8
leavel  #H'0002

and GCC47, once movmemqi and setmemqi are disabled:
$t25:
enterl  #H'0005
st  AH,@(H'0001,Y)
ld  X,@$XAP_AH
ld  X,@(0,X)
ld  AH,#H'0010
st  AH,@(0,Y)
ld  AH,@$XAP_UXL
bsr $memcpy
ld  AL,@(H'0001,Y)
leavel  #H'0005


It feels to me that the GCC46 version is better:
* no branch to subroutine memcpy;
* less stack usage (argument to enterl);

So, using our block copy (bc2) instruction is an optimisation, don't you 
think?


--
PMatos



Re: GCC47 movmem breaks RA, GCC46 RA is fine

2012-04-27 Thread Richard Guenther
On Fri, Apr 27, 2012 at 12:00 PM, Paulo J. Matos  wrote:
> On 27/04/12 09:21, Richard Guenther wrote:
>>>
>>>
>>> This differs from what GCC47 does and seems to work better.
>>> I would like help on how to best handle this situation under GCC47.
>>
>>
>> Not provide movmem which looks like open-coded and not in any way
>> "optimized"?
>>
>
> Thanks Richard, however I don't understand your comment.
> GCC46 outputs for this problem:
> $t25:
>        enterl  #H'0002
>        st      AL,@H'fff9
>        st      AH,@H'fff8
>        ld      X,@$XAP_AH
>        ld      X,@(0,X)
>        ld      AL,#H'0010
>        ld      AH,@H'fff9
>        bc2
>        ld      AL,@H'fff8
>        leavel  #H'0002
>
> and GCC47, once movmemqi and setmemqi are disabled:
> $t25:
>        enterl  #H'0005
>        st      AH,@(H'0001,Y)
>        ld      X,@$XAP_AH
>        ld      X,@(0,X)
>        ld      AH,#H'0010
>        st      AH,@(0,Y)
>        ld      AH,@$XAP_UXL
>        bsr     $memcpy
>        ld      AL,@(H'0001,Y)
>        leavel  #H'0005
>
>
> It feels to me that GCC46 version is better:
> * no branch to subroutine memcpy;
> * less stack usage (argument to enterl);
>
> So, using our block copy (bc2) instruction is an optimisation, don't you
> think?

Yes, it inlines it.  You may want to look at s390 which I believe has
a similar block-copy operation.

Richard.

> --
> PMatos
>


Re: GCC47 movmem breaks RA, GCC46 RA is fine

2012-04-27 Thread Paulo J. Matos

On 27/04/12 11:49, Richard Guenther wrote:

It feels to me that GCC46 version is better:
* no branch to subroutine memcpy;
* less stack usage (argument to enterl);

So, using our block copy (bc2) instruction is an optimisation, don't you
think?


Yes, it inlines it.  You may want to look at s390 which I believe has
a similar block-copy operation.



I am not sure I understood your comment. GCC46 generates the bc2 call 
due to my implementation of movmemqi. If I remove it, as you suggested, 
GCC47 will always call memcpy and will be worse off.


I will be looking at s390 for inspiration. Thanks for the suggestion.

--
PMatos



Re: locating unsigned type for non-standard precision

2012-04-27 Thread Peter Bigot
On Fri, Apr 27, 2012 at 4:29 AM, Georg-Johann Lay  wrote:
> Richard Guenther wrote:
>> [PR c/51527]
>>
>> I think the fix would be sth like
>>
>> Index: gcc/convert.c
>> ===
>> --- gcc/convert.c       (revision 186871)
>> +++ gcc/convert.c       (working copy)
>> @@ -769,6 +769,7 @@ convert_to_integer (tree type, tree expr
>>                    (Otherwise would recurse infinitely in convert.  */
>>                 if (TYPE_PRECISION (typex) != inprec)
>>                   {
>> +                   tree otypex = typex;
>>                     /* Don't do unsigned arithmetic where signed was wanted,
>>                        or vice versa.
>>                        Exception: if both of the original operands were
>> @@ -806,10 +807,11 @@ convert_to_integer (tree type, tree expr
>>                       typex = unsigned_type_for (typex);
>>                     else
>>                       typex = signed_type_for (typex);
>> -                   return convert (type,
>> -                                   fold_build2 (ex_form, typex,
>> -                                                convert (typex, arg0),
>> -                                                convert (typex, arg1)));
>> +                   if (TYPE_PRECISION (otypex) == TYPE_PRECISION (typex))
>> +                     return convert (type,
>> +                                     fold_build2 (ex_form, typex,
>> +                                                  convert (typex, arg0),
>> +                                                  convert (typex, arg1)));
>>                   }
>>               }
>>           }
>
> Thanks for the patch.
>
> I bootstrapped and regression-tested on i686-pc-linux-gnu.
>
> If it's ok with you I'd go ahead and install it.
>
> And maybe Peter could tell if it also fixes the issue on his platform.

It does (though so did moving the original test down past the last
change to typex).

I've switched mspgcc to use the version you committed.

Peter

>
> Johann


Re: GCC47 movmem breaks RA, GCC46 RA is fine

2012-04-27 Thread Paulo J. Matos

On 27/04/12 11:49, Richard Guenther wrote:


Yes, it inlines it.  You may want to look at s390 which I believe has
a similar block-copy operation.

Richard.



I looked at s390, and even though its block copy instruction seems 
similar, ours is much more restrictive: it expects its values in 
specific registers, instead of allowing the register numbers to be 
passed to the instruction (as is the case with the s390 mvcle insn).


I decided to try not hardcoding the registers in the instruction, but 
since the instruction requires specific registers as operands I had to 
create a class per register (with a single register in it) and then 
register constraints for each of the classes. This turned out not to 
work: RA breaks even earlier than before. Here's what I did:

(define_expand "movmemqi"
  [(set (match_operand:BLK 0 "memory_operand"); destination
(match_operand:BLK 1 "memory_operand"))   ; source
   (use (match_operand:QI 2 "general_operand"))]  ; count
  "!TARGET_NO_BLOCK_COPY && !reload_completed"
{
rtx dst_addr = XEXP(operands[0], 0);
rtx src_addr = XEXP(operands[1], 0);
rtx dst_reg = gen_reg_rtx(QImode); /* will be forced into AH */
rtx src_reg = gen_reg_rtx(QImode); /* will be forced into XL */
rtx cnt_reg = gen_reg_rtx(QImode); /* will be forced into AL */

emit_move_insn(cnt_reg, operands[2]);

if(GET_CODE(dst_addr) == PLUS)
{
emit_move_insn(dst_reg, XEXP(dst_addr, 0));
emit_insn(gen_addqi3(dst_reg, dst_reg, XEXP(dst_addr, 1)));
}
else
emit_move_insn(dst_reg, dst_addr);

if(GET_CODE(src_addr) == PLUS)
{
emit_move_insn(src_reg, XEXP(src_addr, 0));
emit_insn(gen_addqi3(src_reg, src_reg, XEXP(src_addr, 1)));
}
else
emit_move_insn(src_reg, src_addr);

emit_insn(gen_bc2(dst_reg, src_reg, cnt_reg));

DONE;
})

(define_insn "bc2"
  [(set (match_operand:QI 0 "register_operand" "=l")
(const_int 0))
   (set (mem:BLK (match_operand:QI 1 "register_operand" "=h"))
(mem:BLK (match_operand:QI 2 "register_operand" "=x")))
   (set (match_dup 2)
(plus:QI (match_dup 2) (match_dup 0)))
   (set (match_dup 1) (plus:QI (match_dup 1) (match_dup 0)))]
  "!TARGET_NO_BLOCK_COPY"
  "bc2")

Constraints l, h and x correspond to singleton classes for registers AL, 
AH and XL respectively. I think the problem here is the RA's inability to 
deal with such a constrained register set. Since I want to use our block 
copy instruction rather than disabling movmemqi and setmemqi and thereby 
falling back to a memcpy call, is there anything I can try to tune the RA?


--
PMatos





Fwd: Using movw/movt rather than minipools in ARM gcc

2012-04-27 Thread David Sehr
Hello All,

We are using gcc trunk as of 4/27/12, and are attempting to add
support to the ARM gcc compiler for Native Client.
We are trying to get gcc -march=armv7-a to use movw/movt consistently
instead of minipools. The motivation is for
a new target variant where armv7-a is the minimum supported and
non-code in .text is never allowed (per Native Client rules).
But the current behavior looks like a generically poor optimization
for -march=armv7-a.  (Surely memory loads are slower
than movw/movt, and no space is saved in many cases.)  For further
details, this seems to only happen with -O2 or higher.
-O1 generates movw/movt, seemingly because cprop is folding away a
LO_SUM/HIGH pair.  Another data point to note
is that "Ubuntu/Linaro 4.5.2-8ubuntu3" does produce movw/movt for this
test case, but we haven't tried stock 4.5.

I have enabled TARGET_USE_MOVT, which should force a large fraction of
constant materialization to use movw/movt
rather than pc-relative loads.  However, I am still seeing pc-relative
loads for the following example case and am looking
for help from the experts here.

int a[1000], b[1000], c[1000];

void foo(int n) {
  int i;
  for (i = 0; i < n; ++i) {
    a[i] = b[i] + c[i];
  }
}

When I compile this I get:

foo:
        ...
ldr r3, .L7
ldr r1, .L7+4
ldr r2, .L7+8
        ...
.L7:
.word  b
.word  c
.word  a
.size foo, .-foo
.comm c,4000,4
.comm b,4000,4
.comm a,4000,4

From some investigation, it seems I need to add a define_split to
convert SYMBOL_REFs to LO_SUM/HIGH pairs.
There is already a function called arm_split_constant that seems to do
this, but no rule seems to be firing to cause
it to get invoked.  Before I dive into writing the define_split, am I
missing something obvious?
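
For reference, the generic shape of such a define_split is roughly the 
following sketch. The split condition shown (TARGET_USE_MOVT plus a 
SYMBOL_REF test) and the predicate choices are illustrative placeholders, 
not actual arm.md code:

```
;; Sketch only: after reload, split a symbol load into a HIGH/LO_SUM
;; pair, which movw/movt-emitting insn patterns can then match.
;; Condition and predicate names are placeholders, not real arm.md code.
(define_split
  [(set (match_operand:SI 0 "register_operand" "")
        (match_operand:SI 1 "immediate_operand" ""))]
  "TARGET_USE_MOVT && reload_completed
   && GET_CODE (operands[1]) == SYMBOL_REF"
  [(set (match_dup 0) (high:SI (match_dup 1)))
   (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))])
```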

Cheers,

David


IRA and two-phase load/store

2012-04-27 Thread Greg McGary
I'm working on a port that does loads & stores in two phases.
Every load/store is funneled through the intermediate registers "ld" and "st"
standing between memory and the rest of the register file.

Example:
ld=4(rB)
...
...
rC=ld

st=rD
8(rB)=st

rB is a base address register, rC and rD are data regs.  The ... represents
load delay cycles.

The CPU has only a single instance of "ld", but the machine description
defines five in order to allow overlapping live ranges to pipeline loads.

My mov insn patterns have constraints so that a memory destination pairs with
the "st" register source, and a memory source pairs with "ld" destination
reg.  The trouble is that register allocation doesn't understand the
constraint, so it loads/stores from/to random data registers.

Is there a way to confine register allocation to the "ld" and "st" classes,
or is it better to let IRA do what it wants, then fixup after reload with
splits to turn single insn rC=MEM into the insn pair ld=MEM ... rC=ld ?
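For the second option, the post-reload fixup could be sketched as a 
define_split along these lines. LD_REGNUM and LD_REG_P are hypothetical 
names for this port, and the sketch deliberately ignores the load delay 
cycles, which the scheduler would then have to fill:

```
;; Sketch only: after reload, turn rC=MEM into the pair ld=MEM; rC=ld.
;; LD_REGNUM / LD_REG_P are hypothetical names for this port.
(define_split
  [(set (match_operand:SI 0 "register_operand" "")
        (match_operand:SI 1 "memory_operand" ""))]
  "reload_completed && !LD_REG_P (REGNO (operands[0]))"
  [(set (match_dup 2) (match_dup 1))   ; ld = MEM
   (set (match_dup 0) (match_dup 2))]  ; rC = ld
  "operands[2] = gen_rtx_REG (SImode, LD_REGNUM);")
```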

Greg


Re: IRA and two-phase load/store

2012-04-27 Thread Greg McGary
On 04/27/12 14:31, Greg McGary wrote:
> I'm working on a port that does loads & stores in two phases.
> Every load/store is funneled through the intermediate registers "ld" and "st"
> standing between memory and the rest of the register file.
>
> Example:
> ld=4(rB)
> ...
> ...
> rC=ld
>
> st=rD
> 8(rB)=st
>
> rB is a base address register, rC and rD are data regs.  The ... represents
> load delay cycles.
>
> The CPU has only a single instance of "ld", but the machine description
> defines five in order to allow overlapping live ranges to pipeline loads.
>
> My mov insn patterns have constraints so that a memory destination pairs with
> the "st" register source, and a memory source pairs with "ld" destination
> reg.  The trouble is that register allocation doesn't understand the
> constraint, so it loads/stores from/to random data registers.

Clarification: I understand that IRA will do this, but I also thought that 
reload was supposed to notice that the insn didn't match its constraints 
and emit reg copies to fix it up. It doesn't do that for me--postreload 
just asserts, complaining that the insn doesn't match its constraints.

> Is there a way to confine register allocation to the "ld" and "st" classes,
> or is it better to let IRA do what it wants, then fixup after reload with
> splits to turn single insn rC=MEM into the insn pair ld=MEM ... rC=ld ?
>
> Greg



gcc-4.6-20120427 is now available

2012-04-27 Thread gccadmin
Snapshot gcc-4.6-20120427 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20120427/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch 
revision 186921

You'll find:

 gcc-4.6-20120427.tar.bz2 Complete GCC

  MD5=7bece036826df08d82ce04693ae343e2
  SHA1=17160e039f905ffdf57e0887501425f1240e3ca3

Diffs from 4.6-20120420 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: IRA and two-phase load/store

2012-04-27 Thread Paul_Koning
I think this is what secondary reload is for.  Check the internals manual.

Something like this shows up in the pdp11 port, where float registers f4 and f5 
can't be loaded/stored directly.  You can see in that port how this is handled; 
it seems to work.
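
As a rough illustration of the mechanism Paul mentions, the 
TARGET_SECONDARY_RELOAD hook lets a port tell reload that moves between 
memory and ordinary registers must be staged through an intermediate 
class. Everything below (the port prefix, the class names LD_REGS, 
ST_REGS and DATA_REGS) is a hypothetical sketch for the port being 
discussed, not code from any real backend, and it is a fragment that 
only compiles inside a GCC target file:

```
/* Hypothetical sketch: route all DATA_REGS <-> memory moves through
   the single-register classes LD_REGS (loads) and ST_REGS (stores),
   so reload itself inserts the extra ld/st register copy.  */
static reg_class_t
myport_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
                         enum machine_mode mode ATTRIBUTE_UNUSED,
                         secondary_reload_info *sri ATTRIBUTE_UNUSED)
{
  if (MEM_P (x) && reg_class_subset_p (rclass, DATA_REGS))
    /* A load from memory needs "ld"; a store needs "st".  */
    return in_p ? LD_REGS : ST_REGS;
  return NO_REGS;
}

#undef  TARGET_SECONDARY_RELOAD
#define TARGET_SECONDARY_RELOAD myport_secondary_reload
```

The pdp11 port's handling of f4/f5 that Paul refers to uses the same 
hook, so it is a reasonable model to read alongside this sketch.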

paul

On Apr 27, 2012, at 5:31 PM, Greg McGary wrote:

> I'm working on a port that does loads & stores in two phases.
> Every load/store is funneled through the intermediate registers "ld" and "st"
> standing between memory and the rest of the register file.
> 
> Example:
>ld=4(rB)
>...
>...
>rC=ld
> 
>st=rD
>8(rB)=st
> 
> rB is a base address register, rC and rD are data regs.  The ... represents
> load delay cycles.
> 
> The CPU has only a single instance of "ld", but the machine description
> defines five in order to allow overlapping live ranges to pipeline loads.
> 
> My mov insn patterns have constraints so that a memory destination pairs with
> the "st" register source, and a memory source pairs with "ld" destination
> reg.  The trouble is that register allocation doesn't understand the
> constraint, so it loads/stores from/to random data registers.
> 
> Is there a way to confine register allocation to the "ld" and "st" classes,
> or is it better to let IRA do what it wants, then fixup after reload with
> splits to turn single insn rC=MEM into the insn pair ld=MEM ... rC=ld ?
> 
> Greg



conflicts between combine and pre global passes?

2012-04-27 Thread Bin.Cheng
Hi,
I noticed that global passes before combine, like loop-invariant/cprop/cse2, 
sometimes conflict with combine.
The combine pass only operates within a basic block, while these global 
passes move insns across basic blocks and leave no description info behind.

For example, a case I encountered.

(insn 77 75 78 8 (set (reg:SI 171)
(const_int 9999 [0x270f])) diffmeasure/verify.c:157 176
{*thumb1_movsi_insn}
 (nil))

(insn 78 77 79 8 (set (reg:SI 172)
(const_int 0 [0])) diffmeasure/verify.c:157 176 {*thumb1_movsi_insn}
 (nil))

(insn 79 78 80 8 (set (reg:SI 170)
(plus:SI (plus:SI (reg:SI 172)
(reg:SI 172))
(geu:SI (reg:SI 171)
(reg/v:SI 138 [ num ] diffmeasure/verify.c:157 226
{thumb1_addsi3_addgeu}
 (expr_list:REG_DEAD (reg:SI 172)
(expr_list:REG_DEAD (reg:SI 171)
(expr_list:REG_EQUAL (geu:SI (const_int 9999 [0x270f])
(reg/v:SI 138 [ num ]))
 (nil)))))

(insn 80 79 81 8 (set (reg:QI 169)
(subreg:QI (reg:SI 170) 0)) diffmeasure/verify.c:157 187
{*thumb1_movqi_insn}
 (expr_list:REG_DEAD (reg:SI 170)
(nil)))

(insn 81 80 82 8 (set (reg:SI 173)
(zero_extend:SI (reg:QI 169))) diffmeasure/verify.c:157 158
{*thumb1_zero_extendqisi2_v6}
 (expr_list:REG_DEAD (reg:QI 169)
(nil)))

(jump_insn 82 81 124 8 (set (pc)
(if_then_else (eq (reg:SI 173)
(const_int 0 [0]))
(label_ref:SI 129)
(pc))) diffmeasure/verify.c:157 196 {cbranchsi4_insn}
 (expr_list:REG_DEAD (reg:SI 173)
(expr_list:REG_BR_PROB (const_int 900 [0x384])
(nil)))
 -> 129)

can be combined into :
(insn 77 75 78 8 (set (reg:SI 171)
(const_int 9999 [0x270f])) diffmeasure/verify.c:157 176
{*thumb1_movsi_insn}
 (nil))

(note 78 77 79 8 NOTE_INSN_DELETED)

(note 79 78 80 8 NOTE_INSN_DELETED)

(note 80 79 81 8 NOTE_INSN_DELETED)

(note 81 80 82 8 NOTE_INSN_DELETED)

(jump_insn 82 81 124 8 (set (pc)
(if_then_else (ltu (reg:SI 171)
(reg/v:SI 138 [ num ]))
(label_ref:SI 129)
(pc))) diffmeasure/verify.c:157 196 {cbranchsi4_insn}
 (expr_list:REG_DEAD (reg:SI 171)
(expr_list:REG_BR_PROB (const_int 900 [0x384])
(nil)))
 -> 129)

BUT, if pre-combine passes propagate register 172 in insn79 and delete insn78,
the resulting instructions will not be combined.

I am not sure how to handle this, so any advice would be much appreciated.
Thanks.

-- 
Best Regards.