Re: Code optimization only in loops

2009-07-13 Thread Jean Christophe Beyler
Sorry to be coming back to this problem, I'm working on many projects
at the same time at the moment.

The port does have a shift-by-immediate instruction. The compiler only
picks it up later, in the fwprop pass:

In insn 14, replacing
 (ashift:DI (reg:DI 79)
(reg:DI 77))
 with (ashift:DI (reg:DI 79)
(const_int 3 [0x3]))
Changed insn 14
deferring rescan insn with uid = 14.

(not the same numbers but you get the idea). I don't understand why
there is a problem here but it does seem that it is linked to the way
my port handles the calculations of addresses.

Thanks again for any ideas. Let me know if there is anything I can do to
provide more information,
Jc


On Fri, Jun 12, 2009 at 3:55 AM, Paolo Bonzini wrote:
> Jean Christophe Beyler wrote:
>>
>> I've gone back to this problem (since I've solved another one ;-)).
>> And I've moved forward a bit:
>>
>> It seems that if I consider an array of characters, there are no
>> longer any shifts and therefore I do get my two loads with the use of
>> an offset:
>
> The reason there are shifts instead of multiplies is that multiplications
> are canonicalized to shifts whenever possible outside addresses, because a
> shift instruction should be more efficient.
>
> The interesting dump should be fwprop which is where the address generation
> happens.
>
> From your dumps, however, the problem seems to be that you do not have a
> shift-by-immediate instruction.  You may consider adding it even though it
> does not apply to your assembly, either with define_insn_and_split or by
> loosening the predicate and keeping a "r" constraint (or whatever you're
> using now).
>
> Paolo
>
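
To illustrate the canonicalization Paolo describes: for 8-byte elements,
scaling an index by 8 and shifting it left by 3 give the same byte offset.
A minimal C sketch, assuming 8-byte elements; the array and function names
are made up for illustration and do not come from the port:

```c
#include <stdint.h>

/* Hypothetical array of 8-byte elements, standing in for the real data. */
static uint64_t table[16];

/* Byte offset of element i written as a multiply (the form kept inside
   addresses). */
static uint64_t
offset_with_mult (uint64_t i)
{
  return i * sizeof (uint64_t);
}

/* The same offset written as a shift (the canonical form outside
   addresses). */
static uint64_t
offset_with_shift (uint64_t i)
{
  return i << 3;
}

/* Load one element through an explicitly computed byte offset. */
static uint64_t
load_at (uint64_t byte_offset)
{
  return *(const uint64_t *) ((const char *) table + byte_offset);
}
```

With a shift-by-immediate pattern available, the constant 3 can be folded
into the shift itself, which is what the fwprop dump above shows.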


Adding constraints.md to my port

2009-07-14 Thread Jean Christophe Beyler
Dear all,

I'm continuing this port and have run into something strange. If I add
a constraints.md file containing only this:
(define_constraint "I"
  "A signed 16-bit constant (for arithmetic instructions)."
  (and (match_code "const_int")
   (match_test "SMALL_OPERAND (ival)")))

(Like one of the constraints from the mips port).

and add an
(include "constraints.md")

to my md file, I get these warnings during compilation:
In file included from ./tm_p.h:5,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:35:
./tm-preds.h:40:1: warning: "CONSTRAINT_LEN" redefined
In file included from ./tm.h:6,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:24:
/home/toto/gcc-4.3.2/gcc/defaults.h:797:1: warning: this is the
location of the previous definition
In file included from ./tm_p.h:5,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:35:
./tm-preds.h:42:1: warning: "REG_CLASS_FROM_CONSTRAINT" redefined
In file included from ./tm.h:6,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:24:
/home/toto/gcc-4.3.2/gcc/defaults.h:810:1: warning: this is the
location of the previous definition
In file included from ./tm_p.h:5,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:35:
./tm-preds.h:44:1: warning: "CONST_OK_FOR_CONSTRAINT_P" redefined
In file included from ./tm.h:6,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:24:
/home/toto/gcc-4.3.2/gcc/defaults.h:801:1: warning: this is the
location of the previous definition
In file included from ./tm_p.h:5,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:35:
./tm-preds.h:47:1: warning: "CONST_DOUBLE_OK_FOR_CONSTRAINT_P" redefined
In file included from ./tm.h:6,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:24:
/home/toto/gcc-4.3.2/gcc/defaults.h:805:1: warning: this is the
location of the previous definition
In file included from ./tm_p.h:5,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:35:
./tm-preds.h:49:1: warning: "EXTRA_MEMORY_CONSTRAINT" redefined
In file included from ./tm.h:6,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:24:
/home/toto/gcc-4.3.2/gcc/defaults.h:781:1: warning: this is the
location of the previous definition
In file included from ./tm_p.h:5,
 from /home/toto/gcc-4.3.2/gcc/c-pragma.c:35:
./tm-preds.h:51:1: warning: "EXTRA_ADDRESS_CONSTRAINT" redefined


Any ideas why this happens when I add a constraint to my machine description?

As always, thank you very much for your time,
Jean Christophe Beyler
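
For reference, the test behind a constraint like "I" is a plain range
check on the constant. A minimal sketch in C, assuming SMALL_OPERAND means
"fits in a signed 16-bit immediate" as the docstring says; fits_signed_16
is a hypothetical stand-in, not the macro from the mips port:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SMALL_OPERAND: true when VALUE fits in a
   signed 16-bit immediate field. */
static bool
fits_signed_16 (int64_t value)
{
  return value >= -32768 && value <= 32767;
}
```

In the define_constraint above, ival is the integer value of the const_int
being tested, so the match_test amounts to calling such a check on it.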


Code optimization with GCSE

2009-07-14 Thread Jean Christophe Beyler
Dear all,

As some might know, I've been concentrating on optimizing the handling
of loads for my port of GCC. I'm now considering this code:
uint64_t data[107];
uint64_t foo (void)
{
    uint64_t x0, x1, x2, x3, x4, x5, x6, x7;
    uint64_t i;

    for (i = 0; i < 107; i++) {
        data[i] = i;
    }

    return data[0] + data[1] + data[2];
}

However, the code I get is:

la  r9,data
mov r8, r9
#LOOP, of no consequence, it uses r8 for the stores...

la  r7,data+8    #Loads data+8 in r7 instead of using r9 directly
ldd r9,0(r9)     #This loads from r9 and the two next from r7
ldd r6,0(r7)
ldd r8,8(r7)


As you can see, the compiler uses r9 to store data and then uses that
for data[0] but also loads in r7 data+8 instead of directly using r9.
If I remove the loop then it does not do this.

Any ideas where I can look for this, or is this going to be difficult to fix?
Thanks !
Jean Christophe Beyler
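
The code being hoped for corresponds to reading all three elements through
one base address with offsets 0, 8 and 16. A small C sketch of that
intent; foo_single_base is a name invented here, and writing the source
this way does not by itself guarantee that the compiler emits a single la:

```c
#include <stdint.h>

uint64_t data[107];

/* Sum the first three elements through one base pointer, so that the
   three loads are at base+0, base+8 and base+16. */
uint64_t
foo_single_base (void)
{
  const uint64_t *base = data;
  return base[0] + base[1] + base[2];
}
```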


Re: Adding constraints.md to my port

2009-07-14 Thread Jean Christophe Beyler
Ok, so actually, my cpu.h file defined some of those macros. I've
removed them and am now filling in the constraints.md file with what
is needed. I also saw in the GCC internals manual that they are
obsolete, so I'll be moving everything to the constraints file.

Thank you for your help,
Jc

On Tue, Jul 14, 2009 at 3:53 PM, Ian Lance Taylor wrote:
> Jean Christophe Beyler  writes:
>
>> In file included from ./tm_p.h:5,
>>                  from /home/toto/gcc-4.3.2/gcc/c-pragma.c:35:
>> ./tm-preds.h:40:1: warning: "CONSTRAINT_LEN" redefined
>> In file included from ./tm.h:6,
>>                  from /home/toto/gcc-4.3.2/gcc/c-pragma.c:24:
>> /home/toto/gcc-4.3.2/gcc/defaults.h:797:1: warning: this is the
>> location of the previous definition
>
> See defaults.h:
>
> #if  !defined CONSTRAINT_LEN                    \
>  && !defined REG_CLASS_FROM_LETTER             \
>  && !defined REG_CLASS_FROM_CONSTRAINT         \
>  && !defined CONST_OK_FOR_LETTER_P             \
>  && !defined CONST_OK_FOR_CONSTRAINT_P         \
>  && !defined CONST_DOUBLE_OK_FOR_LETTER_P      \
>  && !defined CONST_DOUBLE_OK_FOR_CONSTRAINT_P  \
>  && !defined EXTRA_CONSTRAINT                  \
>  && !defined EXTRA_CONSTRAINT_STR              \
>  && !defined EXTRA_MEMORY_CONSTRAINT           \
>  && !defined EXTRA_ADDRESS_CONSTRAINT
>
> #define USE_MD_CONSTRAINTS
>
>
> Make sure that your CPU.h file does not define any of those macros.
>
> Ian
>
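
The effect of that guard can be reproduced in isolation: the new-style
mechanism is only switched on when none of the old-style macros is
defined. A reduced C sketch that checks just two of the macros from the
list above; uses_md_constraints is a hypothetical helper added for the
demonstration:

```c
#include <stdbool.h>

/* Uncomment the next line to mimic a port header that still defines an
   old-style macro; USE_MD_CONSTRAINTS is then no longer defined. */
/* #define CONST_OK_FOR_LETTER_P(VALUE, C) 0 */

/* Reduced version of the guard: only two of the macros are tested. */
#if !defined CONSTRAINT_LEN && !defined CONST_OK_FOR_LETTER_P
#define USE_MD_CONSTRAINTS 1
#endif

/* Report which mechanism the preprocessor selected. */
static bool
uses_md_constraints (void)
{
#ifdef USE_MD_CONSTRAINTS
  return true;
#else
  return false;
#endif
}
```

This is why a port header that defines even one of the old macros ends up
with both sets of definitions and the "redefined" warnings.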


Re: Code optimization with GCSE

2009-07-15 Thread Jean Christophe Beyler
Ah ok, so I can see why it would not be able to perform that
optimization around the loop but I changed the code to simply have
this:

uint64_t foo (void)
{
return data[0] + data[1] + data[2];
}

And this generates :

la  r9,data
la  r7,data+8
ldd r6,0(r7)
ldd r8,0(r9)
ldd r7,16(r9)

I'm trying to see if there is a problem with my rtx costs function
because, again, I don't understand why it would generate two la
instructions instead of using offsets of 8 and 16 from a single base.

Thanks for any input,
Jc

On Wed, Jul 15, 2009 at 1:29 AM, Paolo Bonzini wrote:
>
>> As you can see, the compiler uses r9 to store data and then uses that
>> for data[0] but also loads in r7 data+8 instead of directly using r9.
>> If I remove the loop then it does not do this.
>
> This optimization is done by CSE only, currently.  That's why it cannot look
> through loops.
>
> Paolo
>


Re: Adding constraints.md to my port

2009-07-15 Thread Jean Christophe Beyler
I agree. I've actually pretty much finished that change and it seems
to be stable. I think it will be more maintainable in the
constraints.md state than it was before,

Thanks again,
Jc

On Wed, Jul 15, 2009 at 1:30 AM, Paolo Bonzini wrote:
> On 07/14/2009 10:31 PM, Jean Christophe Beyler wrote:
>>
>> Ok, so actually, my cpu.h file had some of those included. I've
>> removed them and am now filling in the constraints.md file with what
>> is needed. I also so that they were obsolete in the GCC internals,
>> I'll be moving everything to the constraints then.
>
> That's good, it's usually much more readable.  However, as you found out,
> it's an "all or nothing" decision for each port.
>
> Paolo
>


Re: Code optimization with GCSE

2009-07-15 Thread Jean Christophe Beyler
The subreg pass has this :

(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
(const:DI (plus:DI (symbol_ref:DI ("data") )
(const_int 8 [0x8])))) 71 {movdi_internal} (nil))

(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
(symbol_ref:DI ("data") )) 71
{movdi_internal} (nil))

...

(insn 10 9 11 2 ex1b.c:8 (set (reg/f:DI 79)
(const:DI (plus:DI (symbol_ref:DI ("data") )
(const_int 16 [0x10])))) 71 {movdi_internal} (nil))


As we can see, all three are using the symbol_ref data before adding
their offset. But after cse, we get this:

(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
(const:DI (plus:DI (symbol_ref:DI ("data") )
(const_int 8 [0x8])))) 71 {movdi_internal} (nil))

(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
(symbol_ref:DI ("data") )) 71
{movdi_internal} (nil))

...

(insn 10 9 11 2 ex1b.c:8 (set (reg/f:DI 79)
(plus:DI (reg/f:DI 75)
(const_int 16 [0x10]))) 2 {adddi3_port}
(expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("data") )
(const_int 16 [0x10])))
(nil)))

table_size = 0, use_info->table_size = 0
;;  invalidated by call  2 [r2] 4 [r4] 5 [r5] 6 [r6] 7 [r7] 8 [r8] 9
[r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17
[r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24]
25 [r25] 26 [r26] 27 [r27] 28 [r28] 29 [r29] 30 [r30] 31 [r31] 32
[r32] 33 [r33] 34 [r34] 35 [r35] 36 [r36] 37 [r37] 38 [r38] 39 [r39]
40 [r40] 41 [r41] 42 [r42] 43 [r43] 44 [r44] 45 [r45] 46 [r46] 47
[r47] 63 [r63] 64 [$rap] 65 [cc] 66 [acc]
;;  hardware regs used   0 [r0] 1 [r1] 3 [r3]
;;  regular block artificial uses0 [r0] 1 [r1] 3 [r3] 62 [r62]
;;  eh block artificial uses 0 [r0] 1 [r1] 3 [r3] 62 [r62]
;;  entry block defs 0 [r0] 1 [r1] 3 [r3] 6 [r6] 8 [r8] 9 [r9] 10
[r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 62 [r62] 63 [r63]
;;  exit block uses  1 [r1] 3 [r3] 6 [r6] 62 [r62]
;;  regs ever live   6[r6]

( )->[0]->( 2 )
;; bb 0 artificial_defs: { d-1(0){ }d-1(1){ }d-1(3){ }d-1(6){ }d-1(8){
}d-1(9){ }d-1(10){ }d-1(11){ }d-1(12){ }d-1(13){ }d-1(14){ }d-1(15){
}d-1(62){ }d-1(63){ }}
;; bb 0 artificial_uses: { }

( 0 )->[2]->( 1 )
;; bb 2 artificial_defs: { }
;; bb 2 artificial_uses: { u-1(0){ }u-1(1){ }u-1(3){ }u-1(62){ }}

( 2 )->[1]->( )
;; bb 1 artificial_defs: { }
;; bb 1 artificial_uses: { u-1(1){ }u-1(3){ }u-1(6){ }u-1(62){ }}

Finding needed instructions:
  Adding insn 23 to worklist
Finished finding needed instructions:
processing block 2 live out =  0 [r0] 1 [r1] 3 [r3] 6 [r6] 62 [r62]
  Adding insn 17 to worklist
  Adding insn 13 to worklist
  Adding insn 12 to worklist
  Adding insn 11 to worklist
  Adding insn 10 to worklist
  Adding insn 9 to worklist
  Adding insn 8 to worklist
  Adding insn 7 to worklist
  Adding insn 6 to worklist
  Adding insn 5 to worklist
df_worklist_dataflow_overeager:n_basic_blocks 3 n_edges 2 count 3 (1)
;; Following path with 11 sets: 2
deferring rescan insn with uid = 10.
deferring rescan insn with uid = 17.


try_optimize_cfg iteration 1

(note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)

(insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
(const:DI (plus:DI (symbol_ref:DI ("data") )
(const_int 8 [0x8])))) 71 {movdi_internal} (nil))

(insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
(symbol_ref:DI ("data") )) 71
{movdi_internal} (nil))

(insn 7 6 8 2 ex1b.c:8 (set (reg:DI 77 [ data+8 ])
(mem/s:DI (reg/f:DI 74) [2 data+8 S8 A64])) 71 {movdi_internal} (nil))

(insn 8 7 9 2 ex1b.c:8 (set (reg:DI 78 [ data ])
(mem/s:DI (reg/f:DI 75) [2 data+0 S8 A64])) 71 {movdi_internal} (nil))

(insn 9 8 10 2 ex1b.c:8 (set (reg:DI 76)
(plus:DI (reg:DI 77 [ data+8 ])
(reg:DI 78 [ data ]))) 2 {adddi3_port} (nil))

(insn 10 9 11 2 ex1b.c:8 (set (reg/f:DI 79)
(plus:DI (reg/f:DI 75)
(const_int 16 [0x10]))) 2 {adddi3_port}
(expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI ("data")
)
(const_int 16 [0x10])))
(nil)))

(insn 11 10 12 2 ex1b.c:8 (set (reg:DI 80 [ data+16 ])
(mem/s:DI (reg/f:DI 79) [2 data+16 S8 A64])) 71 {movdi_internal} (nil))

(insn 12 11 13 2 ex1b.c:8 (set (reg:DI 73)
(plus:DI (reg:DI 76)
(reg:DI 80 [ data+16 ]))) 2 {adddi3_port} (nil))

(insn 13 12 17 2 ex1b.c:8 (set (reg:DI 72 [  ])
(reg:DI 73)) 71 {movdi_internal} (nil))

(insn 17 13 23 2 ex1b.c:10 (set (reg/i:DI 6 r6)
(reg:DI 73)) 71 {movdi_internal} (nil))

(insn 23 17 0 2 ex1b.c:10 (use (reg/i:DI 6 r6)) -1 (nil))
starting the processing of deferred insns
rescanning insn with uid = 10.
deleting insn with uid = 10.
rescanning insn with uid = 17.
deleting insn with uid = 17.
ending the processing of deferred insns




On Wed, Jul 15, 2009 at 12:25 PM, Adam Nemet wrote:
> Jean Christophe Beyler  writes:
>> uint64_t fo

Re: How to set the alignment

2009-08-04 Thread Jean Christophe Beyler
I had a similar issue on my port. I fixed it by adding a check_move
before the generation of the move to look at the potential offset
used.

That allowed the compiler to not generate those wrong offsets
depending on the mode on my port. It looks something like this :

/* Check the operands of the move to make sure we can perform the move */
bool check_move (rtx op0, rtx op1)
{
  /* First check offsets */
  if (MEM_P (op0))
    {
      int offset;
      int res = rtx_get_offset (XEXP (op0, 0), &offset);

      if (res < 0)
        return false;

      if (! cyclops_check_offset (offset, GET_MODE (op0)))
        {
          return false;
        }
    }

  /* Check second operand */
  if (MEM_P (op1))
    {
      int offset;
      int res = rtx_get_offset (XEXP (op1, 0), &offset);

      if (res < 0)
        return false;

      if (! cyclops_check_offset (offset, GET_MODE (op1)))
        {
          return false;
        }
    }

  return true;
}

Jc

On Tue, Aug 4, 2009 at 1:39 AM, Mohamed Shafi wrote:
> 2009/8/3 Jim Wilson :
>> On 08/03/2009 02:14 AM, Mohamed Shafi wrote:
>>>
>>> short - 2 bytes
>>> i am not able to implement the alignment for short.
>>> The following is are the macros that i used for this
>>> #define PARM_BOUNDARY 8
>>> #define STACK_BOUNDARY 64
>>
>> You haven't explained what the actual problem is.  Is there a problem with
>> global variables?  Is the variable initialized or uninitialized? If it is
>> uninitialized, is it common?  If this a local variable?  Is this a function
>> argument or parameter?  Is this a named or unnamed (stdarg) argument or
>> parameter?  Etc.  It always helps to include a testcase.
>>
>> You should also mention what gcc is currently emitting, why it is wrong, and
>> what the output should be instead.
>>
>> All this talk about stack and parm boundary suggests that it might be an
>> issue with function arguments, in which case you will probably have to
>> describe the calling conventions a bit so we can understand what you want.
>>
>  This is the test case that i tried
>
> short funs (int a, int b, char w,short e,short r)
> {
>  return e+r;
> }
>
> The target is 32bit . The first two parameters are passed in registers
> and the rest in stack. For the parameters that are passed in stack the
> alignment is that of the data type. The stack pointer is 8 byte
> aligned. char is 1 byte, int is 4 byte and short is 2 byte. The code
> that is getting generated is give below (-O0 -fomit-frame-pointer)
>
> funs:
>        add      16,sp
>        mov  d0,(sp-16)
>        mov  d1,(sp-12)
>        movh  (sp-19),d0
>        movh  d0,(sp-8)
>        movh  (sp-21),d0
>        movh  d0,(sp-6)
>        movh  (sp-8),d1
>        movh  (sp-6),d0
>        add     d1,d0,d0
>        sub    16,sp
>        ret
>
>
> From the above code you can see that some of the half word access is
> not aligned on a 2byte boundary.
>
> So where am i going wrong.
> Hope this info is enough
>
> Regards,
> Shafi
>
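
The kind of test a helper like cyclops_check_offset performs is not shown
in this message. One plausible rule, assumed here purely for illustration,
is that an access of a given size is only valid at an offset that is a
multiple of that size. A C sketch with an invented name:

```c
#include <stdbool.h>

/* Assumed rule, not taken from the port: an access of SIZE bytes is
   accepted only when OFFSET is a multiple of SIZE. */
static bool
offset_is_aligned (long offset, unsigned size)
{
  return size != 0 && offset % (long) size == 0;
}
```

Under that rule, the half-word accesses at sp-19 and sp-21 in the quoted
code are rejected, while those at sp-8 and sp-6 are accepted.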


GREG modifying the FIXED_REGISTERS...

2009-08-21 Thread Jean Christophe Beyler
Dear all,

I actually have determined that doing what I say below is a bad idea
since it will probably lessen the impact of future optimizations in
the compiler. However, I'm curious to know why it didn't work :-)

Here goes: I've been working on getting better code generation and one
thing I noticed was that, on my architecture, it is better to use the
register r0 than the constant 0. Therefore, as I expand move
instructions, I check whether I can transform the second operand into
r0 like so:

/* If we have a 0, use r0 instead */
if ((GET_CODE (op1) == CONST_INT && INTVAL (op1) == 0)
    || (GET_CODE (op1) == CONST_DOUBLE
        && CONST_DOUBLE_LOW (op1) == CONST_DOUBLE_HIGH (op1)
        && CONST_DOUBLE_LOW (op1) == 0))
  {
    op1 = gen_rtx_REG (GET_MODE (op0), 0);
    emit_move_insn (op0, op1);
    return 1;
  }

Now this seems to work correctly and I get the following RTL later
down the line:

(insn:HI 66 64 69 2 strlen.c:37 (set (reg:DI 138)
(eq:DI (reg:DI 136)
(reg/f:DI 0 r0))) 80 {*cyclops.md:2813}
(expr_list:REG_DEAD (reg:DI 136)
(nil)))

However, for a reason I don't know, the GREG pass transforms the r0
into r3, which is not at all the same thing:

(insn:HI 66 64 69 2 strlen.c:37 (set (reg:DI 6 r6 [138])
(eq:DI (reg:DI 6 r6 [136])
(reg/f:DI 3 r3))) 80 {*cyclops.md:2813} (nil))

This is the output concerning the instruction from the GREG pass:

insn = 66 live = hardregs [62 ] renumbered [6 ] pseudos [  138(6)  135]
  adding def 138(6)
  roc adding 138<=>(6 62 138 135 )
  roc adding 138<=>6
  roc adding 135<=>6
  clearing def 138(6)
  seeing use 0
  seeing use 136(6)
dying pseudo
  set_renumbers_live 136->6
  starting early clobber conflicts.

Now that I think about it, I believe the problem is that GREG considers
that r0 can be replaced by r3 because I have not told it that r0 always
holds the value 0, even though r0 is listed in the FIXED_REGISTERS
macro.

I've determined that doing so is actually a really bad idea for
future optimization passes (since this is done in the expand pass). I
would, however, like to understand why GREG replaced that register
instead of leaving it alone, since it is listed in FIXED_REGISTERS.

What allowed GREG to change it, and why would it change it to r3, which
is also listed in FIXED_REGISTERS?

As always, thanks for any input,
Jean Christophe Beyler


Re: GREG modifying the FIXED_REGISTERS...

2009-08-22 Thread Jean Christophe Beyler
That was also the conclusion I came to. I am now wondering why GREG
allows itself to change that register from r0 to r3. Is there any
reason?

It seems to me that the most likely reason is that it considers r0
undefined in this case and therefore feels free to change it to r3.
I was wondering if this assumption is correct and what I could have
done (though I won't) to tell it not to.

This question is now to understand the hows and whys of GREG instead
of how to do what I initially wanted. As I have stated and you have
confirmed, this would be a bad idea. I also noticed that by modifying
my constraints, I'm getting the generated code I was looking for.

Thanks again,
Jc

On Fri, Aug 21, 2009 at 8:12 PM, Richard Henderson wrote:
> On 08/21/2009 04:01 PM, Jean Christophe Beyler wrote:
>>
>>     /* If we have a 0, use r0 instead */
>
> Don't do this.  See how, e.g. alpha and mips handle the zero register.
> You want to leave this as (const_int 0) throughout compilation.
>
>
> r~
>
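
One way to read this advice: keep the operand as (const_int 0) in the RTL
and only turn it into r0 when the assembly is printed. A toy C model of
that output step, with an operand struct and function name invented for
the example; it is not code from the alpha or mips ports:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified operand: a register or an integer constant. */
struct operand
{
  bool is_const;
  long value;   /* used when is_const is true */
  int regno;    /* used when is_const is false */
};

/* Write the register name to print for OP into BUF.  A constant 0 is
   printed as the zero register r0; any other constant is rejected.
   Returns 0 on success, -1 if OP cannot be printed as a register. */
static int
print_reg_or_zero (const struct operand *op, char *buf, size_t size)
{
  if (op->is_const)
    {
      if (op->value != 0)
        return -1;
      snprintf (buf, size, "r0");
      return 0;
    }
  snprintf (buf, size, "r%d", op->regno);
  return 0;
}
```

In this model the optimizers still see a constant, so they can fold and
propagate it, and the register allocator never sees a hard register r0
in the operand.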


Bit fields

2009-08-31 Thread Jean Christophe Beyler
Dear all,

I am currently working on handling bit-fields on my port and am having
difficulties understanding why GCC is having problems with what I
wrote in.

Following what mips did:

(define_expand "extzv"
  [(set (match_operand 0 "register_operand")
(zero_extract (match_operand 1 "nonimmediate_operand")
  (match_operand 2 "immediate_operand")
  (match_operand 3 "immediate_operand")))]
  "!TARGET_MIPS16"
{
  ...
})

(define_insn "extzv<mode>"
  [(set (match_operand:GPR 0 "register_operand" "=d")
(zero_extract:GPR (match_operand:GPR 1 "register_operand" "d")
  (match_operand:SI 2 "immediate_operand" "I")
  (match_operand:SI 3 "immediate_operand" "I")))]
  "mips_use_ins_ext_p (operands[1], INTVAL (operands[2]),INTVAL (operands[3]))"
  "<d>ext\t%0,%1,%3,%2"
  [(set_attr "type" "arith")
   (set_attr "mode" "<MODE>")])

I did:

(define_insn "extzv"
   [(set (match_operand 0 "register_operand" "")
 (zero_extract (match_operand 1 "register_operand" "")
   (match_operand 2 "const_int_operand" "")
   (match_operand 3 "const_int_operand" "")))]
 ""
 "")

(define_insn "extzvdi2"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (zero_extract:DI (match_operand:DI 1 "register_operand" "r")
   (match_operand:DI 2 "const_int_operand" "L")
   (match_operand:DI 3 "const_int_operand" "L")))]
 "check_extract (DImode, DImode, operands[2], operands[3])"
 "extr\\t%0,%1,%3,%2"
 [(set_attr "type" "arith")
  (set_attr "mode" "DI")
  (set_attr "length"   "1")])

For the moment, I haven't put anything in my expand because I don't
know if anything is necessary yet. I first wanted to check whether GCC
generates the right code on simple examples.

But I get this message:
struct4.c: In function 'goo':
struct4.c:32: internal compiler error: in simplify_subreg, at
simplify-rtx.c:4923

Does anybody know how I can solve this issue?
Thanks again for all your help,
Jean Christophe Beyler


Re: Bit fields

2009-08-31 Thread Jean Christophe Beyler
Sorry, you are correct. That line is this assertion:
  gcc_assert (outermode != VOIDmode);
in the simplify_subreg function.

However, I've played around with it and saw that I made a mistake when
writing up this question: I simplified what I had put in my MD file and
got it wrong in the process. I apologize.

If I replace this :
(define_insn "extzv"
  [(set (match_operand 0 "register_operand" "")
(zero_extract (match_operand 1 "register_operand" "")
  (match_operand 2 "const_int_operand" "")
  (match_operand 3 "const_int_operand" "")))]
 ""
 "")

with this:
(define_expand "extzv"
  [(set (match_operand 0 "register_operand" "")
(zero_extract (match_operand 1 "register_operand" "")
  (match_operand 2 "const_int_operand" "")
  (match_operand 3 "const_int_operand" "")))]
 ""
 "")

I do not get any errors. I don't know if the pattern is actually used,
but the error no longer appears.

However, if I consider this code:
typedef struct stest {
uint64_t a:1;
uint64_t b:1;
}STest;

void
goo (STest *t, uint64_t *a, uint64_t *b)
{
*a = t->a;
}

At the expand phase, I see:

(insn 9 8 10 3 struct4.c:24 (set (subreg:DI (reg:QI 76) 0)
(zero_extract:DI (reg:DI 75)
(const_int 1 [0x1])
(const_int 0 [0x0]))) -1 (nil))

(insn 10 9 11 3 struct4.c:24 (set (reg:DI 77)
(zero_extend:DI (reg:QI 76))) -1 (nil))

Is there anything I can do to remove that zero_extend?

Thanks,
Jc
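
For readers less used to RTL, the zero_extract in insn 9 takes a field of
a given width at a given bit position and zero-extends it. A C model of
that operation, assuming bit 0 is the least significant bit (the numbering
actually depends on the target's bit-endianness setting) and a position
below 64; the function name is invented:

```c
#include <stdint.h>

/* Model of (zero_extract:DI value len pos): take LEN bits of VALUE
   starting at bit POS, with bit 0 assumed to be the least significant
   bit, and zero-extend the result.  POS must be below 64. */
static uint64_t
zero_extract64 (uint64_t value, unsigned len, unsigned pos)
{
  uint64_t mask = (len >= 64) ? ~UINT64_C (0) : ((UINT64_C (1) << len) - 1);
  return (value >> pos) & mask;
}
```

For the bit-field t->a above, this corresponds to
zero_extract64 (word, 1, 0), whose result is already either 0 or 1.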


Re: Bit fields

2009-08-31 Thread Jean Christophe Beyler
On Mon, Aug 31, 2009 at 4:36 PM, Richard Henderson wrote:
> On 08/31/2009 01:07 PM, Jean Christophe Beyler wrote:
>>
>> If I replace this :
>> (define_insn "extzv"
>>   [(set (match_operand 0 "register_operand" "")
>>         (zero_extract (match_operand 1 "register_operand" "")
>>                       (match_operand 2 "const_int_operand" "")
>>                       (match_operand 3 "const_int_operand" "")))]
>>  ""
>>  "")
>
> Well, I can tell you that an insn pattern with no modes
> on the non-immediate operands will definitely cause problems.

Yes that was a mistake on my part, I wanted a define_expand instead.

>> (insn 9 8 10 3 struct4.c:24 (set (subreg:DI (reg:QI 76) 0)
>>         (zero_extract:DI (reg:DI 75)
>>             (const_int 1 [0x1])
>>             (const_int 0 [0x0]))) -1 (nil))
>>
>> (insn 10 9 11 3 struct4.c:24 (set (reg:DI 77)
>>         (zero_extend:DI (reg:QI 76))) -1 (nil))
>>
>> Is there anything I can do to remove that zero_extend?
>
> You could try either using a predicate that disallows
> a subreg, or by having your expander rewrite things into
>
>  (set (reg:DI new-scratch))
>       (zero_extract:DI ...))
>  (set (reg:QI 76 (subreg:QI (reg:DI new scratch)))
>
> and relying on subsequent passes to clean that up.

I am going to try this but shouldn't it be :

(set (reg:QI new-scratch))
  (zero_extract:DI ...))
(set (reg:DI 76 (subreg:DI (reg:QI new scratch)))

?


I did this:
if (GET_CODE (operands[0]) != REG)
  {
    rtx tmp = gen_reg_rtx (DImode);
    emit_insn (gen_extzvdi2 (tmp, operands[1], operands[2], operands[3]));
    emit_move_insn (operands[0], tmp);
    DONE;
  }

Which generates in my case :
(insn 9 8 10 3 struct4.c:33 (set (reg:DI 77)
(zero_extract (reg:DI 75)
(const_int 1 [0x1])
(const_int 0 [0x0]))) -1 (nil))

(insn 10 9 11 3 struct4.c:33 (set (subreg:DI (reg:QI 76) 0)
(reg:DI 77)) -1 (nil))

(insn 11 10 12 3 struct4.c:33 (set (reg:DI 78)
(zero_extend:DI (reg:QI 76))) -1 (nil))

As we can see, I still have that zero_extend but maybe it can be
simplified afterwards. I don't know because, now, it fails at the same
place in :

struct4.c:35: internal compiler error: in simplify_subreg, at
simplify-rtx.c:4923
-> This line is the second assert of function simplify_subreg :
gcc_assert (outermode != VOIDmode);

The outermode is VOIDmode and the op given to the function is the following:
(ashift:DI (mem/s:DI (reg/v/f:DI 72 [ t ]) [0 S8 A64])
(const_int -1 [0x]))

I don't know where this ashift is from since it doesn't appear in any
of the previous passes.

Any ideas?

As always, thanks,
Jc


Re: Bit fields

2009-08-31 Thread Jean Christophe Beyler
On Mon, Aug 31, 2009 at 5:27 PM, Richard Henderson wrote:
> On 08/31/2009 02:07 PM, Jean Christophe Beyler wrote:
>>
>> I am going to try this but shouldn't it be :
>>
>> (set (reg:QI new-scratch))
>>       (zero_extract:DI ...))
>
> No.

Ok, I think I understand why not:

>> (insn 9 8 10 3 struct4.c:24 (set (subreg:DI (reg:QI 76) 0)
>> (zero_extract:DI (reg:DI 75)
>> (const_int 1 [0x1])
>> (const_int 0 [0x0]))) -1 (nil))

Is basically saying :

(set (reg:DI new-scratch)
(zero_extract:DI (reg:DI 75)
(const_int 1 [0x1])
(const_int 0 [0x0]))) -1 (nil))

and then apply the subreg:

(set (reg:QI 76) (subreg:QI (reg:DI new-scratch)))

which is what you were saying. I was reading the subreg the other way around.

>
>> Any ideas?
>
> Nope.  You'll have to debug it.
>

Ok, is it normal to see an ashift with a negative value though, or is
this already a sign of a (potentially) different problem?

Thanks again and sorry about the random questions,
Jc
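
The reading of the subreg worked out above can be modelled in C: extract
into a full-width scratch, take its low byte, then zero-extend that byte
again. A sketch with invented function names, assuming the QImode subreg
at byte 0 denotes the low part of the DImode value:

```c
#include <stdint.h>

/* Model of (subreg:QI (reg:DI x) 0), read as the low 8 bits of x. */
static uint8_t
lowpart_qi (uint64_t x)
{
  return (uint8_t) x;
}

/* Model of (zero_extend:DI (reg:QI q)). */
static uint64_t
zero_extend_qi_to_di (uint8_t q)
{
  return (uint64_t) q;
}

/* Extract bit 0 into a 64-bit scratch, narrow it, then widen it again. */
static uint64_t
extract_bit0_roundtrip (uint64_t word)
{
  uint64_t scratch = word & 1;
  return zero_extend_qi_to_di (lowpart_qi (scratch));
}
```

Since a 1-bit extract can only yield 0 or 1, narrowing and re-extending
does not change the value, which is why a later pass may be able to drop
the zero_extend.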


Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
Dear all,

I have also been looking into how to generate a function call for
certain operations. I've looked at various other targets for a similar
problem/solution but have not seen anything. On my target
architecture, we have optimized versions of certain operations, the
multiplication for example.

I wanted to replace certain multiplications with a function call. The
solution I found was to perform a FAIL in the define_expand of the
multiplication for these cases. This forces the compiler to generate a
function call to __muldi3.

I then go in the define_expand of the function call and check the
symbol_ref to see what function is called. I can then modify the call
at that point.

My question is: is this a good approach or is there another solution
that you would use?

Thanks again for your time,
Jean Christophe Beyler


Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
I have looked at how other targets use
init_builtins/expand_builtin. Of course, I don't understand
everything there, but it does seem to be meant more for generating a
series of instructions than a function call. I haven't seen
anything resembling what I want to do.

I had also first thought of going directly in the define_expand and
expanding to the function call I would want. The problem I have is
that it is unclear to me how to handle (set-up) the arguments of the
builtin_function I am trying to define.

To go from no function call to :

- Potentially spill output registers
- Potentially spill scratch registers

- Setup output registers with the operands
- Perform function call
- Copy return to output operand

- Potentially restore scratch registers
- Potentially restore output registers

Seems a bit difficult to do at the define_expand level and might not
generate good code. I guess I could potentially perform a pass in the
tree representation to do what I am looking for but I am not sure that
that is the best solution either.

For the moment, I will continue looking at what you suggest and also
see if my solution works. I see that, for example, the compiler will
not always generate the call I need to change. Therefore, it does seem
that I need another solution than the one I propose.

I'm more and more considering a pass in the middle-end to get what I
need. Do you think this is better?

Thanks for your input,
Jc

On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor wrote:
> Jean Christophe Beyler  writes:
>
>> I have been also been looking into how to generate a function call for
>> certain operations. I've looked at various other targets for a similar
>> problem/solution but have not seen anything. On my target
>> architecture, we have certain optimized versions of the multiplication
>> for example.
>>
>> I wanted to replace certain mutliplications with a function call. The
>> solution I found was to do perform a FAIL on the define_expand of the
>> multiplication for these cases. This forces the compiler to generate a
>> function call to __multdi3.
>>
>> I then go in the define_expand of the function call and check the
>> symbol_ref to see what function is called. I can then modify the call
>> at that point.
>>
>> My question is: is this a good approach or is there another solution
>> that you would use?
>
> I think that what you describe will work.  I would probably generate a
> call to a builtin function in the define_expand.  Look for the way
> targets use init_builtins and expand_builtin.  Normally expand_builtin
> expands to some target-specific RTL, but it can expand to a function
> call too.
>
> Ian
>


Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
Actually, what I've done is probably something in between what you
were suggesting and what I was initially doing. If we consider the
multiplication, I've modified the define_expand for example to:

(define_expand "muldi3"
  [(set (match_operand:DI 0 "register_operand" "")
(mult:DI (match_operand:DI 1 "register_operand" "")
 (match_operand:DI 2 "register_operand" "")))]
 ""
 "
 {
emit_function_call_2args (DImode, DImode, DImode,
\"my_version_of_mull\", operands[0], operands[1], operands[2]);
DONE;
 }")

and my emit function is:

void
emit_function_call_2args (
enum machine_mode return_mode,
enum machine_mode arg1_mode,
enum machine_mode arg2_mode,
const char *fname,
rtx op0,
rtx op1,
rtx op2)
{
 tree id;
 rtx insn;

 /* Move arguments */
 emit_move_insn (gen_rtx_REG (arg1_mode, GP_ARG_FIRST), op1);
 emit_move_insn (gen_rtx_REG (arg2_mode, GP_ARG_FIRST + 1), op2);

 /* Get name */
 id = get_identifier (fname);

 /* Generate the call pattern */
 insn = gen_call_value (
gen_rtx_REG (return_mode, 6),
gen_rtx_MEM (DImode,
   gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (id))),
GEN_INT (64),
NULL
);

 /* Emit the call first: CALL_INSN_FUNCTION_USAGE applies to the emitted
    call insn, not to the pattern returned by gen_call_value */
 insn = emit_call_insn (insn);

 /* Annotate the call to say we are using both argument registers */
 use_reg (&CALL_INSN_FUNCTION_USAGE (insn), gen_rtx_REG
(arg1_mode, GP_ARG_FIRST));
 use_reg (&CALL_INSN_FUNCTION_USAGE (insn), gen_rtx_REG
(arg2_mode, GP_ARG_FIRST + 1));

 /* Set back return to op0 */
 emit_move_insn (op0, gen_rtx_REG (return_mode, GP_RETURN));
}

First off: does this seem correct?

Second, I am a bit worried about the following case. Consider this C code:

bar (a * b, c * d);

it is possible that the compiler would have normally generated this :

mult output1, a, b
mult output2, c, d
call bar

Which would be problematic for my expand system since this would expand into:

mov output1, a
mov output2, b
call internal_mult
mov output1, return_reg

mov output1, c   #Rewriting on output1...
mov output2, d
call internal_mult
mov output2, return_reg

call bar


However, I am unsure whether this can happen at the expand stage. Would
the expand stage automatically generate this instead:
mult tmp1, a, b
mult tmp2, c, d

mov output1, tmp1
mov output2, tmp2
call bar

in which case, I know I can do what I am currently doing.
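
The worry above can be modelled outside the compiler. What follows is only a toy sketch, not GCC code: the register file, the names ARG0/ARG1/RET and both functions are hypothetical, and the helper call is reduced to a plain multiplication. It shows why writing the first product straight into the argument register loses it, and why going through temporaries does not.

```c
/* Toy register file: ARG0 and ARG1 model the two argument registers,
   RET models the return register.  All names here are hypothetical.  */
enum { ARG0, ARG1, RET, NREGS };

/* What bar would see in its two argument registers.  */
typedef struct { long first, second; } bar_args;

/* Model of the helper call: multiplies the two argument registers.  */
static void call_internal_mult (long r[NREGS])
{
  r[RET] = r[ARG0] * r[ARG1];
}

/* Each product is written directly into an argument register of bar.  */
bar_args expand_into_hard_regs (long a, long b, long c, long d)
{
  long r[NREGS] = {0};
  bar_args seen;

  r[ARG0] = a; r[ARG1] = b;
  call_internal_mult (r);
  r[ARG0] = r[RET];            /* output1 = a * b */

  r[ARG0] = c; r[ARG1] = d;    /* overwrites output1 */
  call_internal_mult (r);
  r[ARG1] = r[RET];            /* output2 = c * d */

  seen.first = r[ARG0];
  seen.second = r[ARG1];
  return seen;
}

/* Each product goes to a temporary first; the argument registers of
   bar are only filled once both helper calls are done.  */
bar_args expand_into_temporaries (long a, long b, long c, long d)
{
  long r[NREGS] = {0};
  long tmp1, tmp2;
  bar_args seen;

  r[ARG0] = a; r[ARG1] = b;
  call_internal_mult (r);
  tmp1 = r[RET];

  r[ARG0] = c; r[ARG1] = d;
  call_internal_mult (r);
  tmp2 = r[RET];

  r[ARG0] = tmp1; r[ARG1] = tmp2;
  seen.first = r[ARG0];
  seen.second = r[ARG1];
  return seen;
}
```

With a=2, b=3, c=4, d=5 the first variant hands bar (4, 20) and the second hands it (6, 20), which is the difference between the two instruction sequences listed above.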

Thanks again for your help and I apologize for these basic questions...
Jc


On Tue, Sep 1, 2009 at 2:30 PM, Jean Christophe
Beyler wrote:
> I have looked at how other targets use the
> init_builtins/expand_builtins. Of course, I don't understand
> everything there but it seems indeed to be more for generating a
> series of instructions instead of a function call. I haven't seen
> anything resembling what I want to do.
>
> I had also first thought of going directly in the define_expand and
> expanding to the function call I would want. The problem I have is
> that it is unclear to me how to handle (set-up) the arguments of the
> builtin_function I am trying to define.
>
> To go from no function call to :
>
> - Potentially spill output registers
> - Potentially spill scratch registers
>
> - Setup output registers with the operands
> - Perform function call
> - Copy return to output operand
>
> - Potentially restore scratch registers
> - Potentially restore output registers
>
> Seems a bit difficult to do at the define_expand level and might not
> generate good code. I guess I could potentially perform a pass in the
> tree representation to do what I am looking for but I am not sure that
> that is the best solution either.
>
> For the moment, I will continue looking at what you suggest and also
> see if my solution works. I see that, for example, the compiler will
> not always generate the call I need to change. Therefore, it does seem
> that I need another solution than the one I propose.
>
> I'm more and more considering a pass in the middle-end to get what I
> need. Do you think this is better?
>
> Thanks for your input,
> Jc
>
> On Tue, Sep 1, 2009 at 12:34 PM, Ian Lance Taylor wrote:
>> Jean Christophe Beyler  writes:
>>
>>> I have also been looking into how to generate a function call for
>>> certain operations. I've looked at various other targets for a similar
>>> problem/solution but have not seen anything. On my targe

Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
I don't think I quite understand what you're meaning. I want to use
the standard ABI, basically I want to transform certain operations
into function calls.

In regard to what you said, do you mean I should build the tree before
the expand pass, by writing a new pass that will work on the trees
instead of rtx?

Otherwise, I fail to see how that is different to what I'm already
doing. Would you have an example?

Thanks,
Jc

PS: Although when I look at what GCC generates at the expand stage, it
really does seem that it first generates the calculation of the
parameters in pseudo-registers and then moves them to the actual
output registers. It's the next phases that will combine the two to
save a move.

On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor wrote:
> Jean Christophe Beyler  writes:
>
>> First off: does this seem correct?
>
> Awkward though it is, it may be more reliable to build a small tree here
> and pass it to expand_call.  This assumes that you want to use the
> standard ABI when calling this function.
>
> Then your second issue would go away.
>
> Ian
>


Re: Replacing certain operations with function calls

2009-09-01 Thread Jean Christophe Beyler
Finally, I guess the one thing I can do is simply generate
pseudo_registers and copy all my registers into the pseudos before the
call I make.

Then I do my expand like I showed above.

And finally, move everything back.

Later passes will remove anything that was not needed, anything that
was will be kept. This could be a solution to the second issue but
I'll wait to understand what you meant first.

Jc

On Tue, Sep 1, 2009 at 6:35 PM, Jean Christophe
Beyler wrote:
> I don't think I quite understand what you're meaning. I want to use
> the standard ABI, basically I want to transform certain operations
> into function calls.
>
> In regard to what you said, do you mean I should build the tree before
> the expand pass, by writing a new pass that will work on the trees
> instead of rtx?
>
> Otherwise, I fail to see how that is different to what I'm already
> doing. Would you have an example?
>
> Thanks,
> Jc
>
> PS: Although when I look at what GCC generates at the expand stage, it
> really does seem that it first generates the calculation of the
> parameters in pseudo-registers and then moves them to the actual
> output registers. It's the next phases that will combine the two to
> save a move.
>
> On Tue, Sep 1, 2009 at 6:26 PM, Ian Lance Taylor wrote:
>> Jean Christophe Beyler  writes:
>>
>>> First off: does this seem correct?
>>
>> Awkward though it is, it may be more reliable to build a small tree here
>> and pass it to expand_call.  This assumes that you want to use the
>> standard ABI when calling this function.
>>
>> Then your second issue would go away.
>>
>> Ian
>>
>


Re: Replacing certain operations with function calls

2009-09-02 Thread Jean Christophe Beyler
> Yes.  Though I do wonder why you are avoiding using the normal libcall
> machinery.  If all you really care about is changing the function name, then
> all you need to do is modify the appropriate optab.  See, for instance,
> arm_init_libfuncs.

I guess both could work. I had seen TARGET_INIT_LIBFUNCS but, when I
had tried playing with it, I got a segmentation fault in the
emit_unop_insn function (for some reason, pat = GEN_FCN (icode) (temp,
op0) was returning NULL). I guessed that either I had another problem
in my MD file that was posing a problem or that I didn't define
everything needed for the libcall system.

This solution worked relatively straight away and is open enough to
allow me to change the implementation later down the road.

When things stabilize, I might go back and move the real function
calls back to the libcall machinery.

>> mult tmp1, a, b
>> mult tmp2, c, d
>>
>> mov output1, tmp1
>> mov output2, tmp2
>> call bar

Ok then I don't foresee a problem with the way I do it.

Thanks again all of you,
Jc


Constraints and reload

2009-09-14 Thread Jean Christophe Beyler
Dear all,

I was working on simplifying the movdi in my port and I had put :

  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,o")
(match_operand:DI 1 "general_operand"  "r,i,o,r"))]

However, due to constraints on the offset that is allowed, I:

- Added the check of the offset in the expand
- Added the check of the offset in the definition of the instruction
- Changed the constraints to:

  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,R")
(match_operand:DI 1 "general_operand"  "r,i,R,r"))]

Where R checks if the operand is a memory operand and if the offset is correct.

Once I did that, in one case of my regression tests, I received the error:
asm operand requires impossible reload

However, if I generate what I get in the first version, all the moves
seem to be correct anyway.

My theory is that reload needs the o EVEN IF what it will be trying to
do is illegal and it will figure that out and finally generate the
right thing afterwards.
Thus, why the o is necessary and not the R.

Is this correct?

Thanks,
Jean Christophe Beyler


Re: Constraints and reload

2009-09-16 Thread Jean Christophe Beyler
Sorry to bring this back to the conversation. Is there any reason why
this would not work with floating-point constraints ?

My "R" constraint is defined as:

(define_memory_constraint "R"
  "R is for memory references which take 1 word for the instruction"
  (and (match_code "mem")
   (match_test "simple_memory_operand (op)")))

I have :
(define_insn "movdf_internal"
  [(set (match_operand:DF 0 "nonimmediate_operand" "=r,r,r,R")
(match_operand:DF 1 "general_operand"  " r,F,R,r"))]
  "check_move (operands[0], operands[1])"
   "@
mov\\t%0,%1
lid\\t%0,%1
ldd\\t%0,%1
std\\t%1,%0"
  [(set_attr "type" "move,arith,load,store")
   (set_attr "mode" "DF,DF,DF,DF")
   (set_attr "length"   "1,1,1,1")])

However, on a relatively complex bit of code, I get :
error: unrecognizable insn:
(set (mem/s:DF (const:DI (plus:DI (symbol_ref:DI ("st") )
(const_int 48 [0x30]))) [14 st+48 S8 A64])
(reg:DF 10 r10)) -1 (nil))

(My architecture cannot allow const as addresses to the store, it must
go through a register).

Basically, I have the same expand code and the same check_move
functions for the integer modes since they check the same things: is
it a memory, does it have an acceptable offset.

Check_move and simple_memory return false to the destination operand
of this set and the expand phase put "st" in a register.

But during the greg pass, it shows:

Reloads for insn # 140
Reload 0: reload_out (DF) = (mem/s:DF (const:DI (plus:DI
(symbol_ref:DI ("st") )
(const_int
48 [0x30]))) [14 st+48 S8 A64])
GR_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
reload_out_reg: (mem/s:DF (const:DI (plus:DI (symbol_ref:DI
("st") )
(const_int
48 [0x30]))) [14 st+48 S8 A64])
reload_reg_rtx: (reg:DF 10 r10)

It is theoretically possible that my code fails for integer modes but
that my code example doesn't force the compiler in the same route and
thus I don't see a fail. But I don't really think so.

Any ideas ?

Thanks a lot,
Jc

On Mon, Sep 14, 2009 at 2:16 PM, Jean Christophe Beyler
 wrote:
> That seems to have fixed it, I just reverted back to what I had done 
> initially.
>
> Thanks a lot,
> Jc
>
> On Mon, Sep 14, 2009 at 1:46 PM, Richard Henderson  wrote:
>> On 09/14/2009 12:18 PM, Jean Christophe Beyler wrote:
>>>
>>>   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,R")
>>>         (match_operand:DI 1 "general_operand"      "r,i,R,r"))]
>>>
>>> Where R checks if the operand is a memory operand and if the offset is
>>> correct.
>>
>> Did you use define_memory_constraint for R, or just define_constraint?
>> If the former, it's a bug in reload; if the latter, that's the problem.
>>
>>
>> r~
>>
>


Re: Constraints and reload

2009-09-16 Thread Jean Christophe Beyler
> Not that I can think of.  Did you provide all of the secondary reload
> stuff that you need?  Probably not.

Probably not, I've been working at cleaning up what was done before.
What exactly is needed to be done to define the secondary reload stuff
?

> You'll do much better by rejecting these addresses earlier.
> Probably by some combination of custom predicates that allow
> only simple_memory_operand + registers + constants, or by
> simply defining this kind of memory address illegal for DFmode.

That is what I thought I had done:

- DImode: accepts move from addresses to a register, immediates to a
register, loads and stores using the R constraint
- Other integer modes: accepts immediates to a register, loads and
stores using the R constraint
- DF/SF modes: as you saw before


I have an expand_move that rejects anything not accepted, a check_move
that makes sure everything is ok and the constraints that go with (my
'R' constraint).

The instruction generated during greg should normally be refused.

From what you've said, it seems that I did forget to define something
so I will look at the secondary reload stuff ?

- What exactly is needed there ?

Thanks,
Jc


Turning off unrolling to certain loops

2009-10-05 Thread Jean Christophe Beyler
Dear all,

I was wondering if it was possible to turn off the unrolling to
certain loops. Basically, I'd like the compiler not to consider
certain loops for unrolling but fail to see how exactly I can achieve
that.

I've traced the unrolling code to multiple places in the code (I'm
working with the 4.3.2 version) and, for the moment, I'm trying to
figure out if I can add something in the loop such as a note that I
can later find in the FOR_EACH_LOOP loops in order to turn the
unrolling for that particular loop off.

Have you got any ideas of what I could use like "note" or even a
better idea all together?

Thanks,
Jc


Re: Turning off unrolling to certain loops

2009-10-06 Thread Jean Christophe Beyler
Yes, I'd be happy to look into how you did it or where you were up to.

I don't know what I'll be able to do but it might lead me in the right
direction and allow me to finish what you started.

Thanks,
Jc

On Tue, Oct 6, 2009 at 2:53 AM, Zdenek Dvorak  wrote:
> Hi,
>
>> I was wondering if it was possible to turn off the unrolling to
>> certain loops. Basically, I'd like the compiler not to consider
>> certain loops for unrolling but fail to see how exactly I can achieve
>> that.
>>
>> I've traced the unrolling code to multiple places in the code (I'm
>> working with the 4.3.2 version) and, for the moment, I'm trying to
>> figure out if I can add something in the loop such as a note that I
>> can later find in the FOR_EACH_LOOP loops in order to turn the
>> unrolling for that particular loop off.
>>
>> Have you got any ideas of what I could use like "note" or even a
>> better idea all together?
>
> the normal way of doing this is using a #pragma.  For instance,
> icc implements #pragma nounroll for this purpose.  I have some prototype
> implementation for gcc, I can send it to you if you were interested in
> finishing it,
>
> Zdenek
>


Re: Turning off unrolling to certain loops

2009-10-08 Thread Jean Christophe Beyler
Dear all,

I've been working on a loop unrolling scheme and I have a few questions:

1) Is there an interest in having a loop unrolling scheme for GCC? I'm
working on the 4.3.2 version but can port it afterwards to the 4.5
version or any version you think is appropriate.

2) I was using a simple example:

#pragma unroll 2
for (i=0;i<6;i++)
{
printf ("Hello world\n");
}

If I do this, instead of transforming the code into :
for (i=0;i<3;i++)
{
printf ("Hello world\n");
printf ("Hello world\n");
}

as we could expect, it is transformed into:
for (i=0;i<2;i++)
{
printf ("Hello world\n");
printf ("Hello world\n");
}
for (i=0;i<2;i++)
{
printf ("Hello world\n");
}


(I am using 4.3.2 currently)

I am using the tree_unroll_loop function to perform the unrolling and
it seems to always want to keep that epilogue. Is there a reason for
this? Or is this a bug of some sorts?

It seems that the unrolling function always wants to have this epilogue.

I've moved forwards (debugging wise) on this also but will wait to
know a bit of your input. I'll be looking at how to remove this
epilogue when it is not needed.

Thanks in advance,
Jc


Re: Turning off unrolling to certain loops

2009-10-08 Thread Jean Christophe Beyler
Hi,

> such an epilogue is needed when the # of iterations is not known in the
> compile time; it should be fairly easy to modify the unrolling not to
> emit it when it is not necessary,

Agreed, that is why I was surprised to see this in my simple example.
It seems to me that the whole unrolling process has been made to, on
purpose, have this epilogue in place.

In the case where the unrolling would be perfect (ie. there would be
no epilogue), the calculation of the max bound of the unrolled version
is always done to have this epilogue (if you have 4 iterations and ask
to unroll twice, it will actually change the max bound to 3,
therefore, having one iteration of the unrolled version and 2
iterations of the original...). I am currently looking at the code of
tree_transform_and_unroll_loop to figure out how to change this and
not have an epilogue in my cases.

Jc
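
The two behaviours discussed above can be written down as simple arithmetic. This is only a sketch that reproduces the numbers reported in the thread; it is an assumption that the bound is effectively computed as (n - 1) / factor, and none of the names below come from GCC.

```c
#include <assert.h>

/* How many iterations the unrolled loop runs, and how many iterations
   are left over for a copy of the original loop.  */
typedef struct { unsigned unrolled_iters; unsigned leftover_iters; } unroll_split;

/* Expected split: no leftover loop when FACTOR divides N.  */
unroll_split split_exact (unsigned n, unsigned factor)
{
  unroll_split s;
  assert (factor > 0);
  s.unrolled_iters = n / factor;
  s.leftover_iters = n % factor;
  return s;
}

/* Model of the observed split: the bound is lowered by one before
   dividing, so at least one original iteration is always left over
   (assumed formula, chosen to match the reported results).  */
unroll_split split_keep_epilogue (unsigned n, unsigned factor)
{
  unroll_split s;
  assert (factor > 0);
  s.unrolled_iters = n ? (n - 1) / factor : 0;
  s.leftover_iters = n - s.unrolled_iters * factor;
  return s;
}
```

For 6 iterations and a factor of 2, split_exact gives 3 unrolled iterations and no leftover, while split_keep_epilogue gives 2 unrolled iterations plus 2 leftover ones. For 4 iterations it gives 1 unrolled iteration plus 2 leftover ones, matching the case described above.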


Re: Turning off unrolling to certain loops

2009-10-14 Thread Jean Christophe Beyler
Ok, I've actually gone a different route. Instead of waiting for the
middle end to perform this, I've directly modified the parser stage to
unroll the loop directly there.

Basically, I take the parser of the for and modify how it adds the
various statements. Telling it to, instead of doing in the
c_finish_loop :

  if (body)
add_stmt (body);
  if (clab)
add_stmt (build1 (LABEL_EXPR, void_type_node, clab));
  if (incr)
add_stmt (incr);
...

I tell it to add multiple copies of body and incr and then at the end
add in the loop the rest of it. I've also added support to remove
further unrolling to these modified loops and will be handling the
"No-unroll" pragma. I then let the rest of the optimization passes,
fuse the incrementations together if possible,  etc.
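
As a toy illustration of one thing to watch for with this kind of source-level duplication: if body and incr are simply copied, the exit condition is only tested once per group of copies. The sketch below counts body executions for three loop shapes. It is not the modified c_finish_loop (which is not shown in this thread), and whether the parser change needs such a guard is an assumption; the function names are hypothetical.

```c
/* for (i = 0; i < n; i++) body;  */
unsigned count_original (unsigned n)
{
  unsigned i, runs = 0;
  for (i = 0; i < n; i++)
    runs++;                    /* body */
  return runs;
}

/* Body and increment duplicated once; the condition is tested only
   once per pair of copies.  */
unsigned count_duplicated_unchecked (unsigned n)
{
  unsigned i, runs = 0;
  for (i = 0; i < n; )
    {
      runs++; i++;             /* first copy of body and incr */
      runs++; i++;             /* second copy of body and incr */
    }
  return runs;
}

/* Same duplication, with the exit test repeated between the copies.  */
unsigned count_duplicated_checked (unsigned n)
{
  unsigned i, runs = 0;
  for (i = 0; i < n; )
    {
      runs++; i++;             /* first copy of body and incr */
      if (!(i < n))
        break;                 /* re-test the loop condition */
      runs++; i++;             /* second copy of body and incr */
    }
  return runs;
}
```

With an even trip count the three agree, but for n = 5 the unchecked version runs the body 6 times while the other two run it 5 times.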

The initial results are quite good and seem to work and produce good code.

Currently, there are two possibilities :

- If the loop is not in the form we want, for example:

for (;i wrote:
> Hi,
>
>> such an epilogue is needed when the # of iterations is not known in the
>> compile time; it should be fairly easy to modify the unrolling not to
>> emit it when it is not necessary,
>
> Agreed, that is why I was surprised to see this in my simple example.
> It seems to me that the whole unrolling process has been made to, on
> purpose, have this epilogue in place.
>
> In the case where the unrolling would be perfect (ie. there would be
> no epilogue), the calculation of the max bound of the unrolled version
> is always done to have this epilogue (if you have 4 iterations and ask
> to unroll twice, it will actually change the max bound to 3,
> therefore, having one iteration of the unrolled version and 2
> iterations of the original...). I am currently looking at the code of
> tree_transform_and_unroll_loop to figure out how to change this and
> not have an epilogue in my cases.
>
> Jc
>


Re: Turning off unrolling to certain loops

2009-10-15 Thread Jean Christophe Beyler
You are entirely correct, I hadn't thought that through enough.

So I backtracked and have just merged what Bingfeng Mei has done with
your code and have now a corrected version of the loop unrolling.

What I did was directly modified tree_unroll_loop to handle the case
of a perfect unroll or not internally and then used something similar
to what you had done around that. I added what I think is needed to
stop unrolling of those loops in later passes.

I'll be starting my tests but I can port it to 4.5 if you are
interested to see what I did.
Jc

On Thu, Oct 15, 2009 at 6:00 AM, Zdenek Dvorak  wrote:
> Hi,
>
>> I faced a similar issue a while ago. I filed a bug report
>> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36712) In the end,
>> I implemented a simple tree-level unrolling pass in our port
>> which uses all the existing infrastructure. It works quite well for
>> our purpose, but I hesitated to submit the patch because it contains
>> our not-very-elegant #pragma unroll implementation.
>
> could you please do so anyway?  Even if there are some issues with the
> #pragma unroll implementation, it could serve as a basis of a usable
> patch.
>
>> /* Perfect unrolling of a loop */
>> static void tree_unroll_perfect_loop (struct loop *loop, unsigned factor,
>>                 edge exit)
>> {
>> ...
>> }
>>
>>
>>
>> /* Go through all the loops:
>>      1. Determine unrolling factor
>>      2. Unroll loops in different conditions
>>         -- perfect loop: no extra copy of original loop
>>         -- other loops: the original version of loops to execute the 
>> remaining iterations
>> */
>> static unsigned int rest_of_tree_unroll (void)
>> {
> ...
>>        tree niters = number_of_exit_cond_executions(loop);
>>
>>        bool perfect_unrolling = false;
>>        if(niters != NULL_TREE && niters!= chrec_dont_know && 
>> TREE_CODE(niters) == INTEGER_CST){
>>          int num_iters = tree_low_cst(niters, 1);
>>          if((num_iters % unroll_factor) == 0)
>>            perfect_unrolling = true;
>>        }
>>
>>        /* If no. of iterations can be divided by unrolling factor, we have 
>> perfect unrolling */
>>        if(perfect_unrolling){
>>          tree_unroll_perfect_loop(loop, unroll_factor, 
>> single_dom_exit(loop));
>>        }
>>        else{
>>          tree_unroll_loop (loop, unroll_factor, single_dom_exit (loop), 
>> &desc);
>>        }
>
> It would be better to move this test to tree_unroll_loop, and not
> duplicate its code in tree_unroll_perfect_loop.
>
> Zdenek
>
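
The suggestion to fold the test into one function can be sketched independently of GCC's data structures. In the sketch below the tree-level checks (INTEGER_CST, chrec_dont_know) are reduced to a single boolean, and the type and function names are hypothetical; it only shows the decision, not the unrolling itself.

```c
#include <stdbool.h>

typedef enum { UNROLL_PERFECT, UNROLL_WITH_EPILOGUE } unroll_kind;

/* Decide, inside a single entry point, whether the leftover copy of the
   original loop is needed.  NITERS_IS_CONSTANT stands in for the check
   that the iteration count is a compile-time integer constant.  */
unroll_kind choose_unroll_kind (bool niters_is_constant,
                                unsigned long niters,
                                unsigned factor)
{
  if (niters_is_constant && factor != 0 && niters % factor == 0)
    return UNROLL_PERFECT;        /* no extra copy of the original loop */
  return UNROLL_WITH_EPILOGUE;    /* remaining iterations run in a copy */
}
```

A caller would then share all the unrolling code and only skip emitting the leftover loop when UNROLL_PERFECT is returned.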


Re: Turning off unrolling to certain loops

2009-10-15 Thread Jean Christophe Beyler
I implemented it like this:

- I modified c_parser_for_statement to include a pragma tree node in
the loop with the unrolling request as an argument

- Then during my pass to handle unrolling, I parse the loop to find the pragma.
 - I retrieve the unrolling factor and use a merge of Zdenek's
code and yours to perform either a perfect unrolling or  and register
it in the loop structure

  - During the following passes that handle loop unrolling, I
look at that variable in the loop structure to see if yes or no, we
should allow the normal execution of the unrolling

  - During the expand, I transform the pragma into a note that
will allow me to remove any unrolling at that point.

That is what I did and it seems to work quite well.

Of course, I have a few things I am currently considering:
- Is there really a position that is better for the pragma node in
the tree representation ?
- Can other passes actually move that node out of a given loop
before I register it in the loop ?
- Should I, instead of keeping that node through the tree
optimizations, actually remove it and later on add it just before
expansion ?
- Can an RTL pass remove notes or move them out of a loop ?
- Can the tree node or note change whether or not an optimization
takes place?
- It is important to note that after the registration, the pragma
node or note are actually just there to say "don't do anything",
therefore, the number of nodes or notes in the loop is not important
as long as they are not moved out.

Those are my current concerns :-), I can give more information if requested,
Jc

PS: What exactly was your solution to this issue?


On Thu, Oct 15, 2009 at 12:11 PM, Bingfeng Mei  wrote:
> Jc,
> How did you implement #pragma unroll?  I checked other compilers. The
> pragma should govern the next immediate loop. It took me a while to
> find a not-so-elegant way to do that. I also implemented #pragma ivdep.
> This information is supposed to be passed through both tree and RTL
> levels and survive all GCC optimization. I still have problems handling
> the combination of unroll and ivdep.
>
> Bingfeng
>
>> -Original Message-
>> From: fearyours...@gmail.com [mailto:fearyours...@gmail.com]
>> On Behalf Of Jean Christophe Beyler
>> Sent: 15 October 2009 16:34
>> To: Zdenek Dvorak
>> Cc: Bingfeng Mei; gcc@gcc.gnu.org
>> Subject: Re: Turning off unrolling to certain loops
>>
>> You are entirely correct, I hadn't thought that through enough.
>>
>> So I backtracked and have just merged what Bingfeng Mei has done with
>> your code and have now a corrected version of the loop unrolling.
>>
>> What I did was directly modified tree_unroll_loop to handle the case
>> of a perfect unroll or not internally and then used something similar
>> to what you had done around that. I added what I think is needed to
>> stop unrolling of those loops in later passes.
>>
>> I'll be starting my tests but I can port it to 4.5 if you are
>> interested to see what I did.
>> Jc
>>
>> On Thu, Oct 15, 2009 at 6:00 AM, Zdenek Dvorak
>>  wrote:
>> > Hi,
>> >
>> >> I faced a similar issue a while ago. I filed a bug report
>> >> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36712) In the end,
>> >> I implemented a simple tree-level unrolling pass in our port
>> >> which uses all the existing infrastructure. It works quite well for
>> >> our purpose, but I hesitated to submit the patch because
>> it contains
>> >> our not-very-elegant #pragma unroll implementation.
>> >
>> > could you please do so anyway?  Even if there are some
>> issues with the
>> > #pragma unroll implementation, it could serve as a basis of a usable
>> > patch.
>> >
>> >> /* Perfect unrolling of a loop */
>> >> static void tree_unroll_perfect_loop (struct loop *loop,
>> unsigned factor,
>> >>                 edge exit)
>> >> {
>> >> ...
>> >> }
>> >>
>> >>
>> >>
>> >> /* Go through all the loops:
>> >>      1. Determine unrolling factor
>> >>      2. Unroll loops in different conditions
>> >>         -- perfect loop: no extra copy of original loop
>> >>         -- other loops: the original version of loops to
>> execute the remaining iterations
>> >> */
>> >> static unsigned int rest_of_tree_unroll (void)
>> >> {
>> > ...
>> >>        tree niters = number_of_exit_cond_executions(loop);
>> >>
>> >>        bool perfect_unrolling = false;
>>

Re: Turning off unrolling to certain loops

2009-10-16 Thread Jean Christophe Beyler
Wow, it seems more complicated than mine.

I think I will modify my approach to :

- Add a node PRAGMA in the tree level when parsing the loop

- As soon as the loop structures are in place, find the pragma, update
the loop structure, remove the node

- Before the loop structure is lost due to the transformation to RTL,
create a note in the loop

- As soon as the loop structures are valid again in RTL, find the
notes and update the loop structures again, remove the notes


For the moment, though I don't remove the nodes like I just said, I
see no problems with this solution, I've tried on multiple examples,
single loops or nested loops, and it seems to be working fine.

I will continue the testing, at worst, I'll change the notes to an
RTL, my question is, how do you force it to not be taken out of the
loop and how do you ensure it does not affect later pass optimizations
? I suppose you can remove them once you register their presence.

Jc

On Fri, Oct 16, 2009 at 5:09 AM, Bingfeng Mei  wrote:
> The basic idea is the same as what is described in this thread.
>  http://gcc.gnu.org/ml/gcc/2008-05/msg00436.html
> But I made many changes on Alex's method.
>
> Pragmas are encoded into names of the helper functions because
> they are not optimized out by tree-level optimizer. These
> pseudo functions are either consumed by target-builtins.c
> if it is only used at tree-level (unroll) or be replaced in
> target-builtins.c with special rtl NOTE(ivdep).
>
> To ensure the right scope of these loop pragmas, I also modified
> c_parser_for_statement, c_parser_while_statement, etc, to check
> loop level. I define that these pragmas only control the next
> innermost loop. Once the right scope of the pragma is determined,
> I actually generate two helper functions for each pragma. The second
> is to close the scope of the pragma.
>
> When the pragma is used, I just search backward for preceding
> helper function (tree-level) or special note (rtl-level)
>
>
> Bingfeng
>
>> -Original Message-
>> From: fearyours...@gmail.com [mailto:fearyours...@gmail.com]
>> On Behalf Of Jean Christophe Beyler
>> Sent: 15 October 2009 17:27
>> To: Bingfeng Mei
>> Cc: Zdenek Dvorak; gcc@gcc.gnu.org
>> Subject: Re: Turning off unrolling to certain loops
>>
>> I implemented it like this:
>>
>> - I modified c_parser_for_statement to include a pragma tree node in
>> the loop with the unrolling request as an argument
>>
>> - Then during my pass to handle unrolling, I parse the loop
>> to find the pragma.
>>          - I retrieve the unrolling factor and use a merge of Zdenek's
>> code and yours to perform either a perfect unrolling or  and register
>> it in the loop structure
>>
>>           - During the following passes that handle loop unrolling, I
>> look at that variable in the loop structure to see if yes or no, we
>> should allow the normal execution of the unrolling
>>
>>           - During the expand, I transform the pragma into a note that
>> will allow me to remove any unrolling at that point.
>>
>> That is what I did and it seems to work quite well.
>>
>> Of course, I have a few things I am currently considering:
>>     - Is there really a position that is better for the pragma node in
>> the tree representation ?
>>     - Can other passes actually move that node out of a given loop
>> before I register it in the loop ?
>>     - Should I, instead of keeping that node through the tree
>> optimizations, actually remove it and later on add it just before
>> expansion ?
>>     - Can an RTL pass remove notes or move them out of a loop ?
>>     - Can the tree node or note change whether or not an optimization
>> takes place?
>>     - It is important to note that after the registration, the pragma
>> node or note are actually just there to say "don't do anything",
>> therefore, the number of nodes or notes in the loop is not important
>> as long as they are not moved out.
>>
>> Those are my current concerns :-), I can give more
>> information if requested,
>> Jc
>>
>> PS: What exactly was your solution to this issue?
>>
>>
>> On Thu, Oct 15, 2009 at 12:11 PM, Bingfeng Mei
>>  wrote:
>> > Jc,
>> > How did you implement #pragma unroll?  I checked other
>> compilers. The
>> > pragma should govern the next immediate loop. It took me a while to
>> > find a not-so-elegant way to do that. I also implemented
>> #pragma ivdep.
>> > This information is supposed to be passed through both
>> tree and RTL
>> > levels and survive all GCC opt

Re: Why does GCC Preprocessor NOT support such macro?

2009-10-23 Thread Jean Christophe Beyler
It works no problem for me.

This is what I get:

$ gcc -Wall -o "Test.exe" "macro.cpp" -lstdc++ -s -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ./configure --enable-compiler=c --enable-language=c
--prefix=/home/beyler/tmp/gcc-4.3.2/bin
Thread model: posix
gcc version 4.3.2 (GCC)
COLLECT_GCC_OPTIONS='-Wall' '-o' 'Test.exe' '-s' '-v' '-mtune=generic'
 /home/beyler/tmp/gcc-4.3.2/bin/libexec/gcc/i686-pc-linux-gnu/4.3.2/cc1plus
-quiet -v -D_GNU_SOURCE macro.cpp -quiet -dumpbase macro.cpp
-mtune=generic -auxbase macro -Wall -version -o /tmp/ccx2HphX.s
ignoring nonexistent directory
"/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 
/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/../../../../include/c++/4.3.2
 
/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/../../../../include/c++/4.3.2/i686-pc-linux-gnu
 
/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/../../../../include/c++/4.3.2/backward
 /usr/local/include
 /home/beyler/tmp/gcc-4.3.2/bin/include
 /home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/include
 /home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/include-fixed
 /usr/include
End of search list.
GNU C++ (GCC) version 4.3.2 (i686-pc-linux-gnu)
compiled by GNU C version 4.3.2, GMP version 4.2.2, MPFR version 2.3.2.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 80f643acb5e6c86441cd11d3ee5d089a
COLLECT_GCC_OPTIONS='-Wall' '-o' 'Test.exe' '-s' '-v' '-mtune=generic'
 as -V -Qy -o /tmp/cccKZT5N.o /tmp/ccx2HphX.s
GNU assembler version 2.18.93 (i486-linux-gnu) using BFD version (GNU
Binutils for Ubuntu) 2.18.93.20081009
COMPILER_PATH=/home/beyler/tmp/gcc-4.3.2/bin/libexec/gcc/i686-pc-linux-gnu/4.3.2/:/home/beyler/tmp/gcc-4.3.2/bin/libexec/gcc/i686-pc-linux-gnu/4.3.2/:/home/beyler/tmp/gcc-4.3.2/bin/libexec/gcc/i686-pc-linux-gnu/:/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/:/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/
LIBRARY_PATH=/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/:/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-Wall' '-o' 'Test.exe' '-s' '-v' '-mtune=generic'
 /home/beyler/tmp/gcc-4.3.2/bin/libexec/gcc/i686-pc-linux-gnu/4.3.2/collect2
--eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o
Test.exe -s /usr/lib/crt1.o /usr/lib/crti.o
/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/crtbegin.o
-L/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2
-L/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/../../..
/tmp/cccKZT5N.o -lstdc++ -lgcc --as-needed -lgcc_s --no-as-needed -lc
-lgcc --as-needed -lgcc_s --no-as-needed
/home/beyler/tmp/gcc-4.3.2/bin/lib/gcc/i686-pc-linux-gnu/4.3.2/crtend.o
/usr/lib/crtn.o
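
One possible explanation for the difference between 4.3.2 and 4.4.1: the GCC 4.4 porting notes describe #elif arguments being evaluated even when an earlier conditional in the chain was taken. Assuming that is what happens here, the empty definition made in the first branch would then be expanded in the #elif. A sketch of a rewrite that avoids depending on this, using a nested #if inside #else (DEMO_HAS_FEATURE and demo_feature_enabled are hypothetical stand-ins, not ACE names):

```c
/* Same intent as the ACE snippet, but the comparison sits in a nested
   #if inside #else.  When the first branch is taken, the #else group is
   skipped and the nested condition is not evaluated.  */
#if !defined (DEMO_HAS_FEATURE)
# define DEMO_HAS_FEATURE
#else
# if (DEMO_HAS_FEATURE == 0)
#  undef DEMO_HAS_FEATURE
# endif
#endif

/* Reports whether the macro ended up defined.  */
int demo_feature_enabled (void)
{
#if defined (DEMO_HAS_FEATURE)
  return 1;
#else
  return 0;
#endif
}
```

Note that if the macro is already defined with an empty value before this point, the nested comparison still fails with the same "no left operand" error, which is the case described in the earlier reply quoted below.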


On Fri, Oct 23, 2009 at 12:35 PM, Zhang Lin  wrote:
> Sorry, maybe my  representation is not quite clear.
> I mean that I didn't define ACE_HAS_NONSTATIC_OBJECT_MANAGER at all, so the 
> preprocesser should not process "#elif (ACE_HAS_NONSTATIC_OBJECT_MANAGER == 
> 0)", the following simple cpp can reproduce the problem.
>
> Test.cpp
> ==
> #if !defined (ACE_HAS_NONSTATIC_OBJECT_MANAGER)
> # define ACE_HAS_NONSTATIC_OBJECT_MANAGER
> #elif (ACE_HAS_NONSTATIC_OBJECT_MANAGER == 0)
> # undef ACE_HAS_NONSTATIC_OBJECT_MANAGER
> #endif /* ACE_HAS_NONSTATIC_OBJECT_MANAGER */
>
> int main(int argc, char *argv[])
> {
>  return 0;
> }
> ==
>
> the compile command is:
> gcc -Wall -o "Test.exe" "Test.cpp" -lstdc++ -s
>
> and the error message is:
> Test.cpp:3:41: error: operator '==' has no left operand
>
>
> - Original Message -
> From: "John Graham" 
> To: 
> Sent: Friday, October 23, 2009 10:03 PM
> Subject: Re: Why does GCC Preprocessor NOT support such macro?
>
>
>> 2009/10/23 Zhang Lin :
>> > Hello,
>> > I have encountered an issue when building ACE with MinGW and GCC 4.4.1
>> > The following macro was not accepted by the preprocessor and it reported 
>> > such an error: "error: operator '==' has no left operand".
>> >
>> > #if !defined (ACE_HAS_NONSTATIC_OBJECT_MANAGER)
>> > # define ACE_HAS_NONSTATIC_OBJECT_MANAGER
>> > #elif (ACE_HAS_NONSTATIC_OBJECT_MANAGER == 0)
>> > # undef ACE_HAS_NONSTATIC_OBJECT_MANAGER
>> > #endif /* ACE_HAS_NONSTATIC_OBJECT_MANAGER */
>> >
>> > As I think, since ACE_HAS_NONSTATIC_OBJECT_MANAGER isn't defined, the 
>> > #elif branch should not be processed.
>> > This macro is accepted by VC7.1 and Sun Studio 12.
>> >
>> > Thanks.
>> >
>> > Best Regards,
>> > Lin Zhang
>> > 2009-10-23
>>
>> You'll get this error if ACE_HAS_NONSTATIC_OBJECT_MANAGER is defined,
>> but has a null value - i.e. if somewhere before it was:
>>
>> #define ACE_HAS_NONSTATIC_OBJECT_MANAGER
>>
>> and not:
>>
>> #define ACE_HAS_NONSTATIC_OBJECT_MANAGER 
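
A possible workaround for the snippet quoted above, as a sketch only
(it was not proposed in the thread): giving the macro an explicit
value keeps the #elif expression well formed even if the preprocessor
evaluates it after the #define has taken effect, which is what the
error message suggests is happening. The value 1 and the helper
function name are assumptions for illustration, and the sketch assumes
the rest of the code only tests whether the macro is defined.

```c
/* Workaround sketch: define the macro with an explicit value so that
   "(ACE_HAS_NONSTATIC_OBJECT_MANAGER == 0)" always expands to a valid
   expression.  The value 1 is an assumption, not taken from ACE.  */
#if !defined (ACE_HAS_NONSTATIC_OBJECT_MANAGER)
# define ACE_HAS_NONSTATIC_OBJECT_MANAGER 1
#elif (ACE_HAS_NONSTATIC_OBJECT_MANAGER == 0)
# undef ACE_HAS_NONSTATIC_OBJECT_MANAGER
#endif /* ACE_HAS_NONSTATIC_OBJECT_MANAGER */

/* Hypothetical helper: reports whether the macro survived the block.  */
int object_manager_macro_is_defined (void)
{
#if defined (ACE_HAS_NONSTATIC_OBJECT_MANAGER)
  return 1;
#else
  return 0;
#endif
}
```

With the explicit value, the #elif line reads "(1 == 0)" if it is ever
evaluated, so no operand is missing.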

Re: Preserving the argument spills for GDB

2009-11-03 Thread Jean Christophe Beyler
I will try to add more details about my issue. Another way of putting this is:

For my architecture:

- A certain number of input registers allow calls without going
on the stack
- There is not automatically a frame pointer on the stack either

What is needed, from the GCC side, to give GDB enough information to
preserve the arguments that are passed, show a backtrace, and walk
the stack?

Do we need to change the ABI and store certain indispensable things on
the stack for GDB?

I see very little documentation or discussion about this, so I don't
know exactly where to look.

Note: at -O0, the compiler does actually store the arguments on
the stack and has a frame pointer. This lets GDB show the arguments
passed to a function at a breakpoint, but not the backtrace.

In O3, however, I don't even get the arguments correct.

Thanks for any input,
Jc

On Tue, Nov 3, 2009 at 2:01 PM, Jean Christophe Beyler
 wrote:
> Dear all,
>
> I've been working on handling the Debugging information for the use of
> GDB on my port. Though I definitely know that, when compiling in -O3,
> some information is lost and the debugger can't always have all the
> information, I'd like to at least keep the values of the arguments
> when doing a backtrace. Since my architecture uses register passing
> for arguments, relatively quickly this is lost since it is not stored
> on the stack.
>
> Therefore, I would like to, when doing a backtrace, have the argument
> information. I asked a few clarifications on the GDB mailing list and,
> it seems, that I need to copy the arguments on the stack (like what is
> done by default when using -O0). However, when compiling in -O3, the
> compiler of course removes these stores on the stack and relies solely
> (and justly so) on the input registers.
>
> I've reread the GCC internals and have been looking at anything
> regarding the stack but can't seem to figure this one out. How exactly
> do I explain to the compiler that I want to keep those spills to the
> stack for debugging purposes ?
>
> Thanks for your help,
> Jc
>


Preserving the argument spills for GDB

2009-11-03 Thread Jean Christophe Beyler
Dear all,

I've been working on handling the Debugging information for the use of
GDB on my port. Though I definitely know that, when compiling in -O3,
some information is lost and the debugger can't always have all the
information, I'd like to at least keep the values of the arguments
when doing a backtrace. Since my architecture uses register passing
for arguments, relatively quickly this is lost since it is not stored
on the stack.

Therefore, I would like to have the argument information when doing a
backtrace. I asked for a few clarifications on the GDB mailing list
and it seems that I need to copy the arguments to the stack (as is
done by default when using -O0). However, when compiling at -O3, the
compiler of course removes these stores to the stack and relies solely
(and justly so) on the input registers.

I've reread the GCC internals and have been looking at anything
regarding the stack but can't seem to figure this one out. How exactly
do I explain to the compiler that I want to keep those spills to the
stack for debugging purposes ?

Thanks for your help,
Jc


Re: Preserving the argument spills for GDB

2009-11-04 Thread Jean Christophe Beyler
> You can force your writes to the stack to not be removed by making
> them use UNSPEC_VOLATILE.  You would write special define_insns for
> this.

Is there an architecture port that has done this already ?

> Not to miss the obvious, note that this will hurt optimization.
> However, if you need to have the argument values available for all
> backtraces, then I'm not sure what else to recommend.  In general gcc
> will discard argument values that are not needed.

I know. Personally, I have not been advocating this but, for the
moment, we are studying what would be needed and how bad it would
be.

However, I've been going through the first step: running GDB, setting
a breakpoint and doing a continue to see what I get, and trying to get
the information right for O3 too.

In O0, I get:
Breakpoint @@ 1, foo (a=4, b=3, c=2, d=1) at hello.c:10

In O3, I get:
Breakpoint @@ 1, foo (a=Variable "a" is not available.) at hello.c:11

Now, I've been able to tell GCC to save those arguments exactly at the
same address in O3 as it does in O0 (I hacked the varargs saving
arguments code so that it would do the same thing for all functions).

It seems that, in the O0 case, the Dwarf information is automatically
propagated to say "The input register is now here", but when I do it
in O3, I'm issuing the information in the same way.

What am I exactly missing? Any ideas why GDB would not have enough
information in this case?

Thanks,
Jean Christophe Beyler
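
A source-level analogue of the UNSPEC_VOLATILE suggestion, as a
minimal sketch: copying the arguments into volatile locals makes the
compiler keep the stores at any optimization level, since accesses to
volatile objects may not be optimized away. This only illustrates the
idea. It is not the back-end change discussed above, it says nothing
about what GDB will display, and the name foo_spilled is hypothetical.

```c
/* Hypothetical variant of foo() from the GDB session above.  */
int foo_spilled (int a, int b, int c, int d)
{
  /* Volatile copies: these stores have to be emitted even at -O3.  */
  volatile int a_spill = a;
  volatile int b_spill = b;
  volatile int c_spill = c;
  volatile int d_spill = d;

  /* Read the copies back so the example has an observable result.  */
  return a_spill + b_spill + c_spill + d_spill;
}
```

Calling foo_spilled (4, 3, 2, 1) returns 10; the cost is four extra
stores and loads that an optimized build would otherwise drop.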


Re: Preserving the argument spills for GDB

2009-11-09 Thread Jean Christophe Beyler
Yes I understand. I'm trying to give multiple options to the users in
order to either have this enabled or not actually.

I'm running into one issue. In order for this to work, it would be
better if I could keep the top of the frame and the stack pointer in
two separate registers. This way, whatever happens, I know where to
find the top of the frame.

I have set HARD_FRAME_POINTER_REGNUM and, after storing the hard
frame pointer on the stack and updating the stack pointer, I generate:

/* Update HARD_FRAME_POINTER_REGNUM */
insn = emit_move_insn (hard_frame_pointer_rtx, stack_pointer_rtx);
RTX_FRAME_RELATED_P (insn) = 1;


At the end of the function, I load back the hard_frame_pointer and set
back the stack pointer.

However, in O3, the compiler sometimes decides that my instruction is
useless. For debugging purposes this is not good since I put the stack
pointer and the return address in fixed places relative to this frame
pointer and not the stack pointer (since the stack can move around
depending on variable arguments, it's better to use that register).

How can I force the prologue to keep this instruction? It is useless
only when there is no function call and no alloca. But I have a case
where there is a function call and it is still removed.

Any ideas ?

Thank you very much for your input,
Jc

On Wed, Nov 4, 2009 at 11:45 AM, Ian Lance Taylor  wrote:
> Jean Christophe Beyler  writes:
>
>>> You can force your writes to the stack to not be removed by making
>>> them use UNSPEC_VOLATILE.  You would write special define_insns for
>>> this.
>>
>> Is there an architecture port that has done this already ?
>
> No, because, when given the choice, gcc prefers faster execution over
> more reliable debugging at high optimization levels.
>
> Ian
>


Re: Preserving the argument spills for GDB

2009-11-09 Thread Jean Christophe Beyler
I actually already made it a fixed register using the
FIXED_REGISTERS macro. However, I have not yet tested EPILOGUE_USES
because the documentation says: "The stack and frame pointer are
already assumed to be used as needed".

My current port defines a FRAME_POINTER_REGNUM different from
HARD_FRAME_POINTER_REGNUM. I do this because, as the Internals manual
says, the offset cannot be known before register allocation.

Basically, my code looks like this:

move stack pointer down
conditional trap on stack pointer

store on stack the return address
store on stack the frame pointer HARD_FRAME_POINTER_REGNUM (it's the
old one so that we know where it started)
mov HARD_FRAME_POINTER_REGNUM, stack_pointer

... Function code ...

restore return address
restore HARD_FRAME_POINTER_REGNUM
mov stack_pointer, HARD_FRAME_POINTER_REGNUM

This seemed like the simplest solution, but it seems that, because I
restore the register in the epilogue, the compiler considers the move
useless: it doesn't realize the register is needed by subsequent
function calls in the function code.

Thanks for any input,
Jc

On Mon, Nov 9, 2009 at 10:40 AM, Ian Lance Taylor  wrote:
> Jean Christophe Beyler  writes:
>
>> How can I force the prologue to keep this instruction. It is useless
>> only in the case that there is no function call or no alloca. But I
>> have a case where there is a function call and it is still removed.
>
> Make the hard frame pointer register a fixed register, or add it to
> EPILOGUE_USES.
>
> Ian
>
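
Ian's second suggestion could look like the fragment below in a
target header. This is a sketch only: the register numbers are made up
so that the fragment compiles on its own, the wrapper function is
hypothetical, and a real port would use its own definitions.

```c
/* Hypothetical register numbers, for illustration only.  */
#define STACK_POINTER_REGNUM       1
#define HARD_FRAME_POINTER_REGNUM  2

/* Nonzero for registers that the epilogue is considered to use, so
   that earlier sets of them are not treated as dead.  */
#define EPILOGUE_USES(REGNO) ((REGNO) == HARD_FRAME_POINTER_REGNUM)

/* Small wrapper so the macro can be exercised outside of GCC.  */
int epilogue_uses_regno (int regno)
{
  return EPILOGUE_USES (regno);
}
```

Here epilogue_uses_regno (2) is 1 and epilogue_uses_regno (1) is 0.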


Re: Preserving the argument spills for GDB

2009-11-12 Thread Jean Christophe Beyler
Dear all,

As I continue to work on this I have found something that is surprising.

I wrote this code :

int foo (int argc, int argv)
{
   bar (argv, argc);
   return 0;
}

On my architecture, this is transformed into the following assembly code:

mov r6 = InputReg1
mov InputReg1 = InputReg2
mov InputReg2 = tmp

However, I am of course looking at the debug information, and I was
surprised to see that :

 <2><12f>: Abbrev Number: 6 (DW_TAG_formal_parameter)
  <130>     DW_AT_name        : argc
  <135>     DW_AT_decl_file   : 1
  <136>     DW_AT_decl_line   : 4
  <137>     DW_AT_type        : 
  <13b>     DW_AT_location    : 0x38    (location list)
 <2><13f>: Abbrev Number: 6 (DW_TAG_formal_parameter)
  <140>     DW_AT_name        : argv
  <145>     DW_AT_decl_file   : 1
  <146>     DW_AT_decl_line   : 4
  <147>     DW_AT_type        : 
  <14b>     DW_AT_location    : 0x6e    (location list)

This is ok, no problems here. But if I look at the locations given for
these variables:
0038  000c (DW_OP_InputReg1)
0038 000c 001c (DW_OP_r6)
0038 
006e  0010 (DW_OP_InputReg2)
006e 

Is there any reason why the compiler does not generate the debug
information showing that the InputReg1 is now in InputReg2 and
vice-versa? It would seem to me that the debugger would need this
information but I may be mistaken.

As always, thanks for your input,
Jc


Re: Preserving the argument spills for GDB

2009-11-12 Thread Jean Christophe Beyler
Of course:

mov r6 = InputReg1
mov InputReg1 = InputReg2
mov InputReg2 = tmp

should read:
mov r6 = InputReg1
mov InputReg1 = InputReg2
mov InputReg2 = r6


Sorry about that.
Jc


On Thu, Nov 12, 2009 at 12:07 PM, Jean Christophe Beyler
 wrote:
> Dear all,
>
> As I continue to work on this I have found something that is surprising.
>
> I wrote this code :
>
> int foo (int argc, int argv)
> {
>   bar (argv, argc);
>   return 0;
> }
>
> On my architecture, this is transformed into the following assembly code:
>
> mov r6 = InputReg1
> mov InputReg1 = InputReg2
> mov InputReg2 = tmp
>
> However, I am of course looking at the debug information, and I was
> surprised to see that :
>
>  <2><12f>: Abbrev Number: 6 (DW_TAG_formal_parameter)
>  <130>     DW_AT_name        : argc
>  <135>     DW_AT_decl_file   : 1
>  <136>     DW_AT_decl_line   : 4
>  <137>     DW_AT_type        : 
>  <13b>     DW_AT_location    : 0x38    (location list)
>  <2><13f>: Abbrev Number: 6 (DW_TAG_formal_parameter)
>  <140>     DW_AT_name        : argv
>  <145>     DW_AT_decl_file   : 1
>  <146>     DW_AT_decl_line   : 4
>  <147>     DW_AT_type        : 
>  <14b>     DW_AT_location    : 0x6e    (location list)
>
> This is ok, no problems here. But if I look at the locations given for
> these variables:
>    0038  000c (DW_OP_InputReg1)
>    0038 000c 001c (DW_OP_r6)
>    0038 
>    006e  0010 (DW_OP_InputReg2)
>    006e 
>
> Is there any reason why the compiler does not generate the debug
> information showing that the InputReg1 is now in InputReg2 and
> vice-versa? It would seem to me that the debugger would need this
> information but I may be mistaken.
>
> As always, thanks for your input,
> Jc
>


Bitfields problem

2009-12-11 Thread Jean Christophe Beyler
As I continue my work on the machine description file, I have been
working on the bitfields again to try to get good code generation.
Right now, I've followed what was done in the ia64 port for signed
extractions:

(define_insn "extv"
  [(set (match_operand:DI 0 "gr_register_operand" "=r")
        (sign_extract:DI (match_operand:DI 1 "gr_register_operand" "r")
                         (match_operand:DI 2 "extr_len_operand" "n")
                         (match_operand:DI 3 "shift_count_operand" "M")))]
  ""
  "extr %0 = %1, %3, %2"
  [(set_attr "itanium_class" "ishf")])


now this works for me except that I get for this code:

typedef struct sTest {
int64_t a:1;
int64_t b:5;
int64_t c:7;
int64_t d:15;
}STest;

int64_t bar2 (STest a)
{
int64_t res = a.d;
return res;
}

Here is what I get at the final cleanup:

;; Function bar2 (bar2)

bar2 (a)
{
  short unsigned int SR.44;
  short unsigned int SR.43;
  short unsigned int SR.41;
  short unsigned int SR.40;
  short unsigned int SR.22;
  short unsigned int SR.3;

:
  SR.22 = (short unsigned int) () ((short unsigned
int) a.d & 32767);
  SR.43 = SR.22 & 32767;
  SR.44 = SR.43 ^ 16384;
  SR.3 = (short unsigned int) () ((short unsigned
int) () (SR.44 + 49152) & 32767);
  SR.40 = SR.3 & 32767;
  SR.41 = SR.40 ^ 16384;
  return (int64_t) () (SR.41 + 49152);

}

I don't understand why I get all these instructions. I know that it
is more complicated because the field is signed, but I would prefer to
get an unsigned extract and then a shift left/shift right. Thus 3
instructions.

Right now, I get many more instructions, corresponding to what I
showed from the final cleanup.

Is there any reason for all these instructions, or any idea on how to
get to my 3 instructions?

Thank you for your help and time,
Jean Christophe Beyler
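
For reference, the two forms being compared can be written in C as
below. This is a minimal sketch under assumptions: the function names
are hypothetical, the field position is passed in by the caller
(d would start at bit 13 if a, b and c are packed from bit 0, which
depends on the ABI), and the right shift of a negative value is
assumed to be arithmetic, as it is with GCC. The second function is
the "^ 16384 ... + 49152" pattern from the dump: 49152 is
65536 - 16384, so the addition on a 16-bit unsigned value subtracts
the sign bit again.

```c
#include <stdint.h>

/* Sign-extract LEN bits starting at bit POS with shift left, then
   arithmetic shift right.  Requires 1 <= LEN and POS + LEN <= 64.  */
int64_t sext_by_shifts (uint64_t word, unsigned pos, unsigned len)
{
  uint64_t up = word << (64u - pos - len);  /* field now at the top */
  return (int64_t) up >> (64u - len);       /* sign bits shifted in */
}

/* Same result with mask, xor and subtract, as in the dump above.
   Requires 1 <= LEN < 64 and POS + LEN <= 64.  */
int64_t sext_by_xor (uint64_t word, unsigned pos, unsigned len)
{
  uint64_t mask = (UINT64_C (1) << len) - 1;
  uint64_t sign = UINT64_C (1) << (len - 1);
  uint64_t field = (word >> pos) & mask;    /* unsigned extract */
  return (int64_t) ((field ^ sign) - sign); /* flip, then remove sign */
}
```

For a 15-bit field at bit 13 holding 0x7FFF, both functions return -1;
for a field holding 5, both return 5.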


Re: Bitfields problem

2009-12-11 Thread Jean Christophe Beyler
On Fri, Dec 11, 2009 at 11:57 AM, Jean Christophe Beyler
 wrote:
> As I continue my work on the machine description file, I currently
> worked on the bitfields again to try to get a good code generation
> working. Right now, I've followed what was done in the ia64 for signed
> extractions :
>
> (define_insn "extv"
>  [(set (match_operand:DI 0 "gr_register_operand" "=r")
>    (sign_extract:DI (match_operand:DI 1 "gr_register_operand" "r")
>             (match_operand:DI 2 "extr_len_operand" "n")
>             (match_operand:DI 3 "shift_count_operand" "M")))]
>  ""
>  "extr %0 = %1, %3, %2"
>  [(set_attr "itanium_class" "ishf")])
>
>
> now this works for me except that I get for this code:
>
> typedef struct sTest {
>    int64_t a:1;
>    int64_t b:5;
>    int64_t c:7;
>    int64_t d:15;
> }STest;
>
> int64_t bar2 (STest a)
> {
>    int64_t res = a.d;
>    return res;
> }
>
> Here is what I get at the final cleanup:
>
> ;; Function bar2 (bar2)
>
> bar2 (a)
> {
>  short unsigned int SR.44;
>  short unsigned int SR.43;
>  short unsigned int SR.41;
>  short unsigned int SR.40;
>  short unsigned int SR.22;
>  short unsigned int SR.3;
>
> :
>  SR.22 = (short unsigned int) () ((short unsigned
> int) a.d & 32767);
>  SR.43 = SR.22 & 32767;
>  SR.44 = SR.43 ^ 16384;
>  SR.3 = (short unsigned int) () ((short unsigned
> int) () (SR.44 + 49152) & 32767);
>  SR.40 = SR.3 & 32767;
>  SR.41 = SR.40 ^ 16384;
>  return (int64_t) () (SR.41 + 49152);
>
> }
>
> I don't understand why I get all these instructions. I know that
> because it's signed, it is more complicated but I would prefer to get
> an unsigned extract and the a shift left/shift right. Thus 3
> instructions.
>
> Right now, I get so many more instructions that represent what I
> showed from the final cleanup.
>
> Any reason for all these instructions or ideas on how to get to my 3
> instructions ?
>
> Thank you for your help and time,
> Jean Christophe Beyler
>


New RTL instruction for my port

2009-12-14 Thread Jean Christophe Beyler
Dear all,

I wanted to add an instruction via a builtin to allow users to take
advantage of some special instructions that are on my architecture.

However, the latency of the instruction is not 1 so I'll also need to
handle it in the rtx_costs function.

I first tried to describe what the instruction does with the existing
RTL, but it internally uses a special register, which made it
difficult to define exactly what it does. It also seemed difficult to
handle the latency this way, since I would not get a separate code for
my rtl.

My current solution:

- Define a new rtl in rtl.def
- Add the new rtl in the MD file and the generated assembly instruction

The solution seems to work, except at -O0, where I get this error:

testmaca.c:25: error: unrecognizable insn:
(insn 11 10 12 3 testmaca.c:16 (set (reg:DI 66 myreg)
(newrtl:DI (mem/c/i:DI (plus:DI (reg/f:DI 68 virtual-stack-vars)
(const_int 16 [0x10])) [0 x+0 S8 A64])
(mem/c/i:DI (plus:DI (reg/f:DI 68 virtual-stack-vars)
(const_int 8 [0x8])) [0 y+0 S8 A64]))) -1 (nil))

Any ideas how I can fix this ?

Here is my rtl definition :

(define_insn "newrtl"
  [(set (reg:DI 24)
(newrtl:DI (match_operand:DI 0 "register_operand" "r")
 (match_operand:DI 1 "register_operand" "r")))]
  ""
  "newrtl\\t%0,%1"
  [(set_attr "type" "arith")
   (set_attr "mode" "DI")
   (set_attr "length"   "1")])


If there is a better way to do this, I'll be happy to hear it :-)

Thanks in advance,
Jc


Re: New RTL instruction for my port

2009-12-14 Thread Jean Christophe Beyler
On Mon, Dec 14, 2009 at 5:04 PM, Daniel Jacobowitz  wrote:
> On Mon, Dec 14, 2009 at 04:46:59PM -0500, Jean Christophe Beyler wrote:
>> My current solution:
>>
>> - Define a new rtl in rtl.def
>
> Just use an unspec or unspec_volatile.  You don't need a new RTL
> operation.

I thought of that but then how do I add the cost ? I also have another
problem: there is a second instruction that would have the exact same
signature if I use an unspec.

Is there a solution for that and how do I handle the cost then ?
   - Just say that an unspec has a higher cost?


>
>> - Add the new rtl in the MD file and the generated assembly instruction
>>
>> However, the solution seems to work, except in O0, where I get this error:
>
> This means whatever is calling gen_newrtl to create the insn is not
> checking operand predicates first.  That's probably code you wrote
> too.

You are probably right, I'll move it into a register first before
generating my instruction.


Thanks a lot,
Jc


Re: New RTL instruction for my port

2009-12-15 Thread Jean Christophe Beyler
You are correct. So I should be changing things in the adjust_cost
function instead.

I was also wondering about something else: these instructions modify
an internal register that has been set as a fixed register. However,
the compiler optimizes them out when the accumulator is not retrieved
for a calculation. How can I tell the compiler that it should not
remove these instructions?

Here is an example code:

uint64_t foo (uint64_t x, uint64_t y)
{
uint64_t z;

__builtin_newins (x,y); /* Modifies the accumulator */

z = __builtin_retrieve_accum (); /* Retrieve the accumulator */

return z;
}

If I remove the instruction "z = ...;", then the compiler will
optimize out my first builtin call.


Thanks for your help and input,
Jc

On Mon, Dec 14, 2009 at 6:10 PM, Daniel Jacobowitz  wrote:
> On Mon, Dec 14, 2009 at 05:52:50PM -0500, Jean Christophe Beyler wrote:
>> I thought of that but then how do I add the cost ? I also have another
>> problem: there is a second instruction that would have the exact same
>> signature if I use an unspec.
>>
>> Is there a solution for that and how do I handle the cost then ?
>>    - Just say that an unspec has a higher cost?
>
> Are you really talking about rtx_costs?  It sounds to me more like you
> want to change your scheduler.
>
> --
> Daniel Jacobowitz
> CodeSourcery
>


Re: New RTL instruction for my port

2009-12-15 Thread Jean Christophe Beyler
EPILOGUE_USES does not seem to work: the code still gets optimized out.

unspec_volatile does work but then, as you said, the compiler no
longer optimizes out things that it otherwise could.

I have for example an instruction to set this special register.
Theoretically, if we had :

set (x);
set (y);

The compiler should remove the first set, which it does if I remove
the volatile but keep a retrieval of the special register in the
function.

Any other ideas by any chance?

Thanks again,
Jc

On Tue, Dec 15, 2009 at 10:20 AM, Daniel Jacobowitz  wrote:
> On Tue, Dec 15, 2009 at 10:08:02AM -0500, Jean Christophe Beyler wrote:
>> You are correct. So I should be changing things in the adjust_cost
>> function instead.
>>
>> I was also wondering, these instructions modify an internal register
>> that has been set as a fixed register. However, the compiler optimizes
>> them out when the accumulator is not retrieved for a calculation. How
>> can I tell the compiler that it should not remove these instructions.
>>
>> Here is an example code:
>>
>> uint64_t foo (uint64_t x, uint64_t y)
>> {
>> uint64_t z;
>>
>> __builtin_newins (x,y); /* Modifies the accumulator */
>>
>> z = __builtin_retrieve_accum (); /* Retrieve the accumulator */
>>
>> return z;
>> }
>>
>> If I remove the instruction "z = ...;", then the compiler will
>> optimize out my first builtin call.
>
> I suppose you could use EPILOGUE_USES to say that changes to the
> accumulator should not be discarded.  You could also use
> unspec_volatile instead of unspec, but that may further inhibit
> optimization.
>
> --
> Daniel Jacobowitz
> CodeSourcery
>


GMP and GCC 4.3.2

2009-12-16 Thread Jean Christophe Beyler
Dear all,

Found on http://gmplib.org/.

"N.B. gcc 4.3.2 miscompiles GMP 4.3.x on 64-bit machines. The problem
is specific to that very release; specifically gcc 4.3.1 and 4.3.3
seem to work fine."

Since porting to a newer version is difficult for me right now, I was
wondering if anybody worked on a patch for this specific problem.

Thanks,
Jc


Re: GMP and GCC 4.3.2

2009-12-16 Thread Jean Christophe Beyler
I've actually searched but still can't find any mention of such bugs/patches.

If anybody remembers a key word I could use, that would help. I've
tried "4.3.2" or "GMP" but haven't found anything.

Thanks a lot,
Jc

On Wed, Dec 16, 2009 at 2:37 PM, Richard Guenther
 wrote:
> On Wed, Dec 16, 2009 at 7:50 PM, Jean Christophe Beyler
>  wrote:
>> Dear all,
>>
>> Found on http://gmplib.org/.
>>
>> "N.B. gcc 4.3.2 miscompiles GMP 4.3.x on 64-bit machines. The problem
>> is specific to that very release; specifically gcc 4.3.1 and 4.3.3
>> seem to work fine."
>>
>> Since porting to a newer version is difficult for me right now, I was
>> wondering if anybody worked on a patch for this specific problem.
>
> It has been fixed for 4.3.3.  You can search bugzilla to find the
> offending patch.
>
> Richard.
>
>> Thanks,
>> Jc
>>
>


Re: GMP and GCC 4.3.2

2009-12-17 Thread Jean Christophe Beyler
Actually, I just finished updating my 4.3.2 to 4.3.3 and tested it and
I still have the same issue.

This seems to be a problem with more than "just" 4.3.2.

Here is the test program:
#include <stdio.h>
#include <gmp.h>

int main() {
mpz_t a,b;
mpz_init_set_str(a, "100", 10); // program works with 10^9, but not
// with 10^10 (10^20 > 2^64)
mpz_init_set(b, a);
mpz_mul(a, a, a);
gmp_printf("first,  in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b, a);
mpz_set(b, a);
mpz_mul(a, a, a);
gmp_printf("second, in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b, a);
return 0;
}

We obtain:
first,  in GMP mpz_mul(a,a,a) with a=100 gives 1
second, in GMP mpz_mul(a,a,a) with a=1 gives
2254536013160540992915637663717291581095979492475463214517286840718852096

Which clearly is wrong for the second output.

This was tested with a 64 bit architecture. I know that with a 4.1.1
port of the compiler, I do not see this issue.

I will see if I can port it forward to see if I still see the problem
but it might be difficult to port from 4.3.2 to 4.4.2, I'm not sure
how many things have changed but I'm sure quite a bit !

This is the -v of my GCC, version 4.3.3:
Using built-in specs.
Target: myarch64-linux-elf
Configured with:
/home/beyler/myarch64/src/myarch64-gcc-4.3.2/configure
--target=myarch64-linux-elf
--with-headers=/home/beyler/myarch64/src/newlib-1.16.0/newlib/libc/include
--prefix=/home/beyler/myarch64/local --disable-nls
--enable-languages=c --with-newlib --disable-libssp
--with-mpfr=/home/beyler/myarch64/local
Thread model: single
gcc version 4.3.3 (GCC)

Jc

PS: I have already asked the gmp bug mailing list to see if they have
any input on this.


On Thu, Dec 17, 2009 at 6:19 AM, Jay Foad  wrote:
> If it's the bug being discussed here:
>
> http://gmplib.org/list-archives/gmp-discuss/2009-April/003717.html
>
> ... then it was reported as fixed here:
>
> http://gcc.gnu.org/ml/gcc/2009-04/msg00562.html
>
> Jay.
>


Re: GMP and GCC 4.3.2

2009-12-18 Thread Jean Christophe Beyler
myarch64 is my port.

Then perhaps this is a back-end problem in my port. I'll try to
determine this today and keep you posted.

Jc

On Thu, Dec 17, 2009 at 9:29 PM, Jie Zhang  wrote:
> On 12/18/2009 06:27 AM, Jean Christophe Beyler wrote:
>>
>> Actually, I just finished updating my 4.3.2 to 4.3.3 and tested it and
>> I still have the same issue.
>>
>> This seems to be a problem more than "just" 4.3.2.
>>
>> Here is the test program:
>> #include
>> #include
>>
>> int main() {
>>     mpz_t a,b;
>>     mpz_init_set_str(a, "100", 10); // program works with 10^9,
>> but not
>>     // with 10^10 (10^20>  2^64)
>>     mpz_init_set(b, a);
>>     mpz_mul(a, a, a);
>>     gmp_printf("first,  in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b,
>> a);
>>     mpz_set(b, a);
>>     mpz_mul(a, a, a);
>>     gmp_printf("second, in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b,
>> a);
>>     return 0;
>> }
>>
>> We obtain:
>> first,  in GMP mpz_mul(a,a,a) with a=100 gives
>> 1
>> second, in GMP mpz_mul(a,a,a) with a=1 gives
>>
>> 2254536013160540992915637663717291581095979492475463214517286840718852096
>>
>> Which clearly is wrong for the second output.
>>
>> This was tested with a 64 bit architecture. I know that with a 4.1.1
>> port of the compiler, I do not see this issue.
>>
>> I will see if I can port it forward to see if I still see the problem
>> but it might be difficult to port from 4.3.2 to 4.4.2, I'm not sure
>> how many things have changed but I'm sure quite a bit !
>>
>> This is the -v of my GCC, version 4.3.3:
>> Using built-in specs.
>> Target: myarch64-linux-elf
>> Configured with:
>> /home/beyler/myarch64/src/myarch64-gcc-4.3.2/configure
>> --target=myarch64-linux-elf
>> --with-headers=/home/beyler/myarch64/src/newlib-1.16.0/newlib/libc/include
>> --prefix=/home/beyler/myarch64/local --disable-nls
>> --enable-languages=c --with-newlib --disable-libssp
>> --with-mpfr=/home/beyler/myarch64/local
>> Thread model: single
>> gcc version 4.3.3 (GCC)
>>
> What's myarch64? I got the correct result for your test with vanilla
> gcc-4.3.2 and gmp-4.3.1 on Debian unstable AMD64.
>
>
> Jie
>


Re: New RTL instruction for my port

2010-01-07 Thread Jean Christophe Beyler
Dear all,

I've gone to using unspec and I think I know why I have a problem. It
seems that actually, the problem lies with the fact that these
instructions are touching an internal register and how I am handling
that register.

Since I don't want the register allocator to use that register, I put
a 1 in FIXED_REGISTERS. However, its state is persistent across
function calls, meaning that if nothing changes its state, it will
still be set in the same way after a call returns.

However, CALL_USED_REGISTERS is defined as "1 for registers not
available across function calls. These must include the
FIXED_REGISTERS and also any registers that can be used without being
saved.".

These two sentences seem in contradiction with what I want :

- I want to define a register that is not available to the register
allocator but is available across function calls.


What seems to happen is if I do :



__builtin_newins (x,y); /* Modifies the accumulator */

foo ();

z = __builtin_retrieve_accum (); /* Retrieve the accumulator */


the first builtin is removed because the internal register is deemed
to not conserve its value across the function call even if my
retrieve_accum is defined as :
(insn 12 11 13 3 mac3.c:8 (parallel [
(set (reg:DI 72 [ D.1223 ])
(unspec:DI [
(reg:DI 74 [ a.0 ])
(reg:DI 73 [ b.1 ])
(reg:DI 66 acc)
] 5))
(clobber (reg:DI 66 acc))
]) -1 (nil))

We see that I say that acc is used by the unspec but is also clobbered
by the instruction (which it is actually, we use the old value and
update it in the same instruction).


Any ideas on how I can solve this ?

As always, thank you for your input,
Jc

On Tue, Dec 15, 2009 at 10:48 AM, Jean Christophe Beyler
 wrote:
> EPILOGUE_USES does not seem to work, the code still gets optimized out.
>
> However, unspec_volatile works but then, as you have said, the
> compiler doesn't optimize out things that it then could.
>
> I have for example an instruction to set this special register.
> Theoretically, if we had :
>
> set (x);
> set (y);
>
> The compiler should remove the first set. Which it does if I remove
> the volatile but keep a retrieval of the special register in the
> function.
>
> Any other ideas by any chance?
>
> Thanks again,
> Jc
>
> On Tue, Dec 15, 2009 at 10:20 AM, Daniel Jacobowitz  wrote:
>> On Tue, Dec 15, 2009 at 10:08:02AM -0500, Jean Christophe Beyler wrote:
>>> You are correct. So I should be changing things in the adjust_cost
>>> function instead.
>>>
>>> I was also wondering, these instructions modify an internal register
>>> that has been set as a fixed register. However, the compiler optimizes
>>> them out when the accumulator is not retrieved for a calculation. How
>>> can I tell the compiler that it should not remove these instructions.
>>>
>>> Here is an example code:
>>>
>>> uint64_t foo (uint64_t x, uint64_t y)
>>> {
>>> uint64_t z;
>>>
>>> __builtin_newins (x,y); /* Modifies the accumulator */
>>>
>>> z = __builtin_retrieve_accum (); /* Retrieve the accumulator */
>>>
>>> return z;
>>> }
>>>
>>> If I remove the instruction "z = ...;", then the compiler will
>>> optimize out my first builtin call.
>>
>> I suppose you could use EPILOGUE_USES to say that changes to the
>> accumulator should not be discarded.  You could also use
>> unspec_volatile instead of unspec, but that may further inhibit
>> optimization.
>>
>> --
>> Daniel Jacobowitz
>> CodeSourcery
>>
>


Re: New RTL instruction for my port

2010-01-07 Thread Jean Christophe Beyler
I am almost convinced I had tried that already but apparently not.
This seems to have fixed my problem, thank you :-)

Jc

On Thu, Jan 7, 2010 at 4:14 PM, Richard Henderson  wrote:
> On 01/07/2010 12:58 PM, Jean Christophe Beyler wrote:
>>
>> Dear all,
>>
>> I've gone to using unspec and I think I know why I have a problem. It
>> seems that actually, the problem lies with the fact that these
>> instructions are touching an internal register and how I am handling
>> that register.
>>
>> Since I don't want the register allocator to use that register, I put
>> a 1 in FIXED_REGISTERS. However, its state is persistent across
>> function calls, meaning that if nothing changes its state, it will be
>> still set in the same way after a return call.
>>
>> However, CALL_USED_REGISTERS is defined as "1 for registers not
>> available across function calls. These must include the
>> FIXED_REGISTERS and also any registers that can be used without being
>> saved.".
>>
>> These two sentences seem in contradiction with what I want :
>>
>> - I want to define a register that is not available to the register
>> allocator but is available across function calls.
>>
>
> See CALL_REALLY_USED_REGISTERS.
>
>
> r~
>
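
To illustrate the distinction between the three macros, here is a minimal
sketch, assuming a hypothetical four-register target in which register 3
plays the role of the accumulator. The arrays mirror the shape of the
FIXED_REGISTERS, CALL_USED_REGISTERS and CALL_REALLY_USED_REGISTERS
initializers; the register count, the numbering and the helper function
names are invented for illustration and are not taken from the port.

```c
/* Hypothetical 4-register target: r0-r1 are caller-saved, r2 is
   callee-saved, r3 is the internal accumulator.  */
#define SKETCH_NUM_REGS 4
#define SKETCH_ACC_REGNO 3

/* 1 = never handed out by the register allocator.  */
static const char fixed_regs[SKETCH_NUM_REGS] = { 0, 0, 0, 1 };

/* Must be a superset of fixed_regs, so r3 is marked here too.  */
static const char call_used_regs[SKETCH_NUM_REGS] = { 1, 1, 0, 1 };

/* Only the registers a call actually clobbers; the fixed accumulator
   keeps its value across calls, so it is left at 0.  */
static const char call_really_used_regs[SKETCH_NUM_REGS] = { 1, 1, 0, 0 };

/* Nonzero if every fixed register is also marked call-used.  */
int call_used_covers_fixed (void)
{
  int i;
  for (i = 0; i < SKETCH_NUM_REGS; i++)
    if (fixed_regs[i] && !call_used_regs[i])
      return 0;
  return 1;
}

/* Nonzero if the value in REGNO is preserved across a call.  */
int survives_call (int regno)
{
  return !call_really_used_regs[regno];
}
```

With these tables the accumulator is both unavailable to the allocator and
treated as live across calls, which is the combination asked for above.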


Changing the ABI

2010-01-18 Thread Jean Christophe Beyler
Dear all,

I have a current issue on my port. Basically the stack is defined as follows :

1) CALL_USED save area
2) Local variables
3) Caller arguments passed on the stack
4) 8 words to save arguments passed in registers, even if not passed

Now, this was done because we have defined 8 output registers,
therefore zone (3) is not used except if we call a function with more
than 8 parameters.

(4) is only used if we have va_arg functions that will then spill the
8 input registers on the stack and call the va_arg function.

This is done, to my understanding for our ABI, because, in the case of
a va_arg function, we want the parameters stored consecutively in
memory.

However, this brings me to 2 problems :

1) If there are no va_arg function calls, there are still 8 words
wasted on the stack per function call still active on the stack.

2) If there is an alloca, then GCC moves the stack pointer but without
trying to get back those 8 words or even the space for (3).


I am currently working on not having that wasted space. I see two options:

1) Change the ABI to go closer to what I have seen in the MIPS port :
- The zone (4) is handled by the callee
- However, I'll still have the wasted space (4) when an alloca is used no ?

2) Figure out how to explain to GCC how to reclaim zones (3), (4)
before an alloca and then, once the size of the alloca is known, add
to the stack new zones (3) and (4) for subsequent calls.
Of course, when the alloca-ed zone is freed, we will have to set back
the old zones (3) and (4).

Hopefully this email made some sense and someone will have an idea or
a direction I can look into,
Thanks again for all your time,
Jean Christophe Beyler


Re: Changing the ABI

2010-01-20 Thread Jean Christophe Beyler
Those are also the conclusions I reached.

- Move the zone (4) to the callee if need be. You know that if your
caller had more than 8 output parameters, the last parameters are set
on the stack in zone (3).

- However, it seems to me that in the case of :

void foo (void)
{
...
bar (param1, param2, ..., param9);

...
ptr = alloca (size);
...

bar (param1, param2, ..., param9);
}

In this case, foo would need to add space for param9 on the stack for
the call to bar.

However, later in the function, it would need to perform an alloca.

Since there is another call to bar, it would need to add again some
space for param9 on the stack.

Therefore the stack would look like this :

(5) Space for param9
(6) Alloca space for ptr
(7) Space for param9

Except if there is a way to tell the compiler to retrieve space (5)
before performing the alloca in order to get this:

(6) Alloca space for ptr (retrieved space (5))
(7) Space for param9

Can this be done when expanding the alloca in order to first pop the
stack, then alloca a bit more for the parameters ? However, this needs
also to take precautions when popping the alloca's from the stack.

Thanks for your input and help,
Jean Christophe Beyler


On Tue, Jan 19, 2010 at 9:04 AM, Mikael Pettersson  wrote:
> On Mon, 18 Jan 2010 13:55:16 -0500, Jean Christophe Beyler wrote:
>>I have a current issue on my port. Basically the stack is defined as follows :
>>
>>1) CALL_USED save area
>>2) Local variables
>>3) Caller arguments passed on the stack
>>4) 8 words to save arguments passed in registers, even if not passed
>>
>>Now, this was done because we have defined 8 output registers,
>>therefore zone (3) is not used except if we call a function with more
>>than 8 parameters.
>>
>>(4) is only used if we have va_arg functions that will then spill the
>>8 input registers on the stack and call the va_arg function.
>>
>>This is done, to my understanding for our ABI, because, in the case of
>>a va_arg function, we want the parameters stored consecutively in
>>memory.
>
> That's indeed a desirable property.
>
>>However, this brings me to 2 problems :
>>
>>1) If there are no va_arg function calls, there are still 8 words
>>wasted on the stack per function call still active on the stack.
>>
>>2) If there is an alloca, then GCC moves the stack pointer but without
>>trying to get back those 8 words or even the space for (3).
>>
>>
>>I am currently working on not having that wasted space. I see two options:
>>
>>1) Change the ABI to go closer to what I have seen in the MIPS port :
>>    - The zone (4) is handled by the callee
>>    - However, I'll still have the wasted space (4) when an alloca is used no ?
>
> Actually, zone (4) should just be deleted entirely from the ABI.
> That is, all the ABI should specify is that non-register arguments
> are in the caller's frame starting exactly at the stack pointer.
>
> For non-varargs calls, there's no waste.
>
> For varargs calls, the callee can allocate its frame and save the
> incoming register arguments at the top of its own frame, establishing
> the all-arguments-are-stored-consecutively property without burdening
> the caller or the ABI.
>
> An alloca() in the callee should just adjust the stack pointer and
> ignore the register arguments save area at the top of the frame.
>
> /Mikael
>
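
The callee-side save area Mikael describes can be sketched as follows. This
is a model only, assuming a downward-growing stack represented as an array
of 64-bit words (higher index = higher address); the function names and the
array model are hypothetical, and only the count of 8 argument registers
comes from the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_ARG_REGS 8   /* the 8 argument registers from the thread */

/* STACK models memory as an array of words.  FIRST_STACK_ARG is the
   index of the first stack-passed argument, i.e. where the caller's
   stack pointer was at the call.  The varargs callee spills its incoming
   register arguments into the 8 words immediately below and returns the
   index of argument 0.  */
size_t spill_register_args (uint64_t *stack, size_t first_stack_arg,
                            const uint64_t regs[NUM_ARG_REGS])
{
  size_t base = first_stack_arg - NUM_ARG_REGS;
  size_t i;
  for (i = 0; i < NUM_ARG_REGS; i++)
    stack[base + i] = regs[i];
  return base;
}

/* Argument N of a varargs call, register-passed or not, is then a
   single indexed load from one consecutive block.  */
uint64_t nth_arg (const uint64_t *stack, size_t base, size_t n)
{
  return stack[base + n];
}
```

Only functions that actually use va_arg pay for the 8 words, and they pay
in their own frame, so non-varargs calls reserve nothing.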


Zero extractions and zero extends

2010-02-09 Thread Jean Christophe Beyler
Dear all,

If I consider this code

typedef struct sTestUnsignedChar {
uint64_t a:1;
}STestUnsignedChar;

uint64_t getU (STestUnsignedChar a)
{
return a.a;
}


I get this in the DCE pass :
(insn 6 3 7 2 bitfield2.c:8 (set (subreg:DI (reg:QI 75) 0)
(zero_extract:DI (reg/v:DI 73 [ a ])
(const_int 1 [0x1])
(const_int 0 [0x0]))) 63 {extzvdi} (expr_list:REG_DEAD
(reg/v:DI 73 [ a ])
(nil)))

(insn 7 6 12 2 bitfield2.c:8 (set (reg:DI 74)
(zero_extend:DI (reg:QI 75))) 51 {zero_extendqidi2}
(expr_list:REG_DEAD (reg:QI 75)
(nil)))


(on the x86 port, I get an AND instead of the zero_extract)

However, on the combine pass both stay, whereas in the x86 port, the
zero_extend is removed. Where is this decided exactly ?
I've checked the costs of the instructions, I have the same thing as
the x86 port.

Thanks for all your help,
Jean Christophe Beyler


Definition of an instruction

2010-03-04 Thread Jean Christophe Beyler
Dear all,

I've been trying to define this instruction but am having
difficulties. This instruction does 3 things:
 - Updates an internal accumulator (depending on two arguments)
 - Returns the value of that update in an output register
 - Updates the internal accumulator after that


I first wrote :

(define_insn "myInst"
 [
 (set (match_operand:DI 0 "register_operand" "=r")
  (unspec:DI [(match_operand:DI 1 "register_operand" "r")
   (match_operand:DI 2 "register_operand" "r")
   (reg:DI 66)] 5))
   (clobber (reg:DI 66))
 ]
   ""
   "myInst\\t%0,%1,%2"
  [(set_attr "type" "arith")
   (set_attr "mode" "DI")
   (set_attr "length"   "1")])

Therefore saying that the unspec was dependent on the accumulator and
the two values and that the result was operand 0.

I added a clobber to the accumulator to say: it's been changed.

However, now if I have two myInst, the first gets optimized out...

I therefore need to define this instruction differently, any ideas ?

Thanks a lot,
Jean Christophe Beyler


Re: Definition of an instruction

2010-03-04 Thread Jean Christophe Beyler
Yeah that's what I had tried first but I get :

genextract: Internal error: abort in VEC_safe_set_locstr

And I can't seem to get rid of that error.

Any idea why this happens?
Jc

On Thu, Mar 4, 2010 at 6:28 PM, Andrew Pinski  wrote:
> Maybe try this:
>
> (define_insn "myInst"
>  [
>  (set (match_operand:DI 0 "register_operand" "=r")
>  (unspec:DI [(match_operand:DI 1 "register_operand" "r")
>  (match_operand:DI 2 "register_operand" "r")
>  (reg:DI 66)] 5))
> (set (reg:DI 66)
>  (unspec:DI [(match_operand:DI 1 "register_operand" "r")
>  (match_operand:DI 2 "register_operand" "r")
>  (reg:DI 66)] 6))
>  ]
>  ""
>  "myInst\\t%0,%1,%2"
>  [(set_attr "type"     "arith")
>  (set_attr "mode"     "DI")
>  (set_attr "length"   "1")])
>
>
> Which basically means op[0] = unspec:5(op[1], op[2], reg:66); [reg:66]
> = unspec:6(op[1], op[2], reg:66)
> So when you have two in a row, the second depends on the first as you
> have the set of reg:66.  The clobber in your original code does not do
> that, it says reg 66 cannot be depended on the value across the
> instruction.
>
> Thanks,
> Andrew Pinski
>
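
Why the first instruction must not be deleted can be seen in a small model
of the instruction's effect. This is a sketch under stated assumptions: the
thread does not say what unspec 5 and unspec 6 compute, so a
multiply-accumulate is used as a stand-in, and the struct and function
names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for (reg:DI 66).  */
typedef struct { uint64_t acc; } machine_state;

/* Hypothetical semantics for myInst.  */
uint64_t my_inst (machine_state *m, uint64_t a, uint64_t b)
{
  uint64_t result = m->acc + a * b;   /* "unspec 5": value for operand 0 */
  m->acc = result;                    /* "unspec 6": new accumulator     */
  return result;
}

/* Two instructions in a row; the first result is never read.  */
uint64_t run_both (uint64_t a, uint64_t b)
{
  machine_state m = { 0 };
  (void) my_inst (&m, a, b);
  return my_inst (&m, a, b);
}

/* What remains if the first instruction is removed as dead code.  */
uint64_t run_second_only (uint64_t a, uint64_t b)
{
  machine_state m = { 0 };
  return my_inst (&m, a, b);
}
```

The two functions return different values, because the second instruction
reads the accumulator written by the first. A set of reg 66 expresses that
dependency to the compiler; a clobber only says the old value is gone.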


Re: Definition of an instruction

2010-03-05 Thread Jean Christophe Beyler
Yes, I should have thought about it too. The error message threw me a
bit off, thanks for clearing that up!

Final question about this all:

I have a cost adjustment to make if the register operand 0 is used
later on. Basically, if that one is used, I have a +6 cost otherwise
if it's simply the 66 register, it's +1 cost.

I added this to the adjust_cost function :

/* Otherwise, if one is our instruction, we must check the dependencies */
if (isMyInst (dep_insn))
  {
    /* Get statement */
    rtx stmt = PATTERN (dep_insn);
    /* Get target of the set, it's our instruction, we know where
       it is exactly */
    int regno = REGNO (SET_DEST (XVECEXP (stmt, 0, 0)));

    /* Depending on if the def of dep_insn is used in insn, we
       adjust the cost */
    bitmap tmp_uses = BITMAP_ALLOC (&reg_obstack);

    /* Get the uses */
    df_simulate_uses (insn, tmp_uses);

    /* If ever one of the uses is our Set, add 6 */
    if (bitmap_bit_p (tmp_uses, regno))
      {
        cost += 6;
      }

    /* Free bitmap */
    BITMAP_FREE (tmp_uses), tmp_uses = NULL;
  }

Basically, my question is: I'm using the df_simulate_uses because
that's the simplest I know how to get all the uses. Is this safe
without doing a df_analyze at the beginning of the adjust_cost
function ?

It works but I'd like to get some input on it.

Thanks again,
Jc
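
The cost rule itself, stripped of the GCC data structures, can be sketched
as below. This is a standalone model following the prose description (+6
when the consumer reads operand 0, +1 when it only depends on register 66);
the bitmask representation, the constant names and the function name are
hypothetical, and register numbers are assumed to be below 64.

```c
#include <stdint.h>

#define RESULT_LATENCY 6  /* consumer reads operand 0 of myInst */
#define ACC_LATENCY    1  /* consumer only depends on register 66 */

/* USES is a bitmask of the register numbers read by the consuming insn;
   DEF_REGNO is the register set as operand 0 by the producing myInst.  */
int adjusted_cost (int cost, uint64_t uses, unsigned def_regno)
{
  if (uses & ((uint64_t) 1 << def_regno))
    return cost + RESULT_LATENCY;
  return cost + ACC_LATENCY;
}
```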

On Thu, Mar 4, 2010 at 6:42 PM, Andrew Pinski  wrote:
> On Thu, Mar 4, 2010 at 3:38 PM, Jean Christophe Beyler
>  wrote:
>> Yeah that's what I had tried first but I get :
>>
>> genextract: Internal error: abort in VEC_safe_set_locstr
>>
>> And I can't seem to get rid of that error.
>>
>> Any idea why this happens?
>
> Oh because the second time for the match_operand, you should be using
> match_dup (and yes the error message should be better).  I had forgot
> about that :).
> So try this:
> (define_insn "myInst"
>  [
>  (set (match_operand:DI 0 "register_operand" "=r")
>  (unspec:DI [(match_operand:DI 1 "register_operand" "r")
>  (match_operand:DI 2 "register_operand" "r")
>  (reg:DI 66)] 5))
> (set (reg:DI 66)
>  (unspec:DI [(match_dup 1)
>  (match_dup 2)
>  (reg:DI 66)] 6))
>  ]
>  ""
>  "myInst\\t%0,%1,%2"
>  [(set_attr "type"     "arith")
>  (set_attr "mode"     "DI")
>  (set_attr "length"   "1")])
>
>
> Thanks,
> Andrew Pinski
>


Passing options down to assembler and linker

2010-04-23 Thread Jean Christophe Beyler
Dear all,

I've been working on a side port for an architectural variant and
therefore there are a few differences in the assembler and linker to
be handled.

I know we can pass -Wl,option, -Wa,option from gcc down to as and ld
however if I have to write :

gcc -mArch2 -Wl,--arch2 -Wa,--arch2 hello.c

it gets a bit redundant, I must be blind because I can't seem to find
how to do it internally.

Any advice would be much appreciated!
Jc


Re: Passing options down to assembler and linker

2010-04-23 Thread Jean Christophe Beyler
That's exactly what I needed, thanks a lot :-)

Worked like a charm !
Jc

On Fri, Apr 23, 2010 at 2:09 PM, Nathan Froyd  wrote:
> On Fri, Apr 23, 2010 at 01:55:48PM -0400, Jean Christophe Beyler wrote:
>> I know we can pass -Wl,option, -Wa,option from gcc down to as and ld
>> however if I have to write :
>>
>> gcc -mArch2 -Wl,--arch2 -Wa,--arch2 hello.c
>>
>> it gets a bit redundant, I must be blind because I can't seem to find
>> how to do it internally.
>
> You want to get comfortable with specs:
>
> http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Spec-Files.html#Spec-Files
>
> and building in what specs should be handled by default:
>
> http://gcc.gnu.org/onlinedocs/gccint/Driver.html#Driver
>
> The ones you're interested in are ASM_SPEC and LINK_SPEC.
>
> -Nathan
>
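
A minimal sketch of what this can look like in the port's target header,
assuming the option is spelled -mArch2 and that both the assembler and the
linker accept --arch2 as in the command line above. In the spec language,
"%{S:X}" substitutes X only when the switch -S was given to gcc. A real
port would normally add this to its existing spec strings instead of
replacing them.

```c
/* Hypothetical fragment: forward -mArch2 to as and ld as --arch2.
   When -mArch2 is absent, both specs expand to nothing.  */
#define ASM_SPEC  "%{mArch2:--arch2}"
#define LINK_SPEC "%{mArch2:--arch2}"
```

With this in place, "gcc -mArch2 hello.c" should be enough.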


Constant folding and Constant propagation

2009-02-05 Thread Jean Christophe Beyler
Dear all,

I'm currently working on removing the constant folding and constant
propagation because, on the architecture I'm working on, it is highly
costly to move a constant into a register if the number is big (we can
say over 16 bits).

Currently, I've been looking into this and it seems that I have to
hack into the fold-const.c file, CCP, SCCVN and probably the propagate
passes to tell the compiler to be careful if the number is too big.
But I'm sure there are many other places I'll have to hack and slash
to tell it to stop the folding and propagation. Is there an easier
way to do this ?

I've been looking at what happens for the Itanium since you'd have the
same problem if you had to use big numbers. You would in theory prefer
to not use too many move long instructions since they use two slots in
the bundle. But I fail to see if anything is really done in that
respect.

Thanks for your help,
Jean Christophe Beyler


Re: Constant folding and Constant propagation

2009-02-05 Thread Jean Christophe Beyler
True but what I've noticed with GCC 4.3.2 is that with this code:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
long a = 0xcafecafe;

printf("Final: %lx %lx %lx\n", a, a+5, a+15);
return EXIT_SUCCESS;
}

Whether I compile it with a big number like here or a smaller number,
I'll get something like:

la  r8,$LC0         #Loading the address of "Final: %lx %lx %lx\n"
lid r9,0xcafecafe   #Loading the first immediate
lid r10,0xcafecb03  #Loading the second immediate
lid r11,0xcafecb0d  #Loading the third immediate

But in my case, the lid is very costly for this large number. It would
be more efficient to have:

la  r8,$LC0         #Loading the address of "Final: %lx %lx %lx\n"
lid r9,0xcafecafe   #Loading the first immediate
addi r10,r9,0x5     #Loading the second immediate
addi r11,r9,0xf     #Loading the third immediate

So my question is, can I explain this in the machine description alone
and if so, does anybody know where?

Thanks again,
Jean Christophe Beyler

On Thu, Feb 5, 2009 at 4:02 PM, Joe Buck  wrote:
> On Thu, Feb 05, 2009 at 12:46:01PM -0800, Joe Buck wrote:
>> On Thu, Feb 05, 2009 at 12:34:14PM -0800, Jean Christophe Beyler wrote:
>> > I'm currently working on removing the constant folding and constant
>> > propagation because, on the architecture I'm working on, it is highly
>> > costly to move a constant into a register if the number is big (we can
>> > say over 16 bits).
>>
>> But in practice most constants that cprop deals with are values like
>> -1, 0, 1, or 2.  Wouldn't it be better to be able to constrain cprop
>> based on the values of the constants, than to eliminate it altogether?
>
> I shouldn't have said "most", but small constants are common.
>
>


Re: Constant folding and Constant propagation

2009-02-06 Thread Jean Christophe Beyler
Dear all,

I've been trying to play with the RTX costs by using a target function
using the macro TARGET_RTX_COSTS. However, I haven't been able to
really make any good progress. This is the program I have:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
long a = 0xcafe;
printf("Final: %lx %lx\n", a, a+5);
return EXIT_SUCCESS;
}

What I would like to see is not have the two elements set as constants
in registers but only one and the other set with an add instruction.

I first looked at what rtx were sent to this function and I am
perplexed about how this would work. After many calls to the function,
I finally get this :

(const_int 51966 [0xcafe])
(const_int 51966 [0xcafe])
(const_int 51971 [0xcb03])
(const_int 51971 [0xcb03])

All of these have an outer code of SET. Therefore, I'm not quite
positive of how I'm supposed to implement my rtx_cost function. Since
I don't seem to get a choice between a set 0xcb03 and a (plus 0xcafe
5), how can I tell the compiler the different costs?

Jc

On Thu, Feb 5, 2009 at 4:10 PM, Ian Lance Taylor  wrote:
> Jean Christophe Beyler  writes:
>
>> I'm currently working on removing the constant folding and constant
>> propagation because, on the architecture I'm working on, it is highly
>> costly to move a constant into a register if the number is big (we can
>> say over 16 bits).
>>
>> Currently, I've been looking into this and it seems that I have to
>> hack into the fold-const.c file, CCP, SCCVN and probably the propagate
>> passes to tell the compiler to be careful if the number is too big.
>> But I'm sure there are many other places I'll have to hack and slash
>> to tell it to stop the folding and propagation. Is there an easier
>> way to do this ?
>
> This type of thing is normally controlled by the TARGET_RTX_COSTS
> hook.  E.g., see the handling of CONST_INT mips_rtx_costs in
> config/mips/mips.c.
>
> Ian
>


Re: Constant folding and Constant propagation

2009-02-07 Thread Jean Christophe Beyler
Ok, thanks for all this information and if you can dig that up it
would be nice too.  I'll start looking at that patch and PR33699 to
see if I can adapt them to my needs. Any reason why you wouldn't
recommend it?

I will keep you posted of anything I do for later reference,
Jean Christophe Beyler

On Sat, Feb 7, 2009 at 5:05 AM, Paolo Bonzini  wrote:
> Steven Bosscher wrote:
>> On Fri, Feb 6, 2009 at 7:32 PM, Adam Nemet  wrote:
>>> I think you really need the Joern's optmize_related_values patch.  Also see
>>> PR33699.
>>
>> I wouldn't recommend that patch, but yes: Something that performs that
>> optimization ;-)
>
> Yes, something doing that using LCM was on my list of "things to do
> after fwprop to do what CSE currently does, but better".  I never got
> round to implementing it, I think.  I had an LCM-based replacement for
> canon_reg that I wanted to use as a basis, maybe I can dig it up.
>
> Paolo
>
>


Machine description question

2009-02-07 Thread Jean Christophe Beyler
Dear all,

I have a question about the way the machine description works and how
it affects the different passes of the compiler. I was reading the GNU
Compiler Collection Internals and I found this part (in section
14.8.1):

 (define_insn ""
   [(set (match_operand:SI 0 "general_operand" "=r")
 (plus:SI (match_dup 0)
  (match_operand:SI 1 "general_operand" "r")))]
   ""
   "...")

which has two operands, one of which must appear in two places, and

 (define_insn ""
   [(set (match_operand:SI 0 "general_operand" "=r")
 (plus:SI (match_operand:SI 1 "general_operand" "0")
  (match_operand:SI 2 "general_operand" "r")))]
   ""
   "...")

which has three operands, two of which are required by a constraint to
be identical.



My question is :
- How do the constraints work for match_dup or for when you put "0" as
a constraint? Since we want the same register but as input and not
output, how does the compiler set up its dependencies? Does it just
forget the "=" or "+" for the match_dup or "0" ?

- I also read that it is possible to assign a letter with a number for
the constraint. Does this mean I can do "r0" to say, it's going to be
an input register but it must look like the one you'll find in
position 0?

- My third question is: why not put "+r" as the constraint instead of
"=r" since we are going to be reading and writing into r0.

- Finally, I have a similar instruction definition, if I use the first
solution (match_dup), the instruction that will calculate the
input/output operand 0 sometimes gets removed by the web pass. But if
I use the second solution (put a "0" constraint), then it is no longer
removed. Any idea why these two definitions would be treated
differently in the web pass?

Thanks again in advance,
Jc


Re: Machine description question

2009-02-09 Thread Jean Christophe Beyler
Ok, I understand now. Thank you very much for your explanations,

Jean Christophe Beyler

On Sat, Feb 7, 2009 at 5:13 PM, Michael Meissner
 wrote:
> On Sat, Feb 07, 2009 at 03:54:51PM -0500, Jean Christophe Beyler wrote:
>> Dear all,
>>
>> I have a question about the way the machine description works and how
>> it affects the different passes of the compiler. I was reading the GNU
>> Compiler Collection Internals and I found this part (in section
>> 14.8.1):
>>
>>  (define_insn ""
>>[(set (match_operand:SI 0 "general_operand" "=r")
>>  (plus:SI (match_dup 0)
>>   (match_operand:SI 1 "general_operand" "r")))]
>>""
>>"...")
>>
>> which has two operands, one of which must appear in two places, and
>>
>>  (define_insn ""
>>[(set (match_operand:SI 0 "general_operand" "=r")
>>  (plus:SI (match_operand:SI 1 "general_operand" "0")
>>   (match_operand:SI 2 "general_operand" "r")))]
>>""
>>"...")
>>
>> which has three operands, two of which are required by a constraint to
>> be identical.
>>
>>
>> My question is :
>> - How do the constraints work for match_dup or for when you put "0" as
>> a constraint? Since we want the same register but as input and not
>> output, how does the compiler set up its dependencies? Does it just
>> forget the "=" or "+" for the match_dup or "0" ?
>
> Note, the two are not the same thing before register allocation.  Before
> register allocation, in the first example, the argument to the plus must point
> to the exact RTL that the result is.  Given that normally a different pseudo
> register is used for the output, the first example won't match a lot of insns
> that are created.
>
> The two general_operands are distinct and the normal live/death information
> will track the insns.  So you are saying that the first general operand is a
> write only value, and the second is a read only value.  During register
> allocation, the register allocator notices the "0" and then reuses the
> register.  In that example, the value of operand1 dies in that insn, replaced
> by the value in operand0.
>
> For example, consider a 2 address machine like the 386 (and ignoring things
> like LEA).  If the value of operand1 is live after the insn, and operand0
> wants to be %eax, operand1 is in %ebx, and operand2 is in %edx, then the
> register allocation typically generates code like:
>
>movl %ebx, %eax
>addl %edx, %eax
>
> If operand1's value was not needed after the insn, then the compiler would
> possibly allocate the output to %ebx, and do:
>
>addl %edx, %ebx
>
>>
>> - I also read that it is possible to assign a letter with a number for
>> the constraint. Does this mean I can do "r0" to say, it's going to be
>> an input register but it must look like the one you'll find in
>> position 0?
>
> I'm not sure what you are asking here.  If you are talking about the
> multi-letter constraints in define_register_constraint, that is basically a
> way to extend the machine description for more constraints when you run out
> of single letters.
>
>> - My third question is: why not put "+r" as the constraint instead of
>> "=r" since we are going to be reading and writing into r0.
>
> Because the output value is not an input to the add in the second example.  It
> is an input to the add in the first, due to the use of match_dup.
>
>> - Finally, I have a similar instruction definition, if I use the first
>> solution (match_dup), the instruction that will calculate the
>> input/output operand 0 sometimes gets removed by the web pass. But if
>> I use the second solution (put a "0" constraint), then it is no longer
>> removed. Any idea why these two definitions would be treated
>> differently in the web pass?
>
> I don't know much about the web pass, however as I said, the two definitions
> are fundamentally different.
>
> --
> Michael Meissner, IBM
> 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA
> meiss...@linux.vnet.ibm.com
>
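
The effect of the "0" matching constraint on a two-address machine can be
modeled in a few lines. This is a sketch only: the register file is an
array, the enum and function names are hypothetical, and the helper assumes
the destination is not the same register as the second source unless it is
also the first.

```c
#include <stdint.h>

enum { X86_EAX, X86_EBX, X86_EDX, X86_NUM_REGS };

/* Emulates "dst = src1 + src2" where the add overwrites its first
   source.  Returns the number of instructions emitted.
   Assumes dst != src2 whenever dst != src1.  */
int two_address_add (int64_t regs[X86_NUM_REGS], int dst, int src1, int src2)
{
  int count = 0;
  if (dst != src1)
    {
      regs[dst] = regs[src1];   /* movl src1, dst */
      count++;
    }
  regs[dst] += regs[src2];      /* addl src2, dst */
  return count + 1;
}
```

When operand 1 is still live, the allocator picks a different destination
and the copy is needed (two instructions, source preserved). When operand 1
dies in the insn, destination and first source can share a register and
only the add remains.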


Fwd: Constant folding and Constant propagation

2009-02-25 Thread Jean Christophe Beyler
Dear all,

I was working on the machine description so I was postponing a bit
this problem but now I have a bit more time on my hands to handle it.
I'm starting to look at the various links and code you've provided me
and will keep you posted if I make any headway or not ;-).

> For the GCC port I work on, I have fixed this by weighing the rtx_cost
> of propagating a register copy Vs propagating the constant into an insn.
> I have an initial patch for this problem.

Do you have a link to that patch? So that I can see if it applies for me ?

Thanks,
Jc


On Tue, Feb 10, 2009 at 2:25 PM, Rahul Kharche  wrote:
> I am looking at a related problem in GCSE, GCC 4.3 whereby constants
> are propagated to their use irrespective of the final instruction cost
> of generating them (machine cycles or instruction count).
>
> Global constant propagation effectively voids common expressions that
> form large constants, identified by global CSE. This is especially
> true of SYMBOL_REF constants like section-anchors generated while
> indexing global arrays.
>
> For the GCC port I work on, I have fixed this by weighing the rtx_cost
> of propagating a register copy Vs propagating the constant into an insn.
> I have an initial patch for this problem.
>
>
> Rahul Vijay Kharche
>


No address_cost calls when inlining ?

2009-03-09 Thread Jean Christophe Beyler
Dear all,

I am currently working on defining the cost of an address calculation
and I have an issue when the function is inlined. It seems that
the address cost is taken into account only sometimes.

Here is the test code that I am testing this on:

long tab[2];

void foo(long x) {
   long i;

   for (i = 0; i < 2; i++) {
  tab[i] = x;
   }

   return;
}

long main() {
   int i;
   for (i = 0; i < 10; i++) {
  foo(i);
   }
   return 0;
}


I get for the function foo :
la  r6,tab
std r8,8(r6)
std r8,0(r6)

and for main:
std r7,tab
std r7,tab+8

I want the version of foo because the store with an address as
destination is costly on my architecture, which is why I defined
TARGET_ADDRESS_COST and added a cost when I get this scenario.
However, in the compilation of this code, it seems that, when the
function is inlined, the address_cost function does not seem to be
called anymore. Any ideas why ?

Thanks in advance,
Jean Christophe Beyler


Re: No address_cost calls when inlining ?

2009-03-10 Thread Jean Christophe Beyler
Ahhh ok, so basically I've hit the same wall as for the constant
folding and constant propagation!
Oh well, I will see how important it is for me to try to fix it in this case.

Thanks for the answer,
Jc

On Tue, Mar 10, 2009 at 4:18 AM, Paolo Bonzini  wrote:
>
>> I want the version of foo because the store with an address as
>> destination is costly on my architecture, which is why I defined
>> TARGET_ADDRESS_COST and added a cost when I get this scenario.
>> However, in the compilation of this code, it seems that, when the
>> function is inlined, the address_cost function does not seem to be
>> called anymore. Any ideas why ?
>
> This is (a variant of) PR33699.
>
> Paolo
>
>


Re: Constant folding and Constant propagation

2009-03-13 Thread Jean Christophe Beyler
I set up your patch and I get an internal error on this test program:

extern void foo(int, int);
extern int bar(void);
int startup;

void foobar()
{
  int i;
  while(1)
  {
if (bar()) {
foo(0,0);
}
  }
}

Here's the error:
/home/beyler/cyclops64/src/tnt/kernel/process_manager/testnode.c: In
function 'foobar':
/home/beyler/cyclops64/src/tnt/kernel/process_manager/testnode.c:15:
internal compiler error: in insert_regs, at cse.c:1156
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.


I think it's got something to do with my machine description but this
only happens when the values of foo are both 0. I haven't found
another case where it fails. It seems when the compiler sees two
constants 0 for the register (r8 which is my first input register), it
fails on the second. Somewhere between both calls, it must remove the
quantity.

Any ideas ?
Jc


Re: Constant folding and Constant propagation

2009-03-14 Thread Jean Christophe Beyler
If I replace your lines:

if ((GET_CODE (sets[i].src_elt->exp) == CONST_INT))
insert_const_anchors (dest, sets[i].src_elt, GET_MODE (dest));

with:

if ((GET_CODE (sets[i].src_elt->exp) == CONST_INT) && (INTVAL
(sets[i].src_elt->exp) != 0))
insert_const_anchors (dest, sets[i].src_elt, GET_MODE (dest));

it does not fail anymore since I remove the 0 corner case I am seeing.
I fail to see why it is a problem however. Like I said previously, it
might be a problem with my target architecture but I fail to see why 0
is considered differently in the target files.

Jc


Difference between local/global/parameter array handling

2009-03-16 Thread Jean Christophe Beyler
Dear all,

I've been working on explaining to GCC the cost of loads/stores on my
target and I arrived to this problem. Consider the following code:

uint64_t sum = 0;
for(i=0; i

Variable length arrays : aligned on stack

2009-03-23 Thread Jean Christophe Beyler
Dear all,

I've been looking into a problem where, if I have this defined in the function:

void foo(int len)
{
long arr[len];
...
}

I get complicated code to calculate the address of this
variable-length array. It seems that the compiler is aligning the
array, whereas this code:
void foo(int len)
{
long arr[512];
...
}

Simply moves the stack pointer 512*sizeof(long) bytes down.

I suppose this is a Machine Description problem but was wondering if
you had any tips on where to look?

Thanks for your help,
Jean Christophe Beyler
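
One likely source of the extra code, offered as an assumption and not as a
diagnosis of this port: for a variable-length array the byte count is only
known at run time, so it has to be computed and rounded up to the stack
alignment before the stack pointer is moved, whereas for "long arr[512]"
the same arithmetic folds to a constant at compile time. A minimal sketch
of that rounding, with a hypothetical 16-byte stack alignment and a
hypothetical function name:

```c
#include <stddef.h>

#define SKETCH_STACK_ALIGN 16  /* hypothetical; must be a power of two */

/* Bytes to subtract from the stack pointer for "long arr[len]".  */
size_t vla_stack_bytes (size_t len)
{
  size_t bytes = len * sizeof (long);
  return (bytes + SKETCH_STACK_ALIGN - 1)
         & ~(size_t) (SKETCH_STACK_ALIGN - 1);
}
```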


Inefficient loads in certain loops depending on data declaration

2009-04-21 Thread Jean Christophe Beyler
Dear all,

As I work through handling load multiples and store multiples for my
target architecture, I came across this scenario:

int foo(int data[10240])
{
int w0, w1;
int part = 0,i ;

for (i=0; i<1;i+=2){
w0 = data[i];
w1 = data[i+1];
part += w0 + w1 ;
}

return part;
}


int bar(void)
{
int w0, w1;
int part = 0,i ;
int data[10240];

fillit (data);

for (i=0; i<1;i+=2){
w0 = data[i];
w1 = data[i+1];
part += w0 + w1 ;
}

return part;
}


In the case where data is defined as a parameter, I get two separate
loads with no offsets. GCC handles both loads separately. This means
more register pressure and more instructions in the loop. In essence I
get:

ld r1, (0)r5
ld r2, (0)r6

Instead of :

ld r1, (0)r5
ld r2, (8)r6

which I do get if the array is declared in the function.

I have written the address_cost function to handle this case and tell
the compiler that it is better to use the offsets but in this
scenario, the compiler is not calling the address cost function before
already getting those two different registers. So I don't know why it's
already expanding those loads before getting to that call.

Any ideas? Thanks,
Jc


Re: Reserving a number of consecutive registers

2009-04-21 Thread Jean Christophe Beyler
Ok, I've been working at this and have actually made a bit of progress.

- I now have a framework that finds the group of loads (let's assume
they stay together).

- With the DF framework, I'm able to figure out which registers are
being used and which are free to be used.

- I've pretty much got it to change the registers to the ones I want
but there are still some corner cases where it seems to be badly
handling that and actually changing the semantics of the program.

I'll look into that, thanks,
Jc

On Mon, Apr 20, 2009 at 3:01 PM, Eric Botcazou  wrote:
>> Ok, I'll try to look at that. Is there an area where I can see how to
>> initialize the framework and get information about which registers are
>> free?
>
> The API is in df.h, see for example ifcvt.c.
>
> --
> Eric Botcazou
>


Machine description

2009-04-22 Thread Jean Christophe Beyler
Dear all,

I've been working on the Machine description of my target and was
wondering if you could help me out here.

I've been trying to force GCC out of its habit of generating this code:
(insn 28 8 10 2 glob.c:13 (set (reg:DI 9 r9)
(mem/s:DI (symbol_ref:DI ("data") )
[4 data+0 S8 A64])) 71 {*movdi_internal1} (nil))

-> Load r9, 0(data)

Because, in my target it's costly to do so, I would prefer :

Mov r10, data
Load r9, 0(r10)

This way, GCC can optimize this by taking the first move out of the
loop for example. Otherwise, I have to do a final pass but that's too
late for all the optimizations.

My problem is that I've defined the define_expand "movdi" and it only
sees the right case after reload is in progress or completed.
Therefore, I can't create any new pseudos.

Any ideas on how to do this, I've been looking at the other
architectures but haven't seen anything that resembles this,
As always, thanks a lot for your time,
Jc


Re: Machine description

2009-04-23 Thread Jean Christophe Beyler
I've actually done that. I defined the expansion like this:

/* If we can create pseudos, the first operand is a register but the
   second is memory.  */
if (can_create_pseudo_p ()
    && register_operand (operands[0], DImode)
    && memory_operand (operands[1], DImode))
  {
    /* If the memory address is not a REG.  */
    if (GET_CODE (XEXP (operands[1], 0)) != REG)
      {
        /* Make a register and move the address there first, before
           performing the load.  */
        rtx tmp = gen_reg_rtx (DImode);
        emit_move_insn (tmp, XEXP (operands[1], 0));
        emit_move_insn (operands[0], gen_rtx_MEM (DImode, tmp));
        DONE;
      }
  }

Then I have my base cost for address_cost that is used and actually
does tell GCC that loading from a global address is expensive (I put a
value like 10 for that) whereas a register with an offset would get 2.
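As a rough illustration of the cost ordering described above, here is a minimal, self-contained sketch. It is a toy model and not the GCC address_cost hook: the enum, the name toy_address_cost, and the cost of 1 for a plain register are assumptions; only the values 10 and 2 come from the text.

```c
/* Toy classification of an address; a real port would inspect the rtx.  */
enum toy_addr_kind
{
  TOY_ADDR_REG,             /* (reg)                       */
  TOY_ADDR_REG_PLUS_OFFSET, /* (plus (reg) (const_int N))  */
  TOY_ADDR_SYMBOL           /* (symbol_ref "data")         */
};

/* Hypothetical cost function: lower is cheaper.  */
int toy_address_cost (enum toy_addr_kind kind)
{
  switch (kind)
    {
    case TOY_ADDR_SYMBOL:
      return 10;  /* value mentioned in the message */
    case TOY_ADDR_REG_PLUS_OFFSET:
      return 2;   /* value mentioned in the message */
    case TOY_ADDR_REG:
    default:
      return 1;   /* assumption: not stated in the message */
    }
}
```

With such an ordering, a pass that compares costs would prefer a register base plus offset over a direct load from the symbol, but only if it actually consults the cost function before the address form is fixed.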

For the moment, there is no change: the expansion code is actually not used in
this case because GCC only presents me with the load from a global
during or after reload. Therefore, it's already done and it doesn't
seem to want to change its ways. I haven't played with the rtx_cost
but that's my next step.

Any comments, ideas?
Jc

On Wed, Apr 22, 2009 at 4:08 PM, Ian Lance Taylor  wrote:
> Jean Christophe Beyler  writes:
>
>> I've been working on the Machine description of my target and was
>> wondering if you could help me out here.
>>
>> I've been trying to force GCC out of its habit of generating this code:
>> (insn 28 8 10 2 glob.c:13 (set (reg:DI 9 r9)
>> (mem/s:DI (symbol_ref:DI ("data") )
>> [4 data+0 S8 A64])) 71 {*movdi_internal1} (nil))
>>
>> -> Load r9, 0(data)
>>
>> Because, in my target it's costly to do so, I would prefer :
>>
>> Mov r10, data
>> Load r9, 0(r10)
>>
>> This way, GCC can optimize this by taking the first move out of the
>> loop for example. Otherwise, I have to do a final pass but that's too
>> late for all the optimizations.
>>
>> My problem is that I've defined the define_expand "movdi" and it only
>> sees the right case after reload is in progress or completed.
>> Therefore, I can't create any new pseudos.
>>
>> Any ideas on how to do this, I've been looking at the other
>> architectures but haven't seen anything that resembles this,
>
> This looks somewhat difficult.  I would suggest having the movMM
> expander generate the two instruction sequence when can_create_pseudo_p
> returns true.  Then set the costs so that a load from an address plus an
> offset is expensive.
>
> Ian
>


Re: Reserving a number of consecutive registers

2009-04-30 Thread Jean Christophe Beyler
Ok, now in the easy case it seems to be working. I've handled most
cases but I was wondering about one problem that I don't seem to be
able to handle.

Let's say I want to rename register r6 to r15. I can safely do that in
the block if I know that r15 is not used in that basic block and that
r6 is not a live-out of the basic block.
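The condition described above can be sketched with plain bitmasks. This is a hypothetical model, not GCC's bitmap or DF API: the regset type and the function names are invented for illustration, and the extra check that the new register is not live across the block is an assumption added for safety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per hard register (up to 64 registers).  */
typedef uint64_t regset;

static regset reg_bit (unsigned regno)
{
  assert (regno < 64);
  return (regset) 1 << regno;
}

/* Renaming OLD_REG to NEW_REG inside one block needs no fix-up when
   NEW_REG is neither referenced in the block nor live across it, and
   OLD_REG is not live on exit from the block.  */
bool rename_is_free (regset used_in_block, regset live_in, regset live_out,
                     unsigned old_reg, unsigned new_reg)
{
  regset busy = used_in_block | live_in | live_out;
  return (busy & reg_bit (new_reg)) == 0
         && (live_out & reg_bit (old_reg)) == 0;
}

/* When OLD_REG is live-out, a copy NEW_REG -> OLD_REG would have to be
   emitted at the end of the block to preserve the value.  */
bool rename_needs_copy_back (regset live_out, unsigned old_reg)
{
  return (live_out & reg_bit (old_reg)) != 0;
}
```

For the r6/r15 case, rename_is_free holds only when r15 is absent from all three sets and r6 is absent from live_out; otherwise the copy-back case applies.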

However, how to handle the case where r6 is a live-out ? Then, I would
have to make sure that r15 is not defined in another basic block, thus
destroying my new live-out value?

It seems to be a cat-and-mouse game:

- I could copy back r15 to r6 in that case though I would like to try
to not have to do that because that requires an extra copy at the end
of the block
- I could go through all the blocks and rename all the r6s into r15s
and vice-versa. As long as they are regular registers (and not the
return register for example), it should work.

I don't see how to handle this case neatly, any ideas?

Thanks in advance,
Jc

On Tue, Apr 21, 2009 at 5:23 PM, Jean Christophe Beyler
 wrote:
> Ok, I've been working at this and have actually made a bit of progress.
>
> - I now have a framework that finds the group of loads (let's assume
> they stay together).
>
> - With the DF framework, I'm able to figure out which registers are
> being used and which are free to be used.
>
> - I've pretty much got it to change the registers to the ones I want
> but there are still some corner cases where it seems to be badly
> handling that and actually changing the semantics of the program.
>
> I'll look into that, thanks,
> Jc
>
> On Mon, Apr 20, 2009 at 3:01 PM, Eric Botcazou  wrote:
>>> Ok, I'll try to look at that. Is there an area where I can see how to
>>> initialize the framework and get information about which registers are
>>> free?
>>
>> The API is in df.h, see for example ifcvt.c.
>>
>> --
>> Eric Botcazou
>>
>


Re: Reserving a number of consecutive registers

2009-05-01 Thread Jean Christophe Beyler
Yes, I have done that and now am looking to limit those numbers. For
example, I don't copy back the ones that are not live-out registers.
It works well but I am having an issue when recompiling the whole
compiler.

I've reduced it to the following: if, in my reorg function, I do only this:

FOR_EACH_BB (bb)
  {
    bitmap regs_forw = BITMAP_ALLOC (&reg_obstack);
    bitmap_copy (regs_forw, df_get_live_in (bb));
    BITMAP_FREE (regs_forw);
  }

It fails here:
/home/beyler/cyclops64/src/cyclops64-gcc-4.3.2/libgcc/../gcc/libgcc2.c:
In function '__divdi3':
/home/beyler/cyclops64/src/cyclops64-gcc-4.3.2/libgcc/../gcc/libgcc2.c:1102:
internal compiler error: Segmentation fault

I'm looking into that now, am I allowed to be doing this in the reorg ?
#0  df_get_live_in (bb=0xb7a9c8ac) at
/home/beyler/cyclops64/src/cyclops64-gcc-4.3.2/gcc/df-problems.c:93
93  return DF_LR_IN (bb);

Or is there a way to know if I'm allowed to do that copy?
Thanks again,
Jc

On Thu, Apr 30, 2009 at 3:16 PM, Eric Botcazou  wrote:
>> Let's say I want to rename register r6 to r15. I can safely do that in
>> the block if I know that r15 is not used in that basic block and that
>> r6 is not a live-out of the basic block.
>>
>> However, how to handle the case where r6 is a live-out ? Then, I would
>> have to make sure that r15 is not defined in another basic block, thus
>> destroying my new live-out value?
>>
>> It seems to be a cat-mouse game:
>>
>> - I could copy back r15 to r6 in that case though I would like to try
>> to not have to do that because that requires an extra copy at the end
>> of the block
>
> Yes, you need to make a copy in this case but its cost could be offset by
> the gain from the load_multiple.  Or it could be eliminated by running a new
> instance of cprop_hardreg.  You need to experiment and tune the pass.
>
> --
> Eric Botcazou
>


Re: Reserving a number of consecutive registers

2009-05-01 Thread Jean Christophe Beyler
Ok, I added a df_analyze at the beginning of my target reorg function
and now it works. Is there anything I should add to clean up afterwards?

Sorry about this, I'm slowly learning different parts of the GCC
compiler as I go,
Thanks again for all your help,
Jc

On Fri, May 1, 2009 at 4:33 PM, Jean Christophe Beyler
 wrote:
> Yes, I have done that and now am looking to limit those numbers. For
> example, I don't copy back the ones that are not live-out registers.
> It works well but I am having an issue when recompiling the whole
> compiler.
>
> I've simplified this to this, if in my reorg function, I do only this  :
>
>    FOR_EACH_BB (bb)
>    {
>        bitmap regs_forw = BITMAP_ALLOC (&reg_obstack);
>        bitmap_copy (regs_forw, df_get_live_in (bb));
>        BITMAP_FREE (regs_forw);
>    }
>
> It fails here:
> /home/beyler/cyclops64/src/cyclops64-gcc-4.3.2/libgcc/../gcc/libgcc2.c:
> In function '__divdi3':
> /home/beyler/cyclops64/src/cyclops64-gcc-4.3.2/libgcc/../gcc/libgcc2.c:1102:
> internal compiler error: Segmentation fault
>
> I'm looking into that now, am I allowed to be doing this in the reorg ?
> #0  df_get_live_in (bb=0xb7a9c8ac) at
> /home/beyler/cyclops64/src/cyclops64-gcc-4.3.2/gcc/df-problems.c:93
> 93          return DF_LR_IN (bb);
>
> Or is there a way to know if I'm allowed to do that copy?
> Thanks again,
> Jc
>
> On Thu, Apr 30, 2009 at 3:16 PM, Eric Botcazou  wrote:
>>> Let's say I want to rename register r6 to r15. I can safely do that in
>>> the block if I know that r15 is not used in that basic block and that
>>> r6 is not a live-out of the basic block.
>>>
>>> However, how to handle the case where r6 is a live-out ? Then, I would
>>> have to make sure that r15 is not defined in another basic block, thus
>>> destroying my new live-out value?
>>>
>>> It seems to be a cat-mouse game:
>>>
>>> - I could copy back r15 to r6 in that case though I would like to try
>>> to not have to do that because that requires an extra copy at the end
>>> of the block
>>
>> Yes, you need to make a copy in this case but its cost could be offset by
>> the gain from the load_multiple.  Or it could be eliminated by running a new
>> instance of cprop_hardreg.  You need to experiment and tune the pass.
>>
>> --
>> Eric Botcazou
>>
>


Code optimization only in loops

2009-05-07 Thread Jean Christophe Beyler
Dear all,

I come back to you with another weirdness due to bad code generation
on my target architecture. I have a very simplified (for the moment)
rtx_costs and my address_cost is inspired by the i386 version.
However, I actually patched in the whole i386_rtx_cost function,
constraints, predicates to see if it was something I had done wrongly
but I seem to get the same results.

These are my two functions:

uint64_t foo (uint64_t n, uint64_t m)
{
uint64_t sum = 0,i;
for(i=n;i

Re: Code optimization only in loops

2009-05-13 Thread Jean Christophe Beyler
Ok, for the i386 port, I use uint32_t instead of uint64_t because
otherwise the assembly code generated is a bit complicated (I'm on a
32 bit machine).

The tree dumps from final_cleanup are the same for the goo function:
goo (i)
{
:
  return data[i + 13] + data[i];

}


However, the first RTL dump from expand gives this for the i386 port:

(insn 6 5 7 3 ld.c:17 (parallel [
(set (reg:SI 61)
(plus:SI (reg/v:SI 59 [ i ])
(const_int 13 [0xd])))
(clobber (reg:CC 17 flags))
]) -1 (nil))

(insn 7 6 8 3 ld.c:17 (set (reg/f:SI 62)
(symbol_ref:SI ("data") )) -1 (nil))

(insn 8 7 9 3 ld.c:17 (set (reg/f:SI 63)
(symbol_ref:SI ("data") )) -1 (nil))

(insn 9 8 10 3 ld.c:17 (set (reg:SI 64)
(mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
(const_int 4 [0x4]))
(reg/f:SI 63)) [3 data S4 A32])) -1 (nil))

(insn 10 9 11 3 ld.c:17 (set (reg:SI 65)
(mem/s:SI (plus:SI (mult:SI (reg:SI 61)
(const_int 4 [0x4]))
(reg/f:SI 62)) [3 data S4 A32])) -1 (nil))

(insn 11 10 12 3 ld.c:17 (parallel [
(set (reg:SI 60)
(plus:SI (reg:SI 65)
(reg:SI 64)))
(clobber (reg:CC 17 flags))
]) -1 (expr_list:REG_EQUAL (plus:SI (mem/s:SI (plus:SI
(mult:SI (reg:SI 61)
(const_int 4 [0x4]))
(reg/f:SI 62)) [3 data S4 A32])
(mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
(const_int 4 [0x4]))
(reg/f:SI 63)) [3 data S4 A32]))
(nil)))

As we can see, the compiler adds 13 to i, loads the address of data,
then multiplies the index by 4 inside the address to scale it to the
element size, performs the two loads, and finally adds the results.

In my port, I get:

(insn 6 5 7 3 ld.c:17 (set (reg:DI 75)
(plus:DI (reg/v:DI 73 [ i ])
(const_int 13 [0xd]))) -1 (nil))

(insn 7 6 8 3 ld.c:17 (set (reg/f:DI 76)
(symbol_ref:DI ("data") )) -1 (nil))

(insn 8 7 9 3 ld.c:17 (set (reg:DI 78)
(const_int 3 [0x3])) -1 (nil))

(insn 9 8 10 3 ld.c:17 (set (reg:DI 77)
(ashift:DI (reg:DI 75)
(reg:DI 78))) -1 (nil))

(insn 10 9 11 3 ld.c:17 (set (reg/f:DI 79)
(plus:DI (reg/f:DI 76)
(reg:DI 77))) -1 (nil))

(insn 11 10 12 3 ld.c:17 (set (reg/f:DI 80)
(symbol_ref:DI ("data") )) -1 (nil))

(insn 12 11 13 3 ld.c:17 (set (reg:DI 82)
(const_int 3 [0x3])) -1 (nil))

(insn 13 12 14 3 ld.c:17 (set (reg:DI 81)
(ashift:DI (reg/v:DI 73 [ i ])
(reg:DI 82))) -1 (nil))

(insn 14 13 15 3 ld.c:17 (set (reg/f:DI 83)
(plus:DI (reg/f:DI 80)
(reg:DI 81))) -1 (nil))

(insn 15 14 16 3 ld.c:17 (set (reg:DI 84)
(mem/s:DI (reg/f:DI 79) [2 data S8 A64])) -1 (nil))

(insn 16 15 17 3 ld.c:17 (set (reg:DI 85)
(mem/s:DI (reg/f:DI 83) [2 data S8 A64])) -1 (nil))

(insn 17 16 18 3 ld.c:17 (set (reg:DI 74)
(plus:DI (reg:DI 84)
(reg:DI 85))) -1 (nil))


Which seems to be the same idea, except that the constant 3 gets loaded
into a register and a shift is performed. Is it possible that this is
what is causing my problem in code generation?
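To make the relation between the two loads concrete, here is a small worked sketch of the address arithmetic from the dump above. The function names are hypothetical; the only facts taken from the dump are the 8-byte element size (shift by 3) and the index offset of 13. Both forms compute the same address, and the second form shows that the two loads differ by a constant 13 << 3 = 104 bytes from a shared base.

```c
#include <stdint.h>

/* Element size of the DImode array in the dump: 8 bytes, so the index
   is scaled by a left shift of 3.  */
enum { ELEM_SHIFT = 3 };

/* Address as expanded in the dump: base + ((i + k) << 3).  */
uint64_t addr_shifted (uint64_t base, uint64_t i, uint64_t k)
{
  return base + ((i + k) << ELEM_SHIFT);
}

/* Same address written as a shared base plus a constant offset:
   (base + (i << 3)) + (k << 3).  */
uint64_t addr_base_plus_offset (uint64_t base, uint64_t i, uint64_t k)
{
  uint64_t shared = base + (i << ELEM_SHIFT);
  return shared + (k << ELEM_SHIFT);
}
```

The equality holds for any i because the shift distributes over the addition in modular arithmetic; whether a given pass recognizes it depends on the form of the RTL it is handed.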

I'm trying to figure out why my port is generating a shift instead of
simply a mult. I actually changed the cost of shift to a large value
and then it uses adds instead of simply a mult. I suspect that
this is then an rtx_cost problem where I'm not telling the compiler
that a multiplication is acceptable in this case.

I've been playing with rtx_cost but have been unable to really get it
to generate the right code.

Thanks again for your help and insight,
Jc

On Fri, May 8, 2009 at 5:18 AM, Paolo Bonzini  wrote:
>
>> It seems that when set in a loop, the program is able to perform some
>> type of optimization to actually get the use of the offsets whereas
>> in the case of no loop, we have twice the instructions
>> for each address calculation.
>
> I suggest you look at the dumps for i386 to see which pass does the
> changes, and then see what happens in your port.
>
> Paolo
>


Re: internal compiler error: in reload_combine_note_use, at postreload.c:1093

2009-05-14 Thread Jean Christophe Beyler
Unless I'm mistaken, force_reg will generally call gen_reg_rtx
which does check for those two flags internally (via
can_create_pseudo_p). So you should not have that error in this case.

I suggest you use a debug_rtx on the operand that's a problem, and
first check if gen_reg_rtx is used to create it. If that's the case,
you can use gdb to trace when it's called, get a backtrace and work
from there.

Jc

On Wed, May 13, 2009 at 11:54 PM, daniel tian  wrote:
> Thank you for your advice.
>
> Yes. I checked the MD file and the related machine.h/.c; there are some
> places which call 'force_reg' unconditionally. I modified it
> in the "movm" insn pattern, but the error still exists. I also checked
> mips/arm; they also call 'force_reg' unconditionally in some places.
>
> Anything I missed?
>
> 2009/5/13 Dave Korn :
>> daniel tian wrote:
>>> I have ported gcc4.0.2  to 32bit RISC chip. But internal compiler
>>> error happened: in reload_combine_note_use, at postreload.c:1093 . I
>>> tracked the code with insight.  error occurred in "CASE REG", when the
>>> register number is larger than FIRST_PSEUDO_REGISTER. Does this mean
>>> the reload register allocation failed?  What I know is that there is
>>> no pseudo register used after reload completed (Is this right?).
>>
>>  Yes, this is correct.
>>
>>> Can anyone give me some advice?
>>
>>  It is most likely one of the expanders or other patterns in your MD file is
>> unconditionally calling a function such as force_reg() without checking the
>> reload_completed / reload_in_progress flags.  If this pattern gets invoked
>> post-reload, that will allocate a pseudo and then fail later.
>>
>>    cheers,
>>      DaveK
>>
>>
>>
>


Re: Expanding a load instruction

2009-06-09 Thread Jean Christophe Beyler
Dear all,

I've moved forward on this issue. Again, the problem is not that the
data is not aligned but that the compiler tries to generate this
instruction:

(set (reg:HI 141) (mem/s/j:HI (plus:DI (reg:DI 134 [ ivtmp.23 ])
 (const_int 1 [0x1])) [0 .geno+0 S2 A16]))

And, in my target architecture, this is not authorized. I want to
transform this into:

(set (reg:DI 135) (plus:DI (reg:DI 134 [ ivtmp.23 ])
 (const_int 1 [0x1])))
(set (reg:HI 141) (mem/s/j:HI (reg:DI 135) [0 .geno+0 S2 A16]))


I've tried playing with define_expand, I can detect this problem, I
can emit my first move and then the second but for some reason, the
compiler reassembles them back together.

Any ideas why I am getting that? Am I doing anything wrong? After I
expand this, I use the DONE macro...
Thanks,
Jc

On Fri, Jun 5, 2009 at 1:17 PM, Jean Christophe
Beyler wrote:
> It's already at 1 actually. The problem isn't really that the data
> itself is unaligned but that the offset of the load must be a multiple
> of the access size (multiple of 2 for HI mode, multiple of 4 for SI, ...).
>
> Therefore, that macro does not seem to apply to me (and it's already at 1!).
>
> Thanks, I'm continuing to look into this issue,
> Jc
>
> On Fri, Jun 5, 2009 at 11:48 AM, Dave Korn
>  wrote:
>> fearyourself wrote:
>>
>>> In the instruction set of my architecture, the offsets of a half-load
>>> (HImode) have to be multiples of 2. However, if I set up a structure
>>> in a certain way, the compiler will generate:
>>>
>>> (mem/s/j:HI (plus:DI (reg:DI 134 [ ivtmp.23 ])
>>>         (const_int 1 [0x1])) [0 .geno+0 S2 A16])
>>>
>>> As the memory operand for the load.
>>>
>>> Now, one solution I am going to try to fix this is to use
>>> define_expand and add a move into another register before this load
>>> and then load from that register (thus removing the offset of 1).
>>>
>>> My question is: Is that how it should be done or is there another solution?
>>
>>  This looks like what you need:
>>
>>  -- Macro: STRICT_ALIGNMENT
>>     Define this macro to be the value 1 if instructions will fail to
>>     work if given data not on the nominal alignment.  If instructions
>>     will merely go slower in that case, define this macro as 0.
>>
>>    cheers,
>>      DaveK
>>
>>
>


Re: Expanding a load instruction

2009-06-10 Thread Jean Christophe Beyler
I'll be looking into this but I thought that GO_IF_LEGITIMATE_ADDRESS
is for branches ?

This is not my case. I've simplified my test case into:

struct test {
    const char     *name;  /* full name */
    char            a;     /* symbol */
    signed char     b;
    unsigned short  c;     /* creation/c mask value */
};

extern struct test tab[];   /* the master list of tabter types */

void bar(unsigned short);

int main(void)
{
    int i;
    for (i = 0; i < 128; i++)
        if (tab[i].b)
            bar(tab[i].c);
}

The compiler loads tab[i].b and, to do so, computes the address &tab[i].b.

Then, when it wants to pass the value of tab[i].c it tries to do a
ldhu r5, 1(r10)

which is wrong in our case. Because it is loading a HI, the offset
must be a multiple of 2. It so happens that currently the problem is
arising when considering this RTX

(insn:HI 77 76 124 struct2.c:17 (set (reg:DI 8 r8)
(sign_extend:DI (mem:HI (plus:DI (reg:DI 48 r48 [orig:134
ivtmp.19 ] [134])
(const_int 1 [0x1])) [0 S2 A16]))) 61
{extendhidi2_cyc64} (nil))

Am I wrong to think that, in this case, GO_IF_LEGITIMATE_ADDRESS would
not help? On my architecture, we actually have a scratch register we
can use.

I'm thinking of using a define_peephole2 to correct this as a later
pass, taking the address and moving it into a register thus getting a
0 offset.

I just feel that is so ugly, there should be a way to tell GCC that a
mem cannot be defined with:
(mem:HI (plus:DI (reg:DI 48 r48 [orig:134 ivtmp.19 ] [134])
(const_int 1 [0x1])) [0 S2 A16])

in the case of a HI because of the constant (and then same for SI/DI of course).
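The rule being asked for can be written down as a small predicate. This is a sketch under assumptions, not code from GCC or from the port: the name displacement_ok_for_size is hypothetical, and the access size stands in for GET_MODE_SIZE of the memory mode (1 for QI, 2 for HI, 4 for SI, 8 for DI).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a constant displacement is
   acceptable for a memory access of ACCESS_SIZE bytes, i.e. when the
   displacement is a multiple of the access size.  */
bool displacement_ok_for_size (int64_t offset, unsigned access_size)
{
  if (access_size == 0)
    return false;
  return offset % (int64_t) access_size == 0;
}
```

A check of this kind would reject the (plus (reg) (const_int 1)) address for an HI access while still accepting it for a QI access, which matches the behaviour wanted here.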

Thanks again for your help, and I will continue looking into this,
Jc

On Tue, Jun 9, 2009 at 6:36 PM, Dave
Korn wrote:
> Jean Christophe Beyler wrote:
>> Dear all,
>>
>> I've moved forward on this issue. Again, the problem is not that the
>> data is not aligned but that the compiler tries to generate this
>> instruction:
>>
>> (set (reg:HI 141) (mem/s/j:HI (plus:DI (reg:DI 134 [ ivtmp.23 ])
>>          (const_int 1 [0x1])) [0 .geno+0 S2 A16]))
>>
>> And, in my target architecture, this is not authorized. I want to
>> transform this into:
>>
>> (set (reg:DI 135) (plus:DI (reg:DI 134 [ ivtmp.23 ])
>>          (const_int 1 [0x1])))
>> (set (reg:HI 141) (mem/s/j:HI (reg:DI 135) [0 .geno+0 S2 A16]))
>>
>>
>> I've tried playing with define_expand, I can detect this problem, I
>> can emit my first move and then the second but for some reason, the
>> compiler reassembles them back together.
>>
>> Any ideas why I am getting that? Am I doing anything wrong? After I
>> expand this, I use the DONE macro...
>
>
>  How about forbidding the form you don't allow in GO_IF_LEGITIMATE_ADDRESS?
>
>    cheers,
>      DaveK
>


Re: Expanding a load instruction

2009-06-10 Thread Jean Christophe Beyler
Ok, I misread what this macro did. Sorry about that. I was
looking at the i386 port and its use of this macro, and this code came
up:


#ifdef REG_OK_STRICT
#define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)         \
do {                                                    \
  if (legitimate_address_p ((MODE), (X), 1))            \
    goto ADDR;                                          \
} while (0)

#else
#define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)         \
do {                                                    \
  if (legitimate_address_p ((MODE), (X), 0))            \
    goto ADDR;                                          \
} while (0)

#endif



and:

#define LEGITIMIZE_ADDRESS(X, OLDX, MODE, WIN)          \
do {                                                    \
  (X) = legitimize_address ((X), (OLDX), (MODE));       \
  if (memory_address_p ((MODE), (X)))                   \
    goto WIN;                                           \
} while (0)


It seems that I should now do the same as they do for my solution: first
implement the legitimate_address function and then probably use it
in both macros.

As for the target hook, we are using GCC 4.3.2 for the moment and,
sadly, have not yet any plans to move forward to the latest version of
GCC.

Thanks again,
Jc

On Wed, Jun 10, 2009 at 10:29 AM, Dave
Korn wrote:
> Jean Christophe Beyler wrote:
>> I'll be looking into this but I thought that GO_IF_LEGITIMATE_ADDRESS
>> is for branches ?
>
>  No, absolutely not.  GILA is a general filter that has overall control over
> which forms of addressing modes used to address memory may be generated in
> RTL.
>
> http://gcc.gnu.org/onlinedocs/gccint/Addressing-Modes.html
>
>  Hmm, I must have been looking at a slightly outdated build of the docs;
> according to that page it's been deprecated on HEAD in favour of the hook
> TARGET_LEGITIMATE_ADDRESS_P.  You'll need to check the particular sources
> you're using, that must be a fairly recent change(*).
>
>
>    cheers,
>      DaveK
> --
> (*) - Yes: it was changed in r.147534, around four weeks ago.
> http://gcc.gnu.org/viewcvs?view=rev&revision=147534
>
>


Re: Expanding a load instruction

2009-06-11 Thread Jean Christophe Beyler
Thanks, I've pretty much finished this part of the implementation and
it seems to be working well.

Thank you all very much for your help,
Jc

On Wed, Jun 10, 2009 at 11:12 AM, Dave
Korn wrote:
> Jean Christophe Beyler wrote:
>
>> It seems that I should now do the same as they do for my solution: first
>> implement the legitimate_address function and then probably use it
>> in both macros.
>
>  Sounds about right.
>
>> As for the target hook, we are using GCC 4.3.2 for the moment and,
>> sadly, have not yet any plans to move forward to the latest version of
>> GCC.
>
>  Don't need to worry about it until you go to GCC 4.5.0 or above, then.
>
>    cheers,
>      DaveK
>
>


Re: Code optimization only in loops

2009-06-11 Thread Jean Christophe Beyler
I've gone back to this problem (since I've solved another one ;-)).
And I've moved forward a bit:

It seems that if I consider an array of characters, there are no
longer any shifts and therefore I do get my two loads with the use of
an offset:

Code:

char data[1312];

uint64_t goo (uint64_t i)
{
return data[i] - data[i+13];
}

generates the right code with two loads with the same base but
different offsets.

If I use anything other than a char type, I get the problem in
generation. This seems to confirm that somehow, the way things are
generated prevents the subsequent optimization passes from seeing that the
addresses are linked.

Right now, I'm trying to figure out why I'm getting shifts instead of a
multiply, and whether this is the problem, since this was one of the
differences between what I get and what the i386 port gets.

If you've got any ideas, thanks again,
Jean Christophe

On Wed, May 13, 2009 at 4:58 PM, Jean Christophe
Beyler wrote:
> Ok, for the i386 port, I use uint32_t instead of uint64_t because
> otherwise the assembly code generated is a bit complicated (I'm on a
> 32 bit machine).
>
> The tree dumps from final_cleanup are the same for the goo function:
> goo (i)
> {
> :
>  return data[i + 13] + data[i];
>
> }
>
>
> However, the first RTL dump from expand gives this for the i386 port:
>
> (insn 6 5 7 3 ld.c:17 (parallel [
>            (set (reg:SI 61)
>                (plus:SI (reg/v:SI 59 [ i ])
>                    (const_int 13 [0xd])))
>            (clobber (reg:CC 17 flags))
>        ]) -1 (nil))
>
> (insn 7 6 8 3 ld.c:17 (set (reg/f:SI 62)
>        (symbol_ref:SI ("data") )) -1 (nil))
>
> (insn 8 7 9 3 ld.c:17 (set (reg/f:SI 63)
>        (symbol_ref:SI ("data") )) -1 (nil))
>
> (insn 9 8 10 3 ld.c:17 (set (reg:SI 64)
>        (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
>                    (const_int 4 [0x4]))
>                (reg/f:SI 63)) [3 data S4 A32])) -1 (nil))
>
> (insn 10 9 11 3 ld.c:17 (set (reg:SI 65)
>        (mem/s:SI (plus:SI (mult:SI (reg:SI 61)
>                    (const_int 4 [0x4]))
>                (reg/f:SI 62)) [3 data S4 A32])) -1 (nil))
>
> (insn 11 10 12 3 ld.c:17 (parallel [
>            (set (reg:SI 60)
>                (plus:SI (reg:SI 65)
>                    (reg:SI 64)))
>            (clobber (reg:CC 17 flags))
>        ]) -1 (expr_list:REG_EQUAL (plus:SI (mem/s:SI (plus:SI
> (mult:SI (reg:SI 61)
>                        (const_int 4 [0x4]))
>                    (reg/f:SI 62)) [3 data S4 A32])
>            (mem/s:SI (plus:SI (mult:SI (reg/v:SI 59 [ i ])
>                        (const_int 4 [0x4]))
>                    (reg/f:SI 63)) [3 data S4 A32]))
>        (nil)))
>
> As we can see, the compiler adds 13 to i, loads the address of data,
> then multiplies the index by 4 inside the address to scale it to the
> element size, performs the two loads, and finally adds the results.
>
> In my port, I get:
>
> (insn 6 5 7 3 ld.c:17 (set (reg:DI 75)
>        (plus:DI (reg/v:DI 73 [ i ])
>            (const_int 13 [0xd]))) -1 (nil))
>
> (insn 7 6 8 3 ld.c:17 (set (reg/f:DI 76)
>        (symbol_ref:DI ("data") )) -1 (nil))
>
> (insn 8 7 9 3 ld.c:17 (set (reg:DI 78)
>        (const_int 3 [0x3])) -1 (nil))
>
> (insn 9 8 10 3 ld.c:17 (set (reg:DI 77)
>        (ashift:DI (reg:DI 75)
>            (reg:DI 78))) -1 (nil))
>
> (insn 10 9 11 3 ld.c:17 (set (reg/f:DI 79)
>        (plus:DI (reg/f:DI 76)
>            (reg:DI 77))) -1 (nil))
>
> (insn 11 10 12 3 ld.c:17 (set (reg/f:DI 80)
>        (symbol_ref:DI ("data") )) -1 (nil))
>
> (insn 12 11 13 3 ld.c:17 (set (reg:DI 82)
>        (const_int 3 [0x3])) -1 (nil))
>
> (insn 13 12 14 3 ld.c:17 (set (reg:DI 81)
>        (ashift:DI (reg/v:DI 73 [ i ])
>            (reg:DI 82))) -1 (nil))
>
> (insn 14 13 15 3 ld.c:17 (set (reg/f:DI 83)
>        (plus:DI (reg/f:DI 80)
>            (reg:DI 81))) -1 (nil))
>
> (insn 15 14 16 3 ld.c:17 (set (reg:DI 84)
>        (mem/s:DI (reg/f:DI 79) [2 data S8 A64])) -1 (nil))
>
> (insn 16 15 17 3 ld.c:17 (set (reg:DI 85)
>        (mem/s:DI (reg/f:DI 83) [2 data S8 A64])) -1 (nil))
>
> (insn 17 16 18 3 ld.c:17 (set (reg:DI 74)
>        (plus:DI (reg:DI 84)
>            (reg:DI 85))) -1 (nil))
>
>
> Which seems to be the same idea, except that the constant 3 gets loaded
> into a register and a shift is performed. Is it possible that this is
> what is causing my problem in code generation?
>
> I'm trying to figure out why my port is generating a shift instead of
> simply a mult. I actually changed the cost of shift to a large value
> and then it uses adds instead of simply a mult. I seem to think that
> this is then an rtx_cost problem where I'm 

Immediates propagated wrongly in stores

2009-07-01 Thread Jean Christophe Beyler
Dear all,

I have this weird issue that I can't really understand:

In my architecture I do not have a store immediate into memory, I have
to go through a register.

However, the compiler is currently propagating constants into the
stores. So for example:

(insn 44 43 45 4 st3.c:21 (set (reg:DI 119)
(const_int 1 [0x1])) -1 (nil))

(insn 45 44 46 4 st3.c:21 (set (mem/s:DI (reg:DI 109 [ ivtmp.43 ]) [2
data S8 A64])
(reg:DI 119)) -1 (nil))

is transformed into :

(insn 46 44 48 3 st3.c:22 (set (mem/s:DI (reg:DI 109 [ ivtmp.43 ]) [2
data S8 A64])
(const_int 1 [0x1])) 72 {movdi_internal1} (expr_list:REG_EQUAL
(const_int 1 [0x1])
(nil)))


I tracked it down to the gcse pass. However, I thought I had turned
this off in the definition of a movdi in my machine description. But
apparently this is not sufficient, any ideas?

As always, thanks for your time,
Jean Christophe Beyler


Re: Immediates propagated wrongly in stores

2009-07-01 Thread Jean Christophe Beyler
Ok, I think I understand. I've therefore been playing with this. I
initially had:

(define_insn "movdi_internal1"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,R,T,r,R,o")
(match_operand:DI 1 "general_operand"  "r,iF,R,J,J,o,r,r"))]


For my movdi. I therefore first wanted to handle the fact that I won't
allow stores to use immediates so I thought of changing the above
into:

(define_insn "movdi_internal1"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,R,T,r,R,o")
(match_operand:DI 1 "move_operand"  "r,iF,R,J,J,o,r,r"))]

And then defining a simple predicate:

(define_predicate "move_operand"
 (if_then_else (match_operand 0 "register_operand")
(match_operand 1 "general_operand")
(match_operand 1 "register_operand")))

For me this means:
If the first operand is a register (that would then either be a
register-register move or a load),
   then return whether or not operand 1 is a general operand;
   else it's a memory operand (by definition of the
   nonimmediate_operand predicate), so
   return whether or not operand 1 is a register operand.

In essence, it seemed to me that this should work as a first draft in
rewriting the whole movdi (and let me test it). However, this makes
the compiler fail on this instruction (unrecognizable):

st4.c:66: error: unrecognizable insn:
(insn 41 40 42 3 st4.c:22 (set (reg/f:DI 119)
(symbol_ref:DI ("data") )) -1 (nil))


For me this doesn't make sense since, in this set, the predicate should
check the first register. The test of the if should be true, therefore
matching the second operand against "general_operand", which it is, no?
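One possible explanation, stated as an assumption and not as a confirmed diagnosis: as far as the GCC internals manual describes define_predicate, the operand number in a match_operand inside a predicate is ignored, so both tests apply to the single operand being checked and never to operand 0 of the insn. Under that assumption, move_operand accepts exactly what register_operand accepts, which would reject a symbol_ref source. The C below is a toy model only (op_kind and the *_p functions are hypothetical and not GCC's API):

```c
#include <stdbool.h>

/* Toy operand kinds; real GCC predicates inspect an rtx and a mode.  */
enum op_kind { OP_REG, OP_MEM, OP_CONST_INT, OP_SYMBOL_REF };

static bool register_operand_p (enum op_kind op)
{
  return op == OP_REG;
}

/* Simplification: treat every toy operand kind as a general operand.  */
static bool general_operand_p (enum op_kind op)
{
  (void) op;
  return true;
}

/* Model of move_operand when both match_operand tests look at the
   same operand OP: "if OP is a register, is it general? else, is it
   a register?".  */
bool move_operand_p (enum op_kind op)
{
  return register_operand_p (op) ? general_operand_p (op)
                                 : register_operand_p (op);
}
```

In this model move_operand_p (OP_SYMBOL_REF) is false, which would be consistent with the "unrecognizable insn" error for a (set (reg) (symbol_ref)) pattern.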

Thanks for your help,
Jean Christophe Beyler

On Wed, Jul 1, 2009 at 12:23 PM, Richard Henderson wrote:
> On 07/01/2009 08:36 AM, Jean Christophe Beyler wrote:
>>
>> I tracked it down to the gcse pass. However, I thought I had turned
>> this off in the definition of a movdi in my machine description. But
>> apparently this is not sufficient, any ideas?
>
> You probably just changed the constraint letters, but didn't change
> the operand predicate.  The constraint letters are only used during
> register allocation, and the predicate is used elsewhere.
>
>  (match_operand:MODE N "predicate_function" "constraint_letters")
>
> There are a number of pre-defined predicate functions you can use,
>
>  register_operand
>  nonmemory_operand
>  nonimmediate_operand
>  general_operand
>
> and most ports find they need special ones to exactly match the
> characteristics of some insns.  e.g.
>
> cpu.md:
> (define_insn "*adddi_internal"
>  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
>        (plus:DI (match_operand:DI 1 "register_operand" "%r,r,r")
>                 (match_operand:DI 2 "add_operand" "r,K,L")))]
>  ""
>  "@
>   addq %1,%2,%0
>   lda %0,%2(%1)
>   ldah %0,%h2(%1)")
>
> predicates.md:
> (define_predicate "add_operand"
>  (if_then_else (match_code "const_int")
>    (match_test "satisfies_constraint_K (op) ||
>                 satisfies_constraint_L (op)")
>    (match_operand 0 "register_operand")))
>
> You should strive to ensure that the predicate function exactly
> matches the constraint letters.  If the predicate function accepts
> more than the constraint letters, the reload pass will fix it up.
> But better code is generated when reload doesn't need to do this.
>
>
> r~
>


Re: Immediates propagated wrongly in stores

2009-07-01 Thread Jean Christophe Beyler
I tried something similar:

(define_expand "movdi"
  [(set (match_operand:DI 0 "nonimmediate_operand" "")
(match_operand:DI 1 "general_operand" ""))]
  ""
  "
if(my_expand_move (operands[0], operands[1]))
   DONE;
  ")

(define_insn "movdi_internal1"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,R,T,r,R,o")
(match_operand:DI 1 "general_operand"  "r,iF,R,J,J,o,r,r"))]
  "my_move_ok (operands[0], operands[1])"
  "* return my_move_2words (operands, insn); "
  [(set_attr "type" "move,arith,load,store,store,load,store,store")
   (set_attr "mode" "DI,DI,DI,DI,DI,DI,DI,DI")
   (set_attr "length"   "1,1,1,1,1,1,1,1")])


int my_expand_move (rtx op0, rtx op1)
{
  /* Removed PIC, etc. for the moment */
  if ((reload_in_progress | reload_completed) == 0
      && MEM_P (op0) && !REG_P (op1))
    {
      op1 = force_reg (GET_MODE (op0), op1);
      emit_move_insn (op0, op1);
      return 1;
    }
  return 0;
}

bool my_move_ok (rtx op0, rtx op1)
{
  if (MEM_P (op0))
    return register_operand (op1, VOIDmode);

  return true;
}


This seems to do what I want, now I'm going to clean up all those
constraints and see if they are compatible and work on from there.
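
The rule these two helpers enforce can be checked outside GCC with a small stand-alone model. This is only a sketch: the operand_kind enum and the toy_* functions are hypothetical stand-ins for rtx, MEM_P, REG_P and register_operand, not GCC API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kinds of rtx an operand can be.  */
enum operand_kind { OP_REG, OP_MEM, OP_CONST };

/* Mirrors my_move_ok: a store (memory destination) needs a register
   source; any other destination accepts any source.  */
bool toy_move_ok (enum operand_kind dst, enum operand_kind src)
{
  if (dst == OP_MEM)
    return src == OP_REG;
  return true;
}

/* Mirrors the decision taken in my_expand_move: true when the source
   has to be copied into a fresh register before the store.  */
bool toy_needs_force_reg (enum operand_kind dst, enum operand_kind src)
{
  return dst == OP_MEM && src != OP_REG;
}
```

In this model, every move that toy_move_ok rejects is one that toy_needs_force_reg flags, which is the property the expander and the insn condition are meant to share.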

Thanks for your help, tell me if my solution seems very bad,
Jc


On Wed, Jul 1, 2009 at 3:26 PM, Richard Henderson wrote:
> On 07/01/2009 11:28 AM, Jean Christophe Beyler wrote:
>>
>> Ok, I think I understand, I've been therefore playing with this. I
>> initially had:
>>
>> (define_insn "movdi_internal1"
>>   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,R,T,r,R,o")
>>         (match_operand:DI 1 "general_operand"      "r,iF,R,J,J,o,r,r"))]
>>
>>
>> For my movdi. I therefore first wanted to handle the fact that I won't
>> allow stores to use immediates so I thought of changing the above
>> into:
>>
>> (define_insn "movdi_internal1"
>>   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,R,T,r,R,o")
>>         (match_operand:DI 1 "move_operand"      "r,iF,R,J,J,o,r,r"))]
>>
>> And then defining a simple predicate:
>>
>> (define_predicate "move_operand"
>>  (if_then_else (match_operand 0 "register_operand")
>>         (match_operand 1 "general_operand")
>>         (match_operand 1 "register_operand")))
>
> Predicates can only examine one operand, not two.
>
> However, you can write
>
> (define_expand "movdi"
>  [(set (match_operand:DI 0 "nonimmediate_operand" "")
>        (match_operand:DI 1 "general_operand" ""))]
>  ""
>  "my_expand_move (operands[0], operands[1]); DONE;")
>
> (define_insn "*movdi_internal"
>  [(set (match_operand:DI 0 "nonimmediate_operand" "...")
>        (match_operand:DI 1 "general_operand" "..."))]
>  "my_move_ok (operands[0], operands[1])"
>  ...)
>
> void my_expand_move (rtx op0, rtx op1)
> {
>   // Handle TLS, PIC and other special cases...
>
>  if (MEM_P (op0))
>    op1 = force_reg (mode, op1);
>
>  emit_insn (gen_rtx_SET (VOIDmode, op0, op1));
> }
>
> bool my_move_ok(rtx op0, rtx op1)
> {
>  if (MEM_P (op0))
>    return register_operand (op1, VOIDmode);
>  return true;
> }
>
> Compare this with similar code in other ports.
>
>
>
> r~
>


Re: Immediates propagated wrongly in stores

2009-07-06 Thread Jean Christophe Beyler
Sorry about the delay in the answer (4th of July weekend...).

Anyway, I agree it might not be necessary. I put it there because of
the force_reg call: that call may in turn call gen_reg_rtx, which
starts with an assertion on can_create_pseudo_p.

That is why I thought it best to include that test.

On Wed, Jul 1, 2009 at 5:33 PM, Richard Henderson wrote:
> On 07/01/2009 02:02 PM, Jean Christophe Beyler wrote:
>>
>>             ((reload_in_progress | reload_completed) == 0 &&
>>             MEM_P (op0) && !REG_P (op1)))
>>     {
>>         op1 = force_reg (GET_MODE (op0), op1);
>>         emit_move_insn (op0, op1);
>>         return 1;
>
> I wouldn't think you'd actually need these reload checks.
> At least some other ports I glanced at don't need them.
>
>
> r~
>


Inserting function calls

2006-08-30 Thread jean-christophe . beyler

Dear all,

I have been trying to insert function calls during a new pass in the  
compiler but it does not seem to like my way of doing it. The basic  
idea is to insert a function call before any load in the program  
(later on I'll be selecting a few loads but for now I just want to do  
it for each and every one).
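
To make the goal concrete, here is a source-level sketch of what the instrumented program would amount to. Only the name __MarkovMainEntry and its void (int) signature come from the pass; the hook body, the markov_hits table and sum_instrumented are hypothetical and exist only for illustration.

```c
#define MARKOV_SITES 16

/* Hypothetical per-site hit counters; not part of the pass.  */
int markov_hits[MARKOV_SITES];

/* Hypothetical runtime hook.  The pass only emits calls to a function
   with this name and type; the body here is an assumption.  */
void __MarkovMainEntry (int id)
{
  if (id >= 0 && id < MARKOV_SITES)
    markov_hits[id]++;
}

/* What a loop with one load looks like once a call has been placed
   before that load (site number 0).  */
int sum_instrumented (const int *t, int n)
{
  int res = 0;
  for (int i = 0; i < n; i++)
    {
      __MarkovMainEntry (0);   /* inserted before the load of t[i] */
      res += t[i];
    }
  return res;
}
```

The hook is meant to live in a separately compiled library, so the compiler only needs a declaration and the linker resolves the symbol.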


This resembles what the profiling pass does, but I cannot work out
what is going wrong...


This is what the compiler says when I try to compile this test case :

hello.c:18: internal compiler error: tree check: expected ssa_name,  
have symbol_memory_tag in verify_ssa, at tree-ssa.c:776


This is what I did in my pass :

/* Build the call type: the function will be of type
 * void (*) (int) */
tree call_type = build_function_type_list (void_type_node,
                                           integer_type_node,
                                           NULL_TREE);
tree call_fn = build_fn_decl ("__MarkovMainEntry", call_type);

data_reference_p a;
for (j = 0; VEC_iterate (data_reference_p, datarefs, j, a); j++)
  {
    tree stmt = DR_STMT (a);

    /* Is it a load */
    if (DR_IS_READ (a))
      {
        printf ("Have a load : %d\n", compteur_interne);
        tree compteur = build_int_cst (integer_type_node, compteur_interne);

        compteur_interne++;

        /* Argument creation, just pass the constant integer node */
        tree args = tree_cons (NULL_TREE, compteur, NULL_TREE);

        tree call = build_function_call_expr (call_fn, args);

        block_stmt_iterator bsi;
        bsi = bsi_for_stmt (stmt);
        bsi_insert_after (&bsi, call, BSI_SAME_STMT);
      }
  }


And the test code is :

#include <stdio.h>

int fct(int *t);

int main()
{
    int tab[10];
    int i;

    tab[0] = 0;

    printf("Hello World %d\n",tab[5]);
    printf("Sum is : %d\n",fct(tab));
    return 0;
}

int fct(int *t)
{
    int i=10;
    int tab[10];
    while(i) {
        tab[i] = tab[i]*2+1+tab[i+1];
        i--;
        printf("Here %d\n",tab[5]);
    }

    return t[i];
}


Does anyone have any idea how I can modify my pass so that it inserts
the calls correctly?


Thanx for any help in finishing this pass,
Jc

Finally here is the patch that shows what I did using the trunk 4.2 :


Index: doc/invoke.texi
===
--- doc/invoke.texi(revision 116394)
+++ doc/invoke.texi(working copy)
@@ -342,7 +342,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -funswitch-loops @gol
 -fvariable-expansion-in-unroller @gol
 -ftree-pre  -ftree-ccp  -ftree-dce -ftree-loop-optimize @gol
--ftree-loop-linear -ftree-loop-im -ftree-loop-ivcanon -fivopts @gol
+-ftree-loop-linear -ftree-load-inst -ftree-loop-im -ftree-loop-ivcanon -fivopts @gol
 -ftree-dominator-opts -ftree-dse -ftree-copyrename -ftree-sink @gol
 -ftree-ch -ftree-sra -ftree-ter -ftree-lrs -ftree-fre -ftree-vectorize @gol
 -ftree-vect-loop-version -ftree-salias -fipa-pta -fweb @gol
@@ -5120,6 +5120,10 @@ at @option{-O} and higher.
 Perform linear loop transformations on tree.  This flag can improve cache
 performance and allow further loop optimizations to take place.

+@item -ftree-load-inst
+Perform instrumentation of load on trees. This flag inserts a call to a profiling
+function before the loads of a program.
+
 @item -ftree-loop-im
 Perform loop invariant motion on trees.  This pass moves only invariants that
 would be hard to handle at RTL level (function calls, operations that expand to
Index: tree-pass.h
===
--- tree-pass.h(revision 116394)
+++ tree-pass.h(working copy)
@@ -251,6 +251,7 @@ extern struct tree_opt_pass pass_empty_l
 extern struct tree_opt_pass pass_record_bounds;
 extern struct tree_opt_pass pass_if_conversion;
 extern struct tree_opt_pass pass_vectorize;
+extern struct tree_opt_pass pass_load_inst;
 extern struct tree_opt_pass pass_complete_unroll;
 extern struct tree_opt_pass pass_loop_prefetch;
 extern struct tree_opt_pass pass_iv_optimize;
Index: tree-load-inst.c
===
--- tree-load-inst.c(revision 0)
+++ tree-load-inst.c(revision 0)
@@ -0,0 +1,125 @@
+#include 
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "ggc.h"
+#include "tree.h"
+#include "target.h"
+
+#include "rtl.h"
+#include "basic-block.h"
+#include "diagnostic.h"
+#include "tree-flow.h"
+#include "tree-dump.h"
+#include "timevar.h"
+#include "cfgloop.h"
+#include "expr.h"
+#include "optabs.h"
+#include "tree-chrec.h"
+#include "tree-data-ref.h"
+#include "tree-scalar-evolution.h"
+#include "tree-pass.h"
+#include "lambda.h"
+
+extern struct loops *current_loops;
+static void tree_handle_loop (struct loops *loops);

Re: Inserting function calls

2006-08-30 Thread jean-christophe . beyler

> Browse through omp-low.c.  In particular create_omp_child_function


I understand the beginning of the function with its declaration of the  
function but I have a question about these lines :


/* Allocate memory for the function structure.  The call to
 allocate_struct_function clobbers CFUN, so we need to restore
 it afterward.  */
  allocate_struct_function (decl);
  DECL_SOURCE_LOCATION (decl) = EXPR_LOCATION (ctx->stmt);
  cfun->function_end_locus = EXPR_LOCATION (ctx->stmt);
  cfun = ctx->cb.src_cfun;

Is that step necessary just to declare a function? I ask because I do
not want the compiler to compile my function itself; I would rather
leave it to the linker to resolve (it will be an external function).



> and expand_omp_parallel.


I notice that at the end of that function there is a call to  
expand_parallel_call and in that function I don't see a difference  
with how I prepare the arguments.


This leads me to think that the problem lies with the declaration of  
the function, am I correct ?



> The new function needs to be added to the call
> graph and queued for processing (cgraph_add_new_function).


That would be true if I wanted it to be compiled, but what if I do not
(I am using a precompiled version)?


Thank you for your time,
Jc
-
‹Degskalle› There is no point in arguing with an idiot, they will just
drag you down to their level and beat you with experience

Reference: http://www.bash.org/?latest
-




Re: Inserting function calls

2006-08-30 Thread jean-christophe . beyler

> In create_omp_child_function, an identifier for the new function is
> created.  We then create a call to it using build_function_call_expr in
> expand_parallel_call.


OK, so that is what I saw. Is this call necessary for what I need:

  decl = lang_hooks.decls.pushdecl (decl);


Then simplifying the problem, I just want to call a void _foo(void)  
function so, taking from create_omp_child_function and  
expand_parallel_call I did this :


type = build_function_type_list (void_type_node, NULL_TREE);

decl = build_decl (FUNCTION_DECL, "_foo" , type);

TREE_STATIC (decl) = 1;
TREE_USED (decl) = 1;
DECL_ARTIFICIAL (decl) = 1;
DECL_IGNORED_P (decl) = 0;
TREE_PUBLIC (decl) = 0;
DECL_UNINLINABLE (decl) = 1;
DECL_EXTERNAL (decl) = 0;
DECL_CONTEXT (decl) = NULL_TREE;
DECL_INITIAL (decl) = make_node (BLOCK);

t = build_decl (RESULT_DECL, NULL_TREE, void_type_node);
DECL_ARTIFICIAL (t) = 1;
DECL_IGNORED_P (t) = 1;
DECL_RESULT (decl) = t;

t = build_decl (PARM_DECL, NULL_TREE, void_type_node);
DECL_ARTIFICIAL (t) = 1;
DECL_ARG_TYPE (t) = void_type_node;
DECL_CONTEXT (t) = current_function_decl;
TREE_USED (t) = 1;
DECL_ARGUMENTS (decl) = t;

tree list = NULL_TREE;
tree call = build_function_call_expr (decl, NULL);
gimplify_and_add(call,&list);

bsi_insert_before(&bsi, list, BSI_CONTINUE_LINKING);


But this gives me this error when I try compiling it

hello.c:18: internal compiler error: tree check: expected  
identifier_node, have obj_type_ref in special_function_p, at calls.c:475


Any ideas ?
Jc





Re: Inserting function calls

2006-08-31 Thread jean-christophe . beyler

> what you do seems basically OK to me.  The problem is that you also need
> to fix the ssa form for the virtual operands of the added calls
> (i.e., you must call mark_new_vars_to_rename for each of the calls,
> and update_ssa once at the end of tree_handle_loop).
>
> Zdenek



OK, I added the mark_new_vars_to_rename calls and set the
TODO_update_ssa flag in the associated struct tree_opt_pass.


This works (yeah!) and my call is now inserted before loads, but it
seems to miss some...


To walk through the statements I use:

  datarefs = VEC_alloc (data_reference_p, heap, 10);
  dependence_relations = VEC_alloc (ddr_p, heap, 10 * 10);
  compute_data_dependences_for_loop (loop_nest, true, &datarefs,
                                     &dependence_relations);

  for (j = 0; VEC_iterate (data_reference_p, datarefs, j, a); j++)
    {
      tree stmt = DR_STMT (a);

      /* See if it is a load */
      if (DR_IS_READ (a))

However if in my test code I have this :

for(i=0;i<10;i++)
    for(j=i+1;j<10;j++)
    {
        a = t[j];
        b = t[i-1];

        //Calculations with a and b...
    }


I am able to insert the call before the first load, but the second one
does not seem to be captured. This is what I get from debug_loop_ir:


__MarkovMainEntry (2);
#   VUSE ;
a_12 = *D.2174_11;
#   VUSE ;
b_18 = *D.2179_17;

Any idea why? I've looked around in the code to see how they walk
the data dependence tree but I don't see a difference.


Thank you again,
Jc





Fwd: Re: Inserting function calls

2006-08-31 Thread jean-christophe . beyler

Dear all,


>> Any idea why ? I've looked around in the code to see how they parse
>> the data dependence tree but I don't see a difference.
>
> Interesting.
> So what statements *are* in the list of data dependences if not these.


OK, apparently it is more a problem of optimization levels: at -O3 the
compiler does not seem to generate the calls, but at -O1 it does.

Here is the test case :

#include <stdio.h>

int fct(int *t);

int main()
{
 int tab[10];
 int i;

 printf("Hello World %d\n",tab[5]);

 printf("Sum is : %d\n",fct(tab));

 return 0;
}

int fct(int *t)
{
 int i=9;
 int res;


 while(i>=0) {
 if(t[i]
  Perform loop invariant motion on trees.  This pass moves only invariants that
  would be hard to handle at RTL level (function calls, operations that expand to
Index: tree-pass.h
===
--- tree-pass.h (revision 116373)
+++ tree-pass.h (working copy)
@@ -251,6 +251,7 @@
  extern struct tree_opt_pass pass_record_bounds;
  extern struct tree_opt_pass pass_if_conversion;
  extern struct tree_opt_pass pass_vectorize;
+extern struct tree_opt_pass pass_load_inst;
  extern struct tree_opt_pass pass_complete_unroll;
  extern struct tree_opt_pass pass_loop_prefetch;
  extern struct tree_opt_pass pass_iv_optimize;
Index: tree-load-inst.c
===
--- tree-load-inst.c(revision 0)
+++ tree-load-inst.c(revision 0)
@@ -0,0 +1,139 @@
+#include 
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "flags.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "ggc.h"
+#include "langhooks.h"
+#include "hard-reg-set.h"
+#include "basic-block.h"
+#include "output.h"
+#include "expr.h"
+#include "function.h"
+#include "diagnostic.h"
+#include "bitmap.h"
+#include "pointer-set.h"
+#include "tree-flow.h"
+#include "tree-gimple.h"
+#include "tree-inline.h"
+#include "varray.h"
+#include "timevar.h"
+#include "hashtab.h"
+#include "tree-dump.h"
+#include "tree-pass.h"
+#include "toplev.h"
+#include "target.h"
+#include "cfgloop.h"
+#include "tree-chrec.h"
+#include "tree-data-ref.h"
+#include "tree-scalar-evolution.h"
+#include "lambda.h"
+#include "coverage.h"
+
+extern struct loops *current_loops;
+static void tree_handle_loop (struct loops *loops);
+
+
+static unsigned int tree_load_inst (void)
+{
+  fprintf (stderr, "My load instrumentation start %p\n",current_loops);
+
+  if (current_loops == NULL)
+{
+  fprintf (stderr,"No loop\n");
+  return 0;
+}
+
+  tree_handle_loop (current_loops);
+  return 0;
+}
+
+void tree_handle_loop (struct loops *loops)
+{
+  unsigned int i;
+  unsigned int j;
+  static int compteur_interne = 0;
+
+  for (i = 1; i < loops->num; i++)
+{
+  struct loop *loop_nest = loops->parray[i];
+
+  //If no loop_nest
+  if (!loop_nest)
+continue;
+
+  VEC (ddr_p, heap) *dependence_relations;
+  VEC (data_reference_p, heap) *datarefs;
+
+  datarefs = VEC_alloc (data_reference_p, heap, 10);
+  dependence_relations = VEC_alloc (ddr_p, heap, 10 * 10);
+  compute_data_dependences_for_loop (loop_nest, true, &datarefs,
+ &dependence_relations);
+
+  tree call_type = build_function_type_list (void_type_node,
+ integer_type_node,
+ NULL_TREE);
+  tree call_fn = build_fn_decl ("__MarkovMainEntry",call_type);
+
+  data_reference_p a;
+  for (j = 0; VEC_iterate (data_reference_p, datarefs, j, a); j++)
+{
+  tree stmt = DR_STMT (a);
+
+  /* Check whether this is a load */
+  if (DR_IS_READ (a))
+{
+  printf ("Have a load : %d\n", compteur_interne);
+  tree compteur = build_int_cst (integer_type_node, compteur_interne);
+  compteur_interne++;
+
+  /* Generate the call statement */
+  tree args = tree_cons(NULL_TREE,//build_tree_list (NULL_TREE, compteur);
+compteur,
+NULL_TREE);
+
+  /* I assume args is set up correctly */
+  tree call = build_function_call_expr (call_fn, args);
+
+  mark_new_vars_to_rename(call);
+  block_stmt_iterator bsi;
+  bsi = bsi_for_stmt (stmt);
+  bsi_insert_before (&bsi, call, BSI_SAME_STMT);
+
+}
+}
+  VEC_free (data_reference_p, heap, datarefs);
+  VEC_free (ddr_p, heap, dependence_relations);
+}
+
+  debug_loop_ir ();
+  fprintf (stderr,"My load instrumentation stop\n");
+}
+
+static bool gate_tree_load_inst (void)
+{
+  return flag_tree_load_inst;
+}
+
+
+struct tree_opt_pass pass_load_inst =
+  {
+"loadinst",   /* name */
+gate_tree_load

Re: Inserting function calls

2006-08-31 Thread jean-christophe . beyler

> is there the data reference for it in the datarefs array?
>
> Zdenek



Using this code after the construction of the data dependences:


  datarefs = VEC_alloc (data_reference_p, heap, 10);
  dependence_relations = VEC_alloc (ddr_p, heap, 10 * 10);
  compute_data_dependences_for_loop (loop_nest, true, &datarefs,
 &dependence_relations);

  static int cnt = 0;
  char s[128];
  sprintf(s,"output%d",cnt++);
  FILE *f = fopen(s,"w");
  dump_data_references(f,datarefs);
  fclose(f);

With this program :

#include <stdio.h>

int fct(int *t);

int main()
{
int tab[10];
int i;

printf("Hello World %d\n",tab[5]);

printf("Sum is : %d\n",fct(tab));

return 0;
}

int fct(int *t)
{
int i=9;
int res=0;
int j,a,b;


for(i=0;i<10;i++)
   for(j=i+1;j<10;j++)
  {
  a = t[j];
  b = t[i-1];
  res += b - a;
  }
return res;
}


I got :


For the main :

(Data Ref:
  stmt:
  ref:
  base_object:
)

For fct :

(Data Ref:
  stmt: a_16 = *D.2144_15;
  ref: *D.2144_15;
  base_object:
  Access function 0: {0B, +, 1B}_2
)


Jc




Re: Inserting function calls

2006-09-01 Thread jean-christophe . beyler
First of all, thanx all for all your help, I'm finally seeing the  
light at the end of this long tunnel ;-)



> one possibility is that the t[i-1] load got moved out of the loop by
> PRE; could you check that the load is still present in the loop?
>
> Zdenek


You're right, the load was moved to the outer loop...
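
For the record, here is a hand-written sketch of what that motion amounts to on my test loop. The function names are made up for illustration, and this is the transformation done by hand, not the compiler's actual output: since t[i-1] does not depend on j, it only needs to be loaded once per outer iteration, so it no longer shows up among the inner loop's data references.

```c
/* Loop nest as written in the test: t[i-1] is reloaded on every inner
   iteration.  The caller must make t[-1] valid, since i starts at 0.  */
int sum_as_written (const int *t)
{
  int res = 0;
  for (int i = 0; i < 10; i++)
    for (int j = i + 1; j < 10; j++)
      {
        int a = t[j];
        int b = t[i - 1];
        res += b - a;
      }
  return res;
}

/* Same nest with the invariant load hoisted by hand into the outer
   loop, so the inner loop contains a single load, t[j].  */
int sum_hoisted (const int *t)
{
  int res = 0;
  for (int i = 0; i < 10; i++)
    {
      int b = t[i - 1];
      for (int j = i + 1; j < 10; j++)
        res += b - t[j];
    }
  return res;
}
```

Both functions return the same value for any buffer with one valid element in front of t[0], which is why the optimizer is free to make this change.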

I still have 2 more questions:

- How can I get the address of the load in question and pass it to the
function? I've looked around in the code but haven't found anything
similar. DR_MEMTAG seems to give an alias, but how can I use that?


- My pass currently only instruments loops. Is it difficult to extend
it to the whole program, not just the loops, and at what point in
passes.c should I insert my pass?


Jc
