Re: How to support 40bit GP register - Take two

2009-12-18 Thread Hans-Peter Nilsson
Sorry if I misunderstood, but...

On Fri, 18 Dec 2009, Mohamed Shafi wrote:

> 2009/12/18 Hans-Peter Nilsson :
> > On Fri, 20 Nov 2009, Mohamed Shafi wrote:
> >> I tried implementing the suggestion given by Richard, but got into
> >> issues. The GCC framework is written assuming that there are no modes
> >> with HOST_BITS_PER_WIDE_INT < GET_MODE_BITSIZE (mode) < 2 *
> >> HOST_BITS_PER_WIDE_INT.
> >
> > (Not seeing a reply regarding this issue, so here's mine, belated:)
> >
> > Perhaps a wart, but with a 64-bit HOST_BITS_PER_WIDE_INT, would
> > that affect your port?  It's not?  Just set need_64bit_hwint=yes
> > in config.gcc.  And send a patch for the introductory comment in
> > that file, unless your port already matches the "BITS_PER_WORD >
> > 32 bits" condition.
> >
> Thanks Hans for your reply.
> I have already tried that. What you are suggesting is the first
> solution that I got from Richard Henderson.

Or rather, a prerequisite to that solution, working around the
wart you noted.

> I have mentioned the
> issues I faced with this in my mail. The GCC framework is written
> assuming that there are no modes with HOST_BITS_PER_WIDE_INT <
> GET_MODE_BITSIZE (mode) < 2 * HOST_BITS_PER_WIDE_INT. So I had to hack
> at places to get things working. For my target the BITS_PER_WORD ==
> 32. The mode that I am using is RImode (5 bytes).

You already mentioned all that in the message to which I replied
and quoted above (twice now :-)  By you mentioning that equation
I have to think you misread and missed the point (sorry if
instead it's me);

In config.gcc, for the line matching your port, set:
 need_64bit_hwint=yes

Then, GET_MODE_BITSIZE (mode) < HOST_BITS_PER_WIDE_INT
for your 40-bit mode, as the equation above yields false for
64 < 40 < 64*2.
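
The arithmetic can be checked with a minimal C sketch. The function name below is hypothetical and is not part of GCC; it only evaluates the inequality quoted in the thread for a 40-bit mode under the two host wide-int widths discussed.

```c
/* Hypothetical helper, not a GCC function: returns 1 when a mode of
   mode_bits falls strictly between one and two host wide ints, which is
   the range the thread says the GCC framework does not expect. */
static int mode_in_unexpected_range(int mode_bits, int hwi_bits)
{
    return hwi_bits < mode_bits && mode_bits < 2 * hwi_bits;
}

/* Convenience wrapper for the 40-bit RImode discussed above. */
static int rimode_is_problematic(int hwi_bits)
{
    const int rimode_bits = 40;
    return mode_in_unexpected_range(rimode_bits, hwi_bits);
}
```

With a 32-bit HOST_BITS_PER_WIDE_INT the condition 32 < 40 < 64 holds, so the mode lands in the awkward range; with 64 bits the condition 64 < 40 < 128 fails and the mode fits in a single host wide int.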

HTH, good luck and happy holidays.
brgds, H-P

Re: GCC presentation targeted to users (43 slides in english)

2009-12-18 Thread Richard Earnshaw

On Thu, 2009-12-17 at 21:00 +0100, Laurent GUERBY wrote:
> On Thu, 2009-12-17 at 21:02 +0100, Eric Botcazou wrote:
> > > The 43 slides presentation in english is available here
> > > in PDF and openoffice format:
> > >
> > > http://guerby.org/ftp/gcc-toulibre-20091216.pdf
> > > http://guerby.org/ftp/gcc-toulibre-20091216.odp
> > 
> > A small nit: you don't need to do 'make bootstrap' anymore, 'make' is 
> > enough.
> 
> Yes I explained it during the presentation (on native vs cross) but I
> couldn't remember in what version the change was made so I erred on the
> safe side :).

Another nit.  Your description of -m options is somewhat (entirely?)
i386 centric.

R



Re: GCC presentation targeted to users (43 slides in english)

2009-12-18 Thread Laurent GUERBY
On Fri, 2009-12-18 at 10:42 +, Richard Earnshaw wrote:
> On Thu, 2009-12-17 at 21:00 +0100, Laurent GUERBY wrote:
> > On Thu, 2009-12-17 at 21:02 +0100, Eric Botcazou wrote:
> > > > The 43 slides presentation in english is available here
> > > > in PDF and openoffice format:
> > > >
> > > > http://guerby.org/ftp/gcc-toulibre-20091216.pdf
> > > > http://guerby.org/ftp/gcc-toulibre-20091216.odp
> > > 
> > > A small nit: you don't need to do 'make bootstrap' anymore, 'make' is 
> > > enough.
> > 
> > Yes I explained it during the presentation (on native vs cross) but I
> > couldn't remember in what version the change was made so I erred on the
> > safe side :).
> 
> Another nit.  Your description of -m options is somewhat (entirely?)
> i386 centric.

Yes I said orally that I ran the command on a regular PC but that each
target will have its own list (so "target specific") of -m reported and
also of course described in the manual (tersely :). I used a PC because
not many people know that you can use "-m32" there.

Sincerely,

Laurent





Re: GMP and GCC 4.3.2

2009-12-18 Thread Jean Christophe Beyler
myarch64 is my port.

Then perhaps this is a back-end problem on my port. I'll try to
determine this today and keep you posted.

Jc

On Thu, Dec 17, 2009 at 9:29 PM, Jie Zhang  wrote:
> On 12/18/2009 06:27 AM, Jean Christophe Beyler wrote:
>>
>> Actually, I just finished updating my 4.3.2 to 4.3.3 and tested it and
>> I still have the same issue.
>>
>> This seems to be a problem more than "just" 4.3.2.
>>
>> Here is the test program:
>> #include <stdio.h>
>> #include <gmp.h>
>>
>> int main() {
>>     mpz_t a,b;
>>     mpz_init_set_str(a, "100", 10); // program works with 10^9, but not
>>                                     // with 10^10 (10^20 > 2^64)
>>     mpz_init_set(b, a);
>>     mpz_mul(a, a, a);
>>     gmp_printf("first,  in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b, a);
>>     mpz_set(b, a);
>>     mpz_mul(a, a, a);
>>     gmp_printf("second, in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b, a);
>>     return 0;
>> }
>>
>> We obtain:
>> first,  in GMP mpz_mul(a,a,a) with a=100 gives
>> 1
>> second, in GMP mpz_mul(a,a,a) with a=1 gives
>>
>> 2254536013160540992915637663717291581095979492475463214517286840718852096
>>
>> Which clearly is wrong for the second output.
>>
>> This was tested with a 64 bit architecture. I know that with a 4.1.1
>> port of the compiler, I do not see this issue.
>>
>> I will see if I can port it forward to see if I still see the problem
>> but it might be difficult to port from 4.3.2 to 4.4.2, I'm not sure
>> how many things have changed but I'm sure quite a bit !
>>
>> This is the -v of my GCC, version 4.3.3:
>> Using built-in specs.
>> Target: myarch64-linux-elf
>> Configured with:
>> /home/beyler/myarch64/src/myarch64-gcc-4.3.2/configure
>> --target=myarch64-linux-elf
>> --with-headers=/home/beyler/myarch64/src/newlib-1.16.0/newlib/libc/include
>> --prefix=/home/beyler/myarch64/local --disable-nls
>> --enable-languages=c --with-newlib --disable-libssp
>> --with-mpfr=/home/beyler/myarch64/local
>> Thread model: single
>> gcc version 4.3.3 (GCC)
>>
> What's myarch64? I got the correct result for your test with vanilla
> gcc-4.3.2 and gmp-4.3.1 on Debian unstable AMD64.
>
>
> Jie
>


Possible IRA improvements for irregular register architectures

2009-12-18 Thread Ian Bolton
Let's assume I have two sub-classes of ALL_REGS: BOTTOM_REGS (c0-c15)
and TOP_CREGS (c16-c31).

Let's also assume I have two main types of instruction: A-type
Instructions, which can use ALL 32 registers, and B-type Instructions,
which can only use the 16 BOTTOM_REGS.

IRA will correctly calculate costs (in ira-costs.c) for allocnos
appearing in B-type instructions, such that TOP_CREGS has a higher
cost than BOTTOM_REGS.  It will also calculate costs for the A-type
instructions such that TOP_CREGS and BOTTOM_REGS have the same cost.

The order of coloring will be determined by the algorithm chosen:
Priority or Chaitin-Briggs.  As each allocno is colored, the costs
will be inspected and the best available hard register will be chosen,
mainly based on the register class costs mentioned above, so allocnos
in B-type Instructions will usually be assigned a BOTTOM_REG if one is
free.  If two or more hard registers share the same cost then
whichever one appears first in the REG_ALLOC_ORDER will be assigned.
(If no hard register can be found, the allocno is assigned memory and
will require a "reload" in a later pass to get a hard register.)
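
The selection step described above can be sketched in C. This is a simplified illustration, not IRA's actual code: the function name, the flat cost array and the `alloc_order` parameter (standing in for REG_ALLOC_ORDER) are all assumptions made for the example.

```c
#include <stddef.h>

#define NO_HARD_REG (-1)

/* Hypothetical sketch: pick the cheapest free hard register.  Ties go to
   whichever register appears first in alloc_order.  Returns NO_HARD_REG
   when nothing is free, which corresponds to the allocno going to memory. */
static int pick_hard_reg(const int *cost, const int *is_free,
                         const int *alloc_order, size_t n_regs)
{
    int best = NO_HARD_REG;

    for (size_t i = 0; i < n_regs; i++) {
        int r = alloc_order[i];

        if (!is_free[r])
            continue;
        /* Strict '<' keeps the register seen earlier in alloc_order
           when two registers share the same cost. */
        if (best == NO_HARD_REG || cost[r] < cost[best])
            best = r;
    }
    return best;
}
```

Because the loop walks registers in allocation order and only replaces the candidate on a strictly lower cost, the tie-breaking rule falls out of the iteration order rather than needing a separate comparison.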

I do not wish to alter the coloring algorithms or the coloring order.
I believe they are good at determining the order to color allocnos,
which dictates the likelihood of being assigned a hard register.  What
I wish to change is the hard register that is assigned, given that the
coloring order has determined that this allocno should get one next.

Why do I need to do this?  Since the BOTTOM_REGS can be used by all
instructions, it makes sense to put them first in the REG_ALLOC_ORDER,
so we minimise the number of registers consumed by a low-pressure
function.  But it also makes sense, in high-pressure functions, to
steer A-type Instructions away from using BOTTOM_REGS so that they are
free for B-type Instructions to use.

To achieve this, I tried altering the costs calculated in ira-costs.c,
either explicitly with various hacks or by altering operand
constraints.  The problem with this approach was that it is static and
independent, occurring before any coloring order has been determined
and without any awareness of the needs of other allocnos.  I believe I
require a dynamic way to alter the costs, based on which allocnos
conflict with the allocno currently being colored and which hard
registers are still available at this point.
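
One way to picture such a dynamic adjustment is the C sketch below. It is purely illustrative and is not taken from the attached patch: the function, its parameters and the pressure test are assumptions. It penalises BOTTOM_REGS for an A-type allocno only when the uncolored conflicting allocnos that need BOTTOM_REGS would otherwise run short of free ones.

```c
#define N_REGS   32
#define N_BOTTOM 16   /* c0-c15 are BOTTOM_REGS, c16-c31 are TOP_CREGS */

/* Hypothetical sketch: copy the static costs of an allocno that may use any
   register into adjusted[], adding a penalty to every BOTTOM_REG when
   taking one now would leave too few for the conflicting allocnos that can
   only live in BOTTOM_REGS. */
static void adjust_costs(const int *static_cost, const int *is_free,
                         int n_conflicts_needing_bottom, int penalty,
                         int *adjusted)
{
    int free_bottom = 0;

    for (int r = 0; r < N_BOTTOM; r++)
        free_bottom += (is_free[r] != 0);

    /* After taking one bottom register, free_bottom - 1 remain; that is
       too few when the restricted conflicts number free_bottom or more. */
    int under_pressure = n_conflicts_needing_bottom >= free_bottom;

    for (int r = 0; r < N_REGS; r++) {
        adjusted[r] = static_cost[r];
        if (under_pressure && r < N_BOTTOM)
            adjusted[r] += penalty;
    }
}
```

In a low-pressure function the penalty never applies, so BOTTOM_REGS are still preferred through the allocation order; the steering toward TOP_CREGS only appears once the restricted allocnos start competing for the remaining bottom registers.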

The patch I have attached here is my first reasonably successful
attempt at this dynamic approach, which has led to performance
improvements on some of our benchmarks and no significant
regressions.

I am hoping it will be useful to others, but I post it more as a
talking point or perhaps to inspire others to come up with better
solutions and share them with me :-)


bolton_ira_subset_experiments.diff


CSE & compare/branch template problem

2009-12-18 Thread Michael Eager

Hi --

I'm working on creating the cstore and cbranch templates
for the Xilinx MicroBlaze processor.  I've run into some
problems and an unexpected interaction with CSE processing.
Possibly, the insns I'm generating for comparisons are not
what the CSE optimization expects.

MicroBlaze has a somewhat unusual compare and branch architecture.
There are no condition flags; comparison results are stored in
a result register.

There's one branch instruction, which compares a register
with zero and branches based on condition (eq, ne, lt, gt, etc.).

There are a number of different instructions which can be used
for comparisons.  Regular instructions like sub or xor can be
used in some cases.  There are also cmp/cmpu instructions which
do a modified subtraction of signed and unsigned integers.
There are other comparison instructions for eq/ne which give
a 0 or 1 result.  The FP comparisons are all coupled with
a condition:  fcmp.eq, fcmp.lt, etc.

To generate the correct comparison instruction, I need to
know both the operands and the test condition.

It doesn't look like the (compare:CC ...) template can be
used.  Compare takes two operands and there is no place to
save the condition.

For comparisons, I'm generating insns like:

  (set (reg:CC rD) (eq:CC (reg:SI rA) (reg:SI rB)))

followed by a branch:

  (set (pc)
    (if_then_else
      (eq:CC (reg:CC rD) (const_int 0))
      (label_ref xx)
      (pc)))

This looks OK and it appears to work reasonably well.  At
least without optimization.  CSE appears to be misinterpreting
the comparison insn and the code gets trashed.

This is taken from strchr.c in Newlib.  The preprocessed code
snippet is

   unsigned char c = i;

   if (!c)
     {
       while (((long)s & (sizeof (long) - 1)))
         {
           if (!*s)
             return (char *) s;
           s++;
         }
   . . .

Before CSE, this is the insn sequence:

(insn 90 89 91 2 .../strchr.c:66
  (set (reg/v:SI 137 [ c+-3 ])
   (zero_extend:SI (subreg:QI (reg/v:SI 244 [ i ]) 3))) 33

(insn 92 91 93 2 .../strchr.c:73
  (set (reg:SI 245)
   (const_int 0 [0x0])) 41

(insn 93 92 94 2 .../strchr.c:73
  (set (reg:CC 246)
   (eq:CC
   (reg/v:SI 137 [ c+-3 ])
   (reg:SI 245))) 71

(jump_insn 94 93 95 2 .../strchr.c:73
  (set (pc)
   (if_then_else
    (eq:CC
     (reg:CC 246)
     (const_int 0 [0x0]))
    (label_ref 112)
    (pc)))

This is the first if expression.   Note that reg 137 [c] is set to the
masked value of the search character i.  Reg 245 is set to zero, which
is used in the compare instruction in insn 93.

The while expression is

(insn 97 96 98 3 .../strchr.c:93
  (set (reg:SI 247)
   (and:SI
    (reg/v/f:SI 140 [ s ])
    (const_int 3 [0x3]))) 25

(insn 98 97 99 3 .../strchr.c:93
  (set (reg:SI 248)
   (const_int 0 [0x0])) 41

(insn 99 98 100 3 .../strchr.c:93
  (set (reg:CC 249)
   (eq:CC
   (reg:SI 247)
   (reg:SI 248))) 71

(jump_insn 100 99 101 3 .../strchr.c:93
  (set (pc)
   (if_then_else
    (eq:CC
     (reg:CC 249)
     (const_int 0 [0x0]))
    (label_ref 225)
    (pc))) 75

Reg 247 is set to the masked value of s, the string pointer.
Reg 248 is set to zero and is used in the compare in insn 99.

All of this looks OK to me.

There is an intermediate step where reg 248 is replaced by
reg 245, since both are set to zero.

Somehow, CSE is deciding that reg 137 is equal to zero and
translates insn 99 to

(insn 99 98 100 3 .../strchr.c:93
  (set (reg:CC 249)
   (eq:CC
   (reg:SI 247)
   (reg/v:SI 137 [ c+-3 ]))) 71

There is a REG_EQUAL note on this insn saying that it is equal
to const zero.  At this point, the code is hosed.

I've been stepping through CSE to figure out why it
decides that reg 137 is equal to zero, but haven't
found this yet.

(Yes, I do recognize that a comparison with zero can be
simplified into just the branch insn.  That's on my to do list.)

Questions:

  Is this a reasonable way to represent the comparisons?

  Are there other targets which save comparison results
  in registers and require the condition?

  Any suggestions on better ways to model the MicroBlaze
  comparison operations?

  Are there any restrictions on using eq/ne/lt/... the
  way I am?

  Any suggestions on how to fix the problem in CSE?

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: CSE & compare/branch template problem

2009-12-18 Thread Joern Rennecke

Quoting Michael Eager :


> Hi --
>
> I'm working on creating the cstore and cbranch templates
> for the Xilinx MicroBlaze processor.


In theory cstore / cbranch should be the future, but the last time I tried
to use them, they didn't quite work right yet, particularly if you have
an incomplete comparison set.  Because of delays with the Copyright
assignment, fixing the middle-end was not an efficient option, so I went
for the old hack with separate compare and condjump / cstore patterns.


> Somehow, CSE is deciding that reg 137 is equal to zero and
> translates insn 99 to
>
> (insn 99 98 100 3 .../strchr.c:93
>   (set (reg:CC 249)
>    (eq:CC
>    (reg:SI 247)
>    (reg/v:SI 137 [ c+-3 ]))) 71


That is correct; after the jump_insn 94 is not taken, you can conclude
that reg 137 is zero.
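
The inference can be traced with a small C model of insns 92 to 94. This is only a sketch: the function names are hypothetical, and it assumes the eq comparison stores 1 when its operands are equal and 0 otherwise.

```c
/* Models insn 93: reg 246 = (reg 137 == reg 245).
   Assumption: the comparison yields 1 for "equal" and 0 otherwise. */
static int insn_93(int reg137, int reg245)
{
    return reg137 == reg245;
}

/* Models jump_insn 94: branch to label 112 when reg 246 equals zero.
   Returns 1 if the branch is taken, 0 on fall-through. */
static int jump_94_taken(int reg246)
{
    return reg246 == 0;
}

/* Returns 1 when execution falls through to the while loop (insn 97 on). */
static int reaches_loop(int reg137)
{
    int reg245 = 0;                        /* insn 92: reg 245 = 0 */
    int reg246 = insn_93(reg137, reg245);

    return !jump_94_taken(reg246);
}
```

Under that assumption the fall-through path is reached only when reg 137 equals reg 245, which holds zero, so on that path substituting reg 137 for a register known to be zero preserves the value being compared.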


> There is a REG_EQUAL note on this insn saying that it is equal
> to const zero.


I would say that is a problem; find out why the note gets there.



>   Are there other targets which save comparison results
>   in registers and require the condition?


Yes.  One example is SH64.


>   Any suggestions on better ways to model the MicroBlaze
>   comparison operations?
>
>   Are there any restrictions on using eq/ne/lt/... the
>   way I am?
>
>   Any suggestions on how to fix the problem in CSE?


There are some problems if comparisons can't be reversed.
I think it's a bug in the generic code, but for any given port, it's
easier - and gives better code - to make sure that all conditions can
be reversed, rather than fix the generic code.
I ran into this with the MXP port.