Re: libgcc-arch.ver details

2010-03-18 Thread Paulo J. Matos
On Thu, Mar 18, 2010 at 6:51 AM, Andrew Pinski  wrote:
> On Wed, Mar 17, 2010 at 11:38 PM, Paulo J. Matos  wrote:
>
>
> I cannot remember when half
> float support came in though, I thought it was only added on the trunk
> or did you backport that support too.
>

Thanks for the tips. However, I didn't backport anything. I created my
branch of gcc 4.3.4 from the gcc 4_3 branch at rev 150451 and added my own
changes. Did I pick gcc up in the middle of a sequence of commits,
thereby missing the implementation of half floats?

> Thanks,
> Andrew Pinski
>



-- 
PMatos


Re: libgcc-arch.ver details

2010-03-18 Thread Paulo J. Matos
On Wed, Mar 17, 2010 at 3:35 PM, Paulo J. Matos  wrote:
> So, let me see if I got this write.

This is not meant as spam, but I can't help correcting
this horrible mistake. Obviously I meant "right" instead of "write".
:)
[Having this error spread across multiple mailing list archives for
posterity without correction would be extremely embarrassing!]

-- 
PMatos


Re: Hash Function for "switch statement"

2010-03-18 Thread Andrew Haley
On 03/18/2010 05:22 AM, Jae Hyuk Kwak wrote:
> On Wed, Mar 17, 2010 at 1:04 PM, Michael Meissner
>  wrote:
>> Note, that many hash tables are computed by the modulus operation, which is
>> often fairly expensive (and on machines without a hardware divide unit,
>> requiring a function call).  I would expect many switch statements would slow
>> down if you switch to a hash operation that used modulus.
> 
> I agree that the cost of the modulus can be high, but it can be even
> higher if we use a bunch of "else if".
> Consider the situation where a program has about 400 cases in a single
> switch statement.
> 
> The cost of the modulus is a fixed price, so there should be a
> certain point at which that price is lower than that of the else-if statements.

Sure, but how often is that going to be cheaper than, for example,
using a tree rather than a hash table?
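
To make the comparison concrete, here is a minimal C sketch of the two
dispatch strategies for a sparse set of case labels. It is only an
illustration, not what GCC emits: the case values, the table size of 17
and the names dispatch_tree and dispatch_hash are all made up. The tree
variant needs only compares, branches and a shift, while the hash variant
pays for a modulus on every lookup.

```c
#include <stddef.h>

/* Hypothetical sparse case labels, kept sorted for the tree-style lookup. */
static const int case_values[] = { 3, 17, 42, 100, 257, 1024, 4099, 65537 };
#define NUM_CASES (sizeof case_values / sizeof case_values[0])

/* Tree-style dispatch: a binary search over the sorted labels.
   Returns the index of the matching case, or -1 for "default". */
int dispatch_tree(int key)
{
    size_t lo = 0, hi = NUM_CASES;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (case_values[mid] == key)
            return (int)mid;
        if (case_values[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/* Hash-style dispatch: one modulus, then linear probing.
   The table is larger than NUM_CASES, so probing always terminates. */
#define TABLE_SIZE 17u
static int hash_table[TABLE_SIZE];   /* case index + 1; 0 means empty */
static int hash_table_ready;

static void build_hash_table(void)
{
    for (size_t i = 0; i < NUM_CASES; i++) {
        unsigned slot = (unsigned)case_values[i] % TABLE_SIZE;
        while (hash_table[slot] != 0)
            slot = (slot + 1) % TABLE_SIZE;
        hash_table[slot] = (int)i + 1;
    }
    hash_table_ready = 1;
}

int dispatch_hash(int key)
{
    if (!hash_table_ready)
        build_hash_table();
    unsigned slot = (unsigned)key % TABLE_SIZE;
    while (hash_table[slot] != 0) {
        int idx = hash_table[slot] - 1;
        if (case_values[idx] == key)
            return idx;
        slot = (slot + 1) % TABLE_SIZE;
    }
    return -1;
}
```

Both functions return the same case index for every key, so they can be
timed against each other on a given target; with eight labels the tree
needs at most four probes and no division at all.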

Andrew.


Re: Question on mips multiply patterns in md file

2010-03-18 Thread Amker.Cheng
> The reasoning here is
> that if splitting will result in worse code, then we shouldn't have
> accepted it in the first place.  If dropping this alternative results in
> register allocator failures for some strange reason, then we accept it
> and generate the 3 instruction sequence with a new define_split.

Thanks Jim.

I did not fully follow your method, since I don't know much about
the IRA and reload passes. Here is my question:
is it possible that the method would ever result in a register
allocator failure?
In my understanding, wouldn't the reload pass do whatever it can to
satisfy all insns' constraints?


> If dropping this alternative results in the register allocator generating
> worse code for other surrounding operations, then it is better to accept
> it and add the new define_split.

By this, do you mean I should go back to the define_split method if dropping
the alternative results in bad insns being generated by the RA?


>
> Some experimentation might be necessary to see which change is the
> better solution.
Yes, I profiled MiBench and found that gcc generates better code by using
the madd instruction; on the other hand, I have not yet checked closely how
bad the code generated by define_split is.

Another thought on this topic: maybe I should make two copies of mul_acc_si,
like you said, with one keeping the constraints and the other dropping the "*?*?".
Is this the same as the method you suggested?


-- 
Best Regards.


Sub-optimal code

2010-03-18 Thread Alain Ketterlin


I've reported here recently about gcc producing conditional branches
with static outcome. I've finally managed to produce a simple
self-contained example. Here it is:

int f()
{
    static int LSW=-1;
    double x=0.987654321;
    unsigned char ix = *((char *)&x);
    if(ix==0xdd || ix==0x3f)
        LSW = 1;
    else
        LSW = 0;
    return 0;
}

Compiling this with "gcc-4.5 -O3 -c f.c" (20100304 snapshot) on my
x86-64 system produces the following code:

   0:   b8 b8 ff ff ff          mov    $0xffb8,%eax
   5:   3c dd                   cmp    $0xdd,%al
   7:   0f 94 c0                sete   %al
   ...

Ok, not a branch, but still useless code. gcc-4.4 does the same, and
gcc-4.3 is slightly less clever (it moves the whole immediate 8 bytes
into %rax and does two cmp+je).

I first thought the type-cast was responsible. But if you replace the
assignment to LSW with a simple return, gcc correctly optimizes the
whole function away.

Can anybody confirm this?  Is it worth a bug report?

-- Alain.


feature request - Static Stack Usage Analysis for C/C++

2010-03-18 Thread Massimiliano Cialdi
It would be very useful to add static stack usage analysis for C/C++,
something like the -fstack-usage option that already exists in GNAT.

thanks


Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Frank Isamov
Hi,

In my backend, I have a problem with the pass which determines the
best register class for a virtual register (Pass 0 for finding allocno
costs).

In all insns in this example both R_REGS and D_REGS register classes
are applicable (but all registers in an insn should be from the same
register class).

This is asmcons output:

;; Function mul (mul)
(note 1 0 5 NOTE_INSN_DELETED)

(note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(note 2 5 3 2 NOTE_INSN_DELETED)

(insn 3 2 4 2 a.c:2 (set (reg/v:SI 97 [ b ])
(reg:SI 49 r1 [ b ])) 43 {movsi_regs} (expr_list:REG_DEAD
(reg:SI 49 r1 [ b ])
(nil)))

(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)

(insn 7 4 9 2 a.c:5 (set (reg/v:SI 94 [ __a_11 ])
(plus:SI (reg:SI 48 r0 [ a ])
(reg/v:SI 97 [ b ]))) 0 {*addsi3_1} (expr_list:REG_DEAD
(reg:SI 48 r0 [ a ])
(nil)))

(insn 9 7 10 2 a.c:5 (parallel [
(set (reg:SI 100)
(ashift:SI (reg/v:SI 97 [ b ])
(const_int 3 [0x3])))
(clobber (reg:CC 88 cc))
]) 36 {ashlsi3} (expr_list:REG_UNUSED (reg:CC 88 cc)
(expr_list:REG_EQUAL (ashift:SI (reg/v:SI 97 [ b ])
(const_int 3 [0x3]))
(nil))))

(insn 10 9 11 2 a.c:5 (set (reg:SI 101)
(plus:SI (reg:SI 100)
(reg/v:SI 97 [ b ]))) 0 {*addsi3_1} (expr_list:REG_DEAD (reg:SI 100)
(expr_list:REG_DEAD (reg/v:SI 97 [ b ])
(expr_list:REG_EQUAL (mult:SI (reg/v:SI 97 [ b ])
(const_int 9 [0x9]))
(nil)))))

(insn 11 10 12 2 a.c:5 (set (reg:SI 102)
(plus:SI (reg:SI 101)
(reg/v:SI 94 [ __a_11 ]))) 0 {*addsi3_1}
(expr_list:REG_DEAD (reg:SI 101)
(expr_list:REG_DEAD (reg/v:SI 94 [ __a_11 ])
(nil))))

(note 12 11 17 2 NOTE_INSN_DELETED)

(insn 17 12 23 2 a.c:7 (parallel [
(set (reg/i:SI 48 r0)
(ashift:SI (reg:SI 102)
(const_int 10 [0xa])))
(clobber (reg:CC 88 cc))
]) 36 {ashlsi3} (expr_list:REG_UNUSED (reg:CC 88 cc)
(expr_list:REG_DEAD (reg:SI 102)
(nil))))

(insn 23 17 0 2 a.c:7 (use (reg/i:SI 48 r0)) -1 (nil))

The problem I see is that for registers 100,101 I get best register
class D instead of R – actually they get the same cost and D is chosen
(maybe because it is first).

But they should not get the same cost since choosing D_REGS causes two
additional copies.

To my understanding the algorithm checks every insn, and since
register 100 appears only in insns in which all registers are still
virtual, any register class fits without additional cost. But later
when coloring it with d register, we need copies from r to d and back
to r.

Can someone please help with this?

Is there any reading material about this part of the IRA?

Thanks, Frank.


Re: Is it possible to port GCC backend to a architecture with very limited hard registers?

2010-03-18 Thread redriver jiang
Ok. Thanks!

Then I will persuade the people who develop the MCU to add one or more
base registers to ease the reload problem. Besides that, I will add
some virtual registers (which are static "memory") to hold 16- and 32-bit
mode variables. I hope these two measures will produce better code. I am
just starting to build a prototype port.

2010/3/18 Ian Lance Taylor :
> redriver jiang  writes:
>
>> Right now I am attempting to port the GCC backend to an MCU with very
>> limited hard registers: only one 8 bit ACC reg, one 16 bit base reg
>> for addressing, one stats reg.
>> I searched the GCC backend ports, and it seems the 68HC1X is in a similar
>> situation, but it uses many "RAM simulated" registers. I wonder whether it
>> is possible to provide these limited 3 registers to the GCC backend, and let
>> all 16-bit (HImode) and 32-bit (SImode) operands be spilled to the stack.
>
> It should be possible, though it would not be easy to resolve all the
> reload issues.  gcc will not generate particularly good code for such
> a processor; you will see an awful lot of register shuffling.
>
> Ian
>


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Vincent Lefevre
On 2010-03-16 16:18:17 +0100, Richard Guenther wrote:
> pow (a, 0.5) is always expanded to sqrt(a).

This violates the ISO C99 standard for -0.0.

According to N1256, F.9.4.4:

  pow(±0, y) returns +0 for y > 0 and not an odd integer.

So, pow(-0.0, 0.5) should return +0. But sqrt(-0.0) should return -0
according to the IEEE 754 standard (and F.9.4.5 from ISO C99).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


RE: Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Ian Bolton
> The problem I see is that for registers 100,101 I get best register
> class D instead of R - actually they get the same cost and D is chosen
> (maybe because it is first).

Hi Frank.

Do D and R overlap?  It would be useful to know which regs are in
which class, before trying to understand what is going on.

Can you paste an example of your define_insn from your MD file to show
how operands from D or R are both valid?  I ask this because it is
possible to express that D is more expensive than R with operand
constraints.

For general IRA info, you might like to look over my long thread on
here called "Understanding IRA".

Cheers,
Ian


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Vincent Lefevre
On 2010-03-18 14:43:39 +0100, Vincent Lefevre wrote:
> This violates the ISO C99 standard for -0.0.
> 
> According to N1256, F.9.4.4:
> 
>   pow(±0, y) returns +0 for y > 0 and not an odd integer.
> 
> So, pow(-0.0, 0.5) should return +0. But sqrt(-0.0) should return -0
> according to the IEEE 754 standard (and F.9.4.5 from ISO C99).

I've reported the problem:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43419

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Re: Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Frank Isamov
-- Forwarded message --
From: Frank Isamov 
Date: Thu, Mar 18, 2010 at 4:28 PM
Subject: Re: Coloring problem - Pass 0 for finding allocno costs
To: Ian Bolton 


On Thu, Mar 18, 2010 at 3:51 PM, Ian Bolton  wrote:
>> The problem I see is that for registers 100,101 I get best register
>> class D instead of R - actually they get the same cost and D is chosen
>> (maybe because it is first).
>
> Hi Frank.
>
> Do D and R overlap?  It would be useful to know which regs are in
> which class, before trying to understand what is going on.
>
> Can you paste an example of your define_insn from your MD file to show
> how operands from D or R are both valid?  I ask this because it is
> possible to express that D is more expensive than R with operand
> constraints.
>
> For general IRA info, you might like to look over my long thread on
> here called "Understanding IRA".
>
> Cheers,
> Ian
>

Hi Ian,

Thank you very much for your prompt reply.
D and R do not overlap. Please see the fragments of the .md and .h files below:

From the md:

(define_register_constraint "a" "R_REGS" "")
(define_register_constraint "d" "D_REGS" "")

(define_predicate "a_operand"
   (match_operand 0 "register_operand")
{
       unsigned int regno;
       if (GET_CODE (op) == SUBREG)
           op = SUBREG_REG (op);
       regno = REGNO (op);
       return (regno >= FIRST_PSEUDO_REGISTER ||
REGNO_REG_CLASS(regno) == R_REGS);
}
)

(define_predicate "d_operand"
   (match_operand 0 "register_operand")
{
       unsigned int regno;
       if (GET_CODE (op) == SUBREG)
           op = SUBREG_REG (op);
       regno = REGNO (op);
       return (regno >= FIRST_PSEUDO_REGISTER ||
REGNO_REG_CLASS(regno) == D_REGS);
}
)

(define_predicate "a_d_operand"
 (ior (match_operand 0 "a_operand")
      (match_operand 0 "d_operand")))

(define_predicate "i_a_d_operand"
 (ior (match_operand 0 "immediate_operand")
      (match_operand 0 "a_d_operand")))

(define_insn "mov_regs"
 [(set (match_operand:SISFM 0 "a_d_operand" "=a, a, a, d, d, d")
       (match_operand:SISFM 1 "i_a_d_operand" "a, i, d, a, i, d"))]
 ""
 "move\t%1, %0"
)

(define_insn "*addsi3_1"
 [(set (match_operand:SI 0 "a_d_operand" "=a, a,  a,d,d")
       (plus:SI (match_operand:SI 1 "a_d_operand" "%a, a,  a,d,d")
                (match_operand:SI 2 "nonmemory_operand" "U05,S16,a,d,U05")))]
 ""
 "adda\t%2, %1, %0"
)

;;  Arithmetic Left and Right Shift Instructions
(define_insn "3"
 [(set (match_operand:SCIM 0 "register_operand" "=a,d,d")
       (sh_oprnd:SCIM (match_operand:SCIM 1 "register_operand" "a,d,d")
                      (match_operand:SI 2 "nonmemory_operand" "U05,d,U05")))
  (clobber (reg:CC CC_REGNUM))]
 ""
 "\t%2, %1, %0"
)

From the h file:

#define REG_CLASS_CONTENTS                                              \
 {
            \
   {0x, 0x, 0x}, /* NO_REGS*/          \
   {0x, 0x, 0x}, /* D_REGS*/          \
   {0x, 0x, 0x}, /* R_REGS*/           \

The ABI requires the use of R registers for arguments and the return value.
Other than that, all of these instructions are more or less symmetrical with
respect to using D or R. So the optimal choice for this example would be R;
if a D register is chosen, it involves an additional copy from R to D and
back to R.

Thank you, Frank


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Michael Matz
Hi,

On Thu, 18 Mar 2010, Vincent Lefevre wrote:

> On 2010-03-16 16:18:17 +0100, Richard Guenther wrote:
> > pow (a, 0.5) is always expanded to sqrt(a).
> 
> This violates the ISO C99 standard for -0.0.
> 
> According to N1256, F.9.4.4:
> 
>   pow(±0, y) returns +0 for y > 0 and not an odd integer.
> 
> So, pow(-0.0, 0.5) should return +0. But sqrt(-0.0) should return -0
> according to the IEEE 754 standard (and F.9.4.5 from ISO C99).

Yes, and I don't know why they specified it like that.  After all 
(-0)*(-0)==+0 (not ==-0), so the above definition is internally 
inconsistent.  Defining sqrt(-0) as +0 would be equally inconsistent, but 
at least agree with the pow(-0, 0.5) result.

But unfortunately you are right, this expansion can only be done for 
-fno-signed-zeros.  (FWIW the general expansion of pow(x,N/2) where N!=1 
is already guarded by unsafe_math, but for N==1 we do it unconditionally).
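
As a side note on why the sign of zero matters here, the following small C
sketch shows that -0.0 and +0.0 compare equal yet remain distinguishable,
which is what makes the pow-to-sqrt rewrite observable. It assumes IEEE 754
binary64 doubles and no -ffast-math; the helper names sign_bit,
zeros_compare_equal and square are made up for illustration and are not
part of GCC or libm.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the sign bit of x is set.
   Assumes double is an IEEE 754 binary64 value. */
int sign_bit(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to inspect the bits */
    return (int)(bits >> 63);
}

/* -0.0 == +0.0 is true, so an ordinary comparison cannot tell them apart. */
int zeros_compare_equal(void)
{
    double neg = -0.0, pos = 0.0;
    return neg == pos;
}

/* (-0)*(-0) is +0: the product of two negative operands is positive. */
double square(double x)
{
    return x * x;
}
```

With these helpers, sign_bit(-0.0) is 1 and sign_bit(square(-0.0)) is 0,
while zeros_compare_equal() is 1. A caller that inspects the sign bit (or
divides by the result) can therefore tell whether pow or sqrt produced
the zero.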


Ciao,
Michael.

RE: Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Ian Bolton
> -Original Message-
> From: Frank Isamov [mailto:frank.isa...@gmail.com]
> Sent: 18 March 2010 14:29
> To: Ian Bolton
> Subject: Re: Coloring problem - Pass 0 for finding allocno costs
> 
> On Thu, Mar 18, 2010 at 3:51 PM, Ian Bolton 
> wrote:
> >> The problem I see is that for registers 100,101 I get best register
> >> class D instead of R - actually they get the same cost and D is
> chosen
> >> (maybe because it is first).
> >
> > Hi Frank.
> >
> > Do D and R overlap?  It would be useful to know which regs are in
> > which class, before trying to understand what is going on.
> >
> > Can you paste an example of your define_insn from your MD file to
> show
> > how operands from D or R are both valid?  I ask this because it is
> > possible to express that D is more expensive than R with operand
> > constraints.
> >
> > For general IRA info, you might like to look over my long thread on
> > here called "Understanding IRA".
> >
> > Cheers,
> > Ian
> >
> 
> Hi Ian,
> 
> Thank you very much for your prompt reply.
> D and R do not overlap. Please see the fragments of the .md and .h files
> below:
> 
> From the md:
> 
> (define_register_constraint "a" "R_REGS" "")
> (define_register_constraint "d" "D_REGS" "")
> 
> (define_predicate "a_operand"
> (match_operand 0 "register_operand")
> {
> unsigned int regno;
> if (GET_CODE (op) == SUBREG)
> op = SUBREG_REG (op);
> regno = REGNO (op);
> return (regno >= FIRST_PSEUDO_REGISTER ||
> REGNO_REG_CLASS(regno) == R_REGS);
> }
> )
> 
> (define_predicate "d_operand"
> (match_operand 0 "register_operand")
> {
> unsigned int regno;
> if (GET_CODE (op) == SUBREG)
> op = SUBREG_REG (op);
> regno = REGNO (op);
> return (regno >= FIRST_PSEUDO_REGISTER ||
> REGNO_REG_CLASS(regno) == D_REGS);
> }
> )
> 
> (define_predicate "a_d_operand"
>   (ior (match_operand 0 "a_operand")
>(match_operand 0 "d_operand")))
> 
> (define_predicate "i_a_d_operand"
>   (ior (match_operand 0 "immediate_operand")
>(match_operand 0 "a_d_operand")))
> 
> (define_insn "mov_regs"
>   [(set (match_operand:SISFM 0 "a_d_operand" "=a, a, a, d, d, d")
>         (match_operand:SISFM 1 "i_a_d_operand" "a, i, d, a, i, d"))]
>   ""
>   "move\t%1, %0"
> )
> 
> (define_insn "*addsi3_1"
>   [(set (match_operand:SI 0 "a_d_operand" "=a, a,  a,d,d")
>         (plus:SI (match_operand:SI 1 "a_d_operand" "%a, a,  a,d,d")
>                  (match_operand:SI 2 "nonmemory_operand" "U05,S16,a,d,U05")))]
>   ""
>   "adda\t%2, %1, %0"
> )
> 
> ;;  Arithmetic Left and Right Shift Instructions
> (define_insn "3"
>   [(set (match_operand:SCIM 0 "register_operand" "=a,d,d")
>         (sh_oprnd:SCIM (match_operand:SCIM 1 "register_operand" "a,d,d")
>                        (match_operand:SI 2 "nonmemory_operand" "U05,d,U05")))
>    (clobber (reg:CC CC_REGNUM))]
>   ""
>   "\t%2, %1, %0"
> )
> 
> From the h file:
> 
> #define REG_CLASS_CONTENTS
> \
>   {
>  \
> {0x, 0x, 0x}, /* NO_REGS*/  \
> {0x, 0x, 0x}, /* D_REGS*/  \
> {0x, 0x, 0x}, /* R_REGS*/   \
> 
> The ABI requires the use of R registers for arguments and the return value.
> Other than that, all of these instructions are more or less symmetrical with
> respect to using D or R. So the optimal choice for this example would be R;
> if a D register is chosen, it involves an additional copy from R to D and
> back to R.
> 
> Thank you, Frank


Hi Frank,

Have you defined REG_ALLOC_ORDER?  All other things being equal,
IRA will choose the register that comes first in the REG_ALLOC_ORDER.

In terms of operand constraints, if you replace 'd' with '?d' in
each of your operand constraints, it will show IRA that using D_REGS
is slightly more expensive than using R_REGS.  This may not always
be true, however, if the value being put in the D_REG never needs to
be returned, so this might cause IRA to use up all the R_REGS first
with things that need not be there.  I didn't like using constraints
because of this issue - the extra cost sometimes incurred is
conditional, and you can't represent that in the constraints.

Have you defined REGISTER_MOVE_COST?  In defaults.h, it is set to
2, which should be what you are looking for.  If you can get IRA to
see that the best COVER_CLASS is R_REGS then, when D_REGS is given,
there will be an added cost of 2 for that option.
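
To illustrate the arithmetic, here is a toy C model of that cost. It is
not IRA's algorithm: the enum mirrors the D_REGS and R_REGS classes from
this thread, REGISTER_MOVE_COST is written out with the fallback value of
2 mentioned above, and the function extra_copy_cost is a hypothetical
helper that only counts the copies needed when a value arrives in one
class and has to end up in another.

```c
/* Hypothetical register classes mirroring the thread's D_REGS and R_REGS. */
enum reg_class { NO_REGS, D_REGS, R_REGS, ALL_REGS, LIM_REG_CLASSES };

/* Same value as the fallback discussed above: every register-to-register
   move costs 2, whatever the mode or the classes involved. */
#define REGISTER_MOVE_COST(MODE, FROM, TO) 2

/* Toy model, not IRA's algorithm: extra copy cost of keeping a pseudo in
   class CHOSEN when its value arrives from class SRC and must end up in
   class DST.  A copy is only needed where the classes differ. */
int extra_copy_cost(enum reg_class chosen, enum reg_class src,
                    enum reg_class dst)
{
    int cost = 0;
    if (chosen != src)
        cost += REGISTER_MOVE_COST(0, src, chosen);   /* copy in */
    if (chosen != dst)
        cost += REGISTER_MOVE_COST(0, chosen, dst);   /* copy back out */
    return cost;
}
```

In this model a pseudo fed from and consumed by R registers costs nothing
extra in R_REGS, and 2 + 2 = 4 in D_REGS, which is the R to D and back to
R round trip described earlier in the thread.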

One other thing: Don't overlook Pass 1.  That's where the real costs
are calculated.

I hope some of this helps.  If not, you'll have to send me your
test program and ira dump file.

Best regards,
Ian


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Dominique Dhumieres
May I remind you of my original question:

> In the block "Handle constant exponents." in gcc/builtins.c, the condition
> !optimize_size has been replaced with optimize_insn_for_speed_p () between
> gcc 4.3 and 4.4, but I have not been able to find when and why.
> Does anybody remember the when and why?

This change was committed by someone and should have been approved by
someone else.

TIA

Dominique


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Jakub Jelinek
On Thu, Mar 18, 2010 at 04:07:28PM +0100, Dominique Dhumieres wrote:
> May I remind you of my original question:
> 
> > In the block "Handle constant exponents." in gcc/builtins.c, the condition
> > !optimize_size has been replaced with optimize_insn_for_speed_p () between
> > gcc 4.3 and 4.4, but I have not been able to find when and why.
> > Does anybody remember the when and why?
> 
> This change was committed by someone and should have been approved by
> someone else.

svn blame is your friend; after that, look on gcc-patches for a patch posted
between a few days before the commit and that day.

Jakub


Re: Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Michael Matz
Hi,

On Thu, 18 Mar 2010, Frank Isamov wrote:

> From the h file:
> 
> #define REG_CLASS_CONTENTS                                              \
>  {
>             \
>    {0x, 0x, 0x}, /* NO_REGS*/          \
>    {0x, 0x, 0x}, /* D_REGS*/          \
>    {0x, 0x, 0x}, /* R_REGS*/           \
> 
> ABI requires use of R registers for arguments and return value. Other
> than that all of these instructions are more or less symmetrical in
> sense of using D or R.

If that is so, you're better off not using two different register classes 
at all.  Register classes are there to describe asymmetry in the ISA 
register file.  For describing ABI constraints like argument or 
caller-save registers look at FUNCTION_ARG, FUNCTION_ARG_REGNO_P, 
FUNCTION_VALUE_REGNO_P and CALL_USED_REGISTERS (and some friends).

The compiler will e.g. try to make sure to first use call-clobbered 
registers in leaf functions, so that they don't need to be saved/restored 
in the prologue/epilogue.  You don't need register classes to describe 
this.
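
As a rough sketch of what such a description could look like, the C
fragment below models the ABI side with the macros named above instead of
register classes. The register numbering is entirely hypothetical (a file
of eight hard registers, D0-D3 followed by R0-R3, with arguments in R0 and
R1 and the return value in R0); the real port uses different numbers, and
FIRST_R_REGNUM, count_call_used and is_arg_reg are made-up names.

```c
/* Hypothetical register file: hard regs 0-3 are D0-D3, 4-7 are R0-R3. */
#define FIRST_PSEUDO_REGISTER 8
#define FIRST_R_REGNUM 4

/* One entry per hard register: 1 = clobbered by calls (caller-saved).
   Here only R0 and R1 are call-clobbered. */
#define CALL_USED_REGISTERS { 0, 0, 0, 0, 1, 1, 0, 0 }

/* Arguments are passed in R0 and R1 only. */
#define FUNCTION_ARG_REGNO_P(REGNO) \
  ((REGNO) >= FIRST_R_REGNUM && (REGNO) <= FIRST_R_REGNUM + 1)

/* The return value comes back in R0. */
#define FUNCTION_VALUE_REGNO_P(REGNO) ((REGNO) == FIRST_R_REGNUM)

static const char call_used_regs[FIRST_PSEUDO_REGISTER] = CALL_USED_REGISTERS;

/* Count the call-clobbered registers in the table above. */
int count_call_used(void)
{
    int n = 0;
    for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
        n += call_used_regs[i];
    return n;
}

/* Function wrapper around the macro, convenient for checking it. */
int is_arg_reg(int regno)
{
    return FUNCTION_ARG_REGNO_P(regno);
}
```

The point of the sketch is only that the "arguments and return value live
in R" rule is expressed per register number, independently of how the
registers are grouped into classes.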


Ciao,
Michael.

Re: RFC: VTA alias set discrepancy

2010-03-18 Thread Richard Guenther
On Wed, 17 Mar 2010, Jakub Jelinek wrote:

> On Wed, Mar 17, 2010 at 09:26:29PM +0100, Jakub Jelinek wrote:
> > That will very much pessimize debug info.  While we are now always in
> > -funit-at-a-time mode, that doesn't mean DECL_RTL is computed early enough.
> > From the file scope non-static vars, at the point debug info is generated
> > only the first var usually has DECL_RTL set, all others don't.

But I see no reason why we couldn't compute DECL_RTL early enough.

> ...
> 
> And for the dwarf2out.c hunk please see
> http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00546.html
> 
> In both spots this happens only for TREE_STATIC && !DECL_EXTERNAL
> variables, and the debug info really doesn't care if the alias set will be 0
> on all of those MEMs.  Though of course it could use the decl's alias set
> if it has been computed already, and 0 otherwise.

Well, sure.  The hack with overriding flag_strict_aliasing will work,
it's still a hack.  If we go for it it should be commented as such
and be cleaned up properly (I suppose for TREE_STATIC && !DECL_EXTERNAL
we can easily just build the MEM manually instead of computing DECL_RTL
with all its side-effects).

Richard.


implementing load 8 byte instruction

2010-03-18 Thread roy rosen
Hi,

I am trying to implement a simple load 8 bytes instruction.
I tried to use movdi so that it would allocate two sequential
registers for the load.
It starts well but in pass subreg1 the insns are decomposed and all DI
operands are replaced with SI.

I understand that this is a desirable optimization, but then the load
is done using two load 4 bytes instructions.

Does anybody have any idea what I should do?

Thanks.


RE: Coloring problem - Pass 0 for finding allocno costs

2010-03-18 Thread Ian Bolton
> -Original Message-
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Michael Matz
> Sent: 18 March 2010 15:13
> To: Frank Isamov
> Cc: gcc@gcc.gnu.org
> Subject: Re: Coloring problem - Pass 0 for finding allocno costs
> 
> Hi,
> 
> On Thu, 18 Mar 2010, Frank Isamov wrote:
> 
> > From the h file:
> >
> > #define REG_CLASS_CONTENTS
>    \
> >  {
> >             \
> >    {0x, 0x, 0x}, /* NO_REGS*/          \
> >    {0x, 0x, 0x}, /* D_REGS*/          \
> >    {0x, 0x, 0x}, /* R_REGS*/           \
> >
> > ABI requires use of R registers for arguments and return value. Other
> > than that all of these instructions are more or less symmetrical in
> > sense of using D or R.
> 
> If that is so, you're better off not using two different register
> classes
> at all.  Register classes are there to describe asymmetry in the ISA
> register file.  For describing ABI constraints like argument or
> caller-save registers look at FUNCTION_ARG, FUNCTION_ARG_REGNO_P,
> FUNCTION_VALUE_REGNO_P and CALL_USED_REGISTERS (and some friends).
> 
> The compiler will e.g. try to make sure to first use call-clobbered
> registers in leaf functions, so that they don't need to be
> saved/restored
> in the prologue/epilogue.  You don't need register classes to describe
> this.

Good spot!  But ...

In the original post, Frank mentioned that all reg classes in an
insn must match ("all registers in an insn should be from the same
register class"), so I don't think a single class would suffice because
there would be no way to enforce the above requirement.

However, I think a superclass that contains both would work, and the
CALL_USED_REGISTERS define can be used to show just the R_REGS.
Meanwhile, the matching class requirement can still be enforced via the
operand constraints.

I might be wrong, however.  I am still learning about IRA too, so feel
free to teach me something new ;-)


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Vincent Lefevre
On 2010-03-18 15:32:04 +0100, Michael Matz wrote:
> > So, pow(-0.0, 0.5) should return +0. But sqrt(-0.0) should return -0
> > according to the IEEE 754 standard (and F.9.4.5 from ISO C99).
> 
> Yes, and I don't know why they specified it like that.  After all 
> (-0)*(-0)==+0 (not ==-0), so the above definition is internally 
> inconsistent.  Defining sqrt(-0) as +0 would be equally inconsistent, but 
> at least agree with the pow(-0, 0.5) result.

sqrt(-0) was defined first by the IEEE 754 standard in 1985.
AFAIK, -0 was chosen to allow a hack for some convention in
interval arithmetic (there may be other reasons). pow(-0, y)
was defined by the C committee.

> But unfortunately you are right, this expansion can only be done for
> -fno-signed-zeros. (FWIW the general expansion of pow(x,N/2) where
> N!=1 is already guarded by unsafe_math, but for N==1 we do it
> unconditionally).

If GCC is able to track range of values, the transformation could
be allowed when it can be proved that the value -0 is not possible
(e.g. when x > 0).

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Nathan Froyd
On Thu, Mar 18, 2010 at 04:34:56PM +0100, Vincent Lefevre wrote:
> On 2010-03-18 15:32:04 +0100, Michael Matz wrote:
> > But unfortunately you are right, this expansion can only be done for
> > -fno-signed-zeros. (FWIW the general expansion of pow(x,N/2) where
> > N!=1 is already guarded by unsafe_math, but for N==1 we do it
> > unconditionally).
> 
> If GCC is able to track range of values, the transformation could
> be allowed when it can be proved that the value -0 is not possible
> (e.g. when x > 0).

That would be useful, and other language implementations do similar
things.  GCC's VRP (value range propagation pass) does not handle
floating-point values at the moment, though.

-Nathan


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Joseph S. Myers
On Thu, 18 Mar 2010, Vincent Lefevre wrote:

> On 2010-03-18 15:32:04 +0100, Michael Matz wrote:
> > > So, pow(-0.0, 0.5) should return +0. But sqrt(-0.0) should return -0
> > > according to the IEEE 754 standard (and F.9.4.5 from ISO C99).
> > 
> > Yes, and I don't know why they specified it like that.  After all 
> > (-0)*(-0)==+0 (not ==-0), so the above definition is internally 
> > inconsistent.  Defining sqrt(-0) as +0 would be equally inconsistent, but 
> > at least agree with the pow(-0, 0.5) result.
> 
> sqrt(-0) was defined first by the IEEE 754 standard in 1985.
> AFAIK, -0 was chosen to allow a hack for some convention in
> interval arithmetic (there may be other reasons). pow(-0, y)
> was defined by the C committee.

And the same rule on pow(-0, y) is present in 754-2008 (I don't know 
whether this was deliberately following the C definition, or deciding 
independently that this was the right definition, but you may know as a 
listed member of the balloting committee).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Sub-optimal code

2010-03-18 Thread Richard Guenther
On Thu, Mar 18, 2010 at 12:23 PM, Alain Ketterlin
 wrote:
>
> I've reported here recently about gcc producing conditional branches
> with static outcome. I've finally managed to produce a simple
> self-contained example. Here it is:
>
> int f()
> {
>    static int LSW=-1;
>    double x=0.987654321;
>    unsigned char ix = *((char *)&x);
>    if(ix==0xdd || ix==0x3f)
>        LSW = 1;
>    else
>        LSW = 0;
>    return 0;
> }
>
> Compiling this with "gcc-4.5 -O3 -c f.c" (20100304 snapshot) on my
> x86-64 system produces the following code:
>
>   0:   b8 b8 ff ff ff          mov    $0xffb8,%eax
>   5:   3c dd                   cmp    $0xdd,%al
>   7:   0f 94 c0                sete   %al
>   ...
>
> Ok, not a branch, but still useless code. gcc-4.4 does the same, and
> gcc-4.3 is slightly less clever (it moves the whole immediate 8 bytes
> into %rax and does two cmp+je).
>
> I first thought the type-cast was responsible. But if you replace the
> assignment to LSW with a simple return, gcc correctly optimizes the
> whole function away.
>
> Can anybody confirm this?  Is it worth a bug report?

I can confirm this and yes, it is worth a bugreport.  If you have
cases that do not involve floating-point constants accessed as
integer values that would be even more interesting (because
likely more common).

Thanks,
Richard.

> -- Alain.
>


Re: RFC: VTA alias set discrepancy

2010-03-18 Thread Jakub Jelinek
On Thu, Mar 18, 2010 at 04:25:03PM +0100, Richard Guenther wrote:
> On Wed, 17 Mar 2010, Jakub Jelinek wrote:
> 
> > On Wed, Mar 17, 2010 at 09:26:29PM +0100, Jakub Jelinek wrote:
> > > That will very much pessimize debug info.  While we are now always in
> > > -funit-at-a-time mode, that doesn't mean DECL_RTL is computed early
> > > enough.  From the file scope non-static vars, at the point debug info is
> > > generated only the first var usually has DECL_RTL set, all others don't.
> 
> But I see no reason why we couldn't compute DECL_RTL early enough.

Maybe for 4.6, if all code relying on DECL_RTL set meaning it is going to be
output is fixed to use some other way, then perhaps we might create DECL_RTL
on all varpool symbols that might be needed, before starting output of
functions.  Before making this change in cfgexpand.c last year I've
certainly tried to create DECL_RTL without the SET_DECL_RTL afterwards, and
that resulted in many unneeded vars being emitted into assembly.

> > ...
> > 
> > And for the dwarf2out.c hunk please see
> > http://gcc.gnu.org/ml/gcc-patches/2009-11/msg00546.html
> > 
> > In both spots this happens only for TREE_STATIC && !DECL_EXTERNAL
> > variables, and the debug info really doesn't care if the alias set will be 0
> > on all of those MEMs.  Though of course it could use the decl's alias set
> > if it has been computed already, and 0 otherwise.
> 
> Well, sure.  The hack with overriding flag_strict_aliasing will work,
> it's still a hack.  If we go for it it should be commented as such

Yes, we can document it as a hack with a goal to fix eventually.

> and be cleaned up properly (I suppose for TREE_STATIC && !DECL_EXTERNAL
> we can easily just build the MEM manually instead of computing DECL_RTL
> with all its side-effects).

We'd have to duplicate most of make_decl_rtl.  Stuff like
flag_section_anchors handling, etc.  Alternative would be to
move current make_decl_rtl to make_decl_rtl_1, add a bool argument
for_debug and for make_decl_rtl pass false to it.  If for_debug
it would avoid flag_mudflap handling and avoid the creation of
new alias set.  Unfortunately not calling set_mem_attributes is not
desirable, var-tracking wants to see if such MEM is clobbered by
stores and avoiding MEM_EXPR etc. wouldn't be desirable.
Ideally we'd even ask for alias set, as long as it is just a read-only
operation which doesn't change anything.
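
To make the make_decl_rtl_1 idea concrete, here is a toy model in C.  It is
not GCC code: struct toy_decl, its fields, new_alias_set_toy and
make_decl_rtl_for_debug are invented for illustration.  Only the names
make_decl_rtl, make_decl_rtl_1 and for_debug come from the proposal above,
and the real function has many more side effects than this sketch models.

```c
#include <stdbool.h>

/* Toy stand-in for a variable declaration; every field is hypothetical. */
struct toy_decl {
    bool has_rtl;            /* models DECL_RTL being set */
    bool has_mem_attrs;      /* models set_mem_attributes having run */
    bool mudflap_registered; /* models the flag_mudflap side effect */
    int  alias_set;          /* 0 models "no alias set computed yet" */
};

static int next_alias_set = 1;

/* Hypothetical helper: hands out a fresh, non-zero alias set number. */
static int new_alias_set_toy(void)
{
    return next_alias_set++;
}

/* Shared worker.  With for_debug set it skips the side effects that
   debug info does not need, but still records the MEM attributes that
   var-tracking wants to see. */
static void make_decl_rtl_1(struct toy_decl *decl, bool for_debug)
{
    decl->has_rtl = true;
    decl->has_mem_attrs = true;
    if (!for_debug) {
        decl->mudflap_registered = true;
        if (decl->alias_set == 0)
            decl->alias_set = new_alias_set_toy();
    }
}

/* The existing entry point keeps its behaviour by passing false. */
void make_decl_rtl(struct toy_decl *decl)
{
    make_decl_rtl_1(decl, false);
}

/* Hypothetical debug-only entry point: no new alias set, no mudflap. */
void make_decl_rtl_for_debug(struct toy_decl *decl)
{
    make_decl_rtl_1(decl, true);
}
```

In this model a decl that only goes through make_decl_rtl_for_debug keeps
alias set 0, which matches the observation that debug info does not care
about the alias set on those MEMs.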

Jakub


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Vincent Lefevre
On 2010-03-18 15:49:05 +, Joseph S. Myers wrote:
> And the same rule on pow(-0, y) is present in 754-2008 (I don't know 
> whether this was deliberately following the C definition, or deciding 
> independently that this was the right definition, but you may know as a 
> listed member of the balloting committee).

There were many discussions concerning the pow() function and several
changes, but I don't know who decided at the end. I think that everyone
agreed with pow(-0, y) returning +0 for y > 0 not an odd integer (due
to the absence of an unsigned 0, +0 is generally chosen). The case
pow(-1,inf) was much more controversial (it seems that most people
wanted NaN, while I preferred the C behavior for consistency with the
large arguments).

Also note that there are 3 power functions: pown, pow and powr.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Re: constant hoisting out of loops

2010-03-18 Thread fanqifei
On Thu, Mar 18, 2010 at 2:30 AM, Jim Wilson  wrote:
> On Wed, 2010-03-17 at 11:27 +0800, fanqifei wrote:
>> You are correct. The reload pass emitted the clr.w insn.
>> However, I can see loop opt passes after reload:
>> problem1.c.174r.loop2_invariant1
>
> Not unless you have a modified toolchain.  We don't run loop opt after
> register allocation.  See the list of optimization passes in the FSF GCC
> passes.c file.  loop2 occurs before ira/postreload.  If you do have a
> modified toolchain, then I doubt that the current loop passes would work
> right, since they were designed to handle pseudo-regs not hard-regs.
>
> Jim
>
>
>
Those passes were added by me more than two months ago. I thought these
passes could perform the optimization of hoisting constants out of
loops.
I just removed them.  Thanks very much for your help!
Now I am working on the movsi expander.

-- 
-Qifei Fan
http://freshtime.org


Re: libgcc-arch.ver details

2010-03-18 Thread Ian Lance Taylor
"Paulo J. Matos"  writes:

> On Thu, Mar 18, 2010 at 6:51 AM, Andrew Pinski  wrote:
>> On Wed, Mar 17, 2010 at 11:38 PM, Paulo J. Matos  wrote:
>>
>>
>> I cannot remember when half
>> float support came in though, I thought it was only added on the trunk
>> or did you backport that support too.
>>
>
> Thanks for the tips. However, I didn't backport anything. I created my
> branch of gcc 4.3.4 from gcc 4_3 branch rev 150451 and added my own
> changes. Did I pick gcc in the middle of a sequence of commits
> therefore missing the implementation of half floats?

Unlikely.  The question here is whether your target uses HFmode.  If
it does, you have to arrange to provide the HFmode libgcc functions.
That does not happen automatically.  HFmode is a 16-bit floating point
mode; currently the only target which uses that mode is ARM.
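
To give an idea of what an HFmode support routine has to do, here is a
sketch in C of widening IEEE 754 binary16 bits to a float.  It is not the
libgcc implementation, and the name half_bits_to_float is hypothetical; it
also assumes that float is IEEE binary32.

```c
#include <stdint.h>
#include <string.h>

/* Convert IEEE 754 binary16 bits (1 sign, 5 exponent, 10 fraction bits)
   to a float by rebuilding the binary32 bit pattern.  No hardware
   half-float support is needed. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, payload shifted up. */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (frac << 13);
    } else if (frac != 0) {
        /* Subnormal half: normalise it into a normal float. */
        uint32_t e = 113u;  /* biased float exponent for 2^-14 */
        while ((frac & 0x400u) == 0) {
            frac <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((frac & 0x3FFu) << 13);
    } else {
        bits = sign;  /* signed zero */
    }

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

A port that uses HFmode needs routines of this kind (and the narrowing
direction as well) to be available in its libgcc.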

Ian


Re: Questions about "Handle constant exponents." in gcc/builtins.c

2010-03-18 Thread Dominique Dhumieres
> Google is your friend...

Thanks Jack. As you can see in comment #46 of pr40106, I have found
my own way. In my previous attempts I made two mistakes:
(1) I tried to use the search engine of the gcc mailing lists that
kept parsing optimize_insn_for_speed_p as if the _ were spaces.
(2) I did not realize that the history is only kept in trunk.

> svn blame is your friend, ...

When I report a pr I am not at all interested in blaming anything
or anybody, only in seeing the pr fixed ASAP;-)

Cheers,

Dominique

PS For the ongoing discussion about the equivalence between
sqrt(x) and pow(x, 0.5), I'll only say that some influential
people should learn some basic math!


The Linux binutils 2.20.51.0.7 is released

2010-03-18 Thread H.J. Lu
This is the beta release of binutils 2.20.51.0.7 for Linux, which is
based on binutils 2010 0318 in CVS on sourceware.org plus various
changes. It is purely for Linux.

All relevant patches in patches have been applied to the source tree.
You can take a look at patches/README to see what has been applied and
in what order.

Starting from the 2.20.51.0.4 release, no diffs against the previous
release will be provided.

You can enable both gold and bfd ld with --enable-gold=both.  Gold will
be installed as ld.gold and bfd ld will be installed as ld.bfd.  By
default, ld.gold will be installed as ld.  You can use the configure
option, --enable-gold=both/bfd to choose bfd ld as the default linker,
ld.  IA-32 binary and x86_64 binary tarballs are configured with
--enable-gold=both/bfd --enable-plugins --enable-threads.

Starting from the 2.18.50.0.4 release, the x86 assembler no longer
accepts

fnstsw %eax

fnstsw stores 16 bits into %ax and leaves the upper 16 bits of %eax unchanged.
Please use

fnstsw %ax

Starting from the 2.17.50.0.4 release, the default output section LMA
(load memory address) has changed for allocatable sections from being
equal to VMA (virtual memory address), to keeping the difference between
LMA and VMA the same as the previous output section in the same region.

For

.data.init_task : { *(.data.init_task) }

LMA of .data.init_task section is equal to its VMA with the old linker.
With the new linker, it depends on the previous output section. You
can use

.data.init_task : AT (ADDR(.data.init_task)) { *(.data.init_task) }

to ensure that LMA of .data.init_task section is always equal to its
VMA. The linker script in the older 2.6 x86-64 kernel depends on the
old behavior.  You can add AT (ADDR(section)) to force LMA of
.data.init_task section equal to its VMA. It will work with both old
and new linkers. The x86-64 kernel linker script in kernel 2.6.13 and
above is OK.
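
The arithmetic behind the LMA change can be sketched in C.  This is only an
illustration of the rule as described above, not linker code; the function
names and the example addresses are made up.

```c
#include <stdint.h>

/* Old default: the LMA of an allocatable section equals its VMA. */
uint64_t default_lma_old(uint64_t vma)
{
    return vma;
}

/* New default: keep (LMA - VMA) the same as for the previous output
   section in the same region. */
uint64_t default_lma_new(uint64_t vma, uint64_t prev_vma, uint64_t prev_lma)
{
    return vma + (prev_lma - prev_vma);
}
```

For example, if the previous section has VMA 0x1000 and LMA 0x101000, a
section at VMA 0x2000 gets LMA 0x2000 under the old rule and 0x102000 under
the new one.  Adding AT (ADDR(section)) corresponds to forcing the old
result.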

The new x86_64 assembler no longer accepts

monitor %eax,%ecx,%edx

You should use

monitor %rax,%ecx,%edx

or

monitor

which works with both old and new x86_64 assemblers. They should
generate the same opcode.

The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,

movl (%eax),%ds
movl %ds,(%eax)

To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,

mov (%eax),%ds
mov %ds,(%eax)

should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support

movw (%eax),%ds
movw %ds,(%eax)

without the 0x66 prefix. Patches for 2.4 and 2.6 Linux kernels are
available at

http://www.kernel.org/pub/linux/devel/binutils/linux-2.4-seg-4.patch
http://www.kernel.org/pub/linux/devel/binutils/linux-2.6-seg-5.patch

The ia64 assembler is now defaulted to tune for Itanium 2 processors.
To build a kernel for Itanium 1 processors, you will need to add

ifeq ($(CONFIG_ITANIUM),y)
CFLAGS += -Wa,-mtune=itanium1
AFLAGS += -Wa,-mtune=itanium1
endif

to arch/ia64/Makefile in your kernel source tree.

Please report any bugs related to binutils 2.20.51.0.7 to
hjl.to...@gmail.com

and

http://www.sourceware.org/bugzilla/

Changes from binutils 2.20.51.0.6:

1. Update from binutils 2010 0318.
2. Don't set ELFOSABI_LINUX for undefined STT_GNU_IFUNC symbols.
3. Improve x86 assembler error messages.
4. Support vpermilp[ds] for x86.
5. Fix strip for group sections.
6. Fix objcopy for PE PIE.  PR 11396.
7. Avoid 32bit overflow in linker.
8. Correct backslash quote logic in assembler.  PR 11356.
9. Improve linker --section-start support.  PR 11304.
10. Properly update LMA.  PR 11219.
11. Don't combine .init_array/.fini_array sections for relocatable link.
12. Add Solaris Sparc/x86 linker support.
13. Support dumping .ARM.exidx/.ARM.extab.
14. Add STT_GNU_IFUNC support for Sparc.
15. Improve gold.
16. Improve arm support.
17. Improve avr support.
18. Improve mips support.
19. Improve ppc support.
20. Improve xtensa support.

Changes from binutils 2.20.51.0.5:

1. Update from binutils 2010 0205.
2. Support x86 XSAVE extended state core dump.
3. Add an option, -mavxscalar=, to x86 assembler to encode AVX
scalar instructions with VL=256 and update x86 disassembler.
4. Add xsave64/xrstor64 to x86 assembler/disassembler.
5. Add all the possible aliases for VPCOM* insns to x86 assembler.
5. Fix --gc-sections to detect unresolved symbol in DSO.  PR 11218.
6. Support number of ELF program segments > 64K.
7. Support BSD4.4 extended archive write.
8. Report error on bad section name with "objdump -j". PR 11225.
9. Linker now checks if all files are present and indicates those missing.
PR 4437.
10. Allow adding section from empty file with objcopy.
11. Update C++ demangler to support vector.
12. Improve gold.
13. Improve arm support.
14. I

Re: Question on mips multiply patterns in md file

2010-03-18 Thread Jim Wilson
On Thu, 2010-03-18 at 19:20 +0800, Amker.Cheng wrote:
> Is it possible that the method would ever result in register
> allocator failure?
> In my understanding, wouldn't the reload pass do whatever it can to make
> all insns' constraints satisfied?

In theory, there should be no failure.  In practice, maybe there will
be.  Lots of things in gcc are more complicated than they appear to be
at first, and reload is the biggest one.

The question in my mind is why does the mul_acc_si pattern have two
alternatives?  Maybe the second alternative was added as an obscure
minor optimization.  Or maybe it was added to fix an obscure register
allocator failure.  Without more info, it is impossible to know, and
thus unsafe to make assumptions about which one is true.  We may be able
to find which one is true by looking at gcc history though.  If we can
find the ChangeLog entry and/or the patch that added the second
alternative, that may tell us why the second alternative is there.

> By this, you mean I should go back to the define_split method if dropping
> the alternative does result in bad insns generated by RA?

Use the define_split method if dropping the alternative results in
reload aborting, or if dropping the alternative results in code that is
less efficient than the code you get with the define_split.

> Another thought on this topic: maybe I should make two copies of mul_acc_si
> like you said, with one keeping the constraints and the other dropping the
> "*?*?".  Is this the same as the method you suggested?

Yes, that is another possibility.  This makes the first alternative more
desirable, and thus reduces the number of times the second alternative
is chosen, but still allows the second one to be chosen in some cases
when it might be better. This might even be a better solution than the
first two we had.  The only way to know is to do some experimenting,
looking at assembly code to see what happens with the different choices.

Jim



gcc-4.5-20100318 is now available

2010-03-18 Thread gccadmin
Snapshot gcc-4.5-20100318 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100318/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 157552

You'll find:

gcc-4.5-20100318.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.5-20100318.tar.bz2       C front end and core compiler

gcc-ada-4.5-20100318.tar.bz2        Ada front end and runtime

gcc-fortran-4.5-20100318.tar.bz2    Fortran front end and runtime

gcc-g++-4.5-20100318.tar.bz2        C++ front end and runtime

gcc-java-4.5-20100318.tar.bz2       Java front end and runtime

gcc-objc-4.5-20100318.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.5-20100318.tar.bz2  The GCC testsuite

Diffs from 4.5-20100311 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Hash Function for "switch statement"

2010-03-18 Thread Jae Hyuk Kwak
Hi Michael,

Thank you for the details.


On Thu, Mar 18, 2010 at 8:17 AM, Michael Meissner
 wrote:
>> > Note, that many hash tables are computed by the modulus operation, which is
>> > often fairly expensive (and on machines without a hardware divide unit,
>> > requiring a function call).  I would expect many switch statements would
>> > slow down if you switch to a hash operation that used modulus.
>>
>> I agree that the cost of the modulus operation can be high, but it can be even
>> higher if we use a bunch of "else if".
>> Consider the situation that a program has about 400 cases on a single
>> switch statement.
>
> However the compiler doesn't do 400 if-then-else's, instead it generates a
> decision tree using less than/greater than type tests so that log2
> comparison/branches are made.  So if there are 400 cases, it means there are
> probably 10 comparisons/branches done for any given value (and on some
> architectures, the last comparison for equality can be eliminated).  Looking
> at the i386.c tunings, you see divides are in the 66 - 74 cycle ranges for
> popular current chips.  If a comparison/branch is say 6 cycles, that means it
> is still faster on average to do the tree of if's than to do a modulus, load,
> and branch for 400 or so switch elements.

If I understood correctly, your point is that O(log N) is fast enough
because the size of a switch is small in practice.
But I am still not convinced that using a hash table would not increase the speed.
As we know, a hash table lookup requires only O(1).
There must be some point at which O(1) beats O(log N), depending on the size
of N and the constant cost of the O(1) lookup.
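
To compare the two approaches on paper, here is a sketch in C.  It is not
what gcc emits: tree_dispatch stands in for a balanced decision tree by
doing a binary search over sorted case labels, and the hash side is a plain
open-addressing table with a modulus by a prime.  All names, and the table
size 521, are assumptions made for this example.

```c
#include <stddef.h>

/* Binary search over sorted case labels, standing in for a balanced
   decision tree.  *probes receives the number of labels examined.
   Returns the index of the matching case, or -1 for "default:". */
int tree_dispatch(const int *labels, size_t n, int value, size_t *probes)
{
    size_t lo = 0, hi = n;
    *probes = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*probes;
        if (labels[mid] == value)
            return (int)mid;
        if (labels[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

#define HASH_SLOTS 521   /* assumption: a prime a bit above 400 cases */
#define EMPTY_SLOT (-1)  /* targets must therefore be non-negative */

struct hash_switch {
    int keys[HASH_SLOTS];
    int targets[HASH_SLOTS];
};

/* This modulus is the fixed cost discussed in the thread. */
static size_t slot_of(int value)
{
    return (size_t)((unsigned)value % HASH_SLOTS);
}

void hash_init(struct hash_switch *t)
{
    for (size_t i = 0; i < HASH_SLOTS; i++)
        t->targets[i] = EMPTY_SLOT;
}

/* Linear probing insert.  Returns 0 on success, -1 if the table is full. */
int hash_insert(struct hash_switch *t, int key, int target)
{
    size_t s = slot_of(key);
    for (size_t i = 0; i < HASH_SLOTS; i++) {
        if (t->targets[s] == EMPTY_SLOT) {
            t->keys[s] = key;
            t->targets[s] = target;
            return 0;
        }
        s = (s + 1) % HASH_SLOTS;
    }
    return -1;
}

/* Returns the stored target, or -1 for "default:". */
int hash_dispatch(const struct hash_switch *t, int value, size_t *probes)
{
    size_t s = slot_of(value);
    *probes = 0;
    for (size_t i = 0; i < HASH_SLOTS; i++) {
        ++*probes;
        if (t->targets[s] == EMPTY_SLOT)
            return -1;
        if (t->keys[s] == value)
            return t->targets[s];
        s = (s + 1) % HASH_SLOTS;
    }
    return -1;
}
```

With 400 sorted labels the binary search examines at most 9 of them, so the
question is whether one modulus plus a load and compare is cheaper than
about 9 compare/branch pairs on a given chip.  The sketch only counts
probes; it says nothing about cycle costs.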

Is it true that the current implementation of gcc on i386 optimizes
switch statements with a decision tree?
I was hoping somebody could confirm this.

I tried to find the specific implementation in the file i386.c.
But it is a very big file and I have only limited knowledge of gcc so far.
If you can point me to the specific part of the implementation, I may be
able to follow what you meant.


>> If a jump table is possible, it can be a choice, but a jump table is not
>> always feasible depending on the values on "case".
>
> There are other strategies, for example if most values are clustered together,
> you can use if's for the outlying values, and then use a jump table for the
> values clustered together.  When I did the Data General C compiler many years
> ago, this was a win in our particular environment, because there was often
> code that did a switch on errno with a 0 case, and all of the E values.  In
> the DG environment, the E values were high because each language had its own
> range of error codes, and the C language was a relative late comer.
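
A sketch in C of the clustered strategy described above: a plain compare for
the outlying 0 case and a table lookup for the dense range.  The range
1000..1007, the handler ids and the function name are all invented for this
example; they are not the Data General error codes.

```c
/* Hypothetical dense cluster of error codes. */
enum { CLUSTER_LO = 1000, CLUSTER_HI = 1007 };

/* Returns a handler id: 0 for the outlying 0 case, 1..8 for codes in
   the cluster, and -1 for everything else ("default:"). */
int dispatch_error_code(int code)
{
    /* Stand-in for a jump table over the clustered values. */
    static const int cluster_table[CLUSTER_HI - CLUSTER_LO + 1] = {
        1, 2, 3, 4, 5, 6, 7, 8
    };

    if (code == 0)
        return 0;  /* outlier handled by a plain compare */

    if (code >= CLUSTER_LO && code <= CLUSTER_HI)
        return cluster_table[code - CLUSTER_LO];  /* range check + load */

    return -1;
}
```

A single table covering 0 through 1007 would be mostly empty, which is why
splitting off the outlier pays off here.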

The paper that Basile pointed to summarizes well some of the strategies you
might be referring to.
As I mentioned earlier, the paper uses a branching-tree approach but does
not dig into the hash table part much.
And at that time, 2008, gcc did not seem to use even a decision
tree for switch statements yet.


Thanks

Jay


Re: fixincl 'make check' regressions...

2010-03-18 Thread David Miller

You said you would fix this several nights ago, but I still
haven't seen any changes to fixincludes since then.

When will you get around to fixing these regressions you
introduced?

Thank you.