Re: GCC 4.6 performance regressions

2011-02-09 Thread Jonathan Wakely
On 9 February 2011 06:20, Tony Poppleton wrote:
>
> Out of interest, has there been much communication in the past between
> GCC and Phoronix to address any of these issues in their previous
> benchmarks?

I signed up to their forum to point out that snapshots have additional
checking turned on by default, but didn't see anyone "official"
replying to acknowledge it, only other forum users.  I wasn't
interested in spending any time trying to reach them.


RTL Expand pass - Difference in float and int datatypes

2011-02-09 Thread Anitha Boyapati
Hello All,

I am trying to understand RTL generation. For the following testcase
I have dumped the RTL... (version: gcc-4.3.3, target: AVR32, not
upstreamed yet)

int main() {
    volatile int a, b;
    if(a>b) return 1;
    else return 2;

}

The expand pass shows that the "le" operator is chosen.

(jump_insn 9 8 10 3 gt_int.c:4 (set (pc)
        (if_then_else (le:CC (cc0)
                (const_int 0 [0x0]))
            (label_ref 14)
            (pc))) -1 (nil))


If I modify the same testcase to use the 'float' datatype instead of
'int', the "gt" operator is picked up.

(jump_insn 9 8 34 3 gt.c:4 (set (pc)
        (if_then_else (gt:CC (cc0)
                (const_int 0 [0x0]))
            (label_ref 12)
            (pc))) -1 (nil))


I would like to know what prompts gcc to decide that "le" can be used in
the expand pass rather than the "gt" operator, or more precisely why it
differs in the case of float.

I am not sure how much of this is target dependent, but the port being
used is for AVR32, which is not upstream yet.  It would be great if I
could get some pointers on the above.


Regards
Anitha


Re: GCC 4.6 performance regressions

2011-02-09 Thread Sebastian Pop
On Wed, Feb 9, 2011 at 00:20, Tony Poppleton  wrote:
>> While I appreciate Phoronix as a booster site, their benchmarking
>> practice often seems very dodgy; I'd take the results with a large grain
>> of salt
>
> The main reason I posted the link in the first place was because it
> was reflecting my own empirical evidence for the application I am
> working on; the march/mtune flags (on various corei* cpus) are
> actually detrimental to performance (albeit by at most 10% - but
> still, not the performance boost I was hoping for).
>
> I had been trying for the last few weeks to strip down my application
> into a mini-benchmark I could create a PR out of; however, it is
> tougher than expected, and I was hoping the Phoronix article was going to
> offer a quicker route to finding performance regressions than my code
> - as their coverage was a lot wider.  Anyway, apparently this is not
> the case, so back to my original work...
>
> Would it not be in the best interests for both GCC and Phoronix to
> rectify the problems in their benchmarks?

I agree with you here: for many reasons it would be interesting to
have Phoronix as a free alternative to the SPEC benchmarks.

>  Could we not contact the
> authors of the article and point out how they can make their
> benchmarks better?  I would be happy to contact them myself, however I
> think it would be far more effective (and valid) coming from a GCC
> maintainer.

I put Michael in CC.

>
> Points they have apparently missed so far are:
>  - clarify which compiler flags they are using

Also, the mechanism used to change the flags for a benchmark should be
documented and should produce the intended behaviour.

For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so
building this benchmark with CFLAGS="-O2" would have no effect.

Sebastian

>  - don't run make -j when looking at compile times
>  - ensure they are using --enable-checking=release when benchmarking a 
> snapshot
>  - in general, as much information as possible as to their setup and
> usage, to make the results easily repeatable
>
> Out of interest, has there been much communication in the past between
> GCC and Phoronix to address any of these issues in their previous
> benchmarks?
>
> Tony
>


Re: GCC 4.6 performance regressions

2011-02-09 Thread Jonathan Wakely
On 9 February 2011 08:34, Sebastian Pop wrote:
>
> For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so
> building this benchmark with CFLAGS="-O2" would have no effect.

Why not?

Ignoring the fact -O3 is the highest level for GCC, the manual says:
"If you use multiple -O options, with or without level numbers, the
last such option is the one that is effective."
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too.
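
(A quick way to check which settings actually win, assuming a GCC new
enough to support -Q --help=optimizers, is to ask the driver itself:

  # the last -O option wins, so this is effectively -O2 with -ffast-math on
  gcc -O4 -ffast-math -O2 -Q --help=optimizers | grep unsafe-math

  # here -fno-fast-math comes later and cancels -ffast-math again
  gcc -O4 -ffast-math -O2 -fno-fast-math -Q --help=optimizers | grep unsafe-math

The output lists each optimization flag with its final enabled/disabled
state, so it is easy to see what a benchmark's CFLAGS juggling really
produced.)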


Re: GCC 4.6 performance regressions

2011-02-09 Thread Jakub Jelinek
On Wed, Feb 09, 2011 at 08:42:05AM +, Jonathan Wakely wrote:
> On 9 February 2011 08:34, Sebastian Pop wrote:
> >
> > For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so
> > building this benchmark with CFLAGS="-O2" would have no effect.
> 
> Why not?
> 
> Ignoring the fact -O3 is the highest level for GCC, the manual says:
> "If you use multiple -O options, with or without level numbers, the
> last such option is the one that is effective."
> http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
> 
> And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too.

Last time I checked, e.g. their byte-benchmark uses
OPTON = -O
...
$(CC) -o $(PROGDIR)/arithoh ${CFLAGS} ${OPTON} -Darithoh 
$(SRCDIR)/arith.c
and similarly for most of the other files, so it really doesn't matter what -O
level you provide in CFLAGS there, they are just comparing -O performance.

Jakub


Re: GCC 4.6 performance regressions

2011-02-09 Thread Richard Guenther
On Wed, Feb 9, 2011 at 7:20 AM, Tony Poppleton  wrote:
>> While I appreciate Phoronix as a booster site, their benchmarking
>> practice often seems very dodgy; I'd take the results with a large grain
>> of salt
>
> The main reason I posted the link in the first place was because it
> was reflecting my own empirical evidence for the application I am
> working on; the march/mtune flags (on various corei* cpus) are
> actually detrimental to performance (albeit by at most 10% - but
> still, not the performance boost I was hoping for).
>
> I had been trying for the last few weeks to strip down my application
> into a mini-benchmark I could create a PR out of; however, it is
> tougher than expected, and I was hoping the Phoronix article was going to
> offer a quicker route to finding performance regressions than my code
> - as their coverage was a lot wider.  Anyway, apparently this is not
> the case, so back to my original work...
>
> Would it not be in the best interests for both GCC and Phoronix to
> rectify the problems in their benchmarks?  Could we not contact the
> authors of the article and point out how they can make their
> benchmarks better?  I would be happy to contact them myself, however I
> think it would be far more effective (and valid) coming from a GCC
> maintainer.

I did analyses on C-Ray and Himeno last December and filed
a few bugzillas (and fixed the easiest ones).  Most of the benchmarks
they use have the problem that they do not use simple tricks that
are available (like building the single-file ones with -fwhole-program).
But they probably represent the "bad" coding style that is quite common,
so optimizing those codes is probably useful.
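
(For a benchmark that really is a single translation unit, that trick is
just one extra flag at compile time; a minimal sketch, with a hypothetical
file name:

  # -fwhole-program lets GCC assume nothing except main is needed outside
  # this file, so everything else can be localized and inlined aggressively
  gcc -O3 -fwhole-program -o bench bench.c -lm

For multi-file benchmarks the closest equivalent would be -flto.)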

> Points they have apparently missed so far are:
>  - clarify which compiler flags they are using
>  - don't run make -j when looking at compile times
>  - ensure they are using --enable-checking=release when benchmarking a 
> snapshot
>  - in general, as much information as possible as to their setup and
> usage, to make the results easily repeatable
>
> Out of interest, has there been much communication in the past between
> GCC and Phoronix to address any of these issues in their previous
> benchmarks?

Yes.

Richard.

> Tony
>


Re: Broken bootstrap on Cygwin

2011-02-09 Thread Dave Korn
On 08/02/2011 16:08, Dave Korn wrote:
>   Sorry all, been offline for a couple of days after my pc blew up.
> 
> On 07/02/2011 20:50, Angelo Graziosi wrote:
> 
>> I do not understand the logic here: break GCC trunk for something that
>> hasn't been yet released.
> 
>   But GCC trunk has not been released either yet!  GCC trunk and Cygwin trunk
> are in synch.  The next released versions of both will work together.  People
> working on trunk occasionally need to use trunk versions of other things such
> as binutils or glibc to get the full functionality, and there's always
> --disable-libquadmath for those who don't want to.

  *sigh* Thinko here and in my other reply.  Please read 'libdecnumber' and
'--disable-decimal-float' respectively.  Patches to follow.

cheers,
  DaveK


Re: MELT plugin: test fopen

2011-02-09 Thread Daniel Marjamäki
Hello Pierre!

It is an interesting plugin. I'll keep an eye on your repository.

Thanks!
Daniel Marjamäki


2011/2/9 Pierre Vittet :
> Hello,
>
> I would like to present to you a small plugin, which could be a good example
> of a MELT use case.
> This plugin makes it possible to check that after every call to the fopen
> function, we have a test on the pointer returned by fopen (checking that it
> is not null).
>
> It creates a pass after SSA and works on gimple. It first matches the
> gimple_call on fopen and saves the tree corresponding to the FILE *. Then we
> check that the next instruction is a gimple_cond between the saved tree and
> the NULL ptr.
> When no test is found, the following error is returned:
> "test_fopen_cfile.c:35:11: warning: fopen not followed by a test on his
> returned pointer [enabled by default]"
>
> I think MELT is particularly well adapted to cases where we have to match
> trees or gimple statements, like I do here.
>
> For the moment I have only used it on small test files. I will try to see
> what it gives on a more realistic small to medium application.
>
> The code can be found on GitHub: https://github.com/Piervit/GMWarn . The idea
> is to add more plugins. If you have some ideas or remarks, I am interested
> (however I still have to learn a lot about both GCC and MELT).
>
> If you try the code you will need to use the GCC MELT branch (or there are
> some changes to make in the Makefile to use MELT as a plugin).
>
>
> Regards!
>
>
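
(A minimal sketch, in plain C, of the pattern Pierre's checker is aimed at;
the file name and function names here are just illustrative:

  #include <stdio.h>

  int bad(void)
  {
      FILE *f = fopen("data.txt", "r");
      return fgetc(f);               /* flagged: no NULL test after fopen */
  }

  int good(void)
  {
      FILE *f = fopen("data.txt", "r");
      if (f == NULL)                 /* accepted: the fopen result is tested */
          return -1;
      return fgetc(f);
  }

Only the first function should trigger the warning quoted above.)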


loop hoisting fails

2011-02-09 Thread Paulo J. Matos

Hi,

I am facing a problem with code hoisting from a loop in our backend.

This problem seems to be hinted at in the documentation for
-fbranch-target-load-optimize:
"Perform branch target register load optimization before prologue /
epilogue threading. The use of target registers can typically be exposed
only during reload, thus hoisting loads out of loops and doing
inter-block scheduling needs a separate optimization pass."



However, the problem has nothing to do with this flag.
Our architecture doesn't allow something like:

store 0,(X+2) ; store 0 into mem of X+2

So we need to do
load Y,0 ; load Y with constant zero
store Y,(X+2)  ; store the 0 in Y into mem of X+2

If I compile a loop:
int vec[8];
int i = 0;
while(i++ < 8)
  vec[i] = 0;

I get something like:

  load X,#vec
?L2:
  load Y,0
  store Y,(X+0)
  add X,1

The load Y,0, though, should be hoisted out of the loop.
I know what the problem is... gcc during expand has:

For ?L2 bb:

(set (mem/s:QI (reg:QI 41)) (const_int 0))

This is only transformed after the reload into:

(set (reg:QI Y) (const_int 0))
(set (mem/s:QI X) (reg:QI Y))

And so there's no opportunity for hoisting after reload, it seems...
so I thought about splitting the insn into two during expand:


(set (reg:QI Y) (const_int 0))
(set (mem/s:QI X) (reg:QI Y))

But then this is combined by cse into:

(set (mem/s:QI (reg:QI 41)) (const_int 0))

and bammm, same problem. No loop hoisting. What's the best way to handle 
this? Any suggestions?


Cheers,

Paulo Matos




Spelling mistake!

2011-02-09 Thread Anonymous bin Ich

At

http://gcc.gnu.org/onlinedocs/libstdc++/manual/streambufs.html

Under section 'Buffering', Particularly is written as 'Chaptericularly'.

Regards


Re: GCC 4.6 performance regressions

2011-02-09 Thread Michael Larabel

Hi Everyone,

I've been following the thread from the start (I'm on the list).
Matthew Tippett (CC'ed now as well) and I have already been working on
some ways to help further, and one of us should have some more
information to present shortly. If any of you have any other
questions please feel free to let us know.


-- Michael

On 02/09/2011 02:34 AM, Sebastian Pop wrote:

On Wed, Feb 9, 2011 at 00:20, Tony Poppleton  wrote:

While I appreciate Phoronix as a booster site, their benchmarking
practice often seems very dodgy; I'd take the results with a large grain
of salt

The main reason I posted the link in the first place was because it
was reflecting my own empirical evidence for the application I am
working on; the march/mtune flags (on various corei* cpus) are
actually detrimental to performance (albeit by at most 10% - but
still, not the performance boost I was hoping for).

I had been trying for the last few weeks to strip down my application
into a mini-benchmark I could create a PR out of; however, it is
tougher than expected, and I was hoping the Phoronix article was going to
offer a quicker route to finding performance regressions than my code
- as their coverage was a lot wider.  Anyway, apparently this is not
the case, so back to my original work...

Would it not be in the best interests for both GCC and Phoronix to
rectify the problems in their benchmarks?

I agree with you here: for many reasons it would be interesting to
have Phoronix as a free alternative to the SPEC benchmarks.


  Could we not contact the
authors of the article and point out how they can make their
benchmarks better?  I would be happy to contact them myself, however I
think it would be far more effective (and valid) coming from a GCC
maintainer.

I put Michael in CC.


Points they have apparently missed so far are:
  - clarify which compiler flags they are using

Also, the mechanism used to change the flags for a benchmark should be
documented and should produce the intended behaviour.

For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so
building this benchmark with CFLAGS="-O2" would have no effect.

Sebastian


  - don't run make -j when looking at compile times
  - ensure they are using --enable-checking=release when benchmarking a snapshot
  - in general, as much information as possible as to their setup and
usage, to make the results easily repeatable

Out of interest, has there been much communication in the past between
GCC and Phoronix to address any of these issues in their previous
benchmarks?

Tony



Re: Spelling mistake!

2011-02-09 Thread Paolo Carlini
On 02/09/2011 01:44 PM, Anonymous bin Ich wrote:
> At
>
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/streambufs.html
>
> Under section 'Buffering', Particularly is written as 'Chaptericularly'.
Fixed in SVN. Thanks.

Paolo.


Re: RTL Expand pass - Difference in float and int datatypes

2011-02-09 Thread Ian Lance Taylor
Anitha Boyapati  writes:

> int main() {
>     volatile int a, b;
>     if(a>b) return 1;
>     else return 2;
>
> }
>
> The expand pass shows that the "le" operator is chosen.
>
> (jump_insn 9 8 10 3 gt_int.c:4 (set (pc)
> (if_then_else (le:CC (cc0)
> (const_int 0 [0x0]))
> (label_ref 14)
> (pc))) -1 (nil))
>
>
> If I modify the same testcase to use 'float' datatype instead of
> 'int', "gt" operator is picked up.
>
> (jump_insn 9 8 34 3 gt.c:4 (set (pc)
> (if_then_else (gt:CC (cc0)
> (const_int 0 [0x0]))
> (label_ref 12)
> (pc))) -1 (nil))
>
>
> I would like to know what prompts gcc to decide that "le" can be used in
> the expand pass rather than the "gt" operator, or more precisely why it
> differs in the case of float.

The choice of LE when using int is just a happenstance of the way that
gcc generates code.  When gcc comes to RTL generation it happens to be
looking at an if statement that falls through on true and branches on
else.  So it flips the condition to branch on true (in
do_compare_rtx_and_jump).  The use of volatile doesn't affect this: it
will still only load the variables precisely as described in the
program.

The condition can not be flipped for float because that would be an
invalid transformation if one of the values is NaN.
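
For instance, with a NaN operand both the original comparison and its
"flipped" form are false, so the two branches are not complementary; a
minimal C illustration:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      float a = NAN, b = 0.0f;
      /* prints "0 0": a > b is false and a <= b is also false, so
         branching on "le" instead of "not gt" would take the wrong arm */
      printf("%d %d\n", a > b, a <= b);
      return 0;
  }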

Ian


Re: loop hoisting fails

2011-02-09 Thread Ian Lance Taylor
"Paulo J. Matos"  writes:

> But then this is combined by cse into:
>
> (set (mem/s:QI (reg:QI 41)) (const_int 0))
>
> and bammm, same problem. No loop hoisting. What's the best way to
> handle this? Any suggestions?

You need to set TARGET_RTX_COSTS to indicate that this operation is
relatively expensive.  That should stop combine from generating it.

Ian


Re: loop hoisting fails

2011-02-09 Thread Paulo J. Matos

On 09/02/11 15:07, Ian Lance Taylor wrote:


You need to set TARGET_RTX_COSTS to indicate that this operation is
relatively expensive.  That should stop combine from generating it.



Thanks, I will give it a try. One of the things that is not as polished
as it should be is the rtx costs. I am writing the costs for optimize_size.


The way it works is slightly confusing to me. I have added a debug
printf to it so I can see the costs and when they are called, and I get
a list like this:


== RTXCOST[pass]: outer_code rtx => cost rec/done==

And it starts with:
== RTXCOST[]: UnKnown (const_int 0 [0x0]) => 0 [done]==
== RTXCOST[]: set (plus:QI (reg:QI 18 [0])
(reg:QI 18 [0])) => 4 [rec]==
== RTXCOST[]: set (neg:QI (reg:QI 18 [0])) => 4 [rec]==
...
== RTXCOST[]: set (set (reg:QI 21)
(subreg:QI (reg:SI 14 virtual-stack-vars) 3)) => 4 [rec]==

The first one is the only one that is called with an unknown outer_code.
All the others seem to have an outer_code set, which should be the
context in which the rtx shows up, but then you get things like the last
line: outer_code is a set but the rtx itself is _also_ a set. How is this
possible?


Now, what's the best way to write the costs? For example, what's the 
cost for a constant? If the rtx is a (const_int 0), and the outer is 
unknown I would guess the cost is 0 since there's no cost in using it. 
However, if the outer_code is set, then the cost depends on what I am
putting the zero into. If the outer_code is plus, should the cost be 0
(since x + 0 = x and the operation will probably be discarded)?


Any tips on this?

Cheers,

Paulo Matos



Re: loop hoisting fails

2011-02-09 Thread Ian Lance Taylor
"Paulo J. Matos"  writes:

> The way it works is slightly confusing to me. I have added a debug
> printf to it so I can see the costs and when they are called, and I
> get a list like this:
>
> == RTXCOST[pass]: outer_code rtx => cost rec/done==
>
> And it starts with:
> == RTXCOST[]: UnKnown (const_int 0 [0x0]) => 0 [done]==
> == RTXCOST[]: set (plus:QI (reg:QI 18 [0])
> (reg:QI 18 [0])) => 4 [rec]==
> == RTXCOST[]: set (neg:QI (reg:QI 18 [0])) => 4 [rec]==
> ...
> == RTXCOST[]: set (set (reg:QI 21)
> (subreg:QI (reg:SI 14 virtual-stack-vars) 3)) => 4 [rec]==
>
> The first one is the only one that is called with an unknown
> outer_code. All the others seem to have an outer_code set, which should
> be the context in which the rtx shows up, but then you get things like the
> last line: outer_code is a set but the rtx itself is _also_ a set. How
> is this possible?

SET is more or less a default outer_code; you can just ignore it.

> Now, what's the best way to write the costs? For example, what's the
> cost for a constant? If the rtx is a (const_int 0), and the outer is
> unknown I would guess the cost is 0 since there's no cost in using
> it. However, if the outer_code is set, then the cost depends on what
> I am putting the zero into. If the outer_code is plus, should the cost
> be 0 (since x + 0 = x and the operation will probably be discarded)?

For your processor it sounds like you should make a constant more
expensive than a register for an outer code of SET.  You're right that
the cost should really depend on the destination of the set but
unfortunately I don't know if you will see that.
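
A rough sketch of such a hook, assuming the GCC 4.5-era TARGET_RTX_COSTS
signature and a made-up hook name (the numbers would of course need tuning
for your port):

  static bool
  myport_rtx_costs (rtx x ATTRIBUTE_UNUSED, int code, int outer_code,
                    int *total, bool speed ATTRIBUTE_UNUSED)
  {
    switch (code)
      {
      case CONST_INT:
        /* A constant used directly inside a SET has to be loaded into a
           register first on this machine, so charge it like an extra
           instruction; in other contexts treat it as free.  */
        *total = (outer_code == SET) ? COSTS_N_INSNS (1) : 0;
        return true;
      default:
        /* Let the generic code compute the cost.  */
        return false;
      }
  }

  #undef TARGET_RTX_COSTS
  #define TARGET_RTX_COSTS myport_rtx_costs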

I agree that costs are unfortunately not very well documented and the
middle-end does not use them in the most effective manner.  It's still
normally the right mechanism to use to control what combine does.

Ian


Fwd: AIX vs long double

2011-02-09 Thread Tobias Burnus
Cf. http://gcc.gnu.org/ml/gcc/2011-02/msg00109.html and 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47032


Seems as if one needs to add link-time tests to libgfortran for some of
the C99 functions - the compile checks exist and succeed.


Tobias

 Original Message 
Subject: AIX vs long double
Date: Mon, 7 Feb 2011 14:01:44 -0500
From: David Edelsohn 
To: GCC Development 

AIX provides two versions of long double and declares all of the C99
long double symbols in math.h header file.  One implementation aliases
long double to IEEE double precision type and the other implementation
aliases long double to IBM's paired double format.  All of the C99
symbols for IEEE double precision are implemented in libm, but not all
of the C99 symbols for the IBM long double format are implemented.

IBM's proprietary XL compiler (cc, xlc) defaults to IEEE double
precision and provides a special invocation to default to IBM long
double (cc128, xlc128).

GNU/Linux and GCC on GNU/Linux default to IEEE quad precision long double.

Because the long double choice is an ABI change, GCC on AIX switched
to GNU/Linux compatibility with support for AIX 6.1, before the
incomplete implementation was noticed.  This mostly worked until
libgfortran started using additional C99 functions, causing GCC
Bugzilla

[target/47032] libgfortran references complex long double functions
missing on AIX

libstdc++-v3 on AIX builds with AIX long double support, although user
programs that try to access the missing functions (copysign,
nextafter, etc.) will experience link-time failures.  libgfortran on
AIX fails to build with AIX long double support because it accesses
the missing functions in its implementation.

I would like to solicit feedback about how to proceed: Should GCC on
AIX revert to a 64-bit long double, allowing all libraries to
build and work, but breaking the GCC/G++ ABI on AIX?  Or should it continue
to use a 128-bit long double, building a libstdc++-v3 that works as long as
the user program does not use the missing symbols, and leaving libgfortran
unable to build?

Thanks, David



Re: RTL Expand pass - Difference in float and int datatypes

2011-02-09 Thread Anitha Boyapati
On 9 February 2011 20:34, Ian Lance Taylor  wrote:

>> I would like to know what prompts gcc to decide if "le" can be used in
>> the expand pass rather than "gt" operator. Or more precisely why it
>> differs in the case of float.
>
> The choice of LE when using int is just a happenstance of the way that
> gcc generates code.  When gcc comes to RTL generation it happens to be
> looking at an if statement that falls through on true and branches on
> else.  So it flips the condition to branch on true (in
> do_compare_rtx_and_jump).  The use of volatile doesn't affect this: it
> will still only load the variables precisely as described in the
> program.
>
> The condition can not be flipped for float because that would be an
> invalid transformation if one of the values is NaN.
>

ok. I get it.

I would like to understand some more about reverse-conditional branches
in the case of float. For the target I am working on (AVR32), there is a
floating point unit which follows IEEE 754. The port is relatively
stable. We are now incrementally building support for these FPU
instructions (like fadd, fmul, fcmp).

Now, for the same test case with the float data type, after the expand
pass, in one of the passes (outof_cfglayout) the direct-conditional
branches are turned into reverse-conditional branches.


Direct-conditional branch

> (jump_insn 9 8 34 3 gt.c:4 (set (pc)
>         (if_then_else (gt:CC (cc0)
>                 (const_int 0 [0x0]))
>             (label_ref 12)
>             (pc))) -1 (nil))

Reverse-conditional Branch

> (jump_insn 9 8 34 3 gt.c:4 (set (pc)
>         (if_then_else (gt:CC (cc0)
>                 (const_int 0 [0x0]))
>             (pc)
>             (label_ref 14))) -1 (nil))

(Sorry that I don't have access to the installation right now, so I
just manually modified the reverse-conditional branch. The idea is to
illustrate that label_ref and pc are interchanged).

The latter pattern is supposed to emit assembly which tests for the
reverse condition. For instance, if the former pattern emits an assembly
instruction like "brgt ", then the latter pattern is supposed to
emit an instruction like "brle ".

I am a little confused here. I think this kind of condition reversal
should not happen for floating point comparisons (IEEE 754). Am I on
the right track still? If yes, how do I prevent this kind of jump
optimization? I see REVERSIBLE_CC_MODE, which is not defined in
our case. REVERSE_CONDITION is also not defined...

Moreover, I see that this kind of interdependence of patterns in jumps
was removed when the "cond-optab" branch got merged. So this
condition reversal may not be seen in later versions of gcc
(like 4.5.1)?

I am not sure if I am overloading this conversation with too many
questions... I am trying to present the problem(s) clearly.
Thanks Ian for the hint on do_compare_rtx_and_jump()!! That was
exactly what I was looking for.

Anitha


Re: RTL Expand pass - Difference in float and int datatypes

2011-02-09 Thread Richard Guenther
On Wed, Feb 9, 2011 at 5:55 PM, Anitha Boyapati
 wrote:
> On 9 February 2011 20:34, Ian Lance Taylor  wrote:
>
>>> I would like to know what prompts gcc to decide if "le" can be used in
>>> the expand pass rather than "gt" operator. Or more precisely why it
>>> differs in the case of float.
>>
>> The choice of LE when using int is just a happenstance of the way that
>> gcc generates code.  When gcc comes to RTL generation it happens to be
>> looking at an if statement that falls through on true and branches on
>> else.  So it flips the condition to branch on true (in
>> do_compare_rtx_and_jump).  The use of volatile doesn't affect this: it
>> will still only load the variables precisely as described in the
>> program.
>>
>> The condition can not be flipped for float because that would be an
>> invalid transformation if one of the values is NaN.
>>
>
> ok. I get it.
>
> I would like to understand some more of reverse-conditional branches
> in case of float. For the target I am working on (AVR32),  there is a
> floating point unit which follows IEEE 754. The port is relatively
> stable. We are now incrementally building support for these FPU
> instructions (like fadd, fmul, fcmp).
>
> Now for the same test case with float data type, after expand pass, in
> one of the passes (outof_cfglayout) the direct-conditional branches
> are turned to reverse-conditional branches.
>
>
> Direct-conditional branch
>
>> (jump_insn 9 8 34 3 gt.c:4 (set (pc)
>>         (if_then_else (gt:CC (cc0)
>>                 (const_int 0 [0x0]))
>>             (label_ref 12)
>>             (pc))) -1 (nil))
>
> Reverse-conditional Branch
>
>> (jump_insn 9 8 34 3 gt.c:4 (set (pc)
>>         (if_then_else (gt:CC (cc0)
>>                 (const_int 0 [0x0]))
>>             (pc)
>>             (label_ref 14))) -1 (nil))
>
> (Sorry that I don't have access to the installation right now, so I
> just manually modified the reverse-conditional branch. The idea is to
> illustrate that label_ref and pc are interchanged).
>
> The latter pattern is supposed to emit assembly which tests for the
> reverse-condition. For instance if  the former pattern emits assembly
> instruction like "brgt " then the latter pattern is supposed to
> emit instruction like "brle "

That's the misunderstanding.  The above should emit a brngt 
(branch if !gt).

Richard.


Re: RTL Expand pass - Difference in float and int datatypes

2011-02-09 Thread Anitha Boyapati
On 9 February 2011 22:28, Richard Guenther  wrote:
> On Wed, Feb 9, 2011 at 5:55 PM, Anitha Boyapati
>  wrote:
>> On 9 February 2011 20:34, Ian Lance Taylor  wrote:
>>
 I would like to know what prompts gcc to decide if "le" can be used in
 the expand pass rather than "gt" operator. Or more precisely why it
 differs in the case of float.
>>>
>>> The choice of LE when using int is just a happenstance of the way that
>>> gcc generates code.  When gcc comes to RTL generation it happens to be
>>> looking at an if statement that falls through on true and branches on
>>> else.  So it flips the condition to branch on true (in
>>> do_compare_rtx_and_jump).  The use of volatile doesn't affect this: it
>>> will still only load the variables precisely as described in the
>>> program.
>>>
>>> The condition can not be flipped for float because that would be an
>>> invalid transformation if one of the values is NaN.
>>>
>>
>> ok. I get it.
>>
>> I would like to understand some more of reverse-conditional branches
>> in case of float. For the target I am working on (AVR32),  there is a
>> floating point unit which follows IEEE 754. The port is relatively
>> stable. We are now incrementally building support for these FPU
>> instructions (like fadd, fmul, fcmp).
>>
>> Now for the same test case with float data type, after expand pass, in
>> one of the passes (outof_cfglayout) the direct-conditional branches
>> are turned to reverse-conditional branches.
>>
>>
>> Direct-conditional branch
>>
>>> (jump_insn 9 8 34 3 gt.c:4 (set (pc)
>>>         (if_then_else (gt:CC (cc0)
>>>                 (const_int 0 [0x0]))
>>>             (label_ref 12)
>>>             (pc))) -1 (nil))
>>
>> Reverse-conditional Branch
>>
>>> (jump_insn 9 8 34 3 gt.c:4 (set (pc)
>>>         (if_then_else (gt:CC (cc0)
>>>                 (const_int 0 [0x0]))
>>>             (pc)
>>>             (label_ref 14))) -1 (nil))
>>
>> (Sorry that I don't have access to the installation right now, so I
>> just manually modified the reverse-conditional branch. The idea is to
>> illustrate that label_ref and pc are interchanged).
>>
>> The latter pattern is supposed to emit assembly which tests for the
>> reverse-condition. For instance if  the former pattern emits assembly
>> instruction like "brgt " then the latter pattern is supposed to
>> emit instruction like "brle "
>
> That's the misunderstanding.  The above should emit a brngt 
> (branch if !gt).
>

OK. We are using the same branch pattern for int as well as all other modes.
For int, generating "brle" should be fine, I think, whereas for float this
is wrong. So that leaves us to define a separate branch pattern for
floating-point modes.

It would be very interesting to know why a direct-conditional branch
is turned into a reverse-conditional branch at all... I have been skimming
through jump.c and cfglayout.c for some clues.

Anitha


g++.dg/tree-ssa/inline-3.C

2011-02-09 Thread Jason Merrill
This testcase is ill-formed (and breaks in C++0x mode), because it 
declares puts to have the wrong return type.  I note that changing it to 
have the right return type causes it to no longer be inlined.  What do 
you suggest we do about this?
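
(For reference, the declaration the C library actually provides is

  extern "C" int puts(const char *s);

so presumably the testcase would need to say that, or switch to a function
whose natural return type already matches what it declares.)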


Jason


Re: g++.dg/tree-ssa/inline-3.C

2011-02-09 Thread Richard Guenther
On Wed, Feb 9, 2011 at 11:39 PM, Jason Merrill  wrote:
> This testcase is ill-formed (and breaks in C++0x mode), because it declares
> puts to have the wrong return type.  I note that changing it to have the
> right return type causes it to no longer be inlined.  What do you suggest we
> do about this?

I think this is PR47663 - something I stumbled over today by accident ;)
I suggest either XFAILing the testcase or changing it to use std::free
(or some other function with a void return type).

Richard.

> Jason
>


math-68881.h vs -ffast-math

2011-02-09 Thread Vincent Rivière

Hello.

The file gcc/config/m68k/math-68881.h is distributed with GCC. It provides
inline versions of the libm functions using FPU instructions on m68k targets.


But -ffast-math seems to serve the same purpose, even better.

My question: Is math-68881.h still useful for some purpose?

BTW: that file does not support C99; I have just posted a patch to fix that.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47672

--
Vincent Rivière


Re: GCC 4.6 performance regressions

2011-02-09 Thread Quentin Neill
On Wed, Feb 9, 2011 at 2:42 AM, Jonathan Wakely  wrote:
> On 9 February 2011 08:34, Sebastian Pop wrote:
>>
>> For example x264 defines CFLAGS="-O4 -ffast-math $CFLAGS", and so
>> building this benchmark with CFLAGS="-O2" would have no effect.
>
> Why not?
>
> Ignoring the fact -O3 is the highest level for GCC, the manual says:
> "If you use multiple -O options, with or without level numbers, the
> last such option is the one that is effective."
> http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>
> And CFLAGS="-fno-fast-math -O2" would cancel the effects of -ffast-math too.
>
Because the makefile can override CFLAGS set in the environment (or in a
make variable) at any time, and GCC would never even see it.

$ cat makefile
t1:
   @echo passed in VENV=$$VENV VMAK=$(VMAK)
t2:
   @echo env oride; VENV=v1 make t1
   @echo mak oride; make VMAK=v2 t1
$ make
passed in VENV= VMAK=
$ VENV=env_from_cli make VMAK=make_from_cli
passed in VENV=env_from_cli VMAK=make_from_cli
$
$ make t2
env oride
make[1]: Entering directory `/tmp'
passed in VENV=v1 VMAK=
make[1]: Leaving directory `/tmp'
mak oride
make[1]: Entering directory `/tmp'
passed in VENV= VMAK=v2
make[1]: Leaving directory `/tmp'
$
$ VENV=env_from_cli make VMAK=make_from_cli t2
env oride
make[1]: Entering directory `/tmp'
passed in VENV=v1 VMAK=make_from_cli
make[1]: Leaving directory `/tmp'
mak oride
make[1]: Entering directory `/tmp'
passed in VENV=env_from_cli VMAK=v2
make[1]: Leaving directory `/tmp'


ICE in get_constraint_for_component_ref

2011-02-09 Thread Mohamed Shafi
Hi all,

I am trying to port a private target to GCC 4.5.1. The following are the
properties of the target:

#define BITS_PER_UNIT   32
#define BITS_PER_WORD   32
#define UNITS_PER_WORD  1


#define CHAR_TYPE_SIZE      32
#define SHORT_TYPE_SIZE     32
#define INT_TYPE_SIZE       32
#define LONG_TYPE_SIZE      32
#define LONG_LONG_TYPE_SIZE 32



I am getting an ICE:
internal compiler error: in get_constraint_for_component_ref, at
tree-ssa-structalias.c:3031

For the following testcase:

struct fb_cmap {
 int start;
 int len;
 int *green;
};

extern struct fb_cmap fb_cmap;

void directcolor_update_cmap(void)
{
  fb_cmap.green[0] = 34;
}

The following is the output of debug_tree for the argument that's given
to the function get_constraint_for_component_ref:


unit size 
align 32 symtab 0 alias set -1 canonical type
0x2b6a4554a498 precision 32 min  max 
pointer_to_this >
unsigned PQI size  unit size

align 32 symtab 0 alias set -1 canonical type 0x2b6a45559930>

arg 0 
unit size 
align 32 symtab 0 alias set -1 canonical type
0x2b6a45602888 fields  context

chain >
used public external common BLK file pr28675.c line 7 col 23
size  unit size 
align 32
chain 
public static QI file pr28675.c line 9 col 6 align 32
initial  result 
(mem:QI (symbol_ref:PQI ("directcolor_update_cmap") [flags
0x3] ) [0 S1
A32])
struct-function 0x2b6a455453f0>>
arg 1 
unsigned PQI file pr28675.c line 4 col 7 size  unit size 
align 32 offset_align 32
offset 
bit offset  context
>
pr28675.c:11:10>

I was wondering if this ICE is due to the fact that this is a 32-bit
char target? Can somebody give me some pointers to debug this issue?

Regards,
Shafi