Re: gcc and the kernel

2008-12-30 Thread Andi Kleen
"Brian O'Mahoney"  writes:
>
> Last, a word to the wise: compiler developers are by nature fairly aggressive,
> but unless you want to work on gcc itself, it is wise to stay a bit behind
> the bleeding edge, and unless your systems are excellently backed up,
> ___DONT_BUILD_THE_KERNEL___ with an untested gcc. I mean a kernel you are
> going to try to run. For an example of why not, see the 'ix86 string
> direction' fiasco, noting that an interrupt can happen anywhere and that the
> CPU direction flag is a non-deterministic (to the interrupt code) part of the
> interrupted context. Oh what fun!

This particular problem has nothing to do with the compiler that built the
kernel.

-Andi
-- 
a...@linux.intel.com


Code Motion after Machine Dependent Reorganization??

2008-12-30 Thread Balaji V. Iyer
Hello Everyone,
I am currently working on the OpenRISC port of GCC. There isn't much
significant backend optimization implemented; it's just a straightforward
port.

Now, is it possible for code to move between basic blocks (or even
within a basic block) after the machine-dependent reorganization stage?
If so, how can I stop it from happening, or can I?
 
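I printed out the RTL dump using the following code during the
machine-dependent reorganization pass:


basic_block bb;
rtx insn;

FOR_EACH_BB (bb)
  {
    /* Stop after BB_END, not at it: "insn != BB_END (bb)" would
       skip the last insn of the block.  */
    for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb));
         insn = NEXT_INSN (insn))
      if (INSN_P (insn))
        print_rtl_single (stderr, insn);  /* takes a FILE * first */
  }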
 
Then I compared the RTL with the assembly output, and they do not come
out in the same order. A couple of instructions were even moved outside
a basic block. Am I walking the instruction chain the wrong way?
 
Any help is deeply appreciated!
 
Thanks,
 
Balaji V. Iyer.
-- 
 
Balaji V. Iyer
PhD Candidate, 
Center for Efficient, Scalable and Reliable Computing,
Department of Electrical and Computer Engineering,
North Carolina State University.




RE: Code Motion after Machine Dependent Reorganization??

2008-12-30 Thread Balaji V. Iyer
I forgot to mention one important detail: I am using GCC 4.0.2.






Inlining behaviour in GCC

2008-12-30 Thread Kristian Spangsege
Hi

A simple example (see below) seems to reveal that GCC considers the
unoptimized size of a function rather than the optimized one when
deciding whether it is small enough to be inlined. More precisely, GCC
does consider the optimized size, but optimized based only on the
function body, not on the constant/literal arguments of a particular
call.

In the example below, 'g1' is not inlined, but 'g2' is, although both
expand to exactly the same thing, a single call to 'f', because the
argument is true. As can be seen, the workaround is to factor out the
bulk of the function. That is, if a "bulky" function has one or more
special cases that each amount to little code and can be selected by
constant arguments, it is wise to factor the "bulky" part out into a
separate function. It would have been nice if GCC had been able to
"see" past this barrier, so that one would not have to be aware of the
limitation, but I guess there is a complexity issue to take into
account.

If one defines 'f' as an empty function and declares it 'inline', then
'g1' gets inlined too. Hence GCC does some optimization before
considering a function's size for inlining, but, as suggested above, it
probably does not optimize further based on the arguments of a
particular call until the function has actually been selected for
inlining. It seems that inlining is performed bottom-up rather than
top-down.

These observations are made with GCC version 4.3.2.


void f();

inline void g1(bool b)
{
  if(b) { f(); return; }
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
}

inline void h()
{
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
  f();f();f();f();f();f();f();f();f();f();f();f();
}

inline void g2(bool b)
{
  if(b) { f(); return; }
  h();
}

int main()
{
  g1(true);
  g2(true);
  return 0;
}


I assume that this behaviour of the GCC optimizer is a conscious and
deliberate choice, but I would be happy if someone could confirm it, and
maybe comment briefly on the rationale behind it.


Regards,
Kristian Spangsege


Re: Inlining behaviour in GCC

2008-12-30 Thread Richard Guenther
On Tue, Dec 30, 2008 at 1:08 PM, Kristian Spangsege wrote:
> I assume that this behaviour of the GCC optimizer is a conscious and
> deliberate choice, but I would be happy if someone could confirm it, and
> maybe comment briefly on the rationale behind it.

It's a missed optimization that should be fixed (somewhat) with GCC 4.4
through the fixing and enabling of IPA-CP (interprocedural constant
propagation).

Richard.


Re: Code Motion after Machine Dependent Reorganization??

2008-12-30 Thread Paul Brook
On Tuesday 30 December 2008, Balaji V. Iyer wrote:
> I forgot to mention one important detail: I am using GCC 4.0.2.

The first thing you should do is update to current gcc (preferably svn trunk).
4.0.2 is really old and hasn't been maintained for quite some time. There's
a good chance things have been fixed or otherwise changed since then.

> Hello Everyone,
> I am currently working on the OpenRISC port of GCC. There isn't much
> significant backend optimization implemented; it's just a straightforward
> port.

Paul


Re: Makefile support requested - enabling multilib for target

2008-12-30 Thread NightStrike
On Fri, Dec 26, 2008 at 5:07 PM, NightStrike  wrote:
> On Sun, Dec 21, 2008 at 2:38 PM, NightStrike  wrote:
>> Currently, gcc doesn't support a multilib build for win64.  I have
>> been looking at how to do this, and have so far come up with the
>> beginning of a solution.  The work done thus far is part of this PR:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38294
>>
>> The current blocker is in building libgcc.  At some point in building
>> libgcc, it gets down to the final compilation.  In doing so, gcc is
>> invoked with a -B option pointing to the 32-bit lib directory instead
>> of the 64-bit lib directory.  This works for building the 32-bit
>> libgcc, but not the 64-bit default version.
>>
>> From what I can tell, the -B option pointing to the 32-bit lib
>> directory is in $(GCC_FOR_TARGET).  Is that where it's supposed to be?
>>  Is there a way to make gcc search the right directory first (or at
>> all)?  What steps am I missing for enabling the multilib build?
>>
>> For reference, the system root and all of its libraries are installed into:
>>
>> $prefix/$target/lib and $prefix/$target/lib64, the latter of course
>> being the 64-bit version of all the libs.
>>
>
> Ping
>

Ping


Re: Code Motion after Machine Dependent Reorganization??

2008-12-30 Thread Ian Lance Taylor
"Balaji V. Iyer"  writes:

> Then I compared the RTL with the assembly output, and they do not come
> out in the same order. A couple of instructions were even moved outside
> a basic block. Am I walking the instruction chain the wrong way?

The CFG is not valid at the point of the machine reorg pass, mainly
for historical reasons.  You can see all the insns reliably by doing
  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))

Ian


RE: Code Motion after Machine Dependent Reorganization??

2008-12-30 Thread Balaji V. Iyer
Ian,
Thanks for your help. What I mainly want to do is make some hardware
decisions by looking at the instructions inside a basic block. This is
why I was using the FOR_EACH_BB macro.

When and where can I intercept the RTL so that it matches the output
assembly? I am willing to add my own hook if necessary.

Thanks!

Balaji V. Iyer.





Re: Code Motion after Machine Dependent Reorganization??

2008-12-30 Thread Ian Lance Taylor
"Balaji V. Iyer"  writes:

> Thanks for your help. What I mainly want to do is make some hardware
> decisions by looking at the instructions inside a basic block. This is
> why I was using the FOR_EACH_BB macro.
>
> When and where can I intercept the RTL so that it matches the output
> assembly? I am willing to add my own hook if necessary.

If you want to look at RTL which precisely matches the output
assembly, then you should use FINAL_PRESCAN_INSN.  You won't get basic
block markers, though.

If you want to look at RTL which is pretty close to the output
assembly, and for which the basic blocks are reasonably valid, then
you should write a pass which runs somewhere after the second
scheduling pass.

Ian


Re: Code Motion after Machine Dependent Reorganization??

2008-12-30 Thread Steven Bosscher
On Tue, Dec 30, 2008 at 8:38 PM, Ian Lance Taylor  wrote:
> If you want to look at RTL which is pretty close to the output
> assembly, and for which the basic blocks are reasonably valid, then
> you should write a pass which runs somewhere after the second
> scheduling pass.

You can also just resurrect the CFG when you enter your machine reorg
pass.  See how ia64 does that, for example (grep for
compute_bb_for_insn).

Gr.
Steven


RE: Odd performance regression with -Os

2008-12-30 Thread Weddington, Eric

> -Original Message-
> From: Mark Mitchell [mailto:m...@codesourcery.com] 
> Sent: Monday, December 29, 2008 11:51 AM
> To: Andrew Haley
> Cc: Eric Botcazou; gcc@gcc.gnu.org; Georg-Johann Lay
> Subject: Re: Odd performance regression with -Os
> 
> Andrew Haley wrote:
> > Eric Botcazou wrote:
> >>> Thanks.  Are you holding this because we're in Stage 3?
> >> The patch was written very recently so I wanted to let it go through
> >> a good deal of internal testing.  Moreover, I haven't measured its
> >> impact on anything other than Ada benchmarks (and on a patched 4.3
> >> branch).  If people think that it would be worth having for the 4.4
> >> release, I can port it and conduct basic testing with it on the
> >> mainline, but that's pretty much it.
> >
> > Well, it's a fairly nasty regression on embedded targets with no
> > multiplier, where people are likely to use -Os.  Sounds to me like
> > it qualifies for 4.4.
> 
> I agree.

I just tried Eric's patch for the AVR on 4.3.2 (slightly modified to
apply against the 4.3.2 release) and tested it with Andrew's original
test case. The AVR is an ideal target for testing this: it is an 8-bit
embedded processor, a number of its variants have no multiply
instructions, and almost all applications for it are compiled with -Os.

I compiled the original test case using -mmcu=at90usb82 -Os. It compiled to
15 instructions (32 bytes), including a call to __mulhi3. With the patch, the
test case compiled to 10 instructions (20 bytes), with no call into libgcc.

I haven't regtested the patch yet, but so far I like what I see.

Thanks,
Eric Weddington