Re: Help with PPC Relocation options

2011-08-01 Thread Rohit Arul Raj
On Mon, Aug 1, 2011 at 12:12 PM, Rohit Arul Raj  wrote:
> Hello All,
>
> I compiled a simple 1.c file with the -mcpu=e500mc64 option, and while
> trying to create a relocatable object, I am getting the following error:
>
> $ powerpc-elf-ld.exe -static -r 1.o
> powerpc-elf-ld.exe: Relocatable linking with relocations from format
> elf64-powerpc (1.o) to format elf32-powerpc (a.out) is not supported
>
>
> $ powerpc-elf-ld.exe -static -r 1.o --oformat elf64-powerpc
> powerpc-elf-ld.exe: Relocatable linking with relocations from format
> elf64-powerpc (1.o) to format elf64-powerpc (a.out) is not supported
>
> Is relocatable linking not allowed for 64bit PPC?
>
> Regards,
> Rohit
>

Yeah, got it: "-m elf64ppc".

I was trying the '-m64' option, which worked with the 'powerpc-linux'
toolchain but doesn't work with the 'powerpc-elf' toolchain.

Thanks,
Rohit


Re: onlinedocs formated text too small to read

2011-08-01 Thread Georg-Johann Lay
Jon Grant wrote:
> Hello
> 
> Georg-Johann Lay wrote, On 08/07/11 19:08:
> [.]
>> I can confirm that it's hardly readable on some systems.
>> I use Opera and several FF versions, some worse, some a bit less bad.
>>
>> IMO it's definitely too small; I had already thought about complaining, too.
>>
>> Johann
> 
> Could I ask what the best way to progress this request would be? E.g.,
> should I create a Bugzilla ticket?
> 
> Best regards, Jon

http://gcc.gnu.org/ml/gcc/2011-07/msg00106.html

CCed Gerald; I think he takes care of that kind of thing.

If he does not answer (it's vacation time), file a PR so that it won't be 
forgotten.

Johann



Re: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jakub Jelinek
On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
> It's quite necessary to solve the general problem in middle-end rather than 
> in back-end.

That's what we disagree on.  All back-ends but ARM are able to handle it
correctly; why can't ARM?  The ABI rules for stack handling in the epilogues
are simply too diverse and complex to be handled easily in the scheduler.

Jakub


Re: PATCH RFA: Build stages 2 and 3 with C++

2011-08-01 Thread Richard Guenther
2011/8/1 Marc Glisse :
> On Fri, 15 Jul 2011, Ian Lance Taylor wrote:
>
>> I would like to propose this patch as a step toward building gcc using a
>> C++ compiler.  This patch builds stage1 with the C compiler as usual,
>> and defaults to building stages 2 and 3 with a C++ compiler built during
>> stage 1.  This means that the gcc installed and used by most people will
>> be built by a C++ compiler.  This will ensure that gcc is fully
>> buildable with C++, while retaining the ability to bootstrap with only a
>> C compiler, not a C++ compiler.
>
> Nice step. Now that gcc can (mostly) build with g++, it would be great if it
> could build with a non-gnu compiler. More precisely, with a compiler that
> doesn't define __GNUC__. Indeed, the code is quite different in this case,
> as can be seen trying to compile gcc with CC='gcc -U__GNUC__' and CXX='g++
> -U__GNUC__' (there are other reasons why this won't work, but at least it
> shows some of the same issues I see with sunpro).
>
>
> To start with, the obstack_free macro casts a pointer to an int -> error.
> /data/repos/gcc/trunk/libcpp/directives.c:2048:7: error: cast from ‘char*’
> to ‘int’ loses precision [-fpermissive]
>
>
> Then, ENUM_BITFIELD(cpp_ttype) is expanded to unsigned int instead of the
> enum, and conversions from int to enum require an explicit cast in C++,
> giving many errors like:
> /data/repos/gcc/trunk/libcpp/charset.c:1615:79: error: invalid conversion
> from ‘unsigned int’ to ‘cpp_ttype’ [-fpermissive]
> /data/repos/gcc/trunk/libcpp/charset.c:1371:1: error:   initializing
> argument 5 of ‘bool cpp_interpret_string(cpp_reader*, const cpp_string*,
> size_t, cpp_string*, cpp_ttype)’ [-fpermissive]
>
> Do we want to add a cast in almost every place a field declared with
> ENUM_BITFIELD is used? That's quite a lot of places, everywhere in gcc...
> The alternative would be to store the full enum instead of a bitfield (just
> for stage1 so that's not too bad), but some comments in the code seem to
> advise against it.

I think it's the only viable solution (use the full enum for a non-GCC stage1
C++ compiler).  We could help it somewhat by at least placing
enum bitfields first/last in our bitfield groups.

Any other opinions?

Btw, thanks for trying non-GCC stage1 compilers ;)

Richard.
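A note on Marc's first error: the diagnostic suggests that the non-GNU branch of obstack_free casts the result of a pointer assignment to int so that both arms of a ?: have the same type. On an LP64 host that cast truncates, and C++ rejects it outright. The sketch below is a hypothetical, cut-down illustration of that pattern (struct arena and ARENA_FREE_TO are invented names, not libiberty's) showing a cast-free alternative: the assignment is sequenced with the comma operator and the arm then yields a plain int.

```c
#include <assert.h>

/* Hypothetical cut-down arena; not libiberty's struct obstack. */
struct arena
{
  char *chunk;      /* start of the current chunk */
  char *limit;      /* one past the end of the chunk */
  char *next_free;  /* first unused byte */
};

/* Problematic shape (as the error suggests): "(int) ((a)->next_free = ...)"
   squeezes a char * into an int so both arms of ?: are int.  That loses
   bits on LP64 hosts and is rejected by a C++ compiler.

   Cast-free shape: perform the assignment, then let the comma operator
   yield the int 0, so no pointer-to-int conversion happens.
   Evaluates to 0 on success, -1 if OBJ is outside the current chunk.  */
#define ARENA_FREE_TO(a, obj)                                      \
  (((char *) (obj) >= (a)->chunk && (char *) (obj) < (a)->limit)   \
   ? ((a)->next_free = (char *) (obj), 0)                          \
   : -1)
```

For example, with an arena spanning a 16-byte buffer, ARENA_FREE_TO (&a, buf + 4) evaluates to 0 and moves next_free back to buf + 4, while a pointer outside the chunk yields -1 and leaves the arena untouched.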


Re: PATCH RFA: Build stages 2 and 3 with C++

2011-08-01 Thread Joseph S. Myers
On Mon, 1 Aug 2011, Richard Guenther wrote:

> I think it's the only viable solution (use the full enum for a non-GCC stage1
> C++ compiler).  We could help it somewhat by at least placing
> enum bitfields first/last in our bitfield groups.

Are GCC and other compilers declaring that they support the GNU C and C++ 
languages by defining __GNUC__ really the only compilers with this 
extension?  Feature tests for particular features are generally better 
than testing for whether the compiler in use is GCC.  (Using configure 
tests for things in ansidecl.h does require checking where in the gcc and 
src repositories those things are used, to make sure that the relevant 
configure tests are used everywhere necessary.)

(Actually, C++03 appears to support enum bit-fields - it's only for C that 
they are a GNU extension - so can't we just enable them unconditionally 
when building as C++?)

-- 
Joseph S. Myers
jos...@codesourcery.com
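Joseph's suggestion amounts to choosing the bit-field type per language. A minimal sketch, using an invented macro name (SKETCH_ENUM_BITFIELD) and invented token kinds rather than the actual ansidecl.h and cpplib definitions: under C++ the field keeps its enumeration type, so passing it to a function that takes the enum needs no cast; only a non-GNU C compiler falls back to unsigned int, which C (unlike C++) converts to the enum implicitly.

```c
#include <assert.h>

#if defined (__cplusplus)
/* C++03 permits bit-fields of enumeration type. */
#define SKETCH_ENUM_BITFIELD(TYPE) enum TYPE
#elif defined (__GNUC__)
/* In C an enum bit-field is a GNU extension; __extension__ keeps
   -pedantic quiet. */
#define SKETCH_ENUM_BITFIELD(TYPE) __extension__ enum TYPE
#else
/* Portable C fallback: a plain unsigned int, which C converts to the
   enum implicitly when passed to an enum-typed parameter. */
#define SKETCH_ENUM_BITFIELD(TYPE) unsigned int
#endif

enum tok_kind { TK_EQ, TK_NOT, TK_NAME, TK_NUMBER };

struct tok
{
  SKETCH_ENUM_BITFIELD (tok_kind) kind : 8;
  unsigned int flags : 8;
};

/* Takes the enum by value, like the cpp_ttype argument in Marc's error. */
static int
tok_is_name (enum tok_kind k)
{
  return k == TK_NAME;
}

/* Stores K in the bit-field and reads it back through the enum-typed API.
   With the C++ branch this compiles without any cast. */
static int
stored_kind_is_name (enum tok_kind k)
{
  struct tok t;
  t.kind = k;
  t.flags = 0;
  return tok_is_name (t.kind);
}
```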


Re: PATCH RFA: Build stages 2 and 3 with C++

2011-08-01 Thread Richard Guenther
On Mon, Aug 1, 2011 at 11:53 AM, Joseph S. Myers
 wrote:
> On Mon, 1 Aug 2011, Richard Guenther wrote:
>
>> I think it's the only viable solution (use the full enum for a non-GCC stage1
>> C++ compiler).  We could help it somewhat by at least placing
>> enum bitfields first/last in our bitfield groups.
>
> Are GCC and other compilers declaring that they support the GNU C and C++
> languages by defining __GNUC__ really the only compilers with this
> extension?  Feature tests for particular features are generally better
> than testing for whether the compiler in use is GCC.  (Using configure
> tests for things in ansidecl.h does require checking where in the gcc and
> src repositories those things are used, to make sure that the relevant
> configure tests are used everywhere necessary.)
>
> (Actually, C++03 appears to support enum bit-fields - it's only for C that
> they are a GNU extension - so can't we just enable them unconditionally
> when building as C++?)

Oh, sure - that's even better.

Richard.


Re: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Joseph S. Myers
On Mon, 1 Aug 2011, Jakub Jelinek wrote:

> On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
> > It's quite necessary to solve the general problem in middle-end rather than 
> > in back-end.
> 
> That's what we disagree on.  All back-ends but ARM are able to handle it
> right, why can't ARM too?  The ABI rules for stack handling in the epilogues
> are simply too diverse and complex to be handled easily in the scheduler.

Given that the long-standing open bugs relating to scheduling and stack 
adjustments (30282, 38644) include issues for Power that do not yet appear 
to have been fixed, even if other back ends are able to handle it right 
they don't seem to do so at present.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PATCH RFA: Build stages 2 and 3 with C++

2011-08-01 Thread Marc Glisse

On Mon, 1 Aug 2011, Joseph S. Myers wrote:


On Mon, 1 Aug 2011, Richard Guenther wrote:


I think it's the only viable solution (use the full enum for a non-GCC stage1
C++ compiler).  We could help it somewhat by at least placing
enum bitfields first/last in our bitfield groups.


Are GCC and other compilers declaring that they support the GNU C and C++
languages by defining __GNUC__ really the only compilers with this
extension?  Feature tests for particular features are generally better
than testing for whether the compiler in use is GCC.  (Using configure
tests for things in ansidecl.h does require checking where in the gcc and
src repositories those things are used, to make sure that the relevant
configure tests are used everywhere necessary.)


I just checked, and indeed sunpro supports this extension as well in C.


(Actually, C++03 appears to support enum bit-fields - it's only for C that
they are a GNU extension - so can't we just enable them unconditionally
when building as C++?)


Great, I didn't know that. That's a much better solution.

--
Marc Glisse


RE: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jiangning Liu
The answer is that ARM can. However, if you look into bugs PR30282,
PR38644 and PR44199, you will find that, historically, several different
cases in different ports have reported similar failures, covering x86,
PowerPC and ARM. You are right that they were all fixed in the back-ends
in the past, but we should fix the bug in a general way to make the GCC
infrastructure stronger, rather than fixing the problem target-by-target
and case-by-case! If you look further into the back-end fixes for x86 and
PowerPC, you will find that they look quite similar.

Thanks,
-Jiangning

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
> On Behalf Of Jakub Jelinek
> Sent: Monday, August 01, 2011 5:12 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; gcc@gcc.gnu.org; gcc-patc...@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
> 
> On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
> > It's quite necessary to solve the general problem in middle-end rather
> > than in back-end.
> 
> That's what we disagree on.  All back-ends but ARM are able to handle it
> right, why can't ARM too?  The ABI rules for stack handling in the epilogues
> are simply too diverse and complex to be handled easily in the scheduler.
> 
>   Jakub






Re: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Richard Earnshaw
On 01/08/11 10:11, Jakub Jelinek wrote:
> On Mon, Aug 01, 2011 at 11:44:04AM +0800, Jiangning Liu wrote:
>> It's quite necessary to solve the general problem in middle-end rather than 
>> in back-end.
> 
> That's what we disagree on.  All back-ends but ARM are able to handle it
> right, why can't ARM too?  The ABI rules for stack handling in the epilogues
> are simply too diverse and complex to be handled easily in the scheduler.

Because the vast majority of back-ends (ie those without red zones)
shouldn't have to deal with this mess.  This is something the MI code
should be able to work out and deal with itself.  Then the compiler will
at least generate safe code, rather than randomly moving things about
and allowing potentially unsafe writes/reads from beyond the allocated
stack region.

We should build the compiler defensively, but then allow for more
aggressive optimizations to disable the defences when the back-end wants
to take on the responsibility.  Not the other way around.

R.



Re: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jakub Jelinek
On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote:
> ARM. You are right, they were all fixed in back-ends in the past, but we
> should fix the bug in a general way to make GCC infrastructure stronger,
> rather than fixing the problem target-by-target and case-by-case! If you
> further look into the back-end fixes in x86 and PowerPC, you may find they
> looks quite similar in back-ends.
> 

Red zone is only one difficulty; your patch, e.g., completely ignores the
existence of biased stack pointers (SPARC -m64 has them, for instance).
Some targets have the stack growing in the opposite direction, etc.
We really have a huge number of very diverse ABIs, and teaching the
middle-end to recognize an invalid stack access is difficult.

Jakub
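Jakub's point about biased stack pointers can be made concrete. In the 64-bit SPARC V9 ABI the %sp register holds the real stack pointer minus a bias of 2047, so a check that compares addresses against the raw register value misjudges which stack slots are still allocated. A minimal sketch, assuming a downward-growing stack; effective_sp and slot_is_allocated are hypothetical helpers, not GCC interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* 64-bit SPARC V9: real stack pointer = value in %sp + 2047. */
#define SPARC_V9_STACK_BIAS 2047L

/* Real top of stack for a given register value and ABI bias. */
static long
effective_sp (long sp_reg, long stack_bias)
{
  return sp_reg + stack_bias;
}

/* Downward-growing stack: addresses at or above the real stack pointer
   are still allocated; anything below it has been released. */
static bool
slot_is_allocated (long sp_reg, long stack_bias, long addr)
{
  return addr >= effective_sp (sp_reg, stack_bias);
}
```

With %sp holding 10000 - 2047, a bias-unaware call (bias 0) reports address 9000 as live, while the bias-aware call correctly reports it as already released.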


Re: Defining constraint for registers tuple

2011-08-01 Thread Kirill Yukhin
> Don't change the constraint, just add an alternative.  Or use a
> different insn with an insn predicate.

This is a misunderstanding because of my great English :)

I am not going to update the existing constraint; I am going to implement a new one.
Actually, I am looking for an example where a similar constraint
has already been implemented.

--
Thanks, K


Re: Re: patch: don't issue -Wreorder warnings when order doesn't matter

2011-08-01 Thread Daniel Marjamäki
> What if the object being constructed has only POD-type members with constant
> initializers but is declared volatile

I don't really understand... but it doesn't matter, I give up.


Re: Performance degradation on g++ 4.6

2011-08-01 Thread Oleg Smolsky

Hi Benjamin,

On 2011/7/30 06:22, Benjamin Redelings I wrote:

I had some performance degradation with 4.6 as well.

However, I was able to cure it by using -finline-limit=800 or 1000, I 
think.  However, this led to a code size increase.  Were the old 
higher-performance binaries larger?

Yes, the older binary for the degraded test was indeed larger: 107K vs 88K.

However, I have just rebuilt and re-run the test, and there was no 
significant difference in performance, i.e. the degradation in the 
"simple_types_constant_folding" test remains when building with 
-finline-limit=800 (or =1000).


IIRC, setting -finline-limit=n actually sets two params to n/2, but I 
think you may only need to change one to get the old performance back.  
--param max-inline-insns-single defaults to 450, but --param 
max-inline-insns-auto defaults to 90.  Perhaps you can get the old 
performance back by adjusting just one of these two parameters, or by 
setting them to different values, instead of the same value, as would 
be achieved by -finline-limit.
"--param max-inline-insns-auto=800" by itself does not help. The 
"--param max-inline-insns-single=800 --param max-inline-insns-auto=1000" 
combination makes no significant difference either.


BTW, some of these tweaks increase the binary size to 99K, yet there is 
no performance increase.


Oleg.


Re: Performance degradation on g++ 4.6

2011-08-01 Thread Marc Glisse

On Mon, 1 Aug 2011, Oleg Smolsky wrote:

BTW, some of these tweaks increase the binary size to 99K, yet there is no 
performance increase.


I don't see this in the thread: did you use -march=native?

--
Marc Glisse


Do I need some Python stuff to build trunk as of 177065 ?

2011-08-01 Thread Toon Moene

See:

http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html

Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: Do I need some Python stuff to build trunk as of 177065 ?

2011-08-01 Thread Marc Glisse

On Mon, 1 Aug 2011, Toon Moene wrote:


See:

http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html


Er, the python thing only tells you your system has a broken symlink but 
ignores it. Did you check in libgcc/config.log for the real error?


--
Marc Glisse


Re: Do I need some Python stuff to build trunk as of 177065 ?

2011-08-01 Thread Toon Moene

On 08/01/2011 08:45 PM, Marc Glisse wrote:


On Mon, 1 Aug 2011, Toon Moene wrote:


See:

http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html


Er, the python thing only tells you your system has a broken symlink but
ignores it. Did you check in libgcc/config.log for the real error?


Oops - sorry for the noise.

The relevant config.log says:

cc1: error: unrecognized command line option '-mnolzcnt'

Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: Do I need some Python stuff to build trunk as of 177065 ?

2011-08-01 Thread Toon Moene

On 08/01/2011 08:56 PM, Toon Moene wrote:


On 08/01/2011 08:45 PM, Marc Glisse wrote:


On Mon, 1 Aug 2011, Toon Moene wrote:


See:

http://gcc.gnu.org/ml/gcc-testresults/2011-08/msg00117.html


Er, the python thing only tells you your system has a broken symlink but
ignores it. Did you check in libgcc/config.log for the real error?


Oops - sorry for the noise.

The relevant config.log says:

cc1: error: unrecognized command line option '-mnolzcnt'


Could this, perchance, have been caused by revision 177034?

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: IRA vs CANNOT_CHANGE_MODE_CLASS, + 4.7 IRA regressions?

2011-08-01 Thread Sandra Loosemore

On 07/29/2011 12:13 PM, Vladimir Makarov wrote:

On 07/27/2011 05:59 PM, Sandra Loosemore wrote:


[snip]

So, here's my question. Is it worthwhile for me to continue this
approach of trying to make the MIPS backend smarter? Or is the way IRA
deals with CANNOT_CHANGE_MODE_CLASS fundamentally broken and in need
of fixing in a target-inspecific way? And/or is there some other
regression in IRA on mainline that's causing it to spill to memory
when it didn't used to in 4.6?


I think the second ("fixing in a target-inspecific way"). Instead of
prohibiting a class for a pseudo (that is what is happening for class
FP_REGS) because the pseudo can change its mode, the impossibility of
changing mode should be reflected in the class cost (through some reload
cost evaluation).

I'll try to fix it. The only problem is that it will take some time,
because the fix should be tested on a few platforms. It would be nice to
file a PR so that we don't forget about the problem.


Thanks for offering to look into this.  I created

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49936

with my test case and WIP patch for the MIPS backend.  At this point I'm 
thinking that the additional memory spills I'm seeing on mainline are 
not related to CANNOT_CHANGE_MODE_CLASS at all but are just some other 
regression in the register allocator compared to 4.6.  It might be 
useful to try to confirm/isolate that problem first.


-Sandra
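Vladimir's proposal, replacing an outright class prohibition with a cost, can be illustrated with a toy model. Everything here is hypothetical: the class names, the costs and the flat "mode change penalty" merely stand in for whatever reload-cost evaluation IRA would actually perform.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum toy_class { TOY_GR_REGS, TOY_FP_REGS, TOY_MEM, TOY_N_CLASSES };

struct toy_class_info
{
  int base_cost;        /* cost of keeping the pseudo in this class */
  bool mode_change_ok;  /* may the pseudo change mode while in it? */
};

/* Prohibiting model: a class that cannot support the mode change is
   simply not a candidate. */
static int
cost_prohibit (const struct toy_class_info *c)
{
  return c->mode_change_ok ? c->base_cost : INT_MAX;
}

/* Cost-based model: the class stays a candidate but pays an extra cost
   standing in for the reloads needed around the mode change. */
static int
cost_penalize (const struct toy_class_info *c, int mode_change_penalty)
{
  return c->base_cost + (c->mode_change_ok ? 0 : mode_change_penalty);
}

/* Pick the cheapest class; PENALTY < 0 selects the prohibiting model. */
static enum toy_class
pick_class (const struct toy_class_info info[TOY_N_CLASSES], int penalty)
{
  enum toy_class best = TOY_MEM;
  int best_cost = INT_MAX;

  for (int i = 0; i < TOY_N_CLASSES; i++)
    {
      int cost = penalty < 0 ? cost_prohibit (&info[i])
                             : cost_penalize (&info[i], penalty);
      if (cost < best_cost)
        {
          best_cost = cost;
          best = (enum toy_class) i;
        }
    }
  return best;
}
```

With costs {GR 20, FP 2 but no mode change, MEM 12}, the prohibiting model spills to memory, a small penalty lets FP_REGS win, and a large penalty falls back to memory; the cost model makes that trade-off explicit instead of ruling the class out.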


libgcc: strange optimization

2011-08-01 Thread Michael Walle
Hi list,


consider the following test code:
 static void inline f1(int arg)
 {
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm("scall" : : "r"(a1), "r"(a2));
 }

 void f2(int arg)
 {
   f1(arg >> 10);
 }


If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
email), the a1 = 10; assignment is optimized away. According to my
understanding the following happens:

 1) function inlining
 2) deferred argument evaluation
 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
function call to libgcc's __ashrsi3 (_in place_!)
 4) BAM! dead code elimination optimizes the r8 assignment away because calli
may clobber r1-r10 (caller-saved registers on lm32).

If you use:
 void f2(int arg)
 {
   f1(__ashrsi3(arg, 10));
 }
everything works as expected: __ashrsi3 is evaluated before the body of f1.

According to Wikipedia [1], function calls are sequence points, and all
side effects of the arguments are completed before the function is entered.
So in my understanding, the deferred argument evaluation is wrong if that
operation is emitted as a call to a libgcc helper.

I tried that on other architectures too (microblaze and avr). All show the
same behaviour. If an integer arithmetic opcode is translated to a call to
libgcc, every assignment to a register which is clobbered by the call is
optimized away.

The GCC manual mentions some caveats when using explicit register variables [2]:
  In the above example, beware that a register that is call-clobbered by
  the target ABI will be overwritten by any function call in the
  assignment, including library calls for arithmetic operators. Also a
  register may be clobbered when generating some operations, like variable
  shift, memory copy or memory move on x86. Assuming it is a call-clobbered
  register, this may happen to r0 above by the assignment to p2. If you
  have to use such a register, use temporary variables for expressions
  between the register assignment.

But I think this may not apply to the case above, where the arithmetic
operator is an argument of the called function, i.e. there is a sequence
point and the statements must not be reordered.


Assembler output (lm32-gcc -O1 -S -c test.c):
f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

Assembler output with no DCE (lm32-gcc -O1 -S -fno-dce -c test.c):
f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r8, r0, 10
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

[1] http://en.wikipedia.org/wiki/Sequence_point
[2] http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Example%20of%20asm%20with%20clobbered%20asm%20reg

-- 
Michael


Re: libgcc: strange optimization

2011-08-01 Thread Georg-Johann Lay

Michael Walle schrieb:

Hi list,

consider the following test code:
 static void inline f1(int arg)
 {
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm("scall" : : "r"(a1), "r"(a2));
 }

 void f2(int arg)
 {
   f1(arg >> 10);
 }


If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
email), the a1 = 10; assignment is optimized away.


Your asm has no output operands and no side effects; with more 
aggressive optimization the whole asm would disappear.


What you want is maybe something like

   asm volatile ("scall" : : "r"(a1), "r"(a2));

Johann


Re: libgcc: strange optimization

2011-08-01 Thread Michael Walle

Hi,

That was quick :)

> Your asm has no output operands and no side effects, with more
> aggressive optimization the whole ask would disappear.
Sorry, that was just a small test file, the original code has output operands.

The new test code:
 static int inline f1(int arg)
 {
   register int ret asm("r1");
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

   return ret;
 }

 int f2(int arg1, int arg2)
 {
   return f1(arg1 >> 10);
 }

translates to the same assembly:
f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

PS. R1 is the return register in the target architecture ABI.

-- 
Michael


Re: libgcc: strange optimization

2011-08-01 Thread Richard Henderson
On 08/01/2011 01:30 PM, Michael Walle wrote:
>  1) function inlining
>  2) deferred argument evaluation
>  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
> function call to libgcc's __ashrsi3 (_in place_!)
>  4) BAM! dead code elimination optimizes r8 assignment away because calli
> may clobber r1-r10 (callee saved registers on lm32).

I'm afraid the only solution I can think of is to force F1 out-of-line.
That's the only safe way to make sure that arguments are completely
evaluated before forcing them into hard register variables.

Alternately, expose new constraints such that you don't need the
hard register variables at all.  E.g.

  asm("scall" : : "R08"(a1), "R01"(a2));

where Rxx is defined in constraints.md for every relevant register.
That'll prevent a reference to the hard register until register
allocation, at which point we'll have done the right thing with
the shift.


r~
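Richard's first suggestion, forcing F1 out of line, might look like the following. This is a sketch based on Michael's second test case (with the unused second parameter of f2 dropped): the lm32 branch only adds __attribute__ ((noinline)), so the shift and its __ashrsi3 call are completed in f2 before f1 loads the hard register variables. The #else branch is a hypothetical host stand-in so the file compiles elsewhere; it makes no system call.

```c
#include <assert.h>

#if defined (__lm32__)
/* Out of line: the caller fully evaluates ARG (including any libgcc
   helper call such as __ashrsi3) before f1 starts loading r8 and r1. */
static int __attribute__ ((noinline))
f1 (int arg)
{
  register int ret asm ("r1");
  register int a1 asm ("r8") = 10;
  register int a2 asm ("r1") = arg;

  asm volatile ("scall" : "=r" (ret) : "r" (a1), "r" (a2) : "memory");

  return ret;
}
#else
/* Hypothetical host stand-in: no system call, just returns ARG so the
   call structure of the sketch can be exercised off-target. */
static int __attribute__ ((noinline))
f1 (int arg)
{
  return arg;
}
#endif

int
f2 (int arg1)
{
  return f1 (arg1 >> 10);
}
```

The cost is a real call per use; Richard's alternative of per-register constraints avoids that but requires back-end changes in constraints.md.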


Re: Line 0 Hack??

2011-08-01 Thread Gabriel Charette
Re-sending as plain text for gcc@gcc.gnu.org ...



Hi,

I have a question about the line 0 hack on line 13232 of gcc/cp/decl.c
(or just text search for "Hack", it's the only place it's found in
that file...).

From my revision history, Steven introduced this in 2005, and Tom
modified it in 2007 (probably when modifying the linemap).

The problem is that this very call to get the source_location of line
0 creates a NEW linemap entry in the line table, AFTER all of the
LC_LEAVE have taken place (i.e. we were done parsing and now add a new
linemap to the line_table)...

And hence, we finish the parsing with line_table->depth == 1.

In particular, I am building linemap serialization for pre-parsed
headers, and added what I think to be a fair gcc_assert that when we
serialize the line_table, it's depth should be 0. However, if we
happen to have "main" in the header (this is when we get in this block
in decl.c from my understanding as DECL_MAIN_P is then true), a new
linemap is added at the end of the line table in the header after the
LC_LEAVE... when merging this in the middle of a C file upon
deserializing, this entry makes no sense, but I can't just ignore it
either as a source_location has been handed off for it...

My question is: what is this "special line 0"? Is it just a hack for
this particular situation, or is "line 0" a much more important concept
I can't mess around with?

I could potentially hack around it, but hacking around a hack can only
make things worse in the long run.

I'm not familiar with the "middle end warning" referred to in the
comment in decl.c; could we potentially get rid of this hack once and
for all?

Best,
Gabriel


[x32] Allow R_X86_64_64

2011-08-01 Thread H.J. Lu
Hi,

It turns out that x32 needs R_X86_64_64.  One major reason is that
the displacement range on x32 is -2G to +2G.  This isn't a problem
for the compiler, since only the small model is required for x32.

However, to address 0 to 4G directly in assembly code, we have
to use R_X86_64_64 with movabs.  I am checking the following patch
into the x32 psABI to allow R_X86_64_64.


-- 
H.J.
diff --git a/object-files.tex b/object-files.tex
index 3c9b9c6..7f0fd14 100644
--- a/object-files.tex
+++ b/object-files.tex
@@ -451,7 +451,7 @@ or \texttt{Elf32_Rel} relocation.
   \multicolumn{1}{c}{Calculation} \\
   \hline
   \texttt{R_X86_64_NONE}  & 0 & none & none \\
-  \texttt{R_X86_64_64} $^\dagger$ & 1 & \textit{word64} & \texttt{S + A} \\
+  \texttt{R_X86_64_64} & 1 & \textit{word64} & \texttt{S + A} \\
   \texttt{R_X86_64_PC32}  & 2 & \textit{word32} & \texttt{S + A - P} \\
   \texttt{R_X86_64_GOT32} & 3 & \textit{word32} & \texttt{G + A} \\
   \texttt{R_X86_64_PLT32} & 4 & \textit{word32} & \texttt{L + A - P} \\
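The table row in the patch says R_X86_64_64 stores S + A into a 64-bit word. A minimal sketch of why movabs needs it on x32, assuming little-endian output and using hypothetical helper names: an address in the upper half of the 0-4G range does not fit a sign-extended 32-bit displacement (the -2G to +2G range H.J. mentions), but the 64-bit field of R_X86_64_64 holds it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 32-bit displacement is sign-extended, so it only reaches
   addresses in [-2G, 2G). */
static bool
fits_sign_extended_disp32 (uint64_t addr)
{
  int64_t v = (int64_t) addr;
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* R_X86_64_64 (type 1): field is word64, value is S + A.
   Written byte by byte in little-endian order, as on x86-64. */
static void
apply_r_x86_64_64 (unsigned char site[8], uint64_t S, int64_t A)
{
  uint64_t value = S + (uint64_t) A;
  for (int i = 0; i < 8; i++)
    site[i] = (unsigned char) (value >> (8 * i));
}
```

For example, S = 0x80000000 with A = 0x10 cannot be reached through a 32-bit displacement, while applying R_X86_64_64 writes the bytes 10 00 00 80 00 00 00 00.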


A case that PRE optimization hurts performance

2011-08-01 Thread Jiangning Liu
Hi,

For the following simple test case, PRE hoists the computation
(s!=1) into the default branch of the switch statement, which ultimately
causes very poor code generation. This problem occurs on both x86 and ARM,
and I believe it affects other targets as well.

int f(char *t) {
    int s = 0;

    while (*t && s != 1) {
        switch (s) {
        case 0:
            s = 2;
            break;
        case 2:
            s = 1;
            break;
        default:
            if (*t == '-')
                s = 1;
            break;
        }
        t++;
    }

    return s;
}

Taking x86 as an example, with "-O2" you get the following 52
instructions:

<f>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   57                      push   %edi
   6:   56                      push   %esi
   7:   53                      push   %ebx
   8:   8b 55 08                mov    0x8(%ebp),%edx
   b:   0f b6 0a                movzbl (%edx),%ecx
   e:   84 c9                   test   %cl,%cl
  10:   74 50                   je     62
  12:   83 c2 01                add    $0x1,%edx
  15:   85 c0                   test   %eax,%eax
  17:   75 23                   jne    3c
  19:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  20:   0f b6 0a                movzbl (%edx),%ecx
  23:   84 c9                   test   %cl,%cl
  25:   0f 95 c0                setne  %al
  28:   89 c7                   mov    %eax,%edi
  2a:   b8 02 00 00 00          mov    $0x2,%eax
  2f:   89 fb                   mov    %edi,%ebx
  31:   83 c2 01                add    $0x1,%edx
  34:   84 db                   test   %bl,%bl
  36:   74 2a                   je     62
  38:   85 c0                   test   %eax,%eax
  3a:   74 e4                   je     20
  3c:   83 f8 02                cmp    $0x2,%eax
  3f:   74 1f                   je     60
  41:   80 f9 2d                cmp    $0x2d,%cl
  44:   74 22                   je     68
  46:   0f b6 0a                movzbl (%edx),%ecx
  49:   83 f8 01                cmp    $0x1,%eax
  4c:   0f 95 c3                setne  %bl
  4f:   89 df                   mov    %ebx,%edi
  51:   84 c9                   test   %cl,%cl
  53:   0f 95 c3                setne  %bl
  56:   89 de                   mov    %ebx,%esi
  58:   21 f7                   and    %esi,%edi
  5a:   eb d3                   jmp    2f
  5c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
  60:   b0 01                   mov    $0x1,%al
  62:   5b                      pop    %ebx
  63:   5e                      pop    %esi
  64:   5f                      pop    %edi
  65:   5d                      pop    %ebp
  66:   c3                      ret
  67:   90                      nop
  68:   b8 01 00 00 00          mov    $0x1,%eax
  6d:   5b                      pop    %ebx
  6e:   5e                      pop    %esi
  6f:   5f                      pop    %edi
  70:   5d                      pop    %ebp
  71:   c3                      ret

But with "-O2 -fno-tree-pre", only 12 instructions are generated, and
the code is very clean:

<f>:
   0:   55                      push   %ebp
   1:   31 c0                   xor    %eax,%eax
   3:   89 e5                   mov    %esp,%ebp
   5:   8b 55 08                mov    0x8(%ebp),%edx
   8:   80 3a 00                cmpb   $0x0,(%edx)
   b:   74 0e                   je     1b
   d:   80 7a 01 00             cmpb   $0x0,0x1(%edx)
  11:   b0 02                   mov    $0x2,%al
  13:   ba 01 00 00 00          mov    $0x1,%edx
  18:   0f 45 c2                cmovne %edx,%eax
  1b:   5d                      pop    %ebp
  1c:   c3                      ret

Do you have any idea about this?

Thanks,
-Jiangning
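Tracing the loop by hand suggests why the -fno-tree-pre code is so short: s only ever takes the values 0, 2 and 1, so the default branch is unreachable and the result depends only on the first two characters. A sketch that checks this against the original test case; f_closed_form is a name introduced here for illustration, written to match what the 12-instruction listing computes.

```c
#include <assert.h>

/* Original test case from the report. */
int
f (char *t)
{
  int s = 0;

  while (*t && s != 1)
    {
      switch (s)
        {
        case 0:
          s = 2;
          break;
        case 2:
          s = 1;
          break;
        default:            /* unreachable: s is only ever 0, 2 or 1 */
          if (*t == '-')
            s = 1;
          break;
        }
      t++;
    }

  return s;
}

/* What the -fno-tree-pre code computes:
   empty string -> 0, one character -> 2, two or more -> 1. */
int
f_closed_form (const char *t)
{
  if (t[0] == '\0')
    return 0;
  return t[1] != '\0' ? 1 : 2;
}
```

The equivalence is what makes the 52-instruction -O2 output look so costly: the extra setne/and sequences keep re-testing s != 1 for a default branch that can never run.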





Re: Line 0 Hack??

2011-08-01 Thread Gabriel Charette
I have removed the hack and the test output is identical to the clean
build test output.

See issue4835047 for the patch.

Gabriel

On Mon, Aug 1, 2011 at 2:56 PM, Gabriel Charette  wrote:
> Re-sending as plain text for gcc@gcc.gnu.org ...
>
> 
>
> Hi,
>
> I have a question about the line 0 hack on line 13232 of gcc/cp/decl.c
> (or just text search for "Hack", it's the only place it's found in
> that file...).
>
> From my revision history, Steven introduced this in 2005, and Tom
> modified it in 2007 (probably when modifying the linemap).
>
> The problem is that this very call to get the source_location of line
> 0 creates a NEW linemap entry in the line table, AFTER all of the
> LC_LEAVE have taken place (i.e. we were done parsing and now add a new
> linemap to the line_table)...
>
> And hence, we finish the parsing with line_table->depth == 1.
>
> In particular, I am building linemap serialization for pre-parsed
> headers, and added what I think to be a fair gcc_assert that when we
> serialize the line_table, it's depth should be 0. However, if we
> happen to have "main" in the header (this is when we get in this block
> in decl.c from my understanding as DECL_MAIN_P is then true), a new
> linemap is added at the end of the line table in the header after the
> LC_LEAVE... when merging this in the middle of a C file upon
> deserializing, this entry makes no sense, but I can't just ignore it
> either as a source_location has been handed off for it...
>
> My question is, what is this "special line 0" is this just a hack for
> this particular situation or is "line 0" a much more important concept
> I can't mess around with?
>
> I could potentially hack around it, but hacking around a hack can only
> make things worst in the long run..
>
> I'm not familiar with the "middle end warning" referred to in the
> comment in decl.c, could we potentially once and for all get rid of
> this hack?
>
> Best,
> Gabriel
>


RE: [RFC] Add middle end hook for stack red zone size

2011-08-01 Thread Jiangning Liu
Hi Jakub,

I appreciate your valuable comments!

I think the SPARC V9 ABI doesn't define a red zone, right? So
stack_red_zone_size should be defined as zero by default, and the scheduler
would block moving memory accesses across a stack adjustment no matter what
the offset is. I don't see any risk here. Also, my patch uses the function
*abs* to avoid the opposite-stack-direction issue you mentioned.

Some people, like you, insist on ABI diversity, and I actually agree with
you on this. But part of the ABI definition is common to all targets. The
point here is that memory accesses beyond the stack red zone should be
avoided, which is the general part of the ABI that the compiler should
guarantee. For this general part, the middle end should take
responsibility.

Thanks,
-Jiangning

> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Monday, August 01, 2011 6:31 PM
> To: Jiangning Liu
> Cc: 'Joern Rennecke'; gcc@gcc.gnu.org; gcc-patc...@gcc.gnu.org;
> vmaka...@redhat.com; dje@gmail.com; Richard Henderson; Ramana
> Radhakrishnan; 'Ramana Radhakrishnan'
> Subject: Re: [RFC] Add middle end hook for stack red zone size
> 
> On Mon, Aug 01, 2011 at 06:14:27PM +0800, Jiangning Liu wrote:
> > ARM. You are right, they were all fixed in back-ends in the past, but we
> > should fix the bug in a general way to make GCC infrastructure stronger,
> > rather than fixing the problem target-by-target and case-by-case! If you
> > further look into the back-end fixes in x86 and PowerPC, you may find they
> > look quite similar in back-ends.
> >
> 
> Red zone is only one difficulty; your patch, e.g., completely ignores the
> existence of biased stack pointers (e.g. SPARC -m64 has them).
> Some targets have stacks growing in the opposite direction, etc.
> We have a really huge number of very diverse ABIs, and making the
> middle-end grok what is an invalid stack access is difficult.
> 
>   Jakub
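To make the disagreement concrete, here is a minimal C sketch (not taken from the patch) of what a target-independent validity check would have to know. The struct and function names are hypothetical and are not GCC target hooks; it assumes that the stack-pointer register plus a constant bias gives the boundary of the allocated stack. A red-zone size of 0 rejects every access past that boundary, as Jiangning describes, while the bias and growth-direction fields are the extra ABI details Jakub points out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the stack-related parts of an ABI.
   The fields are illustrative only; they are not GCC target hooks. */
struct stack_abi
{
  long red_zone_size;  /* bytes past the stack boundary that stay valid (0 = none) */
  long stack_bias;     /* added to the SP register to get the real boundary */
  bool grows_downward; /* true if allocating stack lowers the stack pointer */
};

/* Return true if an access of SIZE bytes at ADDR is still valid when the
   stack-pointer register holds SP_REG, i.e. the access lies either in the
   allocated stack or in the red zone just beyond it. */
static bool
stack_access_valid_p (const struct stack_abi *abi, long sp_reg,
                      long addr, long size)
{
  long boundary = sp_reg + abi->stack_bias; /* real edge of the allocated stack */

  assert (size > 0);
  if (abi->grows_downward)
    /* Allocated stack is at and above BOUNDARY; red zone is just below it. */
    return addr >= boundary - abi->red_zone_size;

  /* Allocated stack is below BOUNDARY; red zone is just above it. */
  return addr + size <= boundary + abi->red_zone_size;
}
```

For example, with `{128, 0, true}` (roughly the x86-64 SysV layout, which defines a 128-byte red zone) an 8-byte access 16 bytes below the stack pointer is accepted, but the same access is rejected with a zero red zone. With `{0, 2047, true}` (64-bit SPARC uses a stack bias of 2047) an access that sits above the raw %sp value can still lie outside the allocated stack, which a check ignoring the bias would miss.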






Re: Performance degradation on g++ 4.6

2011-08-01 Thread Xinliang David Li
Try isolating the int8_t constant folding test from the rest to see
if the slowdown can be reproduced with the isolated case. If the
problem disappears, it is likely due to the following inline
parameters:

large-function-insns, large-function-growth, large-unit-insns,
inline-unit-growth. For instance, set

--param large-function-insns=1
--param large-unit-insns=2
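The isolation step suggested above could look roughly like the C harness below. It is only a sketch: the real int8_t constant folding tests in Oleg's suite are C++ and are not reproduced in this thread, so the kernel is a hypothetical stand-in, and the empty `__asm__` barrier is a GCC/Clang extension used to stop the compiler from hoisting the repeated kernel call out of the timing loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one "int8_t constant folding" test:
   the literal constants 2 + 3 + 5 are expected to fold into 10. */
static long
int8_fold_kernel (const int8_t *data, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; ++i)
    sum += data[i] + 2 + 3 + 5;
  return sum;
}

/* Run the kernel REPEATS times in isolation and return the CPU time in
   seconds; the accumulated result goes to *CHECKSUM so the work is not dead. */
static double
time_int8_fold (const int8_t *data, size_t n, size_t repeats, long *checksum)
{
  long acc = 0;
  clock_t start = clock ();

  for (size_t r = 0; r < repeats; ++r)
    {
      acc += int8_fold_kernel (data, n);
      /* GCC/Clang compiler barrier: memory may have changed, so the
         loop-invariant kernel call cannot be hoisted out of the loop. */
      __asm__ __volatile__ ("" : : : "memory");
    }

  clock_t end = clock ();
  assert (start != (clock_t) -1 && end != (clock_t) -1);
  *checksum = acc;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Building the same file with both compilers, and again with the `--param` settings above, and comparing the reported times would indicate whether the slowdown survives in isolation; if it does not, the inlining limits listed above become the likelier suspect.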

David

On Mon, Aug 1, 2011 at 11:43 AM, Oleg Smolsky  wrote:
> On 2011/7/29 14:07, Xinliang David Li wrote:
>>
>> Profiling tools are your best friend here. If you don't have access to
>> any, the least you can do is to build the program with -pg option and
>> use gprof tool to find out differences.
>
> The test suite has a bunch of very basic C++ tests that are executed an
> enormous number of times. I've built one with the obvious performance
> degradation and attached the source, output and reports.
>
> Here are some highlights:
>    v4.1:    Total absolute time for int8_t constant folding: 30.42 sec
>    v4.6:    Total absolute time for int8_t constant folding: 43.32 sec
>
> Every one of the tests in this section had degraded... the first half more
> than the second. I am not sure how much further I can take this - the
> benchmarked code is very short and plain. I can post disassembly for one
> (some?) of them if anyone is willing to take a look...
>
> Thanks,
> Oleg.
>


Re: libgcc: strange optimization

2011-08-01 Thread Hans-Peter Nilsson
On Mon, 1 Aug 2011, Georg-Johann Lay wrote:
> Michael Walle schrieb:
> > Hi list,
> >
> > consider the following test code:
> >  static void inline f1(int arg)
> >  {
> >      register int a1 asm("r8") = 10;
> >      register int a2 asm("r1") = arg;
> >
> >      asm("scall" : : "r"(a1), "r"(a2));
> >  }
> >
> >  void f2(int arg)
> >  {
> >      f1(arg >> 10);
> >  }
> >
> >
> > If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
> > email), the a1 = 10; assignment is optimized away.
>
> Your asm has no output operands and no side effects; with more aggressive
> optimization the whole asm would disappear.

No, for the record that's not supposed to happen for asms
*without outputs*.

"If an @code{asm} has output operands, GCC assumes for
optimization purposes the instruction has no side effects except
to change the output operands."

> What you want is maybe something like
>
>asm volatile ("scall" : : "r"(a1), "r"(a2));

For the code at hand, the scall should be described to both have
an output and be marked volatile, since the system call is a
side effect that GCC can't see and might otherwise optimize away
if the system call return value is unused.  A plain volatile
marking as the above should not be necessary, modulo gcc bugs.
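A minimal sketch of the form described above, assuming the lm32 register assignments from this thread (call number 10 in r8, argument and result in r1) and assuming the lm32 compiler predefines `__lm32__`. The full set of registers that `scall` clobbers is not given in the thread, so only the "memory" clobber from the test case is listed; the function name is hypothetical, and the `#else` branch is a stand-in so the sketch also compiles on other hosts.

```c
#include <assert.h>

/* Hypothetical wrapper: the asm has an output (r1 is read back as the
   result) and is volatile, because the system call has side effects
   that GCC cannot see. */
static inline int
scall10 (int arg)
{
#if defined (__lm32__)
  register int a1 __asm__ ("r8") = 10;   /* system call number */
  register int a2 __asm__ ("r1") = arg;  /* argument in, result out */

  __asm__ __volatile__ ("scall" : "+r" (a2) : "r" (a1) : "memory");
  return a2;
#else
  /* Host stand-in so the sketch compiles elsewhere: no system call. */
  return arg;
#endif
}
```

This only covers the "output plus volatile" part; it does not by itself address the clobbering problem discussed in the rest of the thread, where a caller's argument expression needs a libgcc call.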

The real problem is quite worrisome.  I don't think a port
(lm32) should have to solve it with constraints; the (inline)
function parameter *should* cause a non-clobbering temporary to
hold any intermediate operations, but it looks as if you'll
otherwise have to debug it yourself.

brgds, H-P


Re: libgcc: strange optimization

2011-08-01 Thread Hans-Peter Nilsson
On Mon, 1 Aug 2011, Richard Henderson wrote:

> On 08/01/2011 01:30 PM, Michael Walle wrote:
> >  1) function inlining
> >  2) deferred argument evaluation
> >  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
> > function call to libgcc's __ashrsi3 (_in place_!)
> >  4) BAM! dead code elimination optimizes r8 assignment away because calli
> > may clobber r1-r10 (caller-saved registers on lm32).
>
> I'm afraid the only solution I can think of is to force F1 out-of-line.

Or another temporary - but the parameter should already have
that effect.

brgds, H-P


Re: libgcc: strange optimization

2011-08-01 Thread Georg-Johann Lay
Michael Walle wrote:
> Hi,
> 
> That was quick :)
> 
>> Your asm has no output operands and no side effects; with more
>> aggressive optimization the whole asm would disappear.
> Sorry, that was just a small test file; the original code has output operands.
> 
> The new test code:
>  static int inline f1(int arg)
>  {
>      register int ret asm("r1");
>      register int a1 asm("r8") = 10;
>      register int a2 asm("r1") = arg;
> 
>      asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");
> 
>      return ret;
>  }
> 
>  int f2(int arg1, int arg2)
>  {
>      return f1(arg1 >> 10);
>  }
> 
> translates to the same assembly:
> f2:
>     addi     sp, sp, -4
>     sw       (sp+4), ra
>     addi     r2, r0, 10
>     calli    __ashrsi3
>     scall
>     lw       ra, (sp+4)
>     addi     sp, sp, 4
>     b        ra
> 
> PS. R1 is the return register in the target architecture ABI.

I'd guess you ran into

http://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html#Local-Reg-Vars

A common pitfall is to initialize multiple call-clobbered registers
with arbitrary expressions, where a function call or library call
for an arithmetic operator will overwrite a register value from a
previous assignment, for example r0 below:

 register int *p1 asm ("r0") = ...;
 register int *p2 asm ("r1") = ...;

In those cases, a solution is to use a temporary variable for each
arbitrary expression.

So I'd try to rewrite it as

static int inline f1 (int arg0)
{
    int arg = arg0;
    register int ret asm("r1");
    register int a1 asm("r8") = 10;
    register int a2 asm("r1") = arg;

    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

    return ret;
}

and, if that does not help, the rather hackish

static int inline f1 (int arg0)
{
    int arg = arg0;
    register int ret asm("r1");
    register int a1 asm("r8");
    register int a2 asm("r1");

    /* Empty asm that "modifies" arg: intended to force arg to be
       computed into a register here, before r8 and r1 are loaded. */
    asm ("" : "+r" (arg));

    a1 = 10;
    a2 = arg;

    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

    return ret;
}