RE: Function versioning tests?
Hi, GCC 4.5 already contains such patch. http://gcc.gnu.org/ml/gcc-patches/2009-03/msg01186.html If you are working on 4.4 branch, you can just apply the patch without problem. Cheers, Bingfeng > -Original Message- > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On > Behalf Of Ian Bolton > Sent: 19 February 2010 17:09 > To: gcc@gcc.gnu.org > Subject: Function versioning tests? > > Hi there, > > I've changed our private port of GCC to give versioned functions > better names (rather than T.0, T.1), and was wondering if there > are any existing tests that push function-versioning to the limit, > so I can test whether my naming scheme is sound. > > Failing that, I'd appreciate some pointers on how I might make > such a test. I know I need to be passing a constant in as a > parameter, but I don't know what other criteria are required to > make it trigger. > > Cheers, > Ian > >
Re: Change x86 default arch for 4.5?
On 02/21/2010 12:13 PM, Richard Guenther wrote:
> On Sun, Feb 21, 2010 at 1:06 PM, Geert Bosch wrote:
>>
>> On Feb 21, 2010, at 06:18, Steven Bosscher wrote:
>>> My point: gcc may fail to attract users (and/or may be losing users)
>>> when it tries to tailor to the needs of minorities.
>>>
>>> IMHO it would be much more reasonable to change the defaults to
>>> generate code that can run on, say, 95% of the computers still in use.
>>> If a user want to use the latest-and-greatest gcc for a really old
>>> machine, the burden of adding extra flags to change the default
>>> behavior of the compiler should be on that user.
>>>
>>> In this case of the i386 back end, that probably means changing the
>>> default to something like pentium3.
>>
>> The biggest change we need to make for x86 is to enable SSE2,
>> so we can get proper rounding behavior for float and double,
>> as well as significant performance increases.
>
> I think Joseph fixed the rounding behavior for 4.5.  Also without an adjusted
> ABI you'd introduce x87 <-> SSE register moves which are not helpful
> for performance.

Exactly.  For example,

double plus(double a, double b)
{
  return a+b;
}

plus:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movsd   16(%ebp), %xmm0
        addsd   8(%ebp), %xmm0
        movsd   %xmm0, -8(%ebp)
        fldl    -8(%ebp)
        leave
        ret

Andrew.
Re: Change x86 default arch for 4.5?
On Mon, Feb 22, 2010 at 11:27 AM, Andrew Haley wrote:
> On 02/21/2010 12:13 PM, Richard Guenther wrote:
>> On Sun, Feb 21, 2010 at 1:06 PM, Geert Bosch wrote:
>>>
>>> On Feb 21, 2010, at 06:18, Steven Bosscher wrote:
>>>> My point: gcc may fail to attract users (and/or may be losing users)
>>>> when it tries to tailor to the needs of minorities.
>>>>
>>>> IMHO it would be much more reasonable to change the defaults to
>>>> generate code that can run on, say, 95% of the computers still in use.
>>>> If a user want to use the latest-and-greatest gcc for a really old
>>>> machine, the burden of adding extra flags to change the default
>>>> behavior of the compiler should be on that user.
>>>>
>>>> In this case of the i386 back end, that probably means changing the
>>>> default to something like pentium3.
>>>
>>> The biggest change we need to make for x86 is to enable SSE2,
>>> so we can get proper rounding behavior for float and double,
>>> as well as significant performance increases.
>>
>> I think Joseph fixed the rounding behavior for 4.5.  Also without an adjusted
>> ABI you'd introduce x87 <-> SSE register moves which are not helpful
>> for performance.
>
> Exactly.  For example,
>
> double plus(double a, double b)
> {
>   return a+b;
> }
>
> plus:
>         pushl   %ebp
>         movl    %esp, %ebp
>         subl    $8, %esp
>         movsd   16(%ebp), %xmm0
>         addsd   8(%ebp), %xmm0
>         movsd   %xmm0, -8(%ebp)
>         fldl    -8(%ebp)
>         leave
>         ret

Yep.  As the issue only concerns return values we could start to return
in both %st(0) and %xmm0 for externally visible functions.  That would
still have a spurious set of %st(0) and FP stack adjustment, but the
caller could use %xmm0 and hope performance wouldn't be affected too
much.  Of course that's an ABI change that is only compatible in one
direction.

Richard.
Register Allocation Pref. in 3.3.3
Hi,

For anyone who still remembers what went on with 3.3.3: in global.c, set_preference, why is there a bias to set preference for operand 0 of src? It is not intuitive, and there's no comment regarding this, so I guess there is some 'assumption' gcc makes regarding the order of operands.

Two generic questions regarding this version:

- Does the order in which define_insn and define_expand rules show up in the .md file bias the compiler to choose a rule in one way or the other?
- If two define_insn patterns match the same insn, which one will be used?

Cheers,
-- PMatos
Re: Change x86 default arch for 4.5?
On 02/22/2010 12:29 AM, Erik Trulsson wrote: > On Sun, Feb 21, 2010 at 11:35:11PM +, Dave Korn wrote: >> On 21/02/2010 22:42, Erik Trulsson wrote: >> >>> Yes, it does if the user is using binaries compiled by somebody else, >>> and that somebody else did not explicitly specify any CPU-flags. >>> >>> I believe that is the situation when installing most >>> Linux-distributions for example. >> >> No, surely not. The linux distributions use configure options >> when they package their compilers to choose the default with-cpu >> and with-arch options, and those are quite deliberately chosen >> according to the binary standards of the distro. It is hardly a >> case of "somebody else did not explicitly specify" cpu flags; they >> in fact explicitly specified them according to the system >> requirements for the distro. If your distro says it doesn't >> support i386, this is *why*! > > Are you sure of that? Really sure? > Some Linux distributions almost certainly do as you describe, but all > of them? I doubt it. And I doubt otherwise. Linux distros put a great deal of thought into which machines they are targeting with their binary distributions. And the existence of one tiny distro somewhere that doesn't would not change that fact. Andrew.
RE: Function versioning tests?
Hi Bingfeng. Thanks for pointing me at that patch, but it doesn't do what I require, which is probably fortunate because I would have wasted work otherwise! My change incorporates the callee function name and caller function name into the new name (e.g. bar.0.foo), so that we can trace that name back to the original non-versioned function in the source, without requiring additional debugging information; a standard build contains all the info we need because it is held in the function names. Are there any versioning tests for the patch you mention? Cheers, Ian > -Original Message- > From: Bingfeng Mei [mailto:b...@broadcom.com] > Sent: 22 February 2010 09:58 > To: Ian Bolton; gcc@gcc.gnu.org > Subject: RE: Function versioning tests? > > Hi, > GCC 4.5 already contains such patch. http://gcc.gnu.org/ml/gcc- > patches/2009-03/msg01186.html > If you are working on 4.4 branch, you can just apply the patch without > problem. > > Cheers, > Bingfeng > > > -Original Message- > > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On > > Behalf Of Ian Bolton > > Sent: 19 February 2010 17:09 > > To: gcc@gcc.gnu.org > > Subject: Function versioning tests? > > > > Hi there, > > > > I've changed our private port of GCC to give versioned functions > > better names (rather than T.0, T.1), and was wondering if there > > are any existing tests that push function-versioning to the limit, > > so I can test whether my naming scheme is sound. > > > > Failing that, I'd appreciate some pointers on how I might make > > such a test. I know I need to be passing a constant in as a > > parameter, but I don't know what other criteria are required to > > make it trigger. > > > > Cheers, > > Ian > > > >
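For reference, the kind of source that typically triggers versioning is an otherwise uninlinable function whose callers all pass a compile-time constant for one of its parameters, so that interprocedural constant propagation can clone it. A minimal, purely hypothetical sketch (names are made up; whether a clone is actually created depends on the optimization level, e.g. -O3 or -fipa-cp-clone, and on the inliner's decisions):

/* Hypothetical example.  With cloning enabled (e.g. -O3, which turns on
   -fipa-cp-clone), GCC may emit a specialized version of bar() in which
   "which" is known to be 1, because every call site passes a constant.
   The noinline attribute keeps the inliner from simply absorbing the
   calls, so that cloning is the interesting outcome.  */
#include <stdio.h>

static int __attribute__ ((noinline))
bar (int which, int n)
{
  int i, r = 0;
  for (i = 0; i < n; i++)
    {
      if (which == 1)
        r += i * i;
      else
        r -= i;
    }
  return r;
}

static int
foo (int n)
{
  return bar (1, n) + bar (1, n + 7);
}

int
main (void)
{
  printf ("%d\n", foo (100));
  return 0;
}

Inspecting the symbol table of the resulting object, or the IPA dumps (e.g. -fdump-ipa-all), should show whether a versioned copy of bar was emitted and what it ended up being called.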
Re: Gprof can account for less than 1/3 of execution time?!?!
Hi,

On Sun, 21 Feb 2010, Jon Turner wrote:

> I have recently encountered a gross inaccuracy in gprof that
> I can't explain. Yes, I know gprof uses a sampling technique

This is incorrect.  Code compiled with -pg will call mcount on each
function entry.  If there are many calls (compared to other computations)
the mcount overhead might become fairly large.

> so I should not expect a high level of precision, but the results
> I am getting clearly reflect a more fundamental issue.
>
> The program in question has been compiled with -pg for all
> source code files. The time command reports 20 seconds of user
> time (which is consistent with personal observation) but
> the gprof output accounts for only about 6 seconds of the
> execution time.

As others have mentioned, code compiled without -pg or in shared
libraries (independent of whether they were compiled with -pg) won't be
in the profile.  That might be another reason for the inconsistencies.

Ciao,
Michael.
Re: Register Allocation Pref. in 3.3.3
Quoting "Paulo J. Matos" :

> Hi, For anyone who still remembers what went on with 3.3.3: in global.c,
> set_preference, why is there a bias to set preference for operand 0 of src?

I don't remember the details of this specific code, but in general operand 0
is mostly used as an output operand; if an output operand can be assigned a
register, it is likely to be needed in a subsequent instruction to do
something with it.

> - Does the order in which define_insn and define_expand rules show up in
>   the .md file bias the compiler to choose a rule in one way or the other?

You may not have more than one pattern with the same name.  define_expand
patterns will not be used for instruction pattern nor split point
recognition.

> - If two define_insn patterns match the same insn, which one will be used?

The first one to be recognized.  This is not necessarily the same as the
first one in the file, because insn code numbers are cached, and in general
not recomputed during or after reload.  If the insn code caching has an
effect on eventual instruction selection, it is usually a bug in the machine
description.
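To make the last question concrete, here is a sketch for a hypothetical target (names, templates, predicates and constraints are illustrative, not from any real back end): both patterns below can match an add whose second operand is a small constant, and the one recognized first supplies the assembler template.

(define_insn "*addsi3_general"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "nonmemory_operand" "ri")))]
  ""
  "add\t%0,%1,%2")

;; Overlaps with the pattern above whenever operand 2 is a const_int in [-8, 7].
(define_insn "*addsi3_small_const"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "const_int_operand" "n")))]
  "INTVAL (operands[2]) >= -8 && INTVAL (operands[2]) < 8"
  "addi\t%0,%1,%2")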
Re: Gprof can account for less than 1/3 of execution time?!?!
Quoting Michael Matz :

> Hi,
>
> On Sun, 21 Feb 2010, Jon Turner wrote:
>
>> I have recently encountered a gross inaccuracy in gprof that
>> I can't explain. Yes, I know gprof uses a sampling technique
>
> This is incorrect.  Code compiled with -pg will call mcount on each
> function entry.  If there are many calls (compared to other computations)
> the mcount overhead might become fairly large.

The mcount overhead actually depends on the machine description, although
most ports have standardized on a very runtime profligate scheme.
Re: Gprof can account for less than 1/3 of execution time?!?!
You're not listening. I am using -pg and the program is statically linked. The concern I am raising is not about the function counting, but the reported running times, which are determined by sampling (read the gprof manual, if this is news to you). In this case, the mcount overhead cannot account for the discrepancy, since that would cause gprof to OVER-estimate the run time, while in this case it is UNDER-estimating. It's missing about 70% of the actual running time in the program. It conceivably I am doing something wrong. I hope so, since once I know what it is, I can fix it. But at the moment, it's hard to avoid the suspicion that something about the gprof implementation is deeply flawed. Jon Joern Rennecke wrote: Quoting Michael Matz : Hi, On Sun, 21 Feb 2010, Jon Turner wrote: I have recently encountered a gross inaccuracy in gprof that I can't explain. Yes, I know gprof uses a sampling technique This is incorrect. Code compiled with -pg will call mcount on each function entry. If there are many calls (compared to other computations) the mcount overhead might become fairly large. The mcount overhead actually depends on the machine description, although most ports have standardized on a very runtime profligate scheme.
RE: Gprof can account for less than 1/3 of execution time?!?!
Hi Jon, What does ldd say about your executable? Thanks ++Cyrille -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jon Turner Sent: Monday, February 22, 2010 3:44 PM To: Joern Rennecke Cc: Michael Matz; gcc@gcc.gnu.org Subject: Re: Gprof can account for less than 1/3 of execution time?!?! You're not listening. I am using -pg and the program is statically linked. The concern I am raising is not about the function counting, but the reported running times, which are determined by sampling (read the gprof manual, if this is news to you). In this case, the mcount overhead cannot account for the discrepancy, since that would cause gprof to OVER-estimate the run time, while in this case it is UNDER-estimating. It's missing about 70% of the actual running time in the program. It conceivably I am doing something wrong. I hope so, since once I know what it is, I can fix it. But at the moment, it's hard to avoid the suspicion that something about the gprof implementation is deeply flawed. Jon Joern Rennecke wrote: > Quoting Michael Matz : > >> Hi, >> >> On Sun, 21 Feb 2010, Jon Turner wrote: >> >>> I have recently encountered a gross inaccuracy in gprof that >>> I can't explain. Yes, I know gprof uses a sampling technique >> >> This is incorrect. Code compiled with -pg will call mcount on each >> function entry. If there are many calls (compared to other computations) >> the mcount overhead might become fairly large. > > The mcount overhead actually depends on the machine description, although > most ports have standardized on a very runtime profligate scheme.
Re: Gprof can account for less than 1/3 of execution time?!?!
Hi Jon,

On Mon, Feb 22, 2010 at 08:43:31AM -0600, Jon Turner wrote:
> You're not listening. I am using -pg and the program is statically
> linked. The concern I am raising is not about the function counting,
> but the reported running times, which are determined by sampling
> (read the gprof manual, if this is news to you).
>
> In this case, the mcount overhead cannot account for the discrepancy,
> since that would cause gprof to OVER-estimate the run time, while
> in this case it is UNDER-estimating. It's missing about 70% of the
> actual running time in the program.

Well, in fact I expect gprof to UNDER-estimate the runtime: the processor needs time in order to do the profiling (count how often a function was called from which function, measure the times, ...).

Now:
- the external "time" command will INCLUDE these times in the reported runtime (it can't distinguish between operations done by your code and by the profiling code)
- I would expect the profiling code to EXCLUDE these times, because they only pollute the profiling data: when I try to optimize the code based on the profiling data, I'm not interested in how much time was used by the profiling code, but only in the time used in MY code.

So:
- "time" will give the total runtime (your code + profiling code)
- the gprof output will (probably) give the runtime of your code, without the profiling code

and that means that gprof should underestimate the runtime. Well, that's what I believe; I never checked it carefully.

In addition, you have another place where gprof underestimates the runtime (as already mentioned):
- when you link a library which was compiled without -pg, its runtime will be neglected by gprof, but not by "time". Are the standard libraries (glibc for example) to which you link compiled with "-pg"? If not, time spent in "printf" or "std::cout" or ... will be neglected by gprof too.
- when you call a kernel function, which runs in kernel space, it will be included by "time" - but not by gprof.

HTH, Axel
Re: Gprof can account for less than 1/3 of execution time?!?!
On 22/02/2010 14:43, Jon Turner wrote: > You're not listening. I am using -pg and the program is statically > linked. > It conceivably I am doing something wrong. I hope so, since > once I know what it is, I can fix it. Providing a simple reproducible testcase with steps to reproduce the problem would be the way to proceed now. Then everyone will be able to help you figure out whether or not you're making any mis-steps, and if not, will have a working example to help them debug what's up with gprof. cheers, DaveK
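For what it's worth, the sort of minimal, self-contained starting point being suggested can be as small as the program below (purely hypothetical, not extracted from Jon's code). Build it twice, with and without -pg, compare the wall-clock times from time(1), and then see how much of the difference gprof's flat profile accounts for:

/* gprof-mini.c -- toy workload for sanity-checking gprof output.
   Suggested (hypothetical) usage:
       gcc -O2 -pg gprof-mini.c -o gprof-mini
       time ./gprof-mini
       gprof ./gprof-mini gmon.out   */
#include <stdio.h>

/* noinline keeps the function visible to the profiler at -O2.  */
static double __attribute__ ((noinline))
hot (int n)
{
  double s = 0.0;
  int i;
  for (i = 1; i <= n; i++)
    s += 1.0 / i;
  return s;
}

int
main (void)
{
  double total = 0.0;
  int i;
  for (i = 0; i < 2000000; i++)   /* many calls, so mcount overhead shows up */
    total += hot (100);
  printf ("%f\n", total);         /* keep the work from being optimized away */
  return 0;
}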
Re: Change x86 default arch for 4.5?
On 22/02/2010 11:04, Andrew Haley wrote: > On 02/22/2010 12:29 AM, Erik Trulsson wrote: >> On Sun, Feb 21, 2010 at 11:35:11PM +, Dave Korn wrote: >>> On 21/02/2010 22:42, Erik Trulsson wrote: >>> Yes, it does if the user is using binaries compiled by somebody else, and that somebody else did not explicitly specify any CPU-flags. I believe that is the situation when installing most Linux-distributions for example. >>> No, surely not. The linux distributions use configure options >>> when they package their compilers to choose the default with-cpu >>> and with-arch options, and those are quite deliberately chosen >>> according to the binary standards of the distro. It is hardly a >>> case of "somebody else did not explicitly specify" cpu flags; they >>> in fact explicitly specified them according to the system >>> requirements for the distro. If your distro says it doesn't >>> support i386, this is *why*! >> Are you sure of that? Really sure? >> Some Linux distributions almost certainly do as you describe, but all >> of them? I doubt it. > > And I doubt otherwise. Linux distros put a great deal of thought into > which machines they are targeting with their binary distributions. > And the existence of one tiny distro somewhere that doesn't would not > change that fact. Actually, there are probably dozens or hundreds of tiny distros out there that are basically somebody's home-made repackaging-and-minor-variant of existing bundles. However, I think that these are the same distros that constitute the 90% referred to in Sturgeon's Law, and so I would still not see that as any reason to change GCC's defaults! cheers, DaveK
is -fno-toplevel-reorder going to deprecate
Hello, I'm cross-compiling glibc(eglibc) for new processor. As far as I can see -fno-toplevel-reorder option is critical for successful build. Without option some files (initfini.c, source for crt*.o) can be miscompiled. I've heard that option might become deprecated in future gcc versions (e.g. 4.6). Although, I don't have any evidence. Could you please clarify? Regards, Sergey Yakoushkin
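For readers who have not looked at those files: the reason the option is critical there is that the assembly generated from initfini.c is post-processed (cut apart around marker comments to make the crt*.o pieces), so the top-level asm statements and the function bodies must come out in source order. A stripped-down sketch of the idiom, with purely illustrative marker text and names rather than the real glibc sources:

/* Hypothetical initfini.c-style fragment.  The .s file produced from
   this is later split around the marker comments, so the compiler must
   emit the two top-level asm statements and the function body in
   exactly this order -- which is what -fno-toplevel-reorder guarantees.  */
asm ("\n/*@MY_INIT_PROLOG_BEGINS*/");

void
my_init (void)
{
  extern void run_my_ctors (void);   /* hypothetical helper */
  run_my_ctors ();
}

asm ("\n/*@MY_INIT_PROLOG_ENDS*/");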
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
2010/2/20 Richard Guenther : > On Sat, Feb 20, 2010 at 1:38 PM, Christoph Rupp wrote: [...] >> I fixed all warnings regarding dereferencing type-punned pointers and >> I compile with -O3 AND -fno-strict-aliasing. >> >> and i still get the same crash as earlier. it does not crash with -O2. >> >> From what i understand the -fno-strict-aliasing should solve the >> problem, but it doesn't. >> >> The function which crashes basically is searching a bitmap for a >> pattern. When the search starts, it uses two pointers to the same >> memory location (u64 ptr for searching in 8byte-steps, char* ptr for >> searching byte-wise). If i understand the aliasing correctly then this >> might cause the problems. I just don't know how to rewrite the >> function - I want to use those two pointers for performance reasons. I >> don't even know if the problem is in this function or if it's >> somewhere else and the crash is just a side-effect of a completely >> different problem. > > With -fno-strict-aliasing this is perfectly valid so the problem must be > elsewhere. It might be alignment related if you do not make sure > that the u64 accesses are properly aligned. Try -O3 -fno-tree-vectorize > or analyze the crash. If i enable -fno-tree-vectorize for one file (that's the file which produces the crash) everything works. For me this workaround is fine, although this file is performance relevant and i'd like to have every optimization that's available. I don't really understand how alignment issues could cause this crash. i'm only building for x86_64 and i was not aware of any alignment requirements (i'd love to test on a SPARC, but sadly i don't have access to one...) BTW - if i execute this in valgrind then it doesn't crash or give warnings. For me this looks cheesy - there are no gcc warnings, -O2 works, -O3 crashes. MSVC works and older gcc versions also work fine. In this case it's hard for me not to blame gcc for the crash. Anyway - thanks for your help! Christoph > > Richard. > >> To reproduce: >> wget http://crupp.de/hamsterdb-1.1.3.tar.gz >> tar -zxvf hamsterdb-1.1.3.tar.gz >> cd hamsterdb-1.1.3 >> ./configure >> make >> cd unittests >> ./test # <-- will segfault with a bad pointer >> >> Here's my gcc version: >> Using built-in specs. 
>> Target: x86_64-linux-gnu >> Configured with: ../src/configure -v --with-pkgversion='Ubuntu >> 4.4.1-4ubuntu9' >> --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs >> --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr >> --enable-shared --enable-multiarch --enable-linker-build-id >> --with-system-zlib --libexecdir=/usr/lib --without-included-gettext >> --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 >> --program-suffix=-4.4 --enable-nls --enable-clocale=gnu >> --enable-libstdcxx-debug --enable-objc-gc --disable-werror >> --with-arch-32=i486 --with-tune=generic --enable-checking=release >> --build=x86_64-linux-gnu --host=x86_64-linux-gnu >> --target=x86_64-linux-gnu >> Thread model: posix >> gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9) >> >> Thanks for any help, >> Christoph >> >> >> 2010/1/14 Jonathan Wakely : >>> 2010/1/14 Christoph Rupp: To reproduce, these steps are necessary: wget http://crupp.de/dl/hamsterdb-1.1.1.tar.gz tar -zxvf hamsterdb-1.1.1.tar.gz cd hamsterdb-1.1.1 ./configure --enable-internal make >>> >>> There are lots of these warnings, which you ignore at your peril: >>> >>> freelist.c:3326: warning: dereferencing type-punned pointer will break >>> strict-aliasing rules >>> >>> ham_info.c:80: warning: dereferencing type-punned pointer will break >>> strict-aliasing rules >>> >>> env.cpp:1804: warning: dereferencing type-punned pointer will break >>> strict-aliasing rules >>> >>> You should probably either fix those warnings, avoid compiling at high >>> optimisation levels, or use -fno-strict-aliasing (which allows the >>> tests to run successfully.) >>> >> >
Re: is -fno-toplevel-reorder going to deprecate
Sergey Yakoushkin writes: > I'm cross-compiling glibc(eglibc) for new processor. > As far as I can see -fno-toplevel-reorder option is critical for > successful build. > Without option some files (initfini.c, source for crt*.o) can be miscompiled. > > I've heard that option might become deprecated in future gcc versions > (e.g. 4.6). > Although, I don't have any evidence. Could you please clarify? Where did you hear that? I have not heard of any plans to deprecate -fno-toplevel-reorder. As you say, it is required for certain kinds of use. The gcc build even uses it itself. It's possible that you have this confused with -fno-unit-at-a-time. That option is deprecated. It is being replaced with -fno-toplevel-reorder and -fno-section-anchors. Ian
Re: Change x86 default arch for 4.5?
Dave Korn wrote: On 21/02/2010 20:03, Martin Guy wrote: The point about defaults is that the GCC default tends to filter down into the default for distributions; I'd find it surprising if that was really the way it happens; don't distributions make deliberate and conscious decisions about binary standards and things like that? They certainly ought to be doing so, IMO, rather than just going with whatever-the-compiler-does-when-you-don't-tell-it-what-to-do. This is Debian Testing as of last Sunday - note that it has had --with-arch-32=i486 for more time than I care to remember ... hir...@super:~$ gfortran -v Using built-in specs. Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.2-9' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.4.3 20100108 (prerelease) (Debian 4.4.2-9) Hope this helps, -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/ Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
Re: is -fno-toplevel-reorder going to deprecate
Hi, Ian I mean exactly the -fno-toplevel-reorder option. There are some compilers which don't support it. If the option is going to be deprecated in gcc in the near future as well, then it makes sense to consider changes in glibc(eglibc). So, there are no plans to deprecate the option. Did I understand correctly? Sergey Y. 2010/2/22 Ian Lance Taylor : > Sergey Yakoushkin writes: > >> I'm cross-compiling glibc(eglibc) for new processor. >> As far as I can see -fno-toplevel-reorder option is critical for >> successful build. >> Without option some files (initfini.c, source for crt*.o) can be miscompiled. >> >> I've heard that option might become deprecated in future gcc versions >> (e.g. 4.6). >> Although, I don't have any evidence. Could you please clarify? > > Where did you hear that? > > I have not heard of any plans to deprecate -fno-toplevel-reorder. As > you say, it is required for certain kinds of use. The gcc build even > uses it itself. > > It's possible that you have this confused with -fno-unit-at-a-time. > That option is deprecated. It is being replaced with > -fno-toplevel-reorder and -fno-section-anchors. > > Ian
Re: Change x86 default arch for 4.5?
On 22/02/2010 20:32, Toon Moene wrote: [ Not singling you out here, yours just happens to be the latest reply: ] > Dave Korn wrote: > >> On 21/02/2010 20:03, Martin Guy wrote: > >>> The point about defaults is that the GCC default tends to filter down >>> into the default for distributions; >> >> I'd find it surprising if that was really the way it happens; don't >> distributions make deliberate and conscious decisions about binary >> standards and things like that? > This is Debian Testing Enough with the examples, already! It was a rhetorical question! :-) cheers, DaveK
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
On Mon, 2010-02-22 at 21:17 +0100, Christoph Rupp wrote: > 2010/2/20 Richard Guenther : > > On Sat, Feb 20, 2010 at 1:38 PM, Christoph Rupp wrote: > [...] > >> I fixed all warnings regarding dereferencing type-punned pointers and > >> I compile with -O3 AND -fno-strict-aliasing. > >> > >> and i still get the same crash as earlier. it does not crash with -O2. > >> > >> From what i understand the -fno-strict-aliasing should solve the > >> problem, but it doesn't. > >> > >> The function which crashes basically is searching a bitmap for a > >> pattern. When the search starts, it uses two pointers to the same > >> memory location (u64 ptr for searching in 8byte-steps, char* ptr for > >> searching byte-wise). If i understand the aliasing correctly then this > >> might cause the problems. I just don't know how to rewrite the > >> function - I want to use those two pointers for performance reasons. I > >> don't even know if the problem is in this function or if it's > >> somewhere else and the crash is just a side-effect of a completely > >> different problem. > > > > With -fno-strict-aliasing this is perfectly valid so the problem must be > > elsewhere. It might be alignment related if you do not make sure > > that the u64 accesses are properly aligned. Try -O3 -fno-tree-vectorize > > or analyze the crash. > > If i enable -fno-tree-vectorize for one file (that's the file which > produces the crash) everything works. > > For me this workaround is fine, although this file is performance > relevant and i'd like to have every optimization that's available. > > I don't really understand how alignment issues could cause this crash. > i'm only building for x86_64 and i was not aware of any alignment > requirements (i'd love to test on a SPARC, but sadly i don't have > access to one...) > > BTW - if i execute this in valgrind then it doesn't crash or give warnings. > > For me this looks cheesy - there are no gcc warnings, -O2 works, -O3 > crashes. MSVC works and older gcc versions also work fine. In this > case it's hard for me not to blame gcc for the crash. > > Anyway - thanks for your help! > Christoph Vector instructions usually require aligned data. Perhaps the vectorizer assumes that a pointer to 64-bit data will only be used to access data that is 64-bit aligned. If you can reproduce the problem with a small, self-contained test then please file a bug report. It might be possible to issue a warning or to detect that the loop should not be vectorized. If not, maybe the compiler should disable vectorization for -fno-strict-aliasing. Janis > > > > Richard. > > > >> To reproduce: > >> wget http://crupp.de/hamsterdb-1.1.3.tar.gz > >> tar -zxvf hamsterdb-1.1.3.tar.gz > >> cd hamsterdb-1.1.3 > >> ./configure > >> make > >> cd unittests > >> ./test # <-- will segfault with a bad pointer > >> > >> Here's my gcc version: > >> Using built-in specs. 
> >> Target: x86_64-linux-gnu > >> Configured with: ../src/configure -v --with-pkgversion='Ubuntu > >> 4.4.1-4ubuntu9' > >> --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs > >> --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr > >> --enable-shared --enable-multiarch --enable-linker-build-id > >> --with-system-zlib --libexecdir=/usr/lib --without-included-gettext > >> --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 > >> --program-suffix=-4.4 --enable-nls --enable-clocale=gnu > >> --enable-libstdcxx-debug --enable-objc-gc --disable-werror > >> --with-arch-32=i486 --with-tune=generic --enable-checking=release > >> --build=x86_64-linux-gnu --host=x86_64-linux-gnu > >> --target=x86_64-linux-gnu > >> Thread model: posix > >> gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9) > >> > >> Thanks for any help, > >> Christoph > >> > >> > >> 2010/1/14 Jonathan Wakely : > >>> 2010/1/14 Christoph Rupp: > > To reproduce, these steps are necessary: > > wget http://crupp.de/dl/hamsterdb-1.1.1.tar.gz > tar -zxvf hamsterdb-1.1.1.tar.gz > cd hamsterdb-1.1.1 > ./configure --enable-internal > make > >>> > >>> There are lots of these warnings, which you ignore at your peril: > >>> > >>> freelist.c:3326: warning: dereferencing type-punned pointer will break > >>> strict-aliasing rules > >>> > >>> ham_info.c:80: warning: dereferencing type-punned pointer will break > >>> strict-aliasing rules > >>> > >>> env.cpp:1804: warning: dereferencing type-punned pointer will break > >>> strict-aliasing rules > >>> > >>> You should probably either fix those warnings, avoid compiling at high > >>> optimisation levels, or use -fno-strict-aliasing (which allows the > >>> tests to run successfully.) > >>> > >> > >
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
On Mon, Feb 22, 2010 at 1:06 PM, Janis Johnson wrote: > If you can reproduce the problem with a small, self-contained test then > please file a bug report. It might be possible to issue a warning or > to detect that the loop should not be vectorized. If not, maybe the > compiler should disable vectorization for -fno-strict-aliasing. It is not an aliasing issue but an alignment issue. Anyways this is most likely the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43009 . Thanks, Andrew Pinski
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
On Mon, 2010-02-22 at 13:11 -0800, Andrew Pinski wrote: > On Mon, Feb 22, 2010 at 1:06 PM, Janis Johnson wrote: > > If you can reproduce the problem with a small, self-contained test then > > please file a bug report. It might be possible to issue a warning or > > to detect that the loop should not be vectorized. If not, maybe the > > compiler should disable vectorization for -fno-strict-aliasing. > > It is not an aliasing issue but an alignment issue. Anyways this is > most likely the same as > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43009 . Yes, that's the problem I had in mind but I was thinking about an explicit cast to a pointer to more-aligned data in the function that has the vector loop. There's no way to warn about the undefined behavior when the cast is in a different source file. It's interesting that two reports of failure due to this same undefined behavior come so close together. Janis
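For the record, a minimal sketch of the pattern under discussion (hypothetical code, not taken from hamsterdb): a byte buffer is walked through a uint64_t *, which is undefined behavior unless the address really is suitably aligned, and which the vectorizer may lower to aligned vector loads; going through memcpy expresses an unaligned access and sidesteps the problem.

#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Risky: p may be only byte-aligned, yet it is dereferenced as uint64_t.
   The compiler (and in particular the vectorizer) may assume 8-byte
   alignment here and use aligned loads that fault at run time.  */
size_t
count_zero_words_risky (const unsigned char *p, size_t nwords)
{
  const uint64_t *w = (const uint64_t *) p;
  size_t i, n = 0;
  for (i = 0; i < nwords; i++)
    if (w[i] == 0)
      n++;
  return n;
}

/* Safer: memcpy expresses an unaligned load; on x86_64 it typically
   compiles to a single move instruction, so the cost is small.  */
size_t
count_zero_words_safe (const unsigned char *p, size_t nwords)
{
  size_t i, n = 0;
  for (i = 0; i < nwords; i++)
    {
      uint64_t w;
      memcpy (&w, p + i * 8, sizeof w);
      if (w == 0)
        n++;
    }
  return n;
}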
Re: Gprof can account for less than 1/3 of execution time?!?!
On Mon, Feb 22, 2010 at 03:23:52PM -0600, Jon Turner wrote:
> In it, you will find a directory with all the source code
> needed to observe the problem for yourself.
> The top level directory contains a linux executable called
> maxFlo, which you should be able to run on a linux box
> as is. But if you want/need to compile things yourself,
> type "make clean" and "make all" in the top level
> directory and you should get a fresh copy of maxFlo.

So, compiling maxFlo with no -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo

real    0m3.465s
user    0m3.460s
sys     0m0.000s

Compiling maxFlo with -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo

real    0m9.780s
user    0m9.760s
sys     0m0.010s

Notice that ~60% of the running time with gprof enabled is simply overhead
from call counting and the like.  That time isn't recorded by gprof.  That
alone accounts for your report about gprof ignoring 2/3 of the execution
time.

Checking to see whether maxFlo is a dynamic executable (since you claimed
earlier that you were statically linking your program):

@nightcrawler:~/src/gprof-trouble-case$ ldd ./maxFlo
        linux-vdso.so.1 =>  (0x7fff2977f000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fb422c21000)
        libm.so.6 => /lib/libm.so.6 (0x7fb42299d000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7fb422786000)
        libc.so.6 => /lib/libc.so.6 (0x7fb422417000)
        /lib64/ld-linux-x86-64.so.2 (0x7fb422f31000)

So calls to shared library functions (such as functions in libm) will not
be caught by gprof.  Those calls could account for a significant amount of
the running time of your program and gprof can't tell you about them.

Inspecting the gmon.out file:

@nightcrawler:~/src/gprof-trouble-case$ gprof maxFlo gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 16.09      0.37     0.37    27649     0.00     0.00  shortPath::findPath()
 12.61      0.66     0.29 55889952     0.00     0.00  graph::next(int,int) const
 11.96      0.94     0.28 61391904     0.00     0.00  graph::mate(int,int) const
 10.87      1.19     0.25 58654752     0.00     0.00  flograph::res(int,int) const
 10.44      1.43     0.24                             _fini
  6.96      1.59     0.16 65055289     0.00     0.00  graph::term(int) const
  6.96      1.75     0.16 61391904     0.00     0.00  digraph::tail(int) const
[...lots of stuff elided...]
  0.00      2.30     0.00        1     0.00     0.00  graph

gprof is telling you about 2.3 seconds of your execution time.  With the
factors above accounted for, that doesn't seem unreasonable.

-Nathan
Re: Gprof can account for less than 1/3 of execution time?!?!
Ok, this is not as simple as I would like to make it, but hopefully it's
simple enough. I've placed a tar file at www.arl.wustl.edu/~jst/gprof-tarfile
In it, you will find a directory with all the source code needed to observe
the problem for yourself. The top level directory contains a linux executable
called maxFlo, which you should be able to run on a linux box as is. But if
you want/need to compile things yourself, type "make clean" and "make all"
in the top level directory and you should get a fresh copy of maxFlo.

The basic test, with my results appears below.

% time maxFlo
20.356u 0.001s 0:20.38 99.8%    0+0k 0+0io 0pf+0w

Note, that's 20 seconds of user time. Now for gprof,

% maxFlo
% gprof maxFlo | more
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 18.87      1.17     1.17    27649     0.00     0.00  shortPath::findPath()
 18.39      2.31     1.14 58654752     0.00     0.00  flograph::res(int, int) const
 13.87      3.17     0.86 61391904     0.00     0.00  graph::mate(int, int) const
 12.18      3.93     0.76 55889952     0.00     0.00  graph::next(int, int) const
  6.29      4.32     0.39 61391904     0.00     0.00  graph::left(int) const
  5.65      4.67     0.35 65055289     0.00     0.00  graph::term(int) const
  5.16      4.99     0.32 61391904     0.00     0.00  digraph::tail(int) const
  4.03      5.24     0.25 70488952     0.00     0.00  graph::n() const
  3.63      5.46     0.23  9232153     0.00     0.00  list::operator&=(int)
  2.10      5.59     0.13  9165337     0.00     0.00  list::operator<<=(int)
  1.94      5.71     0.12 58654752     0.00     0.00  graph::m() const
  1.45      5.80     0.09    27649     0.00     0.00  list::makeSpace()
  1.29      5.88     0.08  9165337     0.00     0.00  list::operator[](int) const
  1.21      5.96     0.08  9165337     0.00     0.00  graph::first(int) const
  0.81      6.01     0.05  2737152     0.00     0.00  flograph::addFlow(int, int, int)
  0.81      6.06     0.05    27648     0.00     0.00  augPath::augment()
  0.65      6.10     0.04  2737152     0.00     0.00  min(int, int)
  0.48      6.13     0.03 15466777     0.00     0.00  flograph::src() const
  0.48      6.16     0.03                             fatal(char*)
  0.24      6.17     0.02                             operator<<(std::ostream&, list const&)
  0.16      6.18     0.01  9287496     0.00     0.00  flograph::snk() const
  0.16      6.19     0.01  9165338     0.00     0.00  list::empty() const
  0.16      6.20     0.01                             list::clear()
  0.00      6.20     0.00    27649     0.00     0.00  list::freeSpace()
  0.00      6.20     0.00    27649     0.00     0.00  list::list(int)
  0.00      6.20     0.00    27649     0.00     0.00  list::~list()
  0.00      6.20     0.00     1140     0.00     0.00  digraph::join(int, int)
  0.00      6.20     0.00     1140     0.00     0.00  flograph::changeCap(int, int)
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _Z7badCasei
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN4list9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN4misc6cflushERSic
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN5graph9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN7augPathC2ER8flographRi
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN7digraph9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN8flograph9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN9shortPathC2ER8flographRi
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  badCase(int)
  0.00      6.20     0.00        1     0.00     0.00  graph::makeSpace()
  0.00      6.20     0.00        1     0.00     0.00  graph::graph(int, int)
  0.00      6.20     0.00        1
Re: is -fno-toplevel-reorder going to deprecate
Sergey Yakoushkin writes: > I mean exactly -fno-toplevel-reorder option. There are some compilers > which don't support it. Yes. The option was introduced in gcc 4.2. > If option is going to deprecate in gcc in near future as well, than it > make sense to consider changes in glibc(eglibc). > So, there are no plans to deprecate option. Did I understand correctly? Correct. There are no plans to deprecate the -fno-toplevel-reorder option. I still don't know why you think that there are plans to deprecate it. Ian
Re: Gprof can account for less than 1/3 of execution time?!?!
Doh! Thanks, Nathan. I think you put your finger on it. I was well aware of the overhead that gprof can introduce, but did not recognize that this overhead was not being counted by gprof. Jon Nathan Froyd wrote: On Mon, Feb 22, 2010 at 03:23:52PM -0600, Jon Turner wrote: In it, you will find a directory with all the source code needed to observe the problem for yourself. The top level directory contains a linux executable called maxFlo, which you should be able to run on a linux box as is. But if you want/need to compile things yourself, type "make clean" and "make all" in the top level directory and you should get a fresh copy of maxFlo. So, compiling maxFlo with no -pg option: @nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo real0m3.465s user0m3.460s sys 0m0.000s Compiling maxFlo with -pg option: @nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo real0m9.780s user0m9.760s sys 0m0.010s Notice that ~60% of the running time with gprof enabled is simply overhead from call counting and the like. That time isn't recorded by gprof. That alone accounts for your report about gprof ignoring 2/3 of the execution time. Checking to see whether maxFlo is a dynamic executable (since you claimed earlier that you were statically linking your program): @nightcrawler:~/src/gprof-trouble-case$ ldd ./maxFlo linux-vdso.so.1 => (0x7fff2977f000) libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fb422c21000) libm.so.6 => /lib/libm.so.6 (0x7fb42299d000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7fb422786000) libc.so.6 => /lib/libc.so.6 (0x7fb422417000) /lib64/ld-linux-x86-64.so.2 (0x7fb422f31000) So calls to shared library functions (such as functions in libm) will not be caught by gprof. Those calls count account for a significant amount of running time of your program and gprof can't tell you about them. Inspecting the gmon.out file: @nightcrawler:~/src/gprof-trouble-case$ gprof maxFlo gmon.out Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls s/call s/call name 16.09 0.37 0.3727649 0.00 0.00 shortPath::findPath() 12.61 0.66 0.29 55889952 0.00 0.00 graph::next(int,int) const 11.96 0.94 0.28 61391904 0.00 0.00 graph::mate(int,int) const 10.87 1.19 0.25 58654752 0.00 0.00 flograph::res(int,int) const 10.44 1.43 0.24 _fini 6.96 1.59 0.16 65055289 0.00 0.00 graph::term(int) const 6.96 1.75 0.16 61391904 0.00 0.00 digraph::tail(int) const [...lots of stuff elided...] 0.00 2.30 0.001 0.00 0.00 graph gprof is telling you about 2.3 seconds of your execution time. With the factors above accounted for, that doesn't seem unreasonable. -Nathan
Re: Gprof can account for less than 1/3 of execution time?!?!
On Mon, Feb 22, 2010 at 05:09:53PM -0600, Jon Turner wrote: > Doh! Thanks, Nathan. I think you put your finger on it. > I was well aware of the overhead that gprof can introduce, > but did not recognize that this overhead was not being > counted by gprof. gprof generally does not have any support for shared libraries. It will ignore profiling samples that lie outside the executable. And in this case, that includes _mcount (which is in libc.so.6). That's probably why. -- Daniel Jacobowitz CodeSourcery
Re: Gprof can account for less than 1/3 of execution time?!?!
Daniel Jacobowitz writes: > gprof generally does not have any support for shared libraries. It > will ignore profiling samples that lie outside the executable. And in > this case, that includes _mcount (which is in libc.so.6). That's > probably why. Even when statically linked, gprof will always ignore the time spent in the _mcount (or equivalent) function. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Building GCC 4.3.3 for ARM.
I am trying to build an eabi cross tool-chain for the arm using version 4.3.3. I noticed from earlier mailing list posts that the configuration flag --disable-libunwind-exceptions is not working as intended and that --without-system-unwind is the preferred flag. I wonder what is the change in the GCC build between the two flags. If the host system does not contain the stock libunwind, then does gcc use its defaults, and does this --without flag explicitly tell GCC to use its own version of libunwind? Also, some colleagues of mine are running into problems when linking code compiled with optimization turned on. After a check, I suspect that the option --enable-target-optspace is compiling libgcc with space optimization and any program linking to it will run into errors. The make check output also fails in tests where optimization is turned on. V. Forbes