RE: Function versioning tests?
Hi, GCC 4.5 already contains such patch. http://gcc.gnu.org/ml/gcc-patches/2009-03/msg01186.html If you are working on 4.4 branch, you can just apply the patch without problem. Cheers, Bingfeng > -Original Message- > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On > Behalf Of Ian Bolton > Sent: 19 February 2010 17:09 > To: gcc@gcc.gnu.org > Subject: Function versioning tests? > > Hi there, > > I've changed our private port of GCC to give versioned functions > better names (rather than T.0, T.1), and was wondering if there > are any existing tests that push function-versioning to the limit, > so I can test whether my naming scheme is sound. > > Failing that, I'd appreciate some pointers on how I might make > such a test. I know I need to be passing a constant in as a > parameter, but I don't know what other criteria are required to > make it trigger. > > Cheers, > Ian > >
Re: Change x86 default arch for 4.5?
On 02/21/2010 12:13 PM, Richard Guenther wrote:
> On Sun, Feb 21, 2010 at 1:06 PM, Geert Bosch wrote:
>>
>> On Feb 21, 2010, at 06:18, Steven Bosscher wrote:
>>> My point: gcc may fail to attract users (and/or may be losing users)
>>> when it tries to tailor to the needs of minorities.
>>>
>>> IMHO it would be much more reasonable to change the defaults to
>>> generate code that can run on, say, 95% of the computers still in use.
>>> If a user want to use the latest-and-greatest gcc for a really old
>>> machine, the burden of adding extra flags to change the default
>>> behavior of the compiler should be on that user.
>>>
>>> In this case of the i386 back end, that probably means changing the
>>> default to something like pentium3.
>>
>> The biggest change we need to make for x86 is to enable SSE2,
>> so we can get proper rounding behavior for float and double,
>> as well as significant performance increases.
>
> I think Joseph fixed the rounding behavior for 4.5.  Also without an adjusted
> ABI you'd introduce x87 <-> SSE register moves which are not helpful
> for performance.

Exactly.  For example,

double plus(double a, double b)
{
  return a+b;
}

plus:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movsd   16(%ebp), %xmm0
        addsd   8(%ebp), %xmm0
        movsd   %xmm0, -8(%ebp)
        fldl    -8(%ebp)
        leave
        ret

Andrew.
Re: Change x86 default arch for 4.5?
On Mon, Feb 22, 2010 at 11:27 AM, Andrew Haley wrote:
> On 02/21/2010 12:13 PM, Richard Guenther wrote:
>> On Sun, Feb 21, 2010 at 1:06 PM, Geert Bosch wrote:
>>>
>>> On Feb 21, 2010, at 06:18, Steven Bosscher wrote:
>>>> My point: gcc may fail to attract users (and/or may be losing users)
>>>> when it tries to tailor to the needs of minorities.
>>>>
>>>> IMHO it would be much more reasonable to change the defaults to
>>>> generate code that can run on, say, 95% of the computers still in use.
>>>> If a user want to use the latest-and-greatest gcc for a really old
>>>> machine, the burden of adding extra flags to change the default
>>>> behavior of the compiler should be on that user.
>>>>
>>>> In this case of the i386 back end, that probably means changing the
>>>> default to something like pentium3.
>>>
>>> The biggest change we need to make for x86 is to enable SSE2,
>>> so we can get proper rounding behavior for float and double,
>>> as well as significant performance increases.
>>
>> I think Joseph fixed the rounding behavior for 4.5.  Also without an adjusted
>> ABI you'd introduce x87 <-> SSE register moves which are not helpful
>> for performance.
>
> Exactly.  For example,
>
> double plus(double a, double b)
> {
>   return a+b;
> }
>
> plus:
>         pushl   %ebp
>         movl    %esp, %ebp
>         subl    $8, %esp
>         movsd   16(%ebp), %xmm0
>         addsd   8(%ebp), %xmm0
>         movsd   %xmm0, -8(%ebp)
>         fldl    -8(%ebp)
>         leave
>         ret

Yep.  As the issue only concerns return values we could start to return
in both %st(0) and %xmm0 for externally visible functions.  That would
still have a spurious set of %st(0) and FP stack adjustment, but the
caller could use %xmm0 and hope performance wouldn't be affected too
much.  Of course that's an ABI change that is only compatible in one
direction.

Richard.
Register Allocation Pref. in 3.3.3
Hi,

For anyone who still remembers what went on with 3.3.3: in global.c, set_preference, why is there a bias to set preference for operand 0 of src? It is not intuitive, and there's no comment regarding this, so I guess there is some 'assumption' gcc makes regarding the order of operands.

Two generic questions regarding this version:

- Does the order in which define_insn and define_expand rules show up in the .md file bias the compiler to choose a rule in one way or the other?
- If two define_insn patterns match the same insn, which one will be used?

Cheers,
-- PMatos
Re: Change x86 default arch for 4.5?
On 02/22/2010 12:29 AM, Erik Trulsson wrote: > On Sun, Feb 21, 2010 at 11:35:11PM +, Dave Korn wrote: >> On 21/02/2010 22:42, Erik Trulsson wrote: >> >>> Yes, it does if the user is using binaries compiled by somebody else, >>> and that somebody else did not explicitly specify any CPU-flags. >>> >>> I believe that is the situation when installing most >>> Linux-distributions for example. >> >> No, surely not. The linux distributions use configure options >> when they package their compilers to choose the default with-cpu >> and with-arch options, and those are quite deliberately chosen >> according to the binary standards of the distro. It is hardly a >> case of "somebody else did not explicitly specify" cpu flags; they >> in fact explicitly specified them according to the system >> requirements for the distro. If your distro says it doesn't >> support i386, this is *why*! > > Are you sure of that? Really sure? > Some Linux distributions almost certainly do as you describe, but all > of them? I doubt it. And I doubt otherwise. Linux distros put a great deal of thought into which machines they are targeting with their binary distributions. And the existence of one tiny distro somewhere that doesn't would not change that fact. Andrew.
RE: Function versioning tests?
Hi Bingfeng. Thanks for pointing me at that patch, but it doesn't do what I require, which is probably fortunate because I would have wasted work otherwise! My change incorporates the callee function name and caller function name into the new name (e.g. bar.0.foo), so that we can trace that name back to the original non-versioned function in the source, without requiring additional debugging information; a standard build contains all the info we need because it is held in the function names. Are there any versioning tests for the patch you mention? Cheers, Ian > -Original Message- > From: Bingfeng Mei [mailto:b...@broadcom.com] > Sent: 22 February 2010 09:58 > To: Ian Bolton; gcc@gcc.gnu.org > Subject: RE: Function versioning tests? > > Hi, > GCC 4.5 already contains such patch. http://gcc.gnu.org/ml/gcc- > patches/2009-03/msg01186.html > If you are working on 4.4 branch, you can just apply the patch without > problem. > > Cheers, > Bingfeng > > > -Original Message- > > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On > > Behalf Of Ian Bolton > > Sent: 19 February 2010 17:09 > > To: gcc@gcc.gnu.org > > Subject: Function versioning tests? > > > > Hi there, > > > > I've changed our private port of GCC to give versioned functions > > better names (rather than T.0, T.1), and was wondering if there > > are any existing tests that push function-versioning to the limit, > > so I can test whether my naming scheme is sound. > > > > Failing that, I'd appreciate some pointers on how I might make > > such a test. I know I need to be passing a constant in as a > > parameter, but I don't know what other criteria are required to > > make it trigger. > > > > Cheers, > > Ian > > > >
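For reference, the kind of source that typically triggers versioning is an otherwise uninlinable function whose callers all pass a compile-time constant for one of its parameters, so that interprocedural constant propagation can clone it. A minimal, purely hypothetical sketch (names are made up; whether a clone is actually created depends on the optimization level, e.g. -O3 or -fipa-cp-clone, and on the inliner's decisions):

/* Hypothetical example.  With cloning enabled (e.g. -O3, which turns on
   -fipa-cp-clone), GCC may emit a specialized version of bar() in which
   "which" is known to be 1, because every call site passes a constant.
   The noinline attribute keeps the inliner from simply absorbing the
   calls, so that cloning is the interesting outcome.  */
#include <stdio.h>

static int __attribute__ ((noinline))
bar (int which, int n)
{
  int i, r = 0;
  for (i = 0; i < n; i++)
    {
      if (which == 1)
        r += i * i;
      else
        r -= i;
    }
  return r;
}

static int
foo (int n)
{
  return bar (1, n) + bar (1, n + 7);
}

int
main (void)
{
  printf ("%d\n", foo (100));
  return 0;
}

Inspecting the symbol table of the resulting object, or the IPA dumps (e.g. -fdump-ipa-all), should show whether a versioned copy of bar was emitted and what it ended up being called.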
Re: Gprof can account for less than 1/3 of execution time?!?!
Hi,

On Sun, 21 Feb 2010, Jon Turner wrote:

> I have recently encountered a gross inaccuracy in gprof that
> I can't explain. Yes, I know gprof uses a sampling technique

This is incorrect.  Code compiled with -pg will call mcount on each
function entry.  If there are many calls (compared to other computations)
the mcount overhead might become fairly large.

> so I should not expect a high level of precision, but the results
> I am getting clearly reflect a more fundamental issue.
>
> The program in question has been compiled with -pg for all
> source code files. The time command reports 20 seconds of user
> time (which is consistent with personal observation) but
> the gprof output accounts for only about 6 seconds of the
> execution time.

As others have mentioned, code compiled without -pg or in shared
libraries (independent of whether they were compiled with -pg) won't be
in the profile.  That might be another reason for the inconsistencies.

Ciao,
Michael.
Re: Register Allocation Pref. in 3.3.3
Quoting "Paulo J. Matos" :

> Hi, For anyone who still remembers what went on with 3.3.3: in global.c,
> set_preference, why is there a bias to set preference for operand 0 of src?

I don't remember the details of this specific code, but in general operand 0
is mostly used as an output operand; if an output operand can be assigned a
register, it is likely to be needed in a subsequent instruction to do
something with it.

> - Does the order in which define_insn and define_expand rules show up in
>   the .md file bias the compiler to choose a rule in one way or the other?

You may not have more than one pattern with the same name.  define_expand
patterns will not be used for instruction pattern nor split point
recognition.

> - If two define_insn patterns match the same insn, which one will be used?

The first one to be recognized.  This is not necessarily the same as the
first one in the file, because insn code numbers are cached, and in general
not recomputed during or after reload.  If the insn code caching has an
effect on eventual instruction selection, it is usually a bug in the machine
description.
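To make the last question concrete, here is a sketch for a hypothetical target (names, templates, predicates and constraints are illustrative, not from any real back end): both patterns below can match an add whose second operand is a small constant, and the one recognized first supplies the assembler template.

(define_insn "*addsi3_general"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "nonmemory_operand" "ri")))]
  ""
  "add\t%0,%1,%2")

;; Overlaps with the pattern above whenever operand 2 is a const_int in [-8, 7].
(define_insn "*addsi3_small_const"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "const_int_operand" "n")))]
  "INTVAL (operands[2]) >= -8 && INTVAL (operands[2]) < 8"
  "addi\t%0,%1,%2")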
Re: Gprof can account for less than 1/3 of execution time?!?!
Quoting Michael Matz :

> Hi,
>
> On Sun, 21 Feb 2010, Jon Turner wrote:
>
>> I have recently encountered a gross inaccuracy in gprof that
>> I can't explain. Yes, I know gprof uses a sampling technique
>
> This is incorrect.  Code compiled with -pg will call mcount on each
> function entry.  If there are many calls (compared to other computations)
> the mcount overhead might become fairly large.

The mcount overhead actually depends on the machine description, although
most ports have standardized on a very runtime profligate scheme.
Re: Gprof can account for less than 1/3 of execution time?!?!
You're not listening. I am using -pg and the program is statically linked. The concern I am raising is not about the function counting, but the reported running times, which are determined by sampling (read the gprof manual, if this is news to you). In this case, the mcount overhead cannot account for the discrepancy, since that would cause gprof to OVER-estimate the run time, while in this case it is UNDER-estimating. It's missing about 70% of the actual running time in the program. It conceivably I am doing something wrong. I hope so, since once I know what it is, I can fix it. But at the moment, it's hard to avoid the suspicion that something about the gprof implementation is deeply flawed. Jon Joern Rennecke wrote: Quoting Michael Matz : Hi, On Sun, 21 Feb 2010, Jon Turner wrote: I have recently encountered a gross inaccuracy in gprof that I can't explain. Yes, I know gprof uses a sampling technique This is incorrect. Code compiled with -pg will call mcount on each function entry. If there are many calls (compared to other computations) the mcount overhead might become fairly large. The mcount overhead actually depends on the machine description, although most ports have standardized on a very runtime profligate scheme.
RE: Gprof can account for less than 1/3 of execution time?!?!
Hi Jon, What does ldd say about your executable? Thanks ++Cyrille -Original Message- From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jon Turner Sent: Monday, February 22, 2010 3:44 PM To: Joern Rennecke Cc: Michael Matz; gcc@gcc.gnu.org Subject: Re: Gprof can account for less than 1/3 of execution time?!?! You're not listening. I am using -pg and the program is statically linked. The concern I am raising is not about the function counting, but the reported running times, which are determined by sampling (read the gprof manual, if this is news to you). In this case, the mcount overhead cannot account for the discrepancy, since that would cause gprof to OVER-estimate the run time, while in this case it is UNDER-estimating. It's missing about 70% of the actual running time in the program. It conceivably I am doing something wrong. I hope so, since once I know what it is, I can fix it. But at the moment, it's hard to avoid the suspicion that something about the gprof implementation is deeply flawed. Jon Joern Rennecke wrote: > Quoting Michael Matz : > >> Hi, >> >> On Sun, 21 Feb 2010, Jon Turner wrote: >> >>> I have recently encountered a gross inaccuracy in gprof that >>> I can't explain. Yes, I know gprof uses a sampling technique >> >> This is incorrect. Code compiled with -pg will call mcount on each >> function entry. If there are many calls (compared to other computations) >> the mcount overhead might become fairly large. > > The mcount overhead actually depends on the machine description, although > most ports have standardized on a very runtime profligate scheme.
Re: Gprof can account for less than 1/3 of execution time?!?!
Hi Jon,

On Mon, Feb 22, 2010 at 08:43:31AM -0600, Jon Turner wrote:
> You're not listening. I am using -pg and the program is statically
> linked. The concern I am raising is not about the function counting,
> but the reported running times, which are determined by sampling
> (read the gprof manual, if this is news to you).
>
> In this case, the mcount overhead cannot account for the discrepancy,
> since that would cause gprof to OVER-estimate the run time, while
> in this case it is UNDER-estimating. It's missing about 70% of the
> actual running time in the program.

Well, in fact I expect gprof to UNDER-estimate the runtime: the processor needs time in order to do the profiling (count how often a function was called from which function, measure the times, ...).

Now:
- the external "time" command will INCLUDE these times in the reported runtime (it can't distinguish between operations done by your code and by the profiling code)
- I would expect the profiling code to EXCLUDE these times, because they only pollute the profiling data: when I try to optimize the code based on the profiling data, I'm not interested in how much time was used by the profiling code, but only in the time used in MY code.

So:
- "time" will give the total runtime (your code + profiling code)
- the gprof output will (probably) give the runtime of your code, without the profiling code

and that means that gprof should underestimate the runtime. Well, that's what I believe; I never checked it carefully.

In addition, you have another place where gprof underestimates the runtime (as already mentioned):
- when you link a library which was compiled without -pg, its runtime will be neglected by gprof, but not by "time". Are the standard libraries (glibc for example) to which you link compiled with "-pg"? If not, time spent in "printf" or "std::cout" or ... will be neglected by gprof too.
- when you call a kernel function, which runs in kernel space, it will be included by "time" - but not by gprof.

HTH, Axel
Re: Gprof can account for less than 1/3 of execution time?!?!
On 22/02/2010 14:43, Jon Turner wrote: > You're not listening. I am using -pg and the program is statically > linked. > It conceivably I am doing something wrong. I hope so, since > once I know what it is, I can fix it. Providing a simple reproducible testcase with steps to reproduce the problem would be the way to proceed now. Then everyone will be able to help you figure out whether or not you're making any mis-steps, and if not, will have a working example to help them debug what's up with gprof. cheers, DaveK
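For what it's worth, the sort of minimal, self-contained starting point being suggested can be as small as the program below (purely hypothetical, not extracted from Jon's code). Build it twice, with and without -pg, compare the wall-clock times from time(1), and then see how much of the difference gprof's flat profile accounts for:

/* gprof-mini.c -- toy workload for sanity-checking gprof output.
   Suggested (hypothetical) usage:
       gcc -O2 -pg gprof-mini.c -o gprof-mini
       time ./gprof-mini
       gprof ./gprof-mini gmon.out   */
#include <stdio.h>

/* noinline keeps the function visible to the profiler at -O2.  */
static double __attribute__ ((noinline))
hot (int n)
{
  double s = 0.0;
  int i;
  for (i = 1; i <= n; i++)
    s += 1.0 / i;
  return s;
}

int
main (void)
{
  double total = 0.0;
  int i;
  for (i = 0; i < 2000000; i++)   /* many calls, so mcount overhead shows up */
    total += hot (100);
  printf ("%f\n", total);         /* keep the work from being optimized away */
  return 0;
}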
Re: Change x86 default arch for 4.5?
On 22/02/2010 11:04, Andrew Haley wrote: > On 02/22/2010 12:29 AM, Erik Trulsson wrote: >> On Sun, Feb 21, 2010 at 11:35:11PM +, Dave Korn wrote: >>> On 21/02/2010 22:42, Erik Trulsson wrote: >>> Yes, it does if the user is using binaries compiled by somebody else, and that somebody else did not explicitly specify any CPU-flags. I believe that is the situation when installing most Linux-distributions for example. >>> No, surely not. The linux distributions use configure options >>> when they package their compilers to choose the default with-cpu >>> and with-arch options, and those are quite deliberately chosen >>> according to the binary standards of the distro. It is hardly a >>> case of "somebody else did not explicitly specify" cpu flags; they >>> in fact explicitly specified them according to the system >>> requirements for the distro. If your distro says it doesn't >>> support i386, this is *why*! >> Are you sure of that? Really sure? >> Some Linux distributions almost certainly do as you describe, but all >> of them? I doubt it. > > And I doubt otherwise. Linux distros put a great deal of thought into > which machines they are targeting with their binary distributions. > And the existence of one tiny distro somewhere that doesn't would not > change that fact. Actually, there are probably dozens or hundreds of tiny distros out there that are basically somebody's home-made repackaging-and-minor-variant of existing bundles. However, I think that these are the same distros that constitute the 90% referred to in Sturgeon's Law, and so I would still not see that as any reason to change GCC's defaults! cheers, DaveK
is -fno-toplevel-reorder going to deprecate
Hello, I'm cross-compiling glibc(eglibc) for new processor. As far as I can see -fno-toplevel-reorder option is critical for successful build. Without option some files (initfini.c, source for crt*.o) can be miscompiled. I've heard that option might become deprecated in future gcc versions (e.g. 4.6). Although, I don't have any evidence. Could you please clarify? Regards, Sergey Yakoushkin
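For readers who have not looked at those files: the reason the option is critical there is that the assembly generated from initfini.c is post-processed (cut apart around marker comments to make the crt*.o pieces), so the top-level asm statements and the function bodies must come out in source order. A stripped-down sketch of the idiom, with purely illustrative marker text and names rather than the real glibc sources:

/* Hypothetical initfini.c-style fragment.  The .s file produced from
   this is later split around the marker comments, so the compiler must
   emit the two top-level asm statements and the function body in
   exactly this order -- which is what -fno-toplevel-reorder guarantees.  */
asm ("\n/*@MY_INIT_PROLOG_BEGINS*/");

void
my_init (void)
{
  extern void run_my_ctors (void);   /* hypothetical helper */
  run_my_ctors ();
}

asm ("\n/*@MY_INIT_PROLOG_ENDS*/");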
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
2010/2/20 Richard Guenther : > On Sat, Feb 20, 2010 at 1:38 PM, Christoph Rupp wrote: [...] >> I fixed all warnings regarding dereferencing type-punned pointers and >> I compile with -O3 AND -fno-strict-aliasing. >> >> and i still get the same crash as earlier. it does not crash with -O2. >> >> From what i understand the -fno-strict-aliasing should solve the >> problem, but it doesn't. >> >> The function which crashes basically is searching a bitmap for a >> pattern. When the search starts, it uses two pointers to the same >> memory location (u64 ptr for searching in 8byte-steps, char* ptr for >> searching byte-wise). If i understand the aliasing correctly then this >> might cause the problems. I just don't know how to rewrite the >> function - I want to use those two pointers for performance reasons. I >> don't even know if the problem is in this function or if it's >> somewhere else and the crash is just a side-effect of a completely >> different problem. > > With -fno-strict-aliasing this is perfectly valid so the problem must be > elsewhere. It might be alignment related if you do not make sure > that the u64 accesses are properly aligned. Try -O3 -fno-tree-vectorize > or analyze the crash. If i enable -fno-tree-vectorize for one file (that's the file which produces the crash) everything works. For me this workaround is fine, although this file is performance relevant and i'd like to have every optimization that's available. I don't really understand how alignment issues could cause this crash. i'm only building for x86_64 and i was not aware of any alignment requirements (i'd love to test on a SPARC, but sadly i don't have access to one...) BTW - if i execute this in valgrind then it doesn't crash or give warnings. For me this looks cheesy - there are no gcc warnings, -O2 works, -O3 crashes. MSVC works and older gcc versions also work fine. In this case it's hard for me not to blame gcc for the crash. Anyway - thanks for your help! Christoph > > Richard. > >> To reproduce: >> wget http://crupp.de/hamsterdb-1.1.3.tar.gz >> tar -zxvf hamsterdb-1.1.3.tar.gz >> cd hamsterdb-1.1.3 >> ./configure >> make >> cd unittests >> ./test # <-- will segfault with a bad pointer >> >> Here's my gcc version: >> Using built-in specs. 
>> Target: x86_64-linux-gnu >> Configured with: ../src/configure -v --with-pkgversion='Ubuntu >> 4.4.1-4ubuntu9' >> --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs >> --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr >> --enable-shared --enable-multiarch --enable-linker-build-id >> --with-system-zlib --libexecdir=/usr/lib --without-included-gettext >> --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 >> --program-suffix=-4.4 --enable-nls --enable-clocale=gnu >> --enable-libstdcxx-debug --enable-objc-gc --disable-werror >> --with-arch-32=i486 --with-tune=generic --enable-checking=release >> --build=x86_64-linux-gnu --host=x86_64-linux-gnu >> --target=x86_64-linux-gnu >> Thread model: posix >> gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9) >> >> Thanks for any help, >> Christoph >> >> >> 2010/1/14 Jonathan Wakely : >>> 2010/1/14 Christoph Rupp: To reproduce, these steps are necessary: wget http://crupp.de/dl/hamsterdb-1.1.1.tar.gz tar -zxvf hamsterdb-1.1.1.tar.gz cd hamsterdb-1.1.1 ./configure --enable-internal make >>> >>> There are lots of these warnings, which you ignore at your peril: >>> >>> freelist.c:3326: warning: dereferencing type-punned pointer will break >>> strict-aliasing rules >>> >>> ham_info.c:80: warning: dereferencing type-punned pointer will break >>> strict-aliasing rules >>> >>> env.cpp:1804: warning: dereferencing type-punned pointer will break >>> strict-aliasing rules >>> >>> You should probably either fix those warnings, avoid compiling at high >>> optimisation levels, or use -fno-strict-aliasing (which allows the >>> tests to run successfully.) >>> >> >
Re: is -fno-toplevel-reorder going to deprecate
Sergey Yakoushkin writes: > I'm cross-compiling glibc(eglibc) for new processor. > As far as I can see -fno-toplevel-reorder option is critical for > successful build. > Without option some files (initfini.c, source for crt*.o) can be miscompiled. > > I've heard that option might become deprecated in future gcc versions > (e.g. 4.6). > Although, I don't have any evidence. Could you please clarify? Where did you hear that? I have not heard of any plans to deprecate -fno-toplevel-reorder. As you say, it is required for certain kinds of use. The gcc build even uses it itself. It's possible that you have this confused with -fno-unit-at-a-time. That option is deprecated. It is being replaced with -fno-toplevel-reorder and -fno-section-anchors. Ian
Re: Change x86 default arch for 4.5?
Dave Korn wrote: On 21/02/2010 20:03, Martin Guy wrote: The point about defaults is that the GCC default tends to filter down into the default for distributions; I'd find it surprising if that was really the way it happens; don't distributions make deliberate and conscious decisions about binary standards and things like that? They certainly ought to be doing so, IMO, rather than just going with whatever-the-compiler-does-when-you-don't-tell-it-what-to-do. This is Debian Testing as of last Sunday - note that it has had --with-arch-32=i486 for more time than I care to remember ... hir...@super:~$ gfortran -v Using built-in specs. Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.2-9' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 4.4.3 20100108 (prerelease) (Debian 4.4.2-9) Hope this helps, -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/ Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html
Re: is -fno-toplevel-reorder going to deprecate
Hi, Ian I mean exactly the -fno-toplevel-reorder option. There are some compilers which don't support it. If the option is going to be deprecated in gcc in the near future as well, then it makes sense to consider changes in glibc(eglibc). So, there are no plans to deprecate the option. Did I understand correctly? Sergey Y. 2010/2/22 Ian Lance Taylor : > Sergey Yakoushkin writes: > >> I'm cross-compiling glibc(eglibc) for new processor. >> As far as I can see -fno-toplevel-reorder option is critical for >> successful build. >> Without option some files (initfini.c, source for crt*.o) can be miscompiled. >> >> I've heard that option might become deprecated in future gcc versions >> (e.g. 4.6). >> Although, I don't have any evidence. Could you please clarify? > > Where did you hear that? > > I have not heard of any plans to deprecate -fno-toplevel-reorder. As > you say, it is required for certain kinds of use. The gcc build even > uses it itself. > > It's possible that you have this confused with -fno-unit-at-a-time. > That option is deprecated. It is being replaced with > -fno-toplevel-reorder and -fno-section-anchors. > > Ian
Re: Change x86 default arch for 4.5?
On 22/02/2010 20:32, Toon Moene wrote: [ Not singling you out here, yours just happens to be the latest reply: ] > Dave Korn wrote: > >> On 21/02/2010 20:03, Martin Guy wrote: > >>> The point about defaults is that the GCC default tends to filter down >>> into the default for distributions; >> >> I'd find it surprising if that was really the way it happens; don't >> distributions make deliberate and conscious decisions about binary >> standards and things like that? > This is Debian Testing Enough with the examples, already! It was a rhetorical question! :-) cheers, DaveK
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
On Mon, 2010-02-22 at 21:17 +0100, Christoph Rupp wrote: > 2010/2/20 Richard Guenther : > > On Sat, Feb 20, 2010 at 1:38 PM, Christoph Rupp wrote: > [...] > >> I fixed all warnings regarding dereferencing type-punned pointers and > >> I compile with -O3 AND -fno-strict-aliasing. > >> > >> and i still get the same crash as earlier. it does not crash with -O2. > >> > >> From what i understand the -fno-strict-aliasing should solve the > >> problem, but it doesn't. > >> > >> The function which crashes basically is searching a bitmap for a > >> pattern. When the search starts, it uses two pointers to the same > >> memory location (u64 ptr for searching in 8byte-steps, char* ptr for > >> searching byte-wise). If i understand the aliasing correctly then this > >> might cause the problems. I just don't know how to rewrite the > >> function - I want to use those two pointers for performance reasons. I > >> don't even know if the problem is in this function or if it's > >> somewhere else and the crash is just a side-effect of a completely > >> different problem. > > > > With -fno-strict-aliasing this is perfectly valid so the problem must be > > elsewhere. It might be alignment related if you do not make sure > > that the u64 accesses are properly aligned. Try -O3 -fno-tree-vectorize > > or analyze the crash. > > If i enable -fno-tree-vectorize for one file (that's the file which > produces the crash) everything works. > > For me this workaround is fine, although this file is performance > relevant and i'd like to have every optimization that's available. > > I don't really understand how alignment issues could cause this crash. > i'm only building for x86_64 and i was not aware of any alignment > requirements (i'd love to test on a SPARC, but sadly i don't have > access to one...) > > BTW - if i execute this in valgrind then it doesn't crash or give warnings. > > For me this looks cheesy - there are no gcc warnings, -O2 works, -O3 > crashes. MSVC works and older gcc versions also work fine. In this > case it's hard for me not to blame gcc for the crash. > > Anyway - thanks for your help! > Christoph Vector instructions usually require aligned data. Perhaps the vectorizer assumes that a pointer to 64-bit data will only be used to access data that is 64-bit aligned. If you can reproduce the problem with a small, self-contained test then please file a bug report. It might be possible to issue a warning or to detect that the loop should not be vectorized. If not, maybe the compiler should disable vectorization for -fno-strict-aliasing. Janis > > > > Richard. > > > >> To reproduce: > >> wget http://crupp.de/hamsterdb-1.1.3.tar.gz > >> tar -zxvf hamsterdb-1.1.3.tar.gz > >> cd hamsterdb-1.1.3 > >> ./configure > >> make > >> cd unittests > >> ./test # <-- will segfault with a bad pointer > >> > >> Here's my gcc version: > >> Using built-in specs. 
> >> Target: x86_64-linux-gnu > >> Configured with: ../src/configure -v --with-pkgversion='Ubuntu > >> 4.4.1-4ubuntu9' > >> --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs > >> --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr > >> --enable-shared --enable-multiarch --enable-linker-build-id > >> --with-system-zlib --libexecdir=/usr/lib --without-included-gettext > >> --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 > >> --program-suffix=-4.4 --enable-nls --enable-clocale=gnu > >> --enable-libstdcxx-debug --enable-objc-gc --disable-werror > >> --with-arch-32=i486 --with-tune=generic --enable-checking=release > >> --build=x86_64-linux-gnu --host=x86_64-linux-gnu > >> --target=x86_64-linux-gnu > >> Thread model: posix > >> gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9) > >> > >> Thanks for any help, > >> Christoph > >> > >> > >> 2010/1/14 Jonathan Wakely : > >>> 2010/1/14 Christoph Rupp: > > To reproduce, these steps are necessary: > > wget http://crupp.de/dl/hamsterdb-1.1.1.tar.gz > tar -zxvf hamsterdb-1.1.1.tar.gz > cd hamsterdb-1.1.1 > ./configure --enable-internal > make > >>> > >>> There are lots of these warnings, which you ignore at your peril: > >>> > >>> freelist.c:3326: warning: dereferencing type-punned pointer will break > >>> strict-aliasing rules > >>> > >>> ham_info.c:80: warning: dereferencing type-punned pointer will break > >>> strict-aliasing rules > >>> > >>> env.cpp:1804: warning: dereferencing type-punned pointer will break > >>> strict-aliasing rules > >>> > >>> You should probably either fix those warnings, avoid compiling at high > >>> optimisation levels, or use -fno-strict-aliasing (which allows the > >>> tests to run successfully.) > >>> > >> > >
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
On Mon, Feb 22, 2010 at 1:06 PM, Janis Johnson wrote: > If you can reproduce the problem with a small, self-contained test then > please file a bug report. It might be possible to issue a warning or > to detect that the loop should not be vectorized. If not, maybe the > compiler should disable vectorization for -fno-strict-aliasing. It is not an aliasing issue but an alignment issue. Anyways this is most likely the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43009 . Thanks, Andrew Pinski
Re: gcc 4.4.1/linux 64bit: code crashes with -O3, works with -O2
On Mon, 2010-02-22 at 13:11 -0800, Andrew Pinski wrote: > On Mon, Feb 22, 2010 at 1:06 PM, Janis Johnson wrote: > > If you can reproduce the problem with a small, self-contained test then > > please file a bug report. It might be possible to issue a warning or > > to detect that the loop should not be vectorized. If not, maybe the > > compiler should disable vectorization for -fno-strict-aliasing. > > It is not an aliasing issue but an alignment issue. Anyways this is > most likely the same as > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43009 . Yes, that's the problem I had in mind but I was thinking about an explicit cast to a pointer to more-aligned data in the function that has the vector loop. There's no way to warn about the undefined behavior when the cast is in a different source file. It's interesting that two reports of failure due to this same undefined behavior come so close together. Janis
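For the record, a minimal sketch of the pattern under discussion (hypothetical code, not taken from hamsterdb): a byte buffer is walked through a uint64_t *, which is undefined behavior unless the address really is suitably aligned, and which the vectorizer may lower to aligned vector loads; going through memcpy expresses an unaligned access and sidesteps the problem.

#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Risky: p may be only byte-aligned, yet it is dereferenced as uint64_t.
   The compiler (and in particular the vectorizer) may assume 8-byte
   alignment here and use aligned loads that fault at run time.  */
size_t
count_zero_words_risky (const unsigned char *p, size_t nwords)
{
  const uint64_t *w = (const uint64_t *) p;
  size_t i, n = 0;
  for (i = 0; i < nwords; i++)
    if (w[i] == 0)
      n++;
  return n;
}

/* Safer: memcpy expresses an unaligned load; on x86_64 it typically
   compiles to a single move instruction, so the cost is small.  */
size_t
count_zero_words_safe (const unsigned char *p, size_t nwords)
{
  size_t i, n = 0;
  for (i = 0; i < nwords; i++)
    {
      uint64_t w;
      memcpy (&w, p + i * 8, sizeof w);
      if (w == 0)
        n++;
    }
  return n;
}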
Re: Gprof can account for less than 1/3 of execution time?!?!
On Mon, Feb 22, 2010 at 03:23:52PM -0600, Jon Turner wrote:
> In it, you will find a directory with all the source code
> needed to observe the problem for yourself.
> The top level directory contains a linux executable called
> maxFlo, which you should be able to run on a linux box
> as is. But if you want/need to compile things yourself,
> type "make clean" and "make all" in the top level
> directory and you should get a fresh copy of maxFlo.

So, compiling maxFlo with no -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo

real    0m3.465s
user    0m3.460s
sys     0m0.000s

Compiling maxFlo with -pg option:

@nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo

real    0m9.780s
user    0m9.760s
sys     0m0.010s

Notice that ~60% of the running time with gprof enabled is simply overhead
from call counting and the like.  That time isn't recorded by gprof.  That
alone accounts for your report about gprof ignoring 2/3 of the execution
time.

Checking to see whether maxFlo is a dynamic executable (since you claimed
earlier that you were statically linking your program):

@nightcrawler:~/src/gprof-trouble-case$ ldd ./maxFlo
        linux-vdso.so.1 =>  (0x7fff2977f000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fb422c21000)
        libm.so.6 => /lib/libm.so.6 (0x7fb42299d000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7fb422786000)
        libc.so.6 => /lib/libc.so.6 (0x7fb422417000)
        /lib64/ld-linux-x86-64.so.2 (0x7fb422f31000)

So calls to shared library functions (such as functions in libm) will not
be caught by gprof.  Those calls could account for a significant amount of
the running time of your program and gprof can't tell you about them.

Inspecting the gmon.out file:

@nightcrawler:~/src/gprof-trouble-case$ gprof maxFlo gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 16.09      0.37     0.37    27649     0.00     0.00  shortPath::findPath()
 12.61      0.66     0.29 55889952     0.00     0.00  graph::next(int,int) const
 11.96      0.94     0.28 61391904     0.00     0.00  graph::mate(int,int) const
 10.87      1.19     0.25 58654752     0.00     0.00  flograph::res(int,int) const
 10.44      1.43     0.24                             _fini
  6.96      1.59     0.16 65055289     0.00     0.00  graph::term(int) const
  6.96      1.75     0.16 61391904     0.00     0.00  digraph::tail(int) const
[...lots of stuff elided...]
  0.00      2.30     0.00        1     0.00     0.00  graph

gprof is telling you about 2.3 seconds of your execution time.  With the
factors above accounted for, that doesn't seem unreasonable.

-Nathan
Re: Gprof can account for less than 1/3 of execution time?!?!
Ok, this is not as simple as I would like to make it, but hopefully it's
simple enough. I've placed a tar file at www.arl.wustl.edu/~jst/gprof-tarfile
In it, you will find a directory with all the source code needed to observe
the problem for yourself. The top level directory contains a linux executable
called maxFlo, which you should be able to run on a linux box as is. But if
you want/need to compile things yourself, type "make clean" and "make all"
in the top level directory and you should get a fresh copy of maxFlo.

The basic test, with my results appears below.

% time maxFlo
20.356u 0.001s 0:20.38 99.8%    0+0k 0+0io 0pf+0w

Note, that's 20 seconds of user time. Now for gprof,

% maxFlo
% gprof maxFlo | more
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 18.87      1.17     1.17    27649     0.00     0.00  shortPath::findPath()
 18.39      2.31     1.14 58654752     0.00     0.00  flograph::res(int, int) const
 13.87      3.17     0.86 61391904     0.00     0.00  graph::mate(int, int) const
 12.18      3.93     0.76 55889952     0.00     0.00  graph::next(int, int) const
  6.29      4.32     0.39 61391904     0.00     0.00  graph::left(int) const
  5.65      4.67     0.35 65055289     0.00     0.00  graph::term(int) const
  5.16      4.99     0.32 61391904     0.00     0.00  digraph::tail(int) const
  4.03      5.24     0.25 70488952     0.00     0.00  graph::n() const
  3.63      5.46     0.23  9232153     0.00     0.00  list::operator&=(int)
  2.10      5.59     0.13  9165337     0.00     0.00  list::operator<<=(int)
  1.94      5.71     0.12 58654752     0.00     0.00  graph::m() const
  1.45      5.80     0.09    27649     0.00     0.00  list::makeSpace()
  1.29      5.88     0.08  9165337     0.00     0.00  list::operator[](int) const
  1.21      5.96     0.08  9165337     0.00     0.00  graph::first(int) const
  0.81      6.01     0.05  2737152     0.00     0.00  flograph::addFlow(int, int, int)
  0.81      6.06     0.05    27648     0.00     0.00  augPath::augment()
  0.65      6.10     0.04  2737152     0.00     0.00  min(int, int)
  0.48      6.13     0.03 15466777     0.00     0.00  flograph::src() const
  0.48      6.16     0.03                             fatal(char*)
  0.24      6.17     0.02                             operator<<(std::ostream&, list const&)
  0.16      6.18     0.01  9287496     0.00     0.00  flograph::snk() const
  0.16      6.19     0.01  9165338     0.00     0.00  list::empty() const
  0.16      6.20     0.01                             list::clear()
  0.00      6.20     0.00    27649     0.00     0.00  list::freeSpace()
  0.00      6.20     0.00    27649     0.00     0.00  list::list(int)
  0.00      6.20     0.00    27649     0.00     0.00  list::~list()
  0.00      6.20     0.00     1140     0.00     0.00  digraph::join(int, int)
  0.00      6.20     0.00     1140     0.00     0.00  flograph::changeCap(int, int)
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _Z7badCasei
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN4list9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN4misc6cflushERSic
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN5graph9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN7augPathC2ER8flographRi
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN7digraph9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN8flograph9makeSpaceEv
  0.00      6.20     0.00        1     0.00     0.00  global constructors keyed to _ZN9shortPathC2ER8flographRi
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      6.20     0.00        1     0.00     0.00  badCase(int)
  0.00      6.20     0.00        1     0.00     0.00  graph::makeSpace()
  0.00      6.20     0.00        1     0.00     0.00  graph::graph(int, int)
  0.00      6.20     0.00        1
Re: is -fno-toplevel-reorder going to deprecate
Sergey Yakoushkin writes: > I mean exactly -fno-toplevel-reorder option. There are some compilers > which don't support it. Yes. The option was introduced in gcc 4.2. > If option is going to deprecate in gcc in near future as well, than it > make sense to consider changes in glibc(eglibc). > So, there are no plans to deprecate option. Did I understand correctly? Correct. There are no plans to deprecate the -fno-toplevel-reorder option. I still don't know why you think that there are plans to deprecate it. Ian
Re: Gprof can account for less than 1/3 of execution time?!?!
Doh! Thanks, Nathan. I think you put your finger on it. I was well aware of the overhead that gprof can introduce, but did not recognize that this overhead was not being counted by gprof. Jon Nathan Froyd wrote: On Mon, Feb 22, 2010 at 03:23:52PM -0600, Jon Turner wrote: In it, you will find a directory with all the source code needed to observe the problem for yourself. The top level directory contains a linux executable called maxFlo, which you should be able to run on a linux box as is. But if you want/need to compile things yourself, type "make clean" and "make all" in the top level directory and you should get a fresh copy of maxFlo. So, compiling maxFlo with no -pg option: @nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo real0m3.465s user0m3.460s sys 0m0.000s Compiling maxFlo with -pg option: @nightcrawler:~/src/gprof-trouble-case$ time ./maxFlo real0m9.780s user0m9.760s sys 0m0.010s Notice that ~60% of the running time with gprof enabled is simply overhead from call counting and the like. That time isn't recorded by gprof. That alone accounts for your report about gprof ignoring 2/3 of the execution time. Checking to see whether maxFlo is a dynamic executable (since you claimed earlier that you were statically linking your program): @nightcrawler:~/src/gprof-trouble-case$ ldd ./maxFlo linux-vdso.so.1 => (0x7fff2977f000) libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7fb422c21000) libm.so.6 => /lib/libm.so.6 (0x7fb42299d000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x7fb422786000) libc.so.6 => /lib/libc.so.6 (0x7fb422417000) /lib64/ld-linux-x86-64.so.2 (0x7fb422f31000) So calls to shared library functions (such as functions in libm) will not be caught by gprof. Those calls count account for a significant amount of running time of your program and gprof can't tell you about them. Inspecting the gmon.out file: @nightcrawler:~/src/gprof-trouble-case$ gprof maxFlo gmon.out Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls s/call s/call name 16.09 0.37 0.3727649 0.00 0.00 shortPath::findPath() 12.61 0.66 0.29 55889952 0.00 0.00 graph::next(int,int) const 11.96 0.94 0.28 61391904 0.00 0.00 graph::mate(int,int) const 10.87 1.19 0.25 58654752 0.00 0.00 flograph::res(int,int) const 10.44 1.43 0.24 _fini 6.96 1.59 0.16 65055289 0.00 0.00 graph::term(int) const 6.96 1.75 0.16 61391904 0.00 0.00 digraph::tail(int) const [...lots of stuff elided...] 0.00 2.30 0.001 0.00 0.00 graph gprof is telling you about 2.3 seconds of your execution time. With the factors above accounted for, that doesn't seem unreasonable. -Nathan
Re: Gprof can account for less than 1/3 of execution time?!?!
On Mon, Feb 22, 2010 at 05:09:53PM -0600, Jon Turner wrote: > Doh! Thanks, Nathan. I think you put your finger on it. > I was well aware of the overhead that gprof can introduce, > but did not recognize that this overhead was not being > counted by gprof. gprof generally does not have any support for shared libraries. It will ignore profiling samples that lie outside the executable. And in this case, that includes _mcount (which is in libc.so.6). That's probably why. -- Daniel Jacobowitz CodeSourcery
Re: Gprof can account for less than 1/3 of execution time?!?!
Daniel Jacobowitz writes: > gprof generally does not have any support for shared libraries. It > will ignore profiling samples that lie outside the executable. And in > this case, that includes _mcount (which is in libc.so.6). That's > probably why. Even when statically linked, gprof will always ignore the time spent in the _mcount (or equivalent) function. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Building GCC 4.3.3 for ARM.
I am trying to build an eabi cross tool-chain for the arm using version 4.3.3. I noticed from earlier mailing list posts that the configuration flag --disable-libunwind-exceptions is not working as intended and that --without-system-unwind is the preferred flag. I wonder what is the change in the GCC build between the two flags. If the host system does not contain the stock libunwind, then does gcc use its defaults, and does this --without flag explicitly tell GCC to use its own version of libunwind? Also, some colleagues of mine are running into problems when linking code compiled with optimization turned on. After a check, I suspect that the option --enable-target-optspace is compiling libgcc with space optimization and any program linking to it will run into errors. The make check output also fails in tests where optimization is turned on. V. Forbes