Re: Problem with commas in macro parameters
* Dmitry Yu. Bolkhovityanov wrote on Sun, Nov 06, 2005 at 07:21:56AM CET:
> That's probably an old problem, but I haven't found any notion of
> it in GCC docs. So...

It's one better discussed on the gcc-help mailing list.

> #define V(value) = value
>
> This works fine, until I try to pass it some complex value:
>
> D int some_array[2] V({4,5})

If you can limit yourself to C99-aware preprocessors, use

  #define V(...) = __VA_ARGS__

Cheers,
Ralf
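As an aside for readers following along, a small illustration (not part of
the original exchange) of the suggested C99 workaround; the `D' macro from
the quoted report is omitted because its definition was not shown:

  /* C99 variadic macro: __VA_ARGS__ keeps the commas, so a brace-enclosed
     initializer passes through intact. */
  #define V(...) = __VA_ARGS__

  int a_scalar      V(7);        /* expands to: int a_scalar = 7; */
  int some_array[2] V({4, 5});   /* expands to: int some_array[2] = {4, 5}; */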
Call for compiler help/advice: atomic builtins for v3
Hi,

we have a long-standing issue which we really should solve, one way or
another: otherwise there are both correctness and performance issues which
we cannot fix, and new features which we cannot implement. I have plenty
of examples; just ask if you want more details and motivation.

In a nutshell, the problem is that there is no easy way to use the new
atomic builtins in the library *headers*: we want the builtins in the
headers, because we want inlining, but we cannot know whether the builtins
are actually available. This is of course because of targets like i686-*
(I think old Sparc is another example), where by default the generated
code is i386 compatible. More generally, in such cases the availability of
the builtins depends on the actual command-line switches (e.g., -march).

Thus my request: would it be possible to have the builtins available
unconditionally, by way of a slow (lock-based) fallback replacing the real
implementation when the actual target code doesn't allow for them? I think
it's rather obvious that this is the cleanest solution for the issue by
far. Alternately, and this is something I could probably implement myself
rather quickly, we could export a preprocessor builtin which could be
exploited in macros. As I said, I already explored that solution a bit
some time ago and it seems very easy to implement, but I really prefer the
first one. However, if you believe the latter is the only possible way to
go, I can work on it immediately: the issue is at the top of my priorities
for the next few weeks.

Other proposals? I'm going to link this message to a new enhancement PR,
anyway.

Thanks in advance,
Paolo.
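For readers following the thread, here is a minimal sketch (mine, not
Paolo's) of the kind of header-level code the library would like to write,
and of why it cannot do so today: whether the builtin is expanded inline
depends on the -march in effect, and when it cannot be expanded the
compiler falls back to a call to an external helper that nothing currently
provides.

  // Hypothetical header-only helper; a sketch, not actual libstdc++ code.
  typedef int _Atomic_word;

  static inline _Atomic_word
  __exchange_and_add(volatile _Atomic_word* __mem, int __val)
  {
    // Inlines to "lock xadd" with -march=i486 or later; with plain i386
    // code generation GCC instead emits a call to an out-of-line helper
    // that no library currently supplies.
    return __sync_fetch_and_add(__mem, __val);
  }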
Re: [RFC] Enabling loop unrolls at -O3?
On 11/6/05, Robert Dewar <[EMAIL PROTECTED]> wrote:
> Giovanni Bajo wrote:
>
> > I believe you are missing my point. What is the GCC command line option
> > for "try to optimize as best as you can, please, I don't care
> > compiletime"? I believe that should be -O3. Otherwise let's make -O4.
> > Or -O666. The only real argument I heard till now is that -funroll-loops
> > isn't valuable without profile feedback. My experience is that it isn't
> > true, I for sure use it for profit in my code. But it looks like the
> > only argument that could make a difference is SPEC, and SPEC is not
> > freely available. So I'd love if someone could SPEC -funroll-loops
> > for me.
>
> It is not at all the case that SPEC is the only good argument, in fact
> SPEC on its own is a bad argument. Much more important is impact on real
> life applications, so your data point that it makes a significant
> difference for your application is more interesting than SPEC marks.
> When it comes to GCC, we are more interested in performance than good
> SPEC figures!

Of course SPEC consists of real life applications. Whether it is a good
representation for today's real life applications is another question, but
it certainly is a very good set of tests.

Richard.
Re: Call for compiler help/advice: atomic builtins for v3
On 11/6/05, Paolo Carlini <[EMAIL PROTECTED]> wrote: > Hi, > > we have this long standing issue which really we should solve, one way > or another: otherwise there are both correctness and performance issues > which we cannot fix, new features which we cannot implement. I have > plenty of examples, just ask, in case, if you want more details and > motivations. > > In a nutshell, the problem is that there is no easy way to use in the > library *headers* the new atomic builtins: we want the builtins in the > headers, because we want inlining, but we cannot know whether the > builtins are actually available. This is of course because of targets > like i686-* (I think old Sparcs is another example), when by default the > generated code is i386 compatible. In such cases, more generally, the > availability or not of the builtins depends on the actual command line > switches (e.g., -march). > > Thus my request: would it be possible to have available the builtins > unconditionally, by way of a slow (locks) fallback replacing the real > implementation when the actual target code doesn't allow for them? I > think it's rather obvious that this is the cleanest solution for the > issue by far. Alternately, something I could probably implement myself > rather quickly, export a preprocessor builtin, which could be exploited > in macros. As I said, I already explored a bit this solution time ago > and seems to me very easy to implement, but really I rather prefer the > first one. However, if you really believe the latter is the only > possible way to go, I can work on it immediately: the issue is on top of > my priorities for the next weeks. > > Other proposals? We could just provide fallback libcalls in libgcc - though usually you would want a fallback on a higher level to avoid too much overhead. So - can't you work with some preprocessor magic and a define, if the builtins are available? Richard.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Guenther wrote:

> We could just provide fallback libcalls in libgcc

Indeed, this is an option. Not one I can implement myself quickly, but I
think the idea of issuing a library call when the builtin is not available
was actually meant to enable this kind of solution.

Can you work on it?

> - though usually you would want a fallback on a higher level to avoid
> too much overhead.

Well, we are talking about i386... But yes, that could make sense, for
example if you want to compile once for both i386 and i686 and obtain
decent performance everywhere. Consider, however, that currently we end up
doing a function call anyway, no inlining, so...

> So - can't you work with some preprocessor magic and a define, if
> the builtins are available?

I don't really understand this last remark of yours: is it an alternate
solution?!? Any preprocessor magic has to rely on a new preprocessor
builtin, because there is no consistent, unequivocal way to know whether
the builtins are available for the actual target; we already explored that
some time ago, in our (SUSE) gcc-development list too.

Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Guenther wrote:

> We could just provide fallback libcalls in libgcc

All in all, I think this is really the best solution. For 4.2 Sparc will
also have the builtins available, and even if we want the libgcc code to
be equivalent to what is currently available in libstdc++-v3/config/cpu,
that is, not just locks when possible, the amount of code added to libgcc
would be very limited. Then, during the transition phase we could help
ourselves with some macro tricks, but only temporarily.

Paolo.
Re: [RFC] Enabling loop unrolls at -O3?
Robert Dewar <[EMAIL PROTECTED]> writes: | Steven Bosscher wrote: | | > You must not have been paying attention to one of the most frequent | > complaints about gcc, which is that it is dog slow already ;-) | | Sure, but to me -O2 says you don't care much about compilation time. If the Ada front-end wishes, it can make special flags for its own needs... -- Gaby
Re: Is -fvisibility patch possible on GCC 3.3.x
"Gary M Mann" <[EMAIL PROTECTED]> writes: | Hi, | | The -fvisibility feature in GCC 4.0 is a really useful way of hiding all | non-public symbols in a dynamic shared object. | | While I'm aware of a patch which backports this feature to GCC 3.4 (over at | nedprod.com), I was wondering whether there is a similar patch available for | GCC 3.3. I'm aware that GCC 3.3 (and some vintages of 3.2) support the | __attribute__ means of setting a symbol's visibility, but I want to be able | to change the default visibility for all symbols. | | The problem is that we're stuck for now with the GCC 3.3 compiler as we need | v5 ABI compatibility for Orbix 6.2 and cannot move to 3.4 until Iona catch | up. | | Does anyone know if such a patch exists, or even if it is feasible in the | 3.3 framework? I'm not aware of any such patch. However, beware that the visibility patch comes with its own can of worms. -- Gaby
Re: [RFC] Enabling loop unrolls at -O3?
Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: >>> You must not have been paying attention to one of the most frequent >>> complaints about gcc, which is that it is dog slow already ;-) >> >> Sure, but to me -O2 says you don't care much about compilation time. > > If the Ada front-end wishes, it can make special flags for its own > needs... Why are you speaking of the Ada frontend? If -O1 means "optimize, but be fast", what does -O2 mean? And what does -O3 mean? If -O2 means "the current set of optimizer that we put in -O2", that's unsatisfying for me. Giovanni Bajo
Re: [RFC] Enabling loop unrolls at -O3?
On Sun, Nov 06, 2005 at 01:32:43PM +0100, Giovanni Bajo wrote:
> If -O1 means "optimize, but be fast", what does -O2 mean? And what does
> -O3 mean? If -O2 means "the current set of optimizer that we put in -O2",
> that's unsatisfying for me.

`-O2'
     Optimize even more. GCC performs nearly all supported optimizations
     that do not involve a space-speed tradeoff. The compiler does not
     perform loop unrolling or function inlining when you specify `-O2'.
     As compared to `-O', this option increases both compilation time and
     the performance of the generated code.

Including loop unrolling in -O2 is IMNSHO a bad idea, as loop unrolling
increases code size, sometimes a lot. And the distinction between -O2 and
-O3 is exactly in the space-for-speed tradeoffs. On many CPUs for many
programs, -O3 generates slower code than -O2, because the cache footprint
disadvantages override the positive effects of the loop unrolling, extra
inlining etc.

Jakub
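As a concrete illustration of the space cost Jakub mentions (my example,
not his): unrolling by four turns a tiny loop body into roughly four
copies of itself plus an epilogue for the leftover iterations.

  /* Sketch of what unrolling by four does to a simple reduction loop:
     often faster, but noticeably larger. */
  long sum_unrolled(const int *a, long n)
  {
      long i, sum = 0;
      for (i = 0; i + 3 < n; i += 4) {
          sum += a[i];
          sum += a[i + 1];
          sum += a[i + 2];
          sum += a[i + 3];
      }
      for (; i < n; i++)            /* epilogue: n not divisible by 4 */
          sum += a[i];
      return sum;
  }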
Re: Call for compiler help/advice: atomic builtins for v3
On 11/6/05, Paolo Carlini <[EMAIL PROTECTED]> wrote: > Richard Guenther wrote: > > We could just provide fallback libcalls in libgcc > Indeed, this is an option. Not one I can implement myself quickly, but I > think the idea of issuing a library call when the builtin is not > available was actually meant to enable this kind of solution. > > Can you work on it? > > - though usually you > > would want a fallback on a higher level to avoid too much overhead. > > > Well, we are talking about i386... But yes, that could make sense, for > example if you want to compile once both for i386 and i686 and obtain > decent performance everywhere. Consider, however, that currently we end > up doing a function call anyway, no inlining, so... > > So - can't you work with some preprocessor magic and a define, if > > the builtins are available? > > > I don't really understand this last remark of yours: is it an alternate > solution?!? Any preprocessor magic has to rely on a new preprocessor > builtin, in case, because there is no consistent, unequivocal way to > know whether the builtins are available for the actual target, we > already explored that some time ago, in our (SUSE) gcc-development list too. Can you point me to some libstdc++ class/file where you use the builtins or other solution? Of course I meant having another builtin preprocessor macro. As the builtins are pretty low-level, somebody might provide a low-level workaround if they're missing, too, instead of having libcalls to libgcc (like for a freestanding environment like the kernel). So ultimately a preprocessor driven solution maybe the better one. Richard.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Guenther wrote:

> Can you point me to some libstdc++ class/file where you use the
> builtins or other solution?

Simply config/cpu/*/atomicity.h will do, for ia64, powerpc, alpha, s390,
currently used to implement __exchange_and_add and __atomic_add. Note that
in this way the latter are *not* inlined, because atomicity.h is compiled
into the *.so :-(

Anyway, as you can see, currently we are using only one builtin,
__sync_fetch_and_add, and only for one type, _Atomic_word, but if we can
rely on the builtins being available we can call them directly from the
*headers* for a big performance benefit. Also in many other places, of
course, e.g., a lock-free tr1::shared_ptr.

I'm still unsure, however, which is the most flexible and clean strategy:
a fallback library-call mechanism could, in principle, easily give a
behavior very similar to the current one when you don't pass any special
-march at compile time, at the cost of worse performance indeed, but
meaning a single executable for both i386 and i686. I think it is possible
to have libgcc code tailored to the effective target (i386 vs i686),
right? If possible, I think the latter is a wanted feature. Maybe there is
something I don't understand about the libgcc-based mechanisms... I'm
eager to learn more...

Paolo.
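To make the header use case concrete, here is a rough sketch (mine, with
invented and heavily simplified names) of the kind of reference-count
manipulation a shared_ptr-like class wants to inline:

  typedef int _Atomic_word;            // as on most 32-bit targets

  // Hypothetical use counter for a shared_ptr-style class; with the
  // builtin guaranteed available this can live entirely in a header and
  // be inlined at every copy/destroy site.
  struct __shared_count_sketch
  {
      _Atomic_word _M_use_count;

      void _M_add_ref()
      { __sync_fetch_and_add(&_M_use_count, 1); }

      bool _M_release()                // true when the count drops to zero
      { return __sync_fetch_and_add(&_M_use_count, -1) == 1; }
  };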
Re: [RFC] Enabling loop unrolls at -O3?
Richard Guenther wrote:

> Of course SPEC consists of real life applications. Whether it is a good
> representation for today's real life applications is another question,
> but it certainly is a very good set of tests.

It's mostly small programs by its nature; it is not very practical to
include real million-line applications in a test suite like this. To say
it is certainly a very good set of tests is fairly contentious I would
say, since time and time again the experience is that SPEC mark values are
not indicative of real performance. Now, oddly enough, that's *least* true
of gcc, since compared to other compilers less effort has been put into
tweaking SPEC numbers. But I still maintain I am more interested in
figures from real customer applications than benchmark suites of this
kind. And the usual discovery is that optimizations are less effective, at
least in my experience.
Re: [RFC] Enabling loop unrolls at -O3?
Gabriel Dos Reis wrote: Robert Dewar <[EMAIL PROTECTED]> writes: | Steven Bosscher wrote: | | > You must not have been paying attention to one of the most frequent | > complaints about gcc, which is that it is dog slow already ;-) | | Sure, but to me -O2 says you don't care much about compilation time. If the Ada front-end wishes, it can make special flags for its own needs... Yes, of course, though there would have to be very good motivation for this ...
Re: [RFC] Enabling loop unrolls at -O3?
Giovanni Bajo wrote:

> If -O1 means "optimize, but be fast", what does -O2 mean? And what does
> -O3 mean? If -O2 means "the current set of optimizer that we put in -O2",
> that's unsatisfying for me.

Right, that's exactly my point: you want some clear statement of the goals
of the different levels. Defining each -O level as merely giving better
performance than the previous one at the expense of worse compile time is
really not sufficient. With large applications, it is often a big job to
test out different optimization levels, so this is not always a practical
way of choosing.
Re: [RFC] Enabling loop unrolls at -O3?
Jakub Jelinek wrote:

> Including loop unrolling in -O2 is IMNSHO a bad idea, as loop unrolling
> increases code size, sometimes a lot. And the distinction between -O2
> and -O3 is exactly in the space-for-speed tradeoffs.

That's certainly a valid way of defining the difference (and it certainly
used to be the case in the old days, when the principal extra optimization
was inlining).

> On many CPUs for many programs, -O3 generates slower code than -O2,
> because the cache footprint disadvantages override the positive effects
> of the loop unrolling, extra inlining etc.

That's what we have found, though I would have thought it unusual for loop
unrolling to run into this cache effect in most cases.
Re: [RFC] Enabling loop unrolls at -O3?
Robert Dewar <[EMAIL PROTECTED]> writes:
>> Including loop unrolling in -O2 is IMNSHO a bad idea, as loop unrolling
>> increases code size, sometimes a lot. And the distinction between -O2
>> and -O3 is exactly in the space-for-speed tradeoffs.
>
> That's certainly a valid way of defining the difference (and certainly
> used to be the case in the old days when the principal extra optimization
> was inlining)

But even -O2 makes several space-for-speed optimisations (multiply by
shifting and adding, align jump targets, etc.), so this cannot define the
difference between -O2 and -O3. It is more quantitative in nature: -O2
only generates bigger code where the payoff in speed is almost certain,
which is not always the case for unrolling/inlining.

Still, more and more projects seem to be switching to -Os for everything,
which could be interpreted both as a consequence of the inevitable
bloating of large programs and as a dissatisfaction with what -O2/3 does
to the code. Or maybe they are wising up and actually using the option
that has been there all along.
Re: [RFC] Enabling loop unrolls at -O3?
Hi,

On Sunday 06 November 2005 16:20, Mattias Engdegård wrote:
> Robert Dewar <[EMAIL PROTECTED]> writes:
> >> Including loop unrolling in -O2 is IMNSHO a bad idea, as loop
> >> unrolling increases code size, sometimes a lot. And the distinction
> >> between -O2 and -O3 is exactly in the space-for-speed tradeoffs.
> >
> > That's certainly a valid way of defining the difference (and certainly
> > used to be the case in the old days when the principal extra
> > optimization was inlining)
>
> But even -O2 makes several space-for-speed optimisations (multiply by
> shifting and adding, align jump targets, etc.), so this cannot define
> the difference between -O2 and -O3. It is more quantitative in nature:
> -O2 only generates bigger code where the payoff in speed is almost
> certain, which is not always the case for unrolling/inlining.
>
> Still, more and more projects seem to be switching to -Os for
> everything, which could be interpreted both as a consequence of the
> inevitable bloating of large programs and as a dissatisfaction with
> what -O2/3 does to the code. Or maybe they are wising up and actually
> using the option that has been there all along.

We used -Os in the past; however, I saw a major speed regression for C++
with -Os and 4.0.0. I would need to measure again with 4.0.x or 4.1 to see
if that changed after the .0 release.

Yours,

--
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
http://www.exactcode.de | http://www.t2-project.org
+49 (0)30 255 897 45
Re: Call for compiler help/advice: atomic builtins for v3
On Nov 6, 2005, at 6:03 AM, Paolo Carlini wrote:
>> So - can't you work with some preprocessor magic and a define, if
>> the builtins are available?
>
> I don't really understand this last remark of yours: is it an alternate
> solution?!? Any preprocessor magic has to rely on a new preprocessor
> builtin, in case, because there is no consistent, unequivocal way to
> know whether the builtins are available for the actual target, we
> already explored that some time ago, in our (SUSE) gcc-development
> list too.

Coincidentally I also explored this option in another product. We ended up
implementing it and it seemed to work quite well. It did require the back
end to "register" with the preprocessor those builtins it implemented, and
quite frankly I don't know exactly how that registration worked. But from
a library writer's point of view, it was pretty cool. For example:

inline
unsigned
count_bits32(_CSTD::uint32_t x)
{
#if __has_intrinsic(__builtin_count_bits32)
    return __builtin_count_bits32(x);
#else
    x -= (x >> 1) & 0x55555555;
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x + (x >> 4)) & 0x0F0F0F0F;
    x += x >> 8;
    x += x >> 16;
    return (unsigned)x & 0xFF;
#endif
}

In general, if __builtin_XYZ is implemented in the BE, then
__has_intrinsic(__builtin_XYZ) answers true, else it answers false. This
offers a lot of generality to the library writer. Generic implementations
are written once and for all (in the library), and need not be inlined.
Furthermore, the library can test __has_intrinsic(__builtin_XYZ) in places
relatively unrelated to __builtin_XYZ (such as a client of __builtin_XYZ)
and perform different algorithms in the client depending on whether the
builtin existed or not (clients of the atomic builtins might especially
value such flexibility).

Also during testing the "macro-ness" of this facility can come in really
handy:

// Test generic library "builtins"
#undef  __has_intrinsic
#define __has_intrinsic(x) 0
// All __builtins are effectively disabled here,
// except of course those not protected by __has_intrinsic.

The work in the back end seems somewhat eased as well, as the back end can
completely ignore the builtin unless it wants to implement it. And then
there is only the extra "registration" with __has_intrinsic to deal with.

My apologies for rehashing an old argument. If there are pointers to:

> we already explored that some time ago, in our (SUSE) gcc-development
> list too.

I would certainly take the time to review that thread.

-Howard
Re: Call for compiler help/advice: atomic builtins for v3
Paolo Carlini wrote:
> Hi,
>
> we have a long-standing issue which we really should solve, one way or
> another: otherwise there are both correctness and performance issues
> which we cannot fix, and new features which we cannot implement. I have
> plenty of examples; just ask if you want more details and motivation.

I think this is a somewhat difficult problem because of the tension
between performance and functionality. In particular, as you say, the code
sequence you want to use varies by CPU.

I don't think I have good answers; this email is just me musing out loud.

You probably don't want to inline the assembly code equivalent of:

  if (cpu == i386) ...
  else if (cpu == i486) ...
  else if (cpu == i586) ...
  ...

On the other hand, if you inline, say, the i486 variant, and then run on
an i686, you may not get very good performance.

So, the important thing is to weigh the cost of a function call plus
run-time conditionals (when using a libgcc routine that would contain
support for all the CPUs) against the benefit of getting the fastest code
sequences on the current processors. And in a workstation distribution you
may be concerned about supporting multiple CPUs; if you're building for a
specific hardware board, then you only care about the CPU actually on that
board.

What do you propose that the libgcc routine do for a CPU that cannot
support the builtin at all? Just do a trivial implementation that is safe
only for a single-CPU, single-threaded system?

I think that to satisfy everyone, you may need a configure option to
decide between inlining support for a particular processor (for maximum
performance when you know the target processor) and making a library call
(when you don't).

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Re: Call for compiler help/advice: atomic builtins for v3
Hi Howard,

> Coincidentally I also explored this option in another product. We
> ended up implementing it and it seemed to work quite well. It did
> require the back end to "register" with the preprocessor those
> builtins it implemented, and quite frankly I don't know exactly how
> that registration worked. But from a library writer's point of view,
> it was pretty cool. For example:
>
> inline
> unsigned
> count_bits32(_CSTD::uint32_t x)
> {
> #if __has_intrinsic(__builtin_count_bits32)
>     return __builtin_count_bits32(x);
> #else
>     x -= (x >> 1) & 0x55555555;
>     x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
>     x = (x + (x >> 4)) & 0x0F0F0F0F;
>     x += x >> 8;
>     x += x >> 16;
>     return (unsigned)x & 0xFF;
> #endif
> }
>
> In general, if __builtin_XYZ is implemented in the BE, then
> __has_intrinsic(__builtin_XYZ) answers true, else it answers false.
> This offers a lot of generality to the library writer. Generic
> implementations are written once and for all (in the library), and
> need not be inlined.

Indeed, we are striving for generality. The mechanism that you suggest
would be rather easy to implement, in principle: as I wrote earlier today,
it's pretty simple to extract the info and prepare a new preprocessor
builtin that tells you for sure whether that specific target has the
atomics implemented or not. Later I can also tell you the exact files and
functions which would likely be touched; I have to dig a bit in my disk ;)

Then, however, where to put the fallback *assembly* for each
*target-specific* atomic builtin? The most natural choice seems to me
libgcc, because our infrastructure of builtins *automatically* issues
library calls at run time when the builtin is not available. That solution
would avoid once and for all playing tricks with macros like the above,
IMHO. And it's flexible also for many different targets, some even
requiring different subversions (the obnoxious i386 vs i686...).

To repeat my "philosophy": the idea of having compiler builtins is good,
very good, but then we should, IMHO, complete the offloading of this
low-level issue to the compiler, letting the compiler (+ libgcc, in case)
take care of generating code for __exchange_and_add, or whatever.

Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
On Sun, Nov 06, 2005 at 11:34:30AM +0100, Paolo Carlini wrote:
> Thus my request: would it be possible to have the builtins available
> unconditionally, by way of a slow (lock-based) fallback replacing the
> real implementation when the actual target code doesn't allow for them?

I suppose that in some cases it would be possible to implement them in
libgcc. Certainly we provided for that possibility by expanding to
external calls.

Not all targets are going to be able to implement the builtins, even with
locks. It is imperative that the target have an atomic store operation, so
that other read-only references to the variable see either the old or the
new value, but not a mix.

r~
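A small illustration (my own, not Richard's) of the atomic-store
requirement: if the target has no 8-byte atomic store, a 64-bit assignment
may be emitted as two 4-byte stores, and a concurrent reader can observe
half of the old value and half of the new one.

  /* Read concurrently by another thread without any lock of its own. */
  volatile unsigned long long flag;    /* 8 bytes on a 32-bit target */

  void publish(void)
  {
      /* On a 32-bit CPU lacking an 8-byte atomic store this becomes two
         separate 4-byte stores; a reader running between them can see
         0x00000000FFFFFFFF or 0xFFFFFFFF00000000 -- a mix of old and new. */
      flag = 0xFFFFFFFFFFFFFFFFULL;
  }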
Re: Call for compiler help/advice: atomic builtins for v3
Hi Mark,

> I think this is a somewhat difficult problem because of the tension
> between performance and functionality. In particular, as you say, the
> code sequence you want to use varies by CPU.
>
> I don't think I have good answers; this email is just me musing out loud.
>
> You probably don't want to inline the assembly code equivalent of:
>
>   if (cpu == i386) ...
>   else if (cpu == i486) ...
>   else if (cpu == i586) ...
>   ...
>
> On the other hand, if you inline, say, the i486 variant, and then run on
> an i686, you may not get very good performance.
>
> So, the important thing is to weigh the cost of a function call plus
> run-time conditionals (when using a libgcc routine that would contain
> support for all the CPUs) against the benefit of getting the fastest
> code sequences on the current processors.

Actually, the situation is not as bad as it might seem, as far as I can
see: the worst case is i386 vs i486+, and Old-Sparc vs New-Sparc. More
generally, a target either cannot implement the builtin at all (a trivial
fallback using locks, or no MT support at all) or can do so in no more
than one non-trivial way. Then libgcc would contain at most two versions:
the trivial one, and another piece of assembly, in principle absolutely
identical to what the builtin is expanded to when the inline version is
actually desired.

> And in a workstation distribution you may be concerned about supporting
> multiple CPUs; if you're building for a specific hardware board, then
> you only care about the CPU actually on that board.
>
> What do you propose that the libgcc routine do for a CPU that cannot
> support the builtin at all? Just do a trivial implementation that is
> safe only for a single-CPU, single-threaded system?

Either that or a very low-performance one, using locks. The issue is still
open; we can resolve it rather easily, I think.

> I think that to satisfy everyone, you may need a configure option to
> decide between inlining support for a particular processor (for maximum
> performance when you know the target processor) and making a library
> call (when you don't).

Yes. Let's consider for simplicity the obnoxious i686: if the user doesn't
pass any -march, then the fallback using locks is picked from libgcc, or
the non-trivial implementation if the specific target (i486+) supports it;
if the user passes -march=i486+, then the builtin is expanded inline by
the compiler, with no use of libgcc at all. Similarly for Sparc.

Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
Paolo Carlini wrote:

> Actually, the situation is not as bad, as far as I can see: the worst
> case is i386 vs i486+, and Old-Sparc vs New-Sparc. More generally, a
> target either cannot implement the builtin at all (a trivial fallback
> using locks or no MT support at all) or can in no more than 1
> non-trivial way.

Are you saying that you don't expect there to ever be an architecture that
might have three or more ways of doing locking? That seems rather
optimistic to me. I think we ought to plan for needing as many versions as
we have CPUs, roughly speaking.

As for the specific IA32 case, I think you're suggesting that we ought to
inline the sequence if (at least) -march=i486 is passed. If we currently
use the same sequences for all i486 and higher processors, then that's a
fine idea; there's not much point in making a library call to a function
that will just do that one sequence, with the only benefit being that
someone might later be able to replace that function if some as-yet
unknown IA32 processor needs a different sequence.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Re: Call for compiler help/advice: atomic builtins for v3
On 11/6/05, Mark Mitchell <[EMAIL PROTECTED]> wrote: > Paolo Carlini wrote: > > > Actually, the situation is not as bad, as far as I can see: the worst > > case is i386 vs i486+, and Old-Sparc vs New-Sparc. More generally, a > > targer either cannot implement the builtin at all (a trivial fall back > > using locks or no MT support at all) or can in no more than 1 > > non-trivial way. > > Are you saying that you don't expect there to ever be an architecture > that might have three or more ways of doing locking? That seems rather > optimistic to me. I think we ought to plan for needing as many versions > as we have CPUs, roughly speaking. > > As for the specific IA32 case, I think you're suggesting that we ought > to inline the sequence if (at least) -march=i486 is passed. If we > currently use the same sequences for all i486 and higher processors, > then that's a fine idea; there's not much point in making a library call > to a function that will just do that one sequence, with the only benefit > being that someone might later be able to replace that function if some > as-of-yet uknown IA32 processor needs a different sequence. Just to put some more thoughts on the table, I'm about to propose adding a __gcc_cpu_feature symbol to $suitable_place, similar to what Intel is doing with its __intel_cpu_indicator which is used in their runtime libraries to select different code paths based on processor capabilities. I'm not yet sure if and how to best expose this to the user, but internally this could be mapped 1:1 to what we have in the TARGET_* flags. Richard.
Re: Call for compiler help/advice: atomic builtins for v3
Mark Mitchell wrote: >Paolo Carlini wrote: > >>Actually, the situation is not as bad, as far as I can see: the worst >>case is i386 vs i486+, and Old-Sparc vs New-Sparc. More generally, a >>targer either cannot implement the builtin at all (a trivial fall back >>using locks or no MT support at all) or can in no more than 1 >>non-trivial way. >> >> >Are you saying that you don't expect there to ever be an architecture >that might have three or more ways of doing locking? That seems rather >optimistic to me. I think we ought to plan for needing as many versions >as we have CPUs, roughly speaking. > > Yes, in principle you are right, but in that case we can reorder the ifs: first i686, last i386 ;) Seriously earlier today I was hoping we can have something smarter than a series of conditionals at the level of libgcc, I don't know it much. I was hoping we can manage to install a version of it "knowing the target", so to speak. >As for the specific IA32 case, I think you're suggesting that we ought >to inline the sequence if (at least) -march=i486 is passed. If we >currently use the same sequences for all i486 and higher processors, >then that's a fine idea; there's not much point in making a library call >to a function that will just do that one sequence, with the only benefit >being that someone might later be able to replace that function if some >as-of-yet uknown IA32 processor needs a different sequence. > Yes. My point, more correctly, is that, when -march=i486+ is passed, *nothing* would change compared to what happens *now*. Again, I don't know the details of Richard's implementation but the real new things that we are discussing today kick-in when i386 (the default) is in effect. Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
Paolo Carlini wrote: > Yes, in principle you are right, but in that case we can reorder the > ifs: first i686, last i386 ;) Seriously earlier today I was hoping we > can have something smarter than a series of conditionals at the level of > libgcc, I don't know it much. I was hoping we can manage to install a > version of it "knowing the target", so to speak. Yes, GLIBC does that kind of thing, and we could do. In the simplest form, we could have startup code that checks the CPU, and sets up a table of function pointers that application code could use. In principle, at least, this would be useful for other things, too; we might want different versions of integer division routines for ARM, or maybe, even, use real FP instructions in our software floating-point routines, if we happened to be on a hardware floating-point processor. -- Mark Mitchell CodeSourcery, LLC [EMAIL PROTECTED] (916) 791-8304
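A rough sketch (mine, with made-up helper names) of the "simplest form"
Mark describes: startup code probes the CPU once and fills in a function
pointer that the out-of-line atomic routine then dispatches through.
cpu_has_xadd() stands in for whatever detection (cpuid, hwcap, ...) the
real code would use, and nothing here reflects actual libgcc or glibc
internals.

  #include <pthread.h>

  typedef int _Atomic_word;

  extern int cpu_has_xadd(void);            /* hypothetical; not shown */

  static pthread_mutex_t atomics_lock = PTHREAD_MUTEX_INITIALIZER;

  static _Atomic_word
  add_with_lock(volatile _Atomic_word *mem, int val)
  {
      pthread_mutex_lock(&atomics_lock);
      _Atomic_word old = *mem;
      *mem = old + val;
      pthread_mutex_unlock(&atomics_lock);
      return old;
  }

  static _Atomic_word
  add_with_xadd(volatile _Atomic_word *mem, int val)
  {
      /* Assumes this file is built with -march=i486 or later. */
      return __sync_fetch_and_add(mem, val);
  }

  static _Atomic_word (*exchange_and_add_fn)(volatile _Atomic_word *, int)
      = add_with_lock;

  __attribute__((constructor))
  static void init_atomics(void)            /* runs once at startup */
  {
      if (cpu_has_xadd())
          exchange_and_add_fn = add_with_xadd;
  }

  _Atomic_word
  __exchange_and_add(volatile _Atomic_word *mem, int val)
  {
      return exchange_and_add_fn(mem, val);  /* one indirect call */
  }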
Re: Call for compiler help/advice: atomic builtins for v3
On Sun, Nov 06, 2005 at 11:02:29AM -0800, Mark Mitchell wrote: > Are you saying that you don't expect there to ever be an architecture > that might have three or more ways of doing locking? That seems rather > optimistic to me. I think we ought to plan for needing as many versions > as we have CPUs, roughly speaking. I think this is overkill. > If we currently use the same sequences for all i486 and higher processors, > then that's a fine idea; This is pretty much true. To keep all this in perspective, folks should remember that atomic operations are *slow*. Very very slow. Orders of magnitude slower than function calls. Seriously. Taking p4 as the extreme example, one can expect a null function call in around 10 cycles, but a locked memory operation to take 1000. Usually things aren't that bad, but I believe some poor design decisions were made for p4 here. But even on a platform without such problems you can expect a factor of 30 difference. r~
Re: Call for compiler help/advice: atomic builtins for v3
Mark Mitchell wrote: >>Yes, in principle you are right, but in that case we can reorder the >>ifs: first i686, last i386 ;) Seriously earlier today I was hoping we >>can have something smarter than a series of conditionals at the level of >>libgcc, I don't know it much. I was hoping we can manage to install a >>version of it "knowing the target", so to speak. >> >> >Yes, GLIBC does that kind of thing, and we could do. In the simplest >form, we could have startup code that checks the CPU, and sets up a >table of function pointers that application code could use. > >In principle, at least, this would be useful for other things, too; we >might want different versions of integer division routines for ARM, or >maybe, even, use real FP instructions in our software floating-point >routines, if we happened to be on a hardware floating-point processor. > Yeah Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote: > I believe some poor design decisions were made for p4 here. But even > on a platform without such problems you can expect a factor of 30 > difference. So, that suggests that inlining these operations probably isn't very profitable. In that case, it seems like we could put these routines into libgcc, and just have libstdc++ call them. And, that if __exchange_and_add is showing up on the top of the profile, the fix probably isn't inlining -- it's to work out a way to make less use of atomic operations. -- Mark Mitchell CodeSourcery, LLC [EMAIL PROTECTED] (916) 791-8304
Re: Call for compiler help/advice: atomic builtins for v3
Mark Mitchell wrote: >Richard Henderson wrote: > >>I believe some poor design decisions were made for p4 here. But even >>on a platform without such problems you can expect a factor of 30 >>difference. >> >> >So, that suggests that inlining these operations probably isn't very >profitable. In that case, it seems like we could put these routines >into libgcc, and just have libstdc++ call them. And, that if >__exchange_and_add is showing up on the top of the profile, the fix >probably isn't inlining -- it's to work out a way to make less use of >atomic operations. > > Indeed, I was about to reply to Richard the very same things. If we are really sure that there is not much to gain from inlining (*), then the libgcc idea is still valid, even more so, in a sense I care a lot: working on the library will be *so* nice and the code so *clean*! Paolo. (*) You may dig some numbers from this thread: http://gcc.gnu.org/ml/libstdc++/2004-02/msg00372.html where I had to agree that we didn't give away too much performance. Still, it's measurable.
Re: Call for compiler help/advice: atomic builtins for v3
Paolo Carlini wrote:

>> And, that if __exchange_and_add is showing up on the top of the
>> profile, the fix probably isn't inlining -- it's to work out a way to
>> make less use of atomic operations.

I want to add that we are certainly going in this direction, with a
non-refcounted string for the next library ABI, also available as an
extension, previewed in 4.1. On the other hand, you cannot avoid atomic
operations completely in the library, and in some areas, for instance
tr1::shared_ptr, lock-free programming is really the best you can use to
achieve MT-safety and good performance.

Also, I think a user yesterday had a good point: we should also provide a
configuration option to build the library optimized for the
single-threaded case. To be fair, Gerald Pfeifer also suggested that a
long time ago...

Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson <[EMAIL PROTECTED]> writes: > Not all targets are going to be able to implement the builtins, > even with locks. It is imperitive that the target have an > atomic store operation, so that other read-only references to > the variable see either the old or new value, but not a mix. How many processors out there support multi-processor systems but do not provide any sort of atomic store operation? On a uni-processor system, presumably we only have to worry about preemptive thread switching and signals. Appropriate locking mechanisms are available for both. I can see that there is a troubling case that code may be compiled for i386 and then run on a multi-processing system using newer processors. That is something which we would have to detect at run time, in start up code or the first time the builtins are invoked. But it's not like I've looked at this stuff all that much. In general, what is the worst case we have to worry about? Ian
Re: Call for compiler help/advice: atomic builtins for v3
Hi Ian, >I can see that there is a troubling case that code may be compiled for >i386 and then run on a multi-processing system using newer processors. >That is something which we would have to detect at run time, in start >up code or the first time the builtins are invoked. > > Earlier in this thread, Mark figured out a very nice solution for this problem: a target-aware libgcc. Apparently glibc is already using a similar approach. Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
* Paolo Carlini: > Actually, the situation is not as bad, as far as I can see: the worst > case is i386 vs i486+, and Old-Sparc vs New-Sparc. More generally, a > targer either cannot implement the builtin at all (a trivial fall back > using locks or no MT support at all) or can in no more than 1 > non-trivial way. Isn't IA32 a counter-example for this? Non-SMP or non-threaded (normal INC/DEC), SMP (LOCK INC/DEC), and modern SMP (something else) already results in three choices.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote: To keep all this in perspective, folks should remember that atomic operations are *slow*. Very very slow. Orders of magnitude slower than function calls. Seriously. Taking p4 as the extreme example, one can expect a null function call in around 10 cycles, but a locked memory operation to take 1000. Usually things aren't that bad, but I believe some poor design decisions were made for p4 here. But even on a platform without such problems you can expect a factor of 30 difference. Apologies in advance if the following is not relevant... Even on a P4, inlining may enable compiler optimizations. One case is when the compiler can see that the return value of __sync_fetch_and_or (for instance) isn't used. It's possible to use a wait-free "lock or" instead of a "lock cmpxchg" loop (MSVC 8 does this for _InterlockedOr.) Another case is when inlining results in a sequence of K adjacent __sync_fetch_and_add( &x, 1 ) operations. These can legally be replaced with a single __sync_fetch_and_add. Currently the __sync_* intrinsics seem to be fully locked, but if acquire/release/unordered variants are added, other platforms may also suffer from lack of inlining. On a PowerPC an unordered atomic increment is pretty much the same speed as an ordinary increment (when there is no contention.)
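Two small examples (mine, not Peter's exact code) of the rewrites he
describes, written with the existing __sync builtins:

  /* 1. Result unused: a compiler that can see this call site could emit a
        plain "lock or" rather than a compare-and-swap loop. */
  void set_flag(volatile int *flags)
  {
      __sync_fetch_and_or(flags, 0x4);      /* return value ignored */
  }

  /* 2. Adjacent increments whose results are unused: by the argument
        above, these could be merged into a single add of 2. */
  void bump_twice(volatile int *counter)
  {
      __sync_fetch_and_add(counter, 1);
      __sync_fetch_and_add(counter, 1);
  }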
Re: Call for compiler help/advice: atomic builtins for v3
* Richard Henderson: > To keep all this in perspective, folks should remember that atomic > operations are *slow*. Very very slow. Orders of magnitude slower > than function calls. Seriously. Taking p4 as the extreme example, > one can expect a null function call in around 10 cycles, but a locked > memory operation to take 1000. Usually things aren't that bad, but > I believe some poor design decisions were made for p4 here. But even > on a platform without such problems you can expect a factor of 30 > difference. And, as far as I know, you take this performance hit even if you aren't running SMP and could use an ordinary read-modify-write instruction instead.
Re: Call for compiler help/advice: atomic builtins for v3
On Sun, Nov 06, 2005 at 12:10:03PM -0800, Ian Lance Taylor wrote:
> How many processors out there support multi-processor systems but do
> not provide any sort of atomic store operation?

My point here had been more wrt the 8-byte operations, wherein there are
*plenty* of multi-processor systems that don't have an 8-byte atomic
store. Although in some cases we can make use of the fpu for that.

r~
Re: Call for compiler help/advice: atomic builtins for v3
* Peter Dimov: > Even on a P4, inlining may enable compiler optimizations. One case is when > the compiler can see that the return value of __sync_fetch_and_or (for > instance) isn't used. It's possible to use a wait-free "lock or" instead of > a "lock cmpxchg" loop (MSVC 8 does this for _InterlockedOr.) You don't need inlining to optimize these cases. You only need to know precisely what the library implementations do, and you need a couple of choices. GCC can already optimizes printf ("hello world\n"); to puts ("hello world"); even though no inlining takes place.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote: On Sun, Nov 06, 2005 at 12:10:03PM -0800, Ian Lance Taylor wrote: How many processors out there support multi-processor systems but do not provide any sort of atomic store operation? My point here had been more wrt the 8-byte operations, wherein there are *pleanty* of multi-processor systems that don't have an 8-byte atomic store. Although in some cases we can make use of the fpu for that. Indeed, I hope we can deal with that, at least on most 64-bit machines: we have got a nasty bug in the mt_allocator code (libstdc++/24469) which would be trivial to fix if size_t-sized atomics were generally available... Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
Mark Mitchell wrote:
> Yes, GLIBC does that kind of thing, and we could do. In the simplest
> form, we could have startup code that checks the CPU, and sets up a
> table of function pointers that application code could use.

That's not what glibc does and it is a horrible idea. The indirect jumps
are costly, very much so. The longer the pipeline the worse.

The best solution (for Linux) is to compile multiple versions of the DSO
and place them in the correct places so that the dynamic linker finds them
if the system has the right functionality.

Code generation issues aside, this is really only needed for atomic ops
and maybe vector operations (XMM, Altivec, ...). The number of
configurations really needed and important is small (as is the libstdc++
binary). So, just make sure that an appropriate configure line can be
given (i.e., add --enable-xmm flags or so) and make it possible to compile
multiple libstdc++ without recompiling the whole gcc. Packagers can then
compile packages with multiple libstdc++.

Pushing some of the operations (atomic ops) to libgcc seems sensible but
others (like vector ops) are libstdc++ specific and therefore splitting
the functionality between libstdc++ and libgcc means requiring even more
versions since then libgcc also would have to be available in multiple
versions.

And note that for Linux the atomic ops need to take arch-specific
extensions into account. For ppc it'll likely mean using the vDSO.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA
Re: Call for compiler help/advice: atomic builtins for v3
Ulrich Drepper wrote:
> Mark Mitchell wrote:
>
>> Yes, GLIBC does that kind of thing, and we could do. In the simplest
>> form, we could have startup code that checks the CPU, and sets up a
>> table of function pointers that application code could use.
>
> That's not what glibc does and it is a horrible idea. The indirect
> jumps are costly, very much so. The longer the pipeline the worse.

I didn't mean to imply that GLIBC uses the simplistic solution I
suggested, and, certainly, dynamic linking is going to be better on
systems that support it. The simplistic solution was meant to be
illustrative, and might be appropriate for use on systems without dynamic
linking, like bare-metal embedded configurations, where, however, the
exact CPU isn't known at link-time.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Re: Call for compiler help/advice: atomic builtins for v3
On Sun, Nov 06, 2005 at 10:51:51AM -0800, Richard Henderson wrote: > I suppose that in some cases it would be possible to implement > them in libgcc. Certainly we provided for that possibility > by expanding to external calls. Actually, no, it's not possible. At least in the context we're discussing here. Consider: One part of the application (say, libstdc++) is compiled with only i386 support. Here we wind up relying on a mutex to protect the memory update. Another part of the application (say, the exe) is compiled with i686 support, and so chooses to use atomic operations. The application will now fail because not all updates to the memory location are protected by the mutex. The use of a mutex (or not) is part of the API associated with the memory location. If it's ever used by one part of the application, it must be used by all. The only thing we can do for libstdc++ is to put something in the config files that claims that atomic operations are *always* available for all members of the architecture family. If so, we may inline them. Otherwise, we should force the routines to be out-of-line in the library. This provides the stable ABI that we require. We may then use optimization flags to either implement the operation via atomic operations, or a mutex. We cannot provide fallbacks for the sync builtins in libgcc, because at that level we do not have enough information about the API required by the user. r~
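A sketch (mine) of the failure mode Richard describes, with the two
functions imagined as living in separate objects built with different
-march settings:

  #include <pthread.h>

  volatile int count;
  pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Imagine this compiled into the library with plain i386 codegen:
     the update is an ordinary read-modify-write guarded by a mutex. */
  void add_from_library(int val)
  {
      pthread_mutex_lock(&count_lock);
      count += val;
      pthread_mutex_unlock(&count_lock);
  }

  /* Imagine this compiled into the executable with -march=i686:
     it updates the same word atomically and never takes count_lock. */
  void add_from_exe(int val)
  {
      __sync_fetch_and_add(&count, val);
  }

  /* If add_from_exe runs between the read and the write inside
     add_from_library, its update is silently lost: the mutex only
     protects against other users of the mutex. */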
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote:

> One part of the application (say, libstdc++) is compiled with only
> i386 support. Here we wind up relying on a mutex to protect the
> memory update. Another part of the application (say, the exe) is
> compiled with i686 support, and so chooses to use atomic operations.
> The application will now fail because not all updates to the memory
> location are protected by the mutex.
>
> The use of a mutex (or not) is part of the API associated with the
> memory location. If it's ever used by one part of the application,
> it must be used by all.

Ok, I think I get this point: you cannot link together two parts of the
application "sharing" a memory location but protecting the accesses in
different ways. Seems clear now. Very similar to the issue you have if you
allocate with one memory allocator and deallocate with another in
different parts of the application.

> The only thing we can do for libstdc++ is to put something in the
> config files that claims that atomic operations are *always*
> available for all members of the architecture family. If so, we
> may inline them. Otherwise, we should force the routines to be
> out-of-line in the library. This provides the stable ABI that we
> require.

I think I get this too. This bit of info in the config files is basically
something we already have easily available: powerpc -> ok; alpha -> ok;
i?86 -> not ok; Sparc -> not ok, and so on (*)... We only need to use it
in the configury of the library and either call the builtins or not,
depending on the architecture family. We have to add to the library
out-of-line versions of the builtins... (in order to do that, we may end
up restoring the old inline assembly implementations of CAS, for example)

> We may then use optimization flags to either implement
> the operation via atomic operations, or a mutex.

If I understand correctly, this is what we are already doing in the *.so,
for i386 vs i486+. I would not call that an "optimization flag", however.
Can you clarify?

Therefore, all in all, if I understand correctly, we:

1- Don't need new preprocessor builtins.
2- Don't need new libgcc code.
3- Do need out-of-line library versions of the various builtins for the
   architecture families that don't implement the builtins uniformly
   (a lot of work!).
4- Do need a bit of additional configury (a bit of work).

Paolo.

(*) I think there is an annoying special case, which is multilib x86_64:
its 32-bit version belongs to the i?86 family and, as far as I can see, we
cannot inline the builtins for x86_64 tout court, or?!?
Re: Call for compiler help/advice: atomic builtins for v3
On Mon, Nov 07, 2005 at 01:35:13AM +0100, Paolo Carlini wrote:
> We have to add to the library out-of-line versions of the builtins...
> (in order to do that, we may end up restoring the old inline assembly
> implementations of CAS, for example)

I don't think you need to restore inline assembly.

> If I understand correctly, this is what we are already doing in the
> *.so, for i386 vs i486+. I would not call that an "optimization flag",
> however. Can you clarify?

I'm not sure how you were previously controlling what went in here. By
configuration name? That's certainly one way to do it, and probably the
most reliable.

Another method is to use -march=i486 on the command line, and from there
use the __i486__ defines already present to determine what to do. Note
that, at least for x86, -mtune=cpu affects __tune_cpu__, but not __cpu__.

My thinking would be along the lines of

#if !ARCH_ALWAYS_HAS_SYNC_BUILTINS
_Atomic_word
__exchange_and_add(volatile _Atomic_word* __mem, int __val)
{
#if ARCH_HAS_SYNC_BUILTINS
  return __sync_fetch_and_add(__mem, __val);
#else
  __gthread_mutex_lock (&mutex);
  _Atomic_word ret = *__mem;
  *__mem = ret + __val;
  __gthread_mutex_unlock (&mutex);
  return ret;
#endif
}

void
__atomic_add(volatile _Atomic_word* __mem, int __val)
{
#if ARCH_HAS_SYNC_BUILTINS
  __sync_fetch_and_add(__mem, __val);
#else
  __gthread_mutex_lock (&mutex);
  *__mem += __val;
  __gthread_mutex_unlock (&mutex);
#endif
}
#endif

This definition applies identically to *all* platforms. For x86, the
config file would have

#define ARCH_ALWAYS_HAS_SYNC_BUILTINS 0
#define ARCH_HAS_SYNC_BUILTINS \
  (__i486__ || __i586__ || __i686__ || __k6__ || __athlon__ || \
   __k8__ || __pentium4__ || __nocona__)

We then arrange for the appropriate -march= -mtune= combination to be set
by the configury. You might want to examine what I'm doing with this kind
of thing for libgomp.

r~
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote: >On Mon, Nov 07, 2005 at 01:35:13AM +0100, Paolo Carlini wrote: > > >>We have to add to the library >>out-of-line versions of the builtins... (in order to do that, we may end >>up restoring the old inline assembly implementations of CAS, for example) >> >> >I don't think you need to restore inline assembly. > > Would be a dirty short-cut... ;) >>If I understand correctly, this is what we are already doing in the >>*.so, for i386 vs i486+. I would not call that "optimization flag", >>however. Can you clarify? >> >> >I'm not sure how you were previously controling what went in here. >By configuration name? > Yes, see configure.host > That's certainly one way to do it, and >probably the most reliable. > > Ok, thanks. >Another method is to use -march=i486 on the command line, and from >there use the __i486__ defines already present to determine what >to do. Note that, at least for x86, -mtune=cpu affects __tune_cpu__, >but not __cpu__. > >My thinking would be along the lines of > > [snip] Ok, thanks. That is also by and large what I had in mind, modulo I would exploit our current infrastructure that you can ascertain looking to configure.host. To be sure: can you confirm that there is no easy solution for the x86_64 issue? I mean, it's annoying that we cannot inline the builtins for i686, but even more so for x86_64... Paolo.
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote: >My thinking would be along the lines of > > >#if !ARCH_ALWAYS_HAS_SYNC_BUILTINS > > [snip] >#endif > > Well, there is a minor catch, which is, if we don't want to break the ABI, we have to keep on implementing and exporting from the *.so __exchange_and_add and __atomic_add also for arches that actually always support sync builtins (even if new code will have the builtins expanded inline) Paolo.
Does gcc-3.4.3 for HP-UX 11.23/IA-64 work?
I've built gcc-3.4.3 for HP-UX 11.23/IA-64 and used the pre-compiled gcc-3.4.4 binary from the http://www.hp.com/go/gcc site. Both exhibit the same problem. While trying to build Perl 5.8.6: $ gmake ... gcc -v -o libperl.so -shared -fPIC perl.o gv.o toke.o perly.o op.o pad.o regcomp.o dump.o util.o mg.o reentr.o hv.o av.o run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o doop.o doio.o regexec.o utf8.o taint.o deb.o universal.o xsutils.o globals.o perlio.o perlapi.o numeric.o locale.o pp_pack.o pp_sort.o -lcl -lnsl -lnm -ldl -ldld -lm -lsec -lpthread -lc ... /opt/TWWfsw/gcc343/libexec/gcc/ia64-hp-hpux11.23/3.4.3/collect2 +Accept TypeMismatch -b -o libperl.so -L/opt/TWWfsw/gcc343r/lib -L/opt/TWWfsw/gcc343/lib/gcc/ia64-hp-hpux11.23/3.4.3 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/TWWfsw/gcc343/lib/gcc/ia64-hp-hpux11.23/3.4.3/../../.. perl.o gv.o toke.o perly.o op.o pad.o regcomp.o dump.o util.o mg.o reentr.o hv.o av.o run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o doop.o doio.o regexec.o utf8.o taint.o deb.o universal.o xsutils.o globals.o perlio.o perlapi.o numeric.o locale.o pp_pack.o pp_sort.o -lcl -lnsl -lnm -ldl -ldld -lm -lsec -lpthread -lc -lgcc -lgcc ^^^ Notice the "-lgcc -lgcc" at the end of the collect2 command-line, _not_ "-lgcc_s -lgcc_s". On HP-UX 11.23/PA-RISC, I get: /opt/TWWfsw/gcc343/libexec/gcc/hppa2.0-hp-hpux11.23/3.4.3/collect2 -z -b -o libperl.sl -L/opt/TWWfsw/gcc343r/lib -L/opt/TWWfsw/gcc343r/lib -L/opt/TWWfsw/gcc343/lib/gcc/hppa2.0-hp-hpux11.23/3.4.3 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/langtools/lib -L/opt/TWWfsw/gcc343/lib perl.o gv.o toke.o perly.o op.o pad.o regcomp.o dump.o util.o mg.o reentr.o hv.o av.o run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o doop.o doio.o regexec.o utf8.o taint.o deb.o universal.o xsutils.o globals.o perlio.o perlapi.o numeric.o locale.o pp_pack.o pp_sort.o -lcl -lnsl -lnm -lmalloc -ldld -lm -lcrypt -lsec -lpthread -lc -lgcc_s -lgcc_s Using the HP pre-compiled binary of gcc-4.0.2, I get: /opt/hp-gcc/4.0.2/bin/../libexec/gcc/ia64-hp-hpux11.23/4.0.2/collect2 -z +Accept TypeMismatch -b -o libperl.so -L/opt/hp-gcc/4.0.2/bin/../lib/gcc/ia64-hp-hpux11.23/4.0.2 -L/opt/hp-gcc/4.0.2/bin/../lib/gcc -L/opt/hp-gcc/4.0.2//lib/gcc/ia64-hp-hpux11.23/4.0.2 -L/usr/ccs/bin -L/usr/ccs/lib -L/opt/hp-gcc/4.0.2/bin/../lib/gcc/ia64-hp-hpux11.23/4.0.2/../../.. -L/opt/hp-gcc/4.0.2//lib/gcc/ia64-hp-hpux11.23/4.0.2/../../.. perl.o gv.o toke.o perly.o op.o pad.o regcomp.o dump.o util.o mg.o reentr.o hv.o av.o run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o doop.o doio.o regexec.o utf8.o taint.o deb.o universal.o xsutils.o globals.o perlio.o perlapi.o numeric.o locale.o pp_pack.o pp_sort.o -lcl -lnsl -lnm -ldl -ldld -lm -lsec -lpthread -lc -lgcc_s -lunwind -lgcc_s -lunwind The "*libgcc" line from the 3.4.3/3.4.4 specs file: *libgcc: %{shared-libgcc:%{!mlp64:-lgcc_s}%{mlp64:-lgcc_s_hpux64} %{static|static-libgcc:-lgcc -lgcc_eh -lunwind}%{!static:%{!static-libgcc:%{!shared:%{!shared-libgcc:-lgcc -lgcc_eh -lunwind}%{shared-libgcc:-lgcc_s%M -lunwind -lgcc}}%{shared:-lgcc_s%M -lunwind%{!shared-libgcc:-lgcc} The "*libgcc" line from the 4.0.2 specs file (via -dumpspecs): *libgcc: %{static|static-libgcc:-lgcc -lgcc_eh -lunwind}%{!static:%{!static-libgcc:%{!shared:%{!shared-libgcc:-lgcc -lgcc_eh -lunwind}%{shared-libgcc:-lgcc_s -lunwind -lgcc}}%{shared:-lgcc_s -lunwind}}} Is the problem in the "*libgcc" entry? It seems !shared-libgcc is true, though I don't know why. 
$ /opt/TWWfsw/gcc343/bin/gcc -v Reading specs from /opt/TWWfsw/gcc343/lib/gcc/ia64-hp-hpux11.23/3.4.3/specs Configured with: /opt/build/gcc-3.4.3/configure --with-gnu-as --with-as=/opt/TWWfsw/gcc343/ia64-hp-hpux11.23/bin/as --with-included-gettext --enable-shared --datadir=/opt/TWWfsw/gcc343/share --enable-languages=c,c++,f77 --with-local-prefix=/opt/TWWfsw/gcc343 --prefix=/opt/TWWfsw/gcc343 Thread model: single gcc version 3.4.3 (TWW) -- albert chin ([EMAIL PROTECTED])
Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote:

> Actually, no, it's not possible. At least in the context we're
> discussing here. Consider:
>
> One part of the application (say, libstdc++) is compiled with only
> i386 support. Here we wind up relying on a mutex to protect the
> memory update. Another part of the application (say, the exe) is
> compiled with i686 support, and so chooses to use atomic operations.
> The application will now fail because not all updates to the memory
> location are protected by the mutex.

Richard, sorry, on second thought I don't agree. You are not considering
that the idea is to use a "smart" libgcc, a la glibc, as per Mark's and
Uli's messages. What is a "libstdc++ compiled with only i386 support"? It
is a libstdc++ which at run time will call into libgcc; it has nothing
inline. Then libgcc will use the best the machine has available, that is,
certainly atomic operations, if the exe (compiled with -march=i686) can
possibly run at all.

In short, the keys are:

1- The "smart" libgcc, which always makes available the best the machine
   offers.
2- Mutexes cannot be inlined, only atomic operations can.

Paolo.
Re: [RFC] Enabling loop unrolls at -O3?
"Giovanni Bajo" <[EMAIL PROTECTED]> writes: | Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: | | >>> You must not have been paying attention to one of the most frequent | >>> complaints about gcc, which is that it is dog slow already ;-) | >> | >> Sure, but to me -O2 says you don't care much about compilation time. | > | > If the Ada front-end wishes, it can make special flags for its own | > needs... | | | Why are you speaking of the Ada frontend? http://gcc.gnu.org/ml/gcc/2005-11/msg00262.html -- Gaby
no warning being displayed.
Hi all,

I am compiling a small program with a SPARC gcc 3.4.3.

test.c
---
struct test1 {
    int a;
    int b;
    char c;
};

struct test2 {
    char a;
    char b;
    char c;
};

struct test3 {
    int a;
    int b;
    int c;
};

int main()
{
    struct test1* t1, t11;
    struct test2* t2;
    struct test3* t3;

    t1 = &t11;
    t2 = (struct t2*)t1;
    t3 = (struct t3*)t1;

    return 0;
}

I suppose such an assignment should give a warning about "incompatible
pointer type", but when compiling with gcc 3.4.3 no such warning is given,
even with -Wall enabled. Is this a bug in this version?

GCC version used
---
./cc1 --version
GNU C version 3.4.3 (sparc-elf)
compiled by GNU C version 3.2.3 20030502 (Red Hat Linux 3.2.3-20).
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072

--
Thanks,
Inder