Re: Threading the compiler
On Nov 10, 2006, at 9:08 PM, Geert Bosch wrote:
> I'd guess we win more by writing object files directly to disk like virtually every other compiler on the planet.

The cost of my assembler is around 1.0% (ppc) to 1.4% (x86) overhead as measured with -pipe -O2 on expr.c. If it was converted, what type of speedup would you expect?

> Most of my compilations (on Linux, at least) use close to 100% of CPU. Adding more overhead for threading and communication/synchronization can only hurt.

Would you notice if the cost were under 0.1%? Would you care?
Threading the compiler
* /From/: Mike Stump * /To/: GCC Development * /Date/: Fri, 10 Nov 2006 12:38:07 -0800 * /Subject/: Threading the compiler

> We're going to have to think seriously about threading the compiler. Intel predicts 80 cores in the near future (5 years). http://hardware.slashdot.org/article.pl?sid=06/09/26/1937237&from=rss To use this many cores for a single compile, we have to find ways to split the work. The best way, of course, is to have make -j80 do that for us; this usually results in excellent efficiencies and an ability to use as many cores as there are jobs to run. However, for the edit, compile, debug cycle of development, utilizing many cores is harder.

You should give make -j80 a try before you dismiss it as not enough. I wrote a paper in 1991, "GNU & You: Building a Better World" (a play on X11's "make world" invocation), describing using massively parallel machines as a compile server and how to write correct parallel Makefiles with GNU make. As you say, you get excellent efficiencies from this.

The edit/compile/debug cycle of development isn't going to benefit appreciably from a multithreaded compiler. You can only edit one file at a time, and the debugging stage isn't going to benefit at all. In general, large programs tend to be split into many source files, where many parallel invocations of gcc work just fine. The actual time to compile a single source module tends to be small.

Before you launch into this idea, you should obtain profile traces that show you have any idle CPU cycles in a particular compilation, cycles that could be profitably used once a thread scheduler enters the picture. Personally, on my dual-core AMD X2, make -j3 works just fine to keep both cores above 98% until the build is done, on projects I currently maintain. Back in 1991, make -j20 worked well enough to keep an 8-processor Alliant FX busy building X11, in about a twelfth of the time it took to build serially, once I'd excised that abominable imake crap and replaced it with pure GNU make. (One hour builds down to about 5 minutes, as I recall.)

Of course, if your program has fewer than 80 source files you may not get 100% utilization out of the machine, but at that point are you really going to care?
-- 
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: strict aliasing question
Howard Chu <[EMAIL PROTECTED]> writes: > That's good to know, thanks. But frankly that's braindead to require > someone to add all these new union declarations all over their code, There is no need for any union trick in your example. Just use a temporary with the correct type, and you have strictly conforming code. Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
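Andreas's suggestion, made concrete: a minimal sketch against Howard's getit() example quoted later in the thread. The stub getit() here is made up for illustration; the point is just that the temporary has the type getit() actually writes through, so no type punning is needed.

    #include <stdio.h>

    static int value = 42;

    /* Stand-in for the library routine in the example; it hands back a
       void* that really points at an int. */
    static void getit(void **arg) { *arg = &value; }

    int main(void) {
        void *bar;                    /* temporary with the correct type */
        getit(&bar);
        printf("foo: %x\n", *(int *)bar);
        return 0;
    }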
Re: Question on tree-nested.c:convert_nl_goto_reference
> I don't know whether there are any functions nested inside nested > functions which do non-local gotos in the Ada testsuite. There aren't now. But there will be plenty when a transformation that changes raises of exceptions to gotos to a visible exception handler is finished. I'll use the C test case you wrote to verify that I'm seeing what you saw. Thanks.
Re: strict aliasing question
Howard Chu wrote:
> extern void getit( void **arg );
>
> main() {
>    union {
>        int *foo;
>        void *bar;
>    } u;
>
>    getit( &u.bar );
>    printf("foo: %x\n", *u.foo);
> }

Rask Ingemann Lambertsen wrote:
> As far as I know, memcpy() is the answer:

You don't need a union or memcpy() to convert the pointer types. You can solve the "void **" aliasing problem with just a cast:

    void *p;
    getit(&p);
    printf("%d\n", *(int *)p);

This assumes that getit() actually writes to an "int" object and returns a "void *" pointer to that object. If it doesn't then you have another aliasing problem to worry about. If it writes to the object using some other known type, then you need two casts to make it safe:

    void *p;
    getit(&p);
    printf("%d\n", (int)*(long *)p);

If it writes to the object using an unknown type then you might be able to use memcpy() to get around the aliasing problem, but this assumes you know that the two types are compatible at the bit level:

    void *p;
    int n;
    getit(&p);
    memcpy(&n, p, sizeof n);
    printf("%d\n", n);

The best solution would be to fix the interface so that it returns the pointer types it actually uses. This would make it typesafe and you wouldn't need to use any casts. If you can't fix the interface itself, the next best thing would be to create your own wrappers which put all the nasty casts in one place:

    int sasl_getprop_str(sasl_conn_t *conn, int prop, char const **pvalue)
    {
        assert(prop == SASL_AUTHUSER || prop == SASL_APPNAME || ...);
        void *tmp;
        int r = sasl_getprop(conn, prop, &tmp);
        if (r == SASL_OK)
            *pvalue = (char const *) tmp;
        return r;
    }

Unfortunately, there are aliasing problems in the Cyrus SASL source that can still come around and bite you once LTO arrives, no matter what you do in your own code. You might want to see if you can't get them to change undefined code like this:

    *(unsigned **)pvalue = &conn->oparams.maxoutbuf;

into code like this:

    *pvalue = (void *) &conn->oparams.maxoutbuf;

Ross Ridge
Re: Threading the compiler
> Let's just say, the CPU is doomed.

So you're building consensus for something that is doomed?

> > Seriously though, I don't really understand what sort of response you're expecting.
>
> Just consensus building.

To build a consensus you have to have something for people to agree or disagree with.

> > Do you have any justification for aiming for 8x parallelism in this release and 2x increase in parallelism in the next release?
>
> Our standard box we ship today that people do compiles on tends to be a 4 way box. If a released compiler made use of the hardware we ship today, it would need to be 4 way. For us to have had the feature in the compiler we ship with those systems, the feature would have had to be in gcc-4.0. Intel has already announced 4 core chips that are pin compatible with the 2 core chips. Their ship date is in 3 days. People have already dropped them in our boxes and they have 8 way machines, today. For them to make use of those cores, today, gcc-4.0 would had to have been 8 way capable. The rate of increase in cores is 2x every 18 months. gcc releases are about one every 12-18 months. By the time I deploy gcc-4.2, I could use 8 way, by the time I stop using gcc-4.2, I could make use of 16-32 cores I suspect. :-(
>
> > Why not just aim for 16x in the first instance?
>
> If 16x is more work than 8x, then I can't yet pony up the work required for 16x myself. If cheap enough, I'll design a system where it is just N-way. Won't know til I start doing code.

4.2 is already frozen for release, and the feature list for 4.3 is pretty much fixed at this point. I wouldn't expect any work of this scale to be released before gcc 4.4. By your own numbers this means you should be aiming for 32x.

> > You mention that "competition is already starting to make progress". Have they found it to be as easy as you imply?
>
> I didn't ask if they found it easy or not.

Do you have any evidence the scheme you're proposing is even feasible?

> > whole-program optimisation and SMP machines have been around for a fair while now, so I'm guessing not.
>
> I don't know of anything that is particularly hard about it, but, if you know of bits that are hard, or have pointers to such, I'd be interested in it.

You imply you're considering backporting this to 4.2. I'd be amazed if that was worthwhile. I'd expect changes to be required in pretty much the whole compiler.

Your strategy is built around the assumption that the majority of the work can be split into multiple independent chunks. There are several fairly obvious places where that is hard, e.g. the frontend probably needs to process the whole file in series because previous declarations affect later code, and inter-procedural optimisations (e.g. inlining) don't lend themselves to splitting on function boundaries. For other optimisations I'm not convinced there's an easy win compared with make -j. You have to make sure those passes don't have any global state, and as other people have pointed out, garbage collection gets messy. The compile server project did something similar, and that seems to have died.

If you're suggesting it's possible to make minor changes to gcc and hide all the threading bits in a "manager" module, then I simply don't believe you. Come back when you have a working prototype.

I don't know how much of the memory allocated is global readonly data (i.e. suitable for sharing between threads). I wouldn't be surprised if it's a relatively small fraction.
If you have answers for the above questions, or some sort of feasibility study, maybe you could publish them? That would give people something to build a consensus on. So far you've given a suggestion of how we might like it to work but no indication of feasibility, level of effort, or where problems are likely to occur. Paul
Re: Threading the compiler
On Sat, Nov 11, 2006 at 04:16:19PM +, Paul Brook wrote: > I don't know how much of the memory allocated is global readonly data (ie. > suitable for sharing between threads). I wouldn't be surprised if it's a > relatively small fraction. I don't have numbers on global readonly, but in typical compilation most of the memory allocated is definitely global. Past a certain point much of that is probably readonly. However, it would take some clever interfaces and discipline to _guarantee_ that any particular global bit was shareable. -- Daniel Jacobowitz CodeSourcery
gcc-4.3-20061111 is now available
Snapshot gcc-4.3-20061111 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20061111/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 118701

You'll find:

gcc-4.3-20061111.tar.bz2              Complete GCC (includes all of below)
gcc-core-4.3-20061111.tar.bz2         C front end and core compiler
gcc-ada-4.3-20061111.tar.bz2          Ada front end and runtime
gcc-fortran-4.3-20061111.tar.bz2      Fortran front end and runtime
gcc-g++-4.3-20061111.tar.bz2          C++ front end and runtime
gcc-java-4.3-20061111.tar.bz2         Java front end and runtime
gcc-objc-4.3-20061111.tar.bz2         Objective-C front end and runtime
gcc-testsuite-4.3-20061111.tar.bz2    The GCC testsuite

Diffs from 4.3-20061104 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Reducing the size of C++ executables - eliminating malloc
GCC 4.1.1 for PowerPC generates a 162K executable for a minimal program "int main() { return 0; }". GCC 3.4.1 generated a 7.2K executable. Mark Mitchell mentioned the same problem for ARM and proposed a patch to remove the reference to malloc in atexit (http://sourceware.org/ml/newlib/2006/msg00181.html). There are references to malloc in eh_alloc.c and unwind-dw2-fde.c. It looks like these are being included even when there are no exception handlers. Any suggestions on how to eliminate the references to these routines?
-- 
Michael Eager  [EMAIL PROTECTED]  1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
Re: Threading the compiler
> > > whole-program optimisation and SMP machines have been around for a fair while now, so I'm guessing not.
> >
> > I don't know of anything that is particularly hard about it, but, if you know of bits that are hard, or have pointers to such, I'd be interested in it.
>
> You imply you're considering backporting this to 4.2. I'd be amazed if that was worthwhile. I'd expect changes to be required in pretty much the whole compiler. Your strategy is built around the assumption that the majority of the work can be split into multiple independent chunks of work. There are several fairly obvious places where that is hard, e.g. the frontend probably needs to process the whole file in series because previous declarations affect later code. And inter-procedural optimisations (e.g. inlining) don't lend themselves to splitting on function boundaries.

Actually, most IPA optimizations parallelize very well. Pointer analysis and inlining can both be partitioned in ways that let the work be split into threads.

Mike is actually not saying anything that most people around here truly disagree with. We all want to eventually parallelize and distribute GCC optimizations. I just don't think we are at the point where it makes sense to start doing that yet.

Personally, I believe the time to start thinking about parallelizing this stuff is when the problems that make LTO hard (getting rid of all the little niggles like front ends generating RTL, and doing the hard stuff like a middle end type system) are solved. Why? Without solving the problems that make LTO hard, you are going to hit them in trying to make the IPA optimizations (or anything else) parallel, because they are exactly the shared-state-between-functions and global-state problems that GCC has.

In fact, various people (including me) have been discussing how to parallelize and distribute our optimizations for a few months now. So if you really want to help parallelizing along, the thing to do is help LTO right now. I'm happy to commit to parallelizing IPA pointer analysis (which is a ridiculously parallel problem) once the hard LTO problems are solved. Before then, I just think we are going to end up with a bunch of hacks to try to work around our shared state. --Dan
Re: Threading the compiler
Ross Ridge wrote:
> Mike Stump writes:
> > We're going to have to think seriously about threading the compiler. Intel predicts 80 cores in the near future (5 years). [...] To use this many cores for a single compile, we have to find ways to split the work. The best way, of course is to have make -j80 do that for us, this usually results in excellent efficiencies and an ability to use as many cores as there are jobs to run.
>
> Umm... those 80 processors that Intel is talking about are more like the 8 coprocessors in the Cell CPU.

No, the Cell is an asymmetrical (vintage 2000) architecture. Intel & AMD have announced that they are developing large multi-core symmetric processors. The timelines I've seen say that the number of cores on each chip will double every year or two. Moore's law hasn't stopped. The number of gates per chip doubles every 18 months.
-- 
Michael Eager  [EMAIL PROTECTED]  1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
gmp/mpfr and multilib
Does anyone know how the changes for gcc to require gmp/mpfr will affect the multilib builds? In the past, gmp/mpfr in gfortran appeared to only be linked into the compiler itself, so that a 32-bit/64-bit multilib build on Darwin PPC only required gmp/mpfr for 32-bit to be installed. Will any of the libraries in gcc now require gmp/mpfr such that both 32-bit and 64-bit versions of gmp/mpfr must be installed? If that is the case, will the multilib build look for both a lipo 32-bit/64-bit combined shared library in $prefix/lib as well as individual versions in lib and lib64 subdirectories? Jack
Re: Threading the compiler
Mike Stump wrote:
> Thoughts?

Parallelizing GCC is an interesting problem for a couple of reasons: First, the problem is inherently sequential. Second, GCC expects that each step in the process happens in order, one after the other.

Most invocations of GCC are part of a "cluster" of similar invocations. If we look at this cluster, rather than at individual invocations, there may be opportunities for parallelization. Make -j allows several commands to run at the same time. It may be reasonable to incorporate some of the same functionality in the GCC driver, so that it starts processing threads in the background and exits. (There is the interesting question of how threads are re-synced.)

Parsing the source is inherently a sequential operation. I don't think that it is possible to parse different include files independently in separate threads, or even to identify the dependencies between include files. But it may be possible to use the results of parsing an include file (or sequence of include files) from another instance of GCC which is executing in a different process or thread.

Each of the functions in a C/C++ program is dependent on the global environment, but each is independent of the others. Separate threads could process the tree/RTL for each function independently, with the results merged on completion. This may interact adversely with some global optimizations, such as inlining.
-- 
Michael Eager  [EMAIL PROTECTED]  1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
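To make the per-function idea concrete, here is a toy sketch (not GCC code; all names are hypothetical): a fixed pool of worker threads pulls function indices from a shared counter, each "compiles" its function into a private slot, and the results are merged in order once the workers are joined. Build with something like gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_FUNCS   8
    #define NUM_WORKERS 4

    static int results[NUM_FUNCS];
    static int next_func;             /* index of the next function to process */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        int i;
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            i = (next_func < NUM_FUNCS) ? next_func++ : -1;
            pthread_mutex_unlock(&lock);
            if (i < 0)
                break;
            results[i] = i * i;       /* stand-in for optimizing function i */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_WORKERS];
        int i;

        for (i = 0; i < NUM_WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (i = 0; i < NUM_WORKERS; i++)
            pthread_join(tid[i], NULL);
        for (i = 0; i < NUM_FUNCS; i++)   /* "merge": emit results in order */
            printf("func %d -> %d\n", i, results[i]);
        return 0;
    }

The global-optimization caveat shows up immediately: anything a worker reads or writes outside its own slot (the analogue of inlining across functions) needs the same kind of locking as next_func.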
Re: Threading the compiler
Geert Bosch wrote:
> Most of my compilations (on Linux, at least) use close to 100% of CPU. Adding more overhead for threading and communication/synchronization can only hurt.

On a single-processor system, adding overhead for multi-threading does reduce performance. On a multi-processor system, the overhead is spread across all processors and is outweighed by running the work in parallel, so there is a net gain. For parallelizable programs, a 4-way processor might achieve a 3X performance improvement.
-- 
Michael Eager  [EMAIL PROTECTED]  1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
Re: strict aliasing question
Howard Chu <[EMAIL PROTECTED]> writes: > Daniel Berlin wrote: > > > > We ask the TBAA analyzer "can a store to a short * touch i. > > In this case, it says "no", because it's not legal. > > > If you know the code is not legal, why don't you abort the compilation > with an error code? It's not actually that easy to detect the undefined cases. Sometimes it's easy, sure. But most times it is not. The compiler does not normally do the sort of analysis which is required. That said, one of my co-workers has developed a patch which detects most aliasing violations, based on the compiler's existing alias analysis. It is able to give warnings for a wide range of cases which the compiler does not currently detect, for a relatively small increase in compilation time. If everything works out right, we'll propose it for gcc 4.3. Ian
Re: gmp/mpfr and multilib
Jack Howarth wrote: Does anyone know how the changes for gcc to require gmp/mpfr will effect the multilib builds? In the past, gmp/mpfr in gfortran appeared to only be linked into the compiler itself so that a 32-bit/64-bit multilib build on Darwin PPC only required gmp/mpfr for 32-bit to be installed. Will any of the libraries in gcc now require gmp/mpfr such that both 32-bit and 64-bit versions of gmp/mpfr must be installed? If that is the case, will the multilib build look for both a lipo 32-bit/64-bit combined shared library in $prefix/lib as well as individual versions in lib and lib64 subdirectories? So far as I know, gmp/mpfr is still only being used for compile-time evaluation of constant expressions (in order to do so in a way that's not dependent on the host's architecture, as it may be different from the target's architecture). I don't believe that there's any intention of using it in a way that would make it useful to link into libraries. - Brooks
Re: Threading the compiler
> Each of the functions in a C/C++ program is dependent on the global environment, but each is independent of the others. Separate threads could process the tree/RTL for each function independently, with the results merged on completion. This may interact adversely with some global optimizations, such as inlining.

Is it just me or could lazy evaluation really help here? OK, maybe it's just me.
Re: Polyhedron performance regression
> Just wanted to note to the list that Tobias spotted a performance regression on Polyhedron ac.
>
> http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html

Hum, the performance change on ac is significant. Any way we can get the revision numbers before and after the jump (and before the last jump to zero)? What patches have been committed in that time that could affect this? I can't see anything on the fortran patches... FX
Re: Polyhedron performance regression
On 11/11/06, FX Coudert <[EMAIL PROTECTED]> wrote:
> > Just wanted to note to the list that Tobias spotted a performance regression on Polyhedron ac.
> >
> > http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html
>
> Hum, the performance change on ac is significant. Any way we can get the revision numbers before and after the jump (and before the last jump to zero)? What patches have been committed in that time that could affect this? I can't see anything on the fortran patches...

If I had to guess I would say it was the forwprop merge. But I didn't investigate. Richard.
Re: Polyhedron performance regression
On Sat, 11 Nov 2006, FX Coudert wrote:
> > Just wanted to note to the list that Tobias spotted a performance regression on Polyhedron ac.
> >
> > http://www.suse.de/~gcctest/c++bench/polyhedron/polyhedron-summary.txt-2-0.html
>
> Hum, the performance change on ac is significant. Any way we can get the revision numbers before and after the jump (and before the last jump to zero)? What patches have been committed in that time that could affect this? I can't see anything on the fortran patches...

It must have been between r118372 and r118615. Richard.
Re: Polyhedron performance regression
Richard, If I had to guess I would say it was the forwprop merge... The what? :-) Paul
Re: Polyhedron performance regression
On 11/11/06, Paul Thomas <[EMAIL PROTECTED]> wrote: Richard, > > If I had to guess I would say it was the forwprop merge... The what? :-) fwprop, see http://gcc.gnu.org/ml/gcc-patches/2006-11/msg00141.html If someone can confirm that this patch causes the drop, I can help trying to find a fix. Gr. Steven
-funsafe-math-optimizations and -fno-rounding-math
Hello,

-fno-rounding-math enables the transformation of (-(X - Y)) -> (Y - X) in simplify-rtx.c, which seems to be the same transformation that is enabled by -funsafe-math-optimizations in fold-const.c. If I understand correctly, -frounding-math means that the rounding mode is important. In that case, should there be a correlation between -funsafe-math-optimizations and -fno-rounding-math (which currently does not exist)? Thanks, Revital
Re: Polyhedron performance regression
Steven and Jerry,

> If someone can confirm that this patch causes the drop, I can help trying to find a fix.

amd64/Cygwin_NT
$ /irun/bin/gfortran -O3 -funroll-loops -ffast-math -march=opteron ac.f90

118372   20.2s
118475   20.4s   Bonzini's patch
118704   16.2s

I believe that the improvement is FX's and my patch for MOD. Notice that this is a single-core machine and that there is a PR out on the vectorizer. Could this be the problem, since the suse tests are done on a two-core machine, if I understood correctly? Paul
Re: -funsafe-math-optimizations and -fno-rounding-math
On 11/11/06, Revital1 Eres <[EMAIL PROTECTED]> wrote: Hello, -fno-rounding-math enables the transformation of (-(X - Y)) -> (Y - X) in simplify-rtx.c which seems to be the same transformation that enabled by -funsafe-math-optimizations in fold-const.c. If I understand currently -frounding-math means that the rounding mode is important. In that case should there be correlation between -funsafe-math-optimizations and -fno-rounding-math (which currently does not exist)? I think the simplify-rtx.c code is partly wrong, as it changes behavior with signed zeros. I don't know off-hand if -(X - Y) and Y - X behave the same in rounding if the rounding mode is round to nearest, but certainly for round to +Inf it will differ. So HONOR_SIGNED_ZEROS (mode) && !flag_rounding_math might be the correct predicate here (and in the fold-const.c case). But floating point rounding scares me ;) Richard.
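A small illustration of the rounding-mode concern (the values are made up; assume the program is built so the compiler respects the dynamic rounding mode, e.g. with -frounding-math, and linked with -lm where needed):

    #include <fenv.h>
    #include <stdio.h>

    int main(void)
    {
        volatile double x = 1.0, y = 1e-300;
        double a, b;

        fesetround(FE_UPWARD);      /* round toward +Inf */

        a = -(x - y);               /* x - y rounds up to 1.0, so a == -1.0 */
        b = y - x;                  /* rounds up to the value just above -1.0 */

        printf("%.17g\n%.17g\n", a, b);   /* the two results differ */
        return 0;
    }

Under the default round-to-nearest mode the two expressions agree apart from the sign of zero, which is the separate HONOR_SIGNED_ZEROS concern mentioned above.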
Re: Polyhedron performance regression
On Sat, 11 Nov 2006, Paul Thomas wrote:
> Steven and Jerry,
>
> > If someone can confirm that this patch causes the drop, I can help trying to find a fix.
>
> amd64/Cygwin_NT
> $ /irun/bin/gfortran -O3 -funroll-loops -ffast-math -march=opteron ac.f90
>
> 118372   20.2s
> 118475   20.4s   Bonzini's patch
> 118704   16.2s
>
> I believe that the improvement is FX's and my patch for MOD.

Note that the suse x86_64 tester has 18.6s with the last run before Bonzini's patch and 30.8s with the first run after it, so it regressed quite badly. It also has -ftree-vectorize as an additional option here.

> Notice that this is a single-core machine and that there is a PR out on the vectorizer. Could this be the problem, since the suse tests are done on a two-core machine, if I understood correctly?

Yes, this is a dual-socket machine. But I don't see how this can be an issue here.

Richard.
-- 
Richard Guenther <[EMAIL PROTECTED]> Novell / SUSE Labs
comments on getting the most out of multi-core machines
my 'day job' is a medium sized software operation. we have between 5 and 50 programmers assigned to a given project; and a project is usually a couple of thousand source files (mix of f77, c, c++, ada). all this source gets stuck into between 50 and 100 libraries, and the end result is less than a dozen executables... one of which is much larger than the rest. it is a rare single file that takes more than 30 seconds to compile (at least with gcc3 and higher). linking the largest executable takes about 3 minutes. (sorry to be so long-winded getting to the topic!!)

the ordinary case is i change a file and re-link. takes less than 3.5 minutes. even if gcc was infinitely fast, it would still be 3 minutes. the other case is compiling everything from scratch (which is done regularly). using a tool like SCons which can build a total dependency graph, i have learned that roughly -j100 would be ideal. of course i am stuck with -j4 today. given enough cores to throw the work on, best case is still 3.5 minutes. (of course, this is a simplified analysis)

my point in all of this is that effort at the higher policy levels (by making the build process multi-threaded at the file level) pays off today and for the near future. changing gcc to utilize multi-core systems may be a lot harder and less beneficial than moving up the problem space a notch or two. regards, bud davis
re: comments on getting the most out of multi-core machines
* /From/: Bud Davis * /Date/: Sat, 11 Nov 2006 16:06:44 -0800 (PST)

> it is a rare single file that takes more than 30 seconds to compile (at least with gcc3 and higher). linking the largest executable takes about 3 minutes. (sorry to be so long-winded getting to the topic!!) ordinary case is i change a file and re-link. takes less than 3.5 minutes. even if gcc was infinitely fast, it would still be 3 minutes.

Sounds like you'd be well served by an incremental linker, like AIX provides. But that's mostly a topic for a binutils list. The AIX tools have some good ideas, worth adopting more widely, like recording the sizes of functions in each object file. It basically allows each individual function to be treated separately, the way traditional linkers treat separate .o files. So individual functions can be replaced during a relink, leaving the majority of the object file intact (plus or minus relocations that moved along the way). It also somewhat blurs the distinction between a fully linked executable file and an intermediate relocatable object, since executables can also be incrementally relinked. It's a real timesaver when you just need to fix one file in a very large program.
-- 
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: comments on getting the most out of multi-core machines
Howard Chu wrote: It also somewhat blurs the distinction between a fully linked executable file and an intermediate relocatable object, since executables can also be incrementally relinked. It's a real timesaver when you just need to fix one file in a very large program. a proper fix-and-continue functionality would be even more of a timesaver, maybe that's where the energy should go ...
Re: strict aliasing question
Ian Lance Taylor wrote:
> Howard Chu <[EMAIL PROTECTED]> writes:
> > Daniel Berlin wrote:
> > > We ask the TBAA analyzer "can a store to a short * touch i."
> > > In this case, it says "no", because it's not legal.
> > If you know the code is not legal, why don't you abort the compilation with an error code?
> It's not actually that easy to detect the undefined cases. Sometimes it's easy, sure. But most times it is not. The compiler does not normally do the sort of analysis which is required.

OK, that makes sense too. Dan's statement implied that there was a cut-and-dried test. If the analysis has not occurred, then you obviously cannot know that certain statements can be ignored. You can't even know that they're safe to re-order.

> That said, one of my co-workers has developed a patch which detects most aliasing violations, based on the compiler's existing alias analysis. It is able to give warnings for a wide range of cases which the compiler does not currently detect, for a relatively small increase in compilation time. If everything works out right, we'll propose it for gcc 4.3.

Here's a different example, which produces the weaker warning
warning: type-punning to incomplete type might break strict-aliasing rules

    struct foo;

    int blah(int fd) {
        int buf[BIG_ENOUGH];
        void *v = buf;
        struct foo *f;

        f = v;
        f = (struct foo *)buf;

        init(f, fd);
        munge(f);
        flush(f);
    }

"foo" is an opaque structure. We have no idea what's inside, we just know that it's relatively small. There are allocators available that will malloc them for us, but we don't want to use malloc here because it's too slow, so we want to reserve space for it on the stack, do a few things with it, then forget it.

If we go through the temporary variable v, there's no warning. If we don't use the temporary variable, we get the "might break" message. In this case, nothing in our code will ever dereference the pointer. Why is there any problem here, considering that using the temporary variable accomplishes exactly the same thing, but requires two extra statements?
-- 
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: strict aliasing question
Mike Stump wrote:
> On Nov 10, 2006, at 9:48 AM, Howard Chu wrote:
> > Richard Guenther wrote:
> > > If you compile with -O3 -combine *.c -o alias it will break.
> > Thanks for pointing that out. But that's not a realistic danger for the actual application. The accessor function is always going to be in a library compiled at a separate time. The call will always be from a program built at a separate time, so -combine isn't a factor.
>
> We are building a compiler to outsmart you. We are presently working on technology (google ("LTO")) to break your code. :-) Don't cry when we turn it on by default and it does. I'd recommend understanding the rules and following them.

This raises another interesting point. Aggressive link-time optimization may be nice in a lot of cases, but there are boundaries that should not (and most likely cannot) be crossed. E.g., you shouldn't go peering inside libraries to look behind their exported interfaces. You probably can't, in the case of a shared library, but you probably could for a static library.

This also raises the question of what exactly a static library represents - is it just a group of object files collected together for convenience, as an intermediate step in a large build process, or is it a coherent entity that provides a strictly defined set of services? GNU libtool is known to create "convenience libraries" simply as a means of aggregating object files together, before doing something else with them. (A good argument can be made that this is stupid; they should be using "ld -r" for that purpose, but that's another story...) libtool isn't the only example, either.

As a convenience collection, you should be free to globally optimize across it to your heart's content. But as an actual library, you should stop at its exported interface. How will you distinguish these two cases, when all you see is "foo.a" on the command line?
-- 
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
RE: strict aliasing question
On 12 November 2006 03:35, Howard Chu wrote:
> Here's a different example, which produces the weaker warning
> warning: type-punning to incomplete type might break strict-aliasing rules
>
> struct foo;
>
> int blah(int fd) {
>     int buf[BIG_ENOUGH];
>     void *v = buf;
>     struct foo *f;
>
>     f = v;
>     f = (struct foo *)buf;
>
>     init(f, fd);
>     munge(f);
>     flush(f);
> }
>
> "foo" is an opaque structure. We have no idea what's inside, we just know that it's relatively small. There are allocators available that will malloc them for us, but we don't want to use malloc here because it's too slow, so we want to reserve space for it on the stack, do a few things with it, then forget it.
>
> If we go through the temporary variable v, there's no warning. If we don't use the temporary variable, we get the "might break" message.

Try

    f = (struct foo *)(void *)buf;

Or even better...

    struct foo;

    int blah(int fd) {
        struct foo *f;

        f = alloca (BIG_ENOUGH);
        init(f, fd);
        munge(f);
        flush(f);
    }

cheers, DaveK
-- 
Can't think of a witty .sigline today
Re: Threading the compiler
Ross Ridge wrote:
> Umm... those 80 processors that Intel is talking about are more like the 8 coprocessors in the Cell CPU.

Michael Eager wrote:
> No, the Cell is an asymmetrical (vintage 2000) architecture.

The Cell CPU as a whole is asymmetrical, but I'm only comparing the design to the 8 identical coprocessors (of which only 7 are enabled in the CPU used in the PlayStation 3).

> Intel & AMD have announced that they are developing large multi-core symmetric processors. The timelines I've seen say that the number of cores on each chip will double every year or two.

This doesn't change the fact that SMP systems don't scale well beyond 16 processors or so. To go beyond that you need a different design. Clustering and NUMA have been ways of solving the problem outside the chip. Intel's plan for solving it inside the chip involves giving each of the 80 cores its own 32 MB of SRAM and only connecting each core to its immediate neighbours. This is similar to the Cell SPEs: each has 256K of local memory and they're all connected together in a ring.

> Moore's law hasn't stopped.

While Moore's Law may still be holding on, bus and memory speeds aren't doubling every two years. You can't design an 80-core CPU like a 4-core CPU with 20 times as many cores. Having 80 processors all competing over the same bus for the same memory won't work. Neither will "make -j80". You need to do more than just divide up the work between different processes or threads. You need to divide up the program and data into chunks that will fit into each core's local memory and orchestrate everything so that the data propagates smoothly between cores.

> The number of gates per chip doubles every 18 months.

Actually, it's closer to doubling every 24 months, and Gordon Moore never said it would double every 18 months. Originally, in 1965, he said that the number of components doubled every year; in 1975, after things slowed down, he revised it to doubling every two years.

Ross Ridge
RE: strict aliasing question
On 12 November 2006 04:16, Howard Chu wrote: > Dave Korn wrote: >> On 12 November 2006 03:35, Howard Chu wrote: >> >> >>> If we go through the temporary variable v, there's no warning. If we >>> don't use the temporary variable, we get the "might break" message. >>> >> >> Try >> >> >>> f = (struct foo *)(void *)buf; >>> >> >> > That's good, but why is it safe? Passing through void* means gcc has to assume it could alias anything, IIUIC, as a result of the standard allowing implicit void*<=>T* conversions. cheers, DaveK -- Can't think of a witty .sigline today
Re: strict aliasing question
Howard Chu <[EMAIL PROTECTED]> writes:
> Here's a different example, which produces the weaker warning
> warning: type-punning to incomplete type might break strict-aliasing rules
>
> struct foo;
>
> int blah(int fd) {
>     int buf[BIG_ENOUGH];
>     void *v = buf;
>     struct foo *f;
>
>     f = v;
>     f = (struct foo *)buf;
>
>     init(f, fd);
>     munge(f);
>     flush(f);
> }
>
> "foo" is an opaque structure. We have no idea what's inside, we just know that it's relatively small. There are allocators available that will malloc them for us, but we don't want to use malloc here because it's too slow, so we want to reserve space for it on the stack, do a few things with it, then forget it.
>
> If we go through the temporary variable v, there's no warning. If we don't use the temporary variable, we get the "might break" message. In this case, nothing in our code will ever dereference the pointer. Why is there any problem here, considering that using the temporary variable accomplishes exactly the same thing, but requires two extra statements?

Since you don't do any loads or stores via buf, this code is going to be OK. The warning you get is not all that good since it gives both false positives and (many) false negatives.

Your code will be safe on all counts if you change buf from int[] to char[]. The language standard grants a special exemption to char* pointers. Without that exemption, it would be impossible to write malloc in C.

Ian
Re: strict aliasing question
On Sat, 2006-11-11 at 22:18 -0800, Ian Lance Taylor wrote:
> Your code will be safe on all counts if you change buf from int[] to char[]. The language standard grants a special exemption to char* pointers. Without that exemption, it would be impossible to write malloc in C.

Actually, that is not quite what the C standard allows. What the C standard says is that access via a character type is always valid, as is access via the object's normal (declared) type, and the signed/unsigned versions of both. This means that accessing an element of a character array via any type other than a signed/unsigned character type is undefined. Thanks, Andrew Pinski
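A minimal sketch of the distinction Andrew is drawing (variable names are made up):

    #include <string.h>

    int main(void)
    {
        int  i = 42;
        char c[sizeof(int)];
        char low;

        /* OK: reading an int object through a char* is always allowed. */
        low = *(char *)&i;

        /* Undefined: c is declared as a char array, so reading it through
           an int* violates the aliasing rules even if it happens to be
           suitably aligned:
               int bad = *(int *)c;
         */

        /* Portable alternative: copy the bytes instead of punning. */
        memcpy(c, &i, sizeof(int));

        return (int)low;
    }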
Re: strict aliasing question
Andrew Pinski wrote:
> On Sat, 2006-11-11 at 22:18 -0800, Ian Lance Taylor wrote:
> > Your code will be safe on all counts if you change buf from int[] to char[]. The language standard grants a special exemption to char* pointers. Without that exemption, it would be impossible to write malloc in C.

As I recall, we chose int[] for alignment reasons, figuring we'd have no guarantees on the alignment of a char[].

> Actually, that is not quite what the C standard allows. What the C standard says is that access via a character type is always valid, as is access via the object's normal (declared) type, and the signed/unsigned versions of both. This means that accessing an element of a character array via any type other than a signed/unsigned character type is undefined.

Right, I've just read that text as well, which is why I'm still wondering. But as Ian said, we never do any loads or stores into the actual buf, so it seems we don't need to care whether its value is defined or not. If that's a safe assumption, then I propose that this is a rule worth stating: aliasing means two pointers point to the same memory; if only one pointer is ever used to access that memory, aliasing doesn't matter.
-- 
Howard Chu
Chief Architect, Symas Corp.  http://www.symas.com
Director, Highland Sun        http://highlandsun.com/hyc
OpenLDAP Core Team            http://www.openldap.org/project/
Re: strict aliasing question
On Nov 11, 2006, at 10:45 PM, Howard Chu wrote: Andrew Pinski wrote: On Sat, 2006-11-11 at 22:18 -0800, Ian Lance Taylor wrote: Your code will be safe on all counts if you change buf from int[] to char[]. The language standard grants a special exemption to char* pointers. Without that exemption, it would be impossible to write malloc in C. As I recall, we chose int[] for alignment reasons, figuring we'd have no guarantees on the alignment of a char[]. True, but add __attribute__((aligned(4))) and all is well.
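For concreteness, a sketch of Mike's suggestion applied to Howard's earlier example (BIG_ENOUGH, struct foo and init/munge/flush come from that example; whether 4-byte alignment is sufficient for the real struct foo is an assumption):

    struct foo;

    int blah(int fd) {
        /* Per Ian's suggestion, buf is now char[]; the attribute restores
           the alignment that choosing int[] was meant to provide. */
        char buf[BIG_ENOUGH] __attribute__((aligned(4)));
        void *v = buf;
        struct foo *f = v;

        init(f, fd);
        munge(f);
        flush(f);
        return 0;
    }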