Re: gomp slowness
I'm not sure what OpenMP spec says about default data scope (too lazy to read through), but it seems that examples from http://kallipolis.com/openmp/2.html assume default(private), while GCC GOMP defaults to shared. In your case, #pragma omp parallel for shared(A, row, col) for (i = k+1; i
Optimization of conditional access to globals: thread-unsafe?
Hello, I have a question regarding the thread-safeness of a particular GCC optimization. I'm sorry if this was already discussed on the list, if so please provide me with the reference to the previous discussion. Consider this piece of code: extern int v; void f(int set_v) { if (set_v) v = 1; } If f() is called concurrently from several threads, then call to f(1) should be protected by the mutex. But do we have to acquire the mutex for f(0) calls? I'd say no, why, there's no access to global v in that case. But GCC 3.3.4--4.3.0 on i686 with -01 generates the following: f: pushl %ebp movl%esp, %ebp cmpl$0, 8(%ebp) movl$1, %eax cmove v, %eax; load (maybe) movl%eax, v; store (always) popl%ebp ret Note the last unconditional store to v. Now, if some thread would modify v between our load and store (acquiring the mutex first), then we will overwrite the new value with the old one (and would do that in a thread-unsafe manner, not acquiring the mutex). So, do the calls to f(0) require the mutex, or it's a GCC bug? This very bug was actually already reported for a bit different case, "Loop IM and other optimizations harmful for -fopenmp" (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my last comment there, as I no longer sure myself). But the report was closed with "UNCONFIRMED" mark, and reasons for that are not quire clear to me. I tried to dig into the C99 standard and David Butenhof's "Programming with POSIX Threads", and didn't find any indication that call f(0) should be also protected by the mutex. Here are some pieces from C99: Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value being stored is the same as the previous value. Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access objects. Sec 5.1.2.3 par 3: In the abstract machine, all expressions are evaluated as specified by the semantics. Sec 5.1.2.3 par 5 basically says that the result of the program execution wrt volatile objects, external files and terminal output should be the same for all confirming implementations. Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a one-to-one correspondence between abstract and actual semantics: ... Sec 5.1.2.3 par 9: Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. ... I think that the above says that even when compiler chooses to do some optimizations, the result of the _whole execution_ should be the same as if actual semantics equals to abstract semantics. Sec 5.1.2.3 par 9 cited last is not a permission to do optimizations that may change the end result. In our case when threads are involved the result may change, because there's no access to v in the abstract semantics, and thus no mutex is required from abstract POV. So, could someone explain me why this GCC optimization is valid, and, if so, where lies the boundary below which I may safely assume GCC won't try to store to objects that aren't stored to explicitly during particular execution path? Or maybe the named bug report is valid after all? Thanks in advance, -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote: > Note that C99 is firmly based on a single-threaded execution model and says > nothing whatsoever about what should happen or not happen in a threaded > environment. According to C99 a C compiler is allowed to generate such code > as gcc does. Yes, I understand that C99 doesn't concern threads per see, but I wouldn't call it pro-single-threaded, rather thread-neutral. I.e. the standard isn't made explicitly incompatible with threads, it is simply "says nothing about threads". > If you are using some threaded environment then you will have to read the > relevant standard for that to find out if it imposes any additional > restricitions on a C compiler beyond what the C standard does. All we have is POSIX, and it imposes very little on compiler I guess. > I suspect that most of them will not say one way or the other about what > should happen in this case, which means that you will have to assume the > worst case and protect all calls to f() regardless of the value of the > argument. Well, assuming the worst case won't always work, that's why I asked about reasonable boundary. Consider the following (putting style/efficiency matters aside): #include #define N 100 /* mutex[i] corresponds to byte[i]. */ pthread_mutex_t mutex[N]; char byte[N]; void f(int i) { pthread_mutex_lock(&mutex[i]); byte[i] = 1; pthread_mutex_unlock(&mutex[i]); } Is this code thread-safe? Because from some POV C99 doesn't forbid to load and store the whole word when single byte[i] is accessed (given that C99 is pro-single-threaded). But if C99 is thread-neutral, then it's compiler's responsibility to ensure the same result as some abstract machine (which may be sequential). In this case the compiler should access the single byte, no more. OK, I've got your point, but I'm not satisfied :). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 00:07:50 +0100, Dave Korn wrote: > Because of the 'as-if' rule. Since the standard is neutral with regard to > threads, gcc does not have to take them into account when it decides whether > an optimisation would satisfy the 'as-if' rule. If this would be true, then the compiler is free to inject the sequence mov mem -> reg mov reg -> mem just _anywhere_. How the programmer can predict where and when to lock the mutex to protect mem? The only thing we could relay on then is that the compiler is sound, it wouldn't inject such a sequence unless it really feels so. But still, how to determine when the compiler really feels so? Here's another piece of code, more real and sound this time: #include static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; static int acquires_count = 0; int trylock() { int res; res = pthread_mutex_trylock(&mutex); if (res == 0) ++acquires_count; return res; } Is it thread safe? Or rather, should the compiler preserve its thread-safeness, as seen from the programmer's POV? Otherwise I don't get how pthread_mutex_trylock() could possibly ever be used, because it's exactly the case when you _have_ to do the access based on the condition, "assume the worst" won't work here. GCC 4.3 with -O1 generates: trylock: pushl %ebp movl%esp, %ebp subl$8, %esp movl$mutex, (%esp) callpthread_mutex_trylock cmpl$1, %eax; test res movlacquires_count, %edx; load adcl$0, %edx; maybe add 1 movl %edx, acquires_count; store leave ret -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 11:19:31 +0100, Andrew Haley wrote: > Please have a read of [1]. Let us know if anything you have observed > isn't covered in that paper. > > [1] Hans-Juergen Boehm. Threads cannot be implemented as a library. In > Proc. of the ACM SIGPLAN 2005 Conf. on Programming Language > Design and Implementation (PLDI), pages 261?268, Chicago, IL, June > 2005. Unfortunately I'm not lucky enough to have ACM access. But from the Abstract: We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. Can't agree less! That's why for _practical_ reasons I'd say GCC should be thread-aware, even if _theoretically_ it doesn't have to. And AFAIU it already _is_, for the most part of it. That's why I want to see Bug#31862 be confirmed, accepted, and fixed. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 14:50:44 +0400, Tomash Brechko wrote: > Can't agree less! Can't agree more!, that's what it was supposed to say, think you've got it right ;). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 11:54:47 +0100, Dave Korn wrote: > http://www.google.com/search?q=Threads+cannot+be+implemented+as+a+library&sour > ceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org > .mozilla:en-GB:official Thanks! -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 12:07:20 +0100, Dave Korn wrote: > And even volatile wouldn't help if the code said > > if (i > x) > var += i; > > instead of a simple assignment. The race in fact *does* exist in the original > program, but is hidden by the fact that you don't care which of two operations > that overwrite the previous value complete in which order, but you're assuming > the operation that modifies var is atomic, and there's nothing to innately > guarantee that in the original program. The race condition *is* already > there. Why? For that example, if executed verbatim, it is either i > x always false, or the mutex is properly acquired. No one is assuming atomic update. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 12:08:02 +0100, Andrew Haley wrote: > Well, that's a big job: you'd have to decide on what a memory model > really should be, and then implement that model. Wouldn't the following rule of thumb work?: GCC is allowed to inject additional store operations on some execution path only if there are explicit store operations (i.e. issued by the user code if read verbatim). The whole problem will vanish if the last store that GCC adds will be made conditional, like if (there_were_explicit_stores_already) store; When execution do not get to basic blocks that have stores, GCC shouldn't add any. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 12:19:40 +0100, Dave Korn wrote: > *What* mutex are you referring to? There is no mutex in that code. I was talking about the code in the comment#7. For the code in the comment#1, the piece is simply incomplete. For it, mutex should be used if x < 99, not clear if x >= 99. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 14:53:41 +0100, Dave Korn wrote: > The optimisation the compiler is making here is a big win in normal > code, you wouldn't want to disable it unless absolutely necessary; > to be precise, you wouldn't want to automatically disable it for > every loop and variable in a program that used -fopenmp just because > /some/ of the variables in that program couldn't be safely accessed > that way. I'd rather wish the optimization would be done differently. Currently we have: mem -> reg; loop loop if (condition)=> optimize => if (condition) val -> mem; val -> reg; reg -> mem; But it could use additional register and be: 0 -> flag_reg; loop if (condition) val -> reg; 1 -> flag_reg; if (flag_reg == 1) reg -> mem; Note that by doing so we also eliminate all memory accesses when they are not needed (when condition is never true), and memory bandwidth is a major limiting factor nowadays. Actually, for the very first code piece of this thread I'd say that optimization mem -> reg; if (condition) => optimize => if (condition) val -> mem; val -> reg; reg -> mem; (there's no loop) is actually a counter-optimization even in single-threaded case: we replace a branch, which surely has its costs, with unconditional memory load and store, which cost much more. Even if branching would flush CPU pipeline even when jump destination is already in the pipeline (is this the case?), memory load has its own quite big cost plus the cost of flushing one line from the cache just to perform single operation on mem. So, why not use flag_reg and thus make GCC thread-aware for this case? I read the article suggested by Andrew Haley, its main point is that the compiler should be made thread-aware. Making all shared objects volatile is an overkill, and is more a trick rather than a solution. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 18:15:35 +0200, Michael Matz wrote: > > I'd rather wish the optimization would be done differently. Currently > > we have: > > > > mem -> reg; > >loop loop > > if (condition)=> optimize => if (condition) > >val -> mem; val -> reg; > > reg -> mem; > > > > > > But it could use additional register and be: > > > > 0 -> flag_reg; > > loop > >if (condition) > > val -> reg; > > 1 -> flag_reg; > > if (flag_reg == 1) > >reg -> mem; > > That could be done but would be besides the point. You traded one > conditional store with another one, so you've gained nothing in that > transformation. Rather I traded possibly many conditional stores in a loop with one conditional store outside the loop. And this exactly coincides with the point of discussion: you can't go further, when you replace conditional store with unconditional one, you introduce the race that wasn't in the original code. Several people already suggested to use volatile for shared data. Yes, it will help because we know it will disable all access optimizations, including thread-unaware ones. But I don't want to disable _all_ optimizations, I rather vote for thread-aware optimizations. There is no requirement in POSIX to make all shared data volatile. As the article referenced in the thread explains, there is no agreement between POSIX and C/C++ wrt memory access. But should it be fixed in the compiler (as article suggests), or should every shared data in every threaded program be defined volatile, just for the case? I never seen latter approach in any Open Source project (though didn't look for it specifically), and many of them are considered quite portable. Again, we are not discussing some particular code sample, and how it might be fixed, but the problem in general. Should GCC do thread-unsafe optimizations, or not? -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 18:33:37 +0100, Andrew Haley wrote: > We do understand what you're saying, and simply repeating the same > thing doesn't help. > > I think we should wait to see what the C++ working group comes up with > and consider implementing that, rather than some ad-hoc gcc-specific > proposal. Aha, but repeating worked. This is the first time someone agrees that the problem lies not entirely in the programmer's code. Thank you! :)) -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 22, 2007 at 18:48:02 +0100, Andrew Haley wrote: > Err, not exactly. :) > > See http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/why_undef.html Why, I'd say that page is about original races in the program, not about what compiler should do with races that it introduces itself. Still, "let's wait and see" is probably the best outcome that I can expect from this discussion, so thanks anyway. ;) -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
Hello Bart, Thanks for the summary. There are good pointers in this e-mail thread regarding the current state of the process of defining memory model for C++ (and eventually for C I guess). >From those pointers several conclusions may be made (which are in line with that you said): - though neither Standard C nor POSIX require to use volatile, it seems like you have to use it until the memory model is clearly defined. - the compiler should not introduce speculative stores to the shared objects. This is what my original question was about. I haven't read all the papers yet, so one thing is still unclear to me: it seems like atomic variables will be annotated as such (atomic). But I found no proposal for annotation of non-atomic objects that are protected by the ordinary locks (like mutexes). Will the compiler be forbiden to do all speculative stores, or how will it recognize shared objects as such? - the compiler should not cross object boundary when doing the store (i.e. when storing to 8-bit char it should not store to the whole 32/64-bit word). Here's the same question about shared object annotation. Cheers, -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Fri, Oct 26, 2007 at 08:32:07 -0700, Ian Lance Taylor wrote: > The language standard does not forbid speculative stores to non-atomic > objects. That's why there's a proposal to refine the language. I was meaning the folloing: http://www.artima.com/cppsource/threads_meeting.html: Hans Boehm and Herb Sutter both presented very detailed and well-thought out memory models. Their differences are subtle and important, but in broad strokes, both proposals paint a similar picture. In particular, both proposals: * Specify a set of atomic (aka, interlocked) primitive operations. * Explicitly specify the ordering constraints on atomic reads and writes. * Specify the visibility of atomic writes. * Disallow speculative stores on potentially shared objects. * Disallow reading and re-writing of unrelated objects. (For instance, if you have struct S{ char a,b; }; it is not OK to modify b by reading in the whole struct, bit-twiddling b, and writing the whole struct because that would interfere with another thread that is trying to write to a.) So, will "potentially shared objects" be marked as such explicitly by the programmer, or is it a compiler job to identify them? -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Fri, Oct 26, 2007 at 17:00:28 +0100, Dave Korn wrote: > > * Disallow speculative stores on potentially shared objects. > > * Disallow reading and re-writing of unrelated objects. (For > > instance, if you have struct S{ char a,b; }; it is not OK to > > modify b by reading in the whole struct, bit-twiddling b, and > > writing the whole struct because that would interfere with > > another thread that is trying to write to a.) > > I don't see how that second one is possible in the most general case. Some > cpus don't have all widths of access mode; >From http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf: Fortunately, the original motivation for this lax specification seems to stem from machine architectures that did not support byte-wide stores. To our knowledge, no such architectures are still in wide-spread multiprocessor use. > and how could it possibly work for sub-world bitfields? (Or are > those just to be considered 'related'?) How mutex-protected, or even atomic access to bit-fields could possibly work? Yes, they are related, or rather do not constitute a separate object, but belong to one common. > Aren't we about to reinvent -fvolatile, with all the hideous performance > losses that that implies? It was already said that instead of disallowing all optimization with volatile, the optimization itself may be made a bit differently. Besides, the concern that it will hurt performance at large is a bit far-stretched. You still may speculatively store to automatic var for which address was never taken, and this alone covers 50%--80% of cases. Only globals, or locals which address was passed to some function, should be treated specially. Also, for the case void f(int set_v, int *v) { if (set_v) *v = 1; } there's no load-maybe_update-store optimization, so there won't be slowdown for such cases also (BTW, how this case is different from when v is global?). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Fri, Oct 26, 2007 at 19:04:10 +0200, Michael Matz wrote: > int f(int M, int *mc, int *mpp, int *tpmm, int *ip, int *tpim, int *dpp, > int *tpdm, int xmb, int *bp, int *ms) > { > int k, sc; > for (k = 1; k <= M; k++) > { > mc[k] = mpp[k-1] + tpmm[k-1]; > if ((sc = ip[k-1] + tpim[k-1]) > mc[k]) mc[k] = sc; > if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k]) mc[k] = sc; > if ((sc = xmb + bp[k]) > mc[k]) mc[k] = sc; > mc[k] += ms[k]; > } > } Aha, but the store in this example is _never_ speculative when concurrency in concerned: you _explicitly_ store to mc[k] anyway, so you may as well add some stores here and there. If mc[] shared, it's programmer's responsibility to protect it with the lock. When you remove the first and the last lines inside the loop, then all stores will become conditional. But only one value will get to mc[k], so there's no point in making the only store unconditional. Note that it doesn't cancel cmoves, as those are loads, not stores. But look at the whole matter another way: suppose GCC implements some optimization, really cool one, and users quickly find a lot of uses for it. But then it is discovered that this optimization is not general enough, and in come cases wrong code is produced. What would you do? Remove it? But users will complain. Ignore the matter? Other users will complain. But you may make it optional, like -funsafe-math-optimizations or -funsafe-loop-optimizations, and everyone is happy. Our situation is a bit different, because 1) speculative store is not a bug per see, 2) program classes where it can do harm (mutli-threaded), and where it can not (single-threaded), are clearly separable. Alright, not entirely, because we don't know when and how libraries are used. But that is the case for -funsafe- options above too. Want safe library? Compile with -fno-thread-unsafe-optimizations, or specify that any user data pointers to which are passed to the library should not be shared (at least during the library call). > > void > > f(int set_v, int *v) > > { > > if (set_v) > > *v = 1; > > } > > > > there's no load-maybe_update-store optimization, so there won't be > > slowdown for such cases also (BTW, how this case is different from > > when v is global?). > > The difference is, that 'v' might be zero, hence *v could trap, hence it > can't be moved out of its control region. If you somehow could determine > that *v can't trap (e.g. by having a dominating access to it already) then > the transformation will be done. Good point. But how to tell the compiler that it is not NULL? The following doesn't work too: void f(int set_v, int v[1]) { if (set_v) v[0] = 1; } void g(int set_v, int *v) __attribute__((nonnull)); void g(int set_v, int *v) { if (set_v) *v = 1; } Please note that I'm not trying to prove you wrong, just curious about the reasons why there's no optimization. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Fri, Oct 26, 2007 at 21:45:03 +0400, Tomash Brechko wrote: > Note that it doesn't cancel cmoves, as those are loads, not stores. I just checked with x86 instruction reference, CMOVcc is reg->reg or mem->reg, never reg->mem. You know God's deed when you see it. :) -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Sat, Oct 27, 2007 at 03:06:21 +1000, skaller wrote: > err .. what about the heap?? The heap are objects for which the addresses were taken. So they can be shared. But I haven't yet seen that the optimization we discuss is being applied to the object accessed though the pointer (see my reply to Michael Matz). Maybe this is just a coincidence. I was beaten already for repeating myself, but please let me do that once more :). First, I have a strong believe (though I didn't test it) that if (C) val->mem; runs faster than mem->reg; if (C) val->reg; reg->mem; (short) jump will cost less then unconditional load/store when they are not needed (especially the store). BTW, it would be interesting to measure if short jumps are as bad as long jumps, i.e. whether CPU pipeline is flushed when jump target is already in it. Second, in situation like loop if (C) val->mem; i.e. when there are lots of conditional stores, only one final store matters. And current optimization employs this: mem->reg; loop if (C) val->reg; reg->mem;// One final store. But at the cost of additional register this final store can be made conditional (there are cases when even that register is not needed, but that requires thorough analysis of val's possible values, i.e. reg could be initialized to some "invalid" value and then checked for it). Registers are a valuable resource, yes. But so is the correct program result. Since GCC is correct wrt all standards, next comes its usability in not-yet-standardized domains. > And what do you do if you do not KNOW what the storage class is, > which is the case 99.99% of the time in C++ member functions? I'm not quite sure what you mean here. If extern vs static---that's of no concern. What matters is whether the object can possibly be accessed from another thread, and this has nothing specific to C++. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Sat, Oct 27, 2007 at 09:25:09 +1000, skaller wrote: > Yes, but with a class: > > struct X { > int x; > void f() { if (C) x = 1; } > void f2() { reg = x; if (c) reg = 1; x = reg; } > }; Hmm, indeed, and the example may end right here, you don't have to allocate global X. x member is "shared" among all X member functions, so if both X::f() and X::f2() are called concurrently for the same object without the lock, you are in trouble, even if you know only one of them might modify the x for current conditions. Since both f() and f2() implicitly get 'this' pointer, the situation when "the address of some local var is taken" is more frequent then I thought before, thanks for pointing this. Then perhaps all unconditional speculative stores should be avoided (unless there's also explicit unconditional store), without the need to analize whether it is safe or not. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Sun, Oct 28, 2007 at 09:47:36 -0400, Robert Dewar wrote: > Bart Van Assche wrote: > > >Requiring that all thread-shared variables should be declared > >volatile is completely unacceptable. > > Why is this unacceptable .. seems much better to me than writing > undefined stuff. There's a parallel thread in the Linux Kernel Mailing List. Everyone is advised to read it, if not already. There are several good points there: - the problem is not limited to multithreaded domain: the page with the object could be made read-only during execution, thus if (! page_is_read_only) v = 1; would SIGSEGV for no apparent reason. - making things volatile is unacceptable from performance POV. - optimization in question might well turn out to be misoptimization for anything but microbenchmarks (read LKML for cache flush/dirty page issues). - "people knowledgeable in POSIX say that this optimization is bogus". I would add that though we may say that Standard C is not aware of threads, POSIX _is_ aware of Standard C. While POSIX failed to solve the issue by formal word, its intent is clear: to make POSIX Threads usable. The compiler that claims to be POSIX compatible should take this into account. - there's also a good talk on lawyer-ish vs attached-to-reality approach. I personally doubt those who continue to advise to use volatile are actually writing such multithreaded programs. Most argue just for the fun of it. > Well Hans is talking about C/C++, you are talking about some other > language in which programs which do not have well defined semantics > in C or C++ do have well defined semantics in your language. Good thing we have this _bug_ in languages that define memory semantics (Ada, Java), and no one yet argues that GCC should be fixed wrt to only those languages. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Sun, Oct 28, 2007 at 17:51:57 +0100, Michael Matz wrote: > I was merely showing that this transformation _does_ matter in some cases > to refute opposite claims which seemed to be expressed too airy in this > thread. You got my intent all wrong. Performance matters for both sides. And currently the only option for multithreaded programs is to use volatile, which _greatly_ hurts performance. What I was trying to say, is that it would be nice to have -fno-thread-unsafe-optimization option. And I was trying to say that when you _enable_ this option, the performance won't be hurt much, while the program will become thread-safe. I never even said that this option should be the default (though it would be reasonable for -pthread or -fopenmp). But there are obviously people who think there's no need in such option whatsoever, because "threaded code is broken by definition, and I don't write it anyway". Even if mutithreading is of no immediate concern for you, it will become tomorrow then you decide to run your loop on all 1024 cores your cell phone provides. So you can't argue that this option wouldn't be nice to have, no? And as I understood this discussion, there will be such option in GCC in the nearest future. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Sun, Oct 28, 2007 at 21:03:09 +0300, Tomash Brechko wrote: > What I was trying to say, is that it would be nice to have > -fno-thread-unsafe-optimization option. Rather clear -fno-speculative-store, in the light of mprotect() and non-writable memory. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 02:39:15 -, Dave Korn wrote: > BTW, you and Tomash should get your stories in synch. He says speculative > loads are ok, just no stores, and wants a kind of half-volatile flag that > would only suppress stores. I think you're already looking one step further > down the road than he is and have realised that speculative loads will give > you problems too. You don't do your homework. This pointer http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2338.html (which was already posted in this thread) explains the matter, see "Speculative code motion involving loads" section. So both David and me are correct. But curious, Bart already tried _several times_ to explain why using volatile is not an option, but his arguments seem to be too "inconvenient" to be considered. Let me repeat: suppose we agree that every shared data should be annotated as volatile. So if I want to share dynamic data, I have to write _volatile_ data_type *pdata = malloc(size); But how to use this data? There are not many library functions that accept pointer to volatile (and casting the qualifier away will bring us back to the start). Should every library function have 2^n copies where different combinations of parameters are annotated as volatile? I think most pro-volatile people didn't understood the meaning of several papers in the Internet that say you have to use volatile. Those papers never meant to say that volatile is a proper way to use shared data with POSIX threads, rather that because the compilers are made the way they are you have to use volatile for now to overcome compiler thread-unawareness. David R. Butenhof was the member of POSIX.1c (POSIX Threads) committee. In his book, "Programming with POSIX Threads", there are no volatiles at all. Of course one can say he didn't grok C, or even POSIX, or POSIX Threads. But it shows the intent, at least how he felt it. And this is the way to go: in sane world standards follow the reality, not the other way around. And they will, that's why the work of Hans Boehm is there. As it was already mentioned in this thread, while his proposal is not final yet, most of the work is being done on atomics, so it highly unlikely that "no-speculative-stores-please" requirement will change. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 10:43:13 +0300, Tomash Brechko wrote: > I think most pro-volatile people didn't understood the meaning of > several papers in the Internet that say you have to use volatile. And some don't understand the true purposes of volatile itself. In the code below volatile int *v = (int *) 0xdeadbeef; void f() { int i; for (i = 0; i < N; ++i) *v = 1; } _all_ N stores matter. Why? Because v may point to the device I/O port, and the device may _count_ those writes among other things. But if *v is simply shared, do all stores to it matter? No, only the final value is relevant. That's why -fno-speculative-store will never be equal to volatile, and that's why it is needed to replace current volatile hammer. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 09:12:09 +0100, Eric Botcazou wrote: > Define "final value". The value that will be seen by other threads after they synchronize memory (with pthread_mutex_lock(), for instance). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 01:08:22 -0700, Andrew Pinski wrote: > On 10/29/07, Tomash Brechko <[EMAIL PROTECTED]> wrote: > > But if *v is simply shared, do all stores to it matter? No, only the > > final value is relevant. > > Actually it depends, it might matter. If you have a loop checking the > value of *v on a different thread and it does not change until this > loop is done, then you end up with a wrong wait. This is the same as > what violatile is for really where it will change out side of the > current thread. Such program would be incorrect wrt POSIX Threads: you shouldn't read the object that may be modified by another thread. Such "wait" loop is always wrong wrt POSIX Threads. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 11:42:10 +0300, Tomash Brechko wrote: > It means that the current thread is free to cache the value in the > register as long as it knows no other thread can access it (i.e. as > long as it holds corresponding mutex). And because your next question will be "how the compiler will know the corresponding mutex", the answer is: it can't, that's why "opaque function" rules come to play. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 09:31:13 +0100, Eric Botcazou wrote: > > The value that will be seen by other threads after they synchronize > > memory (with pthread_mutex_lock(), for instance). > > What does it mean from the viewpoint of the current thread? It means that the current thread is free to cache the value in the register as long as it knows no other thread can access it (i.e. as long as it holds corresponding mutex). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 09:50:16 +0100, Eric Botcazou wrote: > Right, so please define more or less formally what the "final value" is from > the viewpoint of the current thread, this is the crux of the matter. OK, formally there's no "final" value from current thread's POV, only the "current" value. "Final" only matters from other thread's POV, like "this is the last value that was produced by another thread before it released the mutex". But shouldn't we formally define "define" first? :) -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 11:55:25 +0300, Tomash Brechko wrote: > OK, formally there's no "final" value from current thread's POV, only > the "current" value. "Final" only matters from other thread's POV, > like "this is the last value that was produced by another thread > before it released the mutex". Rather, "...before it released the mutex, and we acuired the same mutex". But it may be the same thread actually, so "final value" is the value that is seen by the thread at the beginning of excusive access to the object. It is "final" wrt previous exclusive access to this object. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 12:04:14 +0300, Tomash Brechko wrote: > Rather, "...before it released the mutex, and we acuired the same > mutex". But it may be the same thread actually, so "final value" is > the value that is seen by the thread at the beginning of excusive > access to the object. It is "final" wrt previous exclusive access to > this object. Note that this doesn't require the value to actually _be_ in the memory, only to be observed as if it is there. That's the power of POSIX Threads, and that's why memory barriers, not cache flushes, are behind pthread_mutex_lock() and friends. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 12:54:22 +0100, Andi Kleen wrote: > See http://gcc.gnu.org/ml/gcc/2007-10/msg00607.html for a test case > that shows where it can go horrible wrong (optimized code significantly > slower than unoptimized code) Admittedly it is a constructed > one, but I don't think it is that unrealistic. Thanks. I had to change %Lu to %lu, and the example shows the point when run multiple times. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 15:53:56 +0100, Michael Matz wrote: > No it won't, because without further information GCC can't know that a > memory access won't trap. Ergo it will not move it out of its control > region, exactly because it would potentially introduce traps where there > were none before. Good reasoning, and that's exactly what some of us are asking for. Please see the example: #include #include #include int f(int read_only, int a[]) { int res = a[0]; if (! read_only) a[0] = 1; return res; } int main(void) { const long page_size = sysconf(_SC_PAGESIZE); int *a1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); int *a2 = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); f(0, a1); f(1, a2); fputs("GCC is the best compiler ever!\n", stdout); } It gives: moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O0 mmap.c -o mmap moonlight:/tmp$ ./mmap GCC is the best compiler ever! moonlight:/tmp$ /usr/local/gcc-4.3-trunk/bin/gcc -O1 mmap.c -o mmap moonlight:/tmp$ ./mmap Segmentation fault :-/ The discussion is not pointless, just please try to understand what other people are trying to say. No one is stupid, we all just not on the same page yet. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 19:20:25 +0300, Tomash Brechko wrote: > Good reasoning, and that's exactly what some of us are asking for. > Please see the example: At higher optimization levels GCC may inline f(), or not call it at all, so below is a more complete case: #include #include #include int f(int read_only, int a[]) __attribute__((__noinline__)); int f(int read_only, int a[]) { int res = a[0]; if (! read_only) a[0] = 1; return res; } int main(void) { int res; const long page_size = sysconf(_SC_PAGESIZE); int *a1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); int *a2 = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); res += f(0, a1); res += f(1, a2); fputs("GCC is the best compiler ever!\n", stdout); return res; } -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 21:52:19 +0100, Michael Matz wrote: > It is safe if there's another dominating store outside the control region. > Apart from that I reluctantly agree (i.e. it's not enough if there's any > dominating access through the pointer in question, it must be a store). Thank you! I almost started to think like I'm loosing grounds for my claims :). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 20:37:52 +0100, Duncan Sands wrote: > I don't see this with gcc 4.1 or 4.2. Just a data point. Yes, thanks for pointing this. It fails with gcc (GCC) 4.3.0 20071021 (experimental) though. It turns out that GCC 4.2 and below don't do this optimization for pointers (even when known to be non-null). Formally, POSIX requires mprotect() to work only on mmap()'ed regions, which are accessed through pointers. Technically you can make any page read-only, including the one that holds globals, but this won't pass GCC lawyers. Still, I believe the example proves the general idea. It shows that speculative store is never safe, because every 'if' may be an 'if not read-only'-one. And if optimization is not being performed, then it's only for good: the program is thread-safe, and disabling optimization for other cases won't affect performance of pointer case. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Mon, Oct 29, 2007 at 22:30:20 +0100, Eric Botcazou wrote: > See gcc/gthr-posix.h for a proper use of "volatile" for a shared access. It was already shown that you can't use volatile in general case, because you can't pass such data to any function. See the mail of Bart Van Assche. The use doesn't become proper simply because it appears in the code, even if in the code of GCC. volatile might be used there for completely different reasons. Consider this comment: static volatile int __gthread_active = -1; ... /* This test is not protected to avoid taking a lock on the main code path so every update of __gthread_active in a threaded program must be atomic with regard to the result of the test. */ if (__builtin_expect (__gthread_active_latest_value < 0, 0)) { ... volatile + atomic update + cache-coherent system will indeed give you the correct result, but such use is not POSIX-compliant, and I mostly talk about POSIX Threads. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
I accidentally removed the essential line, it should be: On Tue, Oct 30, 2007 at 10:44:52 +0300, Tomash Brechko wrote: > static volatile int __gthread_active = -1; > > ... int __gthread_active_latest_value = __gthread_active; > /* This test is not protected to avoid taking a lock on the main code >path so every update of __gthread_active in a threaded program must >be atomic with regard to the result of the test. */ > if (__builtin_expect (__gthread_active_latest_value < 0, 0)) > { > ... But you knew it already ;). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Tue, Oct 30, 2007 at 10:59:24 +0300, Tomash Brechko wrote: > On Tue, Oct 30, 2007 at 08:56:08 +0100, Eric Botcazou wrote: > > > The use doesn't become proper simply because it appears in the code, > > > even if in the code of GCC. volatile might be used there for > > > completely different reasons. > > > > No, I put it there for this purpose. > > Then you could remove it, if not for unlocked access. Frankly, you realise the consequences of volatile access, you have this comment: /* Avoid reading __gthread_active twice on the main code path. */ int __gthread_active_latest_value = __gthread_active; Now, do you really believe that every multithreaded program should use volatile, and then should copy shared data to temporal storage, just because volatile is such a hammer? You may have to, with current compilers, but that's not what was supposed by POSIX. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Tue, Oct 30, 2007 at 08:56:08 +0100, Eric Botcazou wrote: > > The use doesn't become proper simply because it appears in the code, > > even if in the code of GCC. volatile might be used there for > > completely different reasons. > > No, I put it there for this purpose. Then you could remove it, if not for unlocked access. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Tue, Oct 30, 2007 at 09:20:28 +0100, Eric Botcazou wrote: > No, I just wanted to point out that "volatile" has a well-defined semantics > and can be properly used for shared accesses. In other words, it's not all > or nothing like your earlier message[*] seemed to imply. > > [*] http://gcc.gnu.org/ml/gcc/2007-10/msg00663.html I didn't get your point. Sure volatile can be used _along_ with shared data. But we can't say it _has_ to be used _for_ shared data. I.e. if you require all shared data to be volatile, you can't pass pointer to such data to any function without casting away the qualifier. volatile can be properly used _only_ if you also assume atomicity and cache-coherence, and this is beyond POSIX. But anyway, I'm proving the opposite: when you use POSIX locks, you don't have to use volatile, that it. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
I'd like to answer one last argument, mostly for the sake of curious reader, because Michael himself has agreed with (at least the part of) the point. On Mon, Oct 29, 2007 at 16:00:18 +0100, Michael Matz wrote: > The issue is, that people want to write this: > > if (condition) > *p = value; > > (i.e. without any synchronization primitive or in fact anything else after > the store in the control region) and expect that the store indeed only > happens in that control region. And this expectation is misguided. Had > they written it like: > > if (condition) { > *p = value; > membarrier(); > } > > it would have worked just fine. Even if we put aside the fact that there's no such membarrier() equivalent in POSIX bindings, this won't help. First of all, let's note that you can't break the program by making it _more_ ordered. Indeed, program correctness doesn't depend on some particular reordering (you can't predict it anyway), it depends only on some particular ordering. So we can rewrite if (condition) { *p = value; membarrier(); } as if (condition) { *p = value; membarrier(); } else { membarrier(); } But this is the same as if (condition) *p = value; membarrier(); and we are back to the start: the store could me moved outside the condition. In general the following would work if (condition) { *p = value; opaque_function(); } because GCC has to assume that the call may access any memory, thus store to *p can't be moved outside of the condition, because the call itself can't be moved outside. But such a construction can't be the requirement for threaded programming. In the original example there _were_ synchronization primitives already, the complete piece is: if (condition) pthread_mutex_lock(&mutex); ... if (condition) *p = value; ... if (condition) pthread_mutex_unlock(&mutex); and POSIX doesn't require any additional ordering between lock() and unlock(). When condition is false, any speculative store to *p is bogus, because any condition is potentially a 'lock acquired' condition (or 'not read-only' condition). And it was shown that the volatile qualifier can't be applied in general case. But perhaps I'm the only one who is still unsure about the outcome of this discussion :). -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Tue, Oct 30, 2007 at 15:33:56 +0100, Eric Botcazou wrote: > We're not talking about locks, see the example you gave in your > first message. Please read the _description_ that comes along with the code example. Anyways, the patch is there. -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Tue, Oct 30, 2007 at 07:50:04 -0700, Ian Lance Taylor wrote: > Tomash Brechko <[EMAIL PROTECTED]> writes: > > > Even if we put aside the fact that there's no such membarrier() > > equivalent in POSIX bindings, this won't help. > > In POSIX, any mutex function must be a membarrier. For example, on > x86, mutex lock and unlock more or less have to execute the mfence > instruction. If they don't, the program can see inconsistent data > structures despite the mutex operations. Yes, but you don't imply I should write if (condition) { *p = value; pthread_mutex_lock(&dummy): pthread_mutex_unlock(&dummy): } just to trigger it. > > if (condition) { > > *p = value; > > membarrier(); > > } else { > > membarrier(); > > } > > > > But this is the same as > > > > if (condition) > > *p = value; > > membarrier(); > > No, it isn't. If membarrier is not a general function call, then it > has to be a magic function. In gcc it is implemented using a volatile > asm. I didn't get your point, but probably you didn't get my either. I was talking about memory barriers as a whole, not a particular implementation in GCC. And my point is that you are free to inject them wherever you like. This will affect performance, but not correctness. Hence you can't be sure membarrier() won't be moved from the condition. > Note that I've committed my patch to avoid speculative stores to all > active branches, so this particular case should be a non-issue going > forward. However, we all are going to have to take a careful look at > gcc to make sure that it generally conforms to the C++0x memory model. I'm not against ending this discussion. As I understand the patch (and I don't grok GCC internals), it fixes both read-only memory case, and race case. But it doesn't try to preserve the optimization in the form that was suggested by Michael Matz (i.e. to use pointer to dummy object on the stack), right? -- Tomash Brechko
Re: Optimization of conditional access to globals: thread-unsafe?
On Tue, Oct 30, 2007 at 09:49:00 -0700, Ian Lance Taylor wrote: > I don't know which suggestion you are referring to. The patch I wrote > will retain the optimization in the case where the memory location is > unconditionally written later in the function. This is most relevant > in that the optimization can take place in a loop, if somewhere after > the loop the memory location is unconditionally written. OK, thanks for the description, I just couldn't build GCC after update to see what result looks like. And big Thank You for the patch! -- Tomash Brechko