Optimization of conditional access to globals: thread-unsafe?
Hello,

I have a question regarding the thread-safety of a particular GCC
optimization.  I'm sorry if this was already discussed on the list; if
so, please point me to the previous discussion.

Consider this piece of code:

    extern int v;

    void
    f(int set_v)
    {
      if (set_v)
        v = 1;
    }

If f() is called concurrently from several threads, then calls to f(1)
should be protected by a mutex.  But do we have to acquire the mutex
for f(0) calls?  I'd say no: there's no access to the global v in that
case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the following:

    f:
            pushl   %ebp
            movl    %esp, %ebp
            cmpl    $0, 8(%ebp)
            movl    $1, %eax
            cmove   v, %eax         ; load (maybe)
            movl    %eax, v         ; store (always)
            popl    %ebp
            ret

Note the last unconditional store to v.  Now, if some other thread
modifies v between our load and store (acquiring the mutex first), then
we will overwrite the new value with the old one (and will do that in a
thread-unsafe manner, without acquiring the mutex).

So, do the calls to f(0) require the mutex, or is it a GCC bug?

This very bug was actually already reported for a slightly different
case, "Loop IM and other optimizations harmful for -fopenmp"
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my
last comment there, as I am no longer sure myself).  But the report was
closed with the "UNCONFIRMED" mark, and the reasons for that are not
quite clear to me.  I tried to dig into the C99 standard and David
Butenhof's "Programming with POSIX Threads", and didn't find any
indication that the call f(0) should also be protected by the mutex.

Here are some pieces from C99:

  Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value
                 being stored is the same as the previous value.

  Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
                 objects.

  Sec 5.1.2.3 par 3: In the abstract machine, all expressions are
                     evaluated as specified by the semantics.

  Sec 5.1.2.3 par 5 basically says that the result of the program
  execution wrt volatile objects, external files and terminal output
  should be the same for all conforming implementations.

  Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a
                     one-to-one correspondence between abstract and
                     actual semantics: ...

  Sec 5.1.2.3 par 9: Alternatively, an implementation might perform
                     various optimizations within each translation unit,
                     such that the actual semantics would agree with the
                     abstract semantics only when making function calls
                     across translation unit boundaries. ...

I think the above says that even when the compiler chooses to do some
optimizations, the result of the _whole execution_ should be the same
as if the actual semantics were equal to the abstract semantics.  Sec
5.1.2.3 par 9, cited last, is not a permission to do optimizations that
may change the end result.  In our case, when threads are involved, the
result may change, because there's no access to v in the abstract
semantics, and thus no mutex is required from the abstract POV.

So, could someone explain to me why this GCC optimization is valid,
and, if so, where lies the boundary below which I may safely assume GCC
won't try to store to objects that aren't stored to explicitly during a
particular execution path?  Or maybe the named bug report is valid
after all?

Thanks in advance,

-- 
   Tomash Brechko
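To make the concern in the message above concrete, here is a rough C rendering of the two versions of f(). This is only a sketch: the names f_as_written and f_as_compiled are invented for illustration, and the second function is a hand translation of the quoted assembly, not compiler output.

```c
/* Stand-in for the global v from the message. */
int v;

/* What the source says: v is stored to only when set_v is non-zero. */
void f_as_written(int set_v)
{
    if (set_v)
        v = 1;
}

/* Hand translation of the cmove sequence: the store always happens. */
void f_as_compiled(int set_v)
{
    int tmp = 1;    /* movl  $1, %eax                              */
    if (!set_v)
        tmp = v;    /* cmove v, %eax : old value kept if set_v==0  */
    v = tmp;        /* movl  %eax, v : unconditional store         */
}
```

In a single thread the two functions are indistinguishable. The difference only shows if another thread writes v between the read into tmp and the final store: f_as_compiled(0) would then write the stale value back, while f_as_written(0) never touches v.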
Re: Optimization of conditional access to globals: thread-unsafe?
On Sun, Oct 21, 2007 at 06:55:13PM +0400, Tomash Brechko wrote:
> Hello,
>
> I have a question regarding the thread-safety of a particular GCC
> optimization.  I'm sorry if this was already discussed on the list; if
> so, please point me to the previous discussion.
>
> Consider this piece of code:
>
>     extern int v;
>
>     void
>     f(int set_v)
>     {
>       if (set_v)
>         v = 1;
>     }
>
> If f() is called concurrently from several threads, then calls to f(1)
> should be protected by a mutex.  But do we have to acquire the mutex
> for f(0) calls?  I'd say no: there's no access to the global v in that
> case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the following:
>
>     f:
>             pushl   %ebp
>             movl    %esp, %ebp
>             cmpl    $0, 8(%ebp)
>             movl    $1, %eax
>             cmove   v, %eax         ; load (maybe)
>             movl    %eax, v         ; store (always)
>             popl    %ebp
>             ret
>
> Note the last unconditional store to v.  Now, if some other thread
> modifies v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and will do that in a
> thread-unsafe manner, without acquiring the mutex).
>
> So, do the calls to f(0) require the mutex, or is it a GCC bug?

Note that C99 is firmly based on a single-threaded execution model and
says nothing whatsoever about what should happen or not happen in a
threaded environment.  According to C99 a C compiler is allowed to
generate such code as gcc does.

If you are using some threaded environment then you will have to read
the relevant standard for that to find out if it imposes any additional
restrictions on a C compiler beyond what the C standard does.

I suspect that most of them will not say one way or the other about
what should happen in this case, which means that you will have to
assume the worst case and protect all calls to f() regardless of the
value of the argument.

Personally I would say that it is the accesses to 'v' that should be
protected by a mutex (or similar), not the calls to f().  It is almost
always better programming practice to protect access to data objects
rather than to code.

>
> This very bug was actually already reported for a slightly different
> case, "Loop IM and other optimizations harmful for -fopenmp"
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my
> last comment there, as I am no longer sure myself).  But the report was
> closed with the "UNCONFIRMED" mark, and the reasons for that are not
> quite clear to me.  I tried to dig into the C99 standard and David
> Butenhof's "Programming with POSIX Threads", and didn't find any
> indication that the call f(0) should also be protected by the mutex.
>
> Here are some pieces from C99:
>
>   Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value
>                  being stored is the same as the previous value.
>
>   Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
>                  objects.
>
>   Sec 5.1.2.3 par 3: In the abstract machine, all expressions are
>                      evaluated as specified by the semantics.
>
>   Sec 5.1.2.3 par 5 basically says that the result of the program
>   execution wrt volatile objects, external files and terminal output
>   should be the same for all conforming implementations.
>
>   Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a
>                      one-to-one correspondence between abstract and
>                      actual semantics: ...
>
>   Sec 5.1.2.3 par 9: Alternatively, an implementation might perform
>                      various optimizations within each translation unit,
>                      such that the actual semantics would agree with the
>                      abstract semantics only when making function calls
>                      across translation unit boundaries. ...
>
> I think the above says that even when the compiler chooses to do some
> optimizations, the result of the _whole execution_ should be the same
> as if the actual semantics were equal to the abstract semantics.  Sec
> 5.1.2.3 par 9, cited last, is not a permission to do optimizations that
> may change the end result.  In our case, when threads are involved, the
> result may change, because there's no access to v in the abstract
> semantics, and thus no mutex is required from the abstract POV.
>
> So, could someone explain to me why this GCC optimization is valid,
> and, if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during a
> particular execution path?  Or maybe the named bug report is valid
> after all?
>

-- 
Erik Trulsson [EMAIL PROTECTED]
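The advice to guard the data rather than the call sites might look roughly like the following. This is a minimal sketch, not code from the thread: v_lock and read_v are names invented here, and v is made file-local so that every access can be seen to go through the lock.

```c
#include <pthread.h>

static int v;   /* the shared object */
static pthread_mutex_t v_lock = PTHREAD_MUTEX_INITIALIZER;  /* hypothetical name */

/* Every access to v, on every path, happens with v_lock held. */
void f(int set_v)
{
    pthread_mutex_lock(&v_lock);
    if (set_v)
        v = 1;
    pthread_mutex_unlock(&v_lock);
}

/* Hypothetical reader: takes the same lock before looking at v. */
int read_v(void)
{
    int copy;

    pthread_mutex_lock(&v_lock);
    copy = v;
    pthread_mutex_unlock(&v_lock);
    return copy;
}
```

With the lock taken inside f(), it no longer matters whether the compiler turns the conditional store into an unconditional one: any store it emits for the f(0) path still happens while the mutex is held.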
Re: Optimization of conditional access to globals: thread-unsafe?
On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote:
> Note that C99 is firmly based on a single-threaded execution model and
> says nothing whatsoever about what should happen or not happen in a
> threaded environment.  According to C99 a C compiler is allowed to
> generate such code as gcc does.

Yes, I understand that C99 doesn't concern itself with threads per se,
but I wouldn't call it pro-single-threaded, rather thread-neutral.
I.e. the standard isn't made explicitly incompatible with threads, it
simply "says nothing about threads".

> If you are using some threaded environment then you will have to read
> the relevant standard for that to find out if it imposes any additional
> restrictions on a C compiler beyond what the C standard does.

All we have is POSIX, and it imposes very little on the compiler, I
guess.

> I suspect that most of them will not say one way or the other about
> what should happen in this case, which means that you will have to
> assume the worst case and protect all calls to f() regardless of the
> value of the argument.

Well, assuming the worst case won't always work, that's why I asked
about a reasonable boundary.  Consider the following (putting
style/efficiency matters aside):

    #include <pthread.h>

    #define N 100

    /* mutex[i] corresponds to byte[i].  */
    pthread_mutex_t mutex[N];
    char byte[N];

    void
    f(int i)
    {
      pthread_mutex_lock(&mutex[i]);
      byte[i] = 1;
      pthread_mutex_unlock(&mutex[i]);
    }

Is this code thread-safe?  Because from some POV C99 doesn't forbid
loading and storing the whole word when a single byte[i] is accessed
(given that C99 is pro-single-threaded).

But if C99 is thread-neutral, then it's the compiler's responsibility
to ensure the same result as some abstract machine (which may be
sequential).  In this case the compiler should access the single byte,
no more.

OK, I've got your point, but I'm not satisfied :).

-- 
   Tomash Brechko
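The whole-word access worried about in this message can be written out at source level. The sketch below is only an illustration of the hazard, under the assumption of a 4-byte word; store_byte_via_word is a hypothetical helper, not something a compiler is known to emit for the code above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: update one byte of a 4-byte group the way a machine
   without byte stores would have to, by rewriting the whole word.
   idx must be in the range 0..3. */
void store_byte_via_word(unsigned char *group, unsigned idx, unsigned char val)
{
    uint32_t word;
    unsigned char *p = (unsigned char *)&word;

    memcpy(&word, group, sizeof word);  /* load all four bytes          */
    p[idx] = val;                       /* change one byte in the copy  */
    memcpy(group, &word, sizeof word);  /* store all four bytes back    */
}
```

The three neighbouring bytes are written back with whatever values they had at load time. If another thread, holding the mutex for one of those neighbours, changes it between the load and the store, that update is lost even though each thread locked "its own" byte.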
Re: Optimization of conditional access to globals: thread-unsafe?
On 10/21/07, Tomash Brechko <[EMAIL PROTECTED]> wrote:
> On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote:
> > Note that C99 is firmly based on a single-threaded execution model and
> > says nothing whatsoever about what should happen or not happen in a
> > threaded environment.  According to C99 a C compiler is allowed to
> > generate such code as gcc does.
>
> Yes, I understand that C99 doesn't concern itself with threads per se,
> but I wouldn't call it pro-single-threaded, rather thread-neutral.
> I.e. the standard isn't made explicitly incompatible with threads, it
> simply "says nothing about threads".
>
> > If you are using some threaded environment then you will have to read
> > the relevant standard for that to find out if it imposes any additional
> > restrictions on a C compiler beyond what the C standard does.
>
> All we have is POSIX, and it imposes very little on the compiler, I
> guess.
>
> > I suspect that most of them will not say one way or the other about
> > what should happen in this case, which means that you will have to
> > assume the worst case and protect all calls to f() regardless of the
> > value of the argument.
>
> Well, assuming the worst case won't always work, that's why I asked
> about a reasonable boundary.  Consider the following (putting
> style/efficiency matters aside):
>
>     #include <pthread.h>
>
>     #define N 100
>
>     /* mutex[i] corresponds to byte[i].  */
>     pthread_mutex_t mutex[N];
>     char byte[N];
>
>     void
>     f(int i)
>     {
>       pthread_mutex_lock(&mutex[i]);
>       byte[i] = 1;
>       pthread_mutex_unlock(&mutex[i]);
>     }
>
> Is this code thread-safe?  Because from some POV C99 doesn't forbid
> loading and storing the whole word when a single byte[i] is accessed
> (given that C99 is pro-single-threaded).
>
> But if C99 is thread-neutral, then it's the compiler's responsibility
> to ensure the same result as some abstract machine (which may be
> sequential).  In this case the compiler should access the single byte,
> no more.

On some architectures this is not even possible.  For performance
reasons you should store the individual bytes to separate cache lines
anyway.

Richard.
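One way to follow the cache-line suggestion is to pad each element out to a full line. The sketch below assumes a 64-byte cache line, which is common but not universal; CACHE_LINE, struct padded_byte and set_flag are names made up for this illustration.

```c
#define N 100
#define CACHE_LINE 64   /* assumed line size; check the target CPU */

/* One flag per CACHE_LINE bytes, so two flags never share a word or a line. */
struct padded_byte {
    char value;
    char pad[CACHE_LINE - 1];
};

struct padded_byte byte[N];

/* The caller is expected to hold the mutex that corresponds to byte[i]. */
void set_flag(int i)
{
    byte[i].value = 1;
}
```

Because consecutive value members are exactly CACHE_LINE bytes apart, a store to one element cannot be widened into a neighbour's flag by any word-sized access, and the elements also stop competing for the same cache line. The cost is memory: 6400 bytes here instead of 100.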
RE: Optimization of conditional access to globals: thread-unsafe?
On 21 October 2007 15:55, Tomash Brechko wrote:
> Consider this piece of code:
>
>     extern int v;
>
>     void
>     f(int set_v)
>     {
>       if (set_v)
>         v = 1;
>     }

>     f:
>             pushl   %ebp
>             movl    %esp, %ebp
>             cmpl    $0, 8(%ebp)
>             movl    $1, %eax
>             cmove   v, %eax         ; load (maybe)
>             movl    %eax, v         ; store (always)
>             popl    %ebp
>             ret
>
> Note the last unconditional store to v.

> So, could someone explain to me why this GCC optimization is valid,

Because of the 'as-if' rule.  Since the standard is neutral with regard
to threads, gcc does not have to take them into account when it decides
whether an optimisation would satisfy the 'as-if' rule.

If you really want all externally-visible accesses to v to be made
exactly as the code directs, rather than allowing gcc to optimise them
in any way that (from the program's POV) is just the same 'as-if' they
had been done exactly, make v volatile.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today
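Applied to the example, the volatile suggestion would look like this. It is only a sketch of that one change; v is defined here rather than declared extern so that the fragment stands alone.

```c
/* Accesses to a volatile object must follow the abstract machine, so the
   compiler may not add a store to v on the path where set_v is zero. */
volatile int v;

void f(int set_v)
{
    if (set_v)
        v = 1;
}
```

This removes the invented store, but it does not make accesses to v atomic and says nothing about ordering between threads, so writers calling f(1) concurrently with readers would still need a mutex or some equivalent.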
Bug in C++ FE?
When I try to compile the following code:

    template <typename T>
    class X
    {
    private:
        typedef void (X::*PtrToMethod)();
        PtrToMethod method;
        template <typename U>
        void some_func()
        {
            return;
        }
    public:
        void func()
        {
            method = &X::some_func<int>;
        }
    };

    int main()
    {
        X<int> obj;
        obj.func();
    }

gcc rejects it with these uninformative errors:

    error.cpp:15: error: expected primary-expression before 'int'
    error.cpp:15: error: expected `;' before 'int'

(line 15 is method = &X::some_func<int>;)

I'm sure this code is correct, because gcc compiles it without problems
if X is not a template class but an ordinary C++ class.  Comeau Online
also compiles both variants (the template and non-template versions of
X).

I tried gcc 4.1.2 and gcc 4.3.0 (20071012).

René
Re: Bug in C++ FE?
On 10/21/07, Rene Buergel <[EMAIL PROTECTED]> wrote:
> When I try to compile the following code:
>     method = &X::some_func<int>;

You forgot the template keyword, as X is dependent:

    method = &X::template some_func<int>;

There is a defect report against the C++ standard about X being
dependent inside the class itself.

-- Pinski
Re: Optimization of conditional access to globals: thread-unsafe?
On Sun, 2007-10-21 at 20:16 +0400, Tomash Brechko wrote:
> But if C99 is thread-neutral, then it's the compiler's responsibility
> to ensure the same result as some abstract machine

No.  The compiler is responsible for ensuring that things work if, and
only if:

1. your program conforms in required ways
2. you only call functions in the standard library or functions you
   define in your program

Therefore, when making calls to ANY external library, all bets are off
UNLESS that library also meets the above conditions.

Interfacing with the operating system in any way OTHER than the C
standard library functions immediately relieves the compiler of all
responsibility -- and that includes calls to POSIX functions which are
not implemented entirely in Standard C.

Of course, a C compiler may make additional guarantees, for example
POSIX compliance, but then you must check the POSIX standard to see
what additional things it promises.

AFAIK, access to a shared variable is sound if serialised by a mutex.
However, that isn't a complete description: there are situations where
you can safely read a shared variable without a lock being held, for
example if you know another thread has not modified it since the last
synchronisation.

-- 
John Skaller
Felix, successor to C++: http://felix.sf.net
RE: Optimization of conditional access to globals: thread-unsafe?
On Mon, 2007-10-22 at 00:07 +0100, Dave Korn wrote:
> If you really want all externally-visible accesses to v to be made exactly
> as the code directs, rather than allowing gcc to optimise them in any way that
> (from the program's POV) it's just the same 'as-if' they had been done
> exactly, make v volatile.

That is not enough.  Apart from the lack of ISO semantics for volatile,
typically a compiler will take volatile as a hint to not hold values of
the variable in a register.

On a multi-processor, this is not enough, because each CPU may still
hold modified values in separate caches.

Perhaps gcc actually puts a RW barrier to force cache synchronisation
on every volatile access.. this seems rather expensive and very hard to
do since it is very dependent on the actual box (not just the
processor).  Some processor caches might require external electrical
signals to synchronise, for example.  This is quite possible if you
have multiple CPU boards in a box.

But I don't actually know what gcc does, although I guess it does
nothing.  The OS has to do the right thing here when a mutex is locked
etc, but the code for that is probably in the kernel which is better
able to manage things like cache synchronisation than a compiler.

-- 
John Skaller
Felix, successor to C++: http://felix.sf.net
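For what it's worth, GCC does not emit memory barriers for volatile accesses; where ordering between CPUs matters, it has to be requested explicitly or obtained through a mutex. The following is a minimal sketch of the explicit route using GCC's __sync_synchronize() builtin, which issues a full memory barrier. The names publish, try_consume, payload and ready are invented for this illustration, and the pattern is a GCC-specific idiom under the assumption of one writer and one reader, not a guarantee given by the C standard or POSIX.

```c
static int payload;         /* ordinary data handed from writer to reader */
static volatile int ready;  /* flag telling the reader that payload is set */

/* Writer side: make payload visible before the flag. */
void publish(int value)
{
    payload = value;
    __sync_synchronize();   /* full barrier: payload store ordered before flag store */
    ready = 1;
}

/* Reader side: returns 1 and fills *out if a value has been published. */
int try_consume(int *out)
{
    if (!ready)
        return 0;
    __sync_synchronize();   /* full barrier: flag load ordered before payload load */
    *out = payload;
    return 1;
}
```

Without the two barrier calls, volatile alone would only stop the compiler from caching ready in a register; it would not stop the hardware from making the two stores visible to another CPU in the opposite order.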