Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread Tomash Brechko
Hello,

I have a question regarding the thread-safeness of a particular GCC
optimization.  I'm sorry if this was already discussed on the list, if
so please provide me with the reference to the previous discussion.

Consider this piece of code:

extern int v;

void
f(int set_v)
{
  if (set_v)
    v = 1;
}

If f() is called concurrently from several threads, then calls to f(1)
should be protected by a mutex.  But do we have to acquire the mutex
for f(0) calls?  I'd say no, because there is no access to the global v
in that case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
following:

f:
        pushl   %ebp
        movl    %esp, %ebp
        cmpl    $0, 8(%ebp)
        movl    $1, %eax
        cmove   v, %eax         ; load (maybe)
        movl    %eax, v         ; store (always)
        popl    %ebp
        ret

Note the final, unconditional store to v.  If another thread modifies
v between our load and our store (having properly acquired the mutex),
we will overwrite the new value with the old one, and we will do so
in a thread-unsafe manner, without holding the mutex.
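
Rendered back into C, the generated code behaves roughly like the
sketch below.  The name f_as_compiled is hypothetical and used only
for illustration, and v is defined here (rather than declared extern)
so that the fragment stands alone.

```c
/* A C rendering of the assembly above.  f_as_compiled is a
   hypothetical name, not something GCC emits.  */
int v;

void
f_as_compiled(int set_v)
{
  int tmp = 1;          /* movl $1, %eax */

  if (set_v == 0)
    tmp = v;            /* cmove v, %eax: load only when set_v == 0 */
  v = tmp;              /* movl %eax, v: store on every path */
}
```

In a single thread this is indistinguishable from the original f();
the difference only shows when another thread writes v between the
load and the store.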

So, do the calls to f(0) require the mutex, or is this a GCC bug?

This very bug was actually already reported for a slightly different case,
"Loop IM and other optimizations harmful for -fopenmp"
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my
last comment there, as I am no longer sure myself).  But the report was
closed with "UNCONFIRMED" mark, and the reasons for that are not quite
clear to me.  I tried to dig into the C99 standard and David
Butenhof's "Programming with POSIX Threads", and didn't find any
indication that the call to f(0) should also be protected by the mutex.

Here are some pieces from C99:

Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value
   being stored is the same as the previous value.

Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
   objects.

Sec 5.1.2.3 par 3: In the abstract machine, all expressions are
   evaluated as specified by the semantics.

Sec 5.1.2.3 par 5 basically says that the result of the program
execution wrt volatile objects, external files and terminal output
should be the same for all conforming implementations.

Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a
   one-to-one correspondence between abstract and
   actual semantics: ...

Sec 5.1.2.3 par 9: Alternatively, an implementation might perform
   various optimizations within each translation unit,
   such that the actual semantics would agree with the
   abstract semantics only when making function calls
   across translation unit boundaries. ...

I think the above says that even when the compiler chooses to perform
optimizations, the result of the _whole execution_ should be the same
as if the actual semantics were equal to the abstract semantics.  Sec
5.1.2.3 par 9, cited last, is not a permission to perform optimizations
that may change the end result.  In our case, when threads are involved,
the result may change, because there's no access to v in the abstract
semantics, and thus no mutex is required from the abstract POV.


So, could someone explain to me why this GCC optimization is valid, and,
if so, where the boundary lies below which I may safely assume GCC
won't try to store to objects that aren't stored to explicitly along
a particular execution path?  Or maybe the named bug report is valid
after all?


Thanks in advance,

-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread Erik Trulsson
On Sun, Oct 21, 2007 at 06:55:13PM +0400, Tomash Brechko wrote:
> Hello,
> 
> I have a question regarding the thread-safeness of a particular GCC
> optimization.  I'm sorry if this was already discussed on the list, if
> so please provide me with the reference to the previous discussion.
> 
> Consider this piece of code:
> 
> extern int v;
> 
> void
> f(int set_v)
> {
>   if (set_v)
>     v = 1;
> }
> 
> If f() is called concurrently from several threads, then call to f(1)
> should be protected by the mutex.  But do we have to acquire the mutex
> for f(0) calls?  I'd say no, why, there's no access to global v in
> that case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
> following:
> 
> f:
>         pushl   %ebp
>         movl    %esp, %ebp
>         cmpl    $0, 8(%ebp)
>         movl    $1, %eax
>         cmove   v, %eax         ; load (maybe)
>         movl    %eax, v         ; store (always)
>         popl    %ebp
>         ret
> 
> Note the last unconditional store to v.  Now, if some thread would
> modify v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and would do that in
> a thread-unsafe manner, not acquiring the mutex).
> 
> So, do the calls to f(0) require the mutex, or it's a GCC bug?

Note that C99 is firmly based on a single-threaded execution model and says
nothing whatsoever about what should happen or not happen in a threaded
environment.  According to C99 a C compiler is allowed to generate such code
as gcc does.


If you are using some threaded environment then you will have to read the
relevant standard for that to find out if it imposes any additional
restrictions on a C compiler beyond what the C standard does.

I suspect that most of them will not say one way or the other about what
should happen in this case, which means that you will have to assume the
worst case and protect all calls to f() regardless of the value of the
argument.


Personally I would say that it is the accesses to 'v' that should be
protected by a mutex (or similar), not the calls to f().
It is almost always better programming practice to protect access to data
objects rather than to code.  
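
A minimal sketch of that approach, assuming POSIX threads.  The names
v_lock, set_v_locked and get_v_locked are hypothetical, and v is made
static here so that every access has to go through the two helpers.

```c
#include <pthread.h>

/* The lock belongs to the data: every access to v goes through it.  */
static pthread_mutex_t v_lock = PTHREAD_MUTEX_INITIALIZER;
static int v;

void
set_v_locked(int value)
{
  pthread_mutex_lock(&v_lock);
  v = value;
  pthread_mutex_unlock(&v_lock);
}

int
get_v_locked(void)
{
  int value;

  pthread_mutex_lock(&v_lock);
  value = v;
  pthread_mutex_unlock(&v_lock);
  return value;
}

/* Callers of f() no longer need a lock of their own.  */
void
f(int set_v)
{
  if (set_v)
    set_v_locked(1);
}
```

With this layout the question of whether f(0) needs the mutex does
not arise for callers, because f(0) never reaches code that touches v.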



> 
> This very bug was actually already reported for a slightly different case,
> "Loop IM and other optimizations harmful for -fopenmp"
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31862 ; please ignore my
> last comment there, as I am no longer sure myself).  But the report was
> closed with "UNCONFIRMED" mark, and the reasons for that are not quite
> clear to me.  I tried to dig into the C99 standard and David
> Butenhof's "Programming with POSIX Threads", and didn't find any
> indication that the call to f(0) should also be protected by the mutex.
> 
> Here are some pieces from C99:
> 
> Sec 3.1 par 3: NOTE 2 "Modify" includes the case where the new value
>    being stored is the same as the previous value.
> 
> Sec 3.1 par 4: NOTE 3 Expressions that are not evaluated do not access
>    objects.
> 
> Sec 5.1.2.3 par 3: In the abstract machine, all expressions are
>    evaluated as specified by the semantics.
> 
> Sec 5.1.2.3 par 5 basically says that the result of the program
> execution wrt volatile objects, external files and terminal output
> should be the same for all conforming implementations.
> 
> Sec 5.1.2.3 par 8: EXAMPLE 1 An implementation might define a
>    one-to-one correspondence between abstract and
>    actual semantics: ...
> 
> Sec 5.1.2.3 par 9: Alternatively, an implementation might perform
>    various optimizations within each translation unit,
>    such that the actual semantics would agree with the
>    abstract semantics only when making function calls
>    across translation unit boundaries. ...
> 
> I think that the above says that even when compiler chooses to do some
> optimizations, the result of the _whole execution_ should be the same
> as if actual semantics equals to abstract semantics.  Sec 5.1.2.3 par
> 9 cited last is not a permission to do optimizations that may change
> the end result.  In our case when threads are involved the result may
> change, because there's no access to v in the abstract semantics, and
> thus no mutex is required from abstract POV.
> 
> 
> So, could someone explain me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during
> particular execution path?  Or maybe the named bug report is valid
> after all?
> 

-- 

Erik Trulsson
[EMAIL PROTECTED]


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread Tomash Brechko
On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote:
> Note that C99 is firmly based on a single-threaded execution model and says
> nothing whatsoever about what should happen or not happen in a threaded
> environment.  According to C99 a C compiler is allowed to generate such code
> as gcc does.

Yes, I understand that C99 doesn't concern itself with threads per se,
but I wouldn't call it pro-single-threaded, rather thread-neutral.  I.e.
the standard isn't made explicitly incompatible with threads, it
simply "says nothing about threads".


> If you are using some threaded environment then you will have to read the
> relevant standard for that to find out if it imposes any additional
> restrictions on a C compiler beyond what the C standard does.

All we have is POSIX, and it imposes very little on the compiler, I guess.


> I suspect that most of them will not say one way or the other about what
> should happen in this case, which means that you will have to assume the
> worst case and protect all calls to f() regardless of the value of the
> argument.

Well, assuming the worst case won't always work, that's why I asked
about reasonable boundary.  Consider the following (putting
style/efficiency matters aside):

  #include <pthread.h>

  #define N 100

  /* mutex[i] corresponds to byte[i].  */
  pthread_mutex_t mutex[N];
  char byte[N];

  void
  f(int i)
  {
    pthread_mutex_lock(&mutex[i]);
    byte[i] = 1;
    pthread_mutex_unlock(&mutex[i]);
  }


Is this code thread-safe?  Because from some POV C99 doesn't forbid
loading and storing the whole word when a single byte[i] is accessed
(given that C99 is pro-single-threaded).

But if C99 is thread-neutral, then it's the compiler's responsibility to
ensure the same result as some abstract machine (which may be
sequential).  In this case the compiler should access the single byte,
no more.
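
To illustrate the concern, here is a sketch of how a store to byte[i]
might be carried out on a target that can only load and store whole
32-bit words.  store_byte_via_word is a hypothetical name and the
4-byte word size is an assumption; this is not code that any
particular GCC version is known to emit for f().

```c
#include <stdint.h>
#include <string.h>

#define N 100   /* a multiple of 4, so every word lies inside the array */

char byte[N];

void
store_byte_via_word(int i)
{
  size_t base = (size_t) i & ~(size_t) 3;    /* start of the enclosing word */
  uint32_t word;

  memcpy(&word, &byte[base], sizeof word);   /* load byte[i] and 3 neighbours */
  ((char *) &word)[(size_t) i - base] = 1;   /* update one byte in the copy */
  memcpy(&byte[base], &word, sizeof word);   /* store all 4 bytes back */
}
```

The neighbouring bytes in the same word are rewritten with the values
loaded earlier, so an update made in between by a thread holding one
of the neighbouring mutexes would be lost.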


OK, I've got your point, but I'm not satisfied :).


-- 
   Tomash Brechko


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread Richard Guenther
On 10/21/07, Tomash Brechko <[EMAIL PROTECTED]> wrote:
> On Sun, Oct 21, 2007 at 17:26:02 +0200, Erik Trulsson wrote:
> > Note that C99 is firmly based on a single-threaded execution model and says
> > nothing whatsoever about what should happen or not happen in a threaded
> > environment.  According to C99 a C compiler is allowed to generate such code
> > as gcc does.
>
> Yes, I understand that C99 doesn't concern itself with threads per se,
> but I wouldn't call it pro-single-threaded, rather thread-neutral.  I.e.
> the standard isn't made explicitly incompatible with threads, it
> simply "says nothing about threads".
>
>
> > If you are using some threaded environment then you will have to read the
> > relevant standard for that to find out if it imposes any additional
> > restrictions on a C compiler beyond what the C standard does.
>
> All we have is POSIX, and it imposes very little on compiler I guess.
>
>
> > I suspect that most of them will not say one way or the other about what
> > should happen in this case, which means that you will have to assume the
> > worst case and protect all calls to f() regardless of the value of the
> > argument.
>
> Well, assuming the worst case won't always work, that's why I asked
> about reasonable boundary.  Consider the following (putting
> style/efficiency matters aside):
>
>   #include <pthread.h>
>
>   #define N 100
>
>   /* mutex[i] corresponds to byte[i].  */
>   pthread_mutex_t mutex[N];
>   char byte[N];
>
>   void
>   f(int i)
>   {
>     pthread_mutex_lock(&mutex[i]);
>     byte[i] = 1;
>     pthread_mutex_unlock(&mutex[i]);
>   }
>
>
> Is this code thread-safe?  Because from some POV C99 doesn't forbid
> loading and storing the whole word when a single byte[i] is accessed
> (given that C99 is pro-single-threaded).
>
> But if C99 is thread-neutral, then it's the compiler's responsibility to
> ensure the same result as some abstract machine (which may be
> sequential).  In this case the compiler should access the single byte,
> no more.

On some architectures this is not even possible.  For performance
reasons you should place the individual bytes in separate cache lines
anyway.
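
One way to lay that out, as a sketch only: it assumes a 64-byte cache
line (the real size depends on the CPU) and uses GCC's aligned
attribute.  struct slot, slots, CACHE_LINE and init_slots are
hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define N 100
#define CACHE_LINE 64   /* assumed line size; differs between CPUs */

/* Each byte and its mutex get a cache line of their own: the
   attribute raises the alignment, and sizeof is padded to match.  */
struct slot {
  pthread_mutex_t mutex;
  char byte;
} __attribute__((aligned(CACHE_LINE)));

struct slot slots[N];

void
init_slots(void)
{
  for (int i = 0; i < N; i++)
    pthread_mutex_init(&slots[i].mutex, NULL);
}

void
f(int i)
{
  pthread_mutex_lock(&slots[i].mutex);
  slots[i].byte = 1;
  pthread_mutex_unlock(&slots[i].mutex);
}
```

This trades memory for isolation: no two protected bytes share a
word or a cache line, so a wider-than-byte store to one slot cannot
touch another slot's data.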

Richard.


RE: Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread Dave Korn
On 21 October 2007 15:55, Tomash Brechko wrote:

> Consider this piece of code:
> 
> extern int v;
> 
> void
> f(int set_v)
> {
>   if (set_v)
>     v = 1;
> }

> f:
>         pushl   %ebp
>         movl    %esp, %ebp
>         cmpl    $0, 8(%ebp)
>         movl    $1, %eax
>         cmove   v, %eax         ; load (maybe)
>         movl    %eax, v         ; store (always)
>         popl    %ebp
>         ret
> 
> Note the last unconditional store to v.  
> So, could someone explain me why this GCC optimization is valid, 

  Because of the 'as-if' rule.  Since the standard is neutral with regard to
threads, gcc does not have to take them into account when it decides whether
an optimisation would satisfy the 'as-if' rule.

  If you really want all externally-visible accesses to v to be made exactly
as the code directs, rather than allowing gcc to optimise them in any way that,
from the program's POV, is just the same 'as-if' they had been done
exactly, make v volatile.
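
  As a sketch, the change amounts to one qualifier (v is defined here
rather than declared extern only so that the fragment stands alone):

```c
/* Accesses to a volatile object must be performed strictly as the
   abstract machine directs, so no store may be added on the
   set_v == 0 path.  */
volatile int v;

void
f(int set_v)
{
  if (set_v)
    v = 1;
}
```

  Note that volatile only constrains which accesses the compiler emits;
it does not by itself make them atomic or order them between threads.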

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Bug in C++ FE?

2007-10-21 Thread Rene Buergel

When I try to compile the following code:

template <typename T> class X
{
   private:
   typedef void (X::*PtrToMethod)();

   PtrToMethod method;

   template <typename U> void some_func()
   {
      return;
   }
   public:
   void func()
   {
      method = &X::some_func<int>;
   }
};

int main()
{
   X<int> obj;
   obj.func();
}

gcc rejects it with these uninformative errors:
error.cpp:15: error: expected primary-expression before 'int'
error.cpp:15: error: expected `;' before 'int'
(line 15 is method = &X::some_func<int>;)

I'm sure this code is correct, because gcc compiles it without
problems if X is an ordinary class rather than a class template.
Comeau Online also compiles both variants (the template and
non-template versions of X).

I tried gcc 4.1.2 and gcc 4.3.0(20071012)

René


Re: Bug in C++ FE?

2007-10-21 Thread Andrew Pinski
On 10/21/07, Rene Buergel <[EMAIL PROTECTED]> wrote:
> When I try to compile the following code:

> method = &X::some_func<int>;

You forgot the template keyword, which is needed because X is dependent:
method = &X::template some_func<int>;

There is a defect report against the C++ standard about X being
dependent inside the class itself.

-- Pinski


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread skaller

On Sun, 2007-10-21 at 20:16 +0400, Tomash Brechko wrote:

> But if C99 is thread-neutral, then it's compiler's responsibility to
> ensure the same result as some abstract machine 

No. The compiler is responsible for ensuring that things work
if, and only if:

1. your program conforms in required ways 

2. you only call functions in the standard library
   or functions you define in your program

Therefore, when making calls to ANY external library,
all bets are off UNLESS that library also meets the above
conditions.

Interfacing the operating system in any way OTHER than
the C standard library functions immediately relieves
the compiler of all responsibility -- and that includes
calls to Posix functions which are not implemented
entirely in Standard C.

Of course, a C compiler may make additional guarantees,
for example, Posix compliance, but then you must check
the Posix standard to see what additional things it 
promises.

AFAIK, access to a shared variable is sound if serialised
by a mutex. However, that isn't a complete description:
there are situations where you can safely read a shared
variable without holding a lock, for example if you know that
no other thread has modified it since the last synchronisation.
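
A sketch of one such situation, assuming POSIX threads, where
pthread_join() is the synchronisation point. The names shared,
writer and run_and_read are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

static int shared;

/* The only thread that ever writes to shared.  */
static void *
writer(void *arg)
{
  (void) arg;
  shared = 42;
  return NULL;
}

int
run_and_read(void)
{
  pthread_t t;

  if (pthread_create(&t, NULL, writer, NULL) != 0)
    return -1;
  pthread_join(t, NULL);   /* the writer has finished past this point */
  return shared;           /* read without a lock: nobody writes any more */
}
```

No mutex protects the final read, yet it is safe because the only
writer has terminated and been joined before the read happens.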


-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net


RE: Optimization of conditional access to globals: thread-unsafe?

2007-10-21 Thread skaller

On Mon, 2007-10-22 at 00:07 +0100, Dave Korn wrote:

>   If you really want all externally-visible accesses to v to be made exactly
> as the code directs, rather than allowing gcc to optimise them in any way that,
> from the program's POV, is just the same 'as-if' they had been done
> exactly, make v volatile.

That is not enough. Apart from the lack of ISO semantics for volatile,
typically a compiler will take volatile as a hint to not hold
values of the variable in a register.

On a multi-processor, this is not enough, because each CPU
may still hold modified values in separate caches.

Perhaps gcc actually inserts a RW barrier to force
cache synchronisation on every volatile access, but
this seems rather expensive and very hard to do, since
it is very dependent on the actual box (not just the
processor). Some processor caches might require external
electrical signals to synchronise, for example. This is
quite possible if you have multiple CPU boards in a box.

But I don't actually know what gcc does, although I guess
it does nothing. The OS has to do the right thing here
when a mutex is locked etc, but the code for that is
probably in the kernel which is better able to manage
things like cache synchronisation than a compiler.
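
For comparison, later revisions of the C standard (C11 onwards)
define a memory model and a <stdatomic.h> header, which did not
exist when this was written. A minimal sketch, assuming a C11
compiler; read_v is a hypothetical helper.

```c
#include <stdatomic.h>

/* An atomic object: loads and stores are indivisible, and the
   default (sequentially consistent) operations include whatever
   barriers the target needs.  */
atomic_int v;

void
f(int set_v)
{
  if (set_v)
    atomic_store(&v, 1);
}

int
read_v(void)
{
  return atomic_load(&v);
}
```

Here the ordering work is expressed in the language and left to the
implementation for the target, rather than hung on volatile.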


-- 
John Skaller 
Felix, successor to C++: http://felix.sf.net