Re: [10 PATCHES] inline functions to avoid stack overflow

2008-06-25 Thread Bart Van Assche
On Thu, Jun 26, 2008 at 12:09 AM, David Miller <[EMAIL PROTECTED]> wrote:
> From: Mikulas Patocka <[EMAIL PROTECTED]>
> Date: Wed, 25 Jun 2008 08:53:10 -0400 (EDT)
>
>> Even worse, gcc doesn't use these additional bytes. If you try this:
>>
>> extern void f(int *i);
>> void g()
>> {
>>  int a;
>>  f(&a);
>> }
>>
>> , it allocates additional 16 bytes for the variable "a" (so there's total
>> 208 bytes), even though it could place the variable into 48-byte
>> ABI-mandated area that it inherited from the caller or into its own
>> 16-byte padding that it made when calling "f".
>
> The extra 16 bytes of space allocated is so that GCC can perform a
> secondary reload of a quad floating point value.  It always has to be
> present, because we can't satisfy a secondary reload by emitting yet
> another reload, it's the end of the possible level of recursions
> allowed by the reload pass.

Is there any floating-point code present in the Linux kernel? Would
it be a good idea to add an option to gcc that tells it that the
compiled code does not contain floating-point instructions, so that
gcc knows that no space has to be reserved for a quad floating-point
value?

Bart.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Bart Van Assche
On 10/21/07, Tomash Brechko <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have a question regarding the thread-safeness of a particular GCC
> optimization.  I'm sorry if this was already discussed on the list, if
> so please provide me with the reference to the previous discussion.
>
> Consider this piece of code:
>
> extern int v;
>
> void
> f(int set_v)
> {
>   if (set_v)
>     v = 1;
> }
>
> If f() is called concurrently from several threads, then call to f(1)
> should be protected by the mutex.  But do we have to acquire the mutex
> for f(0) calls?  I'd say no, why, there's no access to global v in
> that case.  But GCC 3.3.4--4.3.0 on i686 with -O1 generates the
> following:
>
> f:
>         pushl   %ebp
>         movl    %esp, %ebp
>         cmpl    $0, 8(%ebp)
>         movl    $1, %eax
>         cmove   v, %eax         ; load (maybe)
>         movl    %eax, v         ; store (always)
>         popl    %ebp
>         ret
>
> Note the last unconditional store to v.  Now, if some thread would
> modify v between our load and store (acquiring the mutex first), then
> we will overwrite the new value with the old one (and would do that in
> a thread-unsafe manner, not acquiring the mutex).
>
> So, do the calls to f(0) require the mutex, or is it a GCC bug?
...
> So, could someone explain to me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during
> particular execution path?  Or maybe the named bug report is valid
> after all?

Hello Tomash,

I'm not an expert in the C89/C99 standards, but I have written a Ph.D.
thesis on the subject of memory models. What I learned while writing
that thesis is the following:

- If you want to know which optimizations are valid and which ones are
not, you have to look at the semantics defined in the language
standard.

- Every language standard document defines the result of executing a
sequential program. The definition of the behavior of a multithreaded
program written in a certain programming language is called the
memory model of that programming language.

- The memory model of C and C++ is still under discussion as has
already been pointed out on this mailing list.

- Although the memory model for C and C++ is still under discussion,
there is a definition for the behavior of multithreaded C and C++
programs. The following is required by the ANSI/ISO C89 standard (from
paragraph 5.1.2.3, Program Execution):
  Accessing a volatile object, modifying an object, modifying a file,
  or calling a function that does any of those operations are all side
  effects, which are changes in the state of the execution environment.
  Evaluation of an expression may produce side effects. At certain
  specified points in the execution sequence called sequence points,
  all side effects of previous evaluations shall be complete and no
  side effects of subsequent evaluations shall have taken place. (A
  summary of the sequence points is given in annex C.)

Annex C explains that, among other things, the call to a function
(after argument evaluation) is a sequence point.

See also http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf

- The above paragraph does not impose any limitation on the compiler
with regard to optimizations on non-volatile variables. Or: the
generated code shown in your mail is allowed by the above paragraph.

- The above paragraph also has the following implications for volatile
variables:
  * There exists a total order for all accesses to all volatile variables.
  * It is the responsibility of the compiler to ensure cache coherency
for volatile variables. If memory barrier instructions are needed to
ensure cache coherency on the architecture for which the compiler is
generating code, then it is the responsibility of the compiler to
generate these instructions for volatile variables. This fact is often
overlooked.
  * The compiler must generate code such that exactly one store
instruction is executed for each assignment to a volatile variable.
Prefetching volatile variables is allowed as long as it does not
violate paragraph 5.1.2.3 of the language definition.
  * As is well known, the compiler may reorder function calls and
assignments to non-volatile variables if it can prove that the called
function won't modify that variable. This becomes problematic if the
variable is modified by more than one thread and the called function
is a synchronization function, e.g. pthread_mutex_lock(). This kind of
reordering is highly undesirable. This is why any variable that is
shared over threads has to be declared volatile, even when using
explicit locking calls.
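
For readers who prefer C to assembly, the code generation Tomash
quoted can be written back out as C. This is only a sketch of what the
cmove sequence amounts to, not actual compiler output; the names
f_as_written and f_as_compiled are made up for this illustration.

```c
int v;  /* shared global, deliberately not declared volatile */

/* The source as written: v is stored to only when set_v is nonzero. */
void f_as_written(int set_v)
{
    if (set_v)
        v = 1;
}

/* What the cmove-based code amounts to: v is always stored to. */
void f_as_compiled(int set_v)
{
    int tmp = 1;

    if (!set_v)
        tmp = v;    /* load (maybe) */
    v = tmp;        /* store (always), even for f(0) */
}
```

In a single thread both functions leave v in the same state for every
argument, which is why a sequential reading of the standard permits
the transformation. The difference only shows up when another thread
writes v between the load and the store in f_as_compiled(0).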

I hope the above brings more clarity to this discussion.

Bart Van Assche.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-26 Thread Bart Van Assche
On 10/22/07, Andrew Haley  wrote:

> The core problem here seems to be that the "C with threads" memory
> model isn't sufficiently well-defined to make a determination
> possible.  You're assuming that you have no responsibility to mark
> shared memory protected by a mutex as volatile, but I know of nothing
> in the C standard that makes such a guarantee.  A prudent programmer
> will make conservative assumptions.

I agree that according to the C and C++ language standards, any
variable shared over threads should be declared volatile. But it is
nearly impossible to live with this requirement: this requirement
implies that for each library function that modifies data through
pointers a second version should be added that accepts a volatile
pointer instead of a regular pointer. Consider e.g. the function
snprintf(), which writes to the character buffer passed as its first
argument. When snprintf() is used to write to a buffer that is not
shared over threads, the existing snprintf() function is fine. When
however snprintf() is used to write to a buffer that is shared by two
or more threads, a version is needed of snprintf() that accepts
volatile char* as its first argument.
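
To make the cost concrete, here is a minimal sketch of what such a
second version could look like. The function snprintf_to_volatile is
hypothetical and exists in no library; the 256-byte scratch buffer is
an arbitrary choice for the example, and longer output is truncated to
it.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical volatile-aware counterpart of snprintf(). It formats
 * into a private scratch buffer and then copies byte by byte, so that
 * every store to dst goes through a volatile lvalue. Returns what
 * vsnprintf() returns: the untruncated length, or a negative value. */
int snprintf_to_volatile(volatile char *dst, size_t size,
                         const char *fmt, ...)
{
    char tmp[256];          /* arbitrary scratch size for this sketch */
    va_list ap;
    size_t i, limit;
    int n;

    if (size == 0)
        return 0;

    va_start(ap, fmt);
    n = vsnprintf(tmp, sizeof tmp, fmt, ap);
    va_end(ap);
    if (n < 0)
        return n;

    /* Copy at most limit - 1 characters, then terminate. */
    limit = (size < sizeof tmp) ? size : sizeof tmp;
    for (i = 0; i + 1 < limit && tmp[i] != '\0'; i++)
        dst[i] = tmp[i];
    dst[i] = '\0';
    return n;
}
```

Every library function that writes through a pointer would need a
wrapper or a duplicate along these lines, which is the burden
described above.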

My opinion is that it should be possible to declare whether C/C++ code
has acquire, release or acquire+release semantics. The fact that code
has acquire semantics means that no subsequent load or store
operations may be moved in front of that code, and the fact that code
has release semantics means that no preceding load or store operations
may be moved past that code. Adding definitions for acquire and
release semantics in pthread.h would help a lot. E.g.
pthread_mutex_lock() should be declared to have acquire semantics, and
pthread_mutex_unlock() should be declared to have release semantics.
Maybe it is a good idea to add the following function attributes in
gcc: __attribute__((acquire)) and __attribute__((release))? A
refinement of these attributes would be to allow specifying not only
the acquire/release attributes, but also the memory locations to which
the acquire and release apply (pthreads synchronization functions
always apply to all memory locations).
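
The attributes above are only a proposal, so the following sketch does
not use them. Instead it illustrates the same acquire/release pairing
with the <stdatomic.h> interface of C11, which was standardized after
this message was written; a C11 compiler is therefore an assumption,
and the names lock_flag, sketch_acquire, sketch_release and
increment_counter are invented for the example.

```c
#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static int counter;     /* protected by lock_flag, not volatile */

/* Acquire: later loads and stores may not be moved in front of this. */
void sketch_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_flag,
                                             memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

/* Release: earlier loads and stores may not be moved past this. */
void sketch_release(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* The increment stays between the acquire and the release. */
int increment_counter(void)
{
    int result;

    sketch_acquire();
    result = ++counter;
    sketch_release();
    return result;
}
```

The ordering requirement is attached to the two lock operations rather
than to the data, so counter itself needs no special qualifier.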

I'm not inventing anything new here. As far as I know, the concepts
of acquire and release were first defined by Gharachorloo et al. in 1990
(Memory consistency and event ordering in scalable shared-memory
multiprocessors, International Symposium on Computer Architecture,
1990, http://portal.acm.org/citation.cfm?id=325102&dl=ACM&coll=GUIDE).

Bart Van Assche.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-27 Thread Bart Van Assche
> > On 10/22/07, Andrew Haley  wrote:
> >
> > > The core problem here seems to be that the "C with threads" memory
> > > model isn't sufficiently well-defined to make a determination
> > > possible.  You're assuming that you have no responsibility to mark
> > > shared memory protected by a mutex as volatile, but I know of nothing
> > > in the C standard that makes such a guarantee.  A prudent programmer
> > > will make conservative assumptions.

I'd like to return to this. Variables that are shared over two or more
threads can be classified as follows:
- static variables, global variables and dynamically allocated data.
- variables allocated on the stack.
As is well known, the compiler may not reorder any access to any static
variable, global variable or dynamically allocated data with a call to
a function that is not declared inline. Variables that are allocated
on the stack of a thread can be shared with another thread only
by passing a pointer to that variable to the other thread first. Passing
such a pointer inhibits reordering of accesses to shared local
variables with a call to a function that is not declared inline.

Or: a C/C++ compiler will never reorder accesses to shared variables
with function calls. So if accesses to shared variables are properly
guarded with calls to synchronization functions, it is not necessary
to declare these shared variables volatile.
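
A minimal sketch of this claim, assuming POSIX threads are available
and the program is linked against the threads library: two threads
update a plain, non-volatile global, and every access is bracketed by
pthread_mutex_lock() and pthread_mutex_unlock(). The names worker,
run_two_workers and INCREMENTS_PER_THREAD are made up for the example.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS_PER_THREAD 100000

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;   /* shared between threads, not volatile */

static void *worker(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&total_lock);
        shared_total++;                     /* guarded access */
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final total, or -1 on failure. */
long run_two_workers(void)
{
    pthread_t a, b;

    shared_total = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_total;    /* safe to read: both workers have exited */
}
```

With the locking in place the expected result is twice
INCREMENTS_PER_THREAD; removing the lock and unlock calls makes the
total unpredictable.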

There is one category of synchronization functions that has not yet
been discussed: synchronization functions that can be inlined. I
propose to require that such synchronization functions contain at
least one asm statement that specifies that it clobbers memory,
because this inhibits reordering between the inlined synchronization
function and accesses to variables.
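
As a sketch of this proposal, assuming gcc (the extended asm syntax and
the __sync builtins are gcc extensions): an empty asm statement with a
"memory" clobber tells the compiler that memory may have changed, so
it will not move loads or stores across it. The names
compiler_barrier, sketch_lock, sketch_unlock and add_guarded are
invented here.

```c
/* Empty asm statement that clobbers memory. It emits no instruction;
 * it only constrains the compiler, not the CPU. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int sketch_lock_word;    /* 0 = free, 1 = held */
int guarded_value;              /* shared, not volatile */

static inline void sketch_lock(void)
{
    /* gcc builtin: atomic exchange with acquire semantics */
    while (__sync_lock_test_and_set(&sketch_lock_word, 1))
        ;
    compiler_barrier();
}

static inline void sketch_unlock(void)
{
    compiler_barrier();
    /* gcc builtin: stores 0 with release semantics */
    __sync_lock_release(&sketch_lock_word);
}

/* Even if both helpers are inlined here, the clobbers keep the update
 * of guarded_value between them. */
int add_guarded(int delta)
{
    int result;

    sketch_lock();
    guarded_value += delta;
    result = guarded_value;
    sketch_unlock();
    return result;
}
```

The clobber only addresses compiler reordering. Ordering between
processors still has to come from the lock operation itself, which in
this sketch is left to the builtins.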

Maybe it's a good idea to add a chapter to the gcc manual about
multithreaded programming, so that gcc users who did not follow this
discussion can look up this kind of information easily?

Bart Van Assche.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-27 Thread Bart Van Assche
On 10/27/07, Florian Weimer <[EMAIL PROTECTED]> wrote:

> And this isn't really specific to threads.

Hello Florian,

What I was trying to explain is that it is not necessary to declare
shared variables volatile, for any C/C++ compiler that is
compliant with the language standard. Your reply neither confirmed
nor denied this.

Bart Van Assche.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-28 Thread Bart Van Assche
On 10/27/07, Florian Weimer <[EMAIL PROTECTED]> wrote:
>
> Anyway, not reordering across function calls is not sufficient to get
> sane threading semantics (IIRC, this is also explained in detail in Hans
> Boehm's paper).

Hello Florian,

In Hans Boehm's paper the following issues are identified:
1. Concurrent accesses of variables without explicit locking can cause
unexpected results in a multithreaded context (paragraph 4.1).
2. If non-atomic variables (e.g. one field of a bitfield) are shared
over threads, and these are not protected by explicit locking,
updating such a variable in a multithreaded context is troublesome
(paragraph 4.2).
3. If the compiler performs register promotion on a shared variable,
this can cause undesired results in a multithreaded context (paragraph
4.3).

And this thread started with:
4. If the compiler generates a store operation for an assignment
statement that is not executed, this can cause trouble in a
multithreaded context.
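
Issue (2) can be illustrated with a hand-written sketch. On hardware
that cannot store fewer bits than one byte, updating one 4-bit field
means rewriting the whole byte, including the neighbouring field that
the source never assigned. The code below spells out that
read-modify-write with explicit masks instead of a real bitfield,
because bitfield layout is implementation-defined; the names
store_low_field and load_high_field are made up.

```c
#include <stdint.h>

#define LOW_MASK   0x0Fu    /* field A: bits 0..3 */
#define HIGH_MASK  0xF0u    /* field B: bits 4..7 */

/* Assigns field A. The store rewrites the whole byte, so a concurrent
 * update of field B between the load and the store would be lost. */
void store_low_field(uint8_t *unit, unsigned value)
{
    uint8_t old = *unit;    /* load: also reads field B */
    uint8_t updated = (uint8_t)((old & HIGH_MASK) | (value & LOW_MASK));

    *unit = updated;        /* store: also writes field B back */
}

/* Reads field B. */
unsigned load_high_field(const uint8_t *unit)
{
    return (*unit & HIGH_MASK) >> 4;
}
```

In a single thread field B is preserved. The hazard is the same as in
issue (4): a store to memory that the executed source code did not
assign.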

My opinion is that, given the importance of multithreading, it should
be documented in the gcc manual which optimizations can cause trouble
in multithreaded software (such as (3) and (4)). It should also be
documented which compiler flags must be used to disable optimizations
that cause trouble for multithreaded software. Requiring that all
thread-shared variables should be declared volatile is completely
unacceptable. We need a solution today for the combination of C/C++
and POSIX threads; we can't wait for the respective language
standardization committees to come up with a solution.

Regarding issues (1) and (2): (1) can be addressed by using
platform-specific or compiler-specific solutions, e.g. the datatype
atomic_t provided by the Linux kernel headers. And any prudent
programmer won't write code that triggers (2).

And as you may have noted, I do not agree with Hans Boehm where he
states that the combination of C/C++ with POSIX threads cannot result
in correctly working programs. I believe that the issues raised by
Hans Boehm can be solved.

Bart Van Assche.


Re: Optimization of conditional access to globals: thread-unsafe?

2007-10-28 Thread Bart Van Assche
On 10/28/07, Robert Dewar <[EMAIL PROTECTED]> wrote:
> Bart Van Assche wrote:
>
> > My opinion is that, given the importance of multithreading, it should
> > be documented in the gcc manual which optimizations can cause trouble
> > in multithreaded software (such as (3) and (4)). It should also be
> > documented which compiler flags must be used to disable optimizations
> > that cause trouble for multithreaded software. Requiring that all
> > thread-shared variables should be declared volatile is completely
> > unacceptable.
>
> Why is this unacceptable .. seems much better to me than writing
> undefined stuff.

Requiring that all thread-shared variables must be declared volatile,
even those protected by calls to synchronization functions, implies
that all multithreaded code and all existing libraries have to be
changed. Functions like snprintf() write data to a buffer provided by
the caller. If all thread-shared variables should be declared
volatile, then a second version of snprintf() would be required with
volatile char* as the datatype of the first argument instead of char*.
Quite a hefty requirement if you ask me ...

Bart Van Assche.