Re: [10 PATCHES] inline functions to avoid stack overflow
On Thu, Jun 26, 2008 at 12:09 AM, David Miller <[EMAIL PROTECTED]> wrote:
> From: Mikulas Patocka <[EMAIL PROTECTED]>
> Date: Wed, 25 Jun 2008 08:53:10 -0400 (EDT)
>
>> Even worse, gcc doesn't use these additional bytes. If you try this:
>>
>>   extern void f(int *i);
>>   void g()
>>   {
>>     int a;
>>     f(&a);
>>   }
>>
>> then it allocates an additional 16 bytes for the variable "a" (so there's a
>> total of 208 bytes), even though it could place the variable in the 48-byte
>> ABI-mandated area that it inherited from the caller, or in its own 16-byte
>> padding that it made when calling "f".
>
> The extra 16 bytes of space allocated is so that GCC can perform a
> secondary reload of a quad floating point value. It always has to be
> present, because we can't satisfy a secondary reload by emitting yet
> another reload; it's the end of the possible level of recursion
> allowed by the reload pass.

Is there any floating-point code present in the Linux kernel? Would it be a
good idea to add an option to gcc that tells it the compiled code contains no
floating-point instructions, so that gcc knows that no space has to be
reserved for a quad floating point value?

Bart.
Re: Optimization of conditional access to globals: thread-unsafe?
On 10/21/07, Tomash Brechko <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a question regarding the thread-safeness of a particular GCC
> optimization. I'm sorry if this was already discussed on the list; if
> so, please provide me with the reference to the previous discussion.
>
> Consider this piece of code:
>
>   extern int v;
>
>   void
>   f(int set_v)
>   {
>     if (set_v)
>       v = 1;
>   }
>
> If f() is called concurrently from several threads, then calls to f(1)
> should be protected by the mutex. But do we have to acquire the mutex
> for f(0) calls? I'd say no; there's no access to the global v in that
> case. But GCC 3.3.4--4.3.0 on i686 with -O1 generates the following:
>
>   f:
>     pushl  %ebp
>     movl   %esp, %ebp
>     cmpl   $0, 8(%ebp)
>     movl   $1, %eax
>     cmove  v, %eax      ; load (maybe)
>     movl   %eax, v      ; store (always)
>     popl   %ebp
>     ret
>
> Note the last unconditional store to v. Now, if some thread modifies v
> between our load and store (acquiring the mutex first), then we will
> overwrite the new value with the old one (and will do so in a
> thread-unsafe manner, without acquiring the mutex).
>
> So, do the calls to f(0) require the mutex, or is it a GCC bug?
...
> So, could someone explain to me why this GCC optimization is valid, and,
> if so, where lies the boundary below which I may safely assume GCC
> won't try to store to objects that aren't stored to explicitly during a
> particular execution path? Or maybe the named bug report is valid
> after all?

Hello Tomash,

I'm not an expert in the C89/C99 standards, but I have written a Ph.D. thesis
on the subject of memory models. What I learned while writing that thesis is
the following:

- If you want to know which optimizations are valid and which ones are not,
  you have to look at the semantics defined in the language standard.
- Every language standard document defines what the result is of executing a
  sequential program. The definition of the behavior of a multithreaded
  program written in a certain programming language is called the memory
  model of that programming language.
- The memory model of C and C++ is still under discussion, as has already
  been pointed out on this mailing list.
- Although the memory model for C and C++ is still under discussion, there is
  a definition of the behavior of multithreaded C and C++ programs. The
  following is required by the ANSI/ISO C89 standard (from paragraph 5.1.2.3,
  Program Execution):

    Accessing a volatile object, modifying an object, modifying a file, or
    calling a function that does any of those operations are all side
    effects, which are changes in the state of the execution environment.
    Evaluation of an expression may produce side effects. At certain
    specified points in the execution sequence called sequence points, all
    side effects of previous evaluations shall be complete and no side
    effects of subsequent evaluations shall have taken place. (A summary of
    the sequence points is given in annex C.)

  In annex C it is explained that, among other things, a call to a function
  (after argument evaluation) is a sequence point. See also
  http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf
- The above paragraph does not impose any limitation on the compiler with
  regard to optimizations on non-volatile variables. In other words, the
  generated code shown in your mail is allowed by the above paragraph.
- The above paragraph also has the following implications for volatile
  variables:
  * There exists a total order over all accesses to all volatile variables.
  * It is the responsibility of the compiler to ensure cache coherency for
    volatile variables.
    If memory barrier instructions are needed to ensure cache coherency on
    the architecture for which the compiler is generating code, then it is
    the responsibility of the compiler to generate these instructions for
    volatile variables. This fact is often overlooked.
  * The compiler must generate code such that exactly one store instruction
    is executed for each assignment to a volatile variable. Prefetching
    volatile variables is allowed as long as it does not violate paragraph
    5.1.2.3 of the language definition.
  * As is known, the compiler may reorder function calls and assignments to
    non-volatile variables if the compiler can prove that the called function
    won't modify that variable. This becomes problematic if the variable is
    modified by more than one thread and the called function is a
    synchronization function, e.g. pthread_mutex_lock(). This kind of
    reordering is highly undesirable. This is why any variable that is shared
    over threads has to be declared volatile, even when using explicit
    locking calls.

I hope the above brings more clarity to this discussion.

Bart Van Assche.
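For illustration, here is a minimal sketch of the reordering hazard mentioned
in the last bullet above; the variable and function names are invented, and
the comments spell out under which (hypothetical) conditions the hazard would
arise:

  #include <pthread.h>

  static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
  static int counter;   /* shared between threads, not declared volatile */

  /* In practice pthread_mutex_lock()/unlock() are external calls, so the
   * compiler must assume they may access 'counter' and keeps the increment
   * between them.  The hazard described above arises only if the compiler
   * can prove that the called functions do not touch 'counter' (e.g. after
   * inlining or whole-program analysis): it would then be free to move the
   * increment across the calls, outside the critical section. */
  void bump(void)
  {
          pthread_mutex_lock(&mu);
          counter++;
          pthread_mutex_unlock(&mu);
  }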
Re: Optimization of conditional access to globals: thread-unsafe?
On 10/22/07, Andrew Haley wrote:
> The core problem here seems to be that the "C with threads" memory
> model isn't sufficiently well-defined to make a determination
> possible. You're assuming that you have no responsibility to mark
> shared memory protected by a mutex as volatile, but I know of nothing
> in the C standard that makes such a guarantee. A prudent programmer
> will make conservative assumptions.

I agree that according to the C and C++ language standards, any variable
shared over threads should be declared volatile. But it is nearly impossible
to live with this requirement: it implies that for each library function that
modifies data through pointers, a second version should be added that accepts
a volatile pointer instead of a regular pointer. Consider e.g. the function
snprintf(), which writes to the character buffer passed as its first
argument. When snprintf() is used to write to a buffer that is not shared
over threads, the existing snprintf() function is fine. When, however,
snprintf() is used to write to a buffer that is shared by two or more
threads, a version of snprintf() is needed that accepts volatile char* as its
first argument.

My opinion is that it should be possible to declare whether C/C++ code has
acquire, release or acquire+release semantics. That code has acquire
semantics means that no subsequent load or store operations may be moved in
front of that code, and that code has release semantics means that no
preceding load or store operations may be moved past that code. Adding
definitions for acquire and release semantics in pthread.h would help a lot.
E.g. pthread_mutex_lock() should be declared to have acquire semantics, and
pthread_mutex_unlock() should be declared to have release semantics. Maybe it
is a good idea to add the following function attributes to gcc:
__attribute__((acquire)) and __attribute__((release))? A refinement of these
attributes would be to allow specifying not only the acquire/release
semantics, but also the memory locations to which the acquire and release
apply (the pthreads synchronization functions always apply to all memory
locations).

I'm not inventing anything new here -- as far as I know the concepts of
acquire and release were first defined by Gharachorloo et al. in 1990 (Memory
consistency and event ordering in scalable shared-memory multiprocessors,
International Symposium on Computer Architecture, 1990,
http://portal.acm.org/citation.cfm?id=325102&dl=ACM&coll=GUIDE).

Bart Van Assche.
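As a minimal sketch of the acquire/release pairing being proposed here, the
example below shows a producer/consumer protected by a pthread mutex; the
function and variable names are invented for illustration, and the comments
assume (as argued in this thread) that pthread_mutex_lock() behaves as an
acquire operation and pthread_mutex_unlock() as a release operation:

  #include <pthread.h>

  static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
  static int data;
  static int ready;

  /* Producer: release semantics at the unlock -- neither the store to
   * 'data' nor the store to 'ready' may be moved past it. */
  void produce(int value)
  {
          pthread_mutex_lock(&mu);
          data = value;
          ready = 1;
          pthread_mutex_unlock(&mu);
  }

  /* Consumer: acquire semantics at the lock -- the loads of 'ready' and
   * 'data' may not be hoisted in front of it. */
  int consume(int *out)
  {
          int have = 0;

          pthread_mutex_lock(&mu);
          if (ready) {
                  *out = data;
                  have = 1;
          }
          pthread_mutex_unlock(&mu);
          return have;
  }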
Re: Optimization of conditional access to globals: thread-unsafe?
> > On 10/22/07, Andrew Haley wrote:
> > >
> > > The core problem here seems to be that the "C with threads" memory
> > > model isn't sufficiently well-defined to make a determination
> > > possible. You're assuming that you have no responsibility to mark
> > > shared memory protected by a mutex as volatile, but I know of nothing
> > > in the C standard that makes such a guarantee. A prudent programmer
> > > will make conservative assumptions.

I'd like to return to this. Variables that are shared between two or more
threads can be classified as follows:
- static variables, global variables and dynamically allocated data;
- variables allocated on the stack.

As is known, the compiler may not reorder any access to a static variable,
global variable or dynamically allocated data with a call to a function that
is not declared inline. Variables that are allocated on the stack of a thread
can be shared with another thread only by first passing a pointer to that
variable to the other thread. Passing such a pointer inhibits reordering of
accesses to shared local variables with a call to a function that is not
declared inline. In other words: a C/C++ compiler will never reorder accesses
to shared variables with function calls. So if accesses to shared variables
are properly guarded with calls to synchronization functions, it is not
necessary to declare these shared variables volatile.

There is one category of synchronization functions that has not yet been
discussed: synchronization functions that can be inlined. I propose to
require that such synchronization functions contain at least one asm
statement that specifies that it clobbers memory, because this inhibits
reordering between the inlined synchronization function and accesses to
shared variables.

Maybe it's a good idea to add a chapter to the gcc manual about multithreaded
programming, so that gcc users who did not follow this discussion can look up
this kind of information easily?

Bart Van Assche.
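As a sketch of what such an inlinable synchronization function could look
like, here is an illustrative spinlock; the type and function names are
invented, and the lock is built from the GCC __sync atomic builtins (which on
the targets I know of already imply acquire/release ordering) plus the asm
statements with a "memory" clobber that are being proposed as a requirement
above:

  /* The asm statements with a "memory" clobber tell the compiler that it
   * may not cache shared variables in registers across the lock/unlock
   * operations, even though these functions are inlined. */
  typedef struct { int locked; } my_spinlock_t;

  static inline void my_spin_lock(my_spinlock_t *l)
  {
          /* __sync_lock_test_and_set() atomically stores 1 and returns the
           * previous value; spin until the previous value was 0. */
          while (__sync_lock_test_and_set(&l->locked, 1))
                  ;
          __asm__ __volatile__("" ::: "memory"); /* acquire-side barrier */
  }

  static inline void my_spin_unlock(my_spinlock_t *l)
  {
          __asm__ __volatile__("" ::: "memory"); /* release-side barrier */
          __sync_lock_release(&l->locked);       /* atomically stores 0 */
  }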
Re: Optimization of conditional access to globals: thread-unsafe?
On 10/27/07, Florian Weimer <[EMAIL PROTECTED]> wrote:
> And this isn't really specific to threads.

Hello Florian,

What I was trying to explain is that, with any C/C++ compiler that complies
with the language standard, it is not necessary to declare shared variables
volatile. Your reply neither acknowledged nor denied this.

Bart Van Assche.
Re: Optimization of conditional access to globals: thread-unsafe?
On 10/27/07, Florian Weimer <[EMAIL PROTECTED]> wrote:
>
> Anyway, not reordering across function calls is not sufficient to get
> sane threading semantics (IIRC, this is also explained in detail in Hans
> Boehm's paper).

Hello Florian,

In Hans Boehm's paper the following issues are identified:
1. Concurrent accesses to variables without explicit locking can cause
   unexpected results in a multithreaded context (paragraph 4.1).
2. If non-atomic variables (e.g. one field of a bitfield) are shared over
   threads and are not protected by explicit locking, updating such a
   variable in a multithreaded context is troublesome (paragraph 4.2).
3. If the compiler performs register promotion on a shared variable, this
   can cause undesired results in a multithreaded context (paragraph 4.3).
And this thread started with:
4. If the compiler generates a store operation for an assignment statement
   that is not executed, this can cause trouble in a multithreaded context.

My opinion is that, given the importance of multithreading, it should be
documented in the gcc manual which optimizations can cause trouble in
multithreaded software (such as (3) and (4)). It should also be documented
which compiler flags must be used to disable optimizations that cause trouble
for multithreaded software. Requiring that all thread-shared variables should
be declared volatile is completely unacceptable. We need a solution today for
the combination of C/C++ and POSIX threads; we can't wait for the respective
language standardization committees to come up with a solution.

Regarding issues (1) and (2): (1) can be addressed by using platform-specific
or compiler-specific solutions, e.g. the datatype atomic_t provided by the
Linux kernel headers. And any prudent programmer won't write code that
triggers (2).

As you may have noted, I do not agree with Hans Boehm where he states that
the combination of C/C++ with POSIX threads cannot result in correctly
working programs. I believe that the issues raised by Hans Boehm can be
solved.

Bart Van Assche.
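For readers without the paper at hand, a minimal sketch in the spirit of
issue (3); the names are invented here, but the pattern is the
register-promotion hazard that paragraph 4.3 of Boehm's paper describes:

  #include <pthread.h>

  static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
  static int x;         /* shared; protected by mu when mt is nonzero */

  /* If the compiler promotes x to a register for the duration of the loop,
   * it may emit loads and stores of x around the lock/unlock calls, i.e.
   * outside the critical section -- exactly the hazard of issue (3). */
  void update(int mt, int n)
  {
          int i;

          for (i = 0; i < n; i++) {
                  if (mt)
                          pthread_mutex_lock(&mu);
                  x++;
                  if (mt)
                          pthread_mutex_unlock(&mu);
          }
  }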
Re: Optimization of conditional access to globals: thread-unsafe?
On 10/28/07, Robert Dewar <[EMAIL PROTECTED]> wrote:
> Bart Van Assche wrote:
>
> > My opinion is that, given the importance of multithreading, it should
> > be documented in the gcc manual which optimizations can cause trouble
> > in multithreaded software (such as (3) and (4)). It should also be
> > documented which compiler flags must be used to disable optimizations
> > that cause trouble for multithreaded software. Requiring that all
> > thread-shared variables should be declared volatile is completely
> > unacceptable.
>
> Why is this unacceptable .. seems much better to me than writing
> undefined stuff.

Requiring that all thread-shared variables be declared volatile, even those
protected by calls to synchronization functions, implies that all
multithreaded code and all existing libraries would have to be changed.
Functions like snprintf() write data to a buffer provided by the caller. If
all thread-shared variables had to be declared volatile, then a second
version of snprintf() would be required with volatile char* as the datatype
of its first argument instead of char*. Quite a hefty requirement if you ask
me ...

Bart Van Assche.
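To illustrate why the requirement is so intrusive, here is a small sketch
(the buffer and mutex names are invented): passing a volatile-qualified
buffer to the existing snprintf() prototype requires a diagnostic, so the
qualifier has to be cast away, which defeats the point of declaring the
buffer volatile in the first place.

  #include <stdio.h>
  #include <pthread.h>

  static pthread_mutex_t buf_mu = PTHREAD_MUTEX_INITIALIZER;

  /* The "declare every shared variable volatile" rule applied to a buffer
   * that two threads fill and read under a mutex: */
  static volatile char shared_buf[64];

  void fill_buffer(int value)
  {
          pthread_mutex_lock(&buf_mu);
          /* snprintf() takes a plain char*, so shared_buf cannot be passed
           * directly; the cast below discards the volatile qualifier, so the
           * accesses inside snprintf() are not volatile accesses at all. */
          snprintf((char *)shared_buf, sizeof(shared_buf), "value = %d", value);
          pthread_mutex_unlock(&buf_mu);
  }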