On Fri, Oct 26, 2007 at 01:23:15PM -0700, Ian Lance Taylor wrote:
> Andi Kleen <[EMAIL PROTECTED]> writes:
> 
> > Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> > >
> > > This code isn't going to be a problem, because spin_unlock presumably
> > > includes a memory barrier.
> > 
> > At least in the Linux kernel and also in glibc for mutexes locks are just
> > plain function calls, which are not necessarily full memory barriers.
> 
> True, and problematic in some cases--but a function call which gcc
> can't see is a memory barrier for all addressable memory.
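
Note that only makes it a compiler barrier, not a CPU barrier: gcc has to
assume the opaque call may read or write any global, but no fence
instruction is emitted. Roughly the distinction, as a sketch along the
lines of what the kernel uses on x86 (gcc inline asm assumed):

        /* compiler barrier: gcc must assume all memory may change across
           this point, but no instruction is generated; an opaque function
           call has the same effect on the compiler */
        #define barrier()   asm volatile("" ::: "memory")

        /* CPU barrier: additionally constrains the hardware memory
           ordering (mfence on x86); this is what a full memory barrier
           needs on top of the compiler barrier */
        #define mb()        asm volatile("mfence" ::: "memory")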

I have now constructed a test case to show why the optimization is a bad
idea in general. It essentially measures how much it costs to access a
cache-cold variable. On a Core2 the results are:

% gcc -o tstore tstore.c 
% ./tstore 
209 cycles
% gcc -O2 -o tstore tstore.c 
% ./tstore 
671 cycles

It runs about 3x faster without optimization (no if-conversion of
variable++) than with optimization, because of the cache miss.
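
Roughly, what happens at -O2 is that the increment gets if-converted into
something like this (a sketch of the transformed code, not the exact gcc
output):

        int cond = go_slow();
        int tmp = variable;                /* load now unconditional: cache miss */
        variable = cond ? tmp + 1 : tmp;   /* store now unconditional too */

so the cache-cold access to variable happens even though the condition is
false on every run.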

Your patch would fix it too, because the condition here is a function
call, but it might not help in the general case, when the condition
happens not to be a function call.
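
A condition like this for example (just an illustration, not code from
anywhere in particular) could still be if-converted, because there is no
call for gcc to treat as a barrier:

        extern int flag;
        extern int counter;

        void maybe_count(void)
        {
                if (flag)
                        counter++;
        }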

-Andi

(x86 specific, but can be adapted to other architectures; see the note
after the program)


#include <stdio.h>
#include <string.h>

int GO_SLOW = 0;

#define LARGE (5*1020*1024)
int larger_than_cache[LARGE];   /* big enough to push variable out of the cache */
int variable;

int go_slow(void);              /* forward declaration; defined below */

static inline unsigned long long rdtsc(void)
{
        unsigned a,d;
        asm volatile("rdtsc" : "=a" (a), "=d" (d));
        return a | ((unsigned long long)d) << 32;
}

void test(void)
{
        unsigned long long start, end;
        start = rdtsc();
        if (go_slow())          /* always false here, so variable++ should not run */
                variable++;
        end = rdtsc();
        printf("%llu cycles\n", end - start);
}

int go_slow(void)
{
        return GO_SLOW;
}

int main(void)
{
        variable++;             /* bring variable into the cache once */
        memset(&larger_than_cache, 0, sizeof larger_than_cache);   /* then evict it */
        go_slow();
        test();
        return 0;
}
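
For a non-x86 port the rdtsc() above could be replaced with a coarser but
portable clock, for example something like this (assumes POSIX
clock_gettime; older glibc needs -lrt):

        #include <time.h>

        static inline unsigned long long nsecs(void)
        {
                struct timespec ts;
                clock_gettime(CLOCK_MONOTONIC, &ts);
                return (unsigned long long)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
        }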
