On Fri, Oct 26, 2007 at 01:23:15PM -0700, Ian Lance Taylor wrote:
> Andi Kleen <[EMAIL PROTECTED]> writes:
>
> > Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> >
> > > This code isn't going to be a problem, because spin_unlock presumably
> > > includes a memory barrier.
> >
> > At least in the Linux kernel and also in glibc for mutexes, locks are just
> > plain function calls, which are not necessarily full memory barriers.
>
> True, and problematic in some cases--but a function call which gcc
> can't see is a memory barrier for all addressable memory.
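(Side note, purely illustrative and not part of anyone's patch: the distinction being discussed here can be written out with GCC inline asm. The macro names below are made up for the example.)

	/* Compiler-only barrier: tells GCC that all addressable memory may
	 * have changed, which is the same assumption it must make across an
	 * opaque function call.  No instruction is emitted. */
	#define compiler_barrier()  asm volatile("" ::: "memory")

	/* Full hardware barrier on x86: also orders the loads/stores actually
	 * executed by the CPU, which a plain function call does not guarantee. */
	#define cpu_barrier()       asm volatile("mfence" ::: "memory")

The first is the guarantee an opaque function call gives as far as GCC's optimizer is concerned; the second is the stronger hardware ordering that real lock primitives may or may not provide.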
I constructed a test case now to show why the optimization is a bad idea in general. It essentially just measures how much it costs to access a cache-cold variable. On a Core2 this is about:

% gcc -o tstore tstore.c
% ./tstore
209 cycles
% gcc -O2 -o tstore tstore.c
% ./tstore
671 cycles

It runs about 3x faster without optimization (no if-conversion of variable++) than with it, because of the cache miss. Your patch would fix this case too, since the condition is a function call, but it might not help in the general case where the condition happens not to be a function call.

-Andi

(x86 specific, but can be adapted to other architectures)

#include <stdio.h>
#include <string.h>

int GO_SLOW = 0;

#define LARGE (5*1020*1024)
int larger_than_cache[LARGE];

int variable;

static inline unsigned long long rdtsc(void)
{
	unsigned a, d;
	asm volatile("rdtsc" : "=a" (a), "=d" (d));
	return a | ((unsigned long long)d) << 32;
}

int go_slow(void)
{
	return GO_SLOW;
}

void test(void)
{
	unsigned long long start, end;
	start = rdtsc();
	if (go_slow())
		variable++;
	end = rdtsc();
	printf("%llu cycles\n", end - start);
}

int main(void)
{
	variable++;
	memset(&larger_than_cache, 0, sizeof larger_than_cache);
	go_slow();
	test();
	return 0;
}
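(For clarity, here is roughly what the if-converted form of test() looks like when written back as C -- my sketch only; the exact shape depends on the GCC version and the pass doing the transformation:)

	/* Speculative-store form: 'variable' is loaded and stored
	 * unconditionally, so the cache miss is paid even though go_slow()
	 * returns 0; only the value to store is selected conditionally. */
	int old = variable;                    /* unconditional load of the cold line */
	variable = go_slow() ? old + 1 : old;  /* unconditional store */

With the branch kept (unoptimized, or with the function call treated as a barrier), 'variable' is never touched when go_slow() returns 0, so the cache-cold line is never pulled in.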