http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031
--- Comment #3 from js-gcc at webkeks dot org <js-gcc at webkeks dot org> 2011-01-01 12:06:56 UTC ---
> The problem is that property accessors are basically general purpose
> routines that may be used in the most varied situations.

It does not matter very much in which situation a property is used. When choosing which type of lock to use, the only thing that matters is what is done while the lock is held. In this case, no call into kernel space is made at all and only a small operation is performed. Switching to kernel space for a mutex is already far more expensive than what we do while holding the lock. If I had to guess, I'd say switching to kernel space is at least 100 times more expensive than what we do.

> So, we have very little control or knowledge over when and how they are used
> --

Which we don't care about at all.

> * we don't know how many CPUs or cores the user has

That does not really matter. Even with a single core, the spinlock gives control to another thread after 10 spins by calling sched_yield(). So if a thread on a single-core machine spins because it is waiting for another thread to release the lock, we waste at most 10 tries. This is the worst-case scenario. If we have more than one core, the other thread will most likely release the lock before the waiting thread has even spun 10 times. So, no matter how many cores there are, a spinlock does not perform worse than a mutex (at least not measurably), while on systems with many cores it is a huge improvement. Plus, changing a property is so fast that we will most likely never encounter a locked spinlock at all. That would only happen if the scheduler gave control to another thread before the property was changed. So, with spinlocks, in 99% of the cases the locking is not even measurable. With mutexes, in 100% of the cases, it IS measurable.

> * we don't know how many threads the user is starting
> * we don't know how many threads are sharing a CPU or core

We don't really care about them, I think.
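For illustration, a spinlock that yields after 10 failed attempts, as described above, might look roughly like the following C11 sketch. The names yield_spinlock, yield_spinlock_lock, yield_spinlock_unlock and SPINS_BEFORE_YIELD are hypothetical and not taken from libobjc; only the 10-spin threshold and the use of sched_yield() come from the discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical spinlock type: a single atomic flag, clear means unlocked. */
typedef struct {
    atomic_flag locked;
} yield_spinlock;

#define YIELD_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Failed attempts before giving the CPU away (the "10 spins" above). */
#define SPINS_BEFORE_YIELD 10

static void yield_spinlock_lock(yield_spinlock *lock)
{
    int spins = 0;
    /* test_and_set returns the previous value: true means another thread holds it. */
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire)) {
        if (++spins >= SPINS_BEFORE_YIELD) {
            /* Let the lock holder run; this matters most on a single core. */
            sched_yield();
            spins = 0;
        }
    }
}

static void yield_spinlock_unlock(yield_spinlock *lock)
{
    /* Release ordering publishes the writes made inside the critical section. */
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

In the uncontended case the lock costs one atomic test-and-set plus one clear and makes no system call; sched_yield() is reached only when the flag is already set by another thread.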
> * we don't know how intensively the user is using the property accessors

So, because we don't know how intensively the user is using properties, we will make them slow on purpose?

> Spinlocks are appropriate when certain conditions are met; but in this case,
> it seems impossible to be confident that these are met.

Which conditions are not met, in your opinion? Please list the conditions that you think are not met, as Apple clearly thinks they are all met. And so do I.

> A user may write a program with 3 or 4 threads running on his 1 CPU/core
> machine, which constantly read/write an atomic synthesized property to
> synchronize between themselves. Why not; but then, spinlocks would actually
> degrade performance instead of improving it.

This is exactly why you call sched_yield() after 10 spins. It prevents a thread from being stuck spinning while another thread could be releasing the lock.

> Traditional locks may be slower if you a low contention case, but work
> consistently OK in all conditions.

Yes, they are the same in all conditions, because they are always more complex and slower ;).

> * spinlocks are better/faster if there is low contention and very little
> chance that two threads enter the critical region (inside the accessors) at
> the same time.

This is the case here.

> * the difference in performance between mutexes and spinlocks only matters in
> the program performance if the accessors are called very often.

If you initialize a lot of objects and each of them initializes, say, 30 variables through properties, then 30 locks are acquired and released, even though no other thread could possibly be accessing the object yet. Still, you do 30 switches between user space and kernel space. For a single object! Now create 1000 objects. With spinlocks, there won't be a single switch to kernel space!
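To make the point about "a small operation while the lock is held" concrete, here is a hedged sketch of atomic property accessors that guard each access with one of a small pool of spinlocks, chosen by hashing the instance variable's address. The pool size, the hash, and the function names (lock_for, atomic_property_get, atomic_property_set) are assumptions made for this example, not the actual libobjc code; a real Objective-C runtime would also retain/release the object, which is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pool size; must be a power of two for the mask below. */
#define PROPERTY_LOCK_COUNT 16

/* Static atomics are zero-initialized, i.e. every lock starts unlocked. */
static atomic_bool property_locks[PROPERTY_LOCK_COUNT];

/* Map an ivar address to a lock slot (hypothetical hash). */
static atomic_bool *lock_for(const void *address)
{
    uintptr_t p = (uintptr_t)address;
    return &property_locks[((p >> 8) ^ p) & (PROPERTY_LOCK_COUNT - 1)];
}

static void lock_acquire(atomic_bool *lock)
{
    int spins = 0;
    while (atomic_exchange_explicit(lock, true, memory_order_acquire)) {
        if (++spins >= 10) { /* yield after 10 failed tries */
            sched_yield();
            spins = 0;
        }
    }
}

static void lock_release(atomic_bool *lock)
{
    atomic_store_explicit(lock, false, memory_order_release);
}

/* Atomic getter: the critical section is a single pointer load. */
void *atomic_property_get(void *const *ivar)
{
    atomic_bool *lock = lock_for(ivar);
    lock_acquire(lock);
    void *value = *ivar;
    lock_release(lock);
    return value;
}

/* Atomic setter: the critical section is a single pointer swap. */
void *atomic_property_set(void **ivar, void *new_value)
{
    atomic_bool *lock = lock_for(ivar);
    lock_acquire(lock);
    void *old_value = *ivar;
    *ivar = new_value;
    lock_release(lock);
    return old_value; /* the caller disposes of it outside the lock */
}
```

Initializing 30 properties of a fresh object with these accessors means 30 uncontended exchange/store pairs and no system calls, which is the scenario described above.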
Just to demonstrate that we are talking about something that really can make a huge difference… I think the percentages you list cannot be used at all, as we don't have applications that just do some math calculations and then quit. We don't want something to be slow just because it might only be a small part of the program. We want everything to be as fast as possible; otherwise it adds up and makes for a crappy user experience in interactive applications. Apple demonstrated this quite well: compare how crappy it felt a few years ago with how well it feels now that they have started optimizing the small stuff as well.

> The only case where spinlocks really help is if the program spends lots of
> time calling accessors, and is not multi-threaded. In which case, the
> programmer could get a huge speed-up by simply declaring the properties
> non-atomic.

Even in a threaded environment, it would make a huge difference. It's unlikely that the lock is held, and only if it is held do you need to spend some CPU time. But with mutexes, every time you merely check whether the lock is held, you already switch to kernel space.

> Would using spinlocks make
> accessors 2x faster ? 10x faster ? 10% faster ?

My guess is that the spinlock is usually not held, so I could imagine it being a factor of 100 or even 1000 faster. I remember running a test a while ago that just locked and released a mutex or a spinlock around a single arithmetic operation. The spinlock version took only a few seconds, while the mutex version still had not finished after a few hours, at which point I aborted it.
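A microbenchmark of the kind described above (lock, one arithmetic operation, unlock) could be sketched as follows, assuming POSIX threads and clock_gettime(). The function names bench_mutex and bench_spinlock are hypothetical, this is not the original test, and no timings are claimed here; results depend heavily on the platform, the C library and the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double seconds_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Mutex version: lock, one addition, unlock, repeated `iterations` times. */
double bench_mutex(long iterations, long *counter)
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    double start = seconds_now();
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&mutex);
        *counter += 1;
        pthread_mutex_unlock(&mutex);
    }
    double elapsed = seconds_now() - start;
    pthread_mutex_destroy(&mutex);
    return elapsed;
}

/* Spinlock version: same loop, guarded by an atomic flag that yields after 10 spins. */
double bench_spinlock(long iterations, long *counter)
{
    atomic_bool spin = false;
    double start = seconds_now();
    for (long i = 0; i < iterations; i++) {
        int spins = 0;
        while (atomic_exchange_explicit(&spin, true, memory_order_acquire)) {
            if (++spins >= 10) {
                sched_yield();
                spins = 0;
            }
        }
        *counter += 1;
        atomic_store_explicit(&spin, false, memory_order_release);
    }
    return seconds_now() - start;
}
```

Calling both functions with the same iteration count and comparing the returned times gives a rough comparison. Because the loop runs in a single thread, it measures only the uncontended path, which is the common case argued for above.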