Iain,

Were you planning to try to get this committed before stage4, or will it
have to wait for the next major GCC release?

Jack
On Thu, Nov 13, 2014 at 3:34 PM, Iain Sandoe <i...@codesourcery.com> wrote:
> Hello Richard, Joseph,
>
> Thanks for your reviews.
>
> On 13 Nov 2014, at 07:40, Richard Henderson wrote:
>
>> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>>
>>> # ifndef USE_ATOMIC
>>> #  define USE_ATOMIC 1
>>> # endif
>>
>> Why would USE_ATOMIC be defined previously?
>
> This was left over from a mode where I allowed the user to jam the mode to
> OSSpinLocks to test the performance.
>
> I apologise; [weak excuse follows] with the turbulence of Darwin on trunk,
> my trunk version of the patch had got behind my 4.9 one (most of the work
> has been hammered out there while we try to get bootstrap restored).
>
> Re-synced and retested with a patched trunk that bootstraps with some
> band-aid.
>
>>> inline static void LockUnlock(uint32_t *l) {
>>>   __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>>> }
>>
>> GNU coding style, please.  All through the file here.
>
> Fixed.
>
>>> # define LOCK_SIZE sizeof(uint32_t)
>>> # define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>>> static uint32_t locks[NLOCKS];
>>
>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of
>> the target region that's at issue, not the size of the lock itself.
>
> The algorithm I've used is intentionally different from the pthreads-based
> POSIX one; here's the rationale, as I see it:
>
> /* Algorithm motivations.
>
>    Layout assumptions:
>    o Darwin has a number of sub-targets with common atomic types that have
>      no 'native' in-line handling, but are smaller than a cache-line.
>      E.g. PPC32 needs locking for >= 8-byte quantities, X86/m32 for >= 16.
>
>    o The _Atomic alignment of a "natural type" is no greater than the type
>      size.
>
>    o There are no special guarantees about the alignment of _Atomic
>      aggregates other than those determined by the psABI.
>
>    o There are no guarantees that placement of an entity won't cause it to
>      straddle a cache-line boundary.
>    o Realistic user code will likely place several _Atomic qualified
>      types in close proximity (such that they fall within the same
>      cache-line).  Similarly, arrays of _Atomic qualified items.
>
>    Performance assumptions:
>    o Collisions of address hashes for items (which make up the lock keys)
>      constitute the largest performance issue.
>
>    o We want to avoid unnecessary flushing of [lock table] cache-line(s)
>      when items are accessed.
>
>    Implementation:
>    We maintain a table of locks, each lock being 4 bytes (at present).
>    This choice of lock size gives some measure of equality in
>    hash-collision statistics between the 'atomic' and 'OSSpinLock'
>    implementations, since the lock size is fixed at 4 bytes for the
>    latter.
>
>    The table occupies one physical page, and we attempt to align it to a
>    page boundary, appropriately.
>
>    For entities that need a lock, with sizes < one cache-line:
>    Each entity that requires a lock chooses the lock to use from the
>    table on the basis of a hash determined by its size and address.  The
>    lower log2 (size) address bits are discarded, on the assumption that
>    the alignment of entities will not be smaller than their size.
>    CHECKME: this is not verified for aggregates; it might be something
>    that could/should be enforced from the front ends (since _Atomic types
>    are allowed to have increased alignment c.f. 'normal').
>
>    For entities that need a lock, with sizes >= one cache-line:
>    We assume that the entity alignment >= log2 (cacheline_size) and
>    discard log2 (cacheline_size) bits from the address.
>    We then apply size/cacheline_size locks to cover the entity.
>
>    The idea is that this will typically result in distinct hash keys for
>    items placed close together.  The keys are mangled further such that
>    the size is included in the hash.
>    Finally, so that the lock table entries are accessed in a scattered
>    manner, avoiding repeated cache-line flushes, the hash is rearranged
>    to attempt to maximise the noise in the upper bits.  */
>
> NOTE that the CHECKME above doesn't put us in any worse position than the
> pthreads implementation (likely slightly better, since we have a smaller
> granularity with the current scheme).
>
>>> #if USE_ATOMIC
>>>   LockLock (&locks[addr_hash (ptr, 1)]);
>>> #else
>>>   OSSpinLockLock (&locks[addr_hash (ptr, 1)]);
>>> #endif
>>
>> Better to #define LockLock OSSpinLockLock within the area above, so as
>> to avoid the ifdefs here.
>
> Done.
>
> Thoughts on the rationale - or OK now?
>
> thanks
> Iain
>
> I'm not aware of any other PRs that relate, but will do a final scan
> through and ask around the darwin folks.
>
> libatomic:
>
>	PR target/59305
>	* config/darwin/host-config.h: New.
>	* config/darwin/lock.c: New.
>	* configure.tgt (DEFAULT_X86_CPU): New.
>	(target): New entry for darwin.