Iain,
What is the status of this patch?
Jack
On Thu, Nov 13, 2014 at 3:34 PM, Iain Sandoe <[email protected]> wrote:
> Hello Richard, Joseph,
>
> Thanks for your reviews,
>
> On 13 Nov 2014, at 07:40, Richard Henderson wrote:
>
>> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>>
>>> # ifndef USE_ATOMIC
>>> # define USE_ATOMIC 1
>>> # endif
>>
>> Why would USE_ATOMIC be defined previously?
>
> This was left over from a mode in which I allowed the user to force the
> implementation to use OSSpinLocks, to test the performance.
>
> I apologise [weak excuse follows]: with the turbulence of Darwin on trunk, my
> trunk version of the patch had fallen behind my 4.9 one (most of the work has
> been hammered out there while we try to get bootstrap restored).
>
> Re-synced and retested with a patched trunk that bootstraps with some
> band-aid fixes.
>
>>> inline static void LockUnlock(uint32_t *l) {
>>> __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>>> }
>>
>> Gnu coding style, please. All through the file here.
>
> Fixed.
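> For reference, the unlock helper in GNU style now looks something like this
> (an illustrative sketch, not the exact patch text):

```c
#include <stdint.h>

/* GNU style: return type on its own line, space before the paren,
   braces on their own lines, two-space indent.  */
static inline void
LockUnlock (uint32_t *l)
{
  __atomic_store_4 (l, 0, __ATOMIC_RELEASE);
}
```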
>
>>> # define LOCK_SIZE sizeof(uint32_t)
>>> # define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>>> static uint32_t locks[NLOCKS];
>>
>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE. It's the granularity of the
>> target region that's at issue, not the size of the lock itself.
>
> The algorithm I've used is intentionally different from the pthreads-based
> posix one; here is the rationale, as I see it:
>
> /* Algorithm motivations.
>
> Layout Assumptions:
> o Darwin has a number of sub-targets with common atomic types that have no
> 'native' in-line handling, but are smaller than a cache-line.
> E.g. PPC32 needs locking for >= 8-byte quantities, X86/m32 for >= 16-byte.
>
> o The _Atomic alignment of a "natural type" is no greater than the type
> size.
>
> o There are no special guarantees about the alignment of _Atomic aggregates
> other than those determined by the psABI.
>
> o There are no guarantees that placement of an entity won't cause it to
> straddle a cache-line boundary.
>
> o Realistic User code will likely place several _Atomic qualified types in
> close proximity (such that they fall within the same cache-line).
> Similarly, arrays of _Atomic qualified items.
>
> Performance Assumptions:
> o Collisions of address hashes for items (which make up the lock keys)
> constitute the largest performance issue.
>
> o We want to avoid unnecessary flushing of [lock table] cache-line(s) when
> items are accessed.
>
> Implementation:
> We maintain a table of locks, each lock being 4 bytes (at present).
> This choice of lock size gives some measure of equality in hash-collision
> statistics between the 'atomic' and 'OSSpinLock' implementations, since the
> lock size is fixed at 4 bytes for the latter.
>
> The table occupies one physical page, and we attempt to align it to a
> page boundary.
>
> For entities that need a lock, with sizes < one cache line:
> Each entity that requires a lock chooses the lock to use from the table
> on the basis of a hash determined by its size and address. The lower
> log2(size) address bits are discarded on the assumption that the alignment
> of entities will not be smaller than their size.
> CHECKME: this is not verified for aggregates; it might be something that
> could/should be enforced from the front ends (since _Atomic types are
> allowed to have increased alignment c.f. 'normal').
>
> For entities that need a lock, with sizes >= one cacheline_size:
> We assume that the entity alignment >= log2(cacheline_size) and discard
> log2(cacheline_size) bits from the address.
> We then apply size/cacheline_size locks to cover the entity.
>
> The idea is that this will typically result in distinct hash keys for items
> placed close together. The keys are mangled further such that the size is
> included in the hash.
>
> Finally, to attempt to ensure that the lock table entries are accessed in a
> scattered manner, and so avoid repeated cache-line flushes, the hash is
> rearranged to maximise the noise in the upper bits.
> */
>
> NOTE that the CHECKME above doesn't put us in any worse position than the
> pthreads implementation (likely slightly better since we have a smaller
> granularity with the current scheme).
>
>>> #if USE_ATOMIC
>>> LockLock (&locks[addr_hash (ptr, 1)]);
>>> #else
>>> OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
>>> #endif
>>
>> Better to #define LockLock OSSpinLockLock within the area above, so as to
>> avoid the ifdefs here.
> Done.
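> I.e. something like the following (a sketch; the OSSpinLock names come from
> Apple's <libkern/OSAtomic.h>, and the spin loop here is simplified):

```c
#include <assert.h>
#include <stdint.h>

#ifndef USE_ATOMIC
# define USE_ATOMIC 1
#endif

#if USE_ATOMIC
/* Spin with an atomic exchange; acquire semantics on success.  */
static inline void
LockLock (uint32_t *l)
{
  while (__atomic_exchange_4 (l, 1, __ATOMIC_ACQUIRE) != 0)
    ;
}

static inline void
LockUnlock (uint32_t *l)
{
  __atomic_store_4 (l, 0, __ATOMIC_RELEASE);
}
#else
/* Map the same names onto the Darwin spin-lock primitives, so the
   call sites need no #ifdefs.  */
# include <libkern/OSAtomic.h>
# define LockLock(L) OSSpinLockLock ((OSSpinLock *) (L))
# define LockUnlock(L) OSSpinLockUnlock ((OSSpinLock *) (L))
#endif
```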
>
> Thoughts on the rationale - or OK now?
>
> thanks
> Iain
>
> I'm not aware of any other PRs that relate, but will do a final scan through
> and ask around the darwin folks.
>
> libatomic:
>
> PR target/59305
> * config/darwin/host-config.h: New file.
> * config/darwin/lock.c: New file.
> * configure.tgt (DEFAULT_X86_CPU): New.
> (target): New entry for darwin.
>
>