Iain,

Were you planning to try to get this committed before stage4, or will it
have to wait for the next major GCC release?

Jack
On Thu, Nov 13, 2014 at 3:34 PM, Iain Sandoe <i...@codesourcery.com> wrote:
> Hello Richard, Joseph,
>
> Thanks for your reviews.
>
> On 13 Nov 2014, at 07:40, Richard Henderson wrote:
>
>> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>>
>>> # ifndef USE_ATOMIC
>>> #  define USE_ATOMIC 1
>>> # endif
>>
>> Why would USE_ATOMIC be defined previously?
>
> This was left over from a mode where I allowed the user to jam the mode to
> OSSpinLocks to test the performance.
>
> I apologise; [weak excuse follows] with the turbulence of Darwin on trunk,
> my trunk version of the patch had got behind my 4.9 one (most of the work
> has been hammered out there while we try to get bootstrap restored).
>
> Re-synced and retested with a patched trunk that bootstraps with some
> band-aid.
>
>>> inline static void LockUnlock(uint32_t *l) {
>>>   __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>>> }
>>
>> GNU coding style, please.  All through the file here.
>
> Fixed.
>
>>> # define LOCK_SIZE sizeof(uint32_t)
>>> # define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>>> static uint32_t locks[NLOCKS];
>>
>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of
>> the target region that's at issue, not the size of the lock itself.
>
> The algorithm I've used is intentionally different from the pthreads-based
> POSIX one; here's the rationale, as I see it:
>
> /* Algorithm motivations.
>
>    Layout assumptions:
>    o Darwin has a number of sub-targets with common atomic types that have
>      no 'native' in-line handling, but are smaller than a cache-line.
>      E.g. PPC32 needs locking for >= 8-byte quantities, X86/m32 for >= 16.
>
>    o The _Atomic alignment of a "natural type" is no greater than the type
>      size.
>
>    o There are no special guarantees about the alignment of _Atomic
>      aggregates other than those determined by the psABI.
>
>    o There are no guarantees that placement of an entity won't cause it to
>      straddle a cache-line boundary.
>    o Realistic user code will likely place several _Atomic qualified
>      types in close proximity (such that they fall within the same
>      cache-line).  Similarly, arrays of _Atomic qualified items.
>
>    Performance assumptions:
>    o Collisions of address hashes for items (which make up the lock keys)
>      constitute the largest performance issue.
>
>    o We want to avoid unnecessary flushing of [lock table] cache-line(s)
>      when items are accessed.
>
>    Implementation:
>    We maintain a table of locks, each lock being 4 bytes (at present).
>    This choice of lock size gives some measure of equality in
>    hash-collision statistics between the 'atomic' and 'OSSpinLock'
>    implementations, since the lock size is fixed at 4 bytes for the
>    latter.
>
>    The table occupies one physical page, and we attempt to align it to a
>    page boundary, appropriately.
>
>    For entities that need a lock, with sizes < one cache-line:
>    Each entity that requires a lock chooses the lock to use from the
>    table on the basis of a hash determined by its size and address.  The
>    lower log2 (size) address bits are discarded, on the assumption that
>    the alignment of entities will not be smaller than their size.
>    CHECKME: this is not verified for aggregates; it might be something
>    that could/should be enforced from the front ends (since _Atomic types
>    are allowed to have increased alignment c.f. 'normal').
>
>    For entities that need a lock, with sizes >= one cache-line:
>    We assume that the entity alignment >= log2 (cacheline_size) and
>    discard log2 (cacheline_size) bits from the address.
>    We then apply size/cacheline_size locks to cover the entity.
>
>    The idea is that this will typically result in distinct hash keys for
>    items placed close together.  The keys are mangled further such that
>    the size is included in the hash.
>    Finally, so that the lock table entries are accessed in a scattered
>    manner, avoiding repeated cache-line flushes, the hash is rearranged
>    to attempt to maximise the noise in the upper bits.  */
>
> NOTE that the CHECKME above doesn't put us in any worse position than the
> pthreads implementation (likely slightly better, since we have a smaller
> granularity with the current scheme).
>
>>> #if USE_ATOMIC
>>>   LockLock (&locks[addr_hash (ptr, 1)]);
>>> #else
>>>   OSSpinLockLock (&locks[addr_hash (ptr, 1)]);
>>> #endif
>>
>> Better to #define LockLock OSSpinLockLock within the area above, so as
>> to avoid the ifdefs here.
>
> Done.
>
> Thoughts on the rationale - or OK now?
>
> thanks
> Iain
>
> I'm not aware of any other PRs that relate, but will do a final scan
> through and ask around the darwin folks.
>
> libatomic:
>
>	PR target/59305
>	* config/darwin/host-config.h: New.
>	* config/darwin/lock.c: New.
>	* configure.tgt (DEFAULT_X86_CPU): New.
>	(target): New entry for darwin.