Iain,
     Were you planning to try to get this committed before stage 4, or
will it have to wait for the next major GCC release?
                  Jack

On Thu, Nov 13, 2014 at 3:34 PM, Iain Sandoe <i...@codesourcery.com> wrote:
> Hello Richard, Joseph,
>
> Thanks for your reviews,
>
> On 13 Nov 2014, at 07:40, Richard Henderson wrote:
>
>> On 11/12/2014 10:18 PM, Iain Sandoe wrote:
>>
>>> #  ifndef USE_ATOMIC
>>> #    define USE_ATOMIC 1
>>> #  endif
>>
>> Why would USE_ATOMIC be defined previously?
>
> This was left over from a mode in which I allowed the user to force the
> implementation to OSSpinLocks to test the performance.
>
> I apologise [weak excuse follows]: with the turbulence of Darwin on trunk, my
> trunk version of the patch had fallen behind my 4.9 one (most of the work has
> been hammered out there while we try to get bootstrap restored).
>
> Re-synced and retested with a patched trunk that bootstraps with some
> band-aid.
>
>>> inline static void LockUnlock(uint32_t *l) {
>>>   __atomic_store_4((_Atomic(uint32_t)*)l, 0, __ATOMIC_RELEASE);
>>> }
>>
>> Gnu coding style, please.  All through the file here.
>
> Fixed.
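>
> For reference, the primitives now read roughly as follows.  (LockLock's body
> is a sketch here; only LockUnlock appeared in the hunk above.)
>
> static inline void
> LockLock (uint32_t *l)
> {
>   /* Spin until we take the lock; acquire ordering on success.  */
>   while (__atomic_exchange_4 ((_Atomic (uint32_t) *) l, 1,
>                               __ATOMIC_ACQUIRE) != 0)
>     ;
> }
>
> static inline void
> LockUnlock (uint32_t *l)
> {
>   /* Release store; pairs with the acquire in LockLock.  */
>   __atomic_store_4 ((_Atomic (uint32_t) *) l, 0, __ATOMIC_RELEASE);
> }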
>
>>> #  define LOCK_SIZE sizeof(uint32_t)
>>> #  define NLOCKS (PAGE_SIZE / LOCK_SIZE)
>>> static uint32_t locks[NLOCKS];
>>
>> Um, surely not LOCK_SIZE, but CACHELINE_SIZE.  It's the granularity of the
>> target region that's at issue, not the size of the lock itself.
>
> The algorithm I've used is intentionally different from the pthreads-based
> posix one; here's the rationale as I see it:
>
> /* Algorithm motivations.
>
> Layout Assumptions:
>   o Darwin has a number of sub-targets with common atomic types that have no
>     'native' in-line handling, but are smaller than a cache-line.
>     E.g. PPC32 needs locking for >= 8-byte quantities, x86/m32 for >= 16.
>
>   o The _Atomic alignment of a "natural type" is no greater than the type 
> size.
>
>   o There are no special guarantees about the alignment of _Atomic aggregates
>     other than those determined by the psABI.
>
>   o There are no guarantees that placement of an entity won't cause it to
>     straddle a cache-line boundary.
>
>   o Realistic User code will likely place several _Atomic qualified types in
>     close proximity (such that they fall within the same cache-line).
>     Similarly, arrays of _Atomic qualified items.
>
> Performance Assumptions:
>   o Collisions of address hashes for items (which make up the lock keys)
>     constitute the largest performance issue.
>
>   o We want to avoid unnecessary flushing of [lock table] cache-line(s) when
>     items are accessed.
>
> Implementation:
>   We maintain a table of locks, each lock being 4 bytes (at present).
>   This choice of lock size gives some measure of equality in hash-collision
>   statistics between the 'atomic' and 'OSSpinLock' implementations, since the
>   lock size is fixed at 4 bytes for the latter.
>
>   The table occupies one physical page, and we attempt to align it to a
>   page boundary.
>
>   For entities that need a lock, with sizes < one cache line:
>     Each entity that requires a lock chooses the lock to use from the table
>   on the basis of a hash determined by its size and address.  The lower
>   log2(size) address bits are discarded, on the assumption that the
>   alignment of entities will not be smaller than their size.
>   CHECKME: this is not verified for aggregates; it might be something that
>   could/should be enforced from the front ends (since _Atomic types are
>   allowed to have increased alignment cf. 'normal').
>
>   For entities that need a lock, with sizes >= one cache-line size:
>     We assume that the entity alignment is >= cacheline_size and discard
>   log2(cacheline_size) bits from the address.
>   We then apply size/cacheline_size locks to cover the entity.
>
>   The idea is that this will typically result in distinct hash keys for items
>   placed close together.  The keys are mangled further such that the size is
>   included in the hash.
>
>   Finally, so that the lock table entries are accessed in a scattered
>   manner, avoiding repeated flushes of the same cache-line, the hash is
>   rearranged to maximise the noise in the upper bits (a sketch follows
>   this comment).
> */
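>
> To make that concrete, here is a rough sketch of the table and the hash
> (the constants and the mixing are illustrative, not necessarily the exact
> values in the patch):
>
> #define PAGE_SIZE      4096UL
> #define CACHELINE_SIZE 64UL                /* assumed for this sketch.  */
> #define LOCK_SIZE      sizeof (uint32_t)
> #define NLOCKS         (PAGE_SIZE / LOCK_SIZE)
>
> /* One page of 4-byte locks, aligned to a page boundary.  */
> static uint32_t locks[NLOCKS] __attribute__ ((aligned (4096)));
>
> /* Return the lock-table index for an access of SIZE bytes at PTR.  */
> static inline uintptr_t
> addr_hash (void *ptr, size_t size)
> {
>   uintptr_t a = (uintptr_t) ptr;
>   size_t s = size < CACHELINE_SIZE ? size : CACHELINE_SIZE;
>
>   /* Discard the low log2 (s) bits; alignment is assumed >= s.  */
>   while (s > 1)
>     {
>       a >>= 1;
>       s >>= 1;
>     }
>
>   /* Mix the size in, and fold the upper address bits down so that
>      neighbouring entities scatter across the table.  */
>   a ^= (a >> 16) ^ size;
>   return a % NLOCKS;
> }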
>
> NOTE that the CHECKME above puts us in no worse a position than the
> pthreads implementation (likely slightly better, since we have a smaller
> granularity with the current scheme).
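>
> For the >= cache-line case, the covering loop would be along these lines
> (again a sketch, with a hypothetical name, lock_range; the real code folds
> in the same hash mangling as above):
>
> /* Acquire enough locks to cover N bytes at PTR; PTR is assumed to be at
>    least cache-line aligned, per the rationale above.  */
> static void
> lock_range (void *ptr, size_t n)
> {
>   uintptr_t base = (uintptr_t) ptr / CACHELINE_SIZE;
>   size_t i, nlines = (n + CACHELINE_SIZE - 1) / CACHELINE_SIZE;
>
>   /* Take the slots in ascending order, so that two threads covering the
>      same entity acquire them in the same order (no deadlock).  */
>   for (i = 0; i < nlines; i++)
>     LockLock (&locks[(base + i) % NLOCKS]);
> }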
>
>>> #if USE_ATOMIC
>>>   LockLock (&locks[addr_hash (ptr, 1)]);
>>> #else
>>>   OSSpinLockLock(&locks[addr_hash (ptr, 1)]);
>>> #endif
>>
>> Better to #define LockLock  OSSpinLockLock within the area above, so as to
>> avoid the ifdefs here.
> Done.
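>
> i.e. something like this (a sketch; the OSSpinLock names come from
> <libkern/OSAtomic.h>):
>
> #if USE_ATOMIC
> /* ... the __atomic-based LockLock / LockUnlock definitions ... */
> #else
> #  include <libkern/OSAtomic.h>
> /* The 4-byte lock slots are OSSpinLock-compatible.  */
> #  define LockLock   OSSpinLockLock
> #  define LockUnlock OSSpinLockUnlock
> #endif
>
> so that the call sites reduce to plain:
>
>   LockLock (&locks[addr_hash (ptr, 1)]);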
>
> Thoughts on the rationale - or OK now?
>
> thanks
> Iain
>
> I'm not aware of any other PRs that relate, but I will do a final scan
> through and ask around the Darwin folks.
>
> libatomic:
>
>         PR target/59305
>         * config/darwin/host-config.h: New.
>         * config/darwin/lock.c: New.
>         * configure.tgt (DEFAULT_X86_CPU): New. (target): New entry for
>         darwin.
>
