On Tue, Jan 19, 2016 at 03:28:52PM -0500, Rich Felker wrote:
> I've been working on the new version of runtime-selected SH atomics
> for musl, and I think what I've got might be appropriate for GCC's
> generated atomics too. I know Oleg was not very excited about doing
> this on the gcc side from a cost/benefit perspective, but I think my
> approach is actually preferable over inline atomics from a code size
> perspective. It uses a single "cas" function with an "SFUNC" type ABI
> (not standard calling convention) with the following constraints:
> 
> Inputs:
> - R0: Memory address to operate on
> - R1: Address of implementation function, loaded from a global
> - R2: Comparison value
> - R3: Value to set on success
> 
> Outputs:
> - R3: Old value read, ==R2 iff cas succeeded.
> 
> Preserved: R0, R2.
> 
> Clobbered: R1, PR, T.
> 
> This call (performed from __asm__ for musl, but gcc would do it as SH
> "SFUNC") is highly compact/convenient for inlining because it avoids
> clobbering any of the argument registers that are likely to already be
> in use by the caller, and it preserves the important values that are
> likely to be reused after the cas operation.
> 
> For J2 and future J4, the function pointer just points to:
> 
>       rts
>        cas.l r2,r3,@r0
> 
> and the only costs vs an inline cas.l are loading the address of the
> function (done in the caller; involves GOT access) and clobbering R1
> and PR.
> 
> This is still a draft design and the version in musl is subject to
> change at any time since it's not a public API/ABI, but I think it
> could turn into something useful to have on the gcc side with a
> -matomic-model=libfunc option or similar. Other ABI considerations for
> gcc use would be where to store the function pointer and how to
> initialize it. To be reasonably efficient with FDPIC the caller needs
> to be responsible for loading the function pointer (and it needs to
> always point to code, not a function descriptor) so that the callee
> does not need a GOT pointer passed in.
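
To make the dispatch idea above concrete, here is a rough C model of it. This is only a sketch under stated assumptions, not the musl code: the names (sh_cas_fn, sh_cas_impl, sh_cas_init, enum sh_cpu_model) are hypothetical, the register-level SFUNC ABI is replaced by an ordinary C call, and the backends are modelled with C11 atomics instead of SH instructions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical signature: (address, expected, desired) -> old value.
 * This mirrors "R3 holds the old value, == R2 iff the cas succeeded". */
typedef int32_t (*sh_cas_fn)(_Atomic int32_t *addr, int32_t expected,
                             int32_t desired);

/* Stand-in for a backend with a native cas instruction. */
static int32_t cas_native(_Atomic int32_t *addr, int32_t expected,
                          int32_t desired)
{
    /* On failure, `expected` is overwritten with the value actually read;
     * on success it already equals the old value. */
    atomic_compare_exchange_strong(addr, &expected, desired);
    return expected;
}

/* Stand-in for a backend that is only correct when the read/compare/write
 * sequence cannot be interrupted (assumed to be arranged elsewhere). */
static int32_t cas_uniprocessor(_Atomic int32_t *addr, int32_t expected,
                                int32_t desired)
{
    int32_t old = atomic_load_explicit(addr, memory_order_relaxed);
    if (old == expected)
        atomic_store_explicit(addr, desired, memory_order_relaxed);
    return old;
}

/* Hypothetical global holding the selected implementation. */
static sh_cas_fn sh_cas_impl = cas_native;

/* Hypothetical CPU classification used only for this sketch. */
enum sh_cpu_model { SH_CPU_HAS_NATIVE_CAS, SH_CPU_UNIPROCESSOR_ONLY };

/* Select the backend once, e.g. during startup. */
static void sh_cas_init(enum sh_cpu_model model)
{
    sh_cas_impl = (model == SH_CPU_HAS_NATIVE_CAS) ? cas_native
                                                   : cas_uniprocessor;
}

/* Caller side: load the pointer from the global, then call through it. */
static int32_t sh_cas(_Atomic int32_t *addr, int32_t expected,
                      int32_t desired)
{
    sh_cas_fn fn = sh_cas_impl;
    return fn(addr, expected, desired);
}
```

The point of the sketch is only that the caller, not the callee, fetches the pointer, so the callee needs no access to global data of its own.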

Attached is my current draft of the implementations of the cas 'sfunc'
for musl. Forgot to include it before.

Rich
/* Contract for all versions is same as cas.l r2,r3,@r0
 * pr and r1 are also clobbered (by jsr & r1 as temp).
 * r0,r2,r4-r15 must be preserved.
 * r3 contains result (==r2 iff cas succeeded). */

        .align 2
__sh_cas_gusa:
        mov.l r5,@-r15
        mov.l r4,@-r15
        mov r0,r4
        mova 1f,r0
        mov r15,r1
        mov #(0f-1f),r15
0:      mov.l @r4,r5
        cmp/eq r5,r2
        bf 1f
        mov.l r3,@r4
1:      mov r1,r15
        mov r5,r3
        mov r4,r0
        mov.l @r15+,r4
        rts
         mov.l @r15+,r5

__sh_cas_llsc:
        mov r0,r1
        synco
0:      movli.l @r1,r0
        cmp/eq r0,r2
        bf 1f
        mov r3,r0
        movco.l r0,@r1
        bf 0b
        mov r2,r0
1:      synco
        mov r0,r3
        rts
         mov r1,r0

__sh_cas_imask:
        mov r0,r1
        stc sr,r0
        mov.l r0,@-r15
        or #0xf0,r0
        ldc r0,sr
        mov.l @r1,r0
        cmp/eq r0,r2
        bf 1f
        mov.l r3,@r1
1:      ldc.l @r15+,sr
        mov r0,r3
        rts
         mov r1,r0

__sh_cas_cas_l:
        rts
         cas.l r2,r3,@r0
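
For readers who do not read SH assembly, the contract in the comment at the top of the file can be modelled in C roughly as follows. This is a sketch, not part of the attachment: cas_model and fetch_add_model are hypothetical names, and C11 atomics stand in for whichever of the four routines is selected at runtime.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of the contract: returns the old value at *addr; `desired` is
 * stored only when that old value equals `expected`. */
static uint32_t cas_model(_Atomic uint32_t *addr, uint32_t expected,
                          uint32_t desired)
{
    /* On failure, `expected` is overwritten with the value actually read. */
    atomic_compare_exchange_strong(addr, &expected, desired);
    return expected;
}

/* Hypothetical caller: fetch-and-add built as a cas retry loop.
 * Returns the value that was in *addr before the addition. */
static uint32_t fetch_add_model(_Atomic uint32_t *addr, uint32_t delta)
{
    uint32_t old = atomic_load(addr);
    for (;;) {
        uint32_t seen = cas_model(addr, old, old + delta);
        if (seen == old)
            return old;   /* result == expected: the cas succeeded */
        old = seen;       /* lost a race: retry with the value just read */
    }
}
```

The retry loop reuses the address and the comparison value after each call, which is the case the "preserved: r0, r2" rule is meant to keep cheap.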
