PING Re: [Patch 0/4] ARM 64 bit sync atomic operations [V2]

2011-08-09 Thread David Gilbert
Other than Ramana's comment about my comment on 4/4, has anyone else got any
other input?  Otherwise I'd like to fix that comment and then get it in.

Dave

On 26 July 2011 09:59, Dr. David Alan Gilbert  wrote:
> Hi,
>  This is V2 of a series of 4 patches relating to ARM atomic operations;
> they incorporate most of the feedback from V1 - thanks Ramana, Richard and
> Joseph for comments.
>
>  1) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k
>     and above.
>  2) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
>     assist is called (as per 32-bit and smaller ops).
>  3) Fix PR 48126, which is a misplaced barrier in the atomic generation.
>  4) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't
>     produce the mcr instruction in Thumb1 (and enable on ARMv6 not just 6k
>     as per the docs).
>
> Relative to v1:
>  Split the DMB_MCR patch out
>  Provide complete changelogs
>  Don't emit IT instruction except in Thumb2 mode
>  Move iterators to iterators.md (didn't move the table since it was specific
>    to sync.md)
>  Remove sync_atleastsi
>  Use sync_predtab in as many places as possible
>  Avoid headers in libgcc
>  Made various libgcc routines I added static
>  Used __write instead of write
>  Comment the barrier move to explain it more
>
>  Note that the kernel interface has remained the same for the helper, and as
> such I've not changed the way the helper-calling code in patch 2 is structured.
>
> This code was tested with a full bootstrap on ARM; make check results
> are the same as without the patches except for extra passes due to the new
> tests.
>
> This work is part of Linaro blueprint:
> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives
>
> Dave
>
>


PING PING Re: [Patch 0/4] ARM 64 bit sync atomic operations [V2]

2011-08-31 Thread David Gilbert
On 9 August 2011 11:30, David Gilbert  wrote:
> Other than Ramana's comment about my comment on 4/4, has anyone else got any
> other input?  Otherwise I'd like to fix that comment and then get it in.

We now have comments from Ramana on patches 3/4 and 4/4 - anyone for
anything on the main pair?

Dave

> On 26 July 2011 09:59, Dr. David Alan Gilbert  
> wrote:
>> Hi,
>>  This is V2 of a series of 4 patches relating to ARM atomic operations;
>> they incorporate most of the feedback from V1 - thanks Ramana, Richard and
>> Joseph for comments.
>>
>>  1) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k
>>     and above.
>>  2) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
>>     assist is called (as per 32-bit and smaller ops).
>>  3) Fix PR 48126, which is a misplaced barrier in the atomic generation.
>>  4) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't
>>     produce the mcr instruction in Thumb1 (and enable on ARMv6 not just 6k
>>     as per the docs).
>>
>> Relative to v1:
>>  Split the DMB_MCR patch out
>>  Provide complete changelogs
>>  Don't emit IT instruction except in Thumb2 mode
>>  Move iterators to iterators.md (didn't move the table since it was specific
>>    to sync.md)
>>  Remove sync_atleastsi
>>  Use sync_predtab in as many places as possible
>>  Avoid headers in libgcc
>>  Made various libgcc routines I added static
>>  Used __write instead of write
>>  Comment the barrier move to explain it more
>>
>>  Note that the kernel interface has remained the same for the helper, and as
>> such I've not changed the way the helper-calling code in patch 2 is structured.
>>
>> This code was tested with a full bootstrap on ARM; make check results
>> are the same as without the patches except for extra passes due to the new
>> tests.
>>
>> This work is part of Linaro blueprint:
>> https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives
>>
>> Dave
>>
>>
>


Re: [Patch 1/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
On 30 September 2011 14:21, Ramana Radhakrishnan
 wrote:
> Hi Dave,
>
>
> The nit-picky bit - There are still a number of formatting issues with
> your patch . Could you run your patch through
> contrib/check_GNU_style.sh and correct these. These are typically
> around problems with the number of spaces between a full stop and the
> end of comment, lines with trailing whitespaces and a few lines with
> number of characters > 80.  Thanks.

Oops - sorry about those; I'll run it through the check script and nail them.

>>@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
>>
>>+      else
>>+      {
>>+        /* Silence false potentially unused warning */
>>+        required_value_lo = NULL;
>>+        required_value_hi = NULL;
>>+      }
>>
>
> s/NULL/NULL_RTX in a number of places in arm.c

OK.

>>+      /* The restrictions on target registers in ARM mode are that the two
>>+       registers are consecutive and the first one is even; Thumb is
>>+       actually more flexible, but DI should give us this anyway.
>>+       Note that the 1st register always gets the lowest word in memory.  */
>>+      gcc_assert ((REGNO (value) & 1) == 0);
>>+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
>>+      operands[3] = memory;
>>+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
>>+                         cc);
>>+    }
>>
>
> The restriction is actually mandatory for ARM state only and thus I'm fine
> with this assertion being true only in ARM state.

OK, I can make the assert only for thumb mode; but I thought the simpler
logic was better and should hold true anyway because of DI mode allocation.

> I don't like duplicating the tests from gcc.dg into gcc.target/arm.
> If you wanted to check for assembler output specific to a target you could
> add your own conditions to the test in gcc.dg and conditionalize that on
> target arm_eabi
>
> Something like :
>
> { dg-final { scan-assembler "ldrexd\t"} {target arm_eabi}} } .
>
> I would like a testsuite maintainer to comment on the testsuite infrastructure
> bits as well but I have a few comments below .

As discussed, I don't like the dupes either - the problem is that we have
three tests with identical code but different dg annotations:

   1) Build & run and check that the sync behaves correctly - using whatever
   compile flags you happen to have. (gcc.dg version)
   2) Build and check assembler for use of ldrexd - compiled with armv6k flags
   3) Build and check assembler doesn't use ldrexd - compiled with armv5 flags

Because (2) and (3) include different dg-add-options lines I don't see
how I can combine them.

The suggestion that I'm OK with is to #include the gcc.dg one in the
gcc.arm one.
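
Roughly like this - the file names and dg directives below are only a sketch
of the shape I have in mind, not the actual test from the patch:

/* gcc.target/arm/di-longlong64-sync-withldrexd.c -- illustrative wrapper.  */
/* { dg-do compile } */
/* { dg-require-effective-target arm_arm_ok } */
/* { dg-options "-O2 -marm -march=armv6k" } */

/* Pull in the functional test body instead of copying it.  */
#include "../../gcc.dg/di-longlong64-sync-1.c"

/* { dg-final { scan-assembler "ldrexd" } } */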

>>+# Return 1 if the target supports atomic operations on "long long" and can actually
>>+# execute them
>>+# So far only put checks in for ARM, others may want to add their own
>>+proc check_effective_target_sync_longlong { } {
>>+    return [check_runtime sync_longlong_runtime {
>>+      #include <stdlib.h>
>>+      int main()
>>+      {
>>+      long long l1;
>>+
>>+      if (sizeof(long long)!=8)
>
> Space between ')' and ! as well as '=' and 8
>
>>+        exit(1);
>>+
>>+      #ifdef __arm__
>
> Why is this checking only for ARM state ? We could have ldrexd in T2 as
> well ?

Because __arm__ gets defined for either thumb or arm mode; in thumb mode
we just get __thumb__  (and __thumb2__) defined as well.
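
i.e. at the preprocessor level it looks like this (just a sketch of the macro
behaviour, assuming GCC's ARM port):

/* __arm__ is defined for the ARM target regardless of instruction set;
   __thumb__ (and, on Thumb-2 capable cores, __thumb2__) are added on top.  */
static const char *
which_isa (void)
{
#if defined (__thumb2__)
  return "thumb2";
#elif defined (__thumb__)
  return "thumb1";
#elif defined (__arm__)
  return "arm";
#else
  return "not the ARM port";
#endif
}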

> Otherwise the functionality looks good to me. Can you confirm that
> this has survived a testrun for v7-a thumb2 and v7-a arm state ?

Yes it did.  I'll give it another whirl later today after I go and fix
the formatting niggles and move the test.

Thanks for the review.

Dave


Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
On 30 September 2011 18:01, H.J. Lu  wrote:
> You may want to look a look at:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50583
>
> ARM may have the same problem.

OK - although to be honest this patch only stretches the same
structures to 64-bit - any major changes in semantics are a separate issue - but
thanks for pointing it out.

Hmm - I think what's produced is correct; however the manual
description is inconsistent:

 These builtins perform the operation suggested by the name, and
 returns the value that had previously been in memory.  That is,

  { tmp = *ptr; *ptr OP= value; return tmp; }

The ARM code (see below) does a single load inside a loop with a guarded
store.  This guarantees that the value returned is the value that had
'previously been in memory' directly prior to the atomic operation - however,
that does mean it doesn't do the pair of accesses implied by
'tmp = *ptr; *ptr OP= value'.

On ARM, for fetch_and_add we get the following (this is pre-my-patch and
32-bit; my patch doesn't change the structure except for the position of
that last label):

mov r3, r0
dmb sy
.LSYT6:
ldrex   r0, [r3]
add r2, r0, r1
strex   r0, r2, [r3]
teq r0, #0
bne .LSYT6
sub r0, r2, r1
dmb sy

That seems like the correct semantics to me - if not, what am I missing? Was
the intention of the example really to cause two loads - and if so, why?

For add_and_fetch we get:


dmb sy
.LSYT6:
ldrex   r0, [r3]
add r0, r0, r1
strex   r2, r0, [r3]
teq r2, #0
bne .LSYT6
dmb sy

i.e. the value returned is always the value that goes into the guarded
store - and hence is always the value that's stored.
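
For reference, at the C level the difference is just which value comes back
(a minimal sketch, not code from the testsuite):

extern int counter;

void example (int *before, int *after)
{
  /* Returns the value counter held immediately before the increment.  */
  *before = __sync_fetch_and_add (&counter, 1);
  /* Returns the value counter holds immediately after the increment.  */
  *after = __sync_add_and_fetch (&counter, 1);
}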

Dave


Re: [Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
On 3 October 2011 09:35, Andrew Haley  wrote:
> On 09/30/2011 08:54 PM, Joseph S. Myers wrote:
>> On Fri, 30 Sep 2011, Ramana Radhakrishnan wrote:
>>
>>> On 26 July 2011 10:01, Dr. David Alan Gilbert  
>>> wrote:

 +
 +extern unsigned int __write(int fd, const void *buf, unsigned int count);
>>>
>>> Why are we using __write instead of write?
>>
>> Because plain write is in the user's namespace in ISO C.  See what I said
>> in  - the
>> alternative is hardcoding the syscall number and using the syscall
>> directly.
>
> That would be better, no?  Unless __write is part of the glibc API,
> which AFAIK it isn't.

I could change it to calling the syscall directly - although it gets
a little messy having to deal with both ARM and Thumb syscalls;
I was trying to avoid further complicating an already complicated corner
case.
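
Just to show the shape of it, the EABI-only version is simple enough on its
own (a sketch, assuming EABI and hardcoding __NR_write; handling the pre-EABI
convention as well is where it gets messy):

/* EABI-only sketch: syscall number in r7, args in r0-r2, trap with svc.
   __NR_write is 4 on ARM Linux.  Illustrative, not the libgcc code.  */
static unsigned int
do_write (int fd, const void *buf, unsigned int count)
{
  register int r0 __asm__ ("r0") = fd;
  register const void *r1 __asm__ ("r1") = buf;
  register unsigned int r2 __asm__ ("r2") = count;
  register int r7 __asm__ ("r7") = 4;            /* __NR_write */
  __asm__ volatile ("svc 0"
                    : "+r" (r0)
                    : "r" (r1), "r" (r2), "r" (r7)
                    : "memory");
  return r0;
}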

Dave


Re: [Patch 1/4] ARM 64 bit sync atomic operations [V2]

2011-10-03 Thread David Gilbert
(Sorry, repost - I'd meant to cc Mike and Rainer into the conversation,
but forgot to add them.)

On 3 October 2011 13:53, David Gilbert  wrote:
> On 30 September 2011 14:21, Ramana Radhakrishnan
>  wrote:
>> Hi Dave,
>>
>>
>> The nit-picky bit - There are still a number of formatting issues with
>> your patch . Could you run your patch through
>> contrib/check_GNU_style.sh and correct these. These are typically
>> around problems with the number of spaces between a full stop and the
>> end of comment, lines with trailing whitespaces and a few lines with
>> number of characters > 80.  Thanks.
>
> Oops - sorry about those; I'll run it through the check script and nail them.
>
>>>@@ -23590,82 +23637,142 @@ arm_output_sync_loop (emit_f emit,
>>>
>>>+      else
>>>+      {
>>>+        /* Silence false potentially unused warning */
>>>+        required_value_lo = NULL;
>>>+        required_value_hi = NULL;
>>>+      }
>>>
>>
>> s/NULL/NULL_RTX in a number of places in arm.c
>
> OK.
>
>>>+      /* The restrictions on target registers in ARM mode are that the two
>>>+       registers are consecutive and the first one is even; Thumb is
>>>+       actually more flexible, but DI should give us this anyway.
>>>+       Note that the 1st register always gets the lowest word in memory.  */
>>>+      gcc_assert ((REGNO (value) & 1) == 0);
>>>+      operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
>>>+      operands[3] = memory;
>>>+      arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
>>>+                         cc);
>>>+    }
>>>
>>
>> The restriction is actually mandatory for ARM state only and thus I'm fine
>> with this assertion being true only in ARM state.
>
> OK, I can make the assert only for thumb mode; but I thought the simpler
> logic was better and should hold true anyway because of DI mode allocation.
>
>> I don't like duplicating the tests from gcc.dg into gcc.target/arm.
>> If you wanted to check for assembler output specific to a target you could
>> add your own conditions to the test in gcc.dg and conditionalize that on
>> target arm_eabi
>>
>> Something like :
>>
>> { dg-final { scan-assembler "ldrexd\t"} {target arm_eabi}} } .
>>
>> I would like a testsuite maintainer to comment on the testsuite 
>> infrastructure
>> bits as well but I have a few comments below .
>
> As discussed, I don't like the dupes either - the problem is that we have
> three tests with identical code but different dg annotations:
>
>   1) Build & run and check that the sync behaves correctly - using whatever
>       compile flags you happen to have. (gcc.dg version)
>   2) Build and check assembler for use of ldrexd - compiled with armv6k flags
>   3) Build and check assembler doesn't use ldrexd - compiled with armv5 flags
>
> Because (2) and (3) include different dg-add-options lines I don't see
> how I can combine them.
>
> The suggestion that I'm OK with is to #include the gcc.dg one in the
> gcc.arm one.
>
>>>+# Return 1 if the target supports atomic operations on "long long" and can actually
>>>+# execute them
>>>+# So far only put checks in for ARM, others may want to add their own
>>>+proc check_effective_target_sync_longlong { } {
>>>+    return [check_runtime sync_longlong_runtime {
>>>+      #include <stdlib.h>
>>>+      int main()
>>>+      {
>>>+      long long l1;
>>>+
>>>+      if (sizeof(long long)!=8)
>>
>> Space between ')' and ! as well as '=' and 8
>>
>>>+        exit(1);
>>>+
>>>+      #ifdef __arm__
>>
>> Why is this checking only for ARM state ? We could have ldrexd in T2 as
>> well ?
>
> Because __arm__ gets defined for either thumb or arm mode; in thumb mode
> we just get __thumb__  (and __thumb2__) defined as well.
>
>> Otherwise the functionality looks good to me. Can you confirm that
>> this has survived a testrun for v7-a thumb2 and v7-a arm state ?
>
> Yes it did.  I'll give it another whirl later today after I go and fix
> the formatting niggles and move the test.
>
> Thanks for the review.
>
> Dave
>


Re: [PATCH 1/4] arm: Install __sync libfuncs for Linux.

2011-11-14 Thread David Gilbert
On 11 November 2011 23:32, Richard Henderson  wrote:
> Cc: Richard Earnshaw 
> Cc: Ramana Radhakrishnan 
> ---
>  gcc/config/arm/arm.c |    4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 6ef6f62..abf8ce1 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1096,6 +1096,10 @@ arm_set_fixed_conv_libfunc (convert_optab optable, 
> enum machine_mode to,
>  static void
>  arm_init_libfuncs (void)
>  {
> +  /* For Linux, we have access to kernel support for atomic operations.  */
> +  if (arm_abi == ARM_ABI_AAPCS_LINUX)
> +    init_sync_libfuncs (8);

There is unfortunately no guarantee that your kernel has support for the 64-bit
cases, since this was only recently added - and the libgcc code, if linked in,
checks for it and aborts when the helper is missing.

(As far as I can tell there is approximately one potential user of 64-bit atomics
on ARM, so it's important not to do that check for all binaries.)
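
(For anyone following along: the commpage interface that libgcc check relies
on looks roughly like this - addresses as documented in the kernel's
kernel_user_helpers.txt; treat it as an illustrative sketch, not the patch.)

/* Sketch of the ARM kernel user-helper (commpage) interface for 64-bit
   cmpxchg; illustrative only.  */
typedef int (cmpxchg64_fn) (const long long *oldval, const long long *newval,
                            volatile long long *ptr);

#define __kernel_cmpxchg64 (*(cmpxchg64_fn *) 0xffff0f60)
#define __kernel_helper_version (*(unsigned int *) 0xffff0ffc)

static int
have_cmpxchg64 (void)
{
  /* The 64-bit helper only exists from helper version 5 onwards; on an
     older kernel the 0xffff0f60 entry point must never be called.  */
  return __kernel_helper_version >= 5;
}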

Dave


Re: [ARM] Rewrite ldrex/strex support for atomic optabs

2011-11-28 Thread David Gilbert
On 23 November 2011 23:43, Richard Henderson  wrote:
> This transformation is quite a bit more dramatic than the other ports because 
> ARM was not splitting the code sequences post-reload.  Indeed, the failure to 
> split resulted in a distinctly odd coding style where fake output routines 
> were used to compute the length of the code sequence.  This all seemed highly 
> sub-optimal, so I rewrote everything from scratch.
>
> This has passed initial sniff test from a cross-compiler, and is undergoing 
> full testing on an armv7 host.  Hopefully results will be ready tomorrow.
>
> In the meantime, please review.

Hi Richard,
  Can you explain the code:

+  if (mod_f != MEMMODEL_RELAXED)
+    emit_label (label2);
+
+  arm_post_atomic_barrier (mod_s);
+
+  if (mod_f == MEMMODEL_RELAXED)
+    emit_label (label2);
+}

In the case of the existing __sync_* builtins, will it always end up doing
the label and then the sync, as Michael's PR 48126 indicated and as my
patch moved it?

(It's not at all clear to me what the ordering requirements of
ldrex/strex are from the ARM ARM).
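
To make the concern concrete - a minimal sketch, not code from either patch:
the legacy __sync builtins are documented as full barriers whether or not the
compare succeeds, so the failed-compare path has to reach the dmb too.

/* PR 48126 in miniature: __sync_bool_compare_and_swap is a full barrier
   even when the compare fails, so the load of 'data' below must not be
   reordered before the (possibly failed) attempt.  Hence the failure
   branch target needs to sit before the barrier, not after it.  */
extern volatile int claimed;
extern int data;

int
read_after_claim_attempt (void)
{
  (void) __sync_bool_compare_and_swap (&claimed, 0, 1);
  return data;
}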

Dave


Re: [testsuite, libffi] XFAIL libffi.call/cls_{,long}double_va.c on IRIX 6.5 (PR libffi/46660)

2011-06-29 Thread David Gilbert
On 29 June 2011 14:43, Rainer Orth  wrote:
> -/* { dg-output "PR libffi/46660" { xfail mips-sgi-irix6* } } */
> +/* { dg-output "" { xfail mips-sgi-irix6* } } PR libffi/46660 */

Do you fancy adding the appropriate MIPS fix on top of the libffi varargs patch
I posted a few months back - then it could actually pass the test!

Dave


Re: [Patch 2/3] ARM 64 bit atomic operations

2011-07-01 Thread David Gilbert
On 1 July 2011 17:03, Richard Henderson  wrote:
> On 07/01/2011 08:55 AM, Dr. David Alan Gilbert wrote:
>> +/* Check that the kernel has a new enough version at load */
>> +void __check_for_sync8_kernelhelper (void)
>> +{
>> +  if (__kernel_helper_version < 5)
>> +    {
>> +      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
>> +      /* At this point we need a way to crash with some information
>> +      for the user - I'm not sure I can rely on much else being
>> +      available at this point, so do the same as generic-morestack.c
>> +      write() and abort(). */
>> +      write (2 /* stderr */, err, sizeof(err));
>> +      abort ();
>> +    }
>> +};
>
> Wouldn't it be better to convert the arm kernel to use a proper VDSO,
> so that this error actually comes from the dynamic linker?  That's
> the beauty of a true VDSO -- proper symbol resolution.

Well, I can't say I like the need for this check/exit - so yes, if something
else could do it for me that would be great.  Having said that, I don't know
the history of the ARM commpage/lack of VDSO, nor do I know how big a
change that would be.

Let me have a chat to some of the kernel guys who might know the history.

Dave


Re: [Patch 2/3] ARM 64 bit atomic operations

2011-07-04 Thread David Gilbert
On 1 July 2011 20:38, Joseph S. Myers  wrote:

Hi Joseph,
  Thanks for your comments.

> On Fri, 1 Jul 2011, Dr. David Alan Gilbert wrote:
>
>> +/* For write */
>> +#include <unistd.h>
>> +/* For abort */
>> +#include <stdlib.h>
>
> Please don't include system headers in libgcc without appropriate
> inhibit_libc checks for bootstrap purposes.  In this case, it would seem
> better just to declare the functions you need.

OK.

>> +/* Check that the kernel has a new enough version at load */
>> +void __check_for_sync8_kernelhelper (void)
>
> Shouldn't this function be static?

Yep.

>> +{
>> +  if (__kernel_helper_version < 5)
>> +    {
>> +      const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
>> +      /* At this point we need a way to crash with some information
>> +      for the user - I'm not sure I can rely on much else being
>> +      available at this point, so do the same as generic-morestack.c
>> +      write() and abort(). */
>> +      write (2 /* stderr */, err, sizeof(err));
>
> "write" is in the user's namespace in ISO C so it's not ideal to have a
> call to it.  If there isn't a reserved-namespace version, using the
> syscall directly (hardcoding the syscall number) might be better.

OK, fair enough.

>> +void (*__sync8_kernelhelper_inithook[]) (void) __attribute__ ((section 
>> (".init_array"))) = {
>> +  &__check_for_sync8_kernelhelper
>> +};
>
> Shouldn't this also be static (marked "used" if needed)?  Though I'd have
> thought simply marking the function as a constructor would be simpler and
> better

OK, can do - I wasn't too sure if constructor would end up later in
the initialisation - I was worrying whether that might end up after a
C++ constructor that might actually use it (although I'm not actually
sure whether that's more or less likely to happen with constructor vs.
init_array).
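
(For comparison, the two shapes being discussed - a sketch with made-up names,
not the patch itself:)

static void
check_helper (void)
{
  /* The kernel-helper version check would live here.  */
}

/* Form 1: an explicit entry in .init_array, as in the posted patch.  */
static void (*check_helper_hook[]) (void)
  __attribute__ ((used, section (".init_array"))) = { &check_helper };

/* Form 2: the constructor attribute Joseph suggests; the compiler then
   arranges the .init_array (or .ctors) entry itself.  */
static void __attribute__ ((constructor))
check_helper_ctor (void)
{
  check_helper ();
}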

Dave


Re: [Patch 1/3] ARM 64 bit atomic operations

2011-07-13 Thread David Gilbert
On 12 July 2011 22:07, Ramana Radhakrishnan
 wrote:
> Hi Dave,

Hi Ramana,
  Thanks for the review.

> Could you split this further into a patch that deals with the
> case for disabling MCR memory barriers for Thumb1 so that it
> may be backported to the release branches? I have commented inline
> as well.

Sure.

> Could you also provide a proper changelog entry for this that will
> also help with review of the patch ?

Yep, no problem.

> I've not yet managed to fully review all the bits in this patch but
> here's some initial comments that should be looked at.
>
> On 1 July 2011 16:54, Dr. David Alan Gilbert  wrote:
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c


>> +      if (is_di)
>> +        {
>> +          arm_output_asm_insn (emit, 0, operands, "it\teq");
>
> This should be guarded with a if (TARGET_THUMB2) - there's no point in
> accounting for the length of this instruction in the compiler and then
> have the assembler fold it away in ARM state.

OK; the length accounting seems pretty broken anyway; I think it assumes
all instructions are 4 bytes.

>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>> index c32ef1a..3fdd22f 100644
>> --- a/gcc/config/arm/arm.h
>> +++ b/gcc/config/arm/arm.h
>> @@ -282,7 +282,8 @@ extern void 
>> (*arm_lang_output_object_attributes_hook)(void);
>> -#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB)
>> +#define TARGET_HAVE_DMB_MCR    (arm_arch6k && ! TARGET_HAVE_DMB \
>> +                                && ! TARGET_THUMB1)
>
> This hunk (TARGET_HAVE_DMB_MCR) should probably be backported to
> release branches because this is technically fixing an issue and
> hence should be a separate patch that can be looked at separately.

OK, will do.

>>  /* Nonzero if this chip implements a memory barrier instruction.  */
>>  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
>> @@ -290,8 +291,12 @@ extern void 
>> (*arm_lang_output_object_attributes_hook)(void);
>
> sync.md changes -
>
>> (define_mode_iterator NARROW [QI HI])
>>+(define_mode_iterator QHSD [QI HI SI DI])
>>+(define_mode_iterator SIDI [SI DI])
>>+
>>+(define_mode_attr sync_predtab [(SI "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER")
>>+                              (QI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
>>+                              (HI "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER")
>>+                              (DI "TARGET_HAVE_LDREXD && ARM_DOUBLEWORD_ALIGN && TARGET_HAVE_MEMORY_BARRIER")])
>>+
>
> Can we move all the iterators to iterators.md and then arrange
> includes to work automatically ? Minor nit - could you align the entries
> for QI, HI and DI with the start of the SI ?

Yes, I can do that - the only odd thing is that sync_predtab is very
sync.md specific; does it really make sense for it to be in iterators.md?

>>+(define_mode_attr sync_atleastsi [(SI "SI")
>>+                                (DI "DI")
>>+                                (HI "SI")
>>+                                (QI "SI")])
>>
>
> I couldn't spot where this was being used. Can this be removed if not
> necessary ?

Ah - yes I think that's dead; it's a relic from an attempt to merge some of the
other narrow cases into the same iterator but it got way too messy.

>>-(define_insn "arm_sync_new_nandsi"
>>+(define_insn "arm_sync_new_"
>>   [(set (match_operand:SI 0 "s_register_operand" "=&r")
>>-        (unspec_volatile:SI [(not:SI (and:SI
>>-                               (match_operand:SI 1 "arm_sync_memory_operand" "+Q")
>>-                               (match_operand:SI 2 "s_register_operand" "r")))
>>-                          ]
>>-                          VUNSPEC_SYNC_NEW_OP))
>>+        (unspec_volatile:SI [(syncop:SI
>>+                             (zero_extend:SI
>>+                               (match_operand:NARROW 1 "arm_sync_memory_operand" "+Q"))
>>+                             (match_operand:SI 2 "s_register_operand" "r"))
>>+                          ]
>>+                          VUNSPEC_SYNC_NEW_OP))
>>    (set (match_dup 1)
>>-        (unspec_volatile:SI [(match_dup 1) (match_dup 2)]
>>-                          VUNSPEC_SYNC_NEW_OP))
>>+      (unspec_volatile:NARROW [(match_dup 1) (match_dup 2)]
>>+                              VUNSPEC_SYNC_NEW_OP))
>>    (clobber (reg:CC CC_REGNUM))
>>    (clobber (match_scratch:SI 3 "=&r"))]
>>-  "TARGET_HAVE_LDREX && TARGET_HAVE_MEMORY_BARRIER"
>>+  "TARGET_HAVE_LDREXBH && TARGET_HAVE_MEMORY_BARRIER"
>
> Can't this just use <sync_predtab> instead, since the condition is identical
> for QImode and HImode from that mode attribute, and in quite a few
> places below?

Hmm yes it can - I'd only been using predtab in the places where it was
varying on the mode; but as you say this can be converted as well.

>>@@ -461,19 +359,19 @@
>>         (unspec_volatile:SI
>>         [(not:SI
>>            (and:SI
>>-               (zero_extend:SI
>>-               (match_operand:NARROW 1 "ar