Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-07-10 Thread Sudakshina Das

Hi Jackson


On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:

Hi all,

This patch resolves PR86014.  It does so by noticing that the last 
load may clobber the address register without issue (regardless of 
where it exists in the final ldp/stp sequence).  That check has been 
changed so that the last register may be clobbered and the testcase 
(gcc.target/aarch64/ldp_stp_10.c) now passes.
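For context, a minimal C illustration of the situation (a hypothetical example, not the PR's exact testcase): two adjacent loads through the incoming pointer can become an ldp whose last destination is the base register itself, e.g. `ldp x1, x0, [x0]`, which is safe because both addresses are formed before the base is overwritten.

```c
#include <assert.h>

/* Hypothetical illustration: two adjacent loads through the incoming
   pointer.  On AArch64 this can become "ldp x1, x0, [x0]", where the
   *last* load overwrites the base register x0 -- harmless, because both
   addresses were computed from the old value of x0. */
long
sum2 (long *p)
{
  return p[0] + p[1];
}
```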


Bootstrap and regtest OK.

OK for trunk?

Jackson

Changelog:

gcc/

2018-06-25  Jackson Woodruff  

    PR target/86014
    * config/aarch64/aarch64.c 
(aarch64_operands_adjust_ok_for_ldpstp):

    Remove address clobber check on last register.


This looks good to me but you will need a maintainer to approve it. The only
thing I would add is a request to move the comment on top of the for loop
into this patch. That is, keep the original
/* Check if the addresses are clobbered by load.  */
in your [1/2] and make the comment change in [2/2].

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
d0e9b2d464183eecc8cc7639ca3e981d2ff243ba..feffe8ebdbd4efd0ffc09834547767ceec46f4e4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17074,7 +17074,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, 
bool load,
   /* Only the last register in the order in which they occur
  may be clobbered by the load.  */
   if (load)
-for (int i = 0; i < num_instructions; i++)
+for (int i = 0; i < num_instructions - 1; i++)
   if (reg_mentioned_p (reg[i], mem[i]))
return false;
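Viewed on its own, the adjusted guard behaves like the following standalone model (names are hypothetical, not GCC's internals): every load except the final one must avoid writing a register that appears in an address, while the final load may clobber its own base.

```c
#include <assert.h>
#include <stdbool.h>

/* Standalone model of the adjusted guard: with N loads, no load before
   the last may write a register mentioned in a memory address, but the
   final load may clobber its own base register. */
bool
addresses_ok (int num_instructions, const bool reg_mentioned_in_mem[])
{
  for (int i = 0; i < num_instructions - 1; i++)
    if (reg_mentioned_in_mem[i])
      return false;  /* a non-final load would clobber a later address */
  return true;
}
```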


Thanks
Sudi



Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-12 Thread Sudakshina Das

Hi Jackson

On 11/07/18 17:48, Jackson Woodruff wrote:

Hi Sudi,

Thanks for the review.


On 07/10/2018 10:56 AM, Sudakshina wrote:

Hi Jackson


-  if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
+  if (!MEM_P (mem[1]) || aarch64_mem_pair_operand (mem[1], mode))

mem_1 == mem[1]?

Oops, yes... That should be mem[0].


 return false;

-  /* The mems cannot be volatile.  */
...

/* If we have SImode and slow unaligned ldp,
  check the alignment to be at least 8 byte. */
   if (mode == SImode
   && (aarch64_tune_params.extra_tuning_flags
-  & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
+      & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
   && !optimize_size
-  && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
+  && MEM_ALIGN (mem[1]) < 8 * BITS_PER_UNIT)

Likewise

Done

...
   /* Check if the registers are of same class.  */
-  if (rclass_1 != rclass_2 || rclass_2 != rclass_3 || rclass_3 != 
rclass_4)

-    return false;
+  for (int i = 0; i < 3; i++)

num_instructions - 1 instead of 3 would be more consistent.

Done


+    if (rclass[i] != rclass[i + 1])
+  return false;

It looks good otherwise.

Thanks
Sudi


Re-regtested and bootstrapped.

OK for trunk?


Looks good to me but you will need approval from
a maintainer to commit it!

Thanks
Sudi



Thanks,

Jackson




Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-07-12 Thread Sudakshina Das

Hi Jackson

On 11/07/18 17:48, Jackson Woodruff wrote:

Hi Sudi,

On 07/10/2018 02:29 PM, Sudakshina Das wrote:

Hi Jackson


On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:

Hi all,

This patch resolves PR86014.  It does so by noticing that the last 
load may clobber the address register without issue (regardless of 
where it exists in the final ldp/stp sequence). That check has been 
changed so that the last register may be clobbered and the testcase 
(gcc.target/aarch64/ldp_stp_10.c) now passes.


Bootstrap and regtest OK.

OK for trunk?

Jackson

Changelog:

gcc/

2018-06-25  Jackson Woodruff  

    PR target/86014
    * config/aarch64/aarch64.c 
(aarch64_operands_adjust_ok_for_ldpstp):

    Remove address clobber check on last register.

This looks good to me but you will need a maintainer to approve it.
The only thing I would add is a request to move the comment on top of
the for loop into this patch. That is, keep the original
/* Check if the addresses are clobbered by load.  */
in your [1/2] and make the comment change in [2/2].

Thanks, change made.  OK for trunk?



Looks good to me but you will need approval from
a maintainer to commit it!

Thanks
Sudi


Thanks,

Jackson




Re: [PATCH][GCC][AARCH64] Canonicalize aarch64 widening simd plus insns

2018-07-12 Thread Sudakshina Das

Hi Matthew

On 12/07/18 11:18, Richard Sandiford wrote:

Looks good to me FWIW (not a maintainer), just a minor formatting thing:

Matthew Malcomson  writes:

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
aac5fa146ed8dde4507a0eb4ad6a07ce78d2f0cd..67b29cbe2cad91e031ee23be656ec61a403f2cf9
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3302,38 +3302,78 @@
DONE;
  })
  
-(define_insn "aarch64_w"

+(define_insn "aarch64_subw"
[(set (match_operand: 0 "register_operand" "=w")
-(ADDSUB: (match_operand: 1 "register_operand" "w")
-   (ANY_EXTEND:
- (match_operand:VD_BHSI 2 "register_operand" "w"]
+   (minus:
+(match_operand: 1 "register_operand" "w")
+(ANY_EXTEND:
+  (match_operand:VD_BHSI 2 "register_operand" "w"]


The (minus should be under the "(match_operand":

(define_insn "aarch64_subw"
   [(set (match_operand: 0 "register_operand" "=w")
(minus: (match_operand: 1 "register_operand" "w")
   (ANY_EXTEND:
 (match_operand:VD_BHSI 2 "register_operand" "w"]

Same for the other patterns.

Thanks,
Richard



You will need a maintainer's approval but this looks good to me.
Thanks for doing this. I would only point out one other nit which you
can choose to ignore:

+/* Ensure we generate:
+   one saddw2 and one saddw for the function add()
+   one ssubw2 and one ssubw for the function subtract()
+   one uaddw2 and one uaddw for the function uadd()
+   one usubw2 and one usubw for the function usubtract() */
+
+/* { dg-final { scan-assembler-times "\[ \t\]ssubw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]ssubw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]usubw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]usubw\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */

The scan-assembly directives for the different
functions can be placed right below each of them and that would
make it easier to read the expected results in the test and you
can get rid of the comments saying the same.
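Concretely, the suggested layout would look something like this (function bodies and the exact directives are placeholders here, not the patch's actual testcase; the real test exercises vector widening code):

```c
#include <assert.h>

/* Hypothetical sketch of the suggested layout: each scan-assembler
   directive sits directly below the function whose output it checks,
   making the separate explanatory comment block unnecessary. */
long long
add (long long a, int b)
{
  return a + b;   /* placeholder body */
}
/* { dg-final { scan-assembler-times "\[ \t\]saddw2\[ \t\]+" 1 } } */
/* { dg-final { scan-assembler-times "\[ \t\]saddw\[ \t\]+" 1 } } */

unsigned long long
uadd (unsigned long long a, unsigned int b)
{
  return a + b;   /* placeholder body */
}
/* { dg-final { scan-assembler-times "\[ \t\]uaddw2\[ \t\]+" 1 } } */
/* { dg-final { scan-assembler-times "\[ \t\]uaddw\[ \t\]+" 1 } } */
```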

Thanks
Sudi


Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-07-12 Thread Sudakshina Das

Hi Eric

On 27/06/18 12:22, Wilco Dijkstra wrote:

Eric Botcazou wrote:


This test can easily be changed not to use optimize since it doesn't look
like it needs it. We really need to tests these builtins properly,
otherwise they will continue to fail on most targets.


As far as I can see PR target/84521 has been reported only for Aarch64 so I'd
just leave the other targets alone (and avoid propagating FUD if possible).


It's quite obvious from PR84521 that this is an issue affecting all targets.
Adding better generic tests for __builtin_setjmp can only be a good thing.

Wilco



This conversation seems to have died down and I would like to
start it again. I would agree with Wilco's suggestion about
keeping the test in the generic folder. I have removed the
optimize attribute and the effect is still the same. It passes
on AArch64 with this patch and it currently fails on x86
trunk (gcc version 9.0.0 20180712 (experimental) (GCC))
on -O1 and above.

Thanks
Sudi
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f284e74..9792d28 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version;
 #define EH_RETURN_STACKADJ_RTX	gen_rtx_REG (Pmode, R4_REGNUM)
 #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
-/* Don't use __builtin_setjmp until we've defined it.  */
+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
 #undef DONT_USE_BUILTIN_SETJMP
 #define DONT_USE_BUILTIN_SETJMP 1
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8..4266a3d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3998,7 +3998,7 @@ static bool
 aarch64_needs_frame_chain (void)
 {
   /* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label)
 return true;
 
   /* A leaf function cannot have calls or write LR.  */
@@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
 }
 
+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
 /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
 
 static tree
@@ -17744,6 +17751,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
 
+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG aarch64_function_arg
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..d5f33d8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6087,6 +6087,30 @@
   DONE;
 })
 
+;; This is broadly similar to the builtins.c except that it uses
+;; temporaries to load the incoming SP and FP.
+(define_expand "nonlocal_goto"
+  [(use (match_operand 0 "general_operand"))
+   (use (match_operand 1 "general_operand"))
+   (use (match_operand 2 "general_operand"))
+   (use (match_operand 3 "general_operand"))]
+  ""
+{
+rtx label_in = copy_to_reg (operands[1]);
+rtx fp_in = copy_to_reg (operands[3]);
+rtx sp_in = copy_to_reg (operands[2]);
+
+emit_move_insn (hard_frame_pointer_rtx, fp_in);
+emit_stack_restore (SAVE_NONLOCAL, sp_in);
+
+emit_use (hard_frame_pointer_rtx);
+emit_use (stack_pointer_rtx);
+
+emit_indirect_jump (label_in);
+
+DONE;
+})
+
 ;; Helper for aarch64.c code.
 (define_expand "set_clobber_cc"
   [(parallel [(set (match_operand 0)
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
new file mode 100644
index 000..564ef14
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target indirect_jumps } */
+
+#include <setjmp.h>
+#include <string.h>
+#include <alloca.h>
+
+jmp_buf buf;
+
+int uses_longjmp (void)
+{
+  jmp_buf buf2;
+  memcpy (buf2, buf, sizeof (buf));
+  __builtin_longjmp (buf2, 1);
+}
+
+int gl;
+void after_longjmp (void)
+{
+  gl = 5;
+}
+
+int
+test_1 (int n)
+{
+  volatile int *p = alloca (n);
+  if (__builtin_setjmp (buf))
+{
+  after_longjmp ();
+}
+  else
+{
+  uses_longjmp ();
+}
+
+  return 0;
+}
+
+int
+test_2 (int n)
+{
+  int i;
+  int *ptr = (int *)__builtin_alloca (sizeof (int) * n);
+  for (i = 0; i < n; i++)
+ptr[i] = i;
+  test_1 (n);
+  return 0;
+}
+
+int main (int argc, const char **argv)
+{
+  __builtin_memset (&buf, 0xaf, sizeof (buf));
+  test_2 (100);
+}


Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-07-16 Thread Sudakshina Das

Hi Sam

On 13/07/18 17:09, Sam Tebbs wrote:

Hi all,

This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0xFFFFFFFF00000000) | (b & 0x00000000FFFFFFFF)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
   return (a & 0xFFFFFFFF00000000ll) | (b & 0x00000000FFFFFFFFll);
}

void read2(unsigned long long a, unsigned long long b, unsigned long long *c,
   unsigned long long *d) {
   *c = combine(a, b); *d = combine(b, a);
}

When compiled with -O2, read2 would result in:

read2:
   and   x5, x1, #0xffffffff
   and   x4, x0, #0xffffffff00000000
   orr   x4, x4, x5
   and   x1, x1, #0xffffffff00000000
   and   x0, x0, #0xffffffff
   str   x4, [x2]
   orr   x0, x0, x1
   str   x0, [x3]
   ret

But with this patch results in:

read2:
   mov   x4, x1
   bfxil x4, x0, 0, 32
   str   x4, [x2]
   bfxil x0, x1, 0, 32
   str   x0, [x3]
   ret
   
Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with no regressions.


I am not a maintainer but I have a question about this patch. I may be
missing something or reading it wrong. So feel free to point it out:

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+            (match_operand 3 "const_int_operand"))
+        (and:DI (match_operand:DI 2 "register_operand" "0")
+            (match_operand 4 "const_int_operand"]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[3]))"
+  {
+    HOST_WIDE_INT op4 = INTVAL (operands[4]);
+    operands[3] = GEN_INT (64 - ceil_log2 (op4));
+    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);

In the BFXIL you are reading %3 LSB bits from operand 1 and putting them
in the LSBs of %0.

This means that the pattern should be masking the 64-%3 MSBs of %0 and
the %3 LSBs of %1. So shouldn't operand 4 be the mask that is checked
with aarch64_is_left_consecutive?

Can you please compare a simpler version of the above example you gave to
make sure the generated assembly is equivalent before and after the patch:

void read2(unsigned long long a, unsigned long long b, unsigned long long *c) {
  *c = combine(a, b);
}


From the above text

read2:
  and   x5, x1, #0xffffffff
  and   x4, x0, #0xffffffff00000000
  orr   x4, x4, x5

read2:
  mov   x4, x1
  bfxil x4, x0, 0, 32

This does not seem equivalent to me.
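One way to make the concern concrete is to emulate both instruction sequences (a sketch only: `bfxil` below models the documented copy-low-bits behaviour, and the 32/32 masks follow the example above):

```c
#include <assert.h>
#include <stdint.h>

/* Emulation of BFXIL Xd, Xn, #lsb, #width: copy <width> bits of Xn,
   starting at bit <lsb>, into the low bits of Xd, keeping Xd's other
   bits. */
uint64_t
bfxil (uint64_t xd, uint64_t xn, unsigned lsb, unsigned width)
{
  uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
  return (xd & ~mask) | ((xn >> lsb) & mask);
}

/* The pre-patch sequence for *c: x4 = (a & HI) | (b & LO). */
uint64_t
seq_before (uint64_t a, uint64_t b)
{
  return (a & 0xffffffff00000000ull) | (b & 0x00000000ffffffffull);
}

/* The post-patch sequence for *c: mov x4, x1 (= b); bfxil x4, x0 (= a),
   0, 32 -- i.e. x4 = (b & HI) | (a & LO), with a and b swapped. */
uint64_t
seq_after (uint64_t a, uint64_t b)
{
  return bfxil (b, a, 0, 32);
}
```

Running the two on any inputs whose halves differ shows they disagree, which is exactly the equivalence question raised above.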

Thanks
Sudi

+    return "";
+  }
+  [(set_attr "type" "bfx")]
+)

gcc/
2018-07-11  Sam Tebbs  

     * config/aarch64/aarch64.md (*aarch64_bfxil, *aarch64_bfxil_alt):
     Define.
     * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
     Define.
     * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function.

gcc/testsuite
2018-07-11  Sam Tebbs  

     * gcc.target/aarch64/combine_bfxil.c: New file.
     * gcc.target/aarch64/combine_bfxil_2.c: New file.







Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-23 Thread Sudakshina Das

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction 
definition to

also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret
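The reason the AND is redundant is architectural: a write to a W register zero-extends into the full X register, so the UMOV alone already produces the zero-extended value. A small model of the two sequences (a hedged sketch; lane extraction itself is elided):

```c
#include <assert.h>
#include <stdint.h>

/* An AArch64 write to a W register zeroes bits 63:32 of the X register,
   which C models as a cast through uint32_t. */
uint64_t
umov_w (uint8_t lane)            /* umov w0, v0.b[0] */
{
  return (uint32_t) lane;
}

uint64_t
umov_w_then_and (uint8_t lane)   /* umov w0, v0.b[0]; and x0, x0, 255 */
{
  return ((uint32_t) lane) & 255;
}
```

The two functions agree for every lane value, which is why the pattern extension can drop the AND.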

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf 
with no

regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi):
    Rename to...
(*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for each 
of them.


Thanks
Sudi

   (vec_select:
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL 
(operands[2]));

+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]


Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-07-23 Thread Sudakshina Das

Hi Vlad


On Friday 20 July 2018 10:37 AM, Vlad Lazar wrote:

Hi,

The patch adds implementations for the NEON intrinsics vabsd_s64 and 
vnegd_s64.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification) 



Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
regressions.


OK for trunk?


Thanks for doing this. This looks good to me but you will need a
maintainer's approval.


Thanks
Sudi
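For reviewers eyeballing the scan-assembler patterns, a reference model of what the two intrinsics should compute (assumption: they wrap in two's complement like the scalar NEG/ABS instructions, so vabsd_s64 (INT64_MIN) comes back as INT64_MIN rather than trapping):

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics for the new intrinsics.  Negation goes through
   uint64_t to keep the INT64_MIN case well-defined in C. */
int64_t
ref_vnegd_s64 (int64_t a)
{
  return (int64_t) (0 - (uint64_t) a);
}

int64_t
ref_vabsd_s64 (int64_t a)
{
  return a < 0 ? ref_vnegd_s64 (a) : a;
}
```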


Thanks,
Vlad

gcc/
2018-07-02  Vlad Lazar  

* config/aarch64/arm_neon.h (vabsd_s64, vnegd_s64): New.

gcc/testsuite/
2018-07-02  Vlad Lazar  

* gcc.target/aarch64/scalar_intrinsics.c (test_vabsd_s64, 
test_vabsd_s64): New.


---

diff --git a/gcc/config/aarch64/arm_neon.h 
b/gcc/config/aarch64/arm_neon.h
index 
2d18400040f031dfcdaf60269ad484647804e1be..19e22431a85bcd09d0ea759b42b0a52420b6c43c 
100644

--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11822,6 +11822,13 @@ vabsq_s64 (int64x2_t __a)
   return __builtin_aarch64_absv2di (__a);
 }

+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vabsd_s64 (int64_t __a)
+{
+  return __builtin_aarch64_absdi (__a);
+}
+
 /* vadd */

 __extension__ extern __inline int64_t
@@ -22907,6 +22914,12 @@ vneg_s64 (int64x1_t __a)
   return -__a;
 }

+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vnegd_s64 (int64_t __a)
+{
+  return -__a;
+}
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vnegq_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c 
b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
index 
ea29066e369b967d0781d31c8a5208bda9e4f685..45afeec373971838e0cd107038b4aa51a2d4998f 
100644

--- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
@@ -603,6 +603,14 @@ test_vsqaddd_u64 (uint64_t a, int64_t b)
   return vsqaddd_u64 (a, b);
 }

+/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+" 1 } } */
+
+int64_t
+test_vabsd_s64 (int64_t a)
+{
+  return vabsd_s64 (a);
+}
+
 /* { dg-final { scan-assembler-times "\\tsqabs\\tb\[0-9\]+" 1 } } */

 int8_t
@@ -627,6 +635,14 @@ test_vqabss_s32 (int32_t a)
   return vqabss_s32 (a);
 }

+/* { dg-final { scan-assembler-times "\\tneg\\tx\[0-9\]+" 1 } } */
+
+int64_t
+test_vnegd_s64 (int64_t a)
+{
+  return vnegd_s64 (a);
+}
+
 /* { dg-final { scan-assembler-times "\\tsqneg\\tb\[0-9\]+" 1 } } */

 int8_t




Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-25 Thread Sudakshina Das

Hi Sam

On 25/07/18 14:08, Sam Tebbs wrote:

On 07/23/2018 05:01 PM, Sudakshina Das wrote:

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction 
definition to

also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf 
with no

regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi):
    Rename to...
(*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for 
each of them.


Thanks
Sudi


Hi Sudi,

Thanks for the feedback. Here is an updated patch that adds more 
testcases to cover the patterns generated by the different mode 
combinations. The changelog and description from my original email still 
apply.




Thanks for making the changes and adding more test cases. I do however
see that you are only covering 2 out of 4 new
*aarch64_get_lane_zero_extenddi<> patterns. The
*aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
those tests. I would just ask you to add the other two new patterns
as well. Also since the different versions of the instruction generate
same instructions (like foo_16qi and foo_8qi both give out the same
instruction), I would suggest using a -fdump-rtl-final (or any relevant
rtl dump) with the dg-options and using a scan-rtl-dump to scan the
pattern name. Something like:
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */
...
...
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" 
"final" } } */


Thanks
Sudi



   (vec_select:
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL 
(operands[2]));

+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]






Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-07-27 Thread Sudakshina Das

Hi Sam

On 25/07/18 14:08, Sam Tebbs wrote:

On 07/23/2018 05:01 PM, Sudakshina Das wrote:

Hi Sam


On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:

Hi all,

This patch extends the aarch64_get_lane_zero_extendsi instruction 
definition to

also cover DI mode. This prevents a redundant AND instruction from being
generated due to the pattern failing to be matched.

Example:

typedef char v16qi __attribute__ ((vector_size (16)));

unsigned long long
foo (v16qi a)
{
  return a[0];
}

Previously generated:

foo:
    umov    w0, v0.b[0]
    and x0, x0, 255
    ret

And now generates:

foo:
    umov    w0, v0.b[0]
    ret

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf 
with no

regressions.

gcc/
2018-07-23  Sam Tebbs 

    * config/aarch64/aarch64-simd.md
    (*aarch64_get_lane_zero_extendsi):
    Rename to...
(*aarch64_get_lane_zero_extend): ... This.
    Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-23  Sam Tebbs 

    * gcc.target/aarch64/extract_zero_extend.c: New file

You will need an approval from a maintainer, but I would only add one 
request to this:


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
   [(set_attr "type" "neon_to_gp")]
 )

-(define_insn "*aarch64_get_lane_zero_extendsi"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-    (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+    (zero_extend:GPI

Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for 
each of them.


Thanks
Sudi


Hi Sudi,

Thanks for the feedback. Here is an updated patch that adds more 
testcases to cover the patterns generated by the different mode 
combinations. The changelog and description from my original email still 
apply.




Thanks it looks good to me! You will still need a maintainer to approve.

Sudi



   (vec_select:
     (match_operand:VDQQH 1 "register_operand" "w")
     (parallel [(match_operand:SI 2 "immediate_operand" "i")]]
   "TARGET_SIMD"
   {
-    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL 
(operands[2]));

+    operands[2] = aarch64_endian_lane_rtx (mode,
+                       INTVAL (operands[2]));
 return "umov\\t%w0, %1.[%2]";
   }
   [(set_attr "type" "neon_to_gp")]






Re: [GCC][PATCH][Aarch64] Stop redundant zero-extension after UMOV when in DI mode

2018-08-01 Thread Sudakshina Das

Hi Sam

On 01/08/18 10:12, Sam Tebbs wrote:



On 07/31/2018 11:16 PM, James Greenhalgh wrote:

On Thu, Jul 26, 2018 at 11:52:15AM -0500, Sam Tebbs wrote:




Thanks for making the changes and adding more test cases. I do however
see that you are only covering 2 out of 4 new
*aarch64_get_lane_zero_extenddi<> patterns. The
*aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
those tests. I would just ask you to add the other two new patterns
as well. Also since the different versions of the instruction generate
same instructions (like foo_16qi and foo_8qi both give out the same
instruction), I would suggest using a -fdump-rtl-final (or any relevant
rtl dump) with the dg-options and using a scan-rtl-dump to scan the
pattern name. Something like:
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */
...
...
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi"
"final" } } */

Thanks
Sudi

Hi Sudi,

Thanks again. Here's an update that adds 4 more tests, so all 8 patterns
generated are now tested for!
This is OK for trunk, thanks for the patch (and thanks Sudi for the 
review!)


Thanks,
James


Thank you James! I'd appreciate it if someone could commit it as I don't 
have commit rights yet.




I have committed this on your behalf as r263200.

Thanks
Sudi


Sam




Below is the updated changelog

gcc/
2018-07-26  Sam Tebbs  

      * config/aarch64/aarch64-simd.md
      (*aarch64_get_lane_zero_extendsi):
      Rename to...
(*aarch64_get_lane_zero_extend): ... This.
      Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-26  Sam Tebbs  

      * gcc.target/aarch64/extract_zero_extend.c: New file







Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-08-01 Thread Sudakshina Das

Hi

On 31/07/18 22:48, Andrew Pinski wrote:

On Tue, Jul 31, 2018 at 2:43 PM James Greenhalgh
 wrote:


On Thu, Jul 12, 2018 at 12:01:09PM -0500, Sudakshina Das wrote:

Hi Eric

On 27/06/18 12:22, Wilco Dijkstra wrote:

Eric Botcazou wrote:


This test can easily be changed not to use optimize since it doesn't look
like it needs it. We really need to tests these builtins properly,
otherwise they will continue to fail on most targets.


As far as I can see PR target/84521 has been reported only for Aarch64 so I'd
just leave the other targets alone (and avoid propagating FUD if possible).


It's quite obvious from PR84521 that this is an issue affecting all targets.
Adding better generic tests for __builtin_setjmp can only be a good thing.

Wilco



This conversation seems to have died down and I would like to
start it again. I would agree with Wilco's suggestion about
keeping the test in the generic folder. I have removed the
optimize attribute and the effect is still the same. It passes
on AArch64 with this patch and it currently fails on x86
trunk (gcc version 9.0.0 20180712 (experimental) (GCC))
on -O1 and above.



I don't see where the FUD comes in here; either this builtin has a defined
semantics across targets and they are adhered to, or the builtin doesn't have
well defined semantics, or the targets fail to implement those semantics.


The problem comes from the fact the builtins are not documented at all.
See PR59039 for the issue on them not being documented.



Thanks @James for bringing this up again.
I tried to revive the conversation on PR59039 while working on this as
well but that conversation mainly focused on documenting if we are
allowed to use __builtin_setjmp and __builtin_longjmp on the same
function and with the same jmp buffer or not. This patch and this test
case however does not involve that issue. There are other holes in the
documentation/implementation of these builtins. For now as advised by
James, I have posted the test case on the PR. I personally don't see why
this test case should go on the AArch64 tests when it clearly fails on
other targets as well. But if we can not come to an agreement on that, I
am willing to move it to AArch64 tests and maybe open a new bug report
which is not marked as "target" with the same test case.

Thanks
Sudi


Thanks,
Andrew




I think this should go in as is. If other targets are unhappy with the
failing test they should fix their target or skip the test if it is not
appropriate.

You may want to CC some of the maintainers of platforms you know to fail as
a courtesy on the PR (add your testcase, and add failing targets and their
maintainers to that PR) before committing so it doesn't come as a complete
surprise.

This is OK with some attempt to get target maintainers involved in the
conversation before commit.

Thanks,
James


diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f284e74..9792d28 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -473,7 +473,9 @@ extern unsigned aarch64_architecture_version;
  #define EH_RETURN_STACKADJ_RTX   gen_rtx_REG (Pmode, R4_REGNUM)
  #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()

-/* Don't use __builtin_setjmp until we've defined it.  */
+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
  #undef DONT_USE_BUILTIN_SETJMP
  #define DONT_USE_BUILTIN_SETJMP 1

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8..4266a3d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3998,7 +3998,7 @@ static bool
  aarch64_needs_frame_chain (void)
  {
/* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed || crtl->calls_eh_return || 
cfun->has_nonlocal_label)
  return true;

/* A leaf function cannot have calls or write LR.  */
@@ -12218,6 +12218,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
nextarg ATTRIBUTE_UNUSED)
expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
  }

+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
  /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */

  static tree
@@ -17744,6 +17751,9 @@ aarch64_run_selftests (void)
  #undef TARGET_FOLD_BUILTIN
  #define TARGET_FOLD_BUILTIN aarch64_fold_builtin

+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
  #undef TARGET_FUNCTION_ARG
  #define TARGET_FUNCTION_ARG aarch64_function_arg

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..d5f33d8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/

[PATCH, GCC] Fix unrolling check.

2019-11-08 Thread Sudakshina Das
Hi

I was fiddling around with the loop unrolling pass and noticed a check
in the decide_unroll_* functions (in the patch). The comment on top of this
check says
"/* If we were not asked to unroll this loop, just return back silently.  */"
However, the check returns when loop->unroll == 0 rather than 1.

The check was added in r255106 where the ChangeLog suggests that the 
actual intention was probably to check the value 1 and not 0.

Tested on aarch64-none-elf with one new regression:
FAIL: gcc.dg/pr40209.c (test for excess errors)
This fails because the changes cause the loop to unroll 3 times using
unroll_stupid, which shows up as an excess error due to -fopt-info. This
option was added in r202077, but I am not sure why this particular test
was chosen for it.

Does this change look OK? Can I just remove the -fopt-info from the test,
or is unrolling the loop in the test undesirable?

Thanks
Sudi

gcc/ChangeLog:

2019-11-07  Sudakshina Das  

* loop-unroll.c (decide_unroll_constant_iterations): Update condition
to check loop->unroll.
(decide_unroll_runtime_iterations): Likewise.
(decide_unroll_stupid): Likewise.
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 63fccd23fae38f8918a7d94411aaa43c72830dd3..9f7ab4b5c1c9b2333148e452b84afbf040707456 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -354,7 +354,7 @@ decide_unroll_constant_iterations (class loop *loop, int flags)
   widest_int iterations;
 
   /* If we were not asked to unroll this loop, just return back silently.  */
-  if (!(flags & UAP_UNROLL) && !loop->unroll)
+  if (!(flags & UAP_UNROLL) && loop->unroll == 1)
 return;
 
   if (dump_enabled_p ())
@@ -674,7 +674,7 @@ decide_unroll_runtime_iterations (class loop *loop, int flags)
   widest_int iterations;
 
   /* If we were not asked to unroll this loop, just return back silently.  */
-  if (!(flags & UAP_UNROLL) && !loop->unroll)
+  if (!(flags & UAP_UNROLL) && loop->unroll == 1)
 return;
 
   if (dump_enabled_p ())
@@ -1159,7 +1159,7 @@ decide_unroll_stupid (class loop *loop, int flags)
   widest_int iterations;
 
   /* If we were not asked to unroll this loop, just return back silently.  */
-  if (!(flags & UAP_UNROLL_ALL) && !loop->unroll)
+  if (!(flags & UAP_UNROLL_ALL) && loop->unroll == 1)
 return;
 
   if (dump_enabled_p ())
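The difference between the two predicates can be seen in a minimal sketch (my own illustration of the documented loop->unroll semantics from cfgloop.h, not actual GCC source):

```c
/* Documented semantics of loop->unroll (cfgloop.h): 0 = no information
   given, 1 = do not unroll, USHRT_MAX = unroll with no specific factor,
   anything else = unroll with that factor.  */

/* Original check: bail out when no -funroll-* flag is given AND no
   per-loop request was recorded (unroll == 0).  */
static int
skips_loop_old (int flag_set, unsigned short unroll)
{
  return !flag_set && !unroll;
}

/* Patched check: bail out only for an explicit "do not unroll".  */
static int
skips_loop_new (int flag_set, unsigned short unroll)
{
  return !flag_set && unroll == 1;
}
```

With no pragma and no flag (unroll == 0), the original check returns silently while the patched one keeps going, which is where the pr40209.c behaviour change above comes from.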


Re: [PATCH, GCC] Fix unrolling check.

2019-11-11 Thread Sudakshina Das
Hi Eric

On 08/11/2019 19:16, Eric Botcazou wrote:
>> I was fiddling around with the loop unrolling pass and noticed a check
>> in decide_unroll_* functions (in the patch). The comment on top of this
>> check says
>> "/* If we were not asked to unroll this loop, just return back silently.
>>*/"
>> However the check returns when loop->unroll == 0 rather than 1.
>>
>> The check was added in r255106 where the ChangeLog suggests that the
>> actual intention was probably to check the value 1 and not 0.
> 
> No, this is intended, 0 is the default value of the field, not 1.  And note
> that decide_unroll_constant_iterations, decide_unroll_runtime_iterations and
> decide_unroll_stupid *cannot* be called with loop->unroll == 1 because of this
> check in decide_unrolling:

Thanks for the explanation. However, I do not understand why we are
returning with the default value. The comment for "unroll" is a bit
ambiguous for value 0.

   /* The number of times to unroll the loop.  0 means no information given,
  just do what we always do.  A value of 1 means do not unroll the loop.
  A value of USHRT_MAX means unroll with no specific unrolling factor.
  Other values means unroll with the given unrolling factor.  */
   unsigned short unroll;

What "do we always do"?

Thanks
Sudi

> 
>if (loop->unroll == 1)
>   {
> if (dump_file)
>   fprintf (dump_file,
>";; Not unrolling loop, user didn't want it unrolled\n");
> continue;
>   }
> 
>> Tested on aarch64-none-elf with one new regression:
>> FAIL: gcc.dg/pr40209.c (test for excess errors)
>> This fails because the changes cause the loop to unroll 3 times using
>> unroll_stupid and that shows up as excess error due -fopt-info. This
>> option was added in r202077 but I am not sure why this particular test
>> was chosen for it.
> 
> That's a regression, there should be no unrolling.
> 



Re: [PATCH, GCC] Fix unrolling check.

2019-11-11 Thread Sudakshina Das
On 11/11/2019 14:50, Eric Botcazou wrote:
>> Thanks for the explanation. However, I do not understand why we are
>> returning with the default value.
> 
> The regression you reported should be clear enough though: if we don't do
> that, we will unroll in cases where we would not have before.  Try with a
> compiler that predates the pragma and compare, there should be no changes.
> 
>> What "do we always do"?
> 
> What we do in the absence of specific unrolling directives for the loop.

Yeah fair enough! Sorry for the trouble.

Sudi
> 



[Patch, GCC] Fix a condition post r278611

2019-12-05 Thread Sudakshina Das
Hi

While looking at the vect_model_reduction_cost function, it seems Richard's
change in the recent commit r278611 missed an update to the following if
condition. Since the check for EXTRACT_LAST_REDUCTION is now split out
above, the same check in the if condition can never be true.

gcc/ChangeLog

2019-xx-xx  Sudakshina Das  

* tree-vect-loop.c (vect_model_reduction_cost): Remove
reduction_type check from if condition.

Is this ok for trunk?

Thanks
Sudi
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ca8c818..7469204 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, internal_fn reduc_fn,
   /* No extra instructions needed in the prologue.  */
   prologue_cost = 0;
 
-  if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST)
+  if (reduc_fn != IFN_LAST)
 	/* Count one reduction-like operation per vector.  */
 	inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar,
 	stmt_info, 0, vect_body);


Re: [Patch, GCC] Fix a condition post r278611

2019-12-05 Thread Sudakshina Das
Hi Richard

On 05/12/2019 17:04, Richard Sandiford wrote:
> Sudakshina Das  writes:
>> Hi
>>
>> While looking at vect_model_reduction_cost function, it seems Richard's
>> change in a recent commit r278611 missed an update to the following if
>> condition. Since the check for EXTRACT_LAST_REDUCTION is now split
>> above, the same check in the if condition will never be true.
>>
>> gcc/ChangeLog
>>
>> 2019-xx-xx  Sudakshina Das  
>>
>>  * tree-vect-loop.c (vect_model_reduction_cost): Remove
>>  reduction_type check from if condition.
>>
>> Is this ok for trunk?
> 
> OK, thanks.

Thanks. Committed as r279012.

Sudi

> 
> Richard
> 
>>
>> Thanks
>> Sudi
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index ca8c818..7469204 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -3933,7 +3933,7 @@ vect_model_reduction_cost (stmt_vec_info stmt_info, 
>> internal_fn reduc_fn,
>> /* No extra instructions needed in the prologue.  */
>> prologue_cost = 0;
>>   
>> -  if (reduction_type == EXTRACT_LAST_REDUCTION || reduc_fn != IFN_LAST)
>> +  if (reduc_fn != IFN_LAST)
>>  /* Count one reduction-like operation per vector.  */
>>  inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar,
>>  stmt_info, 0, vect_body);



Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-06 Thread Sudakshina Das
Hi

While looking at the vectorization of the following example, we realized
that even though the vectorizable_shift function distinguishes a vector
shifted by a vector from a vector shifted by a scalar, when modeling the cost
it would always add the cost of building a vector constant, despite not
needing it for a vector shifted by a scalar.

This patch fixes this by using scalar_shift_arg to determine whether we 
need to build a vector for the second operand or not. This reduces 
prologue cost as shown in the test.
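The distinction being costed can be reproduced with two tiny loops (an illustrative sketch; the function names are mine, not from the patch):

```c
/* Vector shifted by a scalar: every element uses the same loop-invariant
   shift amount, so the vectorizer needs no vector constant for it.  */
void
shift_by_scalar (unsigned int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] >>= 3;
}

/* Vector shifted by a vector: per-element shift amounts, so the second
   operand must itself be vectorized (a vector must be built for it).  */
void
shift_by_vector (unsigned int *a, unsigned int *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] >>= b[i];
}
```

Only the second form should pay for constructing a vector second operand in the prologue.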

Build and regression tests pass on aarch64-none-elf and 
x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in 
Spec2017 for AArch64.

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  
Richard Sandiford  

* tree-vect-stmts.c (vectorizable_shift): Condition ndts for
vect_model_simple_cost call on scalar_shift_arg.

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  

* gcc.dg/vect/vect-shift-5.c: New test.

Is this ok for trunk?

Thanks
Sudi

diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
new file mode 100644
index 000..c1fd4f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int uint32_t;
+typedef short unsigned int uint16_t;
+
+int foo (uint32_t arr[4][4])
+{
+  int sum = 0;
+  for(int i = 0; i < 4; i++)
+{
+  sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53)
+	 + ((arr[2][i] >> 12) * 7)  + ((arr[3][i] >> 13) ^ 43);
+}
+return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2cb6b15..396ff15 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5764,7 +5764,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 {
   STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
   DUMP_VECT_SCOPE ("vectorizable_shift");
-  vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
+  vect_model_simple_cost (stmt_info, ncopies, dt,
+			  scalar_shift_arg ? 1 : ndts, slp_node, cost_vec);
   return true;
 }
 



Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-09 Thread Sudakshina Das
Hi Jeff

On 07/12/2019 17:44, Jeff Law wrote:
> On Fri, 2019-12-06 at 14:05 +0000, Sudakshina Das wrote:
>> Hi
>>
>> While looking at the vectorization for following example, we
>> realized
>> that even though vectorizable_shift function was distinguishing
>> vector
>> shifted by vector from vector shifted by scalar, while modeling the
>> cost
>> it would always add the cost of building a vector constant despite
>> not
>> needing it for vector shifted by scalar.
>>
>> This patch fixes this by using scalar_shift_arg to determine whether
>> we
>> need to build a vector for the second operand or not. This reduces
>> prologue cost as shown in the test.
>>
>> Build and regression tests pass on aarch64-none-elf and
>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>> Spec2017 for AArch64.
>>
>> gcc/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  
>>  Richard Sandiford  
>>
>>  * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>>  vect_model_simple_cost call on scalar_shift_arg.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-xx-xx  Sudakshina Das  
>>
>>  * gcc.dg/vect/vect-shift-5.c: New test.
> It's a bit borderline, but it's really just twiddling a cost, so OK.

Thanks :) Committed as r279114.

Sudi

> 
> jeff
> 



Re: Fwd: [PATCH, GCC, Vect] Fix costing for vector shifts

2019-12-10 Thread Sudakshina Das
Hi Christophe

On 10/12/2019 09:01, Christophe Lyon wrote:
> Hi,
> 
> On Mon, 9 Dec 2019 at 11:23, Sudakshina Das  wrote:
>>
>> Hi Jeff
>>
>> On 07/12/2019 17:44, Jeff Law wrote:
>>> On Fri, 2019-12-06 at 14:05 +, Sudakshina Das wrote:
>>>> Hi
>>>>
>>>> While looking at the vectorization for following example, we
>>>> realized
>>>> that even though vectorizable_shift function was distinguishing
>>>> vector
>>>> shifted by vector from vector shifted by scalar, while modeling the
>>>> cost
>>>> it would always add the cost of building a vector constant despite
>>>> not
>>>> needing it for vector shifted by scalar.
>>>>
>>>> This patch fixes this by using scalar_shift_arg to determine whether
>>>> we
>>>> need to build a vector for the second operand or not. This reduces
>>>> prologue cost as shown in the test.
>>>>
>>>> Build and regression tests pass on aarch64-none-elf and
>>>> x86_64-pc-linux-gnu-gcc. This gives a 3.42% boost to 525.x264_r in
>>>> Spec2017 for AArch64.
>>>>
> 
> Looks like you didn't check on arm, where I can see that the new testcase 
> fails:
> FAIL: gcc.dg/vect/vect-shift-5.c -flto -ffat-lto-objects
> scan-tree-dump vect "vectorizable_shift
> ===[\\n\\r][^\\n]*prologue_cost = 0"
> FAIL: gcc.dg/vect/vect-shift-5.c scan-tree-dump vect
> "vectorizable_shift ===[\\n\\r][^\\n]*prologue_cost = 0"
> 
> Seen on arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16
> 
> Christophe

Thanks for reporting this. There is already a Bugzilla report, PR92870,
for powerpc that I am looking at. Apologies, I couldn't find your email
address there to add you to the CC list.

Thanks
Sudi

> 
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  
>>>>   Richard Sandiford  
>>>>
>>>>   * tree-vect-stmt.c (vectorizable_shift): Condition ndts for
>>>>   vect_model_simple_cost call on scalar_shift_arg.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-xx-xx  Sudakshina Das  
>>>>
>>>>   * gcc.dg/vect/vect-shift-5.c: New test.
>>> It's a bit borderline, but it's really just twiddling a cost, so OK.
>>
>> Thanks :) Committed as r279114.
>>
>> Sudi
>>
>>>
>>> jeff
>>>
>>



[Committed, testsuite] Fix PR92870

2019-12-12 Thread Sudakshina Das
Hi

With my recent commit, I added a test that is not passing on all
targets. My change was valid for targets that have a vector/scalar
shift/rotate optab (an optab that supports a vector shifted by a scalar).

Since it does not seem easy to find out which targets would
support it, I am limiting the test to the targets that I know pass.

Committed as obvious r279310.

gcc/testsuite/ChangeLog

2019-12-12  Sudakshina Das  

PR testsuite/92870
* gcc.dg/vect/vect-shift-5.c: Add target to scan-tree-dump.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
index c1fd4f2..68e517e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -16,4 +16,7 @@ int foo (uint32_t arr[4][4])
 return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
 }
 
-/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
+/* For a target that has a vector/scalar shift/rotate optab, check
+   that we are not adding the cost of creating a vector from the scalar
+   in the prologue.  */
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" { target { aarch64*-*-* x86_64-*-* } } } } */


[PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-07-28 Thread Sudakshina Das
Hi

This is my attempt at reviving the old patch 
https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html

I have followed up on Kyrill's comment upstream on the link above and am using
the recommended option iii that he mentioned:
"1) Adjust the copy_limit to 256 bits after checking
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
 2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256-bit
moves, by iii:
   iii) Emit explicit V4SI (or any other 128-bit vector mode) ldp/stp pairs.
This wouldn't need any adjustments to MD patterns, but would make
aarch64_copy_one_block_and_progress_pointers more complex as it would now have
two paths, where one handles two adjacent memory addresses in one call."

With this patch the following test

#define N 8
extern int src[N], dst[N];

void
foo (void)
{
  __builtin_memcpy (dst, src, N * sizeof (int));
}

which was originally giving
foo:
        adrp    x1, src
        add     x1, x1, :lo12:src
        ldp     x4, x5, [x1]
        adrp    x0, dst
        add     x0, x0, :lo12:dst
        ldp     x2, x3, [x1, 16]
        stp     x4, x5, [x0]
        stp     x2, x3, [x0, 16]
        ret


changes to the following
foo:
        adrp    x1, src
        add     x1, x1, :lo12:src
        adrp    x0, dst
        add     x0, x0, :lo12:dst
        ldp     q1, q0, [x1]
        stp     q1, q0, [x0]
        ret

This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and an overall 
code size reduction on most
SPEC2017 Int benchmarks on Neoverse N1 due to more LDP/STP Q pair registers.
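As a quick functional sanity check of the 32-byte copy being expanded, the example above can be exercised directly (this only verifies the copy's semantics, not the generated ldp/stp q sequence; it assumes a compiler providing GCC's __builtin_memcpy):

```c
/* The 8 x int (32-byte) __builtin_memcpy from the example above, plus a
   checker that verifies the bytes actually arrive in dst.  */
#define N 8
int src[N], dst[N];

void
foo (void)
{
  __builtin_memcpy (dst, src, N * sizeof (int));
}

int
copy_ok (void)
{
  for (int i = 0; i < N; i++)
    src[i] = i * i + 1;
  foo ();
  for (int i = 0; i < N; i++)
    if (dst[i] != i * i + 1)
      return 0;
  return 1;
}
```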

Bootstrapped and regression tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Thanks
Sudi

gcc/ChangeLog:

2020-07-23  Sudakshina Das  
Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case
for E_V4SImode.
(aarch64_gen_load_pair): Likewise.
(aarch64_copy_one_block_and_progress_pointers): Handle 256 bit copy.
(aarch64_expand_cpymem): Expand copy_limit to 256bits where
appropriate.

gcc/testsuite/ChangeLog:

2020-07-23  Sudakshina Das  
Kyrylo Tkachov  

* gcc.target/aarch64/cpymem-q-reg_1.c: New test.
* gcc.target/aarch64/large_struct_copy_2.c: Update for ldp q regs.

** Attachment inlined  
**

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3fe1feaa80ccb0a287ee1c7ea1056e8f0a830532..a38ff39c4d5d53f056bbba3114ebaf8f0414c037 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6920,6 +6920,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx 
reg1, rtx mem2,
 case E_TFmode:
   return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2);
 
+case E_V4SImode:
+  return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2);
+
 default:
   gcc_unreachable ();
 }
@@ -6943,6 +6946,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx 
mem1, rtx reg2,
 case E_TFmode:
   return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2);
 
+case E_V4SImode:
+  return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
+
 default:
   gcc_unreachable ();
 }
@@ -21097,6 +21103,27 @@ static void
 aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
  machine_mode mode)
 {
+  /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent memory
+ address copies using V4SImode so that we can use Q registers.  */
+  if (known_eq (GET_MODE_BITSIZE (mode), 256))
+{
+  mode = V4SImode;
+  rtx reg1 = gen_reg_rtx (mode);
+  rtx reg2 = gen_reg_rtx (mode);
+  /* "Cast" the pointers to the correct mode.  */
+  *src = adjust_address (*src, mode, 0);
+  *dst = adjust_address (*dst, mode, 0);
+  /* Emit the memcpy.  */
+  emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
+   aarch64_progress_pointer (*src)));
+  emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
+aarch64_progress_pointer (*dst), reg2));
+  /* Move the pointers forward.  */
+  *src = aarch64_move_pointer (*src, 32);
+  *dst = aarch64_move_pointer (*dst, 32);
+  return;
+}
+
   rtx reg = gen_reg_rtx (mode);
 
   /* "Cast" the pointers to the correct mode.  */
@@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands)
   /* Convert n to bits to make the rest of the code simpler.  */
   n = n * BITS_PER_UNIT;
 
-  /* Maximum amount to copy in one go.  The AArch64 back-end has integer modes
- larger than TImode, but we should not use them for loads/stores here.  */
-  const int copy_limit = GET_MODE_BITSIZE (TImode);
+  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
+ AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and 

RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-04 Thread Sudakshina Das
Hi Richard

> -Original Message-
> From: Richard Sandiford 
> Sent: 31 July 2020 16:14
> To: Sudakshina Das 
> Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
> Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem
> expansion
> 
> Sudakshina Das  writes:
> > Hi
> >
> > This is my attempt at reviving the old patch
> > https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html
> >
> > I have followed on Kyrill's comment upstream on the link above and I am
> using the recommended option iii that he mentioned.
> > "1) Adjust the copy_limit to 256 bits after checking
> AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
> >  2) Adjust aarch64_copy_one_block_and_progress_pointers to handle 256-
> bit moves. by iii:
> >iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs 
> > ldp/stps. This
> wouldn't need any adjustments to
> > MD patterns, but would make
> aarch64_copy_one_block_and_progress_pointers more complex as it would
> now have
> > two paths, where one handles two adjacent memory addresses in one
> calls."
> >
> > With this patch the following test
> >
> > #define N 8
> > extern int src[N], dst[N];
> >
> > void
> > foo (void)
> > {
> >   __builtin_memcpy (dst, src, N * sizeof (int)); }
> >
> > which was originally giving
> > foo:
> > adrp    x1, src
> > add     x1, x1, :lo12:src
> > ldp     x4, x5, [x1]
> > adrp    x0, dst
> > add     x0, x0, :lo12:dst
> > ldp     x2, x3, [x1, 16]
> > stp     x4, x5, [x0]
> > stp     x2, x3, [x0, 16]
> > ret
> >
> >
> > changes to the following
> > foo:
> > adrp    x1, src
> > add     x1, x1, :lo12:src
> > adrp    x0, dst
> > add     x0, x0, :lo12:dst
> > ldp     q1, q0, [x1]
> > stp     q1, q0, [x0]
> > ret
> >
> > This gives about 1.3% improvement on 523.xalancbmk_r in SPEC2017 and
> > an overall code size reduction on most
> > SPEC2017 Int benchmarks on Neoverse N1 due to more LDP/STP Q pair
> registers.
> 
> Sorry for the slow review.  LGTM with a very minor nit (sorry)…

Thanks. Committed with the change.
> 
> > @@ -21150,9 +21177,12 @@ aarch64_expand_cpymem (rtx *operands)
> >/* Convert n to bits to make the rest of the code simpler.  */
> >n = n * BITS_PER_UNIT;
> >
> > -  /* Maximum amount to copy in one go.  The AArch64 back-end has
> integer modes
> > - larger than TImode, but we should not use them for loads/stores here.
> */
> > -  const int copy_limit = GET_MODE_BITSIZE (TImode);
> > +  /* Maximum amount to copy in one go.  We allow 256-bit chunks based
> on the
> > + AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter and
> > +TARGET_SIMD.  */
> > +  const int copy_limit = ((aarch64_tune_params.extra_tuning_flags
> > +  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> > + || !TARGET_SIMD)
> > +? GET_MODE_BITSIZE (TImode) :  256;
> 
> Should only be one space before “256”.
> 
> I guess at some point we should consider handling fixed-length SVE too, but
> that's only worth it for -msve-vector-bits=512 and higher.

Yes sure I will add this for future backlog.
> 
> Thanks,
> Richard


RE: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem expansion

2020-08-05 Thread Sudakshina Das
Hi Richard

Thank you for fixing this. I apologise for the trouble. I ran bootstrap only on
an earlier version of the patch, when I should have run it again on the final
one! ☹ I will be more careful in the future.

Thanks
Sudi


> -Original Message-
> From: Richard Sandiford 
> Sent: 05 August 2020 14:52
> To: Andreas Schwab 
> Cc: Sudakshina Das ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH V2] aarch64: Use Q-reg loads/stores in movmem
> expansion
> 
> Andreas Schwab  writes:
> > This breaks bootstrap.
> 
> I've pushed the below to fix this after bootstrapping & regression testing on
> aarch64-linux-gnu.
> 
> Richard



RE: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute.

2020-05-21 Thread Sudakshina Das
Hi Martin

> -Original Message-
> From: Martin Liška 
> Sent: 21 May 2020 16:01
> To: gcc-patches@gcc.gnu.org
> Cc: Sudakshina Das 
> Subject: [PATCH] Fix handling of OPT_mgeneral_regs_only in attribute.
> 
> Hi.
> 
> Similarly to:
> 
>  case OPT_mstrict_align:
>if (val)
>   opts->x_target_flags |= MASK_STRICT_ALIGN;
>else
>   opts->x_target_flags &= ~MASK_STRICT_ALIGN;
>return true;
> 
> the MASK_GENERAL_REGS_ONLY mask should be handled the same way.

My old patch added the -mno-* version of the option and hence needed the change.
Without the -mno- version for mgeneral-regs-only, I would imagine "val" would
only ever have 1 as its value. Am I missing something here?
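The masking question under discussion can be sketched with a hypothetical helper (a simplification of the option-handling pattern, not the actual aarch64-common.c code):

```c
/* With only the positive -mgeneral-regs-only spelling, val is always 1,
   so the plain "flags | mask" form suffices; the val-dependent form (as
   used for OPT_mstrict_align) only changes behaviour once a -mno-*
   variant can pass val == 0.  */
static unsigned int
handle_flag (unsigned int target_flags, int val, unsigned int mask)
{
  if (val)
    return target_flags | mask;   /* -mfoo: set the mask bit(s).  */
  return target_flags & ~mask;    /* -mno-foo: clear the mask bit(s).  */
}
```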

Sudi
> 
> @Sudakshina: The 'opts->x_target_flags |= MASK_STRICT_ALIGN' change is
> not backported to all active branches. Can you please do it?
> 
> Ready to be installed?
> 
> gcc/ChangeLog:
> 
> 2020-05-21  Martin Liska  
> 
>   * common/config/aarch64/aarch64-common.c
> (aarch64_handle_option):
>   Properly mask MASK_GENERAL_REGS_ONLY based on val.
> ---
>   gcc/common/config/aarch64/aarch64-common.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 



[PATCH, GCC, AArch64] Fix PR88398 for AArch64

2019-11-14 Thread Sudakshina Das
Hi

This patch is trying to fix PR88398 for AArch64. As discussed in the PR, 
loop unrolling is probably what we can do here. As an easy fix, the 
existing unroll_stupid unrolls the given example better than
unroll_runtime_iterations, since the loop contains a break inside it.

So all I have done here is:
1) Add a target hook so that this is AArch64 specific.
2) We are not unrolling the loops that decide_unroll_runtime_iterations
would reject.
3) Out of the ones that decide_unroll_runtime_iterations would accept, 
check if the loop has more than 1 exit (this is done in the new target 
hook) and if it does, try to unroll using unroll_stupid.
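The shape of loop this targets, one that decide_unroll_runtime_iterations would accept but that has a second exit edge, looks like the following (an illustrative example, not the PR testcase):

```c
/* A counted loop with an extra early exit: two exit edges out of the
   loop, so the new hook would steer it towards unroll_stupid.  */
int
find_first (const int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)
      return i;   /* early exit: the second exit edge */
  return -1;
}
```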

Regression tested on AArch64 and added the test from the PR. This gives 
an overall code size reduction of 2.35% and performance gain of 0.498% 
on Spec2017 Intrate.

Is this ok for trunk?

Thanks
Sudi

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  

PR88398
* cfgloop.h: Include target.h.
(lpt_dec): Move to...
* target.h (lpt_dec): ... Here.
* target.def: Define TARGET_LOOP_DECISION_ADJUST.
* loop-unroll.c (decide_unroll_runtime_iterations): Use new target hook.
(decide_unroll_stupid): Likewise.
* config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New function.
(TARGET_LOOP_DECISION_ADJUST): Define for AArch64.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST.

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  

PR88398
* gcc.target/aarch64/pr88398.c: New test.
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 0b0154ffd7bf031a005de993b101d9db6dd98c43..985c74e3b60728fc8c9d34b69634488cae3451cb 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -21,15 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_CFGLOOP_H
 
 #include "cfgloopmanip.h"
-
-/* Structure to hold decision about unrolling/peeling.  */
-enum lpt_dec
-{
-  LPT_NONE,
-  LPT_UNROLL_CONSTANT,
-  LPT_UNROLL_RUNTIME,
-  LPT_UNROLL_STUPID
-};
+#include "target.h"
 
 struct GTY (()) lpt_decision {
   enum lpt_dec decision;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 599d07a729e7438080f8b5240ee95037a49fb983..f31ac41d66257c01ead8d5f5b9b22379ecb5d276 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21093,6 +21093,39 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
 }
 }
 
+/* Implement TARGET_LOOP_DECISION_ADJUST.  CONSIDER is the loop decision
+   currently being checked for loop LOOP.  This returns a decision which could
+   either be LPT_UNROLL_STUPID or the current value in LOOP.  */
+static enum lpt_dec
+aarch64_loop_decision_adjust (enum lpt_dec consider, class loop *loop)
+{
+  switch (consider)
+{
+case LPT_UNROLL_CONSTANT:
+  return loop->lpt_decision.decision;
+
+case LPT_UNROLL_RUNTIME:
+/* Fall through.  */
+case LPT_UNROLL_STUPID:
+  {
+	vec edges = get_loop_exit_edges (loop);
+	if (edges.length () > 1)
+	  {
+	if (dump_file)
+	  fprintf (dump_file, ";; Need change in loop decision\n");
+	consider = LPT_UNROLL_STUPID;
+	return consider;
+	  }
+	return loop->lpt_decision.decision;
+  }
+
+case LPT_NONE:
+/* Fall through.  */
+default:
+  gcc_unreachable ();
+}
+}
+
 /* Implement TARGET_COMPUTE_PRESSURE_CLASSES.  */
 
 static int
@@ -21839,6 +21872,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 
+#undef TARGET_LOOP_DECISION_ADJUST
+#define TARGET_LOOP_DECISION_ADJUST aarch64_loop_decision_adjust
+
 #undef TARGET_SCHED_ADJUST_PRIORITY
 #define TARGET_SCHED_ADJUST_PRIORITY aarch64_sched_adjust_priority
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd9aed9874f4e6b2b0e2f8956ed6155975e643a8..61bd00e84c8a2a8865e95ba579c3b94790ab1331 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11857,6 +11857,15 @@ is required only when the target has special constraints like maximum
 number of memory accesses.
 @end deftypefn
 
+@deftypefn {Target Hook} {enum lpt_dec} TARGET_LOOP_DECISION_ADJUST (enum lpt_dec @var{consider}, class loop *@var{loop})
+This target hook returns either a new value for the loop unrolling
+decision or the existing value in @var{loop}. The parameter @var{consider}
+is the loop decision currently being tested. The parameter @var{loop} is a
+pointer to the loop, which is going to be checked for unrolling. This target
+hook is required only when the target wants to override the unrolling
+decisions.
+@end deftypefn
+
 @defmac POWI_MAX_MULTS
 If defined, this macro is interpreted as a signed integer C expression
 that specifies the maximum number of floating point multiplications
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2739e9ceec5ad7253ff9135da8dbe3bf6010e8d7..7a7f917fb45a6cc22f373ff16f8b78aa3e35f210 100644
--- a/gcc/

Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

2019-11-15 Thread Sudakshina Das
Hi Richard
​
I apologise I should have given more explanation on my cover letter.​
Although the bug was filed for vectorization, the conversation on it talked​
about loops with two exits not being supported in the vectorizer and being not​
being possible without lto and peeling causing more harm than benefit.​
​
There was also no clear consensus among the discussion about the best way to do​
unrolling. So I looked at Wilco's suggestion of unrolling here​
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398#c8​
Although unroll_stupid does not exactly unroll it as he shows but​
it gets closer than unroll_runtime_iterations. So I ran an experiment​
to see if unrolliong the loop with unroll_stupid gets any benefit. The code​
size benefit was easy to see with the small example but it also gave 
performance​
benefit on Spec2017. The benefit comes because unroll_runtime_iteration adds​
a switch case at the beginning for iteration check. This is less efficient​
because it creates too many branch close together specially for a loop which​
has more than 1 exit.​
        beq     .L70​
        cmp     x12, 1​
        beq     .L55​
        cmp     x12, 2​
        beq     .L57​
        cmp     x12, 3​
        beq     .L59​
        cmp     x12, 4​
        beq     .L61​
        cmp     x12, 5​
        beq     .L63​
        cmp     x12, 6​
        bne     .L72​
​
Finally I agree that unroll_stupid by default did not touch loops with multiple​
exists but that was marked as a "TODO" to change later so I assumed that check​
was not a hard requirement for the unrolling alghorithm.​
/* Do not unroll loops with branches inside -- it increases number​
     of mispredicts. ​
     TODO: this heuristic needs tunning; call inside the loop body​
     is also relatively good reason to not unroll.  */​
unroll_stupid is also not touched unless there is -funroll-all-loops or a​ loop
pragma incidcating that maybe this could be potentially harmful on certain 
targets.​
Since my experiments on AArch64 showed otherwise, I thought the easiest 
starting​
point would be to do this in a target hook and only for a specific case​
(multiple exits).​
​
Thanks​
Sudi

From: Richard Biener 

Sent: Friday, November 15, 2019 9:32 AM

To: Sudakshina Das 

Cc: gcc-patches@gcc.gnu.org ; Kyrill Tkachov 
; James Greenhalgh ; 
Richard Earnshaw ; bin.ch...@linux.alibaba.com 
;
 o...@ucw.cz 

Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
 

On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das  wrote:
>
> Hi
>
> This patch is trying to fix PR88398 for AArch64. As discussed in the PR,
> loop unrolling is probably what we can do here. As an easy fix, the
> existing unroll_stupid is unrolling the given example better than the
> unroll_runtime_iterations since the loop contains a break inside it.

Hm, the bug reference doesn't help me at all in reviewing this - the bug
is about vectorization.

So why is unroll_stupid better than unroll_runtime_iterations for a loop
with a break (or, as in your implementation, with multiple exits)?

I don't like this target hook; it seems like the general heuristics can be
improved here, but it seems unroll_stupid doesn't consider loops
with multiple exits at all?

Richard.


> So all I have done here is:
> 1) Add a target hook so that this is AArch64 specific.
> 2) We are not unrolling the loops that decide_unroll_runtime_iterations
> would reject.
> 3) Out of the ones that decide_unroll_runtime_iterations would accept,
> check if the loop has more than 1 exit (this is done in the new target
> hook) and if it does, try to unroll using unroll_stupid.
>
> Regression tested on AArch64 and added the test from the PR. This gives
> an overall code size reduction of 2.35% and performance gain of 0.498%
> on Spec2017 Intrate.
>
> Is this ok for trunk?
>
> Thanks
> Sudi
>
> gcc/ChangeLog:
>
> 2019-xx-xx  Sudakshina Das  
>
> PR88398
> * cfgloop.h: Include target.h.
> (lpt_dec): Move to...
> * target.h (lpt_dec): ... Here.
> * target.def: Define TARGET_LOOP_DECISION_ADJUST.
> * loop-unroll.c (decide_unroll_runtime_iterations): Use new target hook.
> (decide_unroll_stupid): Likewise.
> * config/aarch64/aarch64.c (aarch64_loop_decision_adjust): New function.
> (TARGET_LOOP_DECISION_ADJUST): Define for AArch64.
> * doc/tm.texi: Regenerated.
> * doc/tm.texi.in: Document TARGET_LOOP_DECISION_ADJUST.
>
> gcc/testsuite/ChangeLog:
>
> 2019-xx-xx  Sudakshina Das  
>
> PR88398
> * gcc.target/aarch64/pr88398.c: New test.


[Committed][Arm][testsuite] Fix failure for arm-fp16-ops-*.C

2019-12-02 Thread Sudakshina Das
Hi

Since r275022 which deprecates some uses of volatile, we have seen the 
following failures on arm-none-eabi and arm-none-linux-gnueabihf
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-1.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-2.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-5.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-6.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-7.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-8.C  -std=gnu++2a (test for 
excess errors)
Which catches the deprecated uses of volatile variables declared in 
arm-fp16-ops.h.

This patch removes the volatile declarations from the header. Since none 
of the tests are run with any high optimization levels, this change 
should not prevent the tests from serving their real function.

Tests with RUNTESTFLAGS="dg.exp=arm-fp16-ops-*.C" now pass with the 
patch on arm-none-eabi.
Committed as obvious r278905

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  

* g++.dg/ext/arm-fp16/arm-fp16-ops.h: Remove volatile keyword.

Thanks
Sudi
diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
index 320494e..a92e081 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
@@ -7,16 +7,16 @@
 #define TEST(e) assert (e)
 #define TESTNOT(e) assert (!(e))
 
-volatile __fp16 h0 = 0.0;
-volatile __fp16 h1 = 1.0;
-volatile __fp16 h42 = 42.0;
-volatile __fp16 hm2 = -2.0;
-volatile __fp16 temp;
-
-volatile float f0 = 0.0;
-volatile float f1 = 1.0;
-volatile float f42 = 42.0;
-volatile float fm2 = -2.0;
+__fp16 h0 = 0.0;
+__fp16 h1 = 1.0;
+__fp16 h42 = 42.0;
+__fp16 hm2 = -2.0;
+__fp16 temp;
+
+float f0 = 0.0;
+float f1 = 1.0;
+float f42 = 42.0;
+float fm2 = -2.0;
 
 int main (void)
 {


Re: [PATCH][GCC] Correct name of file in ChangeLog

2018-08-02 Thread Sudakshina Das

Hi Matthew

On 01/08/18 10:25, matthew.malcom...@arm.com wrote:

My first patch included an incorrect ChangeLog entry -- the filename was
misspelt. This corrects it.



I think this counts as an obvious change. I have committed this on your 
behalf.


Thanks
Sudi


Re: [PATCH][GCC][AARCH64] Use STLUR for atomic_store

2018-08-03 Thread Sudakshina Das

Hi Matthew

On 02/08/18 17:26, matthew.malcom...@arm.com wrote:

Use the STLUR instruction introduced in Armv8.4-a.
This instruction has store-release semantics like STLR but can take a
9-bit unscaled signed immediate offset.

Example test case:
```
void
foo ()
{
 int32_t *atomic_vals = calloc (4, sizeof (int32_t));
 atomic_store_explicit (atomic_vals + 1, 2, memory_order_release);
}
```

Before patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl  calloc
mov w1, 2
add x0, x0, 4
stlrw1, [x0]
ldp x29, x30, [sp], 16
ret
```

After patch generates
```
foo:
stp x29, x30, [sp, -16]!
mov x1, 4
mov x0, x1
mov x29, sp
bl  calloc
mov w1, 2
stlur   w1, [x0, 4]
ldp x29, x30, [sp], 16
ret
```

Full bootstrap and regression test done on aarch64.

Ok for trunk?

gcc/
2018-07-26  Matthew Malcomson  

 * config/aarch64/aarch64-protos.h
 (aarch64_offset_9bit_signed_unscaled_p): New declaration.
 * config/aarch64/aarch64.c
 (aarch64_offset_9bit_signed_unscaled_p): Rename from
 offset_9bit_signed_unscaled_p.
 * config/aarch64/aarch64.h (TARGET_ARMV8_4): Add feature macro.
 * config/aarch64/atomics.md (atomic_store): Allow offset
 and use stlur.
 * config/aarch64/constraints.md (Ust): New constraint.
 * config/aarch64/predicates.md.
 (aarch64_sync_or_stlur_memory_operand): New predicate.

gcc/testsuite/
2018-07-26  Matthew Malcomson  

* gcc.target/aarch64/atomic-store.c: New.



Thank you for doing this. I am not a maintainer but I have a few nits on
this patch:

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
af5db9c595385f7586692258f750b6aceb3ed9c8..630a75bf776fcdc374aa9ffa4bb020fea3719320 
100644

--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -393,6 +393,7 @@ void aarch64_split_add_offset (scalar_int_mode, rtx, 
rtx, rtx, rtx, rtx);

 bool aarch64_mov_operand_p (rtx, machine_mode);
...
-static inline bool
-offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
+bool
+aarch64_offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
   poly_int64 offset)

This needs to be aligned with the first argument

...

@@ -5837,7 +5837,7 @@ aarch64_classify_address (struct 
aarch64_address_info *info,

 ldr/str instructions (only big endian will get here).  */
  if (mode == CImode)
return (aarch64_offset_7bit_signed_scaled_p (TImode, offset)
-   && (offset_9bit_signed_unscaled_p (V16QImode, offset + 32)
+   && (aarch64_offset_9bit_signed_unscaled_p (V16QImode, 
offset + 32)

This line is longer than 80 characters

...

+;; STLUR instruction constraint requires Armv8.4
+(define_special_memory_constraint "Ust"
+ "@internal
+ A memory address suitable for use with an stlur instruction."
+  (and (match_operand 0 "aarch64_sync_or_stlur_memory_operand")
+   (match_test "TARGET_ARMV8_4")))
+

You are already checking for TARGET_ARMV8_4 inside
aarch64_sync_or_stlur_memory_operand. Also see my comment below for this
function.

...

+;; True if the operand is memory reference valid for one of a str or stlur
+;; operation.
+(define_predicate "aarch64_sync_or_stlur_memory_operand"
+  (ior (match_operand 0 "aarch64_sync_memory_operand")
+   (and (match_operand 0 "memory_operand")
+   (match_code "plus" "0")
+   (match_code "reg" "00")
+   (match_code "const_int" "01")))
+{
+  if (aarch64_sync_memory_operand (op, mode))
+return true;
+
+  if (!TARGET_ARMV8_4)
+return false;
+
+  rtx mem_op = XEXP (op, 0);
+  rtx plus_op0 = XEXP (mem_op, 0);
+  rtx plus_op1 = XEXP (mem_op, 1);
+
+  if (GET_MODE (plus_op0) != DImode)
+return false;
+
+  poly_int64 offset;
+  poly_int_rtx_p (plus_op1, &offset);
+  return aarch64_offset_9bit_signed_unscaled_p (mode, offset);
+})
+

This predicate body mixes up the two types of operands that you want to
test, especially when looked at from the constraint-check perspective.
I am assuming you would not want to use the non-immediate form of stlur
and instead only use it in the form:
STLUR , [, #]
and use stlr for the no-immediate alternative. Thus the constraint does not
need to check for aarch64_sync_memory_operand.
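For reference, the immediate-range check behind aarch64_offset_9bit_signed_unscaled_p reduces to a plain signed 9-bit range test. This is a minimal host-side sketch under that assumption (the real function also takes a machine mode, which the unscaled form ignores):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the range test behind aarch64_offset_9bit_signed_unscaled_p:
   a 9-bit signed, unscaled immediate covers byte offsets -256..255,
   independently of the access size.  */
static bool
offset_9bit_signed_unscaled_p (int64_t offset)
{
  return offset >= -256 && offset <= 255;
}
```

Any offset the constraint accepts must fall in this window; everything else falls back to the register-only STLR pattern.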

My suggestion would be to make this operand check separate. Something
like:

+(define_predicate "aarch64_sync_or_stlur_memory_operand"
+  (ior (match_operand 0 "aarch64_sync_memory_operand")
+   (match_operand 0 "aarch64_stlur_memory_operand")))

Where you define aarch64_stlur_memory_operand as

+bool aarch64_stlur_memory_operand (rtx op)
+{
+  if (!TARGET_ARMV8_4)
+return false;
+
+  rtx mem_op = XE

Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.

2018-08-14 Thread Sudakshina Das

Hi Tamar

On 13/08/18 17:27, Tamar Christina wrote:

Hi Thomas,

Thanks for the review.

I’ll correct the typo before committing if I have no other changes required by 
a maintainer.

Regards,
Tamar.



I am not a maintainer but I would like to point out something in your
patch. I think your test case will fail with -mabi=ilp32:

FAIL: gcc.target/aarch64/large_struct_copy_2.c (test for excess errors)
Excess errors:
/work/trunk/src/gcc/gcc/testsuite/gcc.target/aarch64/large_struct_copy_2.c:18:27: 
warning: overflow in conversion from 'long

long int' to 'long int' changes value from '4073709551611' to
'2080555003' [-Woverflow]

We have had more such failures recently, and James gave a very neat
way to make sure the mode comes out as you intend it to here:
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00233.html

I would just ask you to change the data types accordingly and test it
with -mabi=ilp32.
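A minimal sketch of the underlying pitfall (the constant is the one from the warning above; the helper name is invented): on an ILP32 target 'long' is 32 bits and silently truncates the 64-bit constant, while a fixed-width 64-bit type keeps the testcase ABI-independent.

```c
#include <assert.h>
#include <stdint.h>

/* The constant from the -Woverflow warning: on -mabi=ilp32, 'long' is
   32 bits, so the value wraps to 2080555003.  */
static const uint64_t big = UINT64_C (4073709551611);

/* Hypothetical helper: does the value fit in 32 bits?  */
static int
fits_in_32_bits (uint64_t v)
{
  return v <= UINT32_MAX;
}
```

Declaring the testcase's data as uint64_t (or int64_t) instead of long sidesteps the truncation on both ABIs.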

Thanks
Sudi


From: Thomas Preudhomme 
Sent: Monday, August 13, 2018 14:37
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh 
; Richard Earnshaw ; Marcus Shawcroft 

Subject: Re: [PATCH][GCC][AArch64] Limit movmem copies to TImode copies.

Hi Tamar,

Thanks for your patch.

Just one comment about your ChangeLog entry for the testsuite change: shouldn't 
it mention that it is a new testcase? The patch you attached seems to create 
the file.

Best regards,

Thomas

On Mon, 13 Aug 2018 at 10:33, Tamar Christina 
mailto:tamar.christ...@arm.com>> wrote:
Hi All,

On AArch64 we have integer modes larger than TImode, and while we can generate
moves for these they're not as efficient.

So instead make sure we limit the maximum we can copy to TImode.  This means
copying a 16-byte struct will issue one TImode copy, which will be done using a
single STP as we expect, but a CImode-sized copy won't issue CImode operations.
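As an illustration (not part of the patch; the type name is invented), a 16-byte aggregate is exactly one TImode-sized chunk, the case the cap is tuned for:

```c
#include <assert.h>
#include <string.h>

/* Illustrative 16-byte aggregate: copying it moves exactly 128 bits,
   i.e. one TImode-sized chunk (one LDP/STP pair on AArch64).  */
struct pair16
{
  long long a, b;
};

static void
copy_pair16 (struct pair16 *dst, const struct pair16 *src)
{
  memcpy (dst, src, sizeof *dst);  /* 16 bytes */
}
```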

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Crosstested on aarch64_be-none-elf with no issues.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-13  Tamar Christina  
mailto:tamar.christ...@arm.com>>

 * config/aarch64/aarch64.c (aarch64_expand_movmem): Set TImode max.

gcc/testsuite/
2018-08-13  Tamar Christina  
mailto:tamar.christ...@arm.com>>

 * gcc.target/aarch64/large_struct_copy_2.c: Add assembler scan.

--





Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

2017-11-22 Thread Sudakshina Das



On 22/11/17 15:21, Kyrill Tkachov wrote:


On 22/11/17 11:25, Sudi Das wrote:

Hi Kyrill and Christophe


In the case of soft-fp testing, there are other assembly directives apart 
from the vmov one which are also failing. The directives probably 
make more sense in the hard-fp context, so instead of removing the 
vmov, I have added the -mfloat-abi=hard option.


Is this ok for trunk? If yes could someone post it on my behalf?

Sudi


Thanks Sudi,

You're right, this whole test isn't written with softfp in mind. We 
might as well restrict it to -mfloat-abi=hard.

I've committed the patch with r255061.


Hi Kyrill

Thanks for the commit!


Would you like to get commit access to the SVN repo?
If you complete the form at 
https://sourceware.org/cgi-bin/pdw/ps_form.cgi

using my email address as the approver we can get that sorted out :)



Thanks again for the invite. I have filled out the form! :)

Sudi



Kyrill



*** gcc/testsuite/ChangeLog ***

2017-11-22  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-1.c: Add -mfloat-abi=hard option.




From: Kyrill Tkachov 
Sent: Monday, November 20, 2017 2:20 PM
To: Christophe Lyon
Cc: Sudi Das; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; 
Richard Earnshaw

Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c


On 20/11/17 14:14, Christophe Lyon wrote:

Hi,

On 17 November 2017 at 12:12, Kyrill  Tkachov
 wrote:

On 17/11/17 10:45, Sudi Das wrote:

Hi Kyrill

Thanks I have made the change.

Thanks Sudi, I've committed this on your behalf with r254863.

Kyrill



Sudi



From: Kyrill Tkachov 
Sent: Thursday, November 16, 2017 5:03 PM
To: Sudi Das; gcc-patches@gcc.gnu.org
Cc: nd; Ramana Radhakrishnan; Richard Earnshaw
Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c

Hi Sudi,

On 16/11/17 16:37, Sudi Das wrote:

Hi

This patch fixes the test case armv8_2-fp16-move-1.c for
arm-none-linux-gnueabihf, where 2 of the scan-assembler directives were
failing. We now generate fewer vmovs between core and VFP registers,
so those directives are changed to reflect that.

Is this ok for trunk?
If yes could someone commit it on my behalf?

Sudi


*** gcc/testsuite/ChangeLog ***

2017-11-16  Sudakshina Das  

 * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov
scan-assembler
 directives.


diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index bb4e68f..0ed8560 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
  /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, 
s[0-9]+,

s[0-9]+} 1 } }  */
  /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, 
s[0-9]+,

s[0-9]+} 1 } }  */
  -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, 
r[0-9]+} 4 }

}  */
-/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, 
s[0-9]+} 4 } }

*/
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, 
r[0-9]+} 2 } }

*/
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } 
}  */
  Some of the moves between core and fp registers were the result of
inefficient codegen, and in hindsight scanning for them was not very useful.
Now that we emit only the required ones, I think scanning for the plain
vmovs between two S-registers doesn't test anything useful.
So can you please just remove the second scan-assembler directive here?



You are probably already aware of that: the tests fail on
arm-none-linux-gnueabi/arm-none-eabi
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times
vmov\\.f16\\ts[0-9]+, r[0-9]+ 2 (found 38 times)

but this is not a regression, the previous version of the test had the
same problem.

Grrr, that's because the softfp ABI necessitates moves between core and
FP registers, so scanning for a particular number of vmovs between them
is just not gonna be stable across soft-float ABIs. At this point I'd
just remove the scan for vmovs completely as it doesn't seem to check
anything useful.

A patch to remove that scan for VMOV is pre-approved.

Thanks for reminding me of this Christophe, softfp tends to slip my 
mind :(

Kyrill


Christophe


Thanks,
Kyrill








[PATCH] Add myself as GCC maintainer

2017-11-22 Thread Sudakshina Das

Added myself as GCC maintainer with r255071

*** ChangeLog ***

2017-11-22  Sudakshina Das  

* MAINTAINERS (Write After Approval): Add myself.


Thanks
Sudi
diff --git a/ChangeLog b/ChangeLog
index 13b0321..adaec62 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2017-11-22  Sudakshina Das  
+
+	* MAINTAINERS (Write After Approval): Add myself.
+
 2017-11-06  Palmer Dabbelt  
 
 	* MAINTAINERS (RISC-V): Add Jim Wilson as a maintainer.
diff --git a/MAINTAINERS b/MAINTAINERS
index d207b58..fb45a3c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -359,6 +359,7 @@ Ian Dall	
 David Daney	
 Robin Dapp	
 Simon Dardis	
+Sudakshina Das	
 Bud Davis	
 Chris Demetriou	
 Sameera Deshpande


[PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-11-24 Thread Sudakshina Das

Hi

For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
  return (a < b) ? b : c;
}

when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm 
-mfloat-abi=hard trunk generates wrong code:


test_select:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16   s0, s0
vcvtb.f32.f16   s15, s1
vcmpe.f32   s0, s15
vmrsAPSR_nzcv, FPSCR
// <-- No conditional branch!
vmovs1, s2  @ __fp16
.L2:
vmovs0, s1  @ __fp16
bx  lr

There should have been a conditional branch there to skip one of the VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional
wherever needed.
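For reference, the intended semantics of the miscompiled function can be modelled in plain C with float (this is an invented reference model, not the testcase): the result must depend on the comparison, which is exactly what the missing conditional branch broke, since one of the two vmovs then executed unconditionally.

```c
#include <assert.h>

/* Reference semantics for test_select: the returned value must depend
   on the outcome of the comparison.  */
static float
ref_select (float a, float b, float c)
{
  return (a < b) ? b : c;
}
```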

Testing done: added a new test case and checked for regressions on
arm-none-linux-gnueabihf.


Is this ok for trunk?

Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-24  Sudakshina Das  

* config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-24  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-2.c: New test.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 075a938..61b6477 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -410,7 +410,10 @@
   gcc_unreachable ();
 }
  }
-  [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
+  [(set_attr "conds" "*, *, unconditional, *, unconditional, unconditional,\
+		  unconditional, unconditional, unconditional,\
+		  unconditional")
+   (set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
(set_attr "predicable_short_it" "no, no, no, yes,\
 no, no, no, no,\
 no, no")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
new file mode 100644
index 000..fcb857f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -marm" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+__fp16
+test_select (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a < b) ? b : c;
+}
+/* { dg-final { scan-assembler "bmi" } } */


Replying to an older patch ([PATCH] Fix failing test-case)

2017-11-30 Thread Sudakshina Das

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html

This patch fixed a test case switch-case-2.c. I am seeing 
switch-case-1.c failing on


arm-none-linux-gnueabihf:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; 
basic block[^\\n]*count 2000" 1 (found 0 times)


aarch64-none-linux-gnu:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; 
basic block[^\\n]*count 2000" 1 (found 2 times)


which looks pretty similar (also the same changes make it pass). Can you 
confirm?


Sudi


Re: Replying to an older patch ([PATCH] Fix failing test-case)

2017-11-30 Thread Sudakshina Das

On 30/11/17 11:03, Sudakshina Das wrote:

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html

This patch fixed a test case switch-case-2.c. I am seeing
switch-case-1.c failing on

arm-none-linux-gnueabihf:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";;
basic block[^\\n]*count 2000" 1 (found 0 times)

aarch64-none-linux-gnu:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";;
basic block[^\\n]*count 2000" 1 (found 2 times)

which looks pretty similar (also the same changes make it pass). Can you
confirm?

Sudi


I put the wrong email address earlier, and am adding more people on cc.


[PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-11-30 Thread Sudakshina Das

Hi

This patch is the fix for gcc-7 for the same issue as mentioned in:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html


For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
  return (a < b) ? b : c;
}

when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm 
-mfloat-abi=hard generates wrong code:


test_select:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16   s0, s0
vcvtb.f32.f16   s15, s1
vmov.f16r3, s2  @ __fp16
vcmpe.f32   s0, s15
vmrsAPSR_nzcv, FPSCR
// <-- No conditional branch
vmov.f16r3, s1  @ __fp16
.L1:
vmov.f16s0, r3  @ __fp16
bx  lr

There should have been a conditional branch there to skip one of the VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional.

Testing done: added a new test case and checked for regressions on a
bootstrapped arm-none-linux-gnueabihf.


Is this ok for gcc-7?

Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-30  Sudakshina Das  

* config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-30  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-2.c: New test.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index d8f77e2ffe4fdb7c952d6a5ac947d91f89ce259d..9f06c3da9526d09e43836a60f14da9a49671a393 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -456,7 +456,10 @@
   gcc_unreachable ();
 }
  }
-  [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
+  [(set_attr "conds" "*, *, unconditional, *, unconditional, unconditional,\
+		  unconditional, unconditional, unconditional,\
+		  unconditional")
+   (set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
(set_attr "predicable_short_it" "no, no, no, yes,\
 no, no, no, no,\
 no, no")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
new file mode 100644
index ..ac7d4e3f2a9fb1d70a9ce95062dc6db4a69272ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+__fp16
+test_select (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a < b) ? b : c;
+}
+/* { dg-final { scan-assembler "bpl" } } */


Re: [PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-11-30 Thread Sudakshina Das

Hi Kyrill

On 27/11/17 12:25, Kyrill Tkachov wrote:

Hi Sudi,

On 24/11/17 14:57, Sudi Das wrote:

Hi

For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
   return (a < b) ? b : c;
}

when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm
-mfloat-abi=hard trunk generates wrong code:

test_select:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 vcvtb.f32.f16   s0, s0
 vcvtb.f32.f16   s15, s1
 vcmpe.f32   s0, s15
 vmrsAPSR_nzcv, FPSCR
 // <-- No conditional branch!
 vmovs1, s2  @ __fp16
.L2:
 vmovs0, s1  @ __fp16
 bx  lr

There should have been a conditional branch there to skip one of the
VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional
wherever needed.

Testing done: Add a new test case and checked for regressions
arm-none-linux-gnueabihf.

Is this ok for trunk?



This is ok after assuming a bootstrap on arm-none-linux-gnueabihf passes
as well.
Does this bug appear on the GCC 7 branch?
If so, could you please test this patch on that branch as well if so?



I have tested the patch and also sent a new patch request for gcc-7
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02577.html

Thanks
Sudi


Thanks,
Kyrill


Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-24  Sudakshina Das  

* config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-24  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-2.c: New test.






Re: [PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-12-01 Thread Sudakshina Das

On 30/11/17 16:07, Kyrill Tkachov wrote:


On 30/11/17 16:06, Sudakshina Das wrote:

Hi Kyrill

On 27/11/17 12:25, Kyrill Tkachov wrote:
> Hi Sudi,
>
> On 24/11/17 14:57, Sudi Das wrote:
>> Hi
>>
>> For the following test case:
>> __fp16
>> test_select (__fp16 a, __fp16 b, __fp16 c)
>> {
>>return (a < b) ? b : c;
>> }
>>
>> when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm
>> -mfloat-abi=hard trunk generates wrong code:
>>
>> test_select:
>>  @ args = 0, pretend = 0, frame = 0
>>  @ frame_needed = 0, uses_anonymous_args = 0
>>  @ link register save eliminated.
>>  vcvtb.f32.f16   s0, s0
>>  vcvtb.f32.f16   s15, s1
>>  vcmpe.f32   s0, s15
>>  vmrsAPSR_nzcv, FPSCR
>>  // <-- No conditional branch!
>>  vmovs1, s2  @ __fp16
>> .L2:
>>  vmovs0, s1  @ __fp16
>>  bx  lr
>>
>> There should have been a conditional branch there to skip one of the
>> VMOVs.
>> This patch fixes this problem by making *movhf_vfp_fp16 unconditional
>> wherever needed.
>>
>> Testing done: Add a new test case and checked for regressions
>> arm-none-linux-gnueabihf.
>>
>> Is this ok for trunk?
>>
>
> This is ok after assuming a bootstrap on arm-none-linux-gnueabihf
passes
> as well.
> Does this bug appear on the GCC 7 branch?
> If so, could you please test this patch on that branch as well if so?
>

I have tested the patch and also sent a new patch request for gcc-7
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02577.html



Thanks Sudi, this is ok to commit to the branch after we let this patch
bake on trunk for a week without problems.



Committed as r255301 on trunk. Will wait for a week before committing to 
gcc-7.


Thanks
Sudi


Kyrill



Thanks
Sudi

> Thanks,
> Kyrill
>
>> Sudi
>>
>> ChangeLog entry are as follow:
>>
>> *** gcc/ChangeLog ***
>>
>> 2017-11-24  Sudakshina Das 
>>
>> * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2017-11-24  Sudakshina Das 
>>
>> * gcc.target/arm/armv8_2-fp16-move-2.c: New test.
>







Re: [PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-12-11 Thread Sudakshina Das

On 30/11/17 16:01, Sudakshina Das wrote:

Hi

This patch is the fix for gcc-7 for the same issue as mentioned in:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html


For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
   return (a < b) ? b : c;
}

when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm 
-mfloat-abi=hard generates wrong code:


test_select:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 vcvtb.f32.f16    s0, s0
 vcvtb.f32.f16    s15, s1
 vmov.f16    r3, s2    @ __fp16
 vcmpe.f32    s0, s15
 vmrs    APSR_nzcv, FPSCR
     // <-- No conditional branch
 vmov.f16    r3, s1    @ __fp16
.L1:
 vmov.f16    s0, r3    @ __fp16
 bx    lr

There should have been a conditional branch there to skip one of the VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional.

Testing done: Add a new test case and checked for regressions on 
bootstrapped arm-none-linux-gnueabihf.


Is this ok for gcc-7?

Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-30  Sudakshina Das  

 * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-30  Sudakshina Das  

 * gcc.target/arm/armv8_2-fp16-move-2.c: New test.


As per the trunk thread for this 
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html) committed as 
r255536 on gcc-7-branch for the backport.


Thanks
Sudi


Re: [PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-12-12 Thread Sudakshina Das

Hi Christophe

On 12/12/17 09:59, Christophe Lyon wrote:

Hi,



On 11 December 2017 at 18:12, Sudakshina Das  wrote:

On 30/11/17 16:01, Sudakshina Das wrote:


Hi

This patch is the fix for gcc-7 for the same issue as mentioned in:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html


For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
return (a < b) ? b : c;
}

when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm
-mfloat-abi=hard generates wrong code:

test_select:
  @ args = 0, pretend = 0, frame = 0
  @ frame_needed = 0, uses_anonymous_args = 0
  @ link register save eliminated.
  vcvtb.f32.f16s0, s0
  vcvtb.f32.f16s15, s1
  vmov.f16r3, s2@ __fp16
  vcmpe.f32s0, s15
  vmrsAPSR_nzcv, FPSCR
  // <-- No conditional branch
  vmov.f16r3, s1@ __fp16
.L1:
  vmov.f16s0, r3@ __fp16
  bxlr

There should have been a conditional branch there to skip one of the
VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional.

Testing done: Add a new test case and checked for regressions on
bootstrapped arm-none-linux-gnueabihf.

Is this ok for gcc-7?

Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-30  Sudakshina Das  

  * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-30  Sudakshina Das  

  * gcc.target/arm/armv8_2-fp16-move-2.c: New test.



As per the trunk thread for this
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html) committed as
r255536 on gcc-7-branch for the backport.



I've noticed that this backport fails on arm-none-linux-gnueabi and
arm-none-eabi.
I suspect this is partly due to the fact that I use a "recent"
dejagnu, and that it has to do with whether
dg-add-options are appended or prepended.
I'm seeing a compilation line with:

-mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard
-mfpu=fp-armv8 -mfloat-abi=softfp -march=armv8.2-a+fp16

leading to:
FAIL: gcc.target/arm/armv8_2-fp16-move-2.c scan-assembler bpl

I'm not sure why this works on trunk, but there I have only:
-marm -mfloat-abi=softfp -march=armv8.2-a+fp16

Maybe this has to do with the new way cpu/fpu options are parsed on trunk.


Sorry for this. I will try to investigate.

Thanks
Sudi



Christophe



Thanks
Sudi




[PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp

2017-12-13 Thread Sudakshina Das

Hi

This patch is a follow-up to the existing discussion at
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html
Bin had earlier submitted a patch to fix the ICE that occurs because of
the missing LTGT in aarch64-simd.md.
That discussion opened up a new bug report, PR81647, for inconsistent
behavior.


As discussed earlier on the gcc-patches discussion and on the bug 
report, PR81647 was occurring because of how UNEQ was handled in 
aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is 
guaranteed to not give an FP exception but LTGT might, 
__builtin_islessgreater gets converted to ~UNEQ very early on in 
fold_builtin_unordered_cmp. Thus I will post a separate patch for 
correcting how UNEQ and other unordered comparisons are handled in 
aarch64-simd.md.
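The equivalence being described can be sketched in plain C (islg is an invented reference model, not GCC code): islessgreater (a, b) is true exactly when a and b compare ordered and unequal, i.e. the negation of "unordered or equal" (~UNEQ), and it raises no FP exception on quiet NaNs.

```c
#include <assert.h>
#include <math.h>

/* Reference model: __builtin_islessgreater (a, b) is true iff a and b
   are ordered and unequal -- the negation of UNEQ ("unordered or
   equal").  Both isunordered and the quiet == comparison are
   exception-free on quiet NaNs.  */
static int
islg (double a, double b)
{
  return !(isunordered (a, b) || a == b);
}
```

The model agrees with the C99 islessgreater macro on ordinary values and on NaN operands.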


This patch is only adding the missing LTGT to plug the ICE.

Testing done: checked for regressions on a bootstrapped
aarch64-none-linux-gnu and added a new compile-time test case that
generates LTGT to make sure it doesn't ICE.


Is this ok for trunk?

Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-13  Sudakshina Das  
Bin Cheng  

PR target/81228
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT
to CCFPEmode.
* config/aarch64/aarch64-simd.md (vec_cmp): Add
LTGT.

*** gcc/testsuite/ChangeLog ***

2017-12-13  Sudakshina Das  

PR target/81228
* gcc.dg/pr81228.c: New.


Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp

2017-12-13 Thread Sudakshina Das

On 13/12/17 16:42, Sudakshina Das wrote:

Hi

This patch is a follow up to the existing discussions on 
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html
Bin had earlier submitted a patch to fix the ICE that occurs because of 
the missing LTGT in aarch64-simd.md.
That discussion opened up a new bug report PR81647 for an inconsistent 
behavior.


As discussed earlier on the gcc-patches discussion and on the bug 
report, PR81647 was occurring because of how UNEQ was handled in 
aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is 
guaranteed to not give an FP exception but LTGT might, 
__builtin_islessgreater gets converted to ~UNEQ very early on in 
fold_builtin_unordered_cmp. Thus I will post a separate patch for 
correcting how UNEQ and other unordered comparisons are handled in 
aarch64-simd.md.


This patch is only adding the missing LTGT to plug the ICE.

Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new compile time test case that gives 
out LTGT to make sure it doesn't ICE.


Is this ok for trunk?

Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-13  Sudakshina Das  
     Bin Cheng  

     PR target/81228
     * config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT
     to CCFPEmode.
     * config/aarch64/aarch64-simd.md (vec_cmp): Add
     LTGT.

*** gcc/testsuite/ChangeLog ***

2017-12-13  Sudakshina Das  

     PR target/81228
     * gcc.dg/pr81228.c: New.


Sorry
Forgot to attach the patch!

Sudi
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ae71af8334343a749f11db1801554eac01a33cac..f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2759,6 +2759,7 @@
 case UNEQ:
 case ORDERED:
 case UNORDERED:
+case LTGT:
   break;
 default:
   gcc_unreachable ();
@@ -2813,6 +2814,15 @@
   emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
   break;
 
+case LTGT:
+  /* LTGT is not guranteed to not generate a FP exception.  So let's
+	 go the faster way : ((a > b) || (b > a)).  */
+  emit_insn (gen_aarch64_cmgt (operands[0],
+	 operands[2], operands[3]));
+  emit_insn (gen_aarch64_cmgt (tmp, operands[3], operands[2]));
+  emit_insn (gen_ior3 (operands[0], operands[0], tmp));
+  break;
+
 case UNORDERED:
   /* Operands are ORDERED iff (a > b || b >= a), so we can compute
 	 UNORDERED as !ORDERED.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 75a6c0d0421354d7c0759292947eb5d407f5b703..3efb1b7548ea9b0ea5644d99a0677dbe5baba2ef 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4962,13 +4962,13 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
 	case UNGT:
 	case UNGE:
 	case UNEQ:
-	case LTGT:
 	  return CCFPmode;
 
 	case LT:
 	case LE:
 	case GT:
 	case GE:
+	case LTGT:
 	  return CCFPEmode;
 
 	default:
diff --git a/gcc/testsuite/gcc.dg/pr81228.c b/gcc/testsuite/gcc.dg/pr81228.c
new file mode 100644
index ..f7eecc510ad2acaa656a1ce5df0aafffa56b3bd9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81228.c
@@ -0,0 +1,21 @@
+/* PR target/81228.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-ssa" } */
+
+void *a;
+
+void b ()
+{
+  char c;
+  long d;
+  char *e = a;
+  for (; d; d++)
+  {
+double f, g;
+c = g < f || g > f;
+e[d] = c;
+  }
+}
+
+/* Let's make sure we do have a LTGT.  */
+/* { dg-final { scan-tree-dump "<>" "ssa" } } */


Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp

2017-12-14 Thread Sudakshina Das

Hi

On 13/12/17 16:56, James Greenhalgh wrote:

On Wed, Dec 13, 2017 at 04:45:33PM +, Sudi Das wrote:

On 13/12/17 16:42, Sudakshina Das wrote:

Hi

This patch is a follow up to the existing discussions on
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html
Bin had earlier submitted a patch to fix the ICE that occurs because of
the missing LTGT in aarch64-simd.md.
That discussion opened up a new bug report PR81647 for an inconsistent
behavior.

As discussed earlier on the gcc-patches discussion and on the bug
report, PR81647 was occurring because of how UNEQ was handled in
aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is
guaranteed to not give an FP exception but LTGT might,
__builtin_islessgreater gets converted to ~UNEQ very early on in
fold_builtin_unordered_cmp. Thus I will post a separate patch for
correcting how UNEQ and other unordered comparisons are handled in
aarch64-simd.md.

This patch is only adding the missing LTGT to plug the ICE.

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new compile time test case that gives
out LTGT to make sure it doesn't ICE.

Is this ok for trunk?


OK.

Thanks,
James



Thanks for the review.
Committed as r255625.
I think this needs a back-port as well to gcc-7-branch. Is that ok?

Sudi


[PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-14 Thread Sudakshina Das

Hi

This patch is a follow-up to my previous patch, r255536, which was a 
back-port fixing wrong code generation 
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
As pointed out by Christophe Lyon 
(https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case 
started to fail on the new dejagnu for arm-none-linux-gnueabi and 
arm-none-eabi.
This patch just removes the dg-add-options from the test case because I 
think dg-options has all that is needed anyway.


Testing: Since I could not reproduce the failure on my machine, 
Christophe, would it be possible for you to check if this patch fixes 
the regression for you?

Thanks
Sudi
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
index ac7d4e3f2a9fb1d70a9ce95062dc6db4a69272ff..09adddfd57ca13e831c276aef25621e7340bcfff 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
@@ -1,7 +1,6 @@
 /* { dg-do compile }  */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
 /* { dg-options "-O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard" }  */
-/* { dg-add-options arm_v8_2a_fp16_scalar }  */
 
 __fp16
 test_select (__fp16 a, __fp16 b, __fp16 c)


Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-14 Thread Sudakshina Das

Hi

On 14/12/17 17:37, Christophe Lyon wrote:

On 14 December 2017 at 17:05, Sudakshina Das  wrote:

Hi

This patch is a follow up on my previous patch with r255536 that was a
back-port for fixing a wrong code generation
(https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
As pointed out by Christophe Lyon
(https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test case
started to fail on the new dejagnu for arm-none-linux-gnueabi and
arm-none-eabi.
This patch just removes the dg-add-options from the test case because I
think dg-options has all that is needed anyway.

Testing: Since I could not reproduce the failure on my machine, Christophe
would it be possible for you to check if this patch fixes the
regression for you?



Manually tested on one of the offending configs; it did the trick.
Thanks



Thank you so much. I will wait for an OK and commit it!

Sudi


Christophe


Thanks
Sudi




Re: [PATCH][ARM][gcc-7] Fix regression on soft float targets for armv8_2-fp16-move-2.c

2017-12-15 Thread Sudakshina Das

Hi

On 14/12/17 18:26, Kyrill Tkachov wrote:


On 14/12/17 18:17, Sudi Das wrote:

Hi

On 14/12/17 17:37, Christophe Lyon wrote:
> On 14 December 2017 at 17:05, Sudakshina Das  wrote:
>> Hi
>>
>> This patch is a follow up on my previous patch with r255536 that was a
>> back-port for fixing a wrong code generation
>> (https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html).
>> As pointed out by Christophe Lyon
>> (https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00718.html) the test 
case

>> started to fail on the new dejagnu for arm-none-linux-gnueabi and
>> arm-none-eabi.
>> This patch just removes the dg-add-options from the test case 
because I

>> think dg-options has all that is needed anyway.
>>
>> Testing: Since I could not reproduce the failure on my machine, 
Christophe

>> would it be possible for you to check if this patch fixes the
>> regression for you?
>>
>
> Manually  tested on one of the offending configs, it did the trick.
> Thanks
>

Thank you so much. I will wait for an OK and commit it!



Thanks Sudi and Christophe.
The patch is ok with an appropriate ChangeLog entry.




Thanks Kyrill. I added the ChangeLog entry. Committed with r255681.

Sudi


Kyrill


Sudi

> Christophe
>
>> Thanks
>> Sudi







[PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md

2017-12-15 Thread Sudakshina Das

Hi

This patch fixes the inconsistent behavior observed at -O3 for the 
unordered comparisons. According to the online docs 
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
all of the following should not raise an FP exception:

- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these was generating exception-raising 
instructions such as fcmgt. This patch changes the instructions that are 
emitted so that no FP exceptions are raised. We first check each 
operand for NaNs and force any elements containing NaN to zero before 
using them in the compare.


Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b))



The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).


Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new test case.


Is this ok for trunk? This will probably need a back-port to 
gcc-7-branch as well.


Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
	* config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions
	for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
* gcc.target/aarch64/pr81647.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2731,10 +2731,10 @@
 	  break;
 	}
   /* Fall through.  */
-case UNGE:
+case UNLT:
   std::swap (operands[2], operands[3]);
   /* Fall through.  */
-case UNLE:
+case UNGT:
 case GT:
   comparison = gen_aarch64_cmgt;
   break;
@@ -2745,10 +2745,10 @@
 	  break;
 	}
   /* Fall through.  */
-case UNGT:
+case UNLE:
   std::swap (operands[2], operands[3]);
   /* Fall through.  */
-case UNLT:
+case UNGE:
 case GE:
   comparison = gen_aarch64_cmge;
   break;
@@ -2771,21 +2771,35 @@
 case UNGT:
 case UNLE:
 case UNLT:
-case NE:
-  /* FCM returns false for lanes which are unordered, so if we use
-	 the inverse of the comparison we actually want to emit, then
-	 invert the result, we will end up with the correct result.
-	 Note that a NE NaN and NaN NE b are true for all a, b.
-
-	 Our transformations are:
-	 a UNGE b -> !(b GT a)
-	 a UNGT b -> !(b GE a)
-	 a UNLE b -> !(a GT b)
-	 a UNLT b -> !(a GE b)
-	 a   NE b -> !(a EQ b)  */
-  gcc_assert (comparison != NULL);
-  emit_insn (comparison (operands[0], operands[2], operands[3]));
-  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  {
+	/* All of the above must not raise any FP exceptions.  Thus we first
+	   check each operand for NaNs and force any elements containing NaN to
+	   zero before using them in the compare.
+	   Example: UN (a, b) -> UNORDERED (a, b) |
+ (cm (isnan (a) ? 0.0 : a,
+	  isnan (b) ? 0.0 : b))
+	   We use the following transformations for doing the comparisions:
+	   a UNGE b -> a GE b
+	   a UNGT b -> a GT b
+	   a UNLE b -> b GE a
+	   a UNLT b -> b GT a.  */
+
+	rtx tmp0 = gen_reg_rtx (mode);
+	rtx tmp1 = gen_reg_rtx (mode);
+	rtx tmp2 = gen_reg_rtx (mode);
+	emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2]));
+	emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3]));
+	emit_insn (gen_and3 (tmp2, tmp0, tmp1));
+	emit_insn (gen_and3 (tmp0, tmp0,
+			lowpart_subreg (mode, operands[2], mode)));
+	emit_insn (gen_and3 (tmp1, tmp1,
+			lowpart_subreg (mode, operands[3], mode)));
+	gcc_assert (comparison != NULL);
+	emit_insn (comparison (operands[0],
+			   lowpart_subreg (mode, tmp0, mode),
+			   lowpart_subreg (mode, tmp1, mode)));
+	emit_insn (gen_orn3 (operands[0], tmp2, operands[0]));
+  }
   break;
 
 case LT:
@@ -2793,25 +2807,19 @@
 case GT:
 case GE:
 case EQ:
+case NE:
   /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
 	 As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
 	 a GE b -> a GE b
 	 a GT b -> a GT b
 	 a LE b -> b GE a
 	 a LT b -> b GT a
-	 a EQ b -> a EQ b  */
+	 a EQ b -> a EQ b
+	 a NE b -> ~(a EQ b)  */
   gcc_assert (comparison != NULL);
   emit_insn (comparison (operands[0], operands[2], operands[3]));
-  break;
-
-case UNEQ:
-  /* We first check (a > b ||  b > a) which is !UNEQ, inverting
-	 this result will then give us (a == b || a UNORDERED b).  */
-  emit_insn (gen_aarch64_cmgt (operands[0],
-	 operands

Re: [PATCH PR81228][AARCH64] Fix ICE by adding LTGT in vec_cmp

2017-12-18 Thread Sudakshina Das

On 14/12/17 10:38, Sudakshina Das wrote:

Hi

On 13/12/17 16:56, James Greenhalgh wrote:

On Wed, Dec 13, 2017 at 04:45:33PM +, Sudi Das wrote:

On 13/12/17 16:42, Sudakshina Das wrote:

Hi

This patch is a follow up to the existing discussions on
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01904.html
Bin had earlier submitted a patch to fix the ICE that occurs because of
the missing LTGT in aarch64-simd.md.
That discussion opened up a new bug report PR81647 for an inconsistent
behavior.

As discussed earlier on the gcc-patches discussion and on the bug
report, PR81647 was occurring because of how UNEQ was handled in
aarch64-simd.md rather than LTGT. Since __builtin_islessgreater is
guaranteed to not give an FP exception but LTGT might,
__builtin_islessgreater gets converted to ~UNEQ very early on in
fold_builtin_unordered_cmp. Thus I will post a separate patch for
correcting how UNEQ and other unordered comparisons are handled in
aarch64-simd.md.

This patch is only adding the missing LTGT to plug the ICE.

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new compile time test case that 
gives

out LTGT to make sure it doesn't ICE.

Is this ok for trunk?


OK.

Thanks,
James



Thanks for the review.
Committed as r255625.
I think this needs a back-port as well to gcc-7-branch. Is that ok?

Sudi


Backport Ping!

Sudi


Re: Replying to an older patch ([PATCH] Fix failing test-case)

2017-12-19 Thread Sudakshina Das

Hi Martin

On 19/12/17 10:49, Martin Liška wrote:

On 11/30/2017 12:03 PM, Sudakshina Das wrote:

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html

This patch fixed a test case switch-case-2.c. I am seeing switch-case-1.c 
failing on

arm-none-linux-gnueabihf:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic 
block[^\\n]*count 2000" 1 (found 0 times)

aarch64-none-linux-gnu:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; basic 
block[^\\n]*count 2000" 1 (found 2 times)

which looks pretty similar (also the same changes make it pass). Can you 
confirm?

Sudi


Hello.

There's patch for that. Can you please test it?



I have tested these changes and the test case passes for both 
arm-none-linux-gnueabihf and aarch64-none-linux-gnu.


Thanks
Sudi


Martin





[PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0

2018-01-03 Thread Sudakshina Das

Hi

This patch adds support for the missing transformation (x | y) == x -> 
(y & ~x) == 0.
The transformation for the (x & y) == x case has existed in 
simplify-rtx.c since 2014 (r218503); this patch only adds a 
couple of extra patterns for the IOR case.


Citing the example given in PR82439 :

Simple testcase (f1 should generate the same as f2):

int f1(int x, int y) { return (x | y) == x; }
int f2(int x, int y) { return (y & ~x) == 0; }

f1:
orr w1, w0, w1
cmp w1, w0
cset    w0, eq
ret
f2:
bics    wzr, w1, w0
cset    w0, eq
ret

This benefits targets that have the BICS instruction to generate better 
code. Wilco helped me in showing that even in targets that do not have 
the BICS instruction, this is no worse and gives out 2 instructions.

For example in x86:

0000000000000000 <f1>:
   0:   09 fe                   or     %edi,%esi
   2:   31 c0                   xor    %eax,%eax
   4:   39 fe                   cmp    %edi,%esi
   6:   0f 94 c0                sete   %al
   9:   c3                      retq

0000000000000010 <f2>:
  10:   f7 d7                   not    %edi
  12:   31 c0                   xor    %eax,%eax
  14:   85 f7                   test   %esi,%edi
  16:   0f 94 c0                sete   %al
  19:   c3                      retq

Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and arm-none-linux-gnueabihf and added new test 
cases.

Is this ok for trunk?

Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-01-03  Sudakshina Das  

PR target/82439
* simplify-rtx.c (simplify_relational_operation_1): Add
simplifications of (x|y) == x for BICS pattern.

*** gcc/testsuite/ChangeLog ***

2017-01-03  Sudakshina Das  

PR target/82439
* gcc.target/aarch64/bics_5.c: New test.
* gcc.target/arm/bics_5.c: Likewise.


Re: [PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0

2018-01-03 Thread Sudakshina Das

Hi

On 03/01/18 14:38, Segher Boessenkool wrote:

Hi!

On Wed, Jan 03, 2018 at 01:57:38PM +, Sudakshina Das wrote:

This patch add support for the missing transformation of (x | y) == x ->
(y & ~x) == 0.



Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and arm-none-linux-gnueabihf and added new test
cases.
Is this ok for trunk?


You forgot to include the patch :-)


(facepalm) This is the second time this has happened to me!
Sorry about this. Attaching now.

Sudi




Segher



diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e5cfd3d..17536d0 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5032,34 +5032,38 @@ simplify_relational_operation_1 (enum rtx_code code, machine_mode mode,
 simplify_gen_binary (XOR, cmp_mode,
 			 XEXP (op0, 1), op1));
 
-  /* (eq/ne (and x y) x) simplifies to (eq/ne (and (not y) x) 0), which
- can be implemented with a BICS instruction on some targets, or
- constant-folded if y is a constant.  */
+  /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
+ constant folding if x/y is a constant.  */
   if ((code == EQ || code == NE)
-  && op0code == AND
-  && rtx_equal_p (XEXP (op0, 0), op1)
+  && (op0code == AND || op0code == IOR)
   && !side_effects_p (op1)
   && op1 != CONST0_RTX (cmp_mode))
 {
-  rtx not_y = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 1), cmp_mode);
-  rtx lhs = simplify_gen_binary (AND, cmp_mode, not_y, XEXP (op0, 0));
+  /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
+	 (eq/ne (and (not y) x) 0).  */
+  if ((op0code == AND && rtx_equal_p (XEXP (op0, 0), op1))
+	  || (op0code == IOR && rtx_equal_p (XEXP (op0, 1), op1)))
+	{
+	  rtx not_y = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 1),
+	  cmp_mode);
+	  rtx lhs = simplify_gen_binary (AND, cmp_mode, not_y, XEXP (op0, 0));
 
-  return simplify_gen_relational (code, mode, cmp_mode, lhs,
-  CONST0_RTX (cmp_mode));
-}
+	  return simplify_gen_relational (code, mode, cmp_mode, lhs,
+	  CONST0_RTX (cmp_mode));
+	}
 
-  /* Likewise for (eq/ne (and x y) y).  */
-  if ((code == EQ || code == NE)
-  && op0code == AND
-  && rtx_equal_p (XEXP (op0, 1), op1)
-  && !side_effects_p (op1)
-  && op1 != CONST0_RTX (cmp_mode))
-{
-  rtx not_x = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 0), cmp_mode);
-  rtx lhs = simplify_gen_binary (AND, cmp_mode, not_x, XEXP (op0, 1));
+  /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
+	 (eq/ne (and (not x) y) 0).  */
+  if ((op0code == AND && rtx_equal_p (XEXP (op0, 1), op1))
+	  || (op0code == IOR && rtx_equal_p (XEXP (op0, 0), op1)))
+	{
+	  rtx not_x = simplify_gen_unary (NOT, cmp_mode, XEXP (op0, 0),
+	  cmp_mode);
+	  rtx lhs = simplify_gen_binary (AND, cmp_mode, not_x, XEXP (op0, 1));
 
-  return simplify_gen_relational (code, mode, cmp_mode, lhs,
-  CONST0_RTX (cmp_mode));
+	  return simplify_gen_relational (code, mode, cmp_mode, lhs,
+	  CONST0_RTX (cmp_mode));
+	}
 }
 
   /* (eq/ne (bswap x) C1) simplifies to (eq/ne x C2) with C2 swapped.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/bics_5.c b/gcc/testsuite/gcc.target/aarch64/bics_5.c
new file mode 100644
index 000..b9c2c40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/bics_5.c
@@ -0,0 +1,86 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+int
+bics_si_test1 (int a, int b, int c)
+{
+  if ((a | b) == a)
+return a;
+  else
+return c;
+}
+
+int
+bics_si_test2 (int a, int b, int c)
+{
+  if ((a | b) == b)
+return b;
+  else
+return c;
+}
+
+typedef long long s64;
+
+s64
+bics_di_test1 (s64 a, s64 b, s64 c)
+{
+  if ((a | b) == a)
+return a;
+  else
+return c;
+}
+
+s64
+bics_di_test2 (s64 a, s64 b, s64 c)
+{
+  if ((a | b) == b)
+return b;
+  else
+return c;
+}
+
+int
+main ()
+{
+  int x;
+  s64 y;
+
+  x = bics_si_test1 (0xf00d, 0xf11f, 0);
+  if (x != 0)
+abort ();
+
+  x = bics_si_test1 (0xf11f, 0xf00d, 0);
+  if (x != 0xf11f)
+abort ();
+
+  x = bics_si_test2 (0xf00d, 0xf11f, 0);
+  if (x != 0xf11f)
+abort ();
+
+  x = bics_si_test2 (0xf11f, 0xf00d, 0);
+  if (x != 0)
+abort ();
+
+  y = bics_di_test1 (0x10001000f00dll, 0x12341000f00dll, 0ll);
+  if (y != 0)
+abort ();
+
+  y = bics_di_test1 (0x12341000f00dll, 0x10001000f00dll, 0ll);
+  if (y != 0x12341000f00dll)
+abort ();
+
+  y = bics_di_test2 (0x10001000f00dll, 0x12341000f00dll, 0ll);
+  if (y != 0x12341000f00dll)
+abort ();
+
+  y = bics_di_test2 (0x12341000f00dll, 0x10001000f00dll, 0ll);
+  if (y != 0)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bics\twzr, w\[0-9\]+, w\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler

[PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-04 Thread Sudakshina Das

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when 
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] 
and -mthumb -march=armv6 -O[g,1,2,3].


As far as I could see, the crash was caused by the explicit VOIDmode 
argument that was passed to emit_store_flag_force (). Since the operand 
being compared was a long long, it was being forced into a VOIDmode 
register before the comparison (in prepare_cmp_insn ()) was done.


As pointed out by Kyrill, there is a comment on emit_store_flag() which 
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. 
 If it is VOIDmode, they cannot both be CONST_INT". This condition is 
not true in this case and thus I think it is suitable to change the 
argument.


Testing done: Checked for regressions on bootstrapped 
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test 
cases.


Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-04  Sudakshina Das  

PR target/82096
* optabs.c (expand_atomic_compare_and_swap): Change argument
to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2017-01-04  Sudakshina Das  

PR target/82096
* gcc.c-torture/compile/pr82096-1.c: New test.
* gcc.c-torture/compile/pr82096-2.c: Likewise.
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 225e955..45b018e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6295,7 +6295,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
   if (cc_reg)
 	{
 	  target_bool = emit_store_flag_force (target_bool, EQ, cc_reg,
-	   const0_rtx, VOIDmode, 0, 1);
+	   const0_rtx, mode, 0, 1);
 	  goto success;
 	}
   goto success_bool_from_val;
@@ -6323,7 +6323,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
 
  success_bool_from_val:
target_bool = emit_store_flag_force (target_bool, EQ, target_oval,
-	expected, VOIDmode, 1, 1);
+	expected, mode, 1, 1);
  success:
   /* Make sure that the oval output winds up where the caller asked.  */
   if (ptarget_oval)
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c
new file mode 100644
index 000..07eb5f6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-options "-march=armv5t -mthumb -mfloat-abi=soft" } */
+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c
new file mode 100644
index 000..2570b16
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-options "-march=armv6 -mthumb -mfloat-abi=soft" } */
+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}


Re: [PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0

2018-01-05 Thread Sudakshina Das

Hi Jeff

On 04/01/18 18:30, Jeff Law wrote:

On 01/03/2018 06:57 AM, Sudakshina Das wrote:

Hi

This patch add support for the missing transformation of (x | y) == x ->
(y & ~x) == 0.
The transformation for (x & y) == x case already exists in
simplify-rtx.c since 2014 as of r218503 and this patch only adds a
couple of extra patterns for the IOR case.

Citing the example given in PR82439 :

Simple testcase (f1 should generate the same as f2):

int f1(int x, int y) { return (x | y) == x; }
int f2(int x, int y) { return (y & ~x) == 0; }

f1:
 orr    w1, w0, w1
 cmp    w1, w0
 cset    w0, eq
 ret
f2:
 bics    wzr, w1, w0
 cset    w0, eq
 ret

This benefits targets that have the BICS instruction to generate better
code. Wilco helped me in showing that even in targets that do not have
the BICS instruction, this is no worse and gives out 2 instructions.
For example in x86:

 :
    0:    09 fe    or %edi,%esi
    2:    31 c0    xor    %eax,%eax
    4:    39 fe    cmp    %edi,%esi
    6:    0f 94 c0 sete   %al
    9:    c3   retq

0010 :
   10:    f7 d7    not    %edi
   12:    31 c0    xor    %eax,%eax
   14:    85 f7    test   %esi,%edi
   16:    0f 94 c0 sete   %al
   19:    c3   retq

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and arm-none-linux-gnueabihf and added new test
cases.
Is this ok for trunk?

Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-01-03  Sudakshina Das  

 PR target/82439
 * simplify-rtx.c (simplify_relational_operation_1): Add
 simplifications of (x|y) == x for BICS pattern.

*** gcc/testsuite/ChangeLog ***

2017-01-03  Sudakshina Das  

 PR target/82439
 * gcc.target/aarch64/bics_5.c: New test.
 * gcc.target/arm/bics_5.c: Likewise.

OK.


Thanks! Committed as r256275.

Sudi


jeff





Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-05 Thread Sudakshina Das

Hi Kyrill

On 04/01/18 16:36, Kyrill Tkachov wrote:

Hi Sudi,

On 04/01/18 15:35, Sudakshina Das wrote:

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused because of the
explicit VOIDmode argument that was sent to emit_store_flag_force ().
Since the comparing argument was a long long, it was being forced into a
VOID type register before the comparison (in prepare_cmp_insn()) is done.

As pointed out by Kyrill, there is a comment on emit_store_flag() which
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
  If it is VOIDmode, they cannot both be CONST_INT". This condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-04  Sudakshina Das  

    PR target/82096
    * optabs.c (expand_atomic_compare_and_swap): Change argument
    to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2017-01-04  Sudakshina Das  

    PR target/82096
    * gcc.c-torture/compile/pr82096-1.c: New test.
    * gcc.c-torture/compile/pr82096-2.c: Likewise.


diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c
new file mode 100644
index 000..07eb5f6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-options "-march=armv5t -mthumb -mfloat-abi=soft" } */

The tests in gcc.c-torture/compile/ are supposed to be 
target-independent, so it's best to not gate it on target arm*-*-*. 
Best to add the arm-specific options that you want using a 
dg-additional-options directive gated on target arm*-*-*.


+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c
new file mode 100644
index 000..2570b16
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-options "-march=armv6 -mthumb -mfloat-abi=soft" } */
+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}

I don't think we need an armv6 test here as the root cause is the same 
as on armv5t AFAICT, so this won't give us any extra code coverage.



I have made the changes in the test files as requested.

Thanks
Sudi


Thanks,
Kyrill



diff --git a/gcc/optabs.c b/gcc/optabs.c
index 225e955..45b018e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6295,7 +6295,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
   if (cc_reg)
 	{
 	  target_bool = emit_store_flag_force (target_bool, EQ, cc_reg,
-	   const0_rtx, VOIDmode, 0, 1);
+	   const0_rtx, mode, 0, 1);
 	  goto success;
 	}
   goto success_bool_from_val;
@@ -6323,7 +6323,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
 
  success_bool_from_val:
target_bool = emit_store_flag_force (target_bool, EQ, target_oval,
-	expected, VOIDmode, 1, 1);
+	expected, mode, 1, 1);
  success:
   /* Make sure that the oval output winds up where the caller asked.  */
   if (ptarget_oval)
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
new file mode 100644
index 000..9fed28c
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
@@ -0,0 +1,9 @@
+/* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */
+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}


Re: [PATCH PR81647][AARCH64] PING Fix handling of Unordered Comparisons in aarch64-simd.md

2018-01-05 Thread Sudakshina Das

PING

On 15/12/17 11:57, Sudakshina Das wrote:

Hi

This patch fixes the inconsistent behavior observed at -O3 for the 
unordered comparisons. According to the online docs 
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
all of the following should not raise an FP exception:

- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these was generating exception-raising
instructions such as fcmgt. This patch changes the instructions that are
emitted in order not to raise such exceptions. We first check each
operand for NaNs and force any elements containing NaN to zero before
using them in the compare.


Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : 
a, isnan (b) ? 0.0 : b))



The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).


Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new test case.


Is this ok for trunk? This will probably need a back-port to 
gcc-7-branch as well.


Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
	* config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for
	UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * gcc.target/aarch64/pr81647.c: New.




Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-05 Thread Sudakshina Das

Hi Jeff

On 05/01/18 18:44, Jeff Law wrote:

On 01/04/2018 08:35 AM, Sudakshina Das wrote:

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused because of the
explicit VOIDmode argument that was sent to emit_store_flag_force ().
Since the comparing argument was a long long, it was being forced into a
VOID type register before the comparison (in prepare_cmp_insn()) is done.

As pointed out by Kyrill, there is a comment on emit_store_flag() which
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
  If it is VOIDmode, they cannot both be CONST_INT". This condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-04  Sudakshina Das  

 PR target/82096
 * optabs.c (expand_atomic_compare_and_swap): Change argument
 to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2017-01-04  Sudakshina Das  

 PR target/82096
 * gcc.c-torture/compile/pr82096-1.c: New test.
	* gcc.c-torture/compile/pr82096-2.c: Likewise.

In the case where both (op0/op1) to
emit_store_flag/emit_store_flag_force are constants, don't we know the
result of the comparison and shouldn't we have optimized the store flag
to something simpler?

I feel like I must be missing something here.



emit_store_flag_force () is comparing a register to op0.

The 2 constant arguments are to the expand_atomic_compare_and_swap () 
function. emit_store_flag_force () is used in case when this function is 
called by the bool variant of the built-in function where the bool 
return value is computed by comparing the result register with the 
expected op0.


Sudi


Jeff






Re: C++ PATCHes to xvalue handling

2018-05-25 Thread Sudakshina Das

On 23/05/18 18:21, Jason Merrill wrote:

The first patch implements the adjustments from core issues 616 and
1213 to the value category of subobjects of class prvalues: they were
considered prvalues themselves, but that was kind of nonsensical.  Now
they are considered xvalues.  Along with this, I've removed the
diagnostic distinction between xvalues and prvalues when trying to use
one or the other as an lvalue; the important thing is that they are
rvalues.

The second patch corrects various issues with casts and xvalues/rvalue
references: we were treating an xvalue operand to dynamic_cast as an
lvalue, and we were objecting to casts from prvalue to rvalue
reference type.



With the second patch:
commit f7d2790049fd1e59af4b69ee12f7c101cfe4cdab
Author: jason 
Date:   Wed May 23 17:21:39 2018 +

Fix cast to rvalue reference from prvalue.

* cvt.c (diagnose_ref_binding): Handle rvalue reference.
* rtti.c (build_dynamic_cast_1): Don't try to build a reference to
non-class type.  Handle xvalue argument.
* typeck.c (build_reinterpret_cast_1): Allow cast from prvalue to
rvalue reference.
* semantics.c (finish_compound_literal): Do direct-initialization,
not cast, to initialize a reference.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@260622 
138bc75d-0d04-0410-961f-82ee72b054a4


I have observed the following failure in Spec2017 while building 
510.parest_r on aarch64-none-linux-gnu


aarch64-none-linux-gnu-g++ -c -o
source/numerics/matrices.all_dimensions.o -DSPEC -DNDEBUG -Iinclude -I.
-DSPEC_AUTO_SUPPRESS_OPENMP -mcpu=cortex-a57+crypto -Ofast
-fomit-frame-pointer -fpermissive -DSPEC_LP64
source/numerics/matrices.all_dimensions.cc


source/numerics/matrices.all_dimensions.cc: In static member function
'static void dealii::MatrixTools::apply_boundary_values(const
std::map&, dealii::BlockSparseMatrix&,
dealii::BlockVector&, dealii::BlockVector&, bool)':

source/numerics/matrices.all_dimensions.cc:469:50: error: lvalue
required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row]];

source/numerics/matrices.all_dimensions.cc:472:55: error: lvalue
required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row]+1],

source/numerics/matrices.all_dimensions.cc:474:55: error: lvalue
required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row+1]],

source/numerics/matrices.all_dimensions.cc:479:49: error: lvalue
required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row]],

source/numerics/matrices.all_dimensions.cc:481:51: error: lvalue
required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[row+1]],

source/numerics/matrices.all_dimensions.cc:510:50: error: lvalue
required as unary '&' operand
   [this_sparsity.get_rowstart_indices()[0]]);

Sudi


Tested x86_64-pc-linux-gnu, applying to trunk.





Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-07 Thread Sudakshina Das

On 02/05/18 18:28, Jeff Law wrote:

On 03/14/2018 11:40 AM, Sudakshina Das wrote:

Hi

This patch is another partial fix for PR 84521. This is adding a
definition to one of the target hooks used in the SJLJ implementation so
that AArch64 defines the hard_frame_pointer_rtx as the
TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is
still a lot more work to be done for these builtins in the future.

Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added
new test.

Is this ok for trunk?

Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

 * builtins.c (expand_builtin_setjmp_receiver): Update condition
 to restore frame pointer.
 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update
 comment.
 * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value):
 New.
 (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

 * gcc.c-torture/execute/pr84521.c: New test.

So just to be clear, you do _not_ want the frame pointer restored here?
Right?

aarch64_builtin_setjmp_frame_value always returns hard_frame_pointer_rtx
which will cause the generic code in builtins.c to not restore the frame
pointer.

Have you looked at other targets which define builtin_setjmp_frame_value
to determine if they'll do the right thing.  x86 and sparc are the most
important.  I see that arc, vax and avr also define that hook, but are
obviously harder to test.



Sorry this fell off my radar. I have reg-tested it on x86 and tried it
on the sparc machine from the gcc farm, but I couldn't finish
the run and now it's showing to be unreachable.

Sudi


jeff






Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-14 Thread Sudakshina Das

Hi Eric

On 07/06/18 16:33, Eric Botcazou wrote:

Sorry this fell off my radar. I have reg-tested it on x86 and tried it
on the sparc machine from the gcc farm, but I couldn't finish
the run and now it's showing to be unreachable.


The patch is a no-op for SPARC because it defines the nonlocal_goto pattern.

But I would nevertheless strongly suggest _not_ fiddling with the generic code
like that and just defining the nonlocal_goto pattern for Aarch64 instead.



Thank you for the suggestion, I have edited the patch accordingly and
defined the nonlocal_goto pattern for AArch64. This has also helped take
care of the issue with __builtin_longjmp that Wilco had mentioned in his
comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19).

I have also modified the test case according to Wilco's comment to add
an extra jump buffer. This test case passes on AArch64 but fails on
x86 trunk as follows (it may fail on other targets as well):

FAIL: gcc.c-torture/execute/pr84521.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test

Testing: Bootstrapped and regtested on aarch64-none-linux-gnu.

Is this ok for trunk?

Sudi

*** gcc/ChangeLog ***

2018-06-14  Sudakshina Das  

PR target/84521
* config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment.
* config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add
cfun->has_nonlocal_label to force frame chain.
(aarch64_builtin_setjmp_frame_value): New.
(TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.
* config/aarch64/aarch64.md (nonlocal_goto): New.

*** gcc/testsuite/ChangeLog ***

2018-06-14  Sudakshina Das  

PR target/84521
* gcc.c-torture/execute/pr84521.c: New test.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 976f9af..f042def 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version;
 #define EH_RETURN_STACKADJ_RTX	gen_rtx_REG (Pmode, R4_REGNUM)
 #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
-/* Don't use __builtin_setjmp until we've defined it.  */
+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
 #undef DONT_USE_BUILTIN_SETJMP
 #define DONT_USE_BUILTIN_SETJMP 1
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bd0ac2f..95f7fe3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3998,7 +3998,7 @@ static bool
 aarch64_needs_frame_chain (void)
 {
   /* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed || crtl->calls_eh_return || cfun->has_nonlocal_label)
 return true;
 
   /* A leaf function cannot have calls or write LR.  */
@@ -12213,6 +12213,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
 }
 
+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
 /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
 
 static tree
@@ -17829,6 +17836,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
 
+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG aarch64_function_arg
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 830f976..381fd83 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6081,6 +6081,30 @@
   DONE;
 })
 
+;; This is broadly similar to the builtins.c except that it uses
+;; temporaries to load the incoming SP and FP.
+(define_expand "nonlocal_goto"
+  [(use (match_operand 0 "general_operand"))
+   (use (match_operand 1 "general_operand"))
+   (use (match_operand 2 "general_operand"))
+   (use (match_operand 3 "general_operand"))]
+  ""
+{
+rtx label_in = copy_to_reg (operands[1]);
+rtx fp_in = copy_to_reg (operands[3]);
+rtx sp_in = copy_to_reg (operands[2]);
+
+emit_move_insn (hard_frame_pointer_rtx, fp_i

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-25 Thread Sudakshina Das

PING!

On 14/06/18 12:10, Sudakshina Das wrote:

Hi Eric

On 07/06/18 16:33, Eric Botcazou wrote:

Sorry this fell off my radar. I have reg-tested it on x86 and tried it
on the sparc machine from the gcc farm, but I couldn't finish
the run and now it's showing to be unreachable.


The patch is a no-op for SPARC because it defines the nonlocal_goto 
pattern.


But I would nevertheless strongly suggest _not_ fiddling with the 
generic code
like that and just defining the nonlocal_goto pattern for Aarch64 
instead.




Thank you for the suggestion, I have edited the patch accordingly and
defined the nonlocal_goto pattern for AArch64. This has also helped take
care of the issue with __builtin_longjmp that Wilco had mentioned in his
comment on the PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521#c19).

I have also modified the test case according to Wilco's comment to add
an extra jump buffer. This test case passes on AArch64 but fails on
x86 trunk as follows (it may fail on other targets as well):

FAIL: gcc.c-torture/execute/pr84521.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr84521.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test

Testing: Bootstrapped and regtested on aarch64-none-linux-gnu.

Is this ok for trunk?

Sudi

*** gcc/ChangeLog ***

2018-06-14  Sudakshina Das  

 PR target/84521
 * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update comment.
 * config/aarch64/aarch64.c (aarch64_needs_frame_chain): Add
 cfun->has_nonlocal_label to force frame chain.
 (aarch64_builtin_setjmp_frame_value): New.
 (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.
 * config/aarch64/aarch64.md (nonlocal_goto): New.

*** gcc/testsuite/ChangeLog ***

2018-06-14  Sudakshina Das  

 PR target/84521
 * gcc.c-torture/execute/pr84521.c: New test.




[PATCH PR81228][AARCH64][gcc-7] Backport r255625 : Fix ICE by adding LTGT

2018-01-09 Thread Sudakshina Das

Hi

This patch only adds the missing LTGT to plug the ICE. This is a
backport of trunk r255625.


Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new compile-time test case that
generates LTGT to make sure it doesn't ICE.


Is this ok for trunk?

Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2018-01-09  Sudakshina Das  
Bin Cheng  

Backport from mainline:
2017-12-14  Sudakshina Das  
Bin Cheng  

PR target/81228
	* config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT
	to CCFPEmode.
	* config/aarch64/aarch64-simd.md (vec_cmp): Add LTGT.

*** gcc/testsuite/ChangeLog ***

2017-01-09  Sudakshina Das  

Backport from mainline:
2017-12-14  Sudakshina Das  

PR target/81228
* gcc.dg/pr81228.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c462164..1e0a346 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2490,6 +2490,7 @@
 case UNEQ:
 case ORDERED:
 case UNORDERED:
+case LTGT:
   break;
 default:
   gcc_unreachable ();
@@ -2544,6 +2545,15 @@
   emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
   break;
 
+case LTGT:
+  /* LTGT is not guaranteed not to raise an FP exception.  So let's
+	 go the faster way: ((a > b) || (b > a)).  */
+  emit_insn (gen_aarch64_cmgt (operands[0],
+	 operands[2], operands[3]));
+  emit_insn (gen_aarch64_cmgt (tmp, operands[3], operands[2]));
+  emit_insn (gen_ior3 (operands[0], operands[0], tmp));
+  break;
+
 case UNORDERED:
   /* Operands are ORDERED iff (a > b || b >= a), so we can compute
 	 UNORDERED as !ORDERED.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 436091a..db517ca 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4664,13 +4664,13 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
 	case UNGT:
 	case UNGE:
 	case UNEQ:
-	case LTGT:
 	  return CCFPmode;
 
 	case LT:
 	case LE:
 	case GT:
 	case GE:
+	case LTGT:
 	  return CCFPEmode;
 
 	default:
diff --git a/gcc/testsuite/gcc.dg/pr81228.c b/gcc/testsuite/gcc.dg/pr81228.c
new file mode 100644
index 000..f7eecc5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr81228.c
@@ -0,0 +1,21 @@
+/* PR target/81228.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-ssa" } */
+
+void *a;
+
+void b ()
+{
+  char c;
+  long d;
+  char *e = a;
+  for (; d; d++)
+  {
+double f, g;
+c = g < f || g > f;
+e[d] = c;
+  }
+}
+
+/* Let's make sure we do have a LTGT.  */
+/* { dg-final { scan-tree-dump "<>" "ssa" } } */


Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-10 Thread Sudakshina Das

Hi Jeff

On 09/01/18 23:43, Jeff Law wrote:

On 01/05/2018 12:25 PM, Sudakshina Das wrote:

Hi Jeff

On 05/01/18 18:44, Jeff Law wrote:

On 01/04/2018 08:35 AM, Sudakshina Das wrote:

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused because of the
explicit VOIDmode argument that was sent to emit_store_flag_force ().
Since the comparing argument was a long long, it was being forced into a
VOID type register before the comparison (in prepare_cmp_insn()) is done.


As pointed out by Kyrill, there is a comment on emit_store_flag() which
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
   If it is VOIDmode, they cannot both be CONST_INT". This condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-04  Sudakshina Das  

  PR target/82096
  * optabs.c (expand_atomic_compare_and_swap): Change argument
  to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2017-01-04  Sudakshina Das  

  PR target/82096
  * gcc.c-torture/compile/pr82096-1.c: New test.
  * gcc.c-torture/compile/pr82096-2.c: Likewise.

In the case where both (op0/op1) to
emit_store_flag/emit_store_flag_force are constants, don't we know the
result of the comparison and shouldn't we have optimized the store flag
to something simpler?

I feel like I must be missing something here.



emit_store_flag_force () is comparing a register to op0.

?
/* Emit a store-flags instruction for comparison CODE on OP0 and OP1
and storing in TARGET.  Normally return TARGET.
Return 0 if that cannot be done.

MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.  If
it is VOIDmode, they cannot both be CONST_INT.


So we're comparing op0 and op1 AFAICT.  One, but not both can be a
CONST_INT.  If both are a CONST_INT, then you need to address the
problem in the caller (by optimizing away the condition).  If you've got
a REG and a CONST_INT, then the mode should be taken from the REG operand.







The 2 constant arguments are to the expand_atomic_compare_and_swap ()
function. emit_store_flag_force () is used in case when this function is
called by the bool variant of the built-in function where the bool
return value is computed by comparing the result register with the
expected op0.

So if only one of the two objects is a CONST_INT, then the mode should
come from the other object.  I think that's the fundamental problem here
and that you're just papering over it by changing the caller.

I think my earlier explanation was a bit misleading and I may have
rushed into quoting the comment about both operands being const for
emit_store_flag_force(). The problem is with the function itself, and I
agree that changing the function to add the code below is a better
approach than changing the caller. I will change the patch and test it.


Thanks
Sudi



For example in emit_store_flag_1 we have this code:

   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */

   if (swap_commutative_operands_p (op0, op1))
 {
   std::swap (op0, op1);
   code = swap_condition (code);
 }

   if (mode == VOIDmode)
 mode = GET_MODE (op0);

I think if you do this in emit_store_flag_force as well everything will
"just work".

You can put it after this call/test pair:

  /* First see if emit_store_flag can do the job.  */
   tem = emit_store_flag (target, code, op0, op1, mode, unsignedp,
normalizep);
   if (tem != 0)
 return tem;


jeff





Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-10 Thread Sudakshina Das

Hi Jeff

On 10/01/18 10:44, Sudakshina Das wrote:

Hi Jeff

On 09/01/18 23:43, Jeff Law wrote:

On 01/05/2018 12:25 PM, Sudakshina Das wrote:

Hi Jeff

On 05/01/18 18:44, Jeff Law wrote:

On 01/04/2018 08:35 AM, Sudakshina Das wrote:

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused because of the
explicit VOIDmode argument that was sent to emit_store_flag_force ().
Since the comparing argument was a long long, it was being forced into a
VOID type register before the comparison (in prepare_cmp_insn()) is done.




As pointed out by Kyrill, there is a comment on emit_store_flag() which
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
If it is VOIDmode, they cannot both be CONST_INT". This condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-04  Sudakshina Das  

  PR target/82096
  * optabs.c (expand_atomic_compare_and_swap): Change argument
  to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2017-01-04  Sudakshina Das  

  PR target/82096
  * gcc.c-torture/compile/pr82096-1.c: New test.
  * gcc.c-torture/compile/pr82096-2.c: Likewise.

In the case where both (op0/op1) to
emit_store_flag/emit_store_flag_force are constants, don't we know the
result of the comparison and shouldn't we have optimized the store flag
to something simpler?

I feel like I must be missing something here.



emit_store_flag_force () is comparing a register to op0.

?
/* Emit a store-flags instruction for comparison CODE on OP0 and OP1
    and storing in TARGET.  Normally return TARGET.
    Return 0 if that cannot be done.

    MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
    If it is VOIDmode, they cannot both be CONST_INT.


So we're comparing op0 and op1 AFAICT.  One, but not both can be a
CONST_INT.  If both are a CONST_INT, then you need to address the
problem in the caller (by optimizing away the condition).  If you've got
a REG and a CONST_INT, then the mode should be taken from the REG operand.








The 2 constant arguments are to the expand_atomic_compare_and_swap ()
function. emit_store_flag_force () is used in case when this function is
called by the bool variant of the built-in function where the bool
return value is computed by comparing the result register with the
expected op0.

So if only one of the two objects is a CONST_INT, then the mode should
come from the other object.  I think that's the fundamental problem here
and that you're just papering over it by changing the caller.

I think my earlier explanation was a bit misleading and I may have
rushed into quoting the comment about both operands being const for
emit_store_flag_force(). The problem is with the function itself, and I
agree that changing the function to add the code below is a better
approach than changing the caller. I will change the patch and test it.




This is the updated patch according to your suggestions.

Testing: Checked for regressions on arm-none-linux-gnueabihf and added 
new test case.


Thanks
Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-10  Sudakshina Das  

PR target/82096
* expmed.c (emit_store_flag_force): Swap if const op0
and change VOIDmode to mode of op0.

*** gcc/testsuite/ChangeLog ***

2017-01-10  Sudakshina Das  

PR target/82096
* gcc.c-torture/compile/pr82096.c: New test.



Thanks
Sudi



For example in emit_store_flag_1 we have this code:

   /* If one operand is constant, make it the second one.  Only do this
  if the other operand is not constant as well.  */

   if (swap_commutative_operands_p (op0, op1))
 {
   std::swap (op0, op1);
   code = swap_condition (code);
 }

   if (mode == VOIDmode)
 mode = GET_MODE (op0);

I think if you do this in emit_store_flag_force as well everything will
"just work".

You can put it after this call/test pair:

  /* First see if emit_store_flag can do the job.  */
   tem = emit_store_flag (target, code, op0, op1, mode, unsignedp,
normalizep);
   if (tem != 0)
 return tem;


jeff





diff --git a/gcc/expmed.c b/gcc/expmed.c
index 6b22946..142d542 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -6084,6 +6084,18 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
   if (tem != 0)
 return tem;
 
+  /* If one operand is constant, make it the second one.  Only do this
+ if the other operand is not constant as well.  */
+
+  if (swap_commutative_operands_p (op0, op1))
+{
+  std::s

Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-11 Thread Sudakshina Das

Hi Jeff

On 10/01/18 21:08, Jeff Law wrote:

On 01/10/2018 09:25 AM, Sudakshina Das wrote:

Hi Jeff

On 10/01/18 10:44, Sudakshina Das wrote:

Hi Jeff

On 09/01/18 23:43, Jeff Law wrote:

On 01/05/2018 12:25 PM, Sudakshina Das wrote:

Hi Jeff

On 05/01/18 18:44, Jeff Law wrote:

On 01/04/2018 08:35 AM, Sudakshina Das wrote:

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused because of the
explicit VOIDmode argument that was sent to emit_store_flag_force ().
Since the comparing argument was a long long, it was being forced into a
VOID type register before the comparison (in prepare_cmp_insn()) is done.



As pointed out by Kyrill, there is a comment on emit_store_flag() which
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
If it is VOIDmode, they cannot both be CONST_INT". This condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new
test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-04  Sudakshina Das  

   PR target/82096
   * optabs.c (expand_atomic_compare_and_swap): Change argument
   to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2017-01-04  Sudakshina Das  

   PR target/82096
   * gcc.c-torture/compile/pr82096-1.c: New test.
   * gcc.c-torture/compile/pr82096-2.c: Likewise.

In the case where both (op0/op1) to
emit_store_flag/emit_store_flag_force are constants, don't we know the
result of the comparison and shouldn't we have optimized the store
flag
to something simpler?

I feel like I must be missing something here.



emit_store_flag_force () is comparing a register to op0.

?
/* Emit a store-flags instruction for comparison CODE on OP0 and OP1
     and storing in TARGET.  Normally return TARGET.
     Return 0 if that cannot be done.

     MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
     If it is VOIDmode, they cannot both be CONST_INT.


So we're comparing op0 and op1 AFAICT.  One, but not both can be a
CONST_INT.  If both are a CONST_INT, then you need to address the
problem in the caller (by optimizing away the condition).  If you've got
a REG and a CONST_INT, then the mode should be taken from the REG operand.







The 2 constant arguments are to the expand_atomic_compare_and_swap ()
function. emit_store_flag_force () is used in case when this function is
called by the bool variant of the built-in function where the bool
return value is computed by comparing the result register with the
expected op0.

So if only one of the two objects is a CONST_INT, then the mode should
come from the other object.  I think that's the fundamental problem here
and that you're just papering over it by changing the caller.


I think my earlier explanation was a bit misleading and I may have
rushed into quoting the comment about both operands being const for
emit_store_flag_force(). The problem is with the function itself, and I
agree that changing the function to add the code below is a better
approach than changing the caller. I will change the patch and test it.



This is the updated patch according to your suggestions.

Testing: Checked for regressions on arm-none-linux-gnueabihf and added
new test case.

Thanks
Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-10  Sudakshina Das  

 PR target/82096
 * expmed.c (emit_store_flag_force): Swap if const op0
 and change VOIDmode to mode of op0.

*** gcc/testsuite/ChangeLog ***

2017-01-10  Sudakshina Das  

 PR target/82096
 * gcc.c-torture/compile/pr82096.c: New test.

OK.


Thanks. Committed as r256526.
Sudi


jeff





[PATCH][ARM] Fix test fail with conflicting -mfloat-abi

2018-01-12 Thread Sudakshina Das

Hi

This patch fixes my earlier test case that fails for arm-none-eabi
with an explicit user option for -mfloat-abi that conflicts with
the test case options. I have added a guard to skip the test
in those cases.

@Christophe:
Sorry about this. I think this should fix the test case.
Can you please confirm if this works for you?

Thanks
Sudi

gcc/testsuite/ChangeLog

2018-01-12  Sudakshina Das  

* gcc.c-torture/compile/pr82096.c: Add dg-skip-if
directive.
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
index 9fed28c..759d390 100644
--- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
 /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */
 
 static long long AL[24];


Re: [PATCH][ARM] Fix test fail with conflicting -mfloat-abi

2018-01-16 Thread Sudakshina Das

Hi Christophe

On 12/01/18 18:32, Christophe Lyon wrote:

Le 12 janv. 2018 15:26, "Sudakshina Das"  a écrit :

Hi

This patch fixes my earlier test case that fails for arm-none-eabi
with explicit user option for -mfloat-abi which conflict with
the test case options. I have added a guard to skip the test
on those cases.

@Christophe:
Sorry about this. I think this should fix the test case.
Can you please confirm if this works for you?


Yes it does thanks


Thanks for checking that. I have added one more directive for armv5t as 
well to avoid any conflicts for mcpu options.


Sudi




Thanks
Sudi

gcc/testsuite/ChangeLog

2018-01-12  Sudakshina Das  

 * gcc.c-torture/compile/pr82096.c: Add dg-skip-if
 directive.



diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
index 9fed28c..35551f5 100644
--- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target arm_arch_v5t_ok } */
+/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
 /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */
 
 static long long AL[24];


Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-16 Thread Sudakshina Das

Hi Jeff

On 12/01/18 23:00, Jeff Law wrote:

On 01/12/2018 01:45 AM, Christophe Lyon wrote:

Hi,

On 11 January 2018 at 11:58, Sudakshina Das  wrote:

Hi Jeff


On 10/01/18 21:08, Jeff Law wrote:


On 01/10/2018 09:25 AM, Sudakshina Das wrote:


Hi Jeff

On 10/01/18 10:44, Sudakshina Das wrote:


Hi Jeff

On 09/01/18 23:43, Jeff Law wrote:


On 01/05/2018 12:25 PM, Sudakshina Das wrote:


Hi Jeff

On 05/01/18 18:44, Jeff Law wrote:


On 01/04/2018 08:35 AM, Sudakshina Das wrote:


Hi

The bug reported a particular test di-longlong64-sync-1.c failing
when
run on arm-linux-gnueabi with options -mthumb -march=armv5t
-O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused because of the
explicit VOIDmode argument that was sent to emit_store_flag_force
().
Since the comparing argument was a long long, it was being forced
into a
VOID type register before the comparison (in prepare_cmp_insn()) is
done.



As pointed out by Kyrill, there is a comment on emit_store_flag()
which
says "MODE is the mode to use for OP0 and OP1 should they be
CONST_INTs.
 If it is VOIDmode, they cannot both be CONST_INT". This
condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new
test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-01-04  Sudakshina Das  

PR target/82096
* optabs.c (expand_atomic_compare_and_swap): Change argument
to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2018-01-04  Sudakshina Das  

PR target/82096
* gcc.c-torture/compile/pr82096-1.c: New test.
* gcc.c-torture/compile/pr82096-2.c: Likewise.


In the case where both (op0/op1) to
emit_store_flag/emit_store_flag_force are constants, don't we know
the
result of the comparison and shouldn't we have optimized the store
flag
to something simpler?

I feel like I must be missing something here.



emit_store_flag_force () is comparing a register to op0.


?
/* Emit a store-flags instruction for comparison CODE on OP0 and OP1
  and storing in TARGET.  Normally return TARGET.
  Return 0 if that cannot be done.

  MODE is the mode to use for OP0 and OP1 should they be
CONST_INTs.  If
  it is VOIDmode, they cannot both be CONST_INT.


So we're comparing op0 and op1 AFAICT.  One, but not both can be a
CONST_INT.  If both are a CONST_INT, then you need to address the
problem in the caller (by optimizing away the condition).  If you've
got
a REG and a CONST_INT, then the mode should be taken from the REG
operand.







The 2 constant arguments are to the expand_atomic_compare_and_swap ()
function. emit_store_flag_force () is used in case when this
function is
called by the bool variant of the built-in function where the bool
return value is computed by comparing the result register with the
expected op0.


So if only one of the two objects is a CONST_INT, then the mode should
come from the other object.  I think that's the fundamental problem
here
and that you're just papering over it by changing the caller.


I think my earlier explanation was a bit misleading and I may have
rushed into quoting the comment about both operands being const for
emit_store_flag_force(). The problem is with the function and I do
agree with your suggestion of changing the function to add the code
below to be a better approach than the changing the caller. I will
change the patch and test it.



This is the updated patch according to your suggestions.

Testing: Checked for regressions on arm-none-linux-gnueabihf and added
new test case.

Thanks
Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-01-10  Sudakshina Das  

  PR target/82096
  * expmed.c (emit_store_flag_force): Swap if const op0
  and change VOIDmode to mode of op0.

*** gcc/testsuite/ChangeLog ***

2018-01-10  Sudakshina Das  

  PR target/82096
  * gcc.c-torture/compile/pr82096.c: New test.


OK.



Thanks. Committed as r256526.
Sudi



Could you add a guard like in other tests to skip it if the user added
-mfloat-abi=XXX when running the tests?

For instance, I have a configuration where I add
-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard
and the new test fails because:
xgcc: error: -mfloat-abi=soft and -mfloat-abi=hard may not be used together

It's starting to feel like the test should move into gcc.target/arm :-)
  I nearly suggested that already.  Consider moving it into
gcc.target/arm pre-approved along with adding the -O
to the options and whatever is needed to skip the test at the
appropriate time.


My initial thought was also to put the test in gcc.target/arm. But I 
wanted to put it in a torture suite as this was failing at different 
optimization levels. Creating several tests for different optimization 
levels or a new torture suite jus

Re: [PATCH][ARM] Fix test fail with conflicting -mfloat-abi

2018-01-22 Thread Sudakshina Das

Hi Kyrill

On 19/01/18 18:00, Kyrill Tkachov wrote:


On 16/01/18 10:31, Sudakshina Das wrote:

Hi Christophe

On 12/01/18 18:32, Christophe Lyon wrote:

Le 12 janv. 2018 15:26, "Sudakshina Das"  a écrit :

Hi

This patch fixes my earlier test case that fails for arm-none-eabi
with explicit user option for -mfloat-abi which conflict with
the test case options. I have added a guard to skip the test
on those cases.

@Christophe:
Sorry about this. I think this should fix the test case.
Can you please confirm if this works for you?


Yes it does thanks


Thanks for checking that. I have added one more directive for armv5t 
as well to avoid any conflicts for mcpu options.




I agree with what Sudi said in 
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01422.html
I'd rather keep the test in the generic torture suite as long as we get 
the directives right.


So this is ok for trunk (as the changes are arm-specific directives) 
with one change below:


Thanks,
Kyrill


Sudi




Thanks
Sudi

gcc/testsuite/ChangeLog

2018-01-12  Sudakshina Das  

 * gcc.c-torture/compile/pr82096.c: Add dg-skip-if
 directive.






diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c 
b/gcc/testsuite/gcc.c-torture/compile/pr82096.c

index 9fed28c..35551f5 100644
--- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
@@ -1,3 +1,5 @@
+/* { dg-require-effective-target arm_arch_v5t_ok } */

Please also guard this on { target arm*-*-* }
That way this test will be run on other targets as well so that they can 
benefit from it.


+/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { 
"-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
  /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { 
target arm*-*-* } } */




Thanks committed with the change as r256941

Sudi







[PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-22 Thread Sudakshina Das

Hi

This is a patch to backport r256526 and r256941 (Fix case fix) of trunk 
to fix emit_store_flag_force () function to fix the ICE. The original 
discussion is at 
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html and 
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html


Is this ok for gcc-7-branch?
Testing : Ran regression testing with bootstrapped arm-none-linux-gnueabihf.

Thanks
Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-01-22  Sudakshina Das  

Backport from mainline:
2018-01-10  Sudakshina Das  

PR target/82096
* expmed.c (emit_store_flag_force): Swap if const op0
and change VOIDmode to mode of op0.

*** gcc/testsuite/ChangeLog ***

2018-01-22  Sudakshina Das  

Backport from mainline:
2018-01-10  Sudakshina Das  

PR target/82096
* gcc.c-torture/compile/pr82096.c: New test.
diff --git a/gcc/expmed.c b/gcc/expmed.c
index e9f634a..30001ac 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -5886,6 +5886,18 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
   if (tem != 0)
 return tem;
 
+  /* If one operand is constant, make it the second one.  Only do this
+ if the other operand is not constant as well.  */
+
+  if (swap_commutative_operands_p (op0, op1))
+{
+  std::swap (op0, op1);
+  code = swap_condition (code);
+}
+
+  if (mode == VOIDmode)
+mode = GET_MODE (op0);
+
   if (!target)
 target = gen_reg_rtx (word_mode);
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
new file mode 100644
index 000..d144b70
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
@@ -0,0 +1,11 @@
+/* { dg-require-effective-target arm_arch_v5t_ok { target arm*-*-* } } */
+/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
+/* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */
+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}


Re: [PATCH PR81647][AARCH64][PING] Fix handling of Unordered Comparisons in aarch64-simd.md

2018-02-16 Thread Sudakshina Das

PING

On 15/12/17 11:57, Sudakshina Das wrote:

Hi

This patch fixes the inconsistent behavior observed at -O3 for the 
unordered comparisons. According to the online docs 
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
all of the following should not raise an FP exception:

- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these were generating exception raising 
instructions such as fcmgt. This patch changes the instructions that are 
emitted to in order to not give out the exceptions. We first check each 
operand for NaNs and force any elements containing NaN to zero before 
using them in the compare.


Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : 
a, isnan (b) ? 0.0 : b))



The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).


Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new test case.


Is this ok for trunk? This will probably need a back-port to 
gcc-7-branch as well.


Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions
 for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * gcc.target/aarch64/pr81647.c: New.




Re: [PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-02-16 Thread Sudakshina Das

On 22/01/18 15:23, Richard Biener wrote:

On Mon, Jan 22, 2018 at 4:10 PM, Sudakshina Das  wrote:

Hi

This is a patch to backport r256526 and r256941 (Fix case fix) of trunk to
fix emit_store_flag_force () function to fix the ICE. The original
discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html
and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html

Is this ok for gcc-7-branch?
Testing : Ran regression testing with bootstrapped arm-none-linux-gnueabihf.


The branch is currently frozen so please wait until after the GCC 7.3 release.


Committed as r257741

Thanks
Sudi



Thanks,
Richard.


Thanks
Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-01-22  Sudakshina Das  

 Backport from mainline:
 2018-01-10  Sudakshina Das  

 PR target/82096
 * expmed.c (emit_store_flag_force): Swap if const op0
 and change VOIDmode to mode of op0.

*** gcc/testsuite/ChangeLog ***

2018-01-22  Sudakshina Das  

 Backport from mainline:
 2018-01-10  Sudakshina Das  

 PR target/82096
 * gcc.c-torture/compile/pr82096.c: New test.




Re: [PATCH PR82096][gcc-7, gcc-6] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-02-21 Thread Sudakshina Das

On 16/02/18 15:40, Sudakshina Das wrote:

On 22/01/18 15:23, Richard Biener wrote:

On Mon, Jan 22, 2018 at 4:10 PM, Sudakshina Das  wrote:

Hi

This is a patch to backport r256526 and r256941 (Fix case fix) of 
trunk to

fix emit_store_flag_force () function to fix the ICE. The original
discussion is at 
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html

and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html

Is this ok for gcc-7-branch?
Testing : Ran regression testing with bootstrapped 
arm-none-linux-gnueabihf.


The branch is currently frozen so please wait until after the GCC 7.3 
release.


Committed as r257741


Backported to gcc-6 as r257871

Thanks
Sudi



Thanks
Sudi



Thanks,
Richard.


Thanks
Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-01-22  Sudakshina Das  

 Backport from mainline:
 2018-01-10  Sudakshina Das  

 PR target/82096
 * expmed.c (emit_store_flag_force): Swap if const op0
 and change VOIDmode to mode of op0.

*** gcc/testsuite/ChangeLog ***

2018-01-22  Sudakshina Das  

 Backport from mainline:
 2018-01-10  Sudakshina Das  

 PR target/82096
 * gcc.c-torture/compile/pr82096.c: New test.






Re: [PATCH PR81228][AARCH64][gcc-7] Backport r255625 : Fix ICE by adding LTGT

2018-02-22 Thread Sudakshina Das

On 09/01/18 15:37, Sudakshina Das wrote:

Hi

This patch is only adding the missing LTGT to plug the ICE. This is a 
backport to r255625 of trunk.


Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new compile time test case that gives 
out LTGT to make sure it doesn't ICE.


Is this ok for trunk?


Backported to gcc-7 as r257901.

Sudi


Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2018-01-09  Sudakshina Das  
     Bin Cheng  

 Backport from mainline:
 2017-12-14  Sudakshina Das  
     Bin Cheng  

 PR target/81228
 * config/aarch64/aarch64.c (aarch64_select_cc_mode): Move LTGT
 to CCFPEmode.
 * config/aarch64/aarch64-simd.md (vec_cmp): Add LTGT.

*** gcc/testsuite/ChangeLog ***

2018-01-09  Sudakshina Das  

 Backport from mainline:
 2017-12-14  Sudakshina Das  

 PR target/81228
 * gcc.dg/pr81228.c: New.




[PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts

2018-03-14 Thread Sudakshina Das

Hi

This patch fixes PR82989 so that we avoid NEON instructions when 
-mneon-for-64bits is not enabled. This is more of a short-term fix for 
the real, deeper problem of making an early decision about choosing or 
rejecting NEON instructions. There is now a new ticket PR84467 to deal 
with the longer-term solution.

(Please refer to the discussion in the bug report for more details).

Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and 
added a new test case based on the test given on the bug report.


Ok for trunk and backports for gcc-7 and gcc-6 branches?

Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

* config/arm/neon.md (ashldi3_neon): Update ?s for constraints
to favor GPR over NEON registers.
(di3_neon): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

* gcc.target/arm/pr82989.c: New test.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6f5d7..1646b21 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1180,12 +1180,12 @@
 )
 
 (define_insn_and_split "ashldi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"	"= w, w,?&r,?r,?&r, ?w,w")
-	(ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w,w")
-		   (match_operand:SI 2 "general_operand""rUm, i,  r, i,  i,rUm,i")))
-   (clobber (match_scratch:SI 3"= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:SI 4"= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:DI 5"=&w, X,  X, X,  X, &w,X"))
+  [(set (match_operand:DI 0 "s_register_operand"	"= w, w, &r, r, &r, ?w,?w")
+	(ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w, w")
+		   (match_operand:SI 2 "general_operand""rUm, i,  r, i,  i,rUm, i")))
+   (clobber (match_scratch:SI 3"= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:SI 4"= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:DI 5"=&w, X,  X, X,  X, &w, X"))
(clobber (reg:CC_C CC_REGNUM))]
   "TARGET_NEON"
   "#"
@@ -1276,7 +1276,7 @@
 ;; ashrdi3_neon
 ;; lshrdi3_neon
 (define_insn_and_split "di3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"	 "= w, w,?&r,?r,?&r,?w,?w")
+  [(set (match_operand:DI 0 "s_register_operand"	 "= w, w, &r, r, &r,?w,?w")
 	(RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r,0w, w")
 		(match_operand:SI 2 "reg_or_int_operand" "  r, i,  r, i,  i, r, i")))
(clobber (match_scratch:SI 3 "=2r, X, &r, X,  X,2r, X"))
diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c
new file mode 100644
index 000..1295ee6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr82989.c
@@ -0,0 +1,38 @@
+/* PR target/82989 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard" } */
+/* { dg-add-options arm_neon } */
+
+typedef unsigned long long uint64_t;
+
+void f_shr_imm (uint64_t *a )
+{
+  *a += *a >> 32;
+}
+/* { dg-final { scan-assembler-not "vshr*" } } */
+
+void f_shr_reg (uint64_t *a, uint64_t b)
+{
+  *a += *a >> b;
+}
+/* { dg-final { scan-assembler-not "vshl*" } } */
+/* Only 2 times for f_shr_reg. f_shr_imm should not have any.  */
+/* { dg-final { scan-assembler-times {lsr\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */
+
+void f_shl_imm (uint64_t *a)
+{
+  *a += *a << 32;
+}
+/* { dg-final { scan-assembler-not "vshl*" } } */
+
+void f_shl_reg (uint64_t *a, uint64_t b)
+{
+  *a += *a << b;
+}
+/* { dg-final { scan-assembler-not "vshl*" } } */
+/* Only 2 times for f_shl_reg. f_shl_imm should not have any.  */
+/* { dg-final { scan-assembler-times {lsl\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */


[PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-03-14 Thread Sudakshina Das

Hi

This patch is another partial fix for PR 84521. This is adding a 
definition to one of the target hooks used in the SJLJ implementation so 
that AArch64 defines the hard_frame_pointer_rtx as the 
TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is 
still a lot more work to be done for these builtins in the future.


Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added 
new test.


Is this ok for trunk?

Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

* builtins.c (expand_builtin_setjmp_receiver): Update condition
to restore frame pointer.
* config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update
comment.
* config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value):
New.
(TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

* gcc.c-torture/execute/pr84521.c: New test.
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa7..640f1a9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -898,7 +898,8 @@ expand_builtin_setjmp_receiver (rtx receiver_label)
 
   /* Now put in the code to restore the frame pointer, and argument
  pointer, if needed.  */
-  if (! targetm.have_nonlocal_goto ())
+  if (! targetm.have_nonlocal_goto ()
+  && targetm.builtin_setjmp_frame_value () != hard_frame_pointer_rtx)
 {
   /* First adjust our frame pointer to its actual value.  It was
 	 previously set to the start of the virtual area corresponding to
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e3c52f6..7a21c14 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version;
 #define EH_RETURN_STACKADJ_RTX	gen_rtx_REG (Pmode, R4_REGNUM)
 #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
-/* Don't use __builtin_setjmp until we've defined it.  */
+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
 #undef DONT_USE_BUILTIN_SETJMP
 #define DONT_USE_BUILTIN_SETJMP 1
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e1fb87f..e7ac0fe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12128,6 +12128,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
 }
 
+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */
+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
 /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
 
 static tree
@@ -17505,6 +17512,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
 
+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE
+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG aarch64_function_arg
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
new file mode 100644
index 000..76b10d2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target indirect_jumps } */
+
+#include <setjmp.h>
+
+jmp_buf buf;
+
+int uses_longjmp (void)
+{
+  __builtin_longjmp (buf, 1);
+}
+
+int gl;
+void after_longjmp (void)
+{
+  gl = 5;
+}
+
+int
+test_1 (int n)
+{
+  volatile int *p = alloca (n);
+  if (__builtin_setjmp (buf))
+{
+  after_longjmp ();
+}
+  else
+{
+  uses_longjmp ();
+}
+
+  return 0;
+}
+
+int __attribute__ ((optimize ("no-omit-frame-pointer")))
+test_2 (int n)
+{
+  int i;
+  int *ptr = (int *)__builtin_alloca (sizeof (int) * n);
+  for (i = 0; i < n; i++)
+ptr[i] = i;
+  test_1 (n);
+  return 0;
+}
+
+int main (int argc, const char **argv)
+{
+  __builtin_memset (&buf, 0xaf, sizeof (buf));
+  test_2 (100);
+}


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-15 Thread Sudakshina Das

On 15/03/18 15:27, Sameera Deshpande wrote:

Ping!

On 28 February 2018 at 16:18, Sameera Deshpande
 wrote:

On 27 February 2018 at 18:25, Ramana Radhakrishnan
 wrote:

On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
 wrote:

Hi!

Please find attached the patch to fix bug in branches with offsets over 1MiB.
There has been an attempt to fix this issue in commit
050af05b9761f1979f11c151519e7244d5becd7c

However, the far_branch attribute defined in the above patch used
insn_length, which computes an incorrect offset. Hence, I eliminated the
attribute completely and computed the offset from insn_addresses
instead.

Ok for trunk?

gcc/Changelog

2018-02-13 Sameera Deshpande 
 * config/aarch64/aarch64.md (far_branch): Remove attribute. Eliminate
 all the dependencies on the attribute from RTL patterns.



I'm not a maintainer but this looks good to me modulo notes about how
this was tested. What would be nice is a testcase for the testsuite as
well as ensuring that the patch has been bootstrapped and regression
tested. AFAIR, the original patch was put in because match.pd failed
when bootstrap in another context.


regards
Ramana


--
- Thanks and regards,
   Sameera D.


The patch is tested with the GCC testsuite and bootstraps successfully.
It was also tested with the SPEC benchmarks.



I am not a maintainer either. I noticed that the range check you do for
the offset has a (<= || >=). The "far_branch" however did (< || >=) for 
a positive value. Was that also part of the incorrect offset calculation?


@@ -692,7 +675,11 @@
{
  if (get_attr_length (insn) == 8)
{
-   if (get_attr_far_branch (insn) == 1)
+   long long int offset;
+   offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
+ - INSN_ADDRESSES (INSN_UID (insn));
+
+   if (offset <= -1048576 || offset >= 1048572)
    return aarch64_gen_far_branch (operands, 2, "Ltb",
   "\\t%0, %1, ");
 else
@@ -709,12 +696,7 @@
 (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -32768))
(lt (minus (match_dup 2) (pc)) (const_int 32764)))
   (const_int 4)
- (const_int 8)))
-   (set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int -1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int 1048572)))
- (const_int 0)
- (const_int 1)))]
+ (const_int 8)))]

  )

Thanks
Sudi


--
- Thanks and regards,
   Sameera D.








Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md

2018-03-19 Thread Sudakshina Das

Hi

On 19/03/18 14:29, James Greenhalgh wrote:

On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote:

Hi

This patch fixes the inconsistent behavior observed at -O3 for the
unordered comparisons. According to the online docs
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
all of the following should not raise an FP exception:
- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these were generating exception raising
instructions such as fcmgt. This patch changes the instructions that are
emitted to in order to not give out the exceptions. We first check each
operand for NaNs and force any elements containing NaN to zero before
using them in the compare.

Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 :
a, isnan (b) ? 0.0 : b))


The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new test case.

Is this ok for trunk? This will probably need a back-port to
gcc-7-branch as well.


OK.

Let it soak on trunk for a while before the backport.


Thanks. Committed to trunk as r258653. Will wait a week before backport.

Sudi



Thanks,
James


ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
* config/aarch64/aarch64-simd.md (vec_cmp): Modify
instructions for
UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

PR target/81647
* gcc.target/aarch64/pr81647.c: New.



diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2731,10 +2731,10 @@
  break;
}
/* Fall through.  */
-case UNGE:
+case UNLT:
std::swap (operands[2], operands[3]);
/* Fall through.  */
-case UNLE:
+case UNGT:
  case GT:
comparison = gen_aarch64_cmgt;
break;
@@ -2745,10 +2745,10 @@
  break;
}
/* Fall through.  */
-case UNGT:
+case UNLE:
std::swap (operands[2], operands[3]);
/* Fall through.  */
-case UNLT:
+case UNGE:
  case GE:
comparison = gen_aarch64_cmge;
break;
@@ -2771,21 +2771,35 @@
  case UNGT:
  case UNLE:
  case UNLT:
-case NE:
-  /* FCM returns false for lanes which are unordered, so if we use
-the inverse of the comparison we actually want to emit, then
-invert the result, we will end up with the correct result.
-Note that a NE NaN and NaN NE b are true for all a, b.
-
-Our transformations are:
-a UNGE b -> !(b GT a)
-a UNGT b -> !(b GE a)
-a UNLE b -> !(a GT b)
-a UNLT b -> !(a GE b)
-a   NE b -> !(a EQ b)  */
-  gcc_assert (comparison != NULL);
-  emit_insn (comparison (operands[0], operands[2], operands[3]));
-  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  {
+   /* All of the above must not raise any FP exceptions.  Thus we first
+  check each operand for NaNs and force any elements containing NaN to
+  zero before using them in the compare.
+  Example: UN (a, b) -> UNORDERED (a, b) |
+(cm (isnan (a) ? 0.0 : a,
+ isnan (b) ? 0.0 : b))
+  We use the following transformations for doing the comparisions:
+  a UNGE b -> a GE b
+  a UNGT b -> a GT b
+  a UNLE b -> b GE a
+  a UNLT b -> b GT a.  */
+
+   rtx tmp0 = gen_reg_rtx (mode);
+   rtx tmp1 = gen_reg_rtx (mode);
+   rtx tmp2 = gen_reg_rtx (mode);
+   emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2]));
+   emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3]));
+   emit_insn (gen_and3 (tmp2, tmp0, tmp1));
+   emit_insn (gen_and3 (tmp0, tmp0,
+   lowpart_subreg (mode, operands[2], 
mode)));
+   emit_insn (gen_and3 (tmp1, tmp1,
+   lowpart_subreg (mode, operands[3], 
mode)));
+   gcc_assert (comparison != NULL);
+   emit_insn (comparison (operands[0],
+  lowpart_subreg (mode, tmp0, 
mode),
+  lowpart_subreg (mode, tmp1, 
mode)));
+   emit_insn (gen_orn3 (operands[0], tmp2, operands[0]));
+  }
break;
  
  case LT:

@@ -2793,25 +2807,19 @@
  case GT:
  case GE:
  case EQ:
+case NE:
/* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
 As

Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md

2018-03-20 Thread Sudakshina Das

Hi

On 20/03/18 08:13, Christophe Lyon wrote:

On 19 March 2018 at 19:55, Sudakshina Das  wrote:

Hi


On 19/03/18 14:29, James Greenhalgh wrote:


On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote:


Hi

This patch fixes the inconsistent behavior observed at -O3 for the
unordered comparisons. According to the online docs

(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
all of the following should not raise an FP exception:
- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these were generating exception raising
instructions such as fcmgt. This patch changes the instructions that are
emitted to in order to not give out the exceptions. We first check each
operand for NaNs and force any elements containing NaN to zero before
using them in the compare.

Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 :
a, isnan (b) ? 0.0 : b))


The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new test case.

Is this ok for trunk? This will probably need a back-port to
gcc-7-branch as well.



OK.

Let it soak on trunk for a while before the backport.



Thanks. Committed to trunk as r258653. Will wait a week before backport.



Hi,

As the test failed to compile on aarch64 bare-metal targets, I added
/* { dg-require-effective-target fenv_exceptions } */
as obvious (r258672).

2018-03-20  Christophe Lyon  

PR target/81647
* gcc.target/aarch64/pr81647.c: Require fenv_exceptions.

Index: testsuite/gcc.target/aarch64/pr81647.c
===
--- testsuite/gcc.target/aarch64/pr81647.c  (revision 258671)
+++ testsuite/gcc.target/aarch64/pr81647.c  (revision 258672)
@@ -1,5 +1,6 @@
  /* { dg-do run } */
  /* { dg-options "-O3 -fdump-tree-ssa" } */
+/* { dg-require-effective-target fenv_exceptions } */

  #include 



Christophe


Thanks for fixing this, and apologies for missing it in the first place!

Sudi




Sudi




Thanks,
James


ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
	* config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for
	UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * gcc.target/aarch64/pr81647.c: New.




diff --git a/gcc/config/aarch64/aarch64-simd.md
b/gcc/config/aarch64/aarch64-simd.md
index
f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b
100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2731,10 +2731,10 @@
   break;
 }
 /* Fall through.  */
-case UNGE:
+case UNLT:
 std::swap (operands[2], operands[3]);
 /* Fall through.  */
-case UNLE:
+case UNGT:
   case GT:
 comparison = gen_aarch64_cmgt;
 break;
@@ -2745,10 +2745,10 @@
   break;
 }
 /* Fall through.  */
-case UNGT:
+case UNLE:
 std::swap (operands[2], operands[3]);
 /* Fall through.  */
-case UNLT:
+case UNGE:
   case GE:
 comparison = gen_aarch64_cmge;
 break;
@@ -2771,21 +2771,35 @@
   case UNGT:
   case UNLE:
   case UNLT:
-case NE:
-  /* FCM returns false for lanes which are unordered, so if we use
-the inverse of the comparison we actually want to emit, then
-invert the result, we will end up with the correct result.
-Note that a NE NaN and NaN NE b are true for all a, b.
-
-Our transformations are:
-a UNGE b -> !(b GT a)
-a UNGT b -> !(b GE a)
-a UNLE b -> !(a GT b)
-a UNLT b -> !(a GE b)
-a   NE b -> !(a EQ b)  */
-  gcc_assert (comparison != NULL);
-  emit_insn (comparison (operands[0], operands[2], operands[3]));
-  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
+  {
+   /* All of the above must not raise any FP exceptions.  Thus we first
+  check each operand for NaNs and force any elements containing NaN to
+  zero before using them in the compare.
+  Example: UN (a, b) -> UNORDERED (a, b) |
+(cm (isnan (a) ? 0.0 : a,
+ isnan (b) ? 0.0 : b))
+  We use the following transformations for doing the comparisions:
+  a UNGE b -> a GE b
+  a UNGT b -> a GT b
+  a UNLE b -> b GE a
+  a UNLT b -> b GT a.  */
+
+   rtx tmp0 = gen_reg_rtx (mode);
+   rtx tmp1 = gen_reg_rtx (mode);
+   rtx tmp2 = gen_reg_rtx (mode);

Re: [PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts

2018-03-20 Thread Sudakshina Das

Hi

On 20/03/18 10:03, Richard Earnshaw (lists) wrote:

On 14/03/18 10:11, Sudakshina Das wrote:

Hi

This patch fixes PR82989 so that we avoid NEON instructions when
-mneon-for-64bits is not enabled. This is more of a short term fix for
the real deeper problem of making and early decision of choosing or
rejecting NEON instructions. There is now a new ticket PR84467 to deal
with the longer term solution.
(Please refer to the discussion in the bug report for more details).

Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and
added a new test case based on the test given on the bug report.

Ok for trunk and backports for gcc-7 and gcc-6 branches?



OK for trunk.  Please leave it a couple of days before backporting to
ensure that the testcase doesn't tickle any multilib issues.

R.


Thanks. Committed to trunk as r258677. Will wait a week for backporting.

Sudi




Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

 * config/arm/neon.md (ashldi3_neon): Update ?s for constraints
 to favor GPR over NEON registers.
 (di3_neon): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

 * gcc.target/arm/pr82989.c: New test.

pr82989.diff


diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6f5d7..1646b21 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1180,12 +1180,12 @@
  )
  
  (define_insn_and_split "ashldi3_neon"

-  [(set (match_operand:DI 0 "s_register_operand" "= w, w,?&r,?r,?&r, ?w,w")
-   (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w,w")
-  (match_operand:SI 2 "general_operand""rUm, i,  r, i,  i,rUm,i")))
-   (clobber (match_scratch:SI 3"= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:SI 4"= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:DI 5"=&w, X,  X, X,  X, &w,X"))
+  [(set (match_operand:DI 0 "s_register_operand" "= w, w, &r, r, &r, ?w,?w")
+   (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w, w")
+  (match_operand:SI 2 "general_operand""rUm, i,  r, i,  i,rUm, i")))
+   (clobber (match_scratch:SI 3"= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:SI 4"= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:DI 5"=&w, X,  X, X,  X, &w, X"))
 (clobber (reg:CC_C CC_REGNUM))]
"TARGET_NEON"
"#"
@@ -1276,7 +1276,7 @@
  ;; ashrdi3_neon
  ;; lshrdi3_neon
  (define_insn_and_split "di3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"  "= w, w,?&r,?r,?&r,?w,?w")
+  [(set (match_operand:DI 0 "s_register_operand"  "= w, w, &r, r, &r,?w,?w")
(RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r,0w, w")
(match_operand:SI 2 "reg_or_int_operand" "  r, i,  r, i,  i, r, i")))
 (clobber (match_scratch:SI 3"=2r, X, &r, X,  X,2r, X"))
diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c 
b/gcc/testsuite/gcc.target/arm/pr82989.c
new file mode 100644
index 000..1295ee6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr82989.c
@@ -0,0 +1,38 @@
+/* PR target/82989 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard" } */
+/* { dg-add-options arm_neon } */
+
+typedef unsigned long long uint64_t;
+
+void f_shr_imm (uint64_t *a )
+{
+  *a += *a >> 32;
+}
+/* { dg-final { scan-assembler-not "vshr*" } } */
+
+void f_shr_reg (uint64_t *a, uint64_t b)
+{
+  *a += *a >> b;
+}
+/* { dg-final { scan-assembler-not "vshl*" } } */
+/* Only 2 times for f_shr_reg. f_shr_imm should not have any.  */
+/* { dg-final { scan-assembler-times {lsr\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */
+
+void f_shl_imm (uint64_t *a)
+{
+  *a += *a << 32;
+}
+/* { dg-final { scan-assembler-not "vshl*" } } */
+
+void f_shl_reg (uint64_t *a, uint64_t b)
+{
+  *a += *a << b;
+}
+/* { dg-final { scan-assembler-not "vshl*" } } */
+/* Only 2 times for f_shl_reg. f_shl_imm should not have any.  */
+/* { dg-final { scan-assembler-times {lsl\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */







Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-03-20 Thread Sudakshina Das

On 19/03/18 12:11, James Greenhalgh wrote:

On Wed, Mar 14, 2018 at 05:40:49PM +, Sudi Das wrote:

Hi

This patch is another partial fix for PR 84521. This is adding a
definition to one of the target hooks used in the SJLJ implemetation so
that AArch64 defines the hard_frame_pointer_rtx as the
TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is
still a lot more work to be done for these builtins in the future.

Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added
new test.

Is this ok for trunk?


OK.


Thanks James, but I realized I wrongly marked this as an AArch64-only patch.
This also has a mid-end change, so cc'ing more people for approval.


Sudi



Thanks,
James


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

* builtins.c (expand_builtin_setjmp_receiver): Update condition
to restore frame pointer.
* config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update
comment.
* config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value):
New.
(TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

* gcc.c-torture/execute/pr84521.c: New test.



diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa7..640f1a9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -898,7 +898,8 @@ expand_builtin_setjmp_receiver (rtx receiver_label)
  
/* Now put in the code to restore the frame pointer, and argument

   pointer, if needed.  */
-  if (! targetm.have_nonlocal_goto ())
+  if (! targetm.have_nonlocal_goto ()
+  && targetm.builtin_setjmp_frame_value () != hard_frame_pointer_rtx)
  {
/* First adjust our frame pointer to its actual value.  It was
 previously set to the start of the virtual area corresponding to
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e3c52f6..7a21c14 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version;
  #define EH_RETURN_STACKADJ_RTXgen_rtx_REG (Pmode, R4_REGNUM)
  #define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
  
-/* Don't use __builtin_setjmp until we've defined it.  */

+/* Don't use __builtin_setjmp until we've defined it.
+   CAUTION: This macro is only used during exception unwinding.
+   Don't fall for its name.  */
  #undef DONT_USE_BUILTIN_SETJMP
  #define DONT_USE_BUILTIN_SETJMP 1
  
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

index e1fb87f..e7ac0fe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12128,6 +12128,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
nextarg ATTRIBUTE_UNUSED)
expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
  }
  
+/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE.  */

+static rtx
+aarch64_builtin_setjmp_frame_value (void)
+{
+  return hard_frame_pointer_rtx;
+}
+
  /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR.  */
  
  static tree

@@ -17505,6 +17512,9 @@ aarch64_run_selftests (void)
  #undef TARGET_FOLD_BUILTIN
  #define TARGET_FOLD_BUILTIN aarch64_fold_builtin
  
+#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE

+#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value
+
  #undef TARGET_FUNCTION_ARG
  #define TARGET_FUNCTION_ARG aarch64_function_arg
  
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c b/gcc/testsuite/gcc.c-torture/execute/pr84521.c

new file mode 100644
index 000..76b10d2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target indirect_jumps } */
+
+#include 
+
+jmp_buf buf;
+
+int uses_longjmp (void)
+{
+  __builtin_longjmp (buf, 1);
+}
+
+int gl;
+void after_longjmp (void)
+{
+  gl = 5;
+}
+
+int
+test_1 (int n)
+{
+  volatile int *p = alloca (n);
+  if (__builtin_setjmp (buf))
+{
+  after_longjmp ();
+}
+  else
+{
+  uses_longjmp ();
+}
+
+  return 0;
+}
+
+int __attribute__ ((optimize ("no-omit-frame-pointer")))
+test_2 (int n)
+{
+  int i;
+  int *ptr = (int *)__builtin_alloca (sizeof (int) * n);
+  for (i = 0; i < n; i++)
+ptr[i] = i;
+  test_1 (n);
+  return 0;
+}
+
+int main (int argc, const char **argv)
+{
+  __builtin_memset (&buf, 0xaf, sizeof (buf));
+  test_2 (100);
+}






Re: [PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts

2018-03-21 Thread Sudakshina Das

Hi

On 21/03/18 08:51, Christophe Lyon wrote:

On 20 March 2018 at 11:58, Sudakshina Das  wrote:

Hi

On 20/03/18 10:03, Richard Earnshaw (lists) wrote:


On 14/03/18 10:11, Sudakshina Das wrote:


Hi

This patch fixes PR82989 so that we avoid NEON instructions when
-mneon-for-64bits is not enabled. This is more of a short-term fix for
the real, deeper problem of making an early decision about choosing or
rejecting NEON instructions. There is now a new ticket PR84467 to deal
with the longer term solution.
(Please refer to the discussion in the bug report for more details).

Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and
added a new test case based on the test given on the bug report.

Ok for trunk and backports for gcc-7 and gcc-6 branches?



OK for trunk.  Please leave it a couple of days before backporting to
ensure that the testcase doesn't tickle any multilib issues.

R.



Thanks. Committed to trunk as r258677. Will wait a week for backporting.

Sudi



Hi Sudi,

I've noticed that:
FAIL: gcc.target/arm/pr82989.c scan-assembler-times lsl\\tr[0-9]+, r[0-9]+, r[0-9] 2
FAIL: gcc.target/arm/pr82989.c scan-assembler-times lsr\\tr[0-9]+, r[0-9]+, r[0-9] 2
on target armeb-none-linux-gnueabihf
--with-mode thumb
--with-cpu cortex-a9
--with-fpu neon-fp16

The tests pass when using --with-mode arm

Can you check?


Yes I see this as well. Sorry about this. I am testing a quick fix for 
this at the moment.


Thanks
Sudi



Thanks

Christophe






Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

  * config/arm/neon.md (ashldi3_neon): Update ?s for constraints
  to favor GPR over NEON registers.
  (di3_neon): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

  * gcc.target/arm/pr82989.c: New test.

pr82989.diff


diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6f5d7..1646b21 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1180,12 +1180,12 @@
   )
 (define_insn_and_split "ashldi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"   "= w,
w,?&r,?r,?&r, ?w,w")
-   (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r,
0,  r, 0w,w")
-  (match_operand:SI 2 "general_operand""rUm, i,  r,
i,  i,rUm,i")))
-   (clobber (match_scratch:SI 3"= X,
X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:SI 4"= X,
X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:DI 5"=&w,
X,  X, X,  X, &w,X"))
+  [(set (match_operand:DI 0 "s_register_operand"   "= w, w, &r,
r, &r, ?w,?w")
+   (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r,
0,  r, 0w, w")
+  (match_operand:SI 2 "general_operand""rUm, i,  r,
i,  i,rUm, i")))
+   (clobber (match_scratch:SI 3"= X,
X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:SI 4"= X,
X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:DI 5"=&w,
X,  X, X,  X, &w, X"))
  (clobber (reg:CC_C CC_REGNUM))]
 "TARGET_NEON"
 "#"
@@ -1276,7 +1276,7 @@
   ;; ashrdi3_neon
   ;; lshrdi3_neon
   (define_insn_and_split "di3_neon"
-  [(set (match_operand:DI 0 "s_register_operand""= w,
w,?&r,?r,?&r,?w,?w")
+  [(set (match_operand:DI 0 "s_register_operand""= w, w, &r,
r, &r,?w,?w")
 (RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r,
0,  r,0w, w")
 (match_operand:SI 2 "reg_or_int_operand" "  r, i,  r,
i,  i, r, i")))
  (clobber (match_scratch:SI 3
"=2r, X, &r, X,  X,2r, X"))
diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c
b/gcc/testsuite/gcc.target/arm/pr82989.c
new file mode 100644
index 000..1295ee6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr82989.c
@@ -0,0 +1,38 @@
+/* PR target/82989 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } {
"-mcpu=*" } { "-mcpu=cortex-a8" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } {
"-mfpu=*" } { "-mfpu=neon" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } {
"-mfloat-abi=*" } { "-mfloat-abi=hard" } } */
+/* { dg-options "-O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard" } */
+/* { dg-add-options arm_neon } */
+
+typedef unsigned long long uint64_t;
+

[PATCH][ARM] Fix test pr82989.c for big endian and mthumb

2018-03-21 Thread Sudakshina Das

Hi

The test pr82989.c, which was added in one of the previous commits, is failing
for mthumb and big-endian configurations. The aim of this test was to
check that NEON instructions are not being used for simple shift
operations. Scanning for lsl and lsr instructions and checking their
counts was just too restrictive for different configurations. So I
have now simplified the test to only check for the absence of NEON
instructions.

Testing: Only a test-case change, so I only tested the said test on
differently configured toolchains.
@Christophe can you confirm this patch fixes the failure for you?

Thanks
Sudi

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  

PR target/82989
* gcc.target/arm/pr82989.c: Change dg-scan-assembly directives.
diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c
index 6f74dba..8519c3f 100644
--- a/gcc/testsuite/gcc.target/arm/pr82989.c
+++ b/gcc/testsuite/gcc.target/arm/pr82989.c
@@ -13,26 +13,21 @@ void f_shr_imm (uint64_t *a)
 {
   *a += *a >> 32;
 }
-/* { dg-final { scan-assembler-not "vshr*" } } */
 
 void f_shr_reg (uint64_t *a, uint64_t b)
 {
   *a += *a >> b;
 }
-/* { dg-final { scan-assembler-not "vshl*" } } */
-/* Only 2 times for f_shr_reg. f_shr_imm should not have any.  */
-/* { dg-final { scan-assembler-times {lsr\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */
 
 void f_shl_imm (uint64_t *a)
 {
   *a += *a << 32;
 }
-/* { dg-final { scan-assembler-not "vshl*" } } */
 
 void f_shl_reg (uint64_t *a, uint64_t b)
 {
   *a += *a << b;
 }
 /* { dg-final { scan-assembler-not "vshl*" } } */
-/* Only 2 times for f_shl_reg. f_shl_imm should not have any.  */
-/* { dg-final { scan-assembler-times {lsl\tr[0-9]+, r[0-9]+, r[0-9]} 2 } } */
+/* { dg-final { scan-assembler-not "vshr*" } } */
+/* { dg-final { scan-assembler-not "vmov*" } } */


Re: [PATCH][ARM] Fix test pr82989.c for big endian and mthumb

2018-03-21 Thread Sudakshina Das

Hi

On 21/03/18 17:03, Kyrill Tkachov wrote:


On 21/03/18 16:33, Christophe Lyon wrote:

On 21 March 2018 at 13:11, Sudakshina Das  wrote:

Hi

The test pr82989.c, which was added in one of the previous commits, is failing
for mthumb and big-endian configurations. The aim of this test was to
check that NEON instructions are not being used for simple shift
operations. Scanning for lsl and lsr instructions and checking their
counts was just too restrictive for different configurations. So I
have now simplified the test to only check for the absence of NEON
instructions.

Testing: Only a test-case change, so I only tested the said test on
differently configured toolchains.
@Christophe can you confirm this patch fixes the failure for you?


Yes, the validations are now OK my side.


Thanks, the patch is ok for trunk.


Thanks Christophe for validating and Kyrill for the Ok.
Committed to trunk as r258723.

Thanks
Sudi



Kyrill


Thanks

Christophe


Thanks
Sudi

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  

 PR target/82989
 * gcc.target/arm/pr82989.c: Change dg-scan-assembly directives.






[PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi

2018-03-21 Thread Sudakshina Das

Hi

The ICE in the bug report was happening because the macro
USE_RETURN_INSN (FALSE) was returning different values at different
points in the compilation. This was internally occurring because the
function arm_compute_static_chain_stack_bytes () which was dependent on
arm_r3_live_at_start_p () was giving a different value after the
cond_exec instructions were created in ce3, causing the liveness of r3
to escape up to the start block.

The function arm_compute_static_chain_stack_bytes () should really
only compute the value once during the epilogue/prologue stage. This patch
introduces a new member 'static_chain_stack_bytes' to the target
definition of the struct machine_function which gets calculated in
expand_prologue and is the value that is returned by
arm_compute_static_chain_stack_bytes () beyond that.

Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf
and added the reported test to the testsuite.

Is this ok for trunk?

Sudi


ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-21  Sudakshina Das  

PR target/84826
* config/arm/arm.h (machine_function): Add
static_chain_stack_bytes.
* config/arm/arm.c (arm_compute_static_chain_stack_bytes):
Avoid re-computing once computed.
(arm_expand_prologue): Compute machine->static_chain_stack_bytes.
(arm_init_machine_status): Initialize
machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  

PR target/84826
* gcc.target/arm/pr84826.c: New test
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bbf3937..2809112 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function
   machine_mode thumb1_cc_mode;
   /* Set to 1 after arm_reorg has started.  */
   int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+ stack, above the stack frame.  */
+  int static_chain_stack_bytes;
 }
 machine_function;
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cb6ab81..bc31810 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)
 static int
 arm_compute_static_chain_stack_bytes (void)
 {
+  /* Once the value is updated from the init value of -1, do not
+ re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+return cfun->machine->static_chain_stack_bytes;
+
   /* See the defining assertion in arm_expand_prologue.  */
   if (IS_NESTED (arm_current_func_type ())
   && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
@@ -21699,6 +21704,11 @@ arm_expand_prologue (void)
   emit_insn (gen_movsi (stack_pointer_rtx, r1));
 }
 
+  /* Let's compute the static_chain_stack_bytes required and store it.  Right
+ now the value must the -1 as stored by arm_init_machine_status ().  */
+  cfun->machine->static_chain_stack_bytes
+= arm_compute_static_chain_stack_bytes ();
+
   /* The static chain register is the same as the IP register.  If it is
  clobbered when creating the frame, we need to save and restore it.  */
   clobber_ip = IS_NESTED (func_type)
@@ -24875,6 +24885,7 @@ arm_init_machine_status (void)
 #if ARM_FT_UNKNOWN != 0
   machine->func_type = ARM_FT_UNKNOWN;
 #endif
+  machine->static_chain_stack_bytes = -1;
   return machine;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/pr84826.c b/gcc/testsuite/gcc.target/arm/pr84826.c
new file mode 100644
index 000..c61c548
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr84826.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fstack-clash-protection" } */
+
+void d (void *);
+
+void a ()
+{
+  int b;
+  void bar (int c)
+  {
+if (__builtin_expect (c, 0))
+  ++b;
+  }
+  d (bar);
+}


Re: [Aarch64] Fix conditional branches with target far away.

2018-03-22 Thread Sudakshina Das

Hi Sameera

On 22/03/18 02:07, Sameera Deshpande wrote:

Hi Sudakshina,

As per the ARMv8 ARM, for the offset range (-1048576, 1048572), the
far branch instruction offset is inclusive of both offsets. Hence,
I am using (<= || >=) and not (< || >=) as in the previous implementation.


I have to admit earlier I was only looking at the patch mechanically and 
found a difference with the previous implementation in offset 
comparison. After you pointed out, I looked up the ARMv8 ARM and I have 
a couple of doubts:


1. My understanding is that any offset in [-1048576, 1048572], both
inclusive, qualifies as an 'in range' offset. However, the code for both
the length attribute and far_branch has been using [-1048576, 1048572), that
is, ( >= && < ). If far_branch was incorrectly calculated, then
maybe the length calculations with similar magic numbers should also be
corrected? Of course, I am not an expert in this, and maybe this was a
conscious decision, so I would ask Ramana to clarify if he remembers.


2. Now to come back to your patch, if my understanding is correct, I 
think a far_branch would be anything outside of this range, that is,
(offset < -1048576 || offset > 1048572), i.e. anything that cannot be
represented in the 21-bit range.


Thanks
Sudi



On 16 March 2018 at 00:51, Sudakshina Das  wrote:

On 15/03/18 15:27, Sameera Deshpande wrote:


Ping!

On 28 February 2018 at 16:18, Sameera Deshpande
 wrote:


On 27 February 2018 at 18:25, Ramana Radhakrishnan
 wrote:


On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
 wrote:


Hi!

Please find attached the patch to fix a bug in branches with offsets over
1MiB.
There has been an attempt to fix this issue in commit
050af05b9761f1979f11c151519e7244d5becd7c

However, the far_branch attribute defined in the above patch used
insn_length, which computes an incorrect offset. Hence, I eliminated the
attribute completely and computed the offset from INSN_ADDRESSES
instead.

Ok for trunk?

gcc/Changelog

2018-02-13 Sameera Deshpande 
  * config/aarch64/aarch64.md (far_branch): Remove attribute.
Eliminate
  all the dependencies on the attribute from RTL patterns.



I'm not a maintainer but this looks good to me modulo notes about how
this was tested. What would be nice is a testcase for the testsuite as
well as ensuring that the patch has been bootstrapped and regression
tested. AFAIR, the original patch was put in because match.pd failed
when bootstrap in another context.


regards
Ramana


--
- Thanks and regards,
Sameera D.



The patch is tested with the GCC testsuite and bootstraps successfully.
Also tested with the SPEC benchmarks.



I am not a maintainer either. I noticed that the range check you do for
the offset has a (<= || >=). The "far_branch" however did (< || >=) for a
positive value. Was that also part of the incorrect offset calculation?

@@ -692,7 +675,11 @@
 {
   if (get_attr_length (insn) == 8)
 {
-   if (get_attr_far_branch (insn) == 1)
+   long long int offset;
+   offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
+ - INSN_ADDRESSES (INSN_UID (insn));
+
+   if (offset <= -1048576 || offset >= 1048572)
return aarch64_gen_far_branch (operands, 2, "Ltb",
   "\\t%0, %1, ");
  else
@@ -709,12 +696,7 @@
  (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
-32768))
 (lt (minus (match_dup 2) (pc)) (const_int
32764)))
(const_int 4)
- (const_int 8)))
-   (set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
-1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int
1048572)))
- (const_int 0)
- (const_int 1)))]
+ (const_int 8)))]

   )

Thanks
Sudi


--
- Thanks and regards,
Sameera D.















Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi

2018-03-22 Thread Sudakshina Das

Hi Kyrill

On 22/03/18 16:08, Kyrill Tkachov wrote:

Hi Sudi,

On 21/03/18 17:44, Sudakshina Das wrote:

Hi

The ICE in the bug report was happening because the macro
USE_RETURN_INSN (FALSE) was returning different values at different
points in the compilation. This was internally occurring because the
function arm_compute_static_chain_stack_bytes () which was dependent on
arm_r3_live_at_start_p () was giving a different value after the
cond_exec instructions were created in ce3 causing the liveness of r3
to escape up to the start block.

The function arm_compute_static_chain_stack_bytes () should really
only compute the value once during the epilogue/prologue stage. This patch
introduces a new member 'static_chain_stack_bytes' to the target
definition of the struct machine_function which gets calculated in
expand_prologue and is the value that is returned by
arm_compute_static_chain_stack_bytes () beyond that.

Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf
and added the reported test to the testsuite.

Is this ok for trunk?



Thanks for working on this.
I agree with the approach, I have a couple of comments inline.


Sudi


ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-21  Sudakshina Das  

    PR target/84826
    * config/arm/arm.h (machine_function): Add
    static_chain_stack_bytes.
    * config/arm/arm.c (arm_compute_static_chain_stack_bytes):
    Avoid re-computing once computed.
    (arm_expand_prologue): Compute machine->static_chain_stack_bytes.
    (arm_init_machine_status): Initialize
    machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  

    PR target/84826
    * gcc.target/arm/pr84826.c: New test


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bbf3937..2809112 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function
    machine_mode thumb1_cc_mode;
    /* Set to 1 after arm_reorg has started.  */
    int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+ stack, above the stack frame.  */
+  int static_chain_stack_bytes;
  }
  machine_function;
  #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cb6ab81..bc31810 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)
  static int
  arm_compute_static_chain_stack_bytes (void)
  {
+  /* Once the value is updated from the init value of -1, do not
+ re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+    return cfun->machine->static_chain_stack_bytes;
+

My concern is that this approach caches the first value that is computed
for static_chain_stack_bytes.
I believe the frame layout code is called multiple times during register
allocation as it goes through the motions, and I think we want the last
value it computes during reload.

How about we do something like:
   if (cfun->machine->static_chain_stack_bytes != -1 && epilogue_completed)
     return cfun->machine->static_chain_stack_bytes;

?...

  /* See the defining assertion in arm_expand_prologue.  */
    if (IS_NESTED (arm_current_func_type ())
    && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
@@ -21699,6 +21704,11 @@ arm_expand_prologue (void)
    emit_insn (gen_movsi (stack_pointer_rtx, r1));
  }

+  /* Let's compute the static_chain_stack_bytes required and store it.  Right
+ now the value must the -1 as stored by arm_init_machine_status ().  */


... this comment would need to be tweaked as 
cfun->machine->static_chain_stack_bytes may hold
an intermediate value computed in reload or some other point before 
epilogue_completed.


+  cfun->machine->static_chain_stack_bytes
+    = arm_compute_static_chain_stack_bytes ();
+


Maybe I did not understand this completely, but my idea was that I am 
initializing the value of cfun->machine->static_chain_stack_bytes to be 
-1 in arm_init_machine_status () and then it stays as -1 all throughout 
reload and hence the function arm_compute_static_chain_stack_bytes () 
will keep computing the value instead of returning the cached value. 
Only during expand_prologue (which I assumed occurs much after reload), 
I overwrite the initial -1 and after that any call to 
arm_compute_static_chain_stack_bytes () would return this cached value.


I did start out writing the patch with an epilogue_completed check but
realized that even during this stage
arm_compute_static_chain_stack_bytes () was called several times, and
thus, to avoid those re-computations (again assuming that by this stage
we should already have a fixed value), I re-wrote it with the approach
of initializing to -1.


Thanks
Sudi




Thanks,
Kyrill





Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi

2018-03-22 Thread Sudakshina Das

Hi

On 22/03/18 16:52, Kyrill Tkachov wrote:


On 22/03/18 16:20, Sudakshina Das wrote:

Hi Kyrill

On 22/03/18 16:08, Kyrill Tkachov wrote:

Hi Sudi,

On 21/03/18 17:44, Sudakshina Das wrote:

Hi

The ICE in the bug report was happening because the macro
USE_RETURN_INSN (FALSE) was returning different values at different
points in the compilation. This was internally occurring because the
function arm_compute_static_chain_stack_bytes () which was dependent on
arm_r3_live_at_start_p () was giving a different value after the
cond_exec instructions were created in ce3 causing the liveness of r3
to escape up to the start block.

The function arm_compute_static_chain_stack_bytes () should really
only compute the value once during the epilogue/prologue stage. This patch
introduces a new member 'static_chain_stack_bytes' to the target
definition of the struct machine_function which gets calculated in
expand_prologue and is the value that is returned by
arm_compute_static_chain_stack_bytes () beyond that.

Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf
and added the reported test to the testsuite.

Is this ok for trunk?



Thanks for working on this.
I agree with the approach, I have a couple of comments inline.


Sudi


ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-21  Sudakshina Das  

    PR target/84826
    * config/arm/arm.h (machine_function): Add
    static_chain_stack_bytes.
    * config/arm/arm.c (arm_compute_static_chain_stack_bytes):
    Avoid re-computing once computed.
    (arm_expand_prologue): Compute 
machine->static_chain_stack_bytes.

    (arm_init_machine_status): Initialize
    machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  

    PR target/84826
    * gcc.target/arm/pr84826.c: New test


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bbf3937..2809112 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function
    machine_mode thumb1_cc_mode;
    /* Set to 1 after arm_reorg has started.  */
    int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+ stack, above the stack frame.  */
+  int static_chain_stack_bytes;
  }
  machine_function;
  #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cb6ab81..bc31810 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)
  static int
  arm_compute_static_chain_stack_bytes (void)
  {
+  /* Once the value is updated from the init value of -1, do not
+ re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+    return cfun->machine->static_chain_stack_bytes;
+

My concern is that this approach caches the first value that is 
computed for static_chain_stack_bytes.
I believe the layout frame code is called multiple times during register
allocation as it goes through the motions, and I think we want the last
value it computes during reload.

How about we do something like:
   if (cfun->machine->static_chain_stack_bytes != -1 && epilogue_completed)
     return cfun->machine->static_chain_stack_bytes;

?...

  /* See the defining assertion in arm_expand_prologue.  */
    if (IS_NESTED (arm_current_func_type ())
    && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
@@ -21699,6 +21704,11 @@ arm_expand_prologue (void)
    emit_insn (gen_movsi (stack_pointer_rtx, r1));
  }

+  /* Let's compute the static_chain_stack_bytes required and store it.  Right
+ now the value must be -1 as stored by arm_init_machine_status ().  */


... this comment would need to be tweaked as 
cfun->machine->static_chain_stack_bytes may hold
an intermediate value computed in reload or some other point before 
epilogue_completed.


+  cfun->machine->static_chain_stack_bytes
+    = arm_compute_static_chain_stack_bytes ();
+


Maybe I did not understand this completely, but my idea was that I am 
initializing the value of cfun->machine->static_chain_stack_bytes to 
be -1 in arm_init_machine_status () and then it stays as -1 all 
throughout reload and hence the function 
arm_compute_static_chain_stack_bytes () will keep computing the value 
instead of returning the cached value. Only during expand_prologue 
(which I assumed occurs much after reload), I overwrite the initial -1 
and after that any call to arm_compute_static_chain_stack_bytes () 
would return this cached value.


I did start out writing the patch with an epilogue_completed check but 
realized that even during this stage 
arm_compute_static_chain_stack_bytes () was called several times and 
thus to avoid those re-computations, (again assuming by this stage we 
already should have a fixed value) I re-wrote it with the 
initialization to -1 approach.




Right, I had read the 

Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi

2018-03-23 Thread Sudakshina Das

On 23/03/18 09:12, Kyrill Tkachov wrote:


On 23/03/18 08:47, Christophe Lyon wrote:

Hi Sudi,


On 22 March 2018 at 18:26, Sudakshina Das  wrote:

Hi


On 22/03/18 16:52, Kyrill Tkachov wrote:


On 22/03/18 16:20, Sudakshina Das wrote:

Hi Kyrill

On 22/03/18 16:08, Kyrill Tkachov wrote:

Hi Sudi,

On 21/03/18 17:44, Sudakshina Das wrote:

Hi

The ICE in the bug report was happening because the macro
USE_RETURN_INSN (FALSE) was returning different values at different
points in the compilation. This was internally occurring because the
function arm_compute_static_chain_stack_bytes () which was dependent on
arm_r3_live_at_start_p () was giving a different value after the
cond_exec instructions were created in ce3, causing the liveness of r3
to escape up to the start block.

The function arm_compute_static_chain_stack_bytes () should really
only compute the value once during the epilogue/prologue stage. This patch
introduces a new member 'static_chain_stack_bytes' to the target
definition of the struct machine_function which gets calculated in
expand_prologue and is the value that is returned by
arm_compute_static_chain_stack_bytes () beyond that.

Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf
and added the reported test to the testsuite.

Is this ok for trunk?


Thanks for working on this.
I agree with the approach, I have a couple of comments inline.


Sudi


ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-21  Sudakshina Das  

 PR target/84826
 * config/arm/arm.h (machine_function): Add
 static_chain_stack_bytes.
 * config/arm/arm.c (arm_compute_static_chain_stack_bytes):
 Avoid re-computing once computed.
 (arm_expand_prologue): Compute
machine->static_chain_stack_bytes.
 (arm_init_machine_status): Initialize
 machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  

 PR target/84826
 * gcc.target/arm/pr84826.c: New test

The new test fails on
arm-none-linux-gnueabi
--with-mode thumb
--with-cpu cortex-a9
--with-fpu default
Dejagnu flags: -march=armv5t

Because:
/gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a':
/gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented:
-fstack-check=specific for Thumb-1
compiler exited with status 1
FAIL: gcc.target/arm/pr84826.c (test for excess errors)

You probably have to add a require-effective-target to skip the test
in such cases.


Yeah, these tests need a
{ dg-require-effective-target supports_stack_clash_protection }

A patch to add that is pre-approved.
Sorry for missing it in review.

Kyrill



Hi Christophe and Kyrill

How about the attached patch?
{ dg-require-effective-target supports_stack_clash_protection } is not
enabled for any of the ARM targets, so this is my workaround for that. There is
a comment in target-supports.exp which makes me a little hesitant in
tinkering with the require effective target code.

proc check_effective_target_supports_stack_clash_protection { } {

   # Temporary until the target bits are fully ACK'd.
#  if { [istarget aarch*-*-*] } {
#   return 1
#  }

if { [istarget x86_64-*-*] || [istarget i?86-*-*]
  || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]
  || [istarget s390*-*-*] } {
return 1
}
  return 0
}

I have opened a new PR 85005 which mentions this that is meant
for GCC 9 as part for a bigger cleanup and redesign of the stack
clash protection code on ARM backend.


*** gcc/testsuite/ChangeLog ***

2018-03-23  Sudakshina Das  

PR target/84826
* gcc.target/arm/pr84826.c: Add dg directive.

Thanks
Sudi


Thanks,

Christophe


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bbf3937..2809112 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function
 machine_mode thumb1_cc_mode;
 /* Set to 1 after arm_reorg has started.  */
 int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+ stack, above the stack frame.  */
+  int static_chain_stack_bytes;
   }
   machine_function;
   #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cb6ab81..bc31810 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)
   static int
   arm_compute_static_chain_stack_bytes (void)
   {
+  /* Once the value is updated from the init value of -1, do not
+ re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+    return cfun->machine->static_chain_stack_bytes;
+

My concern is that this approach caches the first value that is computed
for static_chain_stack_bytes.
I believe the layout frame code is called multiple times during register
allocation as it goes through the motions, and I think we want the last
value it computes during reload.


How about 

Re: [PATCH][ARM][PR target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi

2018-03-23 Thread Sudakshina Das

On 23/03/18 13:50, Kyrill Tkachov wrote:


On 23/03/18 13:31, Sudakshina Das wrote:

On 23/03/18 09:12, Kyrill Tkachov wrote:


On 23/03/18 08:47, Christophe Lyon wrote:

Hi Sudi,


On 22 March 2018 at 18:26, Sudakshina Das  wrote:

Hi


On 22/03/18 16:52, Kyrill Tkachov wrote:


On 22/03/18 16:20, Sudakshina Das wrote:

Hi Kyrill

On 22/03/18 16:08, Kyrill Tkachov wrote:

Hi Sudi,

On 21/03/18 17:44, Sudakshina Das wrote:

Hi

The ICE in the bug report was happening because the macro
USE_RETURN_INSN (FALSE) was returning different values at different
points in the compilation. This was internally occurring because the
function arm_compute_static_chain_stack_bytes () which was dependent on
arm_r3_live_at_start_p () was giving a different value after the
cond_exec instructions were created in ce3, causing the liveness of r3
to escape up to the start block.

The function arm_compute_static_chain_stack_bytes () should really
only compute the value once during the epilogue/prologue stage. This patch
introduces a new member 'static_chain_stack_bytes' to the target
definition of the struct machine_function which gets calculated in
expand_prologue and is the value that is returned by
arm_compute_static_chain_stack_bytes () beyond that.

Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf
and added the reported test to the testsuite.

Is this ok for trunk?


Thanks for working on this.
I agree with the approach, I have a couple of comments inline.


Sudi


ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-21  Sudakshina Das  

 PR target/84826
 * config/arm/arm.h (machine_function): Add
 static_chain_stack_bytes.
 * config/arm/arm.c (arm_compute_static_chain_stack_bytes):
 Avoid re-computing once computed.
 (arm_expand_prologue): Compute
machine->static_chain_stack_bytes.
 (arm_init_machine_status): Initialize
 machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  

 PR target/84826
 * gcc.target/arm/pr84826.c: New test

The new test fails on
arm-none-linux-gnueabi
--with-mode thumb
--with-cpu cortex-a9
--with-fpu default
Dejagnu flags: -march=armv5t

Because:
/gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a':
/gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented:
-fstack-check=specific for Thumb-1
compiler exited with status 1
FAIL: gcc.target/arm/pr84826.c (test for excess errors)

You probably have to add a require-effective-target to skip the test
in such cases.


Yeah, these tests need a
{ dg-require-effective-target supports_stack_clash_protection }

A patch to add that is pre-approved.
Sorry for missing it in review.

Kyrill



Hi Christophe and Kyrill

How about the attached patch?
{ dg-require-effective-target supports_stack_clash_protection } is not
enabled for any of the ARM targets, so this is my workaround for that. There is
a comment in target-supports.exp which makes me a little hesitant in
tinkering with the require effective target code.

proc check_effective_target_supports_stack_clash_protection { } {

   # Temporary until the target bits are fully ACK'd.
#  if { [istarget aarch*-*-*] } {
#   return 1
#  }

    if { [istarget x86_64-*-*] || [istarget i?86-*-*]
  || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]
  || [istarget s390*-*-*] } {
    return 1
    }
  return 0
}

I have opened a new PR 85005 which mentions this that is meant
for GCC 9 as part for a bigger cleanup and redesign of the stack
clash protection code on ARM backend.




Ok. Thanks for doing this.


Thanks and sorry about this!
Committed as r258805.

Sudi


Kyrill


*** gcc/testsuite/ChangeLog ***

2018-03-23  Sudakshina Das  

    PR target/84826
    * gcc.target/arm/pr84826.c: Add dg directive.

Thanks
Sudi


Thanks,

Christophe


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bbf3937..2809112 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function
 machine_mode thumb1_cc_mode;
 /* Set to 1 after arm_reorg has started.  */
 int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+ stack, above the stack frame.  */
+  int static_chain_stack_bytes;
   }
   machine_function;
   #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cb6ab81..bc31810 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)
   static int
   arm_compute_static_chain_stack_bytes (void)
   {
+  /* Once the value is updated from the init value of -1, do not
+ re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+    return cfun->machine->static_chain_stack_bytes;
+

My concern is that this approach caches the first value that is computed
for static_chain_stack_bytes.
I believe the

[PATCH][AARCH64][PR target/84882] Add mno-strict-align

2018-03-27 Thread Sudakshina Das

Hi

This patch adds the no variant to -mstrict-align and the corresponding
function attribute. To enable the function attribute, I have modified
aarch64_can_inline_p () to allow checks even when the callee function
has no attribute. The need for this is shown by the new test
target_attr_18.c.

Testing: Bootstrapped, regtested and added new tests that are copies
of earlier tests checking -mstrict-align with opposite scan directives.

Is this ok for trunk?

Sudi


*** gcc/ChangeLog ***

2018-03-27  Sudakshina Das  

* common/config/aarch64/aarch64-common.c (aarch64_handle_option):
Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags.
* config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative.
* config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg
as true for strict-align.
(aarch64_can_inline_p): Perform checks even when callee has no
attributes to check for strict alignment.
* doc/extend.texi (AArch64 Function Attributes): Document
no-strict-align.
* doc/invoke.texi (AArch64 Options): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-27  Sudakshina Das  

* gcc.target/aarch64/pr84882.c: New test.
* gcc.target/aarch64/target_attr_18.c: Likewise.
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 7fd9305..d5655a0 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -97,7 +97,10 @@ aarch64_handle_option (struct gcc_options *opts,
   return true;
 
 case OPT_mstrict_align:
-  opts->x_target_flags |= MASK_STRICT_ALIGN;
+  if (val)
+	opts->x_target_flags |= MASK_STRICT_ALIGN;
+  else
+	opts->x_target_flags &= ~MASK_STRICT_ALIGN;
   return true;
 
 case OPT_momit_leaf_frame_pointer:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4b5183b..4f35a6c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11277,7 +11277,7 @@ static const struct aarch64_attribute_info aarch64_attributes[] =
   { "fix-cortex-a53-843419", aarch64_attr_bool, true, NULL,
  OPT_mfix_cortex_a53_843419 },
   { "cmodel", aarch64_attr_enum, false, NULL, OPT_mcmodel_ },
-  { "strict-align", aarch64_attr_mask, false, NULL, OPT_mstrict_align },
+  { "strict-align", aarch64_attr_mask, true, NULL, OPT_mstrict_align },
   { "omit-leaf-frame-pointer", aarch64_attr_bool, true, NULL,
  OPT_momit_leaf_frame_pointer },
   { "tls-dialect", aarch64_attr_enum, false, NULL, OPT_mtls_dialect_ },
@@ -11640,16 +11640,13 @@ aarch64_can_inline_p (tree caller, tree callee)
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If callee has no option attributes, then it is ok to inline.  */
-  if (!callee_tree)
-return true;
-
   struct cl_target_option *caller_opts
 	= TREE_TARGET_OPTION (caller_tree ? caller_tree
 	   : target_option_default_node);
 
-  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
-
+  struct cl_target_option *callee_opts
+	= TREE_TARGET_OPTION (callee_tree ? callee_tree
+	   : target_option_default_node);
 
   /* Callee's ISA flags should be a subset of the caller's.  */
   if ((caller_opts->x_aarch64_isa_flags & callee_opts->x_aarch64_isa_flags)
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 52eaf8c..1426b45 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -85,7 +85,7 @@ Target RejectNegative Joined Enum(cmodel) Var(aarch64_cmodel_var) Init(AARCH64_C
 Specify the code model.
 
 mstrict-align
-Target Report RejectNegative Mask(STRICT_ALIGN) Save
+Target Report Mask(STRICT_ALIGN) Save
 Don't assume that unaligned accesses are handled by the system.
 
 momit-leaf-frame-pointer
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 93a0ebc..dcda216 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3605,8 +3605,10 @@ for the command line option @option{-mcmodel=}.
 @item strict-align
 @cindex @code{strict-align} function attribute, AArch64
 Indicates that the compiler should not assume that unaligned memory references
-are handled by the system.  The behavior is the same as for the command-line
-option @option{-mstrict-align}.
+are handled by the system.  To allow the compiler to assume that aligned memory
+references are handled by the system, the inverse attribute
+@code{no-strict-align} can be specified.  The behavior is the same as for the
+command-line option @option{-mstrict-align} and @option{-mno-strict-align}.
 
 @item omit-leaf-frame-pointer
 @cindex @code{omit-leaf-frame-pointer} function attribute, AArch64
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index feacd56..0574d21 100644
--- a/gcc

Re: [PATCH][ARM][PR82989] Fix unexpected use of NEON instructions for shifts

2018-03-27 Thread Sudakshina Das

On 21/03/18 11:40, Sudakshina Das wrote:

Hi

On 21/03/18 08:51, Christophe Lyon wrote:

On 20 March 2018 at 11:58, Sudakshina Das  wrote:

Hi

On 20/03/18 10:03, Richard Earnshaw (lists) wrote:


On 14/03/18 10:11, Sudakshina Das wrote:


Hi

This patch fixes PR82989 so that we avoid NEON instructions when
-mneon-for-64bits is not enabled. This is more of a short term fix for
the real, deeper problem of making an early decision on choosing or
rejecting NEON instructions. There is now a new ticket PR84467 to deal
with the longer term solution.
(Please refer to the discussion in the bug report for more details).

Testing: Bootstrapped and regtested on arm-none-linux-gnueabihf and
added a new test case based on the test given on the bug report.

Ok for trunk and backports for gcc-7 and gcc-6 branches?



OK for trunk.  Please leave it a couple of days before backporting to
ensure that the testcase doesn't tickle any multilib issues.

R.



Thanks. Committed to trunk as r258677. Will wait a week for backporting.


Backported both the commits from trunk to gcc-7 as r258883 and to gcc-6
as r258884 (reg-tested for both).


Thanks
Sudi



Sudi



Hi Sudi,

I've noticed that:
FAIL:    gcc.target/arm/pr82989.c scan-assembler-times lsl\\tr[0-9]+,
r[0-9]+, r[0-9] 2
FAIL:    gcc.target/arm/pr82989.c scan-assembler-times lsr\\tr[0-9]+,
r[0-9]+, r[0-9] 2
on target armeb-none-linux-gnueabihf
--with-mode thumb
--with-cpu cortex-a9
--with-fpu neon-fp16

The tests pass when using --with-mode arm

Can you check?


Yes I see this as well. Sorry about this. I am testing a quick fix for 
this at the moment.


Thanks
Sudi



Thanks

Christophe






Sudi


*** gcc/ChangeLog ***

2018-03-14  Sudakshina Das  

  * config/arm/neon.md (ashldi3_neon): Update ?s for constraints
  to favor GPR over NEON registers.
  (di3_neon): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-14  Sudakshina Das  

  * gcc.target/arm/pr82989.c: New test.

pr82989.diff


diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a6f5d7..1646b21 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1180,12 +1180,12 @@
 )

 (define_insn_and_split "ashldi3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"           "= w, w,?&r,?r,?&r, ?w,w")
-       (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w,w")
-                  (match_operand:SI 2 "general_operand"    "rUm, i,  r, i,  i,rUm,i")))
-   (clobber (match_scratch:SI 3                            "= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:SI 4                            "= X, X,?&r, X,  X,  X,X"))
-   (clobber (match_scratch:DI 5                            "=&w, X,  X, X,  X, &w,X"))
+  [(set (match_operand:DI 0 "s_register_operand"           "= w, w, &r, r, &r, ?w,?w")
+       (ashift:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r, 0w, w")
+                  (match_operand:SI 2 "general_operand"    "rUm, i,  r, i,  i,rUm, i")))
+   (clobber (match_scratch:SI 3                            "= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:SI 4                            "= X, X, &r, X,  X,  X, X"))
+   (clobber (match_scratch:DI 5                            "=&w, X,  X, X,  X, &w, X"))
    (clobber (reg:CC_C CC_REGNUM))]
   "TARGET_NEON"
   "#"
@@ -1276,7 +1276,7 @@
 ;; ashrdi3_neon
 ;; lshrdi3_neon
 (define_insn_and_split "<shift>di3_neon"
-  [(set (match_operand:DI 0 "s_register_operand"            "= w, w,?&r,?r,?&r,?w,?w")
+  [(set (match_operand:DI 0 "s_register_operand"            "= w, w, &r, r, &r,?w,?w")
        (RSHIFTS:DI (match_operand:DI 1 "s_register_operand" " 0w, w, 0r, 0,  r,0w, w")
                    (match_operand:SI 2 "reg_or_int_operand" "  r, i,  r, i,  i, r, i")))
    (clobber (match_scratch:SI 3                             "=2r, X, &r, X,  X,2r, X"))
diff --git a/gcc/testsuite/gcc.target/arm/pr82989.c b/gcc/testsuite/gcc.target/arm/pr82989.c
new file mode 100644
index 000..1295ee6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr82989.c
@@ -0,0 +1,38 @@
+/* PR target/82989 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mcpu=*" } { "-mcpu=cortex-a8" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfpu=*" } { "-mfpu=neon" } } */
+/* { dg-skip-if "avoid conflicts with multilib options" { *-*-* } { "-mfloat-abi=*" 

Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md

2018-03-28 Thread Sudakshina Das

Hi

On 20/03/18 10:57, Sudakshina Das wrote:

Hi

On 20/03/18 08:13, Christophe Lyon wrote:

On 19 March 2018 at 19:55, Sudakshina Das  wrote:

Hi


On 19/03/18 14:29, James Greenhalgh wrote:


On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote:


Hi

This patch fixes the inconsistent behavior observed at -O3 for the
unordered comparisons. According to the online docs
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html),
all of the following should not raise an FP exception:
- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these was generating exception-raising
instructions such as fcmgt. This patch changes the instructions that are
emitted in order to not give out the exceptions. We first check each
operand for NaNs and force any elements containing NaN to zero before
using them in the compare.

Example: UN<cc> (a, b) -> UNORDERED (a, b) | (cm<cc> (isnan (a) ? 0.0 : a,
                                                      isnan (b) ? 0.0 : b))

The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a, b)).


Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and added a new test case.

Is this ok for trunk? This will probably need a back-port to
gcc-7-branch as well.



OK.

Let it soak on trunk for a while before the backport.


Backported both r258653 and Christophe's r258672 to gcc-7-branch
as r258917 (reg-tested). Needed non-functional edits because
of the name change from vec_cmp<mode><v_cmp_result> to
vec_cmp<mode><v_int_equiv> between gcc-7 and trunk.

Thanks
Sudi




Thanks. Committed to trunk as r258653. Will wait a week before backport.



Hi,

As the test failed to compile on aarch64 bare-metal targets, I added
/* { dg-require-effective-target fenv_exceptions } */
as obvious (r258672).

2018-03-20  Christophe Lyon  

    PR target/81647
    * gcc.target/aarch64/pr81647.c: Require fenv_exceptions.

Index: testsuite/gcc.target/aarch64/pr81647.c
===
--- testsuite/gcc.target/aarch64/pr81647.c  (revision 258671)
+++ testsuite/gcc.target/aarch64/pr81647.c  (revision 258672)
@@ -1,5 +1,6 @@
  /* { dg-do run } */
  /* { dg-options "-O3 -fdump-tree-ssa" } */
+/* { dg-require-effective-target fenv_exceptions } */

  #include <fenv.h>



Christophe


Thanks for fixing this and apologies for missing it on the first place!

Sudi




Sudi




Thanks,
James


ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * config/aarch64/aarch64-simd.md (vec_cmp<mode><mode>): Modify
 instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * gcc.target/aarch64/pr81647.c: New.




diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2731,10 +2731,10 @@
   break;
 }
 /* Fall through.  */
-    case UNGE:
+    case UNLT:
 std::swap (operands[2], operands[3]);
 /* Fall through.  */
-    case UNLE:
+    case UNGT:
   case GT:
 comparison = gen_aarch64_cmgt;
 break;
@@ -2745,10 +2745,10 @@
   break;
 }
 /* Fall through.  */
-    case UNGT:
+    case UNLE:
 std::swap (operands[2], operands[3]);
 /* Fall through.  */
-    case UNLT:
+    case UNGE:
   case GE:
 comparison = gen_aarch64_cmge;
 break;
@@ -2771,21 +2771,35 @@
   case UNGT:
   case UNLE:
   case UNLT:
-    case NE:
-      /* FCM returns false for lanes which are unordered, so if we use
-        the inverse of the comparison we actually want to emit, then
-        invert the result, we will end up with the correct result.
-        Note that a NE NaN and NaN NE b are true for all a, b.
-
-        Our transformations are:
-        a UNGE b -> !(b GT a)
-        a UNGT b -> !(b GE a)
-        a UNLE b -> !(a GT b)
-        a UNLT b -> !(a GE b)
-        a   NE b -> !(a EQ b)  */
-      gcc_assert (comparison != NULL);
-      emit_insn (comparison (operands[0], operands[2], operands[3]));
-      emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[0]));
+      {
+       /* All of the above must not raise any FP exceptions.  Thus we first
+          check each operand for NaNs and force any elements containing NaN to
+          zero before using them in the compare.
+          Example: UN<cc> (a, b) -> UNORDERED (a, b) |
+                                    (cm<cc> (isnan (a) ? 0.0 : a,
+                                             isnan (b) ? 0.0 : b))
+          We use the following tr
[PATCH, GCC-7, GCC-6][ARM][PR target/84826] Backport Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabihf

2018-03-28 Thread Sudakshina Das

Hi

This patch is a request to backport r258777 and r258805 to gcc-7-branch
and gcc-6-branch. The same ICE occurs in both the branches with
-fstack-check. Thus the test case directive has been changed.

The discussion on the patch that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01120.html

Testing : Regtested on both the branches with arm-none-linux-gnueabihf

Is this ok for gcc-7 and gcc-6?

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-28  Sudakshina Das  

Backport from mainline
2018-03-22  Sudakshina Das  

PR target/84826
* config/arm/arm.h (machine_function): Add static_chain_stack_bytes.

* config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid
re-computing once computed.
(arm_expand_prologue): Compute machine->static_chain_stack_bytes.
(arm_init_machine_status): Initialize
machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-28  Sudakshina Das  

* gcc.target/arm/pr84826.c: Change dg-option to -fstack-check.

Backport from mainline
2018-03-23  Sudakshina Das  

PR target/84826
* gcc.target/arm/pr84826.c: Add dg directive.

Backport from mainline
2018-03-22  Sudakshina Das  

PR target/84826
* gcc.target/arm/pr84826.c: New test.
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 25953f5..68a6fa5 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1420,6 +1420,9 @@ typedef struct GTY(()) machine_function
   machine_mode thumb1_cc_mode;
   /* Set to 1 after arm_reorg has started.  */
   int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+ stack, above the stack frame.  */
+  int static_chain_stack_bytes;
 }
 machine_function;
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6f7ca43..886bcfa 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19097,6 +19097,11 @@ arm_r3_live_at_start_p (void)
 static int
 arm_compute_static_chain_stack_bytes (void)
 {
+  /* Once the value is updated from the init value of -1, do not
+ re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+return cfun->machine->static_chain_stack_bytes;
+
   /* See the defining assertion in arm_expand_prologue.  */
   if (IS_NESTED (arm_current_func_type ())
   && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
@@ -21395,6 +21400,11 @@ arm_expand_prologue (void)
   emit_insn (gen_movsi (stack_pointer_rtx, r1));
 }
 
+  /* Let's compute the static_chain_stack_bytes required and store it.  Right
+ now the value must be -1 as stored by arm_init_machine_status ().  */
+  cfun->machine->static_chain_stack_bytes
+= arm_compute_static_chain_stack_bytes ();
+
   /* The static chain register is the same as the IP register.  If it is
  clobbered when creating the frame, we need to save and restore it.  */
   clobber_ip = IS_NESTED (func_type)
@@ -24542,6 +24552,7 @@ arm_init_machine_status (void)
 #if ARM_FT_UNKNOWN != 0
   machine->func_type = ARM_FT_UNKNOWN;
 #endif
+  machine->static_chain_stack_bytes = -1;
   return machine;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/pr84826.c b/gcc/testsuite/gcc.target/arm/pr84826.c
new file mode 100644
index 000..563ce51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr84826.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-Ofast -fstack-check" } */
+
+void d (void *);
+
+void a ()
+{
+  int b;
+  void bar (int c)
+  {
+if (__builtin_expect (c, 0))
+  ++b;
+  }
+  d (bar);
+}


Re: [PATCH, GCC-7, GCC-6][ARM][PR target/84826] Backport Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabihf

2018-03-29 Thread Sudakshina Das

Hi Kyrill

On 29/03/18 09:41, Kyrill Tkachov wrote:

Hi Sudi,

On 28/03/18 15:04, Sudakshina Das wrote:

Hi

This patch is a request to backport r258777 and r258805 to gcc-7-branch
and gcc-6-branch. The same ICE occurs in both the branches with
-fstack-check. Thus the test case directive has been changed.

The discussion on the patch that went into trunk is:
https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01120.html

Testing : Regtested on both the branches with arm-none-linux-gnueabihf

Is this ok for gcc-7 and gcc-6?



Ok.
Thanks,
Kyrill


Thanks! Committed to gcc-7-branch as r258948 and gcc-6-branch as r258949.

Sudi




Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-28  Sudakshina Das  

    Backport from mainline
    2018-03-22  Sudakshina Das  

    PR target/84826
    * config/arm/arm.h (machine_function): Add static_chain_stack_bytes.

    * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Avoid
    re-computing once computed.
    (arm_expand_prologue): Compute machine->static_chain_stack_bytes.
    (arm_init_machine_status): Initialize
    machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-28  Sudakshina Das  

    * gcc.target/arm/pr84826.c: Change dg-option to -fstack-check.

    Backport from mainline
    2018-03-23  Sudakshina Das  

    PR target/84826
    * gcc.target/arm/pr84826.c: Add dg directive.

    Backport from mainline
    2018-03-22  Sudakshina Das  

    PR target/84826
    * gcc.target/arm/pr84826.c: New test.






Re: [Aarch64] Fix conditional branches with target far away.

2018-03-29 Thread Sudakshina Das

Hi Sameera

On 29/03/18 11:44, Sameera Deshpande wrote:

Hi Sudakshina,

Thanks for pointing that out. Updated the conditions for attribute
length to take care of boundary conditions for offset range.

Please find attached the updated patch.

I have tested it for gcc testsuite and the failing testcase. Ok for trunk?


Thank you so much for fixing the length as well along with your patch.
You mention a failing testcase? Maybe it would be helpful to add that
to the patch for the gcc testsuite.

Sudi



On 22 March 2018 at 19:06, Sudakshina Das  wrote:

Hi Sameera

On 22/03/18 02:07, Sameera Deshpande wrote:


Hi Sudakshina,

As per the ARMv8 ARM, for the offset range [-1048576, 1048572], the
far branch instruction offset is inclusive of both endpoints. Hence,
I am using (<= || >=) and not (< || >=) as in the previous implementation.



I have to admit earlier I was only looking at the patch mechanically and
found a difference with the previous implementation in offset comparison.
After you pointed out, I looked up the ARMv8 ARM and I have a couple of
doubts:

1. My understanding is that any offset in [-1048576 ,1048572] both inclusive
qualifies as an 'in range' offset. However, the code for both attribute
length and far_branch has been using [-1048576 ,1048572), that is, ( >= && <
). If the far_branch was incorrectly calculated, then maybe the length
calculations with similar magic numbers should also be corrected? Of course,
I am not an expert in this and maybe this was a conscious decision, so I
would ask Ramana to maybe clarify if he remembers.

2. Now to come back to your patch, if my understanding is correct, I think a
far_branch would be anything outside of this range, that is,
(offset < -1048576 || offset > 1048572), anything that can not be
represented in the 21-bit range.

Thanks
Sudi




On 16 March 2018 at 00:51, Sudakshina Das  wrote:


On 15/03/18 15:27, Sameera Deshpande wrote:



Ping!

On 28 February 2018 at 16:18, Sameera Deshpande
 wrote:



On 27 February 2018 at 18:25, Ramana Radhakrishnan
 wrote:



On Wed, Feb 14, 2018 at 8:30 AM, Sameera Deshpande
 wrote:



Hi!

Please find attached the patch to fix bug in branches with offsets
over
1MiB.
There has been an attempt to fix this issue in commit
050af05b9761f1979f11c151519e7244d5becd7c

However, the far_branch attribute defined in above patch used
insn_length - which computes incorrect offset. Hence, eliminated the
attribute completely, and computed the offset from insn_addresses
instead.

Ok for trunk?

gcc/Changelog

2018-02-13 Sameera Deshpande 
   * config/aarch64/aarch64.md (far_branch): Remove attribute.
Eliminate
   all the dependencies on the attribute from RTL patterns.



I'm not a maintainer but this looks good to me modulo notes about how
this was tested. What would be nice is a testcase for the testsuite as
well as ensuring that the patch has been bootstrapped and regression
tested. AFAIR, the original patch was put in because match.pd failed
when bootstrap in another context.


regards
Ramana


--
- Thanks and regards,
 Sameera D.




The patch is tested with GCC testsuite and bootstrapping successfully.
Also tested for spec benchmark.



I am not a maintainer either. I noticed that the range check you do for
the offset has a (<= || >=). The "far_branch" however did (< || >=) for a
positive value. Was that also part of the incorrect offset calculation?

@@ -692,7 +675,11 @@
  {
if (get_attr_length (insn) == 8)
  {
-   if (get_attr_far_branch (insn) == 1)
+   long long int offset;
+   offset = INSN_ADDRESSES (INSN_UID (XEXP (operands[2], 0)))
+ - INSN_ADDRESSES (INSN_UID (insn));
+
+   if (offset <= -1048576 || offset >= 1048572)
 return aarch64_gen_far_branch (operands, 2, "Ltb",
"\\t%0, %1, ");
   else
@@ -709,12 +696,7 @@
   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
-32768))
  (lt (minus (match_dup 2) (pc)) (const_int
32764)))
 (const_int 4)
- (const_int 8)))
-   (set (attr "far_branch")
-   (if_then_else (and (ge (minus (match_dup 2) (pc)) (const_int
-1048576))
-  (lt (minus (match_dup 2) (pc)) (const_int
1048572)))
- (const_int 0)
- (const_int 1)))]
+ (const_int 8)))]

)

Thanks
Sudi


--
- Thanks and regards,
 Sameera D.






















Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-10 Thread Sudakshina Das

Hi Sameera

On 10/04/18 11:20, Sameera Deshpande wrote:

On 7 April 2018 at 01:25, Christophe Lyon  wrote:

Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande :

Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?


Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in details.

Christophe




- Thanks and regards,
   Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon :

2017-12-14 9:29 GMT+01:00 Sameera Deshpande :

Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
   Sameera D.

gcc/Changelog:

2017-11-14  Sameera Deshpande  


 * config/aarch64/aarch64-simd-builtins.def (ld1x3): New.
 (st1x2): Likewise.
 (st1x3): Likewise.
 * config/aarch64/aarch64-simd.md
(aarch64_ld1x3): New pattern.
 (aarch64_ld1_x3_): Likewise
 (aarch64_st1x2): Likewise
 (aarch64_st1_x2_): Likewise
 (aarch64_st1x3): Likewise
 (aarch64_st1_x3_): Likewise
 * config/aarch64/arm_neon.h (vld1_u8_x3): New function.
 (vld1_s8_x3): Likewise.
 (vld1_u16_x3): Likewise.
 (vld1_s16_x3): Likewise.
 (vld1_u32_x3): Likewise.
 (vld1_s32_x3): Likewise.
 (vld1_u64_x3): Likewise.
 (vld1_s64_x3): Likewise.
 (vld1_fp16_x3): Likewise.
 (vld1_f32_x3): Likewise.
 (vld1_f64_x3): Likewise.
 (vld1_p8_x3): Likewise.
 (vld1_p16_x3): Likewise.
 (vld1_p64_x3): Likewise.
 (vld1q_u8_x3): Likewise.
 (vld1q_s8_x3): Likewise.
 (vld1q_u16_x3): Likewise.
 (vld1q_s16_x3): Likewise.
 (vld1q_u32_x3): Likewise.
 (vld1q_s32_x3): Likewise.
 (vld1q_u64_x3): Likewise.
 (vld1q_s64_x3): Likewise.
 (vld1q_f16_x3): Likewise.
 (vld1q_f32_x3): Likewise.
 (vld1q_f64_x3): Likewise.
 (vld1q_p8_x3): Likewise.
 (vld1q_p16_x3): Likewise.
 (vld1q_p64_x3): Likewise.
 (vst1_s64_x2): Likewise.
 (vst1_u64_x2): Likewise.
 (vst1_f64_x2): Likewise.
 (vst1_s8_x2): Likewise.
 (vst1_p8_x2): Likewise.
 (vst1_s16_x2): Likewise.
 (vst1_p16_x2): Likewise.
 (vst1_s32_x2): Likewise.
 (vst1_u8_x2): Likewise.
 (vst1_u16_x2): Likewise.
 (vst1_u32_x2): Likewise.
 (vst1_f16_x2): Likewise.
 (vst1_f32_x2): Likewise.
 (vst1_p64_x2): Likewise.
 (vst1q_s8_x2): Likewise.
 (vst1q_p8_x2): Likewise.
 (vst1q_s16_x2): Likewise.
 (vst1q_p16_x2): Likewise.
 (vst1q_s32_x2): Likewise.
 (vst1q_s64_x2): Likewise.
 (vst1q_u8_x2): Likewise.
 (vst1q_u16_x2): Likewise.
 (vst1q_u32_x2): Likewise.
 (vst1q_u64_x2): Likewise.
 (vst1q_f16_x2): Likewise.
 (vst1q_f32_x2): Likewise.
 (vst1q_f64_x2): Likewise.
 (vst1q_p64_x2): Likewise.
 (vst1_s64_x3): Likewise.
 (vst1_u64_x3): Likewise.
 (vst1_f64_x3): Likewise.
 (vst1_s8_x3): Likewise.
 (vst1_p8_x3): Likewise.
 (vst1_s16_x3): Likewise.
 (vst1_p16_x3): Likewise.
 (vst1_s32_x3): Likewise.
 (vst1_u8_x3): Likewise.
 (vst1_u16_x3): Likewise.
 (vst1_u32_x3): Likewise.
 (vst1_f16_x3): Likewise.
 (vst1_f32_x3): Likewise.
 (vst1_p64_x3): Likewise.
 (vst1q_s8_x3): Likewise.
 (vst1q_p8_x3): Likewise.
 (vst1q_s16_x3): Likewise.
 (vst1q_p16_x3): Likewise.
 (vst1q_s32_x3): Likewise.
 (vst1q_s64_x3): Likewise.
 (vst1q_u8_x3): Likewise.
 (vst1q_u16_x3): Likewise.
 (vst1q_u32_x3): Likewise.
 (vst1q_u64_x3): Likewise.
 (vst1q_f16_x3): Likewise.
 (vst1q_f32_x3): Likewise.
 (vst1q_f64_x3): Likewise.
 (vst1q_p64_x3): Likewise.


Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe




--
- Thanks and regards,
   Sameera D.


Hi Christophe,

Please find attached the updated patch. Similar to the testcase
vld1x2.c, I have updated the testcases to mark them XFAIL for ARM, as
the intrinsics are not implemented yet. I have also added required
target to be little endian.


I am not a maintainer either. Shouldn't these intrinsics be supported
even for big endian?

From your patch:


diff --git a/gcc/testsuite/gcc.target/aarch64/a

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sudakshina Das

Hi Sameera

On 11/04/18 09:04, Sameera Deshpande wrote:

On 10 April 2018 at 20:07, Sudakshina Das  wrote:

Hi Sameera


On 10/04/18 11:20, Sameera Deshpande wrote:


On 7 April 2018 at 01:25, Christophe Lyon 
wrote:


Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande
:


Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?



Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in details.

Christophe




- Thanks and regards,
Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon :


2017-12-14 9:29 GMT+01:00 Sameera Deshpande
:


Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
Sameera D.




Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe





--
- Thanks and regards,
Sameera D.



Hi Christophe,

Please find attached the updated patch. Similar to the testcase
vld1x2.c, I have updated the testcases to mark them XFAIL for ARM, as
the intrinsics are not implemented yet.

Re: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics

2018-04-11 Thread Sudakshina Das

Hi Sameera

On 11/04/18 13:05, Sameera Deshpande wrote:

On 11 April 2018 at 15:53, Sudakshina Das  wrote:

Hi Sameera


On 11/04/18 09:04, Sameera Deshpande wrote:


On 10 April 2018 at 20:07, Sudakshina Das  wrote:


Hi Sameera


On 10/04/18 11:20, Sameera Deshpande wrote:



On 7 April 2018 at 01:25, Christophe Lyon 
wrote:



Hi,

2018-04-06 12:15 GMT+02:00 Sameera Deshpande
:



Hi Christophe,

Please find attached the updated patch with testcases.

Ok for trunk?




Thanks for the update.

Since the new intrinsics are only available on aarch64, you want to
prevent the tests from running on arm.
Indeed gcc.target/aarch64/advsimd-intrinsics/ is shared between the two
targets.
There are several examples on how to do that in that directory.

I have also noticed that the tests fail at execution on aarch64_be.

I didn't look at the patch in details.

Christophe




- Thanks and regards,
 Sameera D.

2017-12-14 22:17 GMT+05:30 Christophe Lyon
:



2017-12-14 9:29 GMT+01:00 Sameera Deshpande
:



Hi!

Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and
vst1_*_x3 intrinsics as defined by Neon document.

Ok for trunk?

- Thanks and regards,
 Sameera D.





Hi,
I'm not a maintainer, but I suspect you should add some tests.

Christophe






--
- 

Re: [PATCH][AARCH64][PR target/84882] Add mno-strict-align

2018-05-10 Thread Sudakshina Das

Ping!

On 27/03/18 13:58, Sudakshina Das wrote:

Hi

This patch adds the no variant to -mstrict-align and the corresponding
function attribute. To enable the function attribute, I have modified
aarch64_can_inline_p () to allow checks even when the callee function
has no attribute. The need for this is shown by the new test
target_attr_18.c.

Testing: Bootstrapped, regtested and added new tests that are copies
of earlier tests checking -mstrict-align with opposite scan directives.

Is this ok for trunk?

Sudi


*** gcc/ChangeLog ***

2018-03-27  Sudakshina Das  

 * common/config/aarch64/aarch64-common.c (aarch64_handle_option):
 Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags.
 * config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative.
 * config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg
 as true for strict-align.
 (aarch64_can_inline_p): Perform checks even when callee has no
 attributes to check for strict alignment.
 * doc/extend.texi (AArch64 Function Attributes): Document
 no-strict-align.
 * doc/invoke.texi (AArch64 Options): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-03-27  Sudakshina Das  

 * gcc.target/aarch64/pr84882.c: New test.
 * gcc.target/aarch64/target_attr_18.c: Likewise.



