date:20161129

On 29 November 2016 at 03:59, Martin Sebor  wrote:
> On 11/28/2016 06:35 PM, David Edelsohn wrote:
>>
>> Martin,
>>
>> I am seeing a number of new failures with the testcases on AIX.
>>
>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test for excess errors)
>>
>> Excess errors:
>>
>> /nasfarm/edelsohn/src/src/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c:1485:3: warning: specified destination size 2147483647 is too large [-Wformat-length=]
>>
>>
>> Also, a number of errors like
>>
>> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-3.c  target lp64  (test for warnings, line 256)
>> PASS: gcc.dg/tree-ssa/builtin-sprintf-warn-3.c  (test for warnings, line 256)
>
>
> Thanks.  The DejaGnu directives in the tests likely need adjusting.
> Let me look into it tomorrow.
>
> Martin


Probably. I'm seeing errors on arm*:
FAIL:  gcc.dg/tree-ssa/builtin-sprintf-warn-3.c  target lp64  (test for warnings, line 256)
FAIL:  gcc.dg/tree-ssa/builtin-sprintf-warn-3.c  target lp64  (test for warnings, line 260)
FAIL:  gcc.dg/tree-ssa/builtin-sprintf-warn-3.c  target lp64  (test for warnings, line 264)

Christophe

[PATCH][AArch64] PR target/71112: Properly create lowpart of pic_offset_table_rtx with -fpie


Hi all,

This ICE only occurs on big-endian ILP32 -fpie code. The expansion code
generates the following invalid load:
(insn 6 5 7 (set (reg/f:SI 76)
(unspec:SI [
(mem/u/c:SI (lo_sum:SI (nil)
(symbol_ref:SI ("dbs") [flags 0x40] )) [0  S4 A8])
] UNSPEC_GOTSMALLPIC28K))
 (expr_list:REG_EQUAL (symbol_ref:SI ("dbs") [flags 0x40] )
(nil)))

to load the symbol. Note the (nil) argument to lo_sum.
The buggy hunk meant to take the lowpart of the pic_offset_table_rtx
register, but it did so by explicitly constructing a subreg with byte
offset 0, which is the wrong offset for big-endian. The right way is to
use gen_lowpart, which knows exactly what to do. With this patch we emit:
(insn 6 5 7 (set (reg/f:SI 76)
(unspec:SI [
(mem/u/c:SI (lo_sum:SI (subreg:SI (reg:DI 73) 4)
(symbol_ref:SI ("dbs") [flags 0x40] )) [0  S4 A8])
] UNSPEC_GOTSMALLPIC28K))
 (expr_list:REG_EQUAL (symbol_ref:SI ("dbs") [flags 0x40] )
(nil)))

and everything works fine.
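For intuition, the byte offset of a lowpart subreg depends on endianness. The minimal C sketch below models only the case this patch hits (a 4-byte SImode lowpart of an 8-byte DImode register) and assumes byte and word endianness agree, as on aarch64_be. `lowpart_byte_offset` is a hypothetical helper for illustration, not GCC's `subreg_lowpart_offset`, which also handles mixed byte/word endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of the least significant OUTER_SIZE
   bytes inside a register of INNER_SIZE bytes.  Assumes byte and word
   endianness agree (the aarch64_be situation).  */
static size_t
lowpart_byte_offset (size_t outer_size, size_t inner_size, bool big_endian)
{
  assert (inner_size > 0);
  if (outer_size >= inner_size)
    return 0;                   /* Not a narrowing subreg.  */
  /* Little-endian: the low-order bytes come first in memory order.
     Big-endian: they come last, so skip the high-order bytes.  */
  return big_endian ? inner_size - outer_size : 0;
}
```

Under these assumptions, `lowpart_byte_offset (4, 8, true)` is 4, matching the `(subreg:SI (reg:DI 73) 4)` emitted after the fix, whereas the hard-coded 0 passed to `simplify_gen_subreg` is only right for little-endian.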

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.
Ok for trunk?

Thanks,
Kyrill

2016-11-29  Kyrylo Tkachov  

PR target/71112
* config/aarch64/aarch64.c (aarch64_load_symref_appropriately,
case SYMBOL_SMALL_GOT_28K): Use gen_lowpart rather than constructing
subreg directly.

2016-11-29  Kyrylo Tkachov  

PR target/71112
* gcc.c-torture/compile/pr71112.c: New test.
commit 99052634535958a776a2e7b7c9c6169cb656f5a8
Author: Kyrylo Tkachov 
Date:   Mon Nov 28 12:26:57 2016 +

[AArch64] PR target/71112: Properly create lowpart of pic_offset_table_rtx with -fpie

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 686a8e92..aa833c8 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1299,7 +1299,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 	emit_move_insn (gp_rtx, gen_rtx_HIGH (Pmode, s));
 
 	if (mode != GET_MODE (gp_rtx))
-	  gp_rtx = simplify_gen_subreg (mode, gp_rtx, GET_MODE (gp_rtx), 0);
+	  gp_rtx = gen_lowpart (mode, gp_rtx);
 	  }
 
 	if (mode == ptr_mode)
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr71112.c b/gcc/testsuite/gcc.c-torture/compile/pr71112.c
new file mode 100644
index 000..69e2df6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr71112.c
@@ -0,0 +1,10 @@
+/* PR target/71112.  */
+/* { dg-additional-options "-fpie" { target pie } } */
+
+extern int dbs[100];
+void f (int *);
+int nscd_init (void)
+{
+  f (dbs);
+  return 0;
+}

[PATCH 2/4] S/390: Merge compare of compare results

With this patch, EQ and NE comparisons against zero of CC-mode
comparison results (the CC reader patterns) are folded into a single
comparison.  This allows the result of the vec_all_* and vec_any_*
builtins to be used directly in a conditional jump instruction, as in
the attached testcase.
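To make the folding concrete, here is a small self-contained C model of the two rewrites: `(eq (op cc 0) 0)` becomes the reversed `op`, and `(ne (op cc 0) 0)` becomes `op` itself. The enum and function names are hypothetical and only mirror the idea; the real code works on RTL in `s390_canonicalize_comparison` and obtains the reversed code through `reversed_comparison_code_parts`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for GCC's rtx_code values.  */
enum cmp_code { EQ, NE, GT, GE, LT, LE, UNGT, UNGE, UNLT, UNLE, UNKNOWN_CODE };

/* Reverse CODE.  For integer compares this is plain logical negation;
   for floating point compares the reversed condition must also hold
   when the operands are unordered (a NaN is involved).  */
static enum cmp_code
reverse_code (enum cmp_code code, bool is_float)
{
  switch (code)
    {
    case EQ: return NE;
    case NE: return EQ;
    case GT: return is_float ? UNLE : LE;
    case GE: return is_float ? UNLT : LT;
    case LT: return is_float ? UNGE : GE;
    case LE: return is_float ? UNGT : GT;
    default: return UNKNOWN_CODE;   /* Not modelled in this sketch.  */
    }
}

/* Fold OUTER (INNER (cc, 0), 0), where OUTER is EQ or NE, into a single
   comparison on the CC register.  */
static enum cmp_code
fold_compare_of_compare (enum cmp_code outer, enum cmp_code inner,
                         bool is_float)
{
  if (outer == NE)
    return inner;                           /* (ne X 0) is X.  */
  if (outer == EQ)
    return reverse_code (inner, is_float);  /* (eq X 0) is !X.  */
  return UNKNOWN_CODE;
}
```

As in the patch, an UNKNOWN result means the comparison is left unchanged.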

gcc/ChangeLog:

2016-11-29  Andreas Krebbel  

* config/s390/s390-protos.h (s390_reverse_condition): New
prototype.
* config/s390/s390.c (s390_canonicalize_comparison): Fold compares
of CC mode values.
(s390_reverse_condition): New function.
* config/s390/s390.h (REVERSIBLE_CC_MODE, REVERSE_CONDITION): Define
target macros.

gcc/testsuite/ChangeLog:

2016-11-29  Andreas Krebbel  

* gcc.target/s390/zvector/vec-cmp-2.c: New test.
---
 gcc/config/s390/s390-protos.h |   1 +
 gcc/config/s390/s390.c|  42 +
 gcc/config/s390/s390.h|  12 ++
 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c | 203 ++
 4 files changed, 258 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7ae98d4..000a677 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -119,6 +119,7 @@ extern void s390_expand_atomic (machine_mode, enum rtx_code,
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
 extern void s390_expand_vec_compare (rtx, enum rtx_code, rtx, rtx);
 extern void s390_expand_vec_compare_cc (rtx, enum rtx_code, rtx, rtx, bool);
+extern enum rtx_code s390_reverse_condition (machine_mode, enum rtx_code);
 extern void s390_expand_vcond (rtx, rtx, rtx, enum rtx_code, rtx, rtx);
 extern void s390_expand_vec_init (rtx, rtx);
 extern rtx s390_return_addr_rtx (int, rtx);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 445c147..dab4f43 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1722,6 +1722,31 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
}
   tmp = *op0; *op0 = *op1; *op1 = tmp;
 }
+
+  /* A comparison result is compared against zero.  Replace it with
+ the (perhaps inverted) original comparison.
+ This probably should be done by simplify_relational_operation.  */
+  if ((*code == EQ || *code == NE)
+  && *op1 == const0_rtx
+  && COMPARISON_P (*op0)
+  && CC_REG_P (XEXP (*op0, 0)))
+{
+  enum rtx_code new_code;
+
+  if (*code == EQ)
+   new_code = reversed_comparison_code_parts (GET_CODE (*op0),
+  XEXP (*op0, 0),
+  XEXP (*op1, 0), NULL);
+  else
+   new_code = GET_CODE (*op0);
+
+  if (new_code != UNKNOWN)
+   {
+ *code = new_code;
+ *op1 = XEXP (*op0, 1);
+ *op0 = XEXP (*op0, 0);
+   }
+}
 }
 
 /* Helper function for s390_emit_compare.  If possible emit a 64 bit
@@ -6343,6 +6368,23 @@ s390_expand_vec_compare_cc (rtx target, enum rtx_code code,
tmp_reg, target));
 }
 
+/* Invert the comparison CODE applied to a CC mode.  This is only safe
+   if we know whether the result was created by a floating point
+   compare or not.  For the CCV modes this is encoded as part of the
+   mode.  */
+enum rtx_code
+s390_reverse_condition (machine_mode mode, enum rtx_code code)
+{
+  /* Reversal of FP compares takes care -- an ordered compare
+ becomes an unordered compare and vice versa.  */
+  if (mode == CCVFALLmode || mode == CCVFANYmode)
+return reverse_condition_maybe_unordered (code);
+  else if (mode == CCVIALLmode || mode == CCVIANYmode)
+return reverse_condition (code);
+  else
+gcc_unreachable ();
+}
+
 /* Generate a vector comparison expression loading either elements of
THEN or ELS into TARGET depending on the comparison COND of CMP_OP1
and CMP_OP2.  */
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 6be4d34..1d6d7b2 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -513,6 +513,18 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv);
 #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)  \
   s390_cannot_change_mode_class ((FROM), (TO), (CLASS))
 
+/* We can reverse a CC mode safely if we know whether it comes from a
+   floating point compare or not.  With the vector modes it is encoded
+   as part of the mode.
+   FIXME: It might make sense to do this for other cc modes as well.  */
+#define REVERSIBLE_CC_MODE(MODE)   \
+  ((MODE) == CCVIALLmode || (MODE) == CCVIANYmode  \
+   || (MODE) == CCVFALLmode || (MODE) == CCVFANYmode)
+
+/* Given a condition code and a mode, return the inverse condition.  */
+#define REVERSE_CONDITION(CODE, MODE) s390_reverse_condition (MODE, CODE)
+
+
 /* Register classes.  */
 
 /* We use the following register classes:
diff --git a/gcc/testsuite/gcc.targ

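The reason `s390_reverse_condition` must distinguish integer from floating point CC modes can be shown in plain C: when a NaN is involved, both `a > b` and `a <= b` are false, so the negation of GT is UNLE (unordered or less-or-equal), not LE. The helper name below is hypothetical; `isnan` and `NAN` come from the standard `<math.h>`.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper modelling UNLE: true if A and B are unordered
   or A <= B.  This is the exact negation of A > B for IEEE values.  */
static bool
unle (double a, double b)
{
  return isnan (a) || isnan (b) || a <= b;
}
```

For integers there is no unordered case, which is why the integer CC modes can use a plain `reverse_condition`.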
[PATCH 3/4] S/390: Add vector pack/unpack patterns.

gcc/ChangeLog:

2016-11-29  Andreas Krebbel  

* config/s390/vector.md (vec_halfhalf): New mode attribute.
("vec_pack_trunc_", "vec_pack_ssat_")
("vec_pack_usat_", "vec_unpacks_hi_v16qi")
("vec_unpacks_low_v16qi", "vec_unpacku_hi_v16qi")
("vec_unpacku_low_v16qi", "vec_unpacks_hi_v8hi")
("vec_unpacks_lo_v8hi", "vec_unpacku_hi_v8hi")
("vec_unpacku_lo_v8hi", "vec_unpacks_hi_v4si")
("vec_unpacks_lo_v4si", "vec_unpacku_hi_v4si")
("vec_unpacku_lo_v4si"): New pattern definitions.
* config/s390/vx-builtins.md: Move VI_HW_HSD mode iterator to
vector.md.
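As a reading aid for the RTL patterns below, this scalar C model shows the element-wise semantics for one mode pair: packing two V4SI vectors into a V8HI by truncation, signed saturation, or unsigned saturation, and unpacking the "hi" or "lo" half of a V16QI into a V8HI. All function names are hypothetical; the element ordering (operand 1 first; "hi" meaning elements 0-7) follows the `vec_concat` and `vec_select` operands in the patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Element I (0..7) of vec_pack_trunc on V4SI operands A and B:
   A supplies elements 0-3, B supplies 4-7, each keeping its low
   16 bits.  */
static uint16_t
pack_trunc_at (const int32_t a[4], const int32_t b[4], int i)
{
  int32_t x = i < 4 ? a[i] : b[i - 4];
  return (uint16_t) (uint32_t) x;   /* Modular truncation.  */
}

/* Signed saturation to 16 bits (ss_truncate).  */
static int16_t
ss_trunc16 (int32_t x)
{
  if (x > INT16_MAX)
    return INT16_MAX;
  if (x < INT16_MIN)
    return INT16_MIN;
  return (int16_t) x;
}

/* Unsigned saturation to 16 bits (us_truncate): the input is treated
   as unsigned, so anything above 0xffff clamps to 0xffff.  */
static uint16_t
us_trunc16 (uint32_t x)
{
  return x > UINT16_MAX ? UINT16_MAX : (uint16_t) x;
}

/* Element I (0..7) of vec_unpacks_{hi,lo}_v16qi: "hi" selects source
   elements 0-7, "lo" elements 8-15; the value is sign-extended.  */
static int16_t
unpacks_at (const int8_t a[16], int hi, int i)
{
  return (int16_t) a[hi ? i : i + 8];
}

/* Same selection, zero-extended (vec_unpacku_*).  */
static uint16_t
unpacku_at (const uint8_t a[16], int hi, int i)
{
  return (uint16_t) a[hi ? i : i + 8];
}
```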
---
 gcc/config/s390/vector.md  | 198 +++--
 gcc/config/s390/vx-builtins.md |   1 -
 2 files changed, 189 insertions(+), 10 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index bc4f8da..d446d5f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -38,7 +38,8 @@
 (define_mode_iterator VIT_HW[V16QI V8HI V4SI V2DI V1TI TI])
 (define_mode_iterator VI_HW [V16QI V8HI V4SI V2DI])
 (define_mode_iterator VI_HW_QHS [V16QI V8HI V4SI])
-(define_mode_iterator VI_HW_HS  [V8HI V4SI])
+(define_mode_iterator VI_HW_HSD [V8HI  V4SI V2DI])
+(define_mode_iterator VI_HW_HS  [V8HI  V4SI])
 (define_mode_iterator VI_HW_QH  [V16QI V8HI])
 
 ; All integer vector modes supported in a vector register + TImode
@@ -114,6 +115,13 @@
(V1DF "V2SF") (V2DF "V4SF")
(V1TF "V1DF")])
 
+; Vector with half the element size AND half the number of elements.
+(define_mode_attr vec_halfhalf
+  [(V2HI "V2QI") (V4HI "V4QI") (V8HI "V8QI")
+   (V2SI "V2HI") (V4SI "V4HI")
+   (V2DI "V2SI")
+   (V2DF "V2SF")])
+
 ; The comparisons not setting CC iterate over the rtx code.
 (define_code_iterator VFCMP_HW_OP [eq gt ge])
 (define_code_attr asm_fcmp_op [(eq "e") (gt "h") (ge "he")])
@@ -1223,6 +1231,185 @@
   "vsel\t%v0,%2,%1,%3"
   [(set_attr "op_type" "VRR")])
 
+; vec_pack_trunc
+
+; vpkh, vpkf, vpkg
+(define_insn "vec_pack_trunc_"
+  [(set (match_operand: 0 "register_operand" "=v")
+   (vec_concat:
+(truncate:
+ (match_operand:VI_HW_HSD 1 "register_operand" "v"))
+(truncate:
+ (match_operand:VI_HW_HSD 2 "register_operand" "v"]
+  "TARGET_VX"
+  "vpk\t%0,%1,%2"
+  [(set_attr "op_type" "VRR")])
+
+; vpksh, vpksf, vpksg
+(define_insn "vec_pack_ssat_"
+  [(set (match_operand: 0 "register_operand" "=v")
+   (vec_concat:
+(ss_truncate:
+ (match_operand:VI_HW_HSD 1 "register_operand" "v"))
+(ss_truncate:
+ (match_operand:VI_HW_HSD 2 "register_operand" "v"]
+  "TARGET_VX"
+  "vpks\t%0,%1,%2"
+  [(set_attr "op_type" "VRR")])
+
+; vpklsh, vpklsf, vpklsg
+(define_insn "vec_pack_usat_"
+  [(set (match_operand: 0 "register_operand" "=v")
+   (vec_concat:
+(us_truncate:
+ (match_operand:VI_HW_HSD 1 "register_operand" "v"))
+(us_truncate:
+ (match_operand:VI_HW_HSD 2 "register_operand" "v"]
+  "TARGET_VX"
+  "vpkls\t%0,%1,%2"
+  [(set_attr "op_type" "VRR")])
+
+;; vector unpack v16qi
+
+; signed
+
+(define_insn "vec_unpacks_hi_v16qi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+   (sign_extend:V8HI
+(vec_select:V8QI
+ (match_operand:V16QI 1 "register_operand" "v")
+ (parallel [(const_int 0)(const_int 1)(const_int 2)(const_int 3)
+(const_int 4)(const_int 5)(const_int 6)(const_int 7)]]
+  "TARGET_VX"
+  "vuphb\t%0,%1"
+  [(set_attr "op_type" "VRR")])
+
+(define_insn "vec_unpacks_low_v16qi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+   (sign_extend:V8HI
+(vec_select:V8QI
+ (match_operand:V16QI 1 "register_operand" "v")
+ (parallel [(const_int 8) (const_int 9) (const_int 10)(const_int 11)
+(const_int 12)(const_int 13)(const_int 14)(const_int 15)]]
+  "TARGET_VX"
+  "vuplb\t%0,%1"
+  [(set_attr "op_type" "VRR")])
+
+; unsigned
+
+(define_insn "vec_unpacku_hi_v16qi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+   (zero_extend:V8HI
+(vec_select:V8QI
+ (match_operand:V16QI 1 "register_operand" "v")
+ (parallel [(const_int 0)(const_int 1)(const_int 2)(const_int 3)
+(const_int 4)(const_int 5)(const_int 6)(const_int 7)]]
+  "TARGET_VX"
+  "vuplhb\t%0,%1"
+  [(set_attr "op_type" "VRR")])
+
+(define_insn "vec_unpacku_low_v16qi"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+   (zero_extend:V8HI
+(vec_select:V8QI
+ (match_operand:V16QI 1 "register_operand" "v")
+ (parallel [(const_int 8) (const_int 9) (const_int 10)(const_int 11)
+(const_int 12)(const_int 13)(const_int 14)(const_int 15)]]
+  "TARGET_VX"
+  "vupllb\t%0,%1"
+  [(set_attr "op_type" "VRR")])
+
+;; vector unpack v8hi
+
+; signed
+
+(define_insn "vec_unpacks_hi_v8hi"
+  [

[PATCH 4/4] S/390: Disable peeling for alignment.

Although the S/390 backend states that the machine supports unaligned
vector accesses, the loop vectorizer still tries to peel loop
iterations to reach a higher alignment.  Setting
vect_max_peeling_for_alignment to 0 prevents this.
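For a sense of what the parameter suppresses: when peeling for alignment, the vectorizer runs some scalar iterations before the vector loop so that a chosen data reference becomes aligned. The C sketch below shows only the arithmetic for that count, assuming the vector alignment is a power of two and the start address is a multiple of the element size. `peel_iterations` is a hypothetical helper, not vectorizer code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of scalar iterations to peel so that
   ADDR advances to a multiple of VEC_ALIGN, stepping ELEM_SIZE bytes
   per iteration.  Assumes VEC_ALIGN is a power of two and ADDR is a
   multiple of ELEM_SIZE.  */
static size_t
peel_iterations (uintptr_t addr, size_t elem_size, size_t vec_align)
{
  size_t misalign = addr & (vec_align - 1);
  if (misalign == 0)
    return 0;
  return (vec_align - misalign) / elem_size;
}
```

For example, an `int` array starting 4 bytes past a 16-byte boundary needs 3 peeled iterations, plus the extra prologue loop to run them, before the vector loop's accesses are aligned. With the parameter set to 0 the vectorizer uses unaligned vector accesses instead.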

gcc/ChangeLog:

2016-11-29  Andreas Krebbel  

* config/s390/s390.c (s390_option_override_internal): Set
vect_max_peeling_for_alignment to 0.

gcc/testsuite/ChangeLog:

2016-11-29  Andreas Krebbel  

* gcc.dg/tree-ssa/gen-vect-26.c: Disable peeling check for s390.
* gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
---
 gcc/config/s390/s390.c  | 13 +
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c |  5 ++---
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c |  4 ++--
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index dab4f43..2e71745 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -14586,6 +14586,19 @@ s390_option_override_internal (bool main_args_p,
  opts->x_param_values,
  opts_set->x_param_values);
 
+  /* S/390 can deal with unaligned accesses without a performance
+ penalty (as long as we do not cross a cache line boundary).  This
+ setting prevents the vectorizer from generating expensive extra
+ code emitted to reach a better alignment.
+ Don't do this when vectorize_support_vector_misalignment falls
+ back to the default path in order to avoid effects on software
+ vectorization.  */
+  if (TARGET_VX)
+maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+  0,
+  opts->x_param_values,
+  opts_set->x_param_values);
+
   /* Call target specific restore function to do post-init work.  At the moment,
  this just sets opts->x_s390_cost_pointer.  */
   s390_function_specific_restore (opts, NULL);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
index 8e5f141..461a952 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
@@ -28,7 +28,6 @@ int main ()
   return 0;
 }
 
-
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! avr-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! avr-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! avr-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
index ce97e09..fe44e85 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
@@ -38,5 +38,5 @@ int main (void)
 
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! avr-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! avr-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! avr-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
-- 
2.9.1

[PATCH 1/4] S/390: Fix vector all/any cc modes.

This fixes a problem with the vector compares producing CC mode
results.

The instructions produce condition code modes which can be interpreted
either as an ALL-elements or as an ANY-element result.  The way the
modes were used before, the middle-end could not invert them by
inverting the comparison code (e.g. eq to ne).  The result was usually
just wrong.

In fact, inverting a comparison code on a CCVALL mode would also
require changing the mode to CCVANY, but this cannot be done easily in
the middle-end.  With this patch the meaning of an ALL cc mode only
refers to the non-inverted comparison code (e.g. eq, gt, ge).  With
that change, inverting the comparison code again matches a NOT
operation on the condition code mask.
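A small model may help with the mask argument. The sketch below assumes the usual z/Architecture encoding for vector compares that set the condition code: CC0 when the comparison holds for all elements, CC1 when it holds for some (mixed), CC3 when it holds for none, with CC2 unused. A branch mask has one bit per CC value (8 for CC0 down to 1 for CC3). The ALL and ANY readings then differ only in whether CC1 is included, and complementing the mask gives the reversed condition. All names here are hypothetical.

```c
#include <assert.h>

/* One mask bit per condition code value, as in s390 branch masks.  */
#define CC0 8u   /* Comparison true for all elements.  */
#define CC1 4u   /* Mixed: true for some elements.  */
#define CC2 2u   /* Not set by these instructions.  */
#define CC3 1u   /* True for no element.  */

/* Hypothetical helpers for the two readings of the same CC value.  */
static unsigned all_true_mask (void) { return CC0; }
static unsigned any_true_mask (void) { return CC0 | CC1; }

/* Reversing the comparison code corresponds to complementing the mask.  */
static unsigned invert_mask (unsigned mask) { return ~mask & 0xFu; }

/* Does a branch with MASK fire for condition code CC (0..3)?  */
static int
branch_taken (unsigned mask, int cc)
{
  return (mask & (8u >> cc)) != 0;
}
```

In this model `invert_mask (all_true_mask ())` accepts CC1 and CC3, i.e. "at least one element false", which is the negation the middle-end expects when it turns EQ into NE on an ALL-mode comparison.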

Bootstrapped and regression tested on s390 and s390x.

Bye,

-Andreas-

gcc/testsuite/ChangeLog:

2016-11-29  Andreas Krebbel  

* gcc.target/s390/vector/vec-scalar-cmp-1.c: Fix and harden the
pattern checks.
* gcc.target/s390/zvector/vec-cmp-1.c: New test.

gcc/ChangeLog:

2016-11-29  Andreas Krebbel  

* config/s390/s390-modes.def (CCVEQANY, CCVH, CCVHANY, CCVHU)
(CCVHUANY): Remove modes.
(CCVIH, CCVIHU, CCVIALL, CCVIANY, CCVFALL, CCVFANY): Add modes and
documentation.
* config/s390/s390.c (s390_match_ccmode_set): Rename cc modes.
(s390_expand_vec_compare_scalar): Pick one of the cc consumer
modes.
(s390_branch_condition_mask): Adjust to use the new cc consumer
modes.  The new modes allow for proper reversal in the middle-end.
(s390_expand_vec_compare_cc): Determine the proper cc producer and
consumer modes for a comparison.
* config/s390/s390.md: Rename CCVH to CCVIH and CCVHU to CCVIHU
throughout the file.
* config/s390/vx-builtins.md: Likewise.
---
 gcc/config/s390/s390-modes.def |  72 ---
 gcc/config/s390/s390.c | 226 +++--
 gcc/config/s390/s390.md|   2 +-
 gcc/config/s390/vx-builtins.md |  44 ++--
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c  |  24 ++-
 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-1.c  | 173 
 6 files changed, 365 insertions(+), 176 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-1.c

diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def
index 69235b6..15ff903 100644
--- a/gcc/config/s390/s390-modes.def
+++ b/gcc/config/s390/s390-modes.def
@@ -84,22 +84,6 @@ Requested mode-> Destination CC register mode
 CCS, CCU, CCT, CCSR, CCUR -> CCZ
 CCA   -> CCAP, CCAN
 
-Vector comparison modes
-
-CCVEQEQ  --   NE (VCEQ)
-CCVEQANY  EQ EQ   -   NE (VCEQ)
-
-CCVH GT  --   LE (VCH)
-CCVHANY  GT  GT   -   LE (VCH)
-CCVHUGTU --   LEU(VCHL)
-CCVHUANY  GTUGTU  -   LEU(VCHL)
-
-CCVFHGT  --   UNLE   (VFCH)
-CCVFHANY  GT GT   -   UNLE   (VFCH)
-CCVFHE   GE  --   UNLT   (VFCHE)
-CCVFHEANY GE GE   -   UNLT   (VFCHE)
-
-
 
 
 *** Comments ***
@@ -169,14 +153,40 @@ The compare and swap instructions sets the condition code to 0/1 if the
 operands were equal/unequal. The CCZ1 mode ensures the result can be
 effectively placed into a register.
 
-
-CCV*
-
-The variants with and without ANY are generated by the same
-instructions and therefore are holding the same information.  However,
-when generating a condition code mask they require checking different
-bits of CC.  In that case the variants without ANY represent the
-results for *all* elements.
+CCVIH, CCVIHU, CCVFH, CCVFHE
+
+These are condition code modes used in instructions setting the
+condition code.  The mode determines which comparison to perform (H -
+high, HU - high unsigned, HE - high or equal) and whether it is a
+floating point comparison or not (I - int, F - float).
+
+The comparison operation to be performed needs to be encoded into the
+condition code mode since the comparison operator is not available in
+compare style patterns (set cc (compare (op0) (op1))).  So the
+condition code mode is the only information to determine the
+instruction to be used.
+
+CCVIALL, CCVIANY, CCVFALL, CCVFANY
+
+These modes are used in instructions reading the condition code.
+Opposed to the CC producer patterns the comparison operator is
+available.  Hence the comparison operation does not need to be part of
+the CC mode.  However, we still need to know whether CC has been
+generated by a float or an integer comparison in order to be able to
+invert the condition correctly (int: GT -> LE, float: GT -> UNLE).
+
+The ALL and ANY variants differ only in the usage of CC1

[PATCH 0/4] S/390: Vector bugfix and improvements

Please see the patches for descriptions.

The first one is an important bugfix which also needs to go into the
GCC 5 and 6 branches.

I'll commit the patches in a few days to give some time for comments.

Andreas Krebbel (4):
  S/390: Fix vector all/any cc modes.
  S/390: Merge compare of compare results
  S/390: Add vector pack/unpack patterns.
  S/390: Disable peeling for alignment.

 gcc/config/s390/s390-modes.def |  72 +++---
 gcc/config/s390/s390-protos.h  |   1 +
 gcc/config/s390/s390.c | 281 +
 gcc/config/s390/s390.h |  12 +
 gcc/config/s390/s390.md|   2 +-
 gcc/config/s390/vector.md  | 198 ++-
 gcc/config/s390/vx-builtins.md |  45 ++--
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c|   5 +-
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c|   4 +-
 .../gcc.target/s390/vector/vec-scalar-cmp-1.c  |  24 +-
 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-1.c  | 173 +
 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c  | 203 +++
 12 files changed, 829 insertions(+), 191 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-cmp-2.c

-- 
2.9.1

Re: [PATCH, ARM] PR71607: New approach to arm_disable_literal_pool

2016-11-29  Andre Vieira (lists)

On 17/11/16 10:00, Ramana Radhakrishnan wrote:
> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>  wrote:
>> Hello,
>>
>> This patch tackles the issue reported in PR71607. This patch takes a
>> different approach for disabling the creation of literal pools. Instead
>> of disabling the patterns that would normally transform the rtl into
>> actual literal pools, it disables the creation of this literal pool rtl
>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>> arm_disable_literal_pool is true. I added patterns to split floating
>> point constants for both SF and DFmode. A pattern to handle the
>> addressing of label_refs had to be included as well since all
>> "memory_operand" patterns are disabled when
>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>> splitting 32-bit immediates had to be changed; it was not accepting
>> unsigned 32-bit integers with the MSB set. I believe
>> const_int_operand expects the mode of the operand to be set to VOIDmode
>> and not SImode. I have only changed it in the patterns that were
>> affecting this code, though I suggest looking into changing it in the
>> rest of the ARM backend.
>>
>> I added more test cases. No regressions for arm-none-eabi with
>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>
>> Is this OK for trunk?
> 
> Including -mslow-flash-data in your multilib flags ? If no regressions
> with that ok .
> 
> 
> regards
> Ramana
> 
>>

Hello,

I found some new ICEs with the -mslow-flash-data testing so I had to
rework this patch. I took the opportunity to rebase it as well.

The problem was with the way the old version of the patch handled label
references.  After some digging I found I wasn't using the right target
hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
ARM.  This target hook determines whether a literal pool ends up in an
'object_block' structure. So I reverted the changes made in the old
version of the patch to the ARM implementation of the
'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
implementation for this hook that returns false if
'arm_disable_literal_pool' is set to true and true otherwise.

This version of the patch also reverts to using the check for
'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
last version; this code is required to place the label references in
rodata sections.

Another thing this patch does is revert the changes made to the 32-bit
constant split in arm.md. That change was needed before because
'real_to_target' returns a long array and does not sign-extend the
values in it, which matters on hosts with 64-bit longs. To fix this,
the value is now cast to 'int' first.  It would probably be a good idea
to change the 'real_to_target' function to return an array with
'HOST_WIDE_INT' elements instead and either use all 64 bits or
sign-extend them.  Something for the future?
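The host dependence described above can be illustrated with a short C sketch. If a 32-bit target word such as 0xffffffff is stored in a 64-bit `long` without sign extension, it reads back as 4294967295 rather than -1. The patch casts to `int`; in ISO C that out-of-range conversion is implementation-defined, though GCC documents it as reduction modulo 2^N. The hypothetical helper below performs the same sign extension portably with `int64_t` arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret the low 32 bits of X as a signed
   32-bit value and widen it to 64 bits, without relying on an
   implementation-defined narrowing conversion.  */
static int64_t
sign_extend_32 (int64_t x)
{
  int64_t low = x & INT64_C (0xFFFFFFFF);
  /* Flip the sign bit, then subtract its weight: values with bit 31
     set end up negative, the others are unchanged.  */
  return (low ^ INT64_C (0x80000000)) - INT64_C (0x80000000);
}
```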

I added more test cases in this patch and reran regression tests for:
Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
bootstrap+regressions on arm-none-linux-gnueabihf.

Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:

2016-11-29  Andre Vieira  

PR target/71607
* config/arm/arm.md (use_literal_pool): Remove.
(64-bit immediate split): No longer takes cost into consideration
if 'arm_disable_literal_pool' is enabled.
* config/arm/arm.c (arm_use_blocks_for_constant_p): New.
(TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
(arm_max_const_double_inline_cost): Remove use of
arm_disable_literal_pool.
* config/arm/vfp.md (no_literal_pool_df_immediate): New.
(no_literal_pool_sf_immediate): New.

gcc/testsuite/ChangeLog:

2016-11-29  Andre Vieira  
Thomas Preud'homme  

PR target/71607
* gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
* gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
* gcc.target/arm/thumb2-slow-flash-data-2.c: New.
* gcc.target/arm/thumb2-slow-flash-data-3.c: New.
* gcc.target/arm/thumb2-slow-flash-data-4.c: New.
* gcc.target/arm/thumb2-slow-flash-data-5.c: New.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index abd3276f13125e87fe7a88b60f0bf98cd580e7fb..1fcf57ccd9bda6c477db7a98084fd6f0e359de21 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -300,6 +300,8 @@ static bool arm_asm_elf_flags_numeric (unsigned int flags, unsigned int *num);
 static unsigned int arm_elf_section_type_flags (tree decl, const char *name,
int reloc);
 static void arm_expand_divmod_libfunc (rtx, machine_mode, rtx, rtx, rtx *, rtx *);
+static bool arm_use_blocks_for_constant_p (machine_mode var, const_rtx x);
+

 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -738,6 +740,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_EXPAND_DIVM

Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

2016-11-29  Tamar Christina

Hi All,

The new patch uses the proper types for the intrinsics that should
return uint64x1_t and addresses the rest of Christophe's comments.

Kind Regards,
Tamar


From: Tamar Christina
Sent: Friday, November 25, 2016 4:01:30 PM
To: Christophe Lyon
Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

 >
> > A few comments about this new version:
> > * arm-neon-ref.h: why do you create
> CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
> > Can't you just add calls to CHECK_CRYPTO in the existing
> > CHECK_RESULTS_NAMED_NO_FP16?

Yes, that should be fine. I didn't have CHECK_CRYPTO originally, and
when I added it I didn't remove the split. I'll do it now.

> >
> > * p64_p128:
> > From what I can see ARM and AArch64 differ on the vceq variants
> > available with poly64.
> > For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,
> > poly64x1_t __b) For AArch64, I can't see vceq_p64 in arm_neon.h? ...
> > Actually I've just noticed the other you submitted while I was writing
> > this, where you add vceq_p64 for aarch64, but it still returns
> > uint64_t.
> > Why do you change the vceq_64 test to return poly64_t instead of
> uint64_t?

This patch is slightly outdated. The correct type is `uint64_t`, but by
the time this was noticed the patch had already been sent. A new one is
coming soon.

> >
> > Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64?
> >

This is wrong; I'll remove them. They were supposed to be around the
vldX_lane_p64 tests.

> > The comment /* vget_lane_p64 tests.  */ is wrong before VLDX_LANE
> > tests
> >
> > You need to protect the new vmov, vget_high and vget_lane tests with
> > #ifdef __aarch64__.
> >

vget_lane is already in an #ifdef. For vmov you're right, but I also
noticed that the test calls VDUP instead of VMOV, which explains why I
didn't get a test failure.

Thanks for the feedback,
I'll get these updated.

>
> Actually, vget_high_p64 exists on arm, so no need for the #ifdef for it.
>
>
> > Christophe
> >
> >> Kind regards,
> >> Tamar
> >> 
> >> From: Tamar Christina
> >> Sent: Tuesday, November 8, 2016 11:58:46 AM
> >> To: Christophe Lyon
> >> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard
> >> Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
> >> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing
> >> Poly64_t intrinsics to GCC
> >>
> >> Hi Christophe,
> >>
> >> Thanks for the review!
> >>
> >>>
> >>> A while ago I added p64_p128.c, to contain all the poly64/128 tests
> >>> except for vreinterpret.
> >>> Why do you need to create p64.c ?
> >>
> >> I originally created it because I had a much smaller set of
> >> intrinsics that I wanted to add initially. This grew, and it hadn't
> >> occurred to me that I could use the existing file now.
> >>
> >> Another reason was the effective-target arm_crypto_ok, as you
> >> mentioned below.
> >>
> >>>
> >>> Similarly, adding tests for vcreate_p64 etc... in p64.c or
> >>> p64_p128.c might be easier to maintain than adding them to vcreate.c
> >>> etc with several #ifdef conditions.
> >>
> >> Fair enough, I'll move them to p64_p128.c.
> >>
> >>> For vdup-vmov.c, why do you add the "&& defined(__aarch64__)"
> >>> condition? These intrinsics are defined in arm/arm_neon.h, right?
> >>> They are tested in p64_p128.c
> >>
> >> I should have looked for them. They weren't being tested before, so I
> >> had mistakenly assumed that they weren't available. Now I realize I
> >> just need to add the proper test option to the file to enable crypto.
> >> I'll update this as well.
> >>
> >>> Looking at your patch, it seems some tests are currently missing for arm:
> >>> vget_high_p64. I'm not sure why I missed it when I removed
> >>> neon-testgen...
> >>
> >> I'll adjust the test conditions so they run for ARM as well.
> >>
> >>>
> >>> Regarding vreinterpret_p128.c, doesn't the existing effective-target
> >>> arm_crypto_ok prevent the tests from running on aarch64?
> >>
> >> Yes, they do. I was comparing the output against a clean version and
> >> hadn't noticed that they weren't running. Thanks!
> >>
> >>>
> >>> Thanks,
> >>>
> >>> Christophe
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 462141586b3db7c5256c74b08fa0449210634226..beaf6ac31d5c5affe3702a505ad0df8679229e32 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -32,6 +32,13 @@ extern size_t strlen(const char *);
VECT_VAR(expected, int, 16, 4) -> expected_int16x4
VECT_VAR_DECL(expected, int, 16, 4) -> int16x4_t expected_int16x4
 */
+/* Some instructions don't exist on ARM.
+   Use this macro to guard aga

Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

Hi Tamar,


On 29 November 2016 at 10:50, Tamar Christina  wrote:
> Hi All,
>
> The new patch contains the proper types for the intrinsics that should be
> returning uint64x1_t and addresses the rest of Christophe's comments.
>

LGTM.

One more question: maybe we want to add explicit tests for vdup*_v_p64
even though they are aliases for vmov?

Christophe

> Kind Regards,
> Tamar
>
> 
> From: Tamar Christina
> Sent: Friday, November 25, 2016 4:01:30 PM
> To: Christophe Lyon
> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; 
> James Greenhalgh; Kyrylo Tkachov; nd
> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t 
> intrinsics to GCC
>
>  >
>> > A few comments about this new version:
>> > * arm-neon-ref.h: why do you create
>> > CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
>> > Can't you just add calls to CHECK_CRYPTO in the existing
>> > CHECK_RESULTS_NAMED_NO_FP16?
>
> Yes, that should be fine. I didn't have CHECK_CRYPTO originally, and when I
> added it I didn't remove the split. I'll do it now.
>
>> >
>> > * p64_p128:
>> > From what I can see ARM and AArch64 differ on the vceq variants
>> > available with poly64.
>> > For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,
>> > poly64x1_t __b) For AArch64, I can't see vceq_p64 in arm_neon.h? ...
>> > Actually I've just noticed the other you submitted while I was writing
>> > this, where you add vceq_p64 for aarch64, but it still returns
>> > uint64_t.
>> > Why do you change the vceq_64 test to return poly64_t instead of
>> > uint64_t?
>
> This patch is slightly outdated. The correct type is `uint64_t`, but by the
> time this was noticed the patch had already been sent. New one coming soon.
>
>> >
>> > Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64?
>> >
>
> This is wrong; I'll remove them. They were supposed to be around the
> vldX_lane_p64 tests.
>
>> > The comment /* vget_lane_p64 tests.  */ is wrong before VLDX_LANE
>> > tests
>> >
>> > You need to protect the new vmov, vget_high and vget_lane tests with
>> > #ifdef __aarch64__.
>> >
>
> vget_lane is already in an #ifdef. For vmov you're right, but I also notice
> that the test calls VDUP instead of VMOV, which explains why I didn't get a
> test failure.
>
> Thanks for the feedback,
> I'll get these updated.
>
>>
>> Actually, vget_high_p64 exists on arm, so no need for the #ifdef for it.
>>
>>
>> > Christophe
>> >
>> >> Kind regards,
>> >> Tamar
>> >> 
>> >> From: Tamar Christina
>> >> Sent: Tuesday, November 8, 2016 11:58:46 AM
>> >> To: Christophe Lyon
>> >> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard
>> >> Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
>> >> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing
>> >> Poly64_t intrinsics to GCC
>> >>
>> >> Hi Christophe,
>> >>
>> >> Thanks for the review!
>> >>
>> >>>
>> >>> A while ago I added p64_p128.c, to contain all the poly64/128 tests
>> >>> except for vreinterpret.
>> >>> Why do you need to create p64.c ?
>> >>
>> >> I originally created it because I initially had a much smaller set of
>> >> intrinsics that I wanted to add. This set grew, and it hadn't occurred
>> >> to me that I could now use the existing file.
>> >>
>> >> Another reason was the effective-target arm_crypto_ok, as you
>> >> mentioned below.
>> >>
>> >>>
>> >>> Similarly, adding tests for vcreate_p64 etc... in p64.c or
>> >>> p64_p128.c might be easier to maintain than adding them to vcreate.c
>> >>> etc with several #ifdef conditions.
>> >>
>> >> Fair enough, I'll move them to p64_p128.c.
>> >>
>> >>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)"
>> >>> condition? These intrinsics are defined in arm/arm_neon.h, right?
>> >>> They are tested in p64_p128.c
>> >>
>> >> I should have looked for them. They weren't being tested before, so I
>> >> had mistakenly assumed that they weren't available. Now I realize I
>> >> just need to add the proper test option to the file to enable crypto. I'll
>> >> update this as well.
>> >>
>> >>> Looking at your patch, it seems some tests are currently missing for arm:
>> >>> vget_high_p64. I'm not sure why I missed it when I removed neont-
>> >>> testgen...
>> >>
>> >> I'll adjust the test conditions so they run for ARM as well.
>> >>
>> >>>
>> >>> Regarding vreinterpret_p128.c, doesn't the existing effective-target
>> >>> arm_crypto_ok prevent the tests from running on aarch64?
>> >>
>> >> Yes, they do. I was comparing the output against a clean version and
>> >> hadn't noticed that they weren't running. Thanks!
>> >>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Christophe

Re: [RFC] Assert DECL_ABSTRACT_ORIGIN is different from the decl itself

On Mon, Nov 28, 2016 at 6:28 PM, Martin Jambor  wrote:
> Hi Jeff,
>
> On Mon, Nov 28, 2016 at 08:46:05AM -0700, Jeff Law wrote:
>> On 11/28/2016 07:27 AM, Martin Jambor wrote:
>> > Hi,
>> >
>> > one of a number of symptoms of an otherwise unrelated HSA bug I've
>> > been debugging today is gcc crashing or hanging in the C++ pretty
>> > printer when attempting to emit a warning because dump_decl() ended up
>> > in an infinite recursion calling itself on the DECL_ABSTRACT_ORIGIN of
>> > the decl it was looking at, which was however the same thing.  (It was
>> > set to itself on purpose in set_decl_origin_self as a part of final
>> > pass, the decl was being printed because it was itself an abstract
>> > origin of another one).
>> >
>> > If someone ever faces a similar problem, the following (untested)
>> > patch might save them a bit of time.  I have eventually decided not to
>> > make it a checking-only assert because it is on a cold path and
>> > because at release-build optimization levels, the tail-call is
>> > optimized to a jump and thus an infinite loop if the described
> >> > situation happens, and I suppose an informative ICE is better than that
>> > even for users.
>> >
>> > What do you think?  Would it be reasonable for trunk even now or
>> > should I queue it for the next stage1?
>> >
>> > Thanks,
>> >
>> > Martin
>> >
>> >
>> > gcc/cp/
>> >
>> > 2016-11-28  Martin Jambor  
>> >
>> > * error.c (dump_decl): Add an assert that DECL_ABSTRACT_ORIGIN
>> > is not the decl itself.
>> Given it's on an error/debug path it ought to be plenty safe for now. What's
>> more interesting is whether or not DECL_ABSTRACT_ORIGIN can legitimately
>> point to itself and if so, how is that happening.
>
> Well, I tried to explain it in my original email but I also wanted to
> be as brief as possible, so perhaps it is necessary to elaborate a bit:
>
> There is a function set_decl_origin_self() in dwarf2out.c that does
> just that, sets DECL_ABSTRACT_ORIGIN to the decl itself, and its
> comment makes it clear that is intended (according to git blame, the
> whole comment and much of the implementation come from 1992, though ;-)
> The function is called from the "final" pass through dwarf2out_decl(),
> and gen_decl_die().
>
> So, for one reason or another, this is the intended behavior.
> Apparently, after that one is not supposed to be printing the decl
> name of such a "finished" function.  It is too bad, however, that this
> can happen if a "finished" function is itself an abstract origin of a
> different one, which is optimized and expanded only afterwards and you
> attempt to print its decl name, because it triggers printing the decl
> name of the finished function, in turn triggering the infinite
> recursion/loop.  I am quite surprised that we have not hit this
> earlier (e.g. with warnings in IPA-CP clones) but perhaps there is a
> reason.
>
> I will append the patch to some bootstrap and testing run and commit
> it afterwards if it passes.

Other users explicitly check for the self-reference when walking origins.

Richard.

> Thanks,
>
> Martin
>
>>
>> I don't think we have a checker for the basic tree datastructures, but maybe
>> we ought to?
>>
>> jeff
>>
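Richard's remark that other users check for the self-reference when walking origins can be sketched as a toy model. This is not GCC code: `struct toy_decl` and `toy_ultimate_origin` are hypothetical stand-ins for a decl and its DECL_ABSTRACT_ORIGIN link. A decl whose origin is itself, as `set_decl_origin_self` arranges, ends the walk instead of looping forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a decl and its abstract-origin link;
   this is not GCC's tree representation.  */
struct toy_decl
{
  const char *name;
  struct toy_decl *abstract_origin; /* NULL, another decl, or itself.  */
};

/* Follow abstract-origin links to the outermost origin.  The explicit
   self-reference check stops at a decl whose origin was set to itself;
   without it, the walk would never terminate on such a decl.  */
static struct toy_decl *
toy_ultimate_origin (struct toy_decl *d)
{
  while (d->abstract_origin != NULL && d->abstract_origin != d)
    d = d->abstract_origin;
  return d;
}
```

The guard only catches a direct self-reference, which is the situation `set_decl_origin_self` creates; a longer cycle would still loop.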

[patch,lto] Fix PR78562: Wrong type mismatch warning for built-ins with same asm name.

2016-11-29 Thread Georg-Johann Lay

This is a fix for a wrong warning from -Wlto-type-mismatch that reports 
a type mismatch for two built-in functions.


The avr backend has several built-ins that have the same asm name 
because their assembler implementation in libgcc is exactly the same. 
The prototypes might differ, however.


This patch skips the warning for built-in types as discussed in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78562#c6


When testing against avr-unknown-none, this resolves all FAILs caused by
that warning, e.g. gcc.target/avr/torture/builtins-5-countlsfx.c


Ok for trunk?

Johann

gcc/lto/
PR lto/78562
* lto-symtab.c (lto_symtab_merge_decls_2): Don't diagnose type
mismatch if the two types are built-in.
Index: lto/lto-symtab.c
===
--- lto/lto-symtab.c	(revision 242823)
+++ lto/lto-symtab.c	(working copy)
@@ -655,6 +655,14 @@ lto_symtab_merge_decls_2 (symtab_node *f
   /* Diagnose all mismatched re-declarations.  */
   FOR_EACH_VEC_ELT (mismatches, i, decl)
 {
+  /* Do not diagnose two built-in declarations, there is no useful
+ location in that case.  It also happens for AVR if two built-ins
+ use the same asm name because their libgcc assembler code is the
+ same, see PR78562.  */
+  if (DECL_IS_BUILTIN (prevailing->decl)
+	  && DECL_IS_BUILTIN (decl))
+	continue;
+
   int level = warn_type_compatibility_p (TREE_TYPE (prevailing->decl),
 	 TREE_TYPE (decl),
 	 DECL_COMDAT (decl));

Re: [PATCH] improve folding of expressions that move a single bit around

On Mon, Nov 28, 2016 at 7:41 PM, Jeff Law  wrote:
> On 11/28/2016 06:10 AM, Paolo Bonzini wrote:
>>
>>
>>
>> On 27/11/2016 00:28, Marc Glisse wrote:
>>>
>>> On Sat, 26 Nov 2016, Paolo Bonzini wrote:
>>>
 --- match.pd(revision 242742)
 +++ match.pd(working copy)
 @@ -2554,6 +2554,19 @@
   (cmp (bit_and@2 @0 integer_pow2p@1) @1)
   (icmp @2 { build_zero_cst (TREE_TYPE (@0)); })))

 +/* If we have (A & C) != 0 ? D : 0 where C and D are powers of 2,
 +   convert this into a shift of (A & C).  */
 +(simplify
 + (cond
 +  (ne (bit_and@2 @0 integer_pow2p@1) integer_zerop)
 +  integer_pow2p@3 integer_zerop)
 + (with {
 +int shift = wi::exact_log2 (@3) - wi::exact_log2 (@1);
 +  }
 +  (if (shift > 0)
 +   (lshift (convert @2) { build_int_cst (integer_type_node, shift); })
 +   (convert (rshift @2 { build_int_cst (integer_type_node, -shift);
 })
>>>
>>>
>>> What happens if @1 is the sign bit, in a signed type? Do we get an
>>> arithmetic shift right?
>>
>>
>> It shouldn't happen because the canonical form of a sign bit test is A <
>> 0 (that's the pattern immediately after).  However I can add an "if" if
>> preferred, or change the pattern to do the AND after the shift.
>
> But are we absolutely sure it'll be in canonical form every time?

No, of course not (though it would be a bug).  If the pattern generates wrong
code when the non-canonical form is met that would be bad, if it merely
does not optimize (or optimize non-optimally) then that's not too bad.

>   Is there
> a pattern in match.pd which turns (x & SIGN_BIT) tests into canonical
> form?
>
>
>>
>>  (bit_and
>>   (if (shift > 0)
>>(lshift (convert @0) { build_int_cst (integer_type_node, shift); })
>>(convert (rshift @0 { build_int_cst (integer_type_node, -shift); })))
>>   @3)
>>
>> What do you think?
>
> Shouldn't be necessary if we're working on unsigned types.  May be necessary
> on signed types unless we're 100% sure sign bit tests will be canonicalized.

Instead of the bit_and, better put an if() condition around this.

Richard.


> jeff
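The proposed folding relies on the identity that, for unsigned values and powers of two C and D, `(A & C) != 0 ? D : 0` equals `A & C` shifted left or right by `log2(D) - log2(C)`. A minimal sketch in C, assuming 32-bit unsigned operands; `select_form` and `shift_form` are hypothetical names chosen for illustration, not match.pd output.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: (a & c) != 0 ? d : 0, where c == 1 << log2_c and
   d == 1 << log2_d.  */
static uint32_t
select_form (uint32_t a, int log2_c, int log2_d)
{
  uint32_t c = UINT32_C (1) << log2_c;
  uint32_t d = UINT32_C (1) << log2_d;
  return (a & c) != 0 ? d : 0;
}

/* Folded form: move the single bit (a & c) from position log2_c to
   position log2_d with one shift.  */
static uint32_t
shift_form (uint32_t a, int log2_c, int log2_d)
{
  uint32_t masked = a & (UINT32_C (1) << log2_c);
  int shift = log2_d - log2_c;
  return shift > 0 ? masked << shift : masked >> -shift;
}
```

The sketch deliberately uses `uint32_t`. For a signed type with C equal to the sign bit, `A & C` is negative, and right-shifting a negative value is implementation-defined in C (GCC documents that it sign-extends), which is the arithmetic-shift concern raised in the thread.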

Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c

On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:
>
>
> I was digging into  issues around the patches for 78120 when I stumbled upon
> undesirable bb copying in bb-reorder.c on the m68k.
>
> The core issue is that the m68k does not define a length attribute and
> therefore generic code assumes that the length of all insns is 0 bytes.

What other targets behave like this?

> That in turn makes bb-reorder think it is infinitely cheap to copy basic
> blocks.  In the two codebases I looked at (GCC's runtime libraries and
> newlib) this leads to undesirable code size increases of 10% and 15%.
>
> I've taken a slight variant of this patch and bootstrapped/regression tested
> it on x86_64-linux-gnu to verify sanity as well as built the m68k target
> libraries noted above.
>
> OK for the trunk?

I wonder if it isn't better to default to a length of 1 instead of zero when
there is no length attribute.  There are more users of the length attribute
in bb-reorder.c (and elsewhere as well I suppose).

from get_attr_length_1 it looks like a "cheap" target dependent way
would be to define ADJUST_INSN_LENGTH ...

Richard.

> Jeff
>
> * bb-reorder.c (copy_bb_p): Sanely handle case where the target
> has not defined length attributes for its insns.
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index 6873b4f..0b8d1d9 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -115,6 +115,7 @@
>  #include "bb-reorder.h"
>  #include "except.h"
>  #include "fibonacci_heap.h"
> +#include "insn-attr.h"
>
>  /* The number of rounds.  In most cases there will only be 4 rounds, but
> when partitioning hot and cold basic blocks into separate sections of
> @@ -1355,6 +1356,9 @@ copy_bb_p (const_basic_block bb, int code_may_grow)
>    int max_size = uncond_jump_length;
>    rtx_insn *insn;
>
> +  if (!HAVE_ATTR_length)
> +    return false;
> +
>    if (!bb->frequency)
>      return false;
>    if (EDGE_COUNT (bb->preds) < 2)
>
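Why zero-length insns make duplication look free can be seen in a toy version of the size check. This is a simplified model, not the code in bb-reorder.c: `struct toy_target`, `toy_copy_bb_p` and the fixed `max_size` budget are hypothetical. The `guard_missing_lengths` flag mirrors the early return in the patch, and `default_insn_length` compares the current behaviour (0) with Richard's suggestion of defaulting to 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a target for this toy model.  */
struct toy_target
{
  bool have_length_attr;   /* Target defines a length attribute.  */
  int default_insn_length; /* Length assumed for every insn otherwise.  */
};

/* Length of insn I as the toy target reports it.  LENGTHS is only
   consulted when the target defines a length attribute.  */
static int
toy_insn_length (const struct toy_target *t, const int *lengths, int i)
{
  return t->have_length_attr ? lengths[i] : t->default_insn_length;
}

/* Return true if a block of N_INSNS insns fits the MAX_SIZE budget and
   may therefore be duplicated.  If GUARD_MISSING_LENGTHS is set, refuse
   outright when the target has no length attribute.  */
static bool
toy_copy_bb_p (const struct toy_target *t, bool guard_missing_lengths,
               const int *lengths, int n_insns, int max_size)
{
  if (guard_missing_lengths && !t->have_length_attr)
    return false;

  int size = 0;
  for (int i = 0; i < n_insns; i++)
    size += toy_insn_length (t, lengths, i);
  return size <= max_size;
}
```

With a default length of 0 every block fits any budget, so every candidate looks worth copying; a default of 1 at least makes the budget count insns.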

Re: [PATCH/VRP] Fix type of EQ_EXPR

On Tue, Nov 29, 2016 at 7:36 AM, Andrew Pinski  wrote:
> While rewriting PHI-OPT to use the match-and-simplify infrastructure, I
> ran into a problem where the VRP pass would create an EQ_EXPR with a
> non-boolean type.  This currently works by accident, as it seems we
> don't check that the argument of a GIMPLE COND_EXPR has boolean type
> if it is a comparison.
>
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Ok.

Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-vrp.c (simplify_stmt_using_ranges): Use boolean_type_node
> for the EQ_EXPR.

[Ping][PATCH 0/6][ARM] Implement support for ACLE Coprocessor Intrinsics

2016-11-29 Thread Andre Vieira (lists)

On 21/11/16 08:42, Christophe Lyon wrote:
> Hi,
> 
> 
> On 17 November 2016 at 11:45, Kyrill Tkachov
>  wrote:
>>
>> On 17/11/16 10:31, Andre Vieira (lists) wrote:
>>>
>>> Hi Kyrill,
>>>
>>> On 17/11/16 10:11, Kyrill Tkachov wrote:

 Hi Andre,

 On 09/11/16 10:00, Andre Vieira (lists) wrote:
>
> Tested the series by bootstrapping arm-none-linux-gnuabihf and found no
> regressions, also did a normal build for arm-none-eabi and ran the
> acle.exp tests for a Cortex-M3.

 Can you please also do a full testsuite run on arm-none-linux-gnueabihf.
 Patches have to be tested by the whole testsuite.
>>>
>>> That's what I have done and meant to say with "Tested the series by
>>> bootstrapping arm-none-linux-gnuabihf and found no regressions". I
>>> compared gcc/g++/libstdc++ tests on a bootstrap with and without the
>>> patches.
>>
>>
>> Ah ok, great.
>>
>>>
>>> I'm happy to rerun the tests after a rebase when the patches get approved.
>>
> FWIW, I ran a validation with the 6 patches applied, and saw no regression.
> Given the large number of new tests, I didn't check the full details.
> 
> If you want to check that each configuration has the PASSes you expect,
> you can have a look at:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/242581-acle/report-build-info.html
> 
> Thanks,
> 
> Christophe
> 
> 
>>
>> Thanks,
>> Kyrill
>>
>>>
>>> Cheers,
>>> Andre
>>
>>
Ping. (For the patch series).

Re: [Ping][PATCH 0/6][ARM] Implement support for ACLE Coprocessor Intrinsics

On 29/11/16 10:35, Andre Vieira (lists) wrote:

> Ping. (For the patch series).

Hi Andre,

Have you seen my review at:
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01778.html ?
It might require some minor rework of some parts of the series.

Thanks,
Kyrill

Re: [PATCH 4/4] S/390: Disable peeling for alignment.

On Tue, Nov 29, 2016 at 10:42 AM, Andreas Krebbel
 wrote:
> Although the S/390 backend states that the machine supports unaligned
> vector accesses the loop vectorizer still tries to peel loop
> iterations to get higher alignments.  Setting
> vect_max_peeling_for_alignment to 0 prevents this.

Hmm, having proper costs should also do this...  but I see you neither
have TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST nor
TARGET_VECTORIZE_{INIT,ADD,FINISH}_COST hooks.

The default implementation (default_builtin_vectorization_cost)
makes unaligned loads/stores twice as expensive as the aligned
variants.

So - please instead of setting this param provide
TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.

Richard.

> gcc/ChangeLog:
>
> 2016-11-29  Andreas Krebbel  
>
> * config/s390/s390.c (s390_option_override_internal): Set
> vect_max_peeling_for_alignment to 0.
>
> gcc/testsuite/ChangeLog:
>
> 2016-11-29  Andreas Krebbel  
>
> * gcc.dg/tree-ssa/gen-vect-26.c: Disable peeling check for s390.
> * gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
> ---
>  gcc/config/s390/s390.c  | 13 +
>  gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c |  5 ++---
>  gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c |  4 ++--
>  3 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index dab4f43..2e71745 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -14586,6 +14586,19 @@ s390_option_override_internal (bool main_args_p,
>   opts->x_param_values,
>   opts_set->x_param_values);
>
> +  /* S/390 can deal with unaligned accesses without a performance
> + penalty (as long as we do not cross a cache line boundary).  This
> + setting prevents the vectorizer from generating expensive extra
> + code emitted to reach a better alignment.
> + Don't do this when vectorize_support_vector_misalignment falls
> + back to the default path in order to avoid effects on software
> + vectorization.  */
> +  if (TARGET_VX)
> +    maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
> +                           0,
> +                           opts->x_param_values,
> +                           opts_set->x_param_values);
> +
>    /* Call target specific restore function to do post-init work.  At the moment,
>       this just sets opts->x_s390_cost_pointer.  */
>    s390_function_specific_restore (opts, NULL);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
> index 8e5f141..461a952 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
> @@ -28,7 +28,6 @@ int main ()
>return 0;
>  }
>
> -
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! avr-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! avr-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! avr-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
> +/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> index ce97e09..fe44e85 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> @@ -38,5 +38,5 @@ int main (void)
>
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! avr-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! avr-*-* } } } } */
> -/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! avr-*-* } } } } */
> +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
> +/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { target { ! { avr-*-* s390*-*-* } } } } } */
> --
> 2.9.1
>

[PATCH] Support nested functions (PR sanitize/78541).

2016-11-29 Thread Martin Liška

Currently we have an assert that prevents proper use-after-scope sanitization
in nested functions. With the attached patch, we are able to do so.
I'm adding 2 test-cases: the first one is the ICE reported in the PR, and the
second one tests proper reporting of a use-after-scope access through the FRAME
belonging to a nested function call.

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 8e02ebdf64a82f0dfc7be531a38702497dece26b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 28 Nov 2016 13:05:33 +0100
Subject: [PATCH] Support nested functions (PR sanitize/78541).

gcc/testsuite/ChangeLog:

2016-11-28  Martin Liska  

	PR sanitize/78541
	* gcc.dg/asan/pr78541-2.c: New test.
	* gcc.dg/asan/pr78541.c: New test.

gcc/ChangeLog:

2016-11-28  Martin Liska  

	PR sanitize/78541
	* asan.c (asan_expand_mark_ifn): Properly
	select a VAR_DECL from FRAME.* component reference.
---
 gcc/asan.c|  6 ++
 gcc/testsuite/gcc.dg/asan/pr78541-2.c | 10 ++
 gcc/testsuite/gcc.dg/asan/pr78541.c   | 25 +
 3 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr78541-2.c
 create mode 100644 gcc/testsuite/gcc.dg/asan/pr78541.c

diff --git a/gcc/asan.c b/gcc/asan.c
index 6e93ea3..cb5d615 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2713,6 +2713,12 @@ asan_expand_mark_ifn (gimple_stmt_iterator *iter)
   tree base = gimple_call_arg (g, 1);
   gcc_checking_assert (TREE_CODE (base) == ADDR_EXPR);
   tree decl = TREE_OPERAND (base, 0);
+
+  /* For a nested function, we can have: ASAN_MARK (2, &FRAME.2.fp_input, 4) */
+  if (TREE_CODE (decl) == COMPONENT_REF
+      && DECL_NONLOCAL_FRAME (TREE_OPERAND (decl, 0)))
+    decl = TREE_OPERAND (decl, 0);
+
   gcc_checking_assert (TREE_CODE (decl) == VAR_DECL);
   if (asan_handled_variables == NULL)
     asan_handled_variables = new hash_set<tree> (16);
diff --git a/gcc/testsuite/gcc.dg/asan/pr78541-2.c b/gcc/testsuite/gcc.dg/asan/pr78541-2.c
new file mode 100644
index 000..44be19c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr78541-2.c
@@ -0,0 +1,10 @@
+/* PR sanitizer/78560 */
+/* { dg-do compile } */
+
+void __quadmath_mpn_extract_flt128 (long *fp_input);
+
+int fn1 ()
+{
+  long fp_input[1];
+  int hack_digit () { __quadmath_mpn_extract_flt128 (fp_input); }
+}
diff --git a/gcc/testsuite/gcc.dg/asan/pr78541.c b/gcc/testsuite/gcc.dg/asan/pr78541.c
new file mode 100644
index 000..fb02082
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr78541.c
@@ -0,0 +1,25 @@
+// PR sanitizer/78560
+// { dg-do run }
+// { dg-shouldfail "asan" }
+
+void foo (double a, double b)
+{
+  double *ptr;
+{
+  double x = a + b;
+  ptr = &x;
+}
+ double square () { __builtin_printf ("", *ptr); }
+
+ square ();
+}
+
+int main()
+{
+  foo (1.2f, 2.3f);
+  return 0;
+}
+
+// { dg-output "ERROR: AddressSanitizer: stack-use-after-scope on address.*(\n|\r\n|\r)" }
+// { dg-output "READ of size.*" }
+// { dg-output ".*'x' <== Memory access at offset \[0-9\]* is inside this variable.*" }
-- 
2.10.2

[PATCH] Make one extra BB to prevent PHI argument clash (PR, gcov-profile/78582)

2016-11-29 Thread Martin Liška

The following ICE was reduced from bash, where the new CFG does not properly
fill in a newly added PHI argument. The problem is solved by adding one extra BB
that precedes the original BB containing the PHI. This way, no new PHI argument
is added.

Tests are running.
Ready to be installed after they finish?

Thanks,
Martin
From f3de44cbf026d3295d42c36e864d469f19fc56cc Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 29 Nov 2016 11:40:04 +0100
Subject: [PATCH] Make one extra BB to prevent PHI argument clash (PR
 gcov-profile/78582)

gcc/testsuite/ChangeLog:

2016-11-29  Martin Liska  

	PR gcov-profile/78582
	* gcc.dg/pr78582.c: New test.

gcc/ChangeLog:

2016-11-29  Martin Liska  

	PR gcov-profile/78582
	* tree-profile.c (gimple_gen_time_profiler): Make one extra BB
	to prevent PHI argument clash.
---
 gcc/testsuite/gcc.dg/pr78582.c | 18 ++
 gcc/tree-profile.c |  6 +++---
 2 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr78582.c

diff --git a/gcc/testsuite/gcc.dg/pr78582.c b/gcc/testsuite/gcc.dg/pr78582.c
new file mode 100644
index 000..3084e3b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr78582.c
@@ -0,0 +1,18 @@
+/* PR target/78582. */
+/* { dg-options "-fprofile-generate" } */
+/* { dg-do compile } */
+
+#include <setjmp.h>
+
+void reader_loop () {}
+
+int
+main (int argc, char **argv, char **env)
+{
+  int a;
+  sigsetjmp (0, 0);
+  argc = a = argc;
+  reader_loop ();
+
+  return 0;
+}
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index a4f9d11..77fb86e 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -461,10 +461,10 @@ void
 gimple_gen_time_profiler (unsigned tag, unsigned base)
 {
   tree type = get_gcov_type ();
-  basic_block cond_bb
-    = split_edge (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
-
+  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  basic_block cond_bb = split_edge (single_succ_edge (entry));
   basic_block update_bb = split_edge (single_succ_edge (cond_bb));
+  split_edge (single_succ_edge (update_bb));
 
   edge true_edge = single_succ_edge (cond_bb);
   true_edge->flags = EDGE_TRUE_VALUE;
-- 
2.10.2

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-29 Thread James Greenhalgh

On Mon, Nov 14, 2016 at 02:25:28PM +, Kyrill Tkachov wrote:
> 
> On 11/11/16 15:31, Kyrill Tkachov wrote:
> >
> >On 11/11/16 10:17, Kyrill Tkachov wrote:
> >>
> >>On 10/11/16 23:39, Segher Boessenkool wrote:
> >>>On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:
> On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov
> >I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
> >some interesting swings.
> >458.sjeng +1.45%
> >471.omnetpp   +2.19%
> >445.gobmk -2.01%
> >
> >On SPECFP:
> >453.povray+7.00%
> 
> Wow, this looks really good.  Thank you for implementing this.  If I
> get some time I am going to try it out on other processors than A72
> but I doubt I have time any time soon.
> >>>I'd love to hear what causes the slowdown for gobmk as well, btw.
> >>
> >>I haven't yet gotten a direct answer for that (through performance analysis
> >>tools) but I have noticed that load/store pairs are not generated as
> >>aggressively as I hoped.  They are being merged by the sched fusion pass
> >>and peepholes (which run after this), but it still misses cases. I've
> >>hacked the SWS hooks to generate pairs explicitly and that increases the
> >>number of pairs and helps code size to boot. It complicates the logic of
> >>the hooks a bit but not too much.
> >>
> >>I'll make those changes and re-benchmark, hopefully that
> >>will help performance.
> >>
> >
> >And here's a version that explicitly emits pairs. I've looked at assembly
> >codegen on SPEC2006 and it generates quite a few more LDP/STP pairs than the
> >original version.  I kicked off benchmarks over the weekend to see the
> >effect.  Andrew, if you want to try it out (more benchmarking and testing
> >always welcome) this is the one to try.
> >
> 
> And I discovered over the weekend that gamess and wrf have validation errors.
> This version runs correctly.  SPECINT results were fine though and there is
> even a small overall gain due to sjeng and omnetpp. However, gobmk still has
> the regression.  I'll rerun SPECFP with this patch (it's really just a small
> bugfix over the previous version) and get on with analysing gobmk.

I have some comments in line, most of which are about hardcoding the
maximum register number, but at a high level this looks good to me.

Thanks,
James

> 2016-11-11  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.h (machine_function): Add
> reg_is_wrapped_separately field.
> * config/aarch64/aarch64.c (emit_set_insn): Change return type to
> rtx_insn *.
> (aarch64_save_callee_saves): Don't save registers that are wrapped
> separately.
> (aarch64_restore_callee_saves): Don't restore registers that are
> wrapped separately.
> (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
> aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
> (aarch64_get_separate_components): New function.
> (aarch64_get_next_set_bit): Likewise.
> (aarch64_components_for_bb): Likewise.
> (aarch64_disqualify_components): Likewise.
> (aarch64_emit_prologue_components): Likewise.
> (aarch64_emit_epilogue_components): Likewise.
> (aarch64_set_handled_components): Likewise.
> (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
> TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
> TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
> TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
> TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
> TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.
> 

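The `reg_is_wrapped_separately` checks added to the save loop in the hunks that follow interact with register pairing: a separately wrapped register is not saved by the prologue, and it can no longer be the second half of an STP. A toy model of that loop, ignoring frame offsets and write-back candidates; all names here are hypothetical, not the aarch64.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_FIRST_REG 19
#define TOY_LAST_REG 28
#define TOY_NREGS 32

/* Append a textual "prologue" to OUT: registers that are shrink-wrapped
   separately are skipped, and a register is paired with the next saved
   register only if that one is not wrapped separately either.  */
static void
toy_save_callee_saves (const bool saved[TOY_NREGS],
                       const bool wrapped[TOY_NREGS],
                       char *out, size_t out_size)
{
  out[0] = '\0';
  for (int regno = TOY_FIRST_REG; regno <= TOY_LAST_REG; regno++)
    {
      if (!saved[regno] || wrapped[regno])
        continue;

      /* Find the next register that needs saving.  */
      int regno2 = regno + 1;
      while (regno2 <= TOY_LAST_REG && !saved[regno2])
        regno2++;

      size_t len = strlen (out);
      if (regno2 <= TOY_LAST_REG && !wrapped[regno2])
        {
          snprintf (out + len, out_size - len, "stp x%d, x%d; ",
                    regno, regno2);
          regno = regno2; /* The loop increment then moves past regno2.  */
        }
      else
        snprintf (out + len, out_size - len, "str x%d; ", regno);
    }
}
```

With x19 to x22 saved, the model emits two STPs; marking x20 as wrapped separately turns the first pair into a single STR for x19.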
> commit 06ac3c30d8aa38781ee9019e60a5fcf727b85231
> Author: Kyrylo Tkachov 
> Date:   Tue Oct 11 09:25:54 2016 +0100
> 
> [AArch64] Separate shrink wrapping hooks implementation
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 325e725..2d33ef6 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1138,7 +1138,7 @@ aarch64_is_extend_from_extract (machine_mode mode, rtx mult_imm,
>  
>  /* Emit an insn that's a simple single-set.  Both the operands must be
> known to be valid.  */
> -inline static rtx
> +inline static rtx_insn *
>  emit_set_insn (rtx x, rtx y)
>  {
>return emit_insn (gen_rtx_SET (x, y));
> @@ -3135,6 +3135,9 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
> || regno == cfun->machine->frame.wb_candidate2))
>   continue;
>  
> +  if (cfun->machine->reg_is_wrapped_separately[regno])
> +   continue;
> +
>reg = gen_rtx_REG (mode, regno);
>offset = start_offset + cfun->machine->frame.reg_offset[regno];
>mem = gen_mem_ref (mode, plus_constant (Pmode, stack_pointer_rtx,
> @@ -3143,6 +3146,7 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
>regno2 = aarch64_next_callee_save (regno + 1, limit);
>  
>if (regno2 <= limit
> +   && !cfun->machine->reg_is_wrapped_separately[regno2]
> && (

[PATCH][ARM] Remove movdi_vfp_cortexa8

Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it, to avoid
unnecessary duplication and to avoid repeating bugs like PR78439, which arise
when a change is applied to only one of the duplicates.

Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit?

ChangeLog:
2016-11-29  Wilco Dijkstra  

* config/arm/vfp.md (movdi_vfp): Merge changes from movdi_vfp_cortexa8.
(movdi_vfp_cortexa8): Remove pattern.
--

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 2051f1018f1cbff9c5bf044e71304d78e615458e..a917aa625a7b15f6c9e2b549ab22e5219bb9b99c 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -304,9 +304,9 @@
 ;; DImode moves
 
 (define_insn "*movdi_vfp"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,r,w,w, Uv")
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,!r,w,w, Uv")
(match_operand:DI 1 "di_operand"  "r,rDa,Db,Dc,mi,mi,q,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune != TARGET_CPU_cortexa8
+  "TARGET_32BIT && TARGET_HARD_FLOAT
&& (   register_operand (operands[0], DImode)
|| register_operand (operands[1], DImode))
&& !(TARGET_NEON && CONST_INT_P (operands[1])
@@ -339,71 +339,25 @@
 }
   "
   [(set_attr "type" "multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1,4,5,6") (const_int 8)
+   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
   (eq_attr "alternative" "2") (const_int 12)
   (eq_attr "alternative" "3") (const_int 16)
+ (eq_attr "alternative" "4,5,6")
+  (symbol_ref "arm_count_output_move_double_insns (operands) * 4")
   (eq_attr "alternative" "9")
(if_then_else
  (match_test "TARGET_VFP_SINGLE")
  (const_int 8)
  (const_int 4))]
   (const_int 4)))
+   (set_attr "predicable" "yes")
(set_attr "arm_pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
(set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
(set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
+   (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4"))
(set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
 )
 
-(define_insn "*movdi_vfp_cortexa8"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,r,r,m,w,!r,w,w, Uv")
-   (match_operand:DI 1 "di_operand"  "r,rDa,Db,Dc,mi,mi,r,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune == TARGET_CPU_cortexa8
-&& (   register_operand (operands[0], DImode)
-|| register_operand (operands[1], DImode))
-&& !(TARGET_NEON && CONST_INT_P (operands[1])
-&& neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
-  "*
-  switch (which_alternative)
-{
-case 0: 
-case 1:
-case 2:
-case 3:
-  return \"#\";
-case 4:
-case 5:
-case 6:
-  return output_move_double (operands, true, NULL);
-case 7:
-  return \"vmov%?\\t%P0, %Q1, %R1\\t%@ int\";
-case 8:
-  return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\";
-case 9:
-  return \"vmov%?.f64\\t%P0, %P1\\t%@ int\";
-case 10: case 11:
-  return output_move_vfp (operands);
-default:
-  gcc_unreachable ();
-}
-  "
-  [(set_attr "type" "multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
-   (eq_attr "alternative" "2") (const_int 12)
-   (eq_attr "alternative" "3") (const_int 16)
-   (eq_attr "alternative" "4,5,6") 
-  (symbol_ref 
-   "arm_count_output_move_double_insns (operands) \
- * 4")]
-  (const_int 4)))
-   (set_attr "predicable" "yes")
-   (set_attr "arm_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
-   (set (attr "ce_count") 
-   (symbol_ref "get_attr_length (insn) / 4"))
-   (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
- )
-
 ;; HFmode moves
 
 (define_insn "*movhf_vfp_fp16"
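The merged pattern's length attribute now sizes the memory alternatives 4, 5 and 6 by the number of instructions output_move_double actually emits (previously a fixed 8 bytes), and ce_count is derived from that length. As a rough illustration only, the cond expression can be mirrored in plain C. The names movdi_vfp_length and movdi_vfp_ce_count are hypothetical, and the move-double instruction count and TARGET_VFP_SINGLE are passed in as parameters rather than computed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C mirror of the "length" attribute of the merged
   "*movdi_vfp" pattern.  MOVE_DOUBLE_INSNS stands in for the value of
   arm_count_output_move_double_insns (operands) and VFP_SINGLE for
   TARGET_VFP_SINGLE; both are inputs here, not computed.  */
static int
movdi_vfp_length (int alternative, int move_double_insns, bool vfp_single)
{
  /* The pattern has 12 alternatives, numbered 0..11.  */
  assert (alternative >= 0 && alternative <= 11);

  switch (alternative)
    {
    case 1:
      return 8;
    case 2:
      return 12;
    case 3:
      return 16;
    case 4:
    case 5:
    case 6:
      /* Memory alternatives: 4 bytes per instruction emitted.  */
      return move_double_insns * 4;
    case 9:
      return vfp_single ? 8 : 4;
    default:
      return 4;
    }
}

/* Mirrors (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4")):
   the length expressed as a count of 4-byte instructions.  */
static int
movdi_vfp_ce_count (int alternative, int move_double_insns, bool vfp_single)
{
  return movdi_vfp_length (alternative, move_double_insns, vfp_single) / 4;
}
```

For example, a load that output_move_double emits as a single LDRD gets length 4 and ce_count 1, instead of the fixed 8 bytes the old non-Cortex-A8 pattern assumed.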

[PATCH] Remove uninitialized reads of is_leaf

GCC caches whether a function is a leaf in crtl->is_leaf. Using this
in the backend is best, as leaf_function_p may not work correctly (e.g. while
emitting prologue or epilogue code).  There are many reads of crtl->is_leaf
before it is initialized.  Many targets read it in targetm.frame_pointer_required
(e.g. arm, aarch64, i386, mips, sparc), which is called before register
allocation by ira_setup_eliminable_regset and sched_init.

Additionally, SHRINK_WRAPPING_ENABLED calls targetm.have_simple_return,
which evaluates the condition of the simple_return instruction.  On ARM
this results in a call to use_simple_return_p which requires crtl->is_leaf
to be set correctly.

To fix this, initialize crtl->is_leaf in ira_setup_eliminable_regset and
early on in ira.  A bootstrap did not find any uninitialized reads of
crtl->is_leaf on Thumb-2.  A follow-up patch will remove incorrect uses
of leaf_function_p from the ARM backend.
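The ordering problem can be modelled in a few lines of C. This is a toy sketch, not GCC code: toy_crtl, toy_leaf_function_p, toy_frame_pointer_required and toy_setup_eliminable_regset are hypothetical stand-ins, and the is_leaf_valid field is an assumed debug aid that turns an uninitialized read into an assertion failure. The fix corresponds to computing the cached flag before the hook that consumes it runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (all names hypothetical): a per-function flag that is
   cached once and later read by target hooks.  */
struct toy_crtl
{
  bool is_leaf;
  bool is_leaf_valid;   /* Debug aid: has is_leaf been computed yet?  */
};

static bool toy_has_calls;  /* Stand-in for scanning the insn stream.  */

static bool
toy_leaf_function_p (void)
{
  return !toy_has_calls;
}

/* Stand-in for a frame_pointer_required hook that consults the cache.  */
static bool
toy_frame_pointer_required (const struct toy_crtl *crtl)
{
  assert (crtl->is_leaf_valid);  /* Fires on a read before initialization.  */
  return !crtl->is_leaf;
}

/* Stand-in for ira_setup_eliminable_regset: initialize the cached flag
   first, then call the hook that depends on it.  */
static bool
toy_setup_eliminable_regset (struct toy_crtl *crtl)
{
  crtl->is_leaf = toy_leaf_function_p ();
  crtl->is_leaf_valid = true;
  return toy_frame_pointer_required (crtl);
}
```

Swapping the two statements that set is_leaf and is_leaf_valid with the hook call would reproduce the original bug as an assertion failure.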

Bootstrap OK (verified all reads of is_leaf in ARM backend are now after
initialization), OK for commit?

ChangeLog:
2016-11-29  Wilco Dijkstra  

* ira.c (ira_setup_eliminable_regset): Initialize crtl->is_leaf.
(ira): Move initialization of crtl->is_leaf earlier.
--

diff --git a/gcc/ira.c b/gcc/ira.c
index d20ec99fae562930424023be93ac77bb376445ef..00d32c3732f67c19fac922ca80d1ed11e8fc645b 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -2266,6 +2266,10 @@ ira_setup_eliminable_regset (void)
   int i;
   static const struct {const int from, to; } eliminables[] = ELIMINABLE_REGS;
 
+  /* Setup is_leaf as frame_pointer_required may use it.  This function
+ is called by sched_init before ira if scheduling is enabled.  */
+  crtl->is_leaf = leaf_function_p ();
+
   /* FIXME: If EXIT_IGNORE_STACK is set, we will not save and restore
  sp for alloca.  So we can't eliminate the frame pointer in that
  case.  At some point, we should improve this by emitting the
@@ -5074,6 +5078,13 @@ ira (FILE *f)
 
   clear_bb_flags ();
 
+  /* Determine if the current function is a leaf before running IRA
+ since this can impact optimizations done by the prologue and
+ epilogue thus changing register elimination offsets.
+ Other target callbacks may use crtl->is_leaf too, including
+ SHRINK_WRAPPING_ENABLED, so initialize as early as possible.  */
+  crtl->is_leaf = leaf_function_p ();
+
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
@@ -5159,11 +5170,6 @@ ira (FILE *f)
   if (warn_clobbered)
 generate_setjmp_warnings ();
 
-  /* Determine if the current function is a leaf before running IRA
- since this can impact optimizations done by the prologue and
- epilogue thus changing register elimination offsets.  */
-  crtl->is_leaf = leaf_function_p ();
-
   if (resize_reg_info () && flag_ira_loop_pressure)
 ira_set_pseudo_classes (true, ira_dump_file);

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation


Hi James,

On 29/11/16 10:57, James Greenhalgh wrote:

On Mon, Nov 14, 2016 at 02:25:28PM +, Kyrill Tkachov wrote:

On 11/11/16 15:31, Kyrill Tkachov wrote:

On 11/11/16 10:17, Kyrill Tkachov wrote:

On 10/11/16 23:39, Segher Boessenkool wrote:

On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:

On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
some interesting swings.
458.sjeng     +1.45%
471.omnetpp   +2.19%
445.gobmk     -2.01%

On SPECFP:
453.povray    +7.00%

Wow, this looks really good.  Thank you for implementing this.  If I
get some time I am going to try it out on other processors than A72
but I doubt I have time any time soon.

I'd love to hear what causes the slowdown for gobmk as well, btw.

I haven't yet gotten a direct answer for that (through performance analysis
tools) but I have noticed that load/store pairs are not generated as
aggressively as I hoped.  They are being merged by the sched fusion pass
and peepholes (which run after this) but that still misses cases. I've
hacked the SWS hooks to generate pairs explicitly and that increases the
number of pairs and helps code size to boot. It complicates the logic of
the hooks a bit but not too much.

I'll make those changes and re-benchmark, hopefully that
will help performance.


And here's a version that explicitly emits pairs. I've looked at assembly
codegen on SPEC2006 and it generates quite a few more LDP/STP pairs than the
original version.  I kicked off benchmarks over the weekend to see the
effect.  Andrew, if you want to try it out (more benchmarking and testing
always welcome) this is the one to try.
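A simplified sketch of explicit pairing: walk the set bits of the component bitmap and combine a register with the next set one when the two could form a single LDP/STP. All of the following are assumptions for illustration, not the patch's code: plain bool arrays instead of sbitmaps, register numbers 0-63 with FP/SIMD registers starting at 32, and a pairing rule of "same register bank and save slots 8 bytes apart". next_set_bit follows the contract of aarch64_get_next_set_bit from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NREGS 64      /* Assumed: x0-x30, sp, v0-v31 numbered 0..63.  */
#define V0_REGNUM 32  /* Assumed: first FP/SIMD register number.  */

/* Same contract as aarch64_get_next_set_bit: the first set index at or
   after START, or NBITS if there is none.  */
static unsigned
next_set_bit (const bool *bits, unsigned nbits, unsigned start)
{
  assert (start <= nbits);
  for (unsigned i = start; i < nbits; i++)
    if (bits[i])
      return i;
  return nbits;
}

/* True if both registers are GPRs or both are FP/SIMD registers.  */
static bool
same_bank (unsigned a, unsigned b)
{
  return (a < V0_REGNUM) == (b < V0_REGNUM);
}

/* Group the set registers in COMPONENTS into pairs or singles.  OFFSETS
   holds each register's save-slot offset.  For group K, FIRST[K] is the
   first register and SECOND[K] the second, or -1 for a single.  Both
   output arrays need room for NREGS entries.  Returns the group count.  */
static size_t
pair_components (const bool components[NREGS], const long offsets[NREGS],
                 int first[NREGS], int second[NREGS])
{
  size_t n = 0;
  unsigned regno = next_set_bit (components, NREGS, 0);

  while (regno != NREGS)
    {
      unsigned regno2 = next_set_bit (components, NREGS, regno + 1);
      first[n] = (int) regno;

      if (regno2 != NREGS
          && same_bank (regno, regno2)
          && offsets[regno2] - offsets[regno] == 8)
        {
          /* A candidate for one LDP/STP.  */
          second[n++] = (int) regno2;
          regno = next_set_bit (components, NREGS, regno2 + 1);
        }
      else
        {
          /* Emit REGNO alone and retry pairing from REGNO2.  */
          second[n++] = -1;
          regno = regno2;
        }
    }
  return n;
}
```

With x19, x20, x21 and v8 (register 40) set and consecutive 8-byte slots, this groups (x19, x20), then x21 alone (its neighbour is in the other bank), then v8 alone.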


And I discovered over the weekend that gamess and wrf have validation errors.
This version runs correctly.  SPECINT results were fine though and there is
even a small overall gain due to sjeng and omnetpp. However, gobmk still has
the regression.  I'll rerun SPECFP with this patch (it's really just a small
bugfix over the previous version) and get on with analysing gobmk.

I have some comments in line, most of which are about hardcoding the
maximum register number, but at a high level this looks good to me.


Thanks for having a look.
I'll respin with the comments addressed. I have a couple of replies inline.

Kyrill


Thanks,
James







+
+/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  */
+
+static sbitmap
+aarch64_components_for_bb (basic_block bb)
+{
+  bitmap in = DF_LIVE_IN (bb);
+  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
+  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
+
+  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
+  bitmap_clear (components);
+
+  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
+  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
The use of R0_REGNUM and V31_REGNUM scares me a little bit, as we're hardcoding
where the end of the register file is (does this, for example, fall apart
with the SVE work that was recently posted). Something like a
LAST_HARDREG_NUM might work?



I think you mean FIRST_PSEUDO_REGISTER. AFAICS the compiler uses
a loop from 0 to FIRST_PSEUDO_REGISTER to go through the hard registers
in various places in the midend.
I'll change it to use that, though if the way to save/restore such new
registers becomes different from the current approach (i.e. performing a
DI/DFmode memory op) the code in these hooks will have to be updated
anyway to take it into account.


+if ((!call_used_regs[regno])
+   && (bitmap_bit_p (in, regno)
+  || bitmap_bit_p (gen, regno)
+  || bitmap_bit_p (kill, regno)))
+ bitmap_set_bit (components, regno);
+
+  return components;
+}
+
+/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS.
+   Nothing to do for aarch64.  */
+
+static void
+aarch64_disqualify_components (sbitmap, edge, sbitmap, bool)
+{
+}

Is there no default "do nothing" hook for this?



I don't see one defined anywhere, and the documentation for
TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS says that if it is defined, all
other hooks in the group must be defined.


+
+/* Return the next set bit in BMP from START onwards.  Return the total number
+   of bits in BMP if no set bit is found at or after START.  */
+
+static unsigned int
+aarch64_get_next_set_bit (sbitmap bmp, unsigned int start)
+{
+  unsigned int nbits = SBITMAP_SIZE (bmp);
+  if (start == nbits)
+return start;
+
+  gcc_assert (start < nbits);
+  for (unsigned int i = start; i < nbits; i++)
+if (bitmap_bit_p (bmp, i))
+  return i;
+
+  return nbits;
+}
+
+/* Implement TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS.  */
+
+static void
+aarch64_emit_prologue_components (sbitmap components)
+{
+  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
+? HARD_FRAME_POINTER_REGNUM
+: STACK_POINTER_REGNUM);
+
+  unsigned total_bits = SBITMAP_SIZE (components);

Would this be clearer called last_regno ?


+  u

[Patch, testsuite] Fix bogus pr31096-1.c failure for avr

2016-11-29 Thread Senthil Kumar Selvaraj

Hi,

  This patch fixes a bogus testsuite failure (gcc.dg/pr31096-1.c) for
  the avr target.

  The dump expects constants which would only be present if the target's
  int size is 32 bits.

  Fixed by explicitly using 32-bit ints for targets with __SIZEOF_INT__
  < 4. Committed to trunk as obvious.

Regards
Senthil


gcc/testsuite/ChangeLog

2016-11-29  Senthil Kumar Selvaraj  

* gcc.dg/pr31096-1.c: Use __{U,}INT32_TYPE__ for
targets with sizeof(int) < 4.


Index: gcc/testsuite/gcc.dg/pr31096-1.c
===
--- gcc/testsuite/gcc.dg/pr31096-1.c(revision 242953)
+++ gcc/testsuite/gcc.dg/pr31096-1.c(revision 242954)
@@ -2,8 +2,16 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
+#if __SIZEOF_INT__ < 4
+  __extension__ typedef __INT32_TYPE__  int32_t;
+  __extension__ typedef __UINT32_TYPE__ uint32_t;
+#else
+  typedef int int32_t;
+  typedef unsigned uint32_t;
+#endif
+
 #define zero(name, op) \
-int name (int a, int b) \
+int32_t name (int32_t a, int32_t b) \
 { return a * 0 op b * 0; }
 
 zero(zeq, ==) zero(zne, !=) zero(zlt, <)
@@ -10,7 +18,7 @@
 zero(zgt, >)  zero(zge, >=) zero(zle, <=)
 
 #define unsign_pos(name, op) \
-int name (unsigned a, unsigned b) \
+int32_t name (uint32_t a, uint32_t b) \
 { return a * 4 op b * 4; }
 
 unsign_pos(upeq, ==) unsign_pos(upne, !=) unsign_pos(uplt, <)
@@ -17,7 +25,7 @@
 unsign_pos(upgt, >)  unsign_pos(upge, >=) unsign_pos(uple, <=)
 
 #define unsign_neg(name, op) \
-int name (unsigned a, unsigned b) \
+int32_t name (uint32_t a, uint32_t b) \
 { return a * -2 op b * -2; }
 
 unsign_neg(uneq, ==) unsign_neg(unne, !=) unsign_neg(unlt, <)
@@ -24,7 +32,7 @@
 unsign_neg(ungt, >)  unsign_neg(unge, >=) unsign_neg(unle, <=)
 
 #define float(name, op) \
-int name (float a, float b) \
+int32_t name (float a, float b) \
 { return a * 5 op b * 5; }
 
 float(feq, ==) float(fne, !=) float(flt, <)
@@ -31,7 +39,7 @@
 float(fgt, >)  float(fge, >=) float(fle, <=)
 
 #define float_val(name, op) \
-int name (int a, int b) \
+int32_t name (int32_t a, int32_t b) \
 { return a * 54.0 op b * 54.0; }
 
 float_val(fveq, ==) float_val(fvne, !=) float_val(fvlt, <)
@@ -38,8 +46,8 @@
 float_val(fvgt, >)  float_val(fvge, >=) float_val(fvle, <=)
 
 #define vec(name, op) \
-int name (int a, int b) \
-{ int c[10]; return a * c[1] op b * c[1]; }
+int32_t name (int32_t a, int32_t b) \
+{ int32_t c[10]; return a * c[1] op b * c[1]; }
 
 vec(veq, ==) vec(vne, !=) vec(vlt, <)
 vec(vgt, >)  vec(vge, >=) vec(vle, <=)
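The typedef selection can be checked outside the testsuite. The sketch below reuses the patch's conditional typedefs and one instance of its zero() macro, and adds static assertions (not part of the patch) that both types really are 32 bits wide. __SIZEOF_INT__, __INT32_TYPE__, __UINT32_TYPE__ and __CHAR_BIT__ are GCC predefined macros.

```c
#include <assert.h>

/* Same selection as the patched test: where int is narrower than 32 bits
   (e.g. avr), use the compiler-provided 32-bit types.  */
#if __SIZEOF_INT__ < 4
__extension__ typedef __INT32_TYPE__ int32_t;
__extension__ typedef __UINT32_TYPE__ uint32_t;
#else
typedef int int32_t;
typedef unsigned uint32_t;
#endif

/* Extra checks, not in the patch: the dump constants assume 32 bits.  */
static_assert (sizeof (int32_t) * __CHAR_BIT__ == 32, "int32_t is 32 bits");
static_assert (sizeof (uint32_t) * __CHAR_BIT__ == 32, "uint32_t is 32 bits");

/* One instance of the test's zero() macro: a * 0 == b * 0 always holds,
   so the optimizer can fold the body to a constant.  */
#define zero(name, op) \
int32_t name (int32_t a, int32_t b) \
{ return a * 0 op b * 0; }

zero (zeq, ==)
```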

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation



On 29/11/16 11:18, Kyrill Tkachov wrote:

Hi James,

On 29/11/16 10:57, James Greenhalgh wrote:

On Mon, Nov 14, 2016 at 02:25:28PM +, Kyrill Tkachov wrote:

On 11/11/16 15:31, Kyrill Tkachov wrote:

On 11/11/16 10:17, Kyrill Tkachov wrote:

On 10/11/16 23:39, Segher Boessenkool wrote:

On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:

On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
some interesting swings.
458.sjeng     +1.45%
471.omnetpp   +2.19%
445.gobmk     -2.01%

On SPECFP:
453.povray    +7.00%

Wow, this looks really good.  Thank you for implementing this.  If I
get some time I am going to try it out on other processors than A72
but I doubt I have time any time soon.

I'd love to hear what causes the slowdown for gobmk as well, btw.

I haven't yet gotten a direct answer for that (through performance analysis
tools) but I have noticed that load/store pairs are not generated as
aggressively as I hoped.  They are being merged by the sched fusion pass
and peepholes (which run after this) but that still misses cases. I've
hacked the SWS hooks to generate pairs explicitly and that increases the
number of pairs and helps code size to boot. It complicates the logic of
the hooks a bit but not too much.

I'll make those changes and re-benchmark, hopefully that
will help performance.


And here's a version that explicitly emits pairs. I've looked at assembly
codegen on SPEC2006 and it generates quite a few more LDP/STP pairs than the
original version.  I kicked off benchmarks over the weekend to see the
effect.  Andrew, if you want to try it out (more benchmarking and testing
always welcome) this is the one to try.


And I discovered over the weekend that gamess and wrf have validation errors.
This version runs correctly.  SPECINT results were fine though and there is
even a small overall gain due to sjeng and omnetpp. However, gobmk still has
the regression.  I'll rerun SPECFP with this patch (it's really just a small
bugfix over the previous version) and get on with analysing gobmk.

I have some comments in line, most of which are about hardcoding the
maximum register number, but at a high level this looks good to me.


Thanks for having a look.
I'll respin with the comments addressed. I have a couple of replies inline.

Kyrill


Thanks,
James







+
+/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  */
+
+static sbitmap
+aarch64_components_for_bb (basic_block bb)
+{
+  bitmap in = DF_LIVE_IN (bb);
+  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
+  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
+
+  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
+  bitmap_clear (components);
+
+  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
+  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
The use of R0_REGNUM and V31_REGNUM scares me a little bit, as we're hardcoding
where the end of the register file is (does this, for example, fall apart
with the SVE work that was recently posted). Something like a
LAST_HARDREG_NUM might work?



I think you mean FIRST_PSEUDO_REGISTER. AFAICS the compiler uses
a loop from 0 to FIRST_PSEUDO_REGISTER to go through the hard registers
in various places in the midend.
I'll change it to use that, though if the way to save/restore such new
registers becomes different from the current approach (i.e. performing a
DI/DFmode memory op) the code in these hooks will have to be updated
anyway to take it into account.



And actually trying to implement this blows up. The "hard" registers include
CC_REGNUM, which we definitely want to avoid 'saving/restoring'. We really
just want to save the normal data registers, so is it okay if I leave it as
it is? The prologue/epilogue code already uses V31_REGNUM, so it would need
to change anyway if new registers are added in the future.

Kyrill


+if ((!call_used_regs[regno])
+   && (bitmap_bit_p (in, regno)
+   || bitmap_bit_p (gen, regno)
+   || bitmap_bit_p (kill, regno)))
+  bitmap_set_bit (components, regno);
+
+  return components;
+}
+
+/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS.
+   Nothing to do for aarch64.  */
+
+static void
+aarch64_disqualify_components (sbitmap, edge, sbitmap, bool)
+{
+}

Is there no default "do nothing" hook for this?



I don't see one defined anywhere, and the documentation for
TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS says that if it is defined, all
other hooks in the group must be defined.


+
+/* Return the next set bit in BMP from START onwards.  Return the total number
+   of bits in BMP if no set bit is found at or after START. */
+
+static unsigned int
+aarch64_get_next_set_bit (sbitmap bmp, unsigned int start)
+{
+  unsigned int nbits = SBITMAP_SIZE (bmp);
+  if (start == nbits)
+return start;
+
+  gcc_assert (start < nbits);
+  for (unsigned int i = start; i < nbits; i++)
+if (bitmap_bit_p (bmp, i))
+  return i;
+
+  return n

Re: [PATCH] Make one extra BB to prevent PHI argument clash (PR gcov-profile/78582)

On Tue, Nov 29, 2016 at 11:46 AM, Martin Liška  wrote:
> The following ICE was reduced from bash, where the new CFG does not properly
> fill a newly added PHI argument. The problem is solved by adding one extra BB
> that precedes the original BB with the PHI; that way, we do not add a new PHI
> argument.
>
> Tests are running.
> Ready to be installed after it finishes?

Ok.  (You could avoid the extra forwarder if single_succ (update_bb)
has no PHIs, though that is probably not worth the trouble; or, of course,
add the PHI args by copying those from the single_succ_edge.)

Richard.

> Thanks,
> Martin

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-29 Thread James Greenhalgh

On Tue, Nov 29, 2016 at 11:32:41AM +, Kyrill Tkachov wrote:



> >>+
> >>+/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  */
> >>+
> >>+static sbitmap
> >>+aarch64_components_for_bb (basic_block bb)
> >>+{
> >>+  bitmap in = DF_LIVE_IN (bb);
> >>+  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
> >>+  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
> >>+
> >>+  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
> >>+  bitmap_clear (components);
> >>+
> >>+  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
> >>+  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
> >>The use of R0_REGNUM and V31_REGNUM scares me a little bit, as we're hardcoding
> >>where the end of the register file is (does this, for example, fall apart
> >>with the SVE work that was recently posted). Something like a
> >>LAST_HARDREG_NUM might work?
> >>
> >
> >I think you mean FIRST_PSEUDO_REGISTER. AFAICS the compiler uses
> >a loop from 0 to FIRST_PSEUDO_REGISTER to go through the hard registers
> >in various places in the midend.
> >I'll change it to use that, though if the way to save/restore such new
> >registers becomes different from the current approach (i.e. performing a
> >DI/DFmode memory op) the code in these hooks will have to be updated
> >anyway to take it into account.
> >
> 
> And actually trying to implement this blows up. The "hard" registers include
> CC_REGNUM, which we definitely want to avoid 'saving/restoring'. We really
> just want to save the normal data registers, so is it okay if I
> leave it as it is?  The prologue/epilogue code already uses V31_REGNUM, so it
> would need to change anyway if new registers are added in the future.

Well, you could always define a new constant in aarch64.md?

   (LAST_SAVED_REGISTER  63)

That would at least avoid hardcoding V31_REGNUM everywhere.

And yes, the pro-/epilogue functions could also do with this fix
(preapproved).

James
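To illustrate the suggestion, here is a toy version of the components_for_bb loop bounded by a LAST_SAVED_REGISTER constant instead of FIRST_PSEUDO_REGISTER, so the condition-flags register is never considered for saving. The register numbers, including CC_REGNUM = 66 and the hard-register count, are assumptions for illustration, and plain bool arrays stand in for the DF live-in, gen and kill sets and for call_used_regs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative numbering, modelled on (but not guaranteed to match)
   aarch64.md: x0-x30 = 0-30, sp = 31, v0-v31 = 32-63, then non-data
   registers such as the condition flags.  */
enum
{
  R0_REGNUM = 0,
  LAST_SAVED_REGISTER = 63,  /* The constant suggested in review.  */
  CC_REGNUM = 66,            /* Assumed number for the flags register.  */
  TOY_NUM_HARD_REGS = 67
};

static_assert (LAST_SAVED_REGISTER < 64, "components must fit in a uint64_t");

/* A register is a separately wrapped component if it is callee-saved and
   appears in the block's live-in, gen or kill set.  Iterating only up to
   LAST_SAVED_REGISTER keeps CC_REGNUM out even when it is live.  */
static uint64_t
toy_components_for_bb (const bool in[TOY_NUM_HARD_REGS],
                       const bool gen[TOY_NUM_HARD_REGS],
                       const bool kill[TOY_NUM_HARD_REGS],
                       const bool call_used[TOY_NUM_HARD_REGS])
{
  uint64_t components = 0;
  for (unsigned regno = R0_REGNUM; regno <= LAST_SAVED_REGISTER; regno++)
    if (!call_used[regno] && (in[regno] || gen[regno] || kill[regno]))
      components |= UINT64_C (1) << regno;
  return components;
}
```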

Re: [PATCH] improve folding of expressions that move a single bit around

2016-11-29 Thread Paolo Bonzini

On 29/11/2016 11:16, Richard Biener wrote:
>>> >>  (bit_and
>>> >>   (if (shift > 0)
>>> >>(lshift (convert @0) { build_int_cst (integer_type_node, shift); })
>>> >>(convert (rshift @0 { build_int_cst (integer_type_node, -shift); })))
>>> >>   @3)
>>> >>
>>> >> What do you think?
>> >
>> > Shouldn't be necessary if we're working on unsigned types.  May be necessary
>> > on signed types unless we're 100% sure sign bit tests will be canonicalized.
> Instead of the bit_and, better put an if() condition around this.

Note that the bit_and is not duplicated.  While v1 of the patch did
and-then-shift (the source BIT_AND_EXPR was @2), here I'm doing
shift-then-and so I would stop capturing the source BIT_AND_EXPR.

It also matches what I'm doing for the A < 0 ? D : C pattern, so I think
it's nicer this way.

(BTW the above of course doesn't work because (if...) is only allowed at
the top level, but I've bootstrapped and regtested already a version
that works).

Paolo
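For unsigned operands, the shift-then-and form is easy to check exhaustively. The sketch assumes the source expression is "(x & (1 << i)) ? (1 << j) : 0" and compares it with shifting left or right by j - i and then masking, mirroring the shape of the quoted (bit_and (lshift ...)/(rshift ...) @3) sketch. The function names are hypothetical, and the sketch sticks to uint32_t; the signed case raised in the review is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed source form: test bit I of X and produce bit J.  */
static uint32_t
move_bit_select (uint32_t x, unsigned i, unsigned j)
{
  assert (i < 32 && j < 32);
  return (x & (UINT32_C (1) << i)) ? (UINT32_C (1) << j) : 0;
}

/* Shift-then-and form: move bit I to position J, then mask with 1 << J.  */
static uint32_t
move_bit_shift (uint32_t x, unsigned i, unsigned j)
{
  assert (i < 32 && j < 32);
  int shift = (int) j - (int) i;
  uint32_t moved = shift > 0 ? x << shift : x >> -shift;
  return moved & (UINT32_C (1) << j);
}
```

Only bit I can land on position J after the shift, so masking with 1 << J leaves exactly the tested bit, now in its new place.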

Re: [PATCH] improve folding of expressions that move a single bit around

On Tue, Nov 29, 2016 at 12:50 PM, Paolo Bonzini  wrote:
>
>
> On 29/11/2016 11:16, Richard Biener wrote:
 >>  (bit_and
 >>   (if (shift > 0)
 >>(lshift (convert @0) { build_int_cst (integer_type_node, shift); })
 >>(convert (rshift @0 { build_int_cst (integer_type_node, -shift); })))
 >>   @3)
 >>
 >> What do you think?
>>> >
>>> > Shouldn't be necessary if we're working on unsigned types.  May be necessary
>>> > on signed types unless we're 100% sure sign bit tests will be canonicalized.
>> Instead of the bit_and, better put an if() condition around this.
>
> Note that the bit_and is not duplicated.  While v1 of the patch did
> and-then-shift (the source BIT_AND_EXPR was @2), here I'm doing
> shift-then-and so I would stop capturing the source BIT_AND_EXPR.

Ah, ok.

> It also matches what I'm doing for the A < 0 ? D : C pattern, so I think
> it's nicer this way.
>
> (BTW the above of course doesn't work because (if...) is only allowed at
> the top level, but I've bootstrapped and regtested already a version
> that works).
>
> Paolo

Re: [Patch 2/5] OpenACC tile clause support, omp-low parts

2016-11-29 Thread Chung-Lin Tang

Adjusted as you advised and re-tested; updated patch attached.

Thanks,
Chung-Lin

On 2016/11/18 7:15 PM, Jakub Jelinek wrote:
> Hi!
> 
> On Thu, Nov 17, 2016 at 05:31:40PM +0800, Chung-Lin Tang wrote:
>> +#ifndef ACCEL_COMPILER
>> +  span = integer_one_node;
>> +#else
>> +  if (!e_mask)
>> +/* Not partitioning.  */
>> +span = integer_one_node;
> ...
> This goes against the recent trend of avoiding #if/#ifdef guarded blocks
> of code as much as possible, the ACCEL_COMPILER only hunk is significant
> and will usually not be enabled, so people will not notice breakages in it
> until building an accel compiler.
> What about
> #ifndef ACCEL_COMPILER
>   if (true)
> span = integer_one_node;
>   else
> #endif
>   if (!e_mask)
> /* Not partitioning.  */
> span = integer_one_node;
>   else if
> ...
> or what I've proposed earlier:
> #ifndef ACCEL_COMPILER
>   e_mask = 0;
> #endif
>   if (!e_mask)
> ...
> 
> Ok with that fixed.
> 
>   Jakub
> 
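A minimal sketch of the style suggested above, in which the accelerator-only logic stays visible to every compiler build, so a breakage in it shows up without building an accel compiler. toy_span and its vector_length parameter are hypothetical, and the non-zero branch is a placeholder rather than the real span computation.

```c
#include <assert.h>

/* Hypothetical stand-in for the span computation.  Host compilers clear
   the mask, so they take the "not partitioning" path, but the
   accelerator-only branch below is still compiled and type-checked.  */
static int
toy_span (unsigned e_mask, int vector_length)
{
  assert (vector_length > 0);

#ifndef ACCEL_COMPILER
  e_mask = 0;
#endif

  if (!e_mask)
    /* Not partitioning.  */
    return 1;
  else
    /* Placeholder for the accelerator-only computation.  */
    return vector_length;
}
```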

Index: omp-low.c
===
--- omp-low.c	(revision 241809)
+++ omp-low.c	(working copy)
@@ -213,7 +213,8 @@ struct omp_for_data
   tree chunk_size;
   gomp_for *for_stmt;
   tree pre, iter_type;
-  int collapse;
+  tree tiling;  /* Tiling values (if non null).  */
+  int collapse;  /* Collapsed loops, 1 for a non-collapsed loop.  */
   int ordered;
   bool have_nowait, have_ordered, simd_schedule;
   unsigned char sched_modifiers;
@@ -242,9 +243,10 @@ struct oacc_loop
   tree routine;  /* Pseudo-loop enclosing a routine.  */
 
   unsigned mask;   /* Partitioning mask.  */
+  unsigned e_mask; /* Partitioning of element loops (when tiling).  */
   unsigned inner;  /* Partitioning of inner loops.  */
   unsigned flags;  /* Partitioning flags.  */
-  unsigned ifns;   /* Contained loop abstraction functions.  */
+  vec ifns;  /* Contained loop abstraction functions.  */
   tree chunk_size; /* Chunk size.  */
   gcall *head_end; /* Final marker of head sequence.  */
 };
@@ -256,9 +258,10 @@ enum oacc_loop_flags {
   OLF_AUTO	= 1u << 1,	/* Compiler chooses axes.  */
   OLF_INDEPENDENT = 1u << 2,	/* Iterations are known independent.  */
   OLF_GANG_STATIC = 1u << 3,	/* Gang partitioning is static (has op). */
-
+  OLF_TILE	= 1u << 4,	/* Tiled loop. */
+  
   /* Explicitly specified loop axes.  */
-  OLF_DIM_BASE = 4,
+  OLF_DIM_BASE = 5,
   OLF_DIM_GANG   = 1u << (OLF_DIM_BASE + GOMP_DIM_GANG),
   OLF_DIM_WORKER = 1u << (OLF_DIM_BASE + GOMP_DIM_WORKER),
   OLF_DIM_VECTOR = 1u << (OLF_DIM_BASE + GOMP_DIM_VECTOR),
@@ -536,13 +539,9 @@ extract_omp_for_data (gomp_for *for_stmt, struct o
 
   fd->for_stmt = for_stmt;
   fd->pre = NULL;
-  if (gimple_omp_for_collapse (for_stmt) > 1)
-fd->loops = loops;
-  else
-fd->loops = &fd->loop;
-
   fd->have_nowait = distribute || simd;
   fd->have_ordered = false;
+  fd->tiling = NULL;
   fd->collapse = 1;
   fd->ordered = 0;
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
@@ -587,9 +586,22 @@ extract_omp_for_data (gomp_for *for_stmt, struct o
 	collapse_count = &OMP_CLAUSE_COLLAPSE_COUNT (t);
 	  }
 	break;
+  case OMP_CLAUSE_TILE:
+	fd->tiling = OMP_CLAUSE_TILE_LIST (t);
+	fd->collapse = list_length (fd->tiling);
+	gcc_assert (fd->collapse);
+	collapse_iter = &OMP_CLAUSE_TILE_ITERVAR (t);
+	collapse_count = &OMP_CLAUSE_TILE_COUNT (t);
+	break;
   default:
 	break;
   }
+
+  if (fd->collapse > 1 || fd->tiling)
+fd->loops = loops;
+  else
+fd->loops = &fd->loop;
+
   if (fd->ordered && fd->collapse == 1 && loops != NULL)
 {
   fd->loops = loops;
@@ -608,7 +620,7 @@ extract_omp_for_data (gomp_for *for_stmt, struct o
   fd->sched_kind = OMP_CLAUSE_SCHEDULE_STATIC;
   gcc_assert (fd->chunk_size == NULL);
 }
-  gcc_assert (fd->collapse == 1 || collapse_iter != NULL);
+  gcc_assert ((fd->collapse == 1 && !fd->tiling) || collapse_iter != NULL);
   if (taskloop)
 fd->sched_kind = OMP_CLAUSE_SCHEDULE_RUNTIME;
   if (fd->sched_kind == OMP_CLAUSE_SCHEDULE_RUNTIME)
@@ -626,7 +638,10 @@ extract_omp_for_data (gomp_for *for_stmt, struct o
   int cnt = fd->ordered ? fd->ordered : fd->collapse;
   for (i = 0; i < cnt; i++)
 {
-  if (i == 0 && fd->collapse == 1 && (fd->ordered == 0 || loops == NULL))
+  if (i == 0
+	  && fd->collapse == 1
+	  && !fd->tiling
+	  && (fd->ordered == 0 || loops == NULL))
 	loop = &fd->loop;
   else if (loops != NULL)
 	loop = loops + i;
@@ -65
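The renumbering of OLF_DIM_BASE follows from adding OLF_TILE as bit 4: with the old base of 4, the gang-axis bit would coincide with the new flag. Below is a sketch of the resulting layout with a compile-time overlap check. The GOMP_DIM_* values are those in include/gomp-constants.h; OLF_SEQ at bit 0 is assumed because it lies outside the quoted hunk.

```c
#include <assert.h>

/* GOMP_DIM_* as defined in include/gomp-constants.h.  */
#define GOMP_DIM_GANG   0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2

/* Flag layout after the patch (OLF_SEQ assumed at bit 0).  */
enum oacc_loop_flags
{
  OLF_SEQ         = 1u << 0,
  OLF_AUTO        = 1u << 1,
  OLF_INDEPENDENT = 1u << 2,
  OLF_GANG_STATIC = 1u << 3,
  OLF_TILE        = 1u << 4,

  /* Explicitly specified loop axes start after the single-bit flags.  */
  OLF_DIM_BASE   = 5,
  OLF_DIM_GANG   = 1u << (OLF_DIM_BASE + GOMP_DIM_GANG),
  OLF_DIM_WORKER = 1u << (OLF_DIM_BASE + GOMP_DIM_WORKER),
  OLF_DIM_VECTOR = 1u << (OLF_DIM_BASE + GOMP_DIM_VECTOR),
};

/* With the old OLF_DIM_BASE of 4, OLF_DIM_GANG would have been 1u << 4,
   the same bit as OLF_TILE; this check rejects such an overlap.  */
static_assert ((OLF_TILE & (OLF_DIM_GANG | OLF_DIM_WORKER | OLF_DIM_VECTOR))
               == 0, "OLF_TILE must not overlap the axis bits");
```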

RE: [PATCH 3/4] [ARC] Refurbish mul64 support.

Ping.

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Wednesday, November 16, 2016 11:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ;
> francois.bed...@synopsys.com; andrew.burg...@embecosm.com
> Subject: [PATCH 3/4] [ARC] Refurbish mul64 support.
> 
> gcc/
> 2016-07-04  Claudiu Zissulescu  
> 
>   * config/arc/arc.md (mulsidi_600): Changed.
>   (umulsidi_600): Likewise.
>   (mul64): New pattern.
>   (mulu64): Likewise.
>   (mulsidi3): Changed.
>   (umulsidi3): Likewise.
> ---
>  gcc/config/arc/arc.md | 64 -
> --
>  1 file changed, 40 insertions(+), 24 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 86de423..76a3207 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -12,10 +12,6 @@
>  ;;Profiling support and performance improvements by
>  ;;Joern Rennecke (joern.renne...@embecosm.com)
>  ;;
> -;;Support for DSP multiply instructions and mul64
> -;;instructions for ARC600; and improvements in flag setting
> -;;instructions by
> -;;Muhammad Khurram Riaz (khurram.r...@arc.com)
> 
>  ;; This file is part of GCC.
> 
> @@ -2011,14 +2007,26 @@
>[(set_attr "is_sfunc" "yes")
> (set_attr "predicable" "yes")])
> 
> -(define_insn "mulsidi_600"
> +(define_insn_and_split "mulsidi_600"
> +  [(set (match_operand:DI 0 "register_operand"   
> "=c, c,c,  c")
> + (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand"
> "%Rcq#q, c,c,  c"))
> +  (sign_extend:DI (match_operand:SI 2
> "nonmemory_operand"  "Rcq#q,cL,L,C32"
> +   (clobber (reg:DI MUL64_OUT_REG))]
> +  "TARGET_MUL64_SET"
> +  "#"
> +  "TARGET_MUL64_SET"
> +  [(const_int 0)]
> +  "emit_insn (gen_mul64 (operands[1], operands[2]));
> +   emit_move_insn (operands[0], gen_rtx_REG (DImode,
> MUL64_OUT_REG));
> +   DONE;"
> +  [(set_attr "type" "multi")
> +   (set_attr "length" "8")])
> +
> +(define_insn "mul64"
>[(set (reg:DI MUL64_OUT_REG)
> - (mult:DI (sign_extend:DI
> -(match_operand:SI 0 "register_operand"  "%Rcq#q,c,c,c"))
> -  (sign_extend:DI
> -; assembler issue for "I", see mulsi_600
> -;   (match_operand:SI 1 "register_operand"
> "Rcq#q,cL,I,Cal"]
> -(match_operand:SI 1 "register_operand"
> "Rcq#q,cL,L,C32"]
> + (mult:DI
> +  (sign_extend:DI (match_operand:SI 0 "register_operand" "%Rcq#q,
> c,c,  c"))
> +  (sign_extend:DI (match_operand:SI 1 "nonmemory_operand"
> "Rcq#q,cL,L,C32"]
>"TARGET_MUL64_SET"
>"mul64%? \t0, %0, %1%&"
>[(set_attr "length" "*,4,4,8")
> @@ -2027,14 +2035,26 @@
> (set_attr "predicable" "yes,yes,no,yes")
> (set_attr "cond" "canuse,canuse,canuse_limm,canuse")])
> 
> -(define_insn "umulsidi_600"
> +(define_insn_and_split "umulsidi_600"
> +  [(set (match_operand:DI 0 "register_operand"
> "=c,c, c")
> + (mult:DI (zero_extend:DI (match_operand:SI 1 "register_operand"
> "%c,c, c"))
> +  (sign_extend:DI (match_operand:SI 2
> "nonmemory_operand"  "cL,L,C32"
> +   (clobber (reg:DI MUL64_OUT_REG))]
> +  "TARGET_MUL64_SET"
> +  "#"
> +  "TARGET_MUL64_SET"
> +  [(const_int 0)]
> +  "emit_insn (gen_mulu64 (operands[1], operands[2]));
> +   emit_move_insn (operands[0], gen_rtx_REG (DImode,
> MUL64_OUT_REG));
> +   DONE;"
> +  [(set_attr "type" "umulti")
> +   (set_attr "length" "8")])
> +
> +(define_insn "mulu64"
>[(set (reg:DI MUL64_OUT_REG)
> - (mult:DI (zero_extend:DI
> -(match_operand:SI 0 "register_operand"  "%c,c,c"))
> -  (sign_extend:DI
> -; assembler issue for "I", see mulsi_600
> -;   (match_operand:SI 1 "register_operand" "cL,I,Cal"]
> -(match_operand:SI 1 "register_operand" "cL,L,C32"]
> + (mult:DI
> +  (zero_extend:DI (match_operand:SI 0 "register_operand"  "%c,c,c"))
> +  (zero_extend:DI (match_operand:SI 1 "nonmemory_operand" "cL,L,C32"]
>"TARGET_MUL64_SET"
>"mulu64%? \t0, %0, %1%&"
>[(set_attr "length" "4,4,8")
> @@ -2098,9 +2118,7 @@
>  }
>else if (TARGET_MUL64_SET)
>  {
> -  operands[2] = force_reg (SImode, operands[2]);
> -  emit_insn (gen_mulsidi_600 (operands[1], operands[2]));
> -  emit_move_insn (operands[0], gen_rtx_REG (DImode, MUL64_OUT_REG));
> +  emit_insn (gen_mulsidi_600 (operands[0], operands[1], operands[2]));
>DONE;
>  }
>else if (TARGET_MULMAC_32BY16_SET)
> @@ -2332,9 +2350,7 @@
>  }
>else if (TARGET_MUL64_SET)
>  {
> -  operands[2] = force_reg (SImode, operands[2]);
> -  emit_insn (gen_umulsidi_600 (operands[1], operands[2]));
> -  emit_move_insn (operands[0], gen_rtx_REG (DImode, MUL64_OUT_REG));
> +  emit_insn (gen_umulsidi_600 (operands[0], operands[1], operands[2]));
>DONE;
>  }
>else if (TARGET_MULMAC_32BY16_SET)
> --
> 1.9.1

RE: [PATCH 2/4] [ARC] Cleanup implementation.

Ping.

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Wednesday, November 16, 2016 11:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ; francois.bed...@synopsys.com; andrew.burg...@embecosm.com
> Subject: [PATCH 2/4] [ARC] Cleanup implementation.
> 
> gcc/
> 2016-06-30  Claudiu Zissulescu  
> 
>   * config/arc/arc-protos.h (insn_is_tls_gd_dispatch): Remove.
>   * config/arc/arc.c (arc_unspec_offset): New function.
>   (arc_finalize_pic): Change.
>   (arc_emit_call_tls_get_addr): Likewise.
>   (arc_legitimize_tls_address): Likewise.
>   (arc_legitimize_pic_address): Likewise.
>   (insn_is_tls_gd_dispatch): Remove.
>   * config/arc/arc.h (INSN_REFERENCES_ARE_DELAYED): Change.
>   * config/arc/arc.md (ls_gd_load): Remove.
>   (tls_gd_dispatch): Likewise.
> ---
>  gcc/config/arc/arc-protos.h |  1 -
>  gcc/config/arc/arc.c| 41 ++---
>  gcc/config/arc/arc.h|  2 +-
>  gcc/config/arc/arc.md   | 34 --
>  4 files changed, 19 insertions(+), 59 deletions(-)
> 
> diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
> index 6008744..bda3d46 100644
> --- a/gcc/config/arc/arc-protos.h
> +++ b/gcc/config/arc/arc-protos.h
> @@ -118,6 +118,5 @@ extern bool arc_eh_uses (int regno);
>  extern int regno_clobbered_p (unsigned int, rtx_insn *, machine_mode, int);
>  extern bool arc_legitimize_reload_address (rtx *, machine_mode, int, int);
>  extern void arc_secondary_reload_conv (rtx, rtx, rtx, bool);
> -extern bool insn_is_tls_gd_dispatch (rtx_insn *);
>  extern void arc_cpu_cpp_builtins (cpp_reader *);
>  extern rtx arc_eh_return_address_location (void);
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 428676f..7eadb3c 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -2893,6 +2893,15 @@ arc_eh_return_address_location (void)
> 
>  /* PIC */
> 
> +/* Helper to generate unspec constant.  */
> +
> +static rtx
> +arc_unspec_offset (rtx loc, int unspec)
> +{
> +  return gen_rtx_CONST (Pmode, gen_rtx_UNSPEC (Pmode, gen_rtvec (1, loc),
> +unspec));
> +}
> +
>  /* Emit special PIC prologues and epilogues.  */
>  /* If the function has any GOTOFF relocations, then the GOTBASE
> register has to be setup in the prologue
> @@ -2918,9 +2927,7 @@ arc_finalize_pic (void)
>gcc_assert (flag_pic != 0);
> 
>pat = gen_rtx_SYMBOL_REF (Pmode, "_DYNAMIC");
> -  pat = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, pat), ARC_UNSPEC_GOT);
> -  pat = gen_rtx_CONST (Pmode, pat);
> -
> +  pat = arc_unspec_offset (pat, ARC_UNSPEC_GOT);
>pat = gen_rtx_SET (baseptr_rtx, pat);
> 
>emit_insn (pat);
> @@ -4989,8 +4996,7 @@ arc_emit_call_tls_get_addr (rtx sym, int reloc, rtx eqv)
> 
>start_sequence ();
> 
> -  rtx x = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, sym), reloc);
> -  x = gen_rtx_CONST (Pmode, x);
> +  rtx x = arc_unspec_offset (sym, reloc);
>emit_move_insn (r0, x);
>use_reg (&call_fusage, r0);
> 
> @@ -5046,17 +5052,18 @@ arc_legitimize_tls_address (rtx addr, enum tls_model model)
>addr = gen_rtx_CONST (Pmode, addr);
>base = arc_legitimize_tls_address (base,
> TLS_MODEL_GLOBAL_DYNAMIC);
>return gen_rtx_PLUS (Pmode, force_reg (Pmode, base), addr);
> +
>  case TLS_MODEL_GLOBAL_DYNAMIC:
>return arc_emit_call_tls_get_addr (addr, UNSPEC_TLS_GD, addr);
> +
>  case TLS_MODEL_INITIAL_EXEC:
> -  addr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_TLS_IE);
> -  addr = gen_rtx_CONST (Pmode, addr);
> +  addr = arc_unspec_offset (addr, UNSPEC_TLS_IE);
>addr = copy_to_mode_reg (Pmode, gen_const_mem (Pmode, addr));
>return gen_rtx_PLUS (Pmode, arc_get_tp (), addr);
> +
>  case TLS_MODEL_LOCAL_EXEC:
>  local_exec:
> -  addr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), UNSPEC_TLS_OFF);
> -  addr = gen_rtx_CONST (Pmode, addr);
> +  addr = arc_unspec_offset (addr, UNSPEC_TLS_OFF);
>return gen_rtx_PLUS (Pmode, arc_get_tp (), addr);
>  default:
>gcc_unreachable ();
> @@ -5087,14 +5094,11 @@ arc_legitimize_pic_address (rtx orig, rtx oldx)
>else if (!flag_pic)
>   return orig;
>else if (CONSTANT_POOL_ADDRESS_P (addr) || SYMBOL_REF_LOCAL_P (addr))
> - return gen_rtx_CONST (Pmode,
> -   gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
> -   ARC_UNSPEC_GOTOFFPC));
> + return arc_unspec_offset (addr, ARC_UNSPEC_GOTOFFPC);
> 
>/* This symbol must be referenced via a load from the Global
>Offset Table (@GOTPC).  */
> -  pat = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr), ARC_UNSPEC_GOT);
> -  pat = gen_rtx_CONST (Pmode, pat);
> +  pat = arc_unspec_offset (addr, ARC_UNSPEC_GOT);
>pat = gen_const_mem (Pmode, pat);
> 
>if (oldx == NULL)
> @@ -10033,15 +10037,6 @@ arc_dwarf_

Re: [Patch 3/5] OpenACC tile clause support, C/C++ front-end parts

2016-11-29 Thread Chung-Lin Tang

On 2016/11/18 7:23 PM, Jakub Jelinek wrote:
> On Thu, Nov 17, 2016 at 05:34:34PM +0800, Chung-Lin Tang wrote:
>> Updated C/C++ front-end patches, adjusted as reviewed.
> 
> Jason is right, finish_omp_clauses will verify the tile operands
> when !processing_template_decl are non-negative host INTEGER_CSTs,
> so can't you just tsubst it like OMP_CLAUSE_COLLAPSE?  If the operand
> is not a constant expression, presumably it will not be INTEGER_CST.

Yeah, it appears that way will work. Updated C/C++ FE patch as attached.

> On the other side, OMP_CLAUSE_TILE has now 3 operands instead of just 1,
> don't you need to do something during instantiation for the other 2
> operands?
> 
>   Jakub

The other two operands are used only in omp-low, they're not programmer
defined operands.

Thanks,
Chung-Lin
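For readers following the parser hunk below, here is a minimal standalone sketch of the argument rule that the updated c_parser_oacc_clause_tile appears to enforce: `*` becomes zero (the patch uses integer_zero_node), and anything that is not a positive integral constant is diagnosed and also replaced by zero. The function names are hypothetical, and the sketch only accepts already-tokenized decimal literals, whereas the real parser folds arbitrary constant expressions with c_fully_fold.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: validate one already-tokenized tile argument.
   Returns 1 if accepted, 0 if it would be diagnosed.  Both "*" and the
   error path store 0, mirroring expr = integer_zero_node in the patch.  */
static int
tile_arg_value (const char *tok, long *out)
{
  char *end;
  long v;

  *out = 0;
  if (strcmp (tok, "*") == 0)
    return 1;

  /* Assumption: only plain decimal literals are handled here.  */
  errno = 0;
  v = strtol (tok, &end, 10);
  if (end == tok || *end != '\0' || errno == ERANGE || v <= 0)
    return 0;  /* not a positive integral constant */

  *out = v;
  return 1;
}

/* Hypothetical helper: validate a whole tile list and return the number
   of rejected arguments.  NARGS is also the implied loop-nest depth.  */
static int
check_tile_list (const char *const *args, int nargs, long *vals)
{
  int errors = 0;
  for (int i = 0; i < nargs; i++)
    if (!tile_arg_value (args[i], &vals[i]))
      errors++;
  return errors;
}
```

Because the tile list length also sets the loop-nest depth (collapse = list_length (OMP_CLAUSE_TILE_LIST (cl)) in c_parser_omp_for_loop), a clause such as tile(*, 4, 8) implies three associated loops.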

Index: c/c-parser.c
===
--- c/c-parser.c	(revision 241809)
+++ c/c-parser.c	(working copy)
@@ -11010,6 +11010,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tr
   location_t loc;
 
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
+  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
 
   loc = c_parser_peek_token (parser)->location;
   if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -11920,10 +11921,11 @@ static tree
 c_parser_oacc_clause_tile (c_parser *parser, tree list)
 {
   tree c, expr = error_mark_node;
-  location_t loc, expr_loc;
+  location_t loc;
   tree tile = NULL_TREE;
 
   check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
 
   loc = c_parser_peek_token (parser)->location;
   if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
@@ -11931,16 +11933,19 @@ c_parser_oacc_clause_tile (c_parser *parser, tree
 
   do
 {
+  if (tile && !c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
+	return list;
+
   if (c_parser_next_token_is (parser, CPP_MULT)
 	  && (c_parser_peek_2nd_token (parser)->type == CPP_COMMA
 	  || c_parser_peek_2nd_token (parser)->type == CPP_CLOSE_PAREN))
 	{
 	  c_parser_consume_token (parser);
-	  expr = integer_minus_one_node;
+	  expr = integer_zero_node;
 	}
   else
 	{
-	  expr_loc = c_parser_peek_token (parser)->location;
+	  location_t expr_loc = c_parser_peek_token (parser)->location;
 	  c_expr cexpr = c_parser_expr_no_commas (parser, NULL);
 	  cexpr = convert_lvalue_to_rvalue (expr_loc, cexpr, false, true);
 	  expr = cexpr.value;
@@ -11952,28 +11957,19 @@ c_parser_oacc_clause_tile (c_parser *parser, tree
 	  return list;
 	}
 
-	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)))
-	{
-	  c_parser_error (parser, "% value must be integral");
-	  return list;
-	}
-
 	  expr = c_fully_fold (expr, false, NULL);
 
-	  /* Attempt to statically determine when expr isn't positive.  */
-	  c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, expr,
-			   build_int_cst (TREE_TYPE (expr), 0));
-	  protected_set_expr_location (c, expr_loc);
-	  if (c == boolean_true_node)
+	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+	  || !tree_fits_shwi_p (expr)
+	  || tree_to_shwi (expr) <= 0)
 	{
-	  warning_at (expr_loc, 0,"% value must be positive");
-	  expr = integer_one_node;
+	  error_at (expr_loc, "% argument needs positive"
+			" integral constant");
+	  expr = integer_zero_node;
 	}
 	}
 
   tile = tree_cons (NULL_TREE, expr, tile);
-  if (c_parser_next_token_is (parser, CPP_COMMA))
-	c_parser_consume_token (parser);
 }
   while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN));
 
@@ -14899,11 +14895,17 @@ c_parser_omp_for_loop (location_t loc, c_parser *p
   bool fail = false, open_brace_parsed = false;
   int i, collapse = 1, ordered = 0, count, nbraces = 0;
   location_t for_loc;
+  bool tiling = false;
   vec *for_block = make_tree_vector ();
 
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+  {
+	tiling = true;
+	collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+  }
 else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
 	 && OMP_CLAUSE_ORDERED_EXPR (cl))
   {
@@ -14933,7 +14935,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *p
 	  pc = &OMP_CLAUSE_CHAIN (*pc);
 }
 
-  gcc_assert (collapse >= 1 && ordered >= 0);
+  gcc_assert (tiling || (collapse >= 1 && ordered >= 0));
   count = ordered ? ordered : collapse;
 
   declv = make_tree_vec (count);
Index: cp/pt.c
===
--- cp/pt.c	(revision 241809)
+++ cp/pt.c	(working copy)
@@ -14742,6 +14742,7 @@ tsubst_omp_clauses (tree clauses, enum c_omp_regio
 	= tsubst_omp_clause_decl (OMP_CLAUSE_DECL (oc), args, complain,
   in_decl);
 	  break;
+	case OMP_CLAUSE_

Re: [Patch 2/5] OpenACC tile clause support, omp-low parts

On Tue, Nov 29, 2016 at 08:23:45PM +0800, Chung-Lin Tang wrote:
> Adjusted and re-tested using the way you advised, attached updated patch.

Ok, thanks.

Jakub

Re: [Patch 3/5] OpenACC tile clause support, C/C++ front-end parts

On Tue, Nov 29, 2016 at 08:27:04PM +0800, Chung-Lin Tang wrote:
> On 2016/11/18 7:23 PM, Jakub Jelinek wrote:
> > On Thu, Nov 17, 2016 at 05:34:34PM +0800, Chung-Lin Tang wrote:
> >> Updated C/C++ front-end patches, adjusted as reviewed.
> > 
> > Jason is right, finish_omp_clauses will verify the tile operands
> > when !processing_template_decl are non-negative host INTEGER_CSTs,
> > so can't you just tsubst it like OMP_CLAUSE_COLLAPSE?  If the operand
> > is not a constant expression, presumably it will not be INTEGER_CST.
> 
> Yeah, it appears that way will work. Updated C/C++ FE patch as attached.
> 
> > On the other side, OMP_CLAUSE_TILE has now 3 operands instead of just 1,
> > don't you need to do something during instantiation for the other 2
> > operands?
> > 
> > Jakub
> 
> The other two operands are used only in omp-low, they're not programmer
> defined operands.

Ok, thanks.

Jakub

[PATCH] [ARC] [COMMITTED] Fix typo in arc.opt

Committed as obvious.

gcc/
2016-11-29  Claudiu Zissulescu  

* config/arc/arc.opt (marclinux): Fix typo.
(marclinux_prof): Likewise.
---
 gcc/ChangeLog  | 5 +
 gcc/config/arc/arc.opt | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b7ccbd8..e8b1179 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-11-29  Claudiu Zissulescu  
+
+   * config/arc/arc.opt (marclinux): Fix typo.
+   (marclinux_prof): Likewise.
+
 2016-11-29  Jiong Wang  
 
* target.def (stack_protect_runtime_enabled_p): New.
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index 5685100..31b305b 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -382,11 +382,11 @@ Target
 Pass -EL option through to linker.
 
 marclinux
-target
+Target
 Pass -marclinux option through to linker.
 
 marclinux_prof
-target
+Target
 Pass -marclinux_prof option through to linker.
 
;; lra is still unproven for ARC, so allow to fall back to reload with -mno-lra.
-- 
1.9.1

[PATCH] [ARC] Fix compact casesi option.

Fix the selection of the compact casesi option for ARCv2 CPUs.

Ok to apply?
Claudiu

gcc/
2016-11-28  Claudiu Zissulescu  

* config/arc/arc.c (arc_override_options): Avoid selection of
compact casesi for ARCv2.
---
 gcc/config/arc/arc.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 31b4147..f2575b5 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -874,11 +874,13 @@ arc_override_options (void)
 optimize_size = 1;
 
   /* Compact casesi is not a valid option for ARCv2 family.  */
-  if (TARGET_V2
-  && TARGET_COMPACT_CASESI)
+  if (TARGET_V2)
 {
-  warning (0, "compact-casesi is not applicable to ARCv2");
-  TARGET_COMPACT_CASESI = 0;
+  if (TARGET_COMPACT_CASESI)
+   {
+ warning (0, "compact-casesi is not applicable to ARCv2");
+ TARGET_COMPACT_CASESI = 0;
+   }
 }
   else if (optimize_size == 1
   && !global_options_set.x_TARGET_COMPACT_CASESI)
-- 
1.9.1
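To make the control-flow change concrete, the sketch below models both versions of the check with a hypothetical options struct. The body of the trailing `else if` is cut off in the patch above; the sketch assumes it enables compact casesi, which is what the -Os condition suggests. Under that assumption, the old code let an ARCv2 compile at -Os that had not asked for compact casesi fall through to the `else if` and turn the option on, while the new code never enables it for ARCv2.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant target flags.  */
struct casesi_opts
{
  bool target_v2;           /* TARGET_V2 */
  bool compact_casesi;      /* TARGET_COMPACT_CASESI */
  bool compact_casesi_set;  /* global_options_set.x_TARGET_COMPACT_CASESI */
  int optimize_size;
  bool warned;              /* "compact-casesi is not applicable" issued */
};

/* Shape of the check before the patch.  */
static void
old_logic (struct casesi_opts *o)
{
  if (o->target_v2 && o->compact_casesi)
    {
      o->warned = true;
      o->compact_casesi = false;
    }
  else if (o->optimize_size == 1 && !o->compact_casesi_set)
    o->compact_casesi = true;  /* assumed body; truncated in the patch */
}

/* Shape of the check after the patch: ARCv2 never reaches the -Os case.  */
static void
new_logic (struct casesi_opts *o)
{
  if (o->target_v2)
    {
      if (o->compact_casesi)
        {
          o->warned = true;
          o->compact_casesi = false;
        }
    }
  else if (o->optimize_size == 1 && !o->compact_casesi_set)
    o->compact_casesi = true;  /* assumed body; truncated in the patch */
}
```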

Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

On 29 November 2016 at 11:12, Christophe Lyon  wrote:
> Hi Tamar,
>
>
> On 29 November 2016 at 10:50, Tamar Christina  wrote:
>> Hi All,
>>
>> The new patch contains the proper types for the intrinsics that should be returning uint64x1
>> and has the rest of the comments by Christophe in them.
>>
>
> LGTM.
>
> One more question: maybe we want to add explicit tests for vdup*_v_p64
> even though they are aliases for vmov?
>
Sorry, I meant vdup_n_p64, but the tests are already in place.

So, OK for me, but I can't approve.

Thanks,

Christophe

> Christophe
>
>> Kind Regards,
>> Tamar
>>
>> 
>> From: Tamar Christina
>> Sent: Friday, November 25, 2016 4:01:30 PM
>> To: Christophe Lyon
>> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
>> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC
>>
>>  >
>>> > A few comments about this new version:
>>> > * arm-neon-ref.h: why do you create CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
>>> > Can't you just add calls to CHECK_CRYPTO in the existing
>>> > CHECK_RESULTS_NAMED_NO_FP16?
>>
>> Yes, that should be fine. I didn't use to have CHECK_CRYPTO, and when I added it
>> I didn't remove the split. I'll do it now.
>>
>>> >
>>> > * p64_p128:
>>> > From what I can see ARM and AArch64 differ on the vceq variants
>>> > available with poly64.
>>> > For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,
>>> > poly64x1_t __b) For AArch64, I can't see vceq_p64 in arm_neon.h? ...
>>> > Actually I've just noticed the other patch you submitted while I was writing
>>> > this, where you add vceq_p64 for aarch64, but it still returns
>>> > uint64_t.
>>> > Why do you change the vceq_64 test to return poly64_t instead of uint64_t?
>>
>> This patch is slightly outdated. The correct type is `uint64_t`, but by the time
>> that was noticed this patch had already been sent. New one coming soon.
>>
>>> >
>>> > Why do you add #ifdef __aarch64 before vldX_p64 tests and until vsli_p64?
>>> >
>>
>> This is wrong; I'll remove them. They were supposed to be around the vldX_lane_p64 tests.
>>
>>> > The comment /* vget_lane_p64 tests.  */ is wrong before VLDX_LANE
>>> > tests
>>> >
>>> > You need to protect the new vmov, vget_high and vget_lane tests with
>>> > #ifdef __aarch64__.
>>> >
>>
>> vget_lane is already in an #ifdef. You're right about vmov, but I also notice that the
>> test calls VDUP instead of VMOV, which explains why I didn't get a test failure.
>>
>> Thanks for the feedback,
>> I'll get these updated.
>>
>>>
>>> Actually, vget_high_p64 exists on arm, so no need for the #ifdef for it.
>>>
>>>
>>> > Christophe
>>> >
>>> >> Kind regards,
>>> >> Tamar
>>> >> 
>>> >> From: Tamar Christina
>>> >> Sent: Tuesday, November 8, 2016 11:58:46 AM
>>> >> To: Christophe Lyon
>>> >> Cc: GCC Patches; christophe.l...@st.com; Marcus Shawcroft; Richard Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
>>> >> Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC
>>> >>
>>> >> Hi Christophe,
>>> >>
>>> >> Thanks for the review!
>>> >>
>>> >>>
>>> >>> A while ago I added p64_p128.c, to contain all the poly64/128 tests
>>> >>> except for vreinterpret.
>>> >>> Why do you need to create p64.c ?
>>> >>
>>> >> I originally created it because I had a much smaller set of
>>> >> intrinsics that I wanted to add initially; this grew, and it hadn't occurred to
>>> >> me that I could use the existing file now.
>>> >>
>>> >> Another reason was the effective-target arm_crypto_ok, as you mentioned below.
>>> >>
>>> >>>
>>> >>> Similarly, adding tests for vcreate_p64 etc... in p64.c or
>>> >>> p64_p128.c might be easier to maintain than adding them to vcreate.c
>>> >>> etc with several #ifdef conditions.
>>> >>
>>> >> Fair enough, I'll move them to p64_p128.c.
>>> >>
>>> >>> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)"
>>> >>> condition? These intrinsics are defined in arm/arm_neon.h, right?
>>> >>> They are tested in p64_p128.c
>>> >>
>>> >> I should have looked for them. They weren't being tested before, so I
>>> >> had mistakenly assumed that they weren't available. Now I realize I
>>> >> just need to add the proper test option to the file to enable crypto. I'll
>>> >> update this as well.
>>> >>
>>> >>> Looking at your patch, it seems some tests are currently missing for arm:
>>> >>> vget_high_p64. I'm not sure why I missed it when I removed neon-testgen...
>>> >>
>>> >> I'll adjust the test conditions so they run for ARM as well.
>>> >>
>>> >>>
>>> >>> Regarding vreinterpret_p128.c, doesn't the existing effective-target
>>> >>> arm_crypto_ok prevent the tests from running on aarch64?
>>> >>
>>> >> Yes they do. I was comparing the output against a clean version and
>>> >> hadn't noticed that they weren't running. Thanks!

[PATCH][ARC] Fix PIE.

2016-11-29 Thread Cupertino Miranda

Hi Claudiu and Andrew

This patch fixes the failing DejaGnu PIE tests.
Looking forward to your review.

Best regards,
Cupertino

gcc/
2016-07-27  Cupertino Miranda  

* config/arc/arc.h (STARTFILE_SPEC): Use default linux specs.
(ENDFILE_SPEC): Likewise.

libgcc/
2016-07-27  Cupertino Miranda  

* config.host (arc*-*-linux-uclibc*): Use default extra
objects. Include linux-android header.
* config/arc/crti.S (_init): Declare symbol as function.
(_fini): Likewise.
---
 gcc/config.gcc   |  2 +-
 gcc/config/arc/arc.h | 10 --
 libgcc/config.host   |  4 ++--
 libgcc/config/arc/crti.S |  2 ++
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 22a8946..f9df00d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1020,7 +1020,7 @@ arc*-*-elf*)
;;
 arc*-*-linux-uclibc*)
extra_headers="arc-simd.h"
-   tm_file="arc/arc-arch.h dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file}"
+   tm_file="arc/arc-arch.h dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h ${tm_file}"
tmake_file="${tmake_file} arc/t-uClibc arc/t-arc"
tm_defines="${tm_defines} TARGET_SDATA_DEFAULT=0"
tm_defines="${tm_defines} TARGET_MMEDIUM_CALLS_DEFAULT=1"
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 6a579eb..d4cc20dd 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -138,17 +138,15 @@ extern const char *arc_cpu_to_as (int argc, const char **argv);
 #define STARTFILE_SPEC "%{!shared:crt0.o%s} crti%O%s %{pg|p:crtg.o%s} " \
   "%(arc_tls_extra_start_spec) crtbegin.o%s"
 #else
-#define STARTFILE_SPEC   "%{!shared:%{!mkernel:crt1.o%s}} crti.o%s \
-  %{!shared:%{pg|p|profile:crtg.o%s} crtbegin.o%s} %{shared:crtbeginS.o%s}"
-
+#define STARTFILE_SPEC \
+  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_STARTFILE_SPEC, ANDROID_STARTFILE_SPEC)
 #endif
 
 #if DEFAULT_LIBC != LIBC_UCLIBC
 #define ENDFILE_SPEC "%{pg|p:crtgend.o%s} crtend.o%s crtn%O%s"
 #else
-#define ENDFILE_SPEC "%{!shared:%{pg|p|profile:crtgend.o%s} crtend.o%s} \
-  %{shared:crtendS.o%s} crtn.o%s"
-
+#define ENDFILE_SPEC   \
+  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_ENDFILE_SPEC, ANDROID_ENDFILE_SPEC)
 #endif
 
 #if DEFAULT_LIBC == LIBC_UCLIBC
diff --git a/libgcc/config.host b/libgcc/config.host
index 5a4e56e..7adab15 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -371,8 +371,8 @@ arc*-*-elf*)
;;
 arc*-*-linux-uclibc*)
tmake_file="${tmake_file} t-slibgcc-libgcc t-slibgcc-nolc-override arc/t-arc700-uClibc arc/t-arc"
-   extra_parts="crti.o crtn.o crtend.o crtbegin.o crtendS.o crtbeginS.o libgmon.a crtg.o crtgend.o"
-   extra_parts="${extra_parts} crttls.o"
+   extra_parts="$extra_parts crti.o crtn.o libgmon.a crtg.o crtgend.o"
+   extra_parts="$extra_parts crttls.o"
;;
 arm-wrs-vxworks)
tmake_file="$tmake_file arm/t-arm arm/t-elf t-softfp-sfdf t-softfp-excl arm/t-softfp t-softfp"
diff --git a/libgcc/config/arc/crti.S b/libgcc/config/arc/crti.S
index 7f64305..6867ca9 100644
--- a/libgcc/config/arc/crti.S
+++ b/libgcc/config/arc/crti.S
@@ -31,11 +31,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.section .init
.global _init
.word 0
+   .type   _init,@function
 _init:
push_s  blink
 
.section .fini
.global _fini
.word 0
+   .type   _fini,@function
 _fini:
push_s  blink
-- 1.9.1

Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC



On 29/11/16 09:50, Tamar Christina wrote:
> Hi All,
>
> The new patch contains the proper types for the intrinsics that should be returning uint64x1
> and has the rest of the comments by Christophe in them.


Ok with an appropriate ChangeLog entry.
Thanks,
Kyrill



Re: [PATCH] [ARC] Fix compact casesi option.

2016-11-29 Thread Andrew Burgess

* Claudiu Zissulescu  [2016-11-29 13:43:21 +0100]:

> Fixing casesi option for ARCv2 cpus.
> 
> Ok to apply?
> Claudiu

Approved.

Thanks,
Andrew


> 
> gcc/
> 2016-11-28  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_override_options): Avoid selection of
>   compact casesi for ARCv2.
> ---
>  gcc/config/arc/arc.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 31b4147..f2575b5 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -874,11 +874,13 @@ arc_override_options (void)
>  optimize_size = 1;
>  
>/* Compact casesi is not a valid option for ARCv2 family.  */
> -  if (TARGET_V2
> -  && TARGET_COMPACT_CASESI)
> +  if (TARGET_V2)
>  {
> -  warning (0, "compact-casesi is not applicable to ARCv2");
> -  TARGET_COMPACT_CASESI = 0;
> +  if (TARGET_COMPACT_CASESI)
> + {
> +   warning (0, "compact-casesi is not applicable to ARCv2");
> +   TARGET_COMPACT_CASESI = 0;
> + }
>  }
>else if (optimize_size == 1
>  && !global_options_set.x_TARGET_COMPACT_CASESI)
> -- 
> 1.9.1
>

Re: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

2016-11-29 Thread James Greenhalgh

On Tue, Nov 29, 2016 at 01:48:22PM +, Kyrill Tkachov wrote:
> 
> On 29/11/16 09:50, Tamar Christina wrote:
> >Hi All,
> >
> >The new patch contains the proper types for the intrinsics that should be returning uint64x1
> >and has the rest of the comments by Christophe in them.
> 
> Ok with an appropriate ChangeLog entry.

Also OK from an AArch64 perspective based on the detailed review from
Christophe.

Thanks,
James

[PATCH] Avoid compile-time overhead of GIMPLE FE in CFG construction


$subject

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-11-29  Richard Biener  

* tree-cfg.c (lower_phi_internal_fn): Do not look for further
PHIs after a regular stmt.
(stmt_starts_bb_p): PHIs not preceeded by a PHI or a label
start a new BB.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 242953)
+++ gcc/tree-cfg.c  (working copy)
@@ -361,14 +361,11 @@ lower_phi_internal_fn ()
   /* After edge creation, handle __PHI function from GIMPLE FE.  */
   FOR_EACH_BB_FN (bb, cfun)
 {
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
{
  stmt = gsi_stmt (gsi);
  if (! gimple_call_internal_p (stmt, IFN_PHI))
-   {
- gsi_next (&gsi);
- continue;
-   }
+   break;
 
  lhs = gimple_call_lhs (stmt);
  phi_node = create_phi_node (lhs, bb);
@@ -2604,11 +2601,21 @@ stmt_starts_bb_p (gimple *stmt, gimple *
   else
return true;
 }
-  else if (gimple_code (stmt) == GIMPLE_CALL
-  && gimple_call_flags (stmt) & ECF_RETURNS_TWICE)
-/* setjmp acts similar to a nonlocal GOTO target and thus should
-   start a new block.  */
-return true;
+  else if (gimple_code (stmt) == GIMPLE_CALL)
+{
+  if (gimple_call_flags (stmt) & ECF_RETURNS_TWICE)
+   /* setjmp acts similar to a nonlocal GOTO target and thus should
+  start a new block.  */
+   return true;
+  if (gimple_call_internal_p (stmt, IFN_PHI)
+ && prev_stmt
+ && gimple_code (prev_stmt) != GIMPLE_LABEL
+ && (gimple_code (prev_stmt) != GIMPLE_CALL
+ || ! gimple_call_internal_p (prev_stmt, IFN_PHI)))
+   /* PHI nodes start a new block unless preceeded by a label
+  or another PHI.  */
+   return true;
+}
 
   return false;
 }
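The new rule in stmt_starts_bb_p, together with the early `break` in lower_phi_internal_fn, can be modelled on a flat array of statement kinds. This is a simplified sketch: the enum and function names are hypothetical stand-ins for GIMPLE codes and statement iterators, and only the IFN_PHI part of stmt_starts_bb_p is modelled (labels, setjmp-like calls and the other rules are left out).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified statement kinds standing in for GIMPLE codes.  */
enum stmt_kind { STMT_LABEL, STMT_PHI_CALL, STMT_OTHER };

/* Models the IFN_PHI clause of stmt_starts_bb_p: a __PHI call starts a
   new block only if there is a previous statement and it is neither a
   label nor another __PHI call.  */
static bool
phi_call_starts_bb (const enum stmt_kind *seq, size_t i)
{
  if (seq[i] != STMT_PHI_CALL || i == 0)
    return false;
  return seq[i - 1] != STMT_LABEL && seq[i - 1] != STMT_PHI_CALL;
}

/* Models the lower_phi_internal_fn scan: skip leading labels (as
   gsi_after_labels does), then count __PHI calls and stop at the first
   other statement instead of walking the rest of the block.  */
static size_t
count_leading_phi_calls (const enum stmt_kind *bb, size_t n)
{
  size_t i = 0, phis = 0;
  while (i < n && bb[i] == STMT_LABEL)
    i++;
  while (i < n && bb[i] == STMT_PHI_CALL)
    {
      phis++;
      i++;
    }
  return phis;
}
```

Because a __PHI that follows a regular statement now opens a new block, every block's __PHI calls sit at its head, which is what allows the scan to stop early.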

Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-11-29 Thread Andrew Burgess

* Jeff Law  [2016-11-28 15:08:46 -0700]:

> On 11/24/2016 02:40 PM, Andrew Burgess wrote:
> > * Christophe Lyon  [2016-11-21 13:47:09 +0100]:
> > 
> > > On 20 November 2016 at 18:27, Mike Stump  wrote:
> > > > On Nov 19, 2016, at 1:59 PM, Andrew Burgess  wrote:
> > > > > > So, your new test fails on arm* targets:
> > > > > 
> > > > > After a little digging I think the problem might be that
> > > > > -freorder-blocks-and-partition is not supported on arm.
> > > > > 
> > > > > This should be detected as the new tests include:
> > > > > 
> > > > >/* { dg-require-effective-target freorder } */
> > > > > 
> > > > > however this test passed on arm as -freorder-blocks-and-partition does
> > > > > not issue any warning unless -fprofile-use is also passed.
> > > > > 
> > > > > The patch below extends check_effective_target_freorder to check using
> > > > > -fprofile-use.  With this change in place the tests are skipped on
> > > > > arm.
> > > > 
> > > > > All feedback welcome,
> > > > 
> > > > Seems reasonable, unless a -freorder-blocks-and-partition/-fprofile-use person thinks this is the wrong solution.
> > > > 
> > > 
> > > Hi,
> > > 
> > > As promised, I tested this patch: it makes
> > > gcc.dg/tree-prof/section-attr-[123].c
> > > unsupported on arm*, and thus they are not failing anymore :-)
> > > 
> > > However, it also makes other tests unsupported, while they used to pass:
> > > 
> > >   gcc.dg/pr33648.c
> > >   gcc.dg/pr46685.c
> > >   gcc.dg/tree-prof/20041218-1.c
> > >   gcc.dg/tree-prof/bb-reorg.c
> > >   gcc.dg/tree-prof/cold_partition_label.c
> > >   gcc.dg/tree-prof/comp-goto-1.c
> > >   gcc.dg/tree-prof/pr34999.c
> > >   gcc.dg/tree-prof/pr45354.c
> > >   gcc.dg/tree-prof/pr50907.c
> > >   gcc.dg/tree-prof/pr52027.c
> > >   gcc.dg/tree-prof/va-arg-pack-1.c
> > > 
> > > and failures are now unsupported:
> > >   gcc.dg/tree-prof/cold_partition_label.c
> > >   gcc.dg/tree-prof/section-attr-1.c
> > >   gcc.dg/tree-prof/section-attr-2.c
> > >   gcc.dg/tree-prof/section-attr-3.c
> > > 
> > > So, maybe this patch is too strong?
> > 
> > In all of the cases that used to pass the tests are compile only tests
> > (except for cold_partition_label, which I discuss below).
> > 
> > On ARM passing -fprofile-use and -freorder-blocks-and-partition
> > results in a warning, and the -freorder-blocks-and-partition flag is
> > ignored.  However, disabling -freorder-blocks-and-partition doesn't
> > stop any of the tests compiling, hence the passes.
> > 
> > All the tests include:
> > 
> >   /* { dg-require-effective-target freorder } */
> > 
> > which I understand to mean that the tests require the 'freorder' feature
> > to be supported (which corresponds to -freorder-blocks-and-partition).
> > 
> > For cold_partition_label and my new tests it seems clear that the
> > lack of support for -freorder-blocks-and-partition on ARM is the cause
> > of the test failures.
> > 
> > So, is it reasonable to give up the other tests as "unsupported"?  I'd
> > be inclined to say yes, but I'm happy to rework the patch if anyone has
> > a suggestion for an alternative approach.
> It is reasonable.  It's not uncommon to have to drop various tests to
> UNSUPPORTED, particularly things which depend on assembler/linker
> capabilities, the target runtime system, etc.

OK, I'm going to take that as approval for my patch[1].  I'll wait a
couple of days to give people a chance to correct me, then I'll push
the change.  This should resolve the test regressions I introduced for
ARM.

Thanks,
Andrew

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02050.html

[PATCH] Fix PR78588 - rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type

Building gcc with -fsanitize=undefined shows:
 rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too
 large for 64-bit type 'long unsigned int'

5210   return nonzero & (HOST_WIDE_INT_1U << (bitwidth - 1))
5211  ? 1 : bitwidth - floor_log2 (nonzero) - 1;

Here (bitwidth - 1) wraps around because bitwidth is zero and unsigned. 

Fix by returning earlier if bitwidth is zero.
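The wraparound can be reproduced outside GCC. The sketch below is a hypothetical standalone model of the tail of num_sign_bit_copies1, assuming a 64-bit HOST_WIDE_INT; floor_log2_sketch and sign_bit_copies_sketch are invented names, not GCC functions. With an unsigned bitwidth of 0, bitwidth - 1 becomes UINT_MAX (4294967295), which is exactly the shift exponent UBSan reports; the early return keeps the shift in range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for GCC's floor_log2: index of the highest set
   bit, or -1 when V is zero.  */
static int
floor_log2_sketch (uint64_t v)
{
  int n = -1;
  while (v != 0)
    {
      v >>= 1;
      n++;
    }
  return n;
}

/* Models the end of num_sign_bit_copies1 for a value whose possibly
   nonzero bits are NONZERO (assumed already masked to BITWIDTH bits),
   including the early return for BITWIDTH == 0 added by the patch.  */
static unsigned
sign_bit_copies_sketch (uint64_t nonzero, unsigned bitwidth)
{
  /* Without the bitwidth == 0 test, bitwidth - 1 below would wrap to
     4294967295 and the shift would be undefined behavior.  */
  if (bitwidth == 0 || bitwidth > 64)
    return 1;
  assert (bitwidth == 64 || (nonzero >> bitwidth) == 0);

  /* The sign bit may be set: only the sign bit itself is known.  */
  if (nonzero & ((uint64_t) 1 << (bitwidth - 1)))
    return 1;

  /* Every bit above the highest possibly-nonzero bit copies the (zero)
     sign bit; floor_log2_sketch (0) == -1 yields BITWIDTH copies.  */
  return bitwidth - (unsigned) (floor_log2_sketch (nonzero) + 1);
}
```

For example, an SImode value whose nonzero mask is 0xff has 24 known sign-bit copies, while a zero bitwidth now returns 1 instead of shifting by UINT_MAX.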

Tested on ppc64le.
OK for trunk?

Thanks.

  * rtlanal.c (num_sign_bit_copies1): Check for zero bitwidth.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 4e4eb2ef3458..918088a0db8e 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5203,7 +5203,7 @@ num_sign_bit_copies1 (const_rtx x, machine_mode mode, const_rtx known_x,
  safely compute the mask for this mode, always return BITWIDTH.  */

   bitwidth = GET_MODE_PRECISION (mode);
-  if (bitwidth > HOST_BITS_PER_WIDE_INT)
+  if (bitwidth == 0 || bitwidth > HOST_BITS_PER_WIDE_INT)
 return 1;

   nonzero = nonzero_bits (x, mode);

--
Markus

Re: [PATCH] Fix PR78588 - rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type

On Tue, Nov 29, 2016 at 03:08:15PM +0100, Markus Trippelsdorf wrote:
> Building gcc with -fsanitize=undefined shows:
>  rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too
>  large for 64-bit type 'long unsigned int'
> 
> 5210   return nonzero & (HOST_WIDE_INT_1U << (bitwidth - 1))
> 5211  ? 1 : bitwidth - floor_log2 (nonzero) - 1;
> 
> Here (bitwidth - 1) wraps around because bitwidth is zero and unsigned. 

Which modes have precision of 0?  I'd expect just VOIDmode and BLKmode, any
others?  And for those I'd say it is a bug to call num_sign_bit_copies*.

> Tested on ppc64le.
> OK for trunk?
> 
> Thanks.
> 
>   * rtlanal.c (num_sign_bit_copies1): Check for zero bitwidth.
> 
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 4e4eb2ef3458..918088a0db8e 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -5203,7 +5203,7 @@ num_sign_bit_copies1 (const_rtx x, machine_mode mode, const_rtx known_x,
>   safely compute the mask for this mode, always return BITWIDTH.  */
> 
>bitwidth = GET_MODE_PRECISION (mode);
> -  if (bitwidth > HOST_BITS_PER_WIDE_INT)
> +  if (bitwidth == 0 || bitwidth > HOST_BITS_PER_WIDE_INT)
>  return 1;
> 
>nonzero = nonzero_bits (x, mode);
> 
> --
> Markus

Jakub

Re: [Patch, Fortran, OOP] PR 58175: Incorrect warning message on scalar finalizer

2016-11-29 Thread Janus Weil

Committed as r242960.



2016-11-28 14:36 GMT+01:00 Janus Weil :
> Hi all,
>
> the attached patch was posted on bugzilla by Tobias three years ago,
> but left unattended since then. It is simple, works well (fixing a
> bogus warning) and regtests cleanly on x86_64-linux-gnu.
>
> If no one objects, I will commit this to trunk by tomorrow.
>
> Cheers,
> Janus
>
>
>
> 2016-11-28  Tobias Burnus  
>
> PR fortran/58175
> * resolve.c (gfc_resolve_finalizers): Properly detect scalar finalizers.
>
> 2016-11-28  Janus Weil  
>
> PR fortran/58175
> * gfortran.dg/finalize_30.f90: New test case.

Re: [PATCH] Fix PR78588 - rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type

On 2016.11.29 at 15:14 +0100, Jakub Jelinek wrote:
> On Tue, Nov 29, 2016 at 03:08:15PM +0100, Markus Trippelsdorf wrote:
> > Building gcc with -fsanitize=undefined shows:
> >  rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too
> >  large for 64-bit type 'long unsigned int'
> >
> > 5210   return nonzero & (HOST_WIDE_INT_1U << (bitwidth - 1))
> > 5211  ? 1 : bitwidth - floor_log2 (nonzero) - 1;
> >
> > Here (bitwidth - 1) wraps around because bitwidth is zero and unsigned.
>
> Which modes have precision of 0?  I'd expect just VOIDmode and BLKmode, any
> others?  And for those I'd say it is a bug to call num_sign_bit_copies*.

Yes, only VOIDmode and BLKmode:

 233 const unsigned short mode_precision[NUM_MACHINE_MODES] =
 234 {
 235   0,   /* VOID */
 236   0,   /* BLK */


--
Markus

RE: [PATCH] [ARC] Fix compact casesi option.

> Approved.
> 

Committed, thank you for your review,
Claudiu

Calling 'abort' on bounds violations in libmpx

2016-11-29 Thread Alexander Ivchenko

Hi,

The attached patch addresses PR67520. Would this approach work for the
problem? Should I also change the version of the library?

2016-11-29  Alexander Ivchenko  

* mpxrt/mpxrt-utils.c (set_mpx_rt_stop_handler): New function.
(print_help): Add help for CHKP_RT_STOP_HANDLER environment
variable.
(__mpxrt_init_env_vars): Add initialization of stop_handler.
(__mpxrt_stop_handler): New function.
(__mpxrt_stop): Ditto.
* mpxrt/mpxrt-utils.h (mpx_rt_stop_mode_handler_t): New enum.
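The environment-variable handling the patch adds can be exercised on its own. The sketch below is a hypothetical standalone copy of the selection logic in set_mpx_rt_stop_handler (the name parse_stop_handler is invented here, and it prints to stderr instead of calling __mpxrt_print); unset or unknown values fall back to "abort". Note that it passes a string, not the enum value, to the %s conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the enum the patch adds to mpxrt-utils.h.  */
typedef enum {
  MPX_RT_STOP_HANDLER_ABORT,
  MPX_RT_STOP_HANDLER_EXIT
} mpx_rt_stop_mode_handler_t;

/* Hypothetical helper: map the value of CHKP_RT_STOP_HANDLER (NULL when
   unset) to a stop handler, warning about unrecognized values.  */
static mpx_rt_stop_mode_handler_t
parse_stop_handler (const char *env)
{
  if (env == NULL || strcmp (env, "abort") == 0)
    return MPX_RT_STOP_HANDLER_ABORT;
  if (strcmp (env, "exit") == 0)
    return MPX_RT_STOP_HANDLER_EXIT;
  fprintf (stderr, "Illegal value '%s' for CHKP_RT_STOP_HANDLER. Legal "
           "values are [abort | exit]\nUsing default value abort\n", env);
  return MPX_RT_STOP_HANDLER_ABORT;
}
```

A caller would pass getenv ("CHKP_RT_STOP_HANDLER") to it; the patch itself reads the variable with secure_getenv.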



diff --git a/libmpx/mpxrt/mpxrt-utils.c b/libmpx/mpxrt/mpxrt-utils.c
index 057a355..63ee7c6 100644
--- a/libmpx/mpxrt/mpxrt-utils.c
+++ b/libmpx/mpxrt/mpxrt-utils.c
@@ -60,6 +60,9 @@
 #define MPX_RT_MODE "CHKP_RT_MODE"
 #define MPX_RT_MODE_DEFAULT MPX_RT_COUNT
 #define MPX_RT_MODE_DEFAULT_STR "count"
+#define MPX_RT_STOP_HANDLER "CHKP_RT_STOP_HANDLER"
+#define MPX_RT_STOP_HANDLER_DEFAULT MPX_RT_STOP_HANDLER_ABORT
+#define MPX_RT_STOP_HANDLER_DEFAULT_STR "abort"
 #define MPX_RT_HELP "CHKP_RT_HELP"
 #define MPX_RT_ADDPID "CHKP_RT_ADDPID"
 #define MPX_RT_BNDPRESERVE "CHKP_RT_BNDPRESERVE"
@@ -84,6 +87,7 @@ typedef struct {
 static int summary;
 static int add_pid;
 static mpx_rt_mode_t mode;
+static mpx_rt_stop_mode_handler_t stop_handler;
 static env_var_list_t env_var_list;
 static verbose_type verbose_val;
 static FILE *out;
@@ -226,6 +230,23 @@ set_mpx_rt_mode (const char *env)
   }
 }

+static mpx_rt_stop_mode_handler_t
+set_mpx_rt_stop_handler (const char *env)
+{
+  if (env == 0)
+return MPX_RT_STOP_HANDLER_DEFAULT;
+  else if (strcmp (env, "abort") == 0)
+return MPX_RT_STOP_HANDLER_ABORT;
+  else if (strcmp (env, "exit") == 0)
+return MPX_RT_STOP_HANDLER_EXIT;
+  {
+    __mpxrt_print (VERB_ERROR, "Illegal value '%s' for %s. Legal values are"
+                   " [abort | exit]\nUsing default value %s\n",
+                   env, MPX_RT_STOP_HANDLER, MPX_RT_STOP_HANDLER_DEFAULT_STR);
+    return MPX_RT_STOP_HANDLER_DEFAULT;
+  }
+}
+
 static void
 print_help (void)
 {
@@ -244,6 +265,11 @@ print_help (void)
   fprintf (out, "%s \t\t set MPX runtime behavior on #BR exception."
" [stop | count]\n"
"\t\t\t [default: %s]\n", MPX_RT_MODE, MPX_RT_MODE_DEFAULT_STR);
+  fprintf (out, "%s \t set the handler function MPX runtime will call\n"
+   "\t\t\t on #BR exception when %s is set to \'stop\'."
+   " [abort | exit]\n"
+   "\t\t\t [default: %s]\n", MPX_RT_STOP_HANDLER, MPX_RT_MODE,
+   MPX_RT_STOP_HANDLER_DEFAULT_STR);
   fprintf (out, "%s \t\t generate out,err file for each process.\n"
"\t\t\t generated file will be MPX_RT_{OUT,ERR}_FILE.pid\n"
"\t\t\t [default: no]\n", MPX_RT_ADDPID);
@@ -357,6 +383,10 @@ __mpxrt_init_env_vars (int* bndpreserve)
   env_var_list_add (MPX_RT_MODE, env);
   mode = set_mpx_rt_mode (env);

+  env = secure_getenv (MPX_RT_STOP_HANDLER);
+  env_var_list_add (MPX_RT_STOP_HANDLER, env);
+  stop_handler = set_mpx_rt_stop_handler (env);
+
   env = secure_getenv (MPX_RT_BNDPRESERVE);
   env_var_list_add (MPX_RT_BNDPRESERVE, env);
   validate_bndpreserve (env, bndpreserve);
@@ -487,6 +517,22 @@ __mpxrt_mode (void)
   return mode;
 }

+mpx_rt_stop_mode_handler_t
+__mpxrt_stop_handler (void)
+{
+  return stop_handler;
+}
+
+void __attribute__ ((noreturn))
+__mpxrt_stop (void)
+{
+  if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_ABORT)
+abort ();
+  else if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_EXIT)
+exit (255);
+  __builtin_unreachable ();
+}
+
 void
 __mpxrt_print_summary (uint64_t num_brs, uint64_t l1_size)
 {
diff --git a/libmpx/mpxrt/mpxrt-utils.h b/libmpx/mpxrt/mpxrt-utils.h
index d62937d..6da12cc 100644
--- a/libmpx/mpxrt/mpxrt-utils.h
+++ b/libmpx/mpxrt/mpxrt-utils.h
@@ -54,6 +54,11 @@ typedef enum {
   MPX_RT_STOP
 } mpx_rt_mode_t;

+typedef enum {
+  MPX_RT_STOP_HANDLER_ABORT,
+  MPX_RT_STOP_HANDLER_EXIT
+} mpx_rt_stop_mode_handler_t;
+
 void __mpxrt_init_env_vars (int* bndpreserve);
 void __mpxrt_write_uint (verbose_type vt, uint64_t val, unsigned base);
 void __mpxrt_write (verbose_type vt, const char* str);
diff --git a/libmpx/mpxrt/mpxrt.c b/libmpx/mpxrt/mpxrt.c
index b52906b..0bc069c 100644
--- a/libmpx/mpxrt/mpxrt.c
+++ b/libmpx/mpxrt/mpxrt.c
@@ -252,7 +251,7 @@ handler (int sig __attribute__ ((unused)),
   uctxt->uc_mcontext.gregs[REG_IP_IDX] =
 (greg_t)get_next_inst_ip ((uint8_t *)ip);
   if (__mpxrt_mode () == MPX_RT_STOP)
-exit (255);
+__mpxrt_stop ();
   return;

  default:
@@ -269,7 +268,7 @@ handler (int sig __attribute__ ((unused)),
   __mpxrt_write (VERB_ERROR, ", ip = 0x");
   __mpxrt_write_uint (VERB_ERROR, ip, 16);
   __mpxrt_write (VERB_ERROR, "\n");
-  exit (255);
+  __mpxrt_stop ();
 }
   else
 {
@@ -278,7 +277,7 @@ handler (int sig __attribute__ ((unused)),
   __mpxrt_write (VERB_ERROR, "! at 0x");
   __mpxrt_write_uint (VERB_ERROR, ip, 16);
   __mpxrt_write (VERB_ERROR, "\n");
-  exit (255);
+  __mpxrt_stop ();
 }
 }

thanks,
Alexander

Re: [PATCH] Fix PR78588 - rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type

On 2016.11.29 at 15:21 +0100, Markus Trippelsdorf wrote:
> On 2016.11.29 at 15:14 +0100, Jakub Jelinek wrote:
> > On Tue, Nov 29, 2016 at 03:08:15PM +0100, Markus Trippelsdorf wrote:
> > > Building gcc with -fsanitize=undefined shows:
> > >  rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too
> > >  large for 64-bit type 'long unsigned int'
> > >
> > > 5210   return nonzero & (HOST_WIDE_INT_1U << (bitwidth - 1))
> > > 5211  ? 1 : bitwidth - floor_log2 (nonzero) - 1;
> > >
> > > Here (bitwidth - 1) wraps around because bitwidth is zero and unsigned.
> >
> > Which modes have precision of 0?  I'd expect just VOIDmode and BLKmode, any
> > others?  And for those I'd say it is a bug to call num_sign_bit_copies*.
> 
> Yes, only VOIDmode and BLKmode:
> 
>  233 const unsigned short mode_precision[NUM_MACHINE_MODES] =
>  234 {
>  235   0,   /* VOID */
>  236   0,   /* BLK */

markus@x4 libsupc++ % cat cp-demangle.i
d_demangle_callback_mangled() {
  if (strncmp(d_demangle_callback_mangled, "", 1))
d_type();
}

markus@x4 libsupc++ % UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 /var/tmp/gcc_build_dir_/./gcc/cc1 -w -fpreprocessed cp-demangle.i -quiet -dumpbase cp-demangle.i -mtune=generic -march=x86-64 -auxbase cp-demangle -O2 -version -o /dev/null
GNU C11 (GCC) version 7.0.0 20161129 (experimental) (x86_64-pc-linux-gnu)
compiled by GNU C version 7.0.0 20161129 (experimental), GMP version 6.1.1, MPFR version 3.1.5, MPC version 1.0.3, isl version none
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C11 (GCC) version 7.0.0 20161129 (experimental) (x86_64-pc-linux-gnu)
compiled by GNU C version 7.0.0 20161129 (experimental), GMP version 6.1.1, MPFR version 3.1.5, MPC version 1.0.3, isl version none
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 7cca725773f8a0693a2905f8af7b733c
../../gcc/gcc/rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type 'long unsigned int'
#0 0x1b40fe1 in num_sign_bit_copies1 ../../gcc/gcc/rtlanal.c:5210
#1 0x35ef5f1 in if_then_else_cond ../../gcc/gcc/combine.c:9180
#2 0x35ef199 in if_then_else_cond ../../gcc/gcc/combine.c:9034
#3 0x35ef199 in if_then_else_cond ../../gcc/gcc/combine.c:9034
#4 0x3625f98 in combine_simplify_rtx ../../gcc/gcc/combine.c:5604
#5 0x3632525 in subst ../../gcc/gcc/combine.c:5487
#6 0x36327d6 in subst ../../gcc/gcc/combine.c:5425
#7 0x3632bd7 in subst ../../gcc/gcc/combine.c:5354
#8 0x3641a74 in try_combine ../../gcc/gcc/combine.c:3347
#9 0x365727b in combine_instructions ../../gcc/gcc/combine.c:1421
#10 0x365727b in rest_of_handle_combine ../../gcc/gcc/combine.c:14581
#11 0x365727b in execute ../../gcc/gcc/combine.c:14626
#12 0x195ad18 in execute_one_pass(opt_pass*) ../../gcc/gcc/passes.c:2370
#13 0x195cbab in execute_pass_list_1 ../../gcc/gcc/passes.c:2459
#14 0x195cbd4 in execute_pass_list_1 ../../gcc/gcc/passes.c:2460
#15 0x195cc64 in execute_pass_list(function*, opt_pass*) ../../gcc/gcc/passes.c:2470
#16 0xc75deb in cgraph_node::expand() ../../gcc/gcc/cgraphunit.c:2001
#17 0xc7b2fa in expand_all_functions ../../gcc/gcc/cgraphunit.c:2137
#18 0xc7b2fa in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2494
#19 0xc854b7 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2587
#20 0xc854b7 in symbol_table::finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2584
#21 0x1d3ea10 in compile_file ../../gcc/gcc/toplev.c:488
#22 0x629a14 in do_compile ../../gcc/gcc/toplev.c:1983
#23 0x629a14 in toplev::main(int, char**) ../../gcc/gcc/toplev.c:2117
#24 0x62c046 in main ../../gcc/gcc/main.c:39
#25 0x7f4b6600f310 in __libc_start_main ../csu/libc-start.c:286
#26 0x62c469 in _start (/var/tmp/gcc_build_dir_/gcc/cc1+0x62c469)

-- 
Markus

Re: [PATCH 4/4] S/390: Disable peeling for alignment.

On Tue, Nov 29, 2016 at 11:38:15AM +0100, Richard Biener wrote:
> So - please instead of setting this param provide
> TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.

Right, that's way better.

gcc/ChangeLog:

2016-11-29  Andreas Krebbel  

* gcc/config/s390/s390.c (s390_builtin_vectorization_cost): New
function.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Define target
macro.

gcc/testsuite/ChangeLog:

2016-11-29  Andreas Krebbel  

* gcc.target/s390/vector/vec-nopeel-1.c: New test.
---
 gcc/config/s390/s390.c | 38 ++
 .../gcc.target/s390/vector/vec-nopeel-1.c  | 17 ++
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index dab4f43..82aca3f 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3674,6 +3674,41 @@ s390_address_cost (rtx addr, machine_mode mode ATTRIBUTE_UNUSED,
   return ad.indx? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (1);
 }
 
+/* Implement targetm.vectorize.builtin_vectorization_cost.  */
+static int
+s390_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
+tree vectype,
+int misalign ATTRIBUTE_UNUSED)
+{
+  switch (type_of_cost)
+{
+  case scalar_stmt:
+  case scalar_load:
+  case scalar_store:
+  case vector_stmt:
+  case vector_load:
+  case vector_store:
+  case vec_to_scalar:
+  case scalar_to_vec:
+  case cond_branch_not_taken:
+  case vec_perm:
+  case vec_promote_demote:
+   return 1;
+  case unaligned_load:
+  case unaligned_store:
+   return 2;
+
+  case cond_branch_taken:
+   return 3;
+
+  case vec_construct:
+   return TYPE_VECTOR_SUBPARTS (vectype) - 1;
+
+  default:
+   gcc_unreachable ();
+}
+}
+
 /* If OP is a SYMBOL_REF of a thread-local symbol, return its TLS mode,
otherwise return 0.  */
 
@@ -15428,6 +15463,9 @@ s390_excess_precision (enum excess_precision_type type)
 #define TARGET_REGISTER_MOVE_COST s390_register_move_cost
 #undef TARGET_MEMORY_MOVE_COST
 #define TARGET_MEMORY_MOVE_COST s390_memory_move_cost
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
+  s390_builtin_vectorization_cost
 
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG s390_reorg
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c
new file mode 100644
index 000..5f370a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13" } */
+/* { dg-require-effective-target vector } */
+
+int
+foo (int * restrict a, int n)
+{
+  int i, result = 0;
+
+  for (i = 0; i < n * 4; i++)
+result += a[i];
+  return result;
+}
+
+/* We do NOT want this loop to get peeled to reach better alignment.
+   Without peeling no scalar memory add should appear.  */
+/* { dg-final { scan-assembler-not "\ta\t" } } */
-- 
2.9.1

Re: [PATCH] Fix PR78588 - rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type

On 2016.11.29 at 16:01 +0100, Markus Trippelsdorf wrote:
> On 2016.11.29 at 15:21 +0100, Markus Trippelsdorf wrote:
> > On 2016.11.29 at 15:14 +0100, Jakub Jelinek wrote:
> > > On Tue, Nov 29, 2016 at 03:08:15PM +0100, Markus Trippelsdorf wrote:
> > > > Building gcc with -fsanitize=undefined shows:
> > > >  rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too
> > > >  large for 64-bit type 'long unsigned int'
> > > >
> > > > 5210   return nonzero & (HOST_WIDE_INT_1U << (bitwidth - 1))
> > > > 5211  ? 1 : bitwidth - floor_log2 (nonzero) - 1;
> > > >
> > > > Here (bitwidth - 1) wraps around because bitwidth is zero and unsigned.
> > >
> > > Which modes have precision of 0?  I'd expect just VOIDmode and BLKmode, any
> > > others?  And for those I'd say it is a bug to call num_sign_bit_copies*.
> > 
> > Yes, only VOIDmode and BLKmode:
> > 
> >  233 const unsigned short mode_precision[NUM_MACHINE_MODES] =
> >  234 {
> >  235   0,   /* VOID */
> >  236   0,   /* BLK */
> 
> markus@x4 libsupc++ % cat cp-demangle.i
> d_demangle_callback_mangled() {
>   if (strncmp(d_demangle_callback_mangled, "", 1))
> d_type();
> }
> 
> markus@x4 libsupc++ % UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 /var/tmp/gcc_build_dir_/./gcc/cc1 -w -fpreprocessed cp-demangle.i -quiet -dumpbase cp-demangle.i -mtune=generic -march=x86-64 -auxbase cp-demangle -O2 -version -o /dev/null
> GNU C11 (GCC) version 7.0.0 20161129 (experimental) (x86_64-pc-linux-gnu)
> compiled by GNU C version 7.0.0 20161129 (experimental), GMP version 6.1.1, MPFR version 3.1.5, MPC version 1.0.3, isl version none
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C11 (GCC) version 7.0.0 20161129 (experimental) (x86_64-pc-linux-gnu)
> compiled by GNU C version 7.0.0 20161129 (experimental), GMP version 6.1.1, MPFR version 3.1.5, MPC version 1.0.3, isl version none
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: 7cca725773f8a0693a2905f8af7b733c
> ../../gcc/gcc/rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type 'long unsigned int'
> #0 0x1b40fe1 in num_sign_bit_copies1 ../../gcc/gcc/rtlanal.c:5210
> #1 0x35ef5f1 in if_then_else_cond ../../gcc/gcc/combine.c:9180
> #2 0x35ef199 in if_then_else_cond ../../gcc/gcc/combine.c:9034
> #3 0x35ef199 in if_then_else_cond ../../gcc/gcc/combine.c:9034
> #4 0x3625f98 in combine_simplify_rtx ../../gcc/gcc/combine.c:5604
> #5 0x3632525 in subst ../../gcc/gcc/combine.c:5487
> #6 0x36327d6 in subst ../../gcc/gcc/combine.c:5425
> #7 0x3632bd7 in subst ../../gcc/gcc/combine.c:5354
> #8 0x3641a74 in try_combine ../../gcc/gcc/combine.c:3347
> #9 0x365727b in combine_instructions ../../gcc/gcc/combine.c:1421
> #10 0x365727b in rest_of_handle_combine ../../gcc/gcc/combine.c:14581
> #11 0x365727b in execute ../../gcc/gcc/combine.c:14626
> #12 0x195ad18 in execute_one_pass(opt_pass*) ../../gcc/gcc/passes.c:2370
> #13 0x195cbab in execute_pass_list_1 ../../gcc/gcc/passes.c:2459
> #14 0x195cbd4 in execute_pass_list_1 ../../gcc/gcc/passes.c:2460
> #15 0x195cc64 in execute_pass_list(function*, opt_pass*) ../../gcc/gcc/passes.c:2470
> #16 0xc75deb in cgraph_node::expand() ../../gcc/gcc/cgraphunit.c:2001
> #17 0xc7b2fa in expand_all_functions ../../gcc/gcc/cgraphunit.c:2137
> #18 0xc7b2fa in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2494
> #19 0xc854b7 in symbol_table::compile() ../../gcc/gcc/cgraphunit.c:2587
> #20 0xc854b7 in symbol_table::finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2584
> #21 0x1d3ea10 in compile_file ../../gcc/gcc/toplev.c:488
> #22 0x629a14 in do_compile ../../gcc/gcc/toplev.c:1983
> #23 0x629a14 in toplev::main(int, char**) ../../gcc/gcc/toplev.c:2117
> #24 0x62c046 in main ../../gcc/gcc/main.c:39
> #25 0x7f4b6600f310 in __libc_start_main ../csu/libc-start.c:286
> #26 0x62c469 in _start (/var/tmp/gcc_build_dir_/gcc/cc1+0x62c469)

(gdb) p mode
$1 = BLKmode

#6  0x035ef5f2 in if_then_else_cond (x=0x760d3888,
    ptrue=ptrue@entry=0x7fffd940, pfalse=pfalse@entry=0x7fffd950) at ../../gcc/gcc/combine.c:9180
9180   && num_sign_bit_copies (x, mode) == GET_MODE_PRECISION (mode)))
(gdb) l
9175
9176  /* If X is known to be either 0 or -1, those are the true and
9177 false values when testing X.  */
9178  else if (x == constm1_rtx || x == const0_rtx
9179   || (mode != VOIDmode
9180   && num_sign_bit_copies (x, mode) == GET_MODE_PRECISION (mode)))
9181{
9182  *ptrue = constm1_rtx, *pfalse = const0_rtx;
9183  return x;
9184}


-- 
Markus

[libiberty] demangler formatting

2016-11-29 Thread Nathan Sidwell

In working on pr78252 I noticed a source formatting nit.  Fixed thusly 
and committed.


nathan
--
Nathan Sidwell
2016-11-29  Nathan Sidwell  

	* cp-demangle.c (d_print_comp_inner): Fix parameter indentation.

Index: cp-demangle.c
===
--- cp-demangle.c	(revision 242959)
+++ cp-demangle.c	(working copy)
@@ -4564,7 +4564,7 @@ d_maybe_print_fold_expression (struct d_
 
 static void
 d_print_comp_inner (struct d_print_info *dpi, int options,
-		  const struct demangle_component *dc)
+		const struct demangle_component *dc)
 {
   /* Magic variable to let reference smashing skip over the next modifier
  without needing to modify *dc.  */

Re: [PATCH 4/4] S/390: Disable peeling for alignment.

And again with the costs for unaligned loads/stores actually changed:

gcc/ChangeLog:

2016-11-29  Andreas Krebbel  

* gcc/config/s390/s390.c (s390_builtin_vectorization_cost): New
function.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Define target
macro.

gcc/testsuite/ChangeLog:

2016-11-29  Andreas Krebbel  

* gcc.target/s390/vector/vec-nopeel-1.c: New test.
---
 gcc/config/s390/s390.c | 37 ++
 .../gcc.target/s390/vector/vec-nopeel-1.c  | 17 ++
 2 files changed, 54 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index dab4f43..767666e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3674,6 +3674,40 @@ s390_address_cost (rtx addr, machine_mode mode ATTRIBUTE_UNUSED,
   return ad.indx? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (1);
 }
 
+/* Implement targetm.vectorize.builtin_vectorization_cost.  */
+static int
+s390_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
+tree vectype,
+int misalign ATTRIBUTE_UNUSED)
+{
+  switch (type_of_cost)
+{
+  case scalar_stmt:
+  case scalar_load:
+  case scalar_store:
+  case vector_stmt:
+  case vector_load:
+  case vector_store:
+  case vec_to_scalar:
+  case scalar_to_vec:
+  case cond_branch_not_taken:
+  case vec_perm:
+  case vec_promote_demote:
+  case unaligned_load:
+  case unaligned_store:
+   return 1;
+
+  case cond_branch_taken:
+   return 3;
+
+  case vec_construct:
+   return TYPE_VECTOR_SUBPARTS (vectype) - 1;
+
+  default:
+   gcc_unreachable ();
+}
+}
+
 /* If OP is a SYMBOL_REF of a thread-local symbol, return its TLS mode,
otherwise return 0.  */
 
@@ -15428,6 +15462,9 @@ s390_excess_precision (enum excess_precision_type type)
 #define TARGET_REGISTER_MOVE_COST s390_register_move_cost
 #undef TARGET_MEMORY_MOVE_COST
 #define TARGET_MEMORY_MOVE_COST s390_memory_move_cost
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
+  s390_builtin_vectorization_cost
 
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG s390_reorg
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c
new file mode 100644
index 000..581c371
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13" } */
+/* { dg-require-effective-target vector } */
+
+int
+foo (int * restrict a, int n)
+{
+  int i, result = 0;
+
+  for (i = 0; i < n * 4; i++)
+result += a[i];
+  return result;
+}
+
+/* We do NOT want this loop to get peeled.  Without peeling no scalar
+   memory add should appear.  */
+/* { dg-final { scan-assembler-not "\ta\t" } } */
-- 
2.9.1
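Why equalizing the costs disables peeling can be illustrated with a toy cost comparison. The sketch below is a hypothetical, heavily simplified model (the function and its parameters are invented, and this is not the vectorizer's actual peeling analysis): peeling only pays off when the per-iteration penalty of unaligned accesses outweighs the scalar prologue it adds. The first version of the hook kept unaligned_load/unaligned_store at 2; the respin returns 1 for them, the same as vector_load/vector_store, so the modeled penalty is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model.  Without peeling, each of MISALIGNED_REFS
   accesses pays UNALIGNED_COST instead of ALIGNED_COST in each of
   VECTOR_ITERS vector iterations.  Peeling removes that penalty but
   adds PEEL_ITERS scalar prologue iterations of SCALAR_ITER_COST each.  */
static bool
peeling_pays_off (int aligned_cost, int unaligned_cost, int misaligned_refs,
                  int vector_iters, int peel_iters, int scalar_iter_cost)
{
  int penalty = (unaligned_cost - aligned_cost) * misaligned_refs
                * vector_iters;
  int peel_cost = peel_iters * scalar_iter_cost;
  return penalty > peel_cost;
}
```

Under this model, once unaligned and aligned accesses cost the same, no loop length makes peeling worthwhile, which is the behavior vec-nopeel-1.c checks for.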

Re: [PATCH v2] Fix PR78588 - rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type

Here is v2 of the fix.

Building gcc with -fsanitize=undefined shows:
 rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type 'long unsigned int'

This happens because if_then_else_cond() in combine.c calls
num_sign_bit_copies() in rtlanal.c with mode==BLKmode.

5205   bitwidth = GET_MODE_PRECISION (mode);
5206   if (bitwidth > HOST_BITS_PER_WIDE_INT)
5207 return 1;
5208
5209   nonzero = nonzero_bits (x, mode);
5210   return nonzero & (HOST_WIDE_INT_1U << (bitwidth - 1))
5211  ? 1 : bitwidth - floor_log2 (nonzero) - 1;

This causes (bitwidth - 1) to wrap around.

Fix by also guarding against BLKmode.
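The caller-side guard can be modeled in isolation. The sketch below is hypothetical: the toy_ mode enum, precision table and functions are invented for illustration and only mimic the relevant facts from this thread (VOIDmode and BLKmode are the only modes with precision 0, and the sign-bit analysis must never see them). It checks that the v2 condition short-circuits before the analysis is called for either zero-precision mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the machine-mode precision table.  */
enum toy_mode { TOY_VOIDmode, TOY_BLKmode, TOY_QImode, TOY_SImode, TOY_DImode };
static const unsigned toy_precision[] = { 0, 0, 8, 32, 64 };

/* Counts calls into the analysis, to verify the guard short-circuits.  */
static int callee_calls;

/* Stand-in for num_sign_bit_copies: KNOWN_COPIES is what the real
   analysis would compute.  The assert documents that a zero-precision
   mode must never reach it, since the real code computes precision - 1.  */
static unsigned
toy_sign_bit_copies (enum toy_mode mode, unsigned known_copies)
{
  callee_calls++;
  assert (toy_precision[mode] != 0);
  return known_copies;
}

/* The v2 condition in if_then_else_cond: X is known to be 0 or -1 when
   every bit is a sign-bit copy, checked only for modes with a precision.  */
static bool
toy_known_zero_or_minus_one (enum toy_mode mode, unsigned known_copies)
{
  return mode != TOY_VOIDmode && mode != TOY_BLKmode
         && toy_sign_bit_copies (mode, known_copies) == toy_precision[mode];
}
```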

Tested on ppc64le.
OK for trunk?

Thanks.

PR rtl-optimization/78588 
* combine.c (if_then_else_cond): Also guard against BLKmode.

diff --git a/gcc/combine.c b/gcc/combine.c
index 22fb7a976538..a32a0ecc72fb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -9176,7 +9176,7 @@ if_then_else_cond (rtx x, rtx *ptrue, rtx *pfalse)
   /* If X is known to be either 0 or -1, those are the true and
  false values when testing X.  */
   else if (x == constm1_rtx || x == const0_rtx
-  || (mode != VOIDmode
+  || (mode != VOIDmode && mode != BLKmode
   && num_sign_bit_copies (x, mode) == GET_MODE_PRECISION (mode)))
 {
   *ptrue = constm1_rtx, *pfalse = const0_rtx;

--
Markus

Re: [RFA] Handle target with no length attributes sanely in bb-reorder.c


On 11/29/2016 03:23 AM, Richard Biener wrote:
> On Mon, Nov 28, 2016 at 10:23 PM, Jeff Law  wrote:
>>
>> I was digging into issues around the patches for 78120 when I stumbled upon
>> undesirable bb copying in bb-reorder.c on the m68k.
>>
>> The core issue is that the m68k does not define a length attribute and
>> therefore generic code assumes that the length of all insns is 0 bytes.
>
> What other targets behave like this?

ft32, nvptx, mmix, mn10300, m68k, c6x, rl78, vax, ia64, m32c

cris has a hack to define a length, even though no attempt is made to
make it accurate.  The hack specifically calls out that it's to make
bb-reorder happy.

>> That in turn makes bb-reorder think it is infinitely cheap to copy basic
>> blocks.  In the two codebases I looked at (GCC's runtime libraries and
>> newlib) this leads to a 10% and 15% undesirable increase in code size.
>>
>> I've taken a slight variant of this patch and bootstrapped/regression tested
>> it on x86_64-linux-gnu to verify sanity as well as built the m68k target
>> libraries noted above.
>>
>> OK for the trunk?
>
> I wonder if it isn't better to default to a length of 1 instead of zero when
> there is no length attribute.  There are more users of the length attribute
> in bb-reorder.c (and elsewhere as well I suppose).

I pondered that as well, but felt it was riskier given we've had a 
default length of 0 for ports that don't define lengths since the early 
90s.  It's certainly easy enough to change that default if you'd prefer. 
 I don't have a strong preference either way.
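The effect of a zero default length on a size-based copy decision can be shown with a toy model. The sketch below is hypothetical (both functions are invented and are not bb-reorder.c's actual heuristic): a block is duplicated only if its estimated size fits a budget, and on a target without a length attribute the estimate is the insn count times the default length.

```c
#include <assert.h>
#include <stdbool.h>

/* Estimated size of a block of N_INSNS insns on a target without a
   length attribute, where every insn gets DEFAULT_LEN: 0 under the
   long-standing default, 1 under the alternative Richard suggests.  */
static int
block_size_without_length_attr (int n_insns, int default_len)
{
  return n_insns * default_len;
}

/* Hypothetical copy heuristic: duplicate a block only if its estimated
   size fits in MAX_SIZE.  */
static bool
may_duplicate_block (int block_size, int max_size)
{
  return block_size <= max_size;
}
```

With a default length of 0, a block of any size passes the check, which is the "infinitely cheap" behavior described above; with a default of 1, the estimate at least grows with the number of insns.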


Jeff

Re: [PATCH] improve folding of expressions that move a single bit around


On 11/29/2016 03:16 AM, Richard Biener wrote:
> On Mon, Nov 28, 2016 at 7:41 PM, Jeff Law  wrote:
>> On 11/28/2016 06:10 AM, Paolo Bonzini wrote:
>>> On 27/11/2016 00:28, Marc Glisse wrote:
>>>> On Sat, 26 Nov 2016, Paolo Bonzini wrote:
>>>>> --- match.pd(revision 242742)
>>>>> +++ match.pd(working copy)
>>>>> @@ -2554,6 +2554,19 @@
>>>>>   (cmp (bit_and@2 @0 integer_pow2p@1) @1)
>>>>>   (icmp @2 { build_zero_cst (TREE_TYPE (@0)); })))
>>>>>
>>>>> +/* If we have (A & C) != 0 ? D : 0 where C and D are powers of 2,
>>>>> +   convert this into a shift of (A & C).  */
>>>>> +(simplify
>>>>> + (cond
>>>>> +  (ne (bit_and@2 @0 integer_pow2p@1) integer_zerop)
>>>>> +  integer_pow2p@3 integer_zerop)
>>>>> + (with {
>>>>> +    int shift = wi::exact_log2 (@3) - wi::exact_log2 (@1);
>>>>> +  }
>>>>> +  (if (shift > 0)
>>>>> +   (lshift (convert @2) { build_int_cst (integer_type_node, shift); })
>>>>> +   (convert (rshift @2 { build_int_cst (integer_type_node, -shift); })
>>>>
>>>> What happens if @1 is the sign bit, in a signed type? Do we get an
>>>> arithmetic shift right?
>>>
>>> It shouldn't happen because the canonical form of a sign bit test is A <
>>> 0 (that's the pattern immediately after).  However I can add an "if" if
>>> preferred, or change the pattern to do the AND after the shift.
>>
>> But are we absolutely sure it'll be in canonical form every time?
>
> No, of course not (though it would be a bug).  If the pattern generates wrong
> code when the non-canonical form is met that would be bad, if it merely
> does not optimize (or optimize non-optimally) then that's not too bad.

Agreed.  I managed to convince myself that for a signed type with the 
sign bit on that we'd generate incorrect code.  But that was from a 
quick review of the pattern.
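Marc's question can be checked on plain C integers. The following hypothetical sketch (all function names are invented, and it models the fold on 32-bit values rather than on GCC trees) compares (A & C) != 0 ? D : 0 with the shifted form. On an unsigned type the two agree for every power-of-two C and D; on a signed type with C equal to the sign bit, (A & C) is negative and the right shift is arithmetic (implementation-defined in ISO C, documented as arithmetic by GCC), giving -1 where the original expression gives 1.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the single set bit of a nonzero power of two V.  */
static int
exact_log2_u32 (uint32_t v)
{
  int n = 0;
  assert (v != 0 && (v & (v - 1)) == 0);
  while (v >>= 1)
    n++;
  return n;
}

/* The original expression: (A & C) != 0 ? D : 0.  */
static uint32_t
select_form (uint32_t a, uint32_t c, uint32_t d)
{
  return (a & c) != 0 ? d : 0;
}

/* The folded form on an unsigned type: move the single bit of (A & C)
   from position log2 (C) to position log2 (D).  */
static uint32_t
shift_form (uint32_t a, uint32_t c, uint32_t d)
{
  int shift = exact_log2_u32 (d) - exact_log2_u32 (c);
  return shift > 0 ? (a & c) << shift : (a & c) >> -shift;
}

/* The same fold on int32_t with C = INT32_MIN (sign bit) and D = 1:
   the shift amount is 0 - 31, so the fold right-shifts by 31.  */
static int32_t
signed_shift_form (int32_t a)
{
  return (a & INT32_MIN) >> 31;
}
```

The signed case is the one a guard on TYPE_UNSIGNED, or doing the AND after the shift as Paolo offers, would need to avoid.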


Jeff

Re: [PATCH, vec-tails] Support loop epilogue vectorization

On 18 November 2016 at 16:54, Christophe Lyon
 wrote:
> On 18 November 2016 at 16:46, Yuri Rumyantsev  wrote:
>> It is very strange that this test failed on arm, since it requires
>> target avx2 to check vectorizer dumps:
>>
>> /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" {
>> target avx2_runtime } } } */
>> /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED
>> \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */
>>
>> Could you please clarify what the reason for the failure is?
>
> It's not the scan-dumps that fail, but the execution.
> The test calls abort() for some reason.
>
> It will take me a while to rebuild the test manually in the right
> debug environment to provide you with more traces.
>
>
Sorry for the delay... This problem is not directly related to your patch.

The tests in gcc.dg/vect are compiled with -mfpu=neon
-mfloat-abi=softfp -march=armv7-a
and thus cannot be executed on older versions of the architecture.

This is another instance of what I discussed with Jakub several months ago:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00666.html
but the thread died.

Basically, check_vect_support_and_set_flags sets
dg-do-what-default to compile, but
some tests in gcc.dg/vect have dg-do run hardcoded.

Jakub was not happy with my patch that was removing all these dg-do
run directives :-)

Christophe

>
>>
>> Thanks.
>>
>> 2016-11-18 16:20 GMT+03:00 Christophe Lyon :
>>> On 15 November 2016 at 15:41, Yuri Rumyantsev  wrote:
 Hi All,

 Here is a patch for non-masked epilogue vectorization.

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?

 Thanks.
 Changelog:

 2016-11-15  Yuri Rumyantsev  

 * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
 * tree-if-conv.c (tree_if_conversion): Make public.
 * tree-if-conv.h: New file.
 * tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Avoid
 dynamic alias checks for epilogues.
 * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
 * tree-vect-loop.c: Include tree-if-conv.h.
 (new_loop_vec_info): Add zeroing orig_loop_info field.
 (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
 (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
 if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
 using passed argument.
 (vect_transform_loop): Check if created epilogue should be returned
 for further vectorization with less vf.  If-convert epilogue if
 required. Print vectorization success for epilogue.
 * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
 if it is required, pass loop_vinfo produced during vectorization of
 loop body to vect_analyze_loop.
 * tree-vectorizer.h (struct _loop_vec_info): Add new field
 orig_loop_info.
 (LOOP_VINFO_ORIG_LOOP_INFO): New.
 (LOOP_VINFO_EPILOGUE_P): New.
 (LOOP_VINFO_ORIG_VECT_FACTOR): New.
 (vect_do_peeling): Change prototype to return epilogue.
 (vect_analyze_loop): Add argument of loop_vec_info type.
 (vect_transform_loop): Return created loop.

 gcc/testsuite/

 * lib/target-supports.exp (check_avx2_hw_available): New.
 (check_effective_target_avx2_runtime): New.
 * gcc.dg/vect/vect-tail-nomask-1.c: New test.
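For readers new to the feature, here is a hand-written C sketch of what a vectorized epilogue looks like. The main loop runs with one vectorization factor, the leftover iterations go through a second loop with a smaller factor (the "less vf" in the ChangeLog), and only what remains after that is handled by scalar code. The factors 8 and 4 are arbitrary illustration values, and the fixed-length inner loops merely stand in for vector instructions. This is not vectorizer output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vectorization factors, chosen only for illustration.  */
#define MAIN_VF 8
#define EPILOGUE_VF 4

/* a[i] += b[i] for 0 <= i < n, structured the way a loop with a
   vectorized epilogue executes.  Each fixed-length inner loop stands
   in for one vector operation.  */
void
add_arrays (float *restrict a, const float *restrict b, size_t n)
{
  size_t i = 0;

  /* Vectorized main loop.  */
  for (; i + MAIN_VF <= n; i += MAIN_VF)
    for (size_t k = 0; k < MAIN_VF; k++)
      a[i + k] += b[i + k];

  /* Vectorized epilogue with a smaller factor.  Fewer than MAIN_VF
     (= 2 * EPILOGUE_VF) elements remain, so this runs at most once.  */
  for (; i + EPILOGUE_VF <= n; i += EPILOGUE_VF)
    for (size_t k = 0; k < EPILOGUE_VF; k++)
      a[i + k] += b[i + k];

  /* Scalar remainder.  */
  for (; i < n; i++)
    a[i] += b[i];

  /* Every element has been processed exactly once.  */
  assert (i == n);
}
```

With n = 13, one main-loop iteration covers elements 0 to 7, one epilogue iteration covers 8 to 11, and element 12 is handled by the scalar loop.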

>>>
>>> Hi,
>>>
>>> This new test fails on arm-none-eabi (using default cpu/fpu/mode):
>>>   gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test
>>>   gcc.dg/vect/vect-tail-nomask-1.c execution test
>>>
>>> It does pass on the same target if configured --with-cpu=cortex-a9.
>>>
>>> Christophe
>>>
>>>
>>>

 2016-11-14 20:04 GMT+03:00 Richard Biener :
> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
>  wrote:
>>Richard,
>>
>>I checked one of the tests designed for epilogue vectorization using
>>patches 1 - 3 and found out that the built compiler performs vectorization
>>of epilogues with --param vect-epilogues-nomask=1 passed:
>>
>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
>>t1.new-nomask.s -fdump-tree-vect-details
>>$ grep VECTORIZED -c t1.c.156t.vect
>>4
>> Without param only 2 loops are vectorized.
>>
>>Should I simply add the part of the tests related to this feature, or must I
>>also delete all unnecessary changes?
>
> Please remove all not necessary changes.
>
> Richard.
>
>>Thanks.
>>Yuri.
>>
>>2016-11-14 16:40 GMT+03:00 Richard Biener :
>>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
>>>
 Richard,

 In my previous patch I forgot to remove a couple of lines related to the aux field.
 Here is the correct updated patch.
>>>
>>> Yeah, I noticed.  This patch would be ok for trunk (together with
>>> necessary parts from 1 and 2) if all not required parts are removed
>>> (and you'd

[PR middle-end/78566] Fix uninit regressions caused by previous -Wmaybe-uninit change

2016-11-29 Thread Aldy Hernandez

This fixes the gcc.dg/uninit-pred-6* failures I seem to have caused on 
some non x86 platforms. Sorry for the delay.


The problem is that my fix for PR61409 had the logic backwards.  I was 
proving that all the uses of a PHI are invalidated by any one undefined 
PHI path, whereas what we want is to prove that EVERY uninitialized path 
is invalidated by some factor in the PHI use.
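A toy model of the corrected direction of the check, following the loop structure of the patched can_chain_union_be_invalidated_p in the diff below: the uninitialized paths form an OR of AND-chains, and the guard on the PHI use is a single AND-chain. The types and helpers here are hypothetical simplifications, not GCC's pred_info/pred_chain API, and only exact opposites ([i == 0] versus [i != 0]) are recognized, as in the NOTE comment in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified predicate "VAR CODE VAL"; only == and !=
   are modeled.  */
enum cmp_code { CMP_EQ, CMP_NE };

struct pred
{
  int var;
  enum cmp_code code;
  int val;
};

/* An AND-chain of N predicates.  */
struct chain
{
  const struct pred *preds;
  size_t n;
};

/* True if A and B are exact opposites, e.g. [i == 0] vs [i != 0].
   With only EQ and NE modeled, a differing code on the same operands
   means the two predicates contradict each other.  */
static bool
pred_negates (struct pred a, struct pred b)
{
  return a.var == b.var && a.val == b.val && a.code != b.code;
}

/* True if some predicate in USE_GUARD contradicts P.  */
static bool
one_pred_invalidated (struct pred p, struct chain use_guard)
{
  for (size_t i = 0; i < use_guard.n; i++)
    if (pred_negates (p, use_guard.preds[i]))
      return true;
  return false;
}

/* UNINIT holds NCHAINS AND-chains that are ORed together.  Returns
   true only if there is at least one chain and every predicate of
   every chain is contradicted by the use guard (a conservative
   check).  */
static bool
all_uninit_paths_invalidated (const struct chain *uninit, size_t nchains,
                              struct chain use_guard)
{
  assert (nchains == 0 || uninit != NULL);
  if (nchains == 0)
    return false;
  for (size_t i = 0; i < nchains; i++)
    for (size_t j = 0; j < uninit[i].n; j++)
      if (!one_pred_invalidated (uninit[i].preds[j], use_guard))
        return false;
  return true;
}
```

For example, an uninitialized path guarded by [i == 0] cannot reach a use guarded by [i != 0], so no warning is needed; a use guarded by an unrelated [j != 0] proves nothing.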


The attached patch fixes this without causing any regressions on x86-64 
Linux.  I also verified that at least on [arm-none-linux-gnueabihf
--with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16], there are no 
gcc.dg/*uninit* regressions.


There is still one regression at large involving a double free in 
PR78548 which I will look at next/independently.


OK for trunk?
Aldy
commit 469f4c38a48bc284c268b40f5d5511f015844ea2
Author: Aldy Hernandez 
Date:   Tue Nov 29 05:59:53 2016 -0500

PR middle-end/78566
* tree-ssa-uninit.c (can_one_predicate_be_invalidated_p): Change
argument type to a pred_chain.
(can_chain_union_be_invalidated_p): Use pred_chain instead of a
worklist.
(flatten_out_predicate_chains): Remove.
(uninit_uses_cannot_happen): Rename from
uninit_ops_invalidate_phi_use.
Change logic so that we are checking that the PHI use will
invalidate _ALL_ possibly uninitialized operands.
(is_use_properly_guarded): Rename call to
uninit_ops_invalidate_phi_use into uninit_uses_cannot_happen.

diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 4557403..a648995 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -2155,115 +2155,66 @@ normalize_preds (pred_chain_union preds, gimple *use_or_def, bool is_use)
 
 static bool
 can_one_predicate_be_invalidated_p (pred_info predicate,
-   vec worklist)
+   pred_chain use_guard)
 {
-  for (size_t i = 0; i < worklist.length (); ++i)
+  for (size_t i = 0; i < use_guard.length (); ++i)
 {
-  pred_info *p = worklist[i];
-
   /* NOTE: This is a very simple check, and only understands an
 exact opposite.  So, [i == 0] is currently only invalidated
 by [.NOT. i == 0] or [i != 0].  Ideally we should also
 invalidate with say [i > 5] or [i == 8].  There is certainly
 room for improvement here.  */
-  if (pred_neg_p (predicate, *p))
+  if (pred_neg_p (predicate, use_guard[i]))
return true;
 }
   return false;
 }
 
-/* Return TRUE if all USE_PREDS can be invalidated by some predicate
-   in WORKLIST.  */
+/* Return TRUE if all predicates in UNINIT_PRED are invalidated by
+   USE_GUARD being true.  */
 
 static bool
-can_chain_union_be_invalidated_p (pred_chain_union use_preds,
- vec worklist)
+can_chain_union_be_invalidated_p (pred_chain_union uninit_pred,
+ pred_chain use_guard)
 {
-  /* Remember:
-   PRED_CHAIN_UNION = PRED_CHAIN1 || PRED_CHAIN2 || PRED_CHAIN3
-   PRED_CHAIN = PRED_INFO1 && PRED_INFO2 && PRED_INFO3, etc.
-
-   We need to invalidate the entire PRED_CHAIN_UNION, which means,
-   invalidating every PRED_CHAIN in this union.  But to invalidate
-   an individual PRED_CHAIN, all we need to invalidate is _any_ one
-   PRED_INFO, by boolean algebra !PRED_INFO1 || !PRED_INFO2...  */
-  for (size_t i = 0; i < use_preds.length (); ++i)
+  if (uninit_pred.is_empty ())
+return false;
+  for (size_t i = 0; i < uninit_pred.length (); ++i)
 {
-  pred_chain c = use_preds[i];
-  bool entire_pred_chain_invalidated = false;
+  pred_chain c = uninit_pred[i];
   for (size_t j = 0; j < c.length (); ++j)
-   if (can_one_predicate_be_invalidated_p (c[j], worklist))
- {
-   entire_pred_chain_invalidated = true;
-   break;
- }
-  if (!entire_pred_chain_invalidated)
-   return false;
+   if (!can_one_predicate_be_invalidated_p (c[j], use_guard))
+ return false;
 }
   return true;
 }
 
-/* Flatten out all the factors in all the pred_chain_union's in PREDS
-   into a WORKLIST of individual PRED_INFO's.
+/* Return TRUE if none of the uninitialized operands in UNINT_OPNDS
+   can actually happen if we arrived at a use for PHI.
 
-   N is the number of pred_chain_union's in PREDS.
+   PHI_USE_GUARDS are the guard conditions for the use of the PHI.  */
 
-   Since we are interested in the inverse of the PRED_CHAIN's, by
-   boolean algebra, an inverse turns those PRED_CHAINS into unions,
-   which means we can flatten all the factors out for easy access.  */
-
-static void
-flatten_out_predicate_chains (pred_chain_union preds[], size_t n,
- vec *worklist)
+static bool
+uninit_uses_cannot_happen (gphi *phi, unsigned uninit_opnds,
+  pred_chain_union phi_use_guards)
 {
-  for (size_t i = 0; i < n; ++i)
-{

Re: Ping: Re: [patch, avr] Add flash size to device info and make wrap around default

2016-11-29 Thread Denis Chertykov

2016-11-28 10:17 GMT+03:00 Pitchumani Sivanupandi
:
> On Saturday 26 November 2016 12:11 AM, Denis Chertykov wrote:
>>
>> I'm sorry for delay.
>>
>> I have a problem with the patch:
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file avr-arch.h
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file avr-devices.c
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file avr-mcus.def
>> Hunk #1 FAILED at 62.
>> 1 out of 1 hunk FAILED -- saving rejects to file avr-mcus.def.rej
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file gen-avr-mmcu-specs.c
>> Hunk #1 succeeded at 215 (offset 5 lines).
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file specs.h
>> Hunk #1 succeeded at 58 (offset 1 line).
>> Hunk #2 succeeded at 66 (offset 1 line).
>
>
> There are changes in avr-mcus.def after this patch is submitted.
> Now, I have incorporated the changes and attached the resolved patch.
>
> Regards,
> Pitchumani
>
> gcc/ChangeLog
>
> 2016-11-09  Pitchumani Sivanupandi 
>
> * config/avr/avr-arch.h (avr_mcu_t): Add flash_size member.
> * config/avr/avr-devices.c (avr_mcu_types): Add flash size info.
> * config/avr/avr-mcus.def: Likewise.
> * config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded prefix
> check to find wrap-around value, instead use MCU flash size. For 8k
> flash devices, update link_pmem_wrap spec string to add
> --pmem-wrap-around=8k.
> * config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and
> add to linker specs (LINK_SPEC) directly.

Committed.

Re: [PR middle-end/78566] Fix uninit regressions caused by previous -Wmaybe-uninit change

On 29 November 2016 at 17:33, Aldy Hernandez  wrote:
> This fixes the gcc.dg/uninit-pred-6* failures I seem to have caused on some
> non x86 platforms. Sorry for the delay.
>
> The problem is that my fix for PR61409 had the logic backwards.  I was
> proving that all the uses of a PHI are invalidated by any one undefined PHI
> path, whereas what we want is to prove that EVERY uninitialized path is
> invalidated by some factor in the PHI use.
>
> The attached patch fixes this without causing any regressions on x86-64
> Linux.  I also verified that at least on [arm-none-linux-gnueabihf
> --with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16], there are no
> gcc.dg/*uninit* regressions.
>
> There is still one regression at large involving a double free in PR78548
> which I will look at next/independently.
>
Thanks for working on this.
I've submitted a validation with your patch, I'll let you know if I find any
regressions.

Christophe

> OK for trunk?
> Aldy

Re: [PATCH] Fix PR78306


On 11/29/2016 12:47 AM, Richard Biener wrote:

Balaji added this check explicitly. There should be tests in the testsuite
(spawnee_inline, spawner_inline) which exercise that code.


Yes he did, but no, nothing in the testsuite.

I believe the tests are:

c-c++-common/cilk-plus/CK/spawnee_inline.c
c-c++-common/cilk-plus/CK/spawner_inline.c

But as I mentioned, they don't check for proper behaviour




There is _nowhere_ documented _why_ the checks were added.  Why is
inlining a transform that can do anything bad to a function using
cilk_spawn?
I know, it's disappointing.  Even the tests mentioned above don't shed 
any real light on the issue.



Jeff

Re: [PATCH] correct handling of non-constant width and precision (pr 78521)

2016-11-29 Thread Martin Sebor


On 11/28/2016 05:42 PM, Joseph Myers wrote:

On Sun, 27 Nov 2016, Martin Sebor wrote:


Finally, the patch also tightens up the constraint on the upper bound
of bounded functions like snprintf to be INT_MAX.  The functions cannot
produce output in excess of INT_MAX + 1 bytes and some implementations
(e.g., Solaris) fail with EINVAL when the bound is INT_MAX or more.
This is the subject of PR 78520.


Note that failing with large bounds is questionable (there is an apparent
conflict between ISO C, where passing a large bound seems valid, and
POSIX, where large bounds require errors; see
; I'm not sure if any liaison
issue for this ever got passed to WG14).


Thanks!  That's useful background.  Let me check with Nick to see
if he (as the POSIX/WG14 liaison) plans to submit it.  I can also
write it up for the next WG14 meeting if we or the Austin Group
feel like WG14 should clarify or change things.

Martin

Re: [RFC] Assert DECL_ABSTRACT_ORIGIN is different from the decl itself


On 11/29/2016 03:13 AM, Richard Biener wrote:

On Mon, Nov 28, 2016 at 6:28 PM, Martin Jambor  wrote:

Hi Jeff,

On Mon, Nov 28, 2016 at 08:46:05AM -0700, Jeff Law wrote:

On 11/28/2016 07:27 AM, Martin Jambor wrote:

Hi,

one of a number of symptoms of an otherwise unrelated HSA bug I've
been debugging today is gcc crashing or hanging in the C++ pretty
printer when attempting to emit a warning because dump_decl() ended up
in an infinite recursion calling itself on the DECL_ABSTRACT_ORIGIN of
the decl it was looking at, which was however the same thing.  (It was
set to itself on purpose in set_decl_origin_self as a part of final
pass, the decl was being printed because it was itself an abstract
origin of another one).

If someone ever faces a similar problem, the following (untested)
patch might save them a bit of time.  I have eventually decided not to
make it a checking-only assert because it is on a cold path and
because at release-build optimization levels, the tail-call is
optimized to a jump and thus an infinite loop if the described
situation happens, and I suppose an informative ICE is better tan that
even for users.

What do you think?  Would it be reasonable for trunk even now or
should I queue it for the next stage1?

Thanks,

Martin


gcc/cp/

2016-11-28  Martin Jambor  

* error.c (dump_decl): Add an assert that DECL_ABSTRACT_ORIGIN
is not the decl itself.

Given it's on an error/debug path it ought to be plenty safe for now. What's
more interesting is whether or not DECL_ABSTRACT_ORIGIN can legitimately
point to itself and if so, how is that happening.


Well, I tried to explain it in my original email but I also wanted to
be as brief as possible, so perhaps it is necessary to elaborate a bit:

There is a function set_decl_origin_self() in dwarf2out.c that does
just that, sets DECL_ABSTRACT_ORIGIN to the decl itself, and its
comment makes it clear that is intended (according to git blame, the
whole comment and much of the implementation come from 1992, though ;-)
The function is called from the "final" pass through dwarf2out_decl(),
and gen_decl_die().

So, for one reason or another, this is the intended behavior.
Apparently, after that one is not supposed to be printing the decl
name of such a "finished" function.  It is too bad, however, that this
can happen if a "finished" function is itself an abstract origin of a
different one, which is optimized and expanded only afterwards and you
attempt to print its decl name, because it triggers printing the decl
name of the finished function, in turn triggering the infinite
recursion/loop.  I am quite surprised that we have not hit this
earlier (e.g. with warnings in IPA-CP clones) but perhaps there is a
reason.

I will append the patch to some bootstrap and testing run and commit
it afterwards if it passes.


Other users explicitly check for the self-reference when walking origins.
I think that makes it pretty clear that we have to handle 
self-reference.  So it seems that rather than an assert that we should 
just not walk down a self-referencing DECL_ABSTRACT_ORIGIN.
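A minimal sketch of the approach suggested here, using a hypothetical stand-in struct rather than GCC's tree nodes: walk the origin chain, but stop at an origin that points back at the decl itself instead of following it. With a plain recursive walk, a self-reference like the one set_decl_origin_self creates never terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a DECL; ABSTRACT_ORIGIN may be NULL,
   another decl, or the decl itself (what set_decl_origin_self sets).  */
struct decl
{
  const char *name;
  const struct decl *abstract_origin;
};

/* Return the name of the outermost abstract origin of D.  The walk
   stops at a self-referencing origin instead of following it, which
   would otherwise loop forever.  Longer cycles (A -> B -> A) are not
   handled by this sketch.  */
static const char *
origin_name (const struct decl *d)
{
  assert (d != NULL);
  while (d->abstract_origin != NULL && d->abstract_origin != d)
    d = d->abstract_origin;
  return d->name;
}
```

This mirrors the situation in the thread: a clone whose origin is a "finished" function (one whose origin is itself) resolves to that function's name without hanging.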


jeff

Re: [PATCH 7/9] Add RTL-error-handling to host

2016-11-29 Thread David Malcolm

On Mon, 2016-11-28 at 14:47 +0100, Bernd Schmidt wrote:
> Been looking at this off and on, and I'm still not sure I entirely
> get it - sorry.
> 
> On 11/11/2016 10:15 PM, David Malcolm wrote:
> > > > Implementing an RTL frontend by using the RTL reader from
> > > > read-rtl.c means that we now need a diagnostics subsystem on
> > > > the *host* for handling errors in RTL files, rather than just
> > > > on the build machine.
> 
> So, there are two things that bother me about this patch description:
>   - The host already has the full diagnostic subsystem.

Maybe I worded this poorly.

I meant to say:

"we now need
  ((a diagnostics subsystem for handling errors in RTL files)
   on the *host*),
rather than just on the build machine."

rather than:

"we now need a
  ((diagnostics subsystem on the *host*)
   for handling
errors in RTL files),
 rather than just on the build machine."

if that distinction makes sense.  Clearly we already have a diagnostics
subsystem on the host; what this patch is adding is the separate,
rtl-specific diagnostic subsystem to cc1 on the host.

> The fact that
> you're commenting out some of the functions in errors.c suggests
> that errors.c is conflicting with the full one.

It doesn't conflict: C++ overloading allows both to co-exist.  However
I wanted to make sure that we don't accidentally use the RTL-specific
error-handling within other parts of the compiler.

>   - We already compile errors.c for both build and host.

Aha, yes, we do, it's linked into gengtype on the host to allow plugins
to support GTY.  The patch adds it to OBJS so that it is available
within cc1.

> Is there a problem with using both the full and the light errors
> system for read-rtl, as available? Mismatches in function signatures or
> something like this?

As noted above, C++ overloading allows this.

> > -#ifdef HOST_GENERATOR_FILE
> > -#include "config.h"
> > -#define GENERATOR_FILE 1
> > -#else
> > +/* This file is compiled twice: once for the generator programs
> > +   once for the compiler.  */
> > +#ifdef GENERATOR_FILE
> >  #include "bconfig.h"
> > +#else
> > +#include "config.h"
> >  #endif
> >  #include "system.h"
> >  #include "errors.h"
> 
> The Makefile still has a HOST_GENERATOR_FILE definition for errors.c 
> after this.

Will remove.

Re: [PATCH 7/9] Add RTL-error-handling to host

2016-11-29 Thread Bernd Schmidt


On 11/29/2016 06:20 PM, David Malcolm wrote:


if that distinction makes sense.  Clearly we already have a diagnostics
subsystem on the host; what this patch is adding is the separate,
rtl-specific diagnostic subsystem to cc1 on the host.


So that still seems odd to me. Why not use the normal diagnostics 
subsystem, and add whatever you need from it to errors.c for use from 
the generator programs? What exactly makes it "rtl-specific"?



Bernd

Re: Calling 'abort' on bounds violations in libmpx

2016-11-29 Thread Ilya Enkovich

2016-11-29 17:43 GMT+03:00 Alexander Ivchenko :
> Hi,
>
> Attached patch is addressing PR67520. Would that approach work for the
> problem? Should I also change the version of the library?

Hi!

Overall the patch is OK, but you need to change the version because you
change the default behavior. How did you test it? Did you check that the
default behavior change doesn't affect existing runtime MPX tests? Can we
add new ones?

Thanks,
Ilya

>
> 2016-11-29  Alexander Ivchenko  
>
> * mpxrt/mpxrt-utils.c (set_mpx_rt_stop_handler): New function.
> (print_help): Add help for CHKP_RT_STOP_HANDLER environment
> variable.
> (__mpxrt_init_env_vars): Add initialization of stop_handler.
> (__mpxrt_stop_handler): New function.
> (__mpxrt_stop): Ditto.
> * mpxrt/mpxrt-utils.h (mpx_rt_stop_mode_handler_t): New enum.
>
>
>
> diff --git a/libmpx/mpxrt/mpxrt-utils.c b/libmpx/mpxrt/mpxrt-utils.c
> index 057a355..63ee7c6 100644
> --- a/libmpx/mpxrt/mpxrt-utils.c
> +++ b/libmpx/mpxrt/mpxrt-utils.c
> @@ -60,6 +60,9 @@
>  #define MPX_RT_MODE "CHKP_RT_MODE"
>  #define MPX_RT_MODE_DEFAULT MPX_RT_COUNT
>  #define MPX_RT_MODE_DEFAULT_STR "count"
> +#define MPX_RT_STOP_HANDLER "CHKP_RT_STOP_HANDLER"
> +#define MPX_RT_STOP_HANDLER_DEFAULT MPX_RT_STOP_HANDLER_ABORT
> +#define MPX_RT_STOP_HANDLER_DEFAULT_STR "abort"
>  #define MPX_RT_HELP "CHKP_RT_HELP"
>  #define MPX_RT_ADDPID "CHKP_RT_ADDPID"
>  #define MPX_RT_BNDPRESERVE "CHKP_RT_BNDPRESERVE"
> @@ -84,6 +87,7 @@ typedef struct {
>  static int summary;
>  static int add_pid;
>  static mpx_rt_mode_t mode;
> +static mpx_rt_stop_mode_handler_t stop_handler;
>  static env_var_list_t env_var_list;
>  static verbose_type verbose_val;
>  static FILE *out;
> @@ -226,6 +230,23 @@ set_mpx_rt_mode (const char *env)
>}
>  }
>
> +static mpx_rt_stop_mode_handler_t
> +set_mpx_rt_stop_handler (const char *env)
> +{
> +  if (env == 0)
> +return MPX_RT_STOP_HANDLER_DEFAULT;
> +  else if (strcmp (env, "abort") == 0)
> +return MPX_RT_STOP_HANDLER_ABORT;
> +  else if (strcmp (env, "exit") == 0)
> +return MPX_RT_STOP_HANDLER_EXIT;
> +  {
> +__mpxrt_print (VERB_ERROR, "Illegal value '%s' for %s. Legal values are"
> +   "[abort | exit]\nUsing default value %s\n",
> +   env, MPX_RT_STOP_HANDLER, MPX_RT_STOP_HANDLER_DEFAULT);
> +return MPX_RT_STOP_HANDLER_DEFAULT;
> +  }
> +}
> +
>  static void
>  print_help (void)
>  {
> @@ -244,6 +265,11 @@ print_help (void)
>fprintf (out, "%s \t\t set MPX runtime behavior on #BR exception."
> " [stop | count]\n"
> "\t\t\t [default: %s]\n", MPX_RT_MODE, MPX_RT_MODE_DEFAULT_STR);
> +  fprintf (out, "%s \t set the handler function MPX runtime will call\n"
> +   "\t\t\t on #BR exception when %s is set to \'stop\'."
> +   " [abort | exit]\n"
> +   "\t\t\t [default: %s]\n", MPX_RT_STOP_HANDLER, MPX_RT_MODE,
> +   MPX_RT_STOP_HANDLER_DEFAULT_STR);
>fprintf (out, "%s \t\t generate out,err file for each process.\n"
> "\t\t\t generated file will be MPX_RT_{OUT,ERR}_FILE.pid\n"
> "\t\t\t [default: no]\n", MPX_RT_ADDPID);
> @@ -357,6 +383,10 @@ __mpxrt_init_env_vars (int* bndpreserve)
>env_var_list_add (MPX_RT_MODE, env);
>mode = set_mpx_rt_mode (env);
>
> +  env = secure_getenv (MPX_RT_STOP_HANDLER);
> +  env_var_list_add (MPX_RT_STOP_HANDLER, env);
> +  stop_handler = set_mpx_rt_stop_handler (env);
> +
>env = secure_getenv (MPX_RT_BNDPRESERVE);
>env_var_list_add (MPX_RT_BNDPRESERVE, env);
>validate_bndpreserve (env, bndpreserve);
> @@ -487,6 +517,22 @@ __mpxrt_mode (void)
>return mode;
>  }
>
> +mpx_rt_mode_t
> +__mpxrt_stop_handler (void)
> +{
> +  return stop_handler;
> +}
> +
> +void __attribute__ ((noreturn))
> +__mpxrt_stop (void)
> +{
> +  if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_ABORT)
> +abort ();
> +  else if (__mpxrt_stop_handler () == MPX_RT_STOP_HANDLER_EXIT)
> +exit (255);
> +  __builtin_unreachable ();
> +}
> +
>  void
>  __mpxrt_print_summary (uint64_t num_brs, uint64_t l1_size)
>  {
> diff --git a/libmpx/mpxrt/mpxrt-utils.h b/libmpx/mpxrt/mpxrt-utils.h
> index d62937d..6da12cc 100644
> --- a/libmpx/mpxrt/mpxrt-utils.h
> +++ b/libmpx/mpxrt/mpxrt-utils.h
> @@ -54,6 +54,11 @@ typedef enum {
>MPX_RT_STOP
>  } mpx_rt_mode_t;
>
> +typedef enum {
> +  MPX_RT_STOP_HANDLER_ABORT,
> +  MPX_RT_STOP_HANDLER_EXIT
> +} mpx_rt_stop_mode_handler_t;
> +
>  void __mpxrt_init_env_vars (int* bndpreserve);
>  void __mpxrt_write_uint (verbose_type vt, uint64_t val, unsigned base);
>  void __mpxrt_write (verbose_type vt, const char* str);
> diff --git a/libmpx/mpxrt/mpxrt.c b/libmpx/mpxrt/mpxrt.c
> index b52906b..0bc069c 100644
> --- a/libmpx/mpxrt/mpxrt.c
> +++ b/libmpx/mpxrt/mpxrt.c
> @@ -252,7 +251,7 @@ handler (int sig __attribute__ ((unused)),
>uctxt->uc_mcontext.gregs[REG_IP_IDX] =
>  (greg_t)get_next_inst_ip ((uint8_t *)ip);
>if (__mpxrt_mode () == MPX_RT_STOP)
> -exit (255);
> +__mpxrt_stop ();
>return;
>
>   default:
> @@ -269,7 +268,7 @@ handler (int sig __attribute__ (

Re: [PATCH] remove %p handling from gimple-ssa-sprintf (pr78512)


On 11/28/2016 07:57 PM, Martin Sebor wrote:

PR 78512 (r242674 miscompiles Linux kernel) observes that the Linux
kernel fails to boot as a result of enabling the -fprintf-return-value
optimization in GCC.  This is likely because the kernel has its own
sprintf with a large set of extensions to the %p directive that
conflict with the optimization.  Ordinarily, programs that define
their own versions of C library functions that differ from what C
specifies are expected to disable GCC's built-ins (e.g., by
-fno-builtin, or for freestanding environments like the Linux kernel,
by -ffreestanding).  But the Linux kernel doesn't do that and hence
the conflict.

After discussing a few possible options (handling the kernel extensions
in GCC, providing a new GCC option to disable the %p handling, and
disabling both the optimization and the warning for calls involving
the %p directive), the last was viewed as the best alternative.  The
attached patch removes the %p handling from GCC.

And just to give a little more information here.

The fundamental issue is that %p handling is implementation defined and 
the implementations can (of course) change over time.   Handling of %p 
essentially turns into coding GCC to an implementation rather than a 
real specification.
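To illustrate why %p is the odd one out for -fprintf-return-value: for a directive like %d the C standard fully specifies the output, so the return value of a call with constant arguments can be known at compile time, whereas the text produced for %p is implementation-defined. A minimal sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* "%d" output is fully specified by the C standard: formatting 123
   always yields the 3 characters "123".  A constant result like this
   is the kind of value -fprintf-return-value can compute at compile
   time.  */
static int
decimal_len (void)
{
  char buf[16];
  int n = snprintf (buf, sizeof buf, "%d", 123);
  assert (n == 3);
  return n;
}

/* "%p" output is implementation-defined and varies between C
   libraries (and the Linux kernel's own sprintf extends it further),
   so a portable program, or a compiler, can only learn its length at
   run time.  */
static int
pointer_len (void *p)
{
  char buf[64];
  return snprintf (buf, sizeof buf, "%p", p);
}
```

Nothing portable can be said about pointer_len's result beyond it being non-negative on success, which is exactly why baking per-implementation %p knowledge into GCC was judged unwise.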


The details of those implementations would have to be baked into GCC 
itself.  That wasn't terrible when we were just trying to support the 
rather limited cases found in glibc, uClibc, aix & solaris.  But when we 
add the linux kernel and its extensions into the mix, it didn't seem 
wise to continue to bake that knowledge into GCC.   The need to support 
multiple %p implementations from a single compiler just makes things 
even worse.


After deliberating those issues, Jakub, Martin and myself ultimately 
decided that supporting %p for warnings and optimization was unwise.


It's unfortunate because the kernel makes extensive use of %p.  I guess 
one could create a plug-in to check %p for the kernel if they wanted to 
take advantage of the checking capabilities.


Approved for the trunk.

Thanks,

jeff

Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.


On 11/29/2016 07:02 AM, Andrew Burgess wrote:

* Jeff Law  [2016-11-28 15:08:46 -0700]:


On 11/24/2016 02:40 PM, Andrew Burgess wrote:

* Christophe Lyon  [2016-11-21 13:47:09 +0100]:


On 20 November 2016 at 18:27, Mike Stump  wrote:

On Nov 19, 2016, at 1:59 PM, Andrew Burgess  wrote:

So, your new test fails on arm* targets:


After a little digging I think the problem might be that
-freorder-blocks-and-partition is not supported on arm.

This should be detected as the new tests include:

   /* { dg-require-effective-target freorder } */

however this test passed on arm as -freorder-blocks-and-partition does
not issue any warning unless -fprofile-use is also passed.

The patch below extends check_effective_target_freorder to check using
-fprofile-use.  With this change in place the tests are skipped on
arm.



All feedback welcome,


Seems reasonable, unless a -freorder-blocks-and-partition/-fprofile-use person 
thinks this is the wrong solution.



Hi,

As promised, I tested this patch: it makes
gcc.dg/tree-prof/section-attr-[123].c
unsupported on arm*, and thus they are not failing anymore :-)

However, it also makes other tests unsupported, while they used to pass:

  gcc.dg/pr33648.c
  gcc.dg/pr46685.c
  gcc.dg/tree-prof/20041218-1.c
  gcc.dg/tree-prof/bb-reorg.c
  gcc.dg/tree-prof/cold_partition_label.c
  gcc.dg/tree-prof/comp-goto-1.c
  gcc.dg/tree-prof/pr34999.c
  gcc.dg/tree-prof/pr45354.c
  gcc.dg/tree-prof/pr50907.c
  gcc.dg/tree-prof/pr52027.c
  gcc.dg/tree-prof/va-arg-pack-1.c

and the tests that used to fail are now unsupported:
  gcc.dg/tree-prof/cold_partition_label.c
  gcc.dg/tree-prof/section-attr-1.c
  gcc.dg/tree-prof/section-attr-2.c
  gcc.dg/tree-prof/section-attr-3.c

So, maybe this patch is too strong?


In all of the cases that used to pass the tests are compile only tests
(except for cold_partition_label, which I discuss below).

On ARM passing -fprofile-use and -freorder-blocks-and-partition
results in a warning, and the -freorder-blocks-and-partition flag is
ignored.  However, disabling -freorder-blocks-and-partition doesn't
stop any of the tests compiling, hence the passes.

All the tests include:

  /* { dg-require-effective-target freorder } */

which I understand to mean, the tests requires the 'freorder' feature
to be supported (which corresponds to -freorder-blocks-and-partition).

For cold_partition_label and my new tests it seems clear that the
lack of support for -freorder-blocks-and-partition on ARM is the cause
of the test failures.

So, is it reasonable to give up the other tests as "unsupported"?  I'd
be inclined to say yes, but I'm happy to rework the patch if anyone has
a suggestion for an alternative approach.

It is reasonable.  It's not uncommon to have to drop various tests to
UNSUPPORTED, particularly things which depend on assembler/linker
capabilities, the target runtime system, etc.


OK, I'm going to take that as approval for my patch[1].  I'll wait a
couple of days to give people a chance to correct me, then I'll push
the change.  This should resolve the test regressions I introduced for
ARM.

I'll just go ahead and explicitly ACK this.

Thanks,
jeff

[PATCH] Fix PR68838

2016-11-29 Thread David Edelsohn

Separate from ulimit, 32-bit AIX processes have the concept of memory
segments.  By default, AIX devotes one 256MB segment to the data
section of an executable.  Some libstdc++ testcases allocate more than
that amount of memory.  Instead of individually fixing tests, this
patch always adds the AIX linker option to allocate two segments for
data, fixing a number of wchar testcases.

PR libstdc++/68838
* testsuite/lib/libstdc++.exp (DEFAULT_CXXFLAGS): Add -Wl,-bmaxdata on AIX.
* testsuite/23_containers/vector/profile/vector.cc: Remove
dg-additional-options.

Index: libstdc++.exp
===
--- libstdc++.exp   (revision 242964)
+++ libstdc++.exp   (working copy)
@@ -136,6 +136,9 @@
if { [string match "powerpc-*-darwin*" $target_triplet] } {
append DEFAULT_CXXFLAGS " -multiply_defined suppress"
}
+   if { [string match "powerpc-ibm-aix*" $target_triplet] } {
+   append DEFAULT_CXXFLAGS " -Wl,-bmaxdata:0x2000"
+   }
 }

v3track DEFAULT_CXXFLAGS 2

Index: 23_containers/vector/profile/vector.cc
===
--- 23_containers/vector/profile/vector.cc  (revision 242964)
+++ 23_containers/vector/profile/vector.cc  (working copy)
@@ -2,8 +2,6 @@
 // Advice: set tmp as 1

 // { dg-options "-DITERATIONS=20" { target simulator } }
-// AIX requires higher memory limit
-// { dg-additional-options "-Wl,-bmaxdata:0x2000" { target { powerpc-ibm-aix* } } }

 #ifndef ITERATIONS
 #define ITERATIONS 2000

Re: [Ping][PATCH 0/6][ARM] Implement support for ACLE Coprocessor Intrinsics

2016-11-29 Thread Andre Vieira (lists)

On 29/11/16 10:37, Kyrill Tkachov wrote:
> 
> On 29/11/16 10:35, Andre Vieira (lists) wrote:
>> On 21/11/16 08:42, Christophe Lyon wrote:
>>> Hi,
>>>
>>>
>>> On 17 November 2016 at 11:45, Kyrill Tkachov
>>>  wrote:
 On 17/11/16 10:31, Andre Vieira (lists) wrote:
> Hi Kyrill,
>
> On 17/11/16 10:11, Kyrill Tkachov wrote:
>> Hi Andre,
>>
>> On 09/11/16 10:00, Andre Vieira (lists) wrote:
>>> Tested the series by bootstrapping arm-none-linux-gnueabihf and
>>> found no
>>> regressions, also did a normal build for arm-none-eabi and ran the
>>> acle.exp tests for a Cortex-M3.
>> Can you please also do a full testsuite run on
>> arm-none-linux-gnueabihf.
>> Patches have to be tested by the whole testsuite.
> That's what I have done and meant to say with "Tested the series by
> bootstrapping arm-none-linux-gnueabihf and found no regressions". I
> compared gcc/g++/libstdc++ tests on a bootstrap with and without the
> patches.

 Ah ok, great.

> I'm happy to rerun the tests after a rebase when the patches get
> approved.
>>> FWIW, I ran a validation with the 6 patches applied, and saw no
>>> regression.
>>> Given the large number of new tests, I didn't check the full details.
>>>
>>> If you want to check that each configuration has the PASSes you expect,
>>> you can have a look at:
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/242581-acle/report-build-info.html
>>>
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>>
 Thanks,
 Kyrill

> Cheers,
> Andre

> 
> Hi Andre,
> 
>> Ping. (For the patch series).
> 
> Have you seen my review at:
> https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01778.html ?
> It might require some minor rework of some parts of the series.
> 
> Thanks,
> Kyrill
> 
> 
Hmm, no, I had not; I must have accidentally marked it as read.
I'll go work on the comments. Sorry for the ping.

Re: [PATCH] Remove uninitialized reads of is_leaf


On 11/29/2016 04:10 AM, Wilco Dijkstra wrote:

GCC caches whether a function is a leaf in crtl->is_leaf. Using this
in the backend is best, as leaf_function_p may not work correctly (e.g. while
emitting prologue or epilogue code).  There are many reads of crtl->is_leaf
before it is initialized.  Many targets read it in targetm.frame_pointer_required
(e.g. arm, aarch64, i386, mips, sparc), which is called before register
allocation by ira_setup_eliminable_regset and sched_init.

Additionally, SHRINK_WRAPPING_ENABLED calls targetm.have_simple_return,
which evaluates the condition of the simple_return instruction.  On ARM
this results in a call to use_simple_return_p which requires crtl->is_leaf
to be set correctly.

To fix this, initialize crtl->is_leaf in ira_setup_eliminable_regset and
early on in ira.  A bootstrap did not find any uninitialized reads of
crtl->is_leaf on Thumb-2.  A follow-up patch will remove incorrect uses
of leaf_function_p from the ARM backend.
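One way to catch reads of a cached flag before it is initialized, during development, is to pair the flag with a validity bit and assert on every read. This is a minimal hypothetical sketch: `struct fn_state`, `set_is_leaf` and `get_is_leaf` are illustrative names, not GCC's `crtl` interface, which has no such validity bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function state modeled on crtl->is_leaf.
   GCC's real structure has no "is_leaf_set" field; it is added
   here only to show how early reads could be caught.  */
struct fn_state
{
  bool is_leaf;
  bool is_leaf_set;   /* Has is_leaf been computed yet?  */
};

/* Record whether the function is a leaf (the patch does this early in IRA).  */
static void
set_is_leaf (struct fn_state *fn, bool leaf)
{
  fn->is_leaf = leaf;
  fn->is_leaf_set = true;
}

/* Read the cached flag; with assertions enabled, abort if it has
   not been initialized yet.  */
static bool
get_is_leaf (const struct fn_state *fn)
{
  assert (fn->is_leaf_set);
  return fn->is_leaf;
}
```

Building with -DNDEBUG turns the check into a plain field read, so the instrumentation costs nothing in release builds.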

Bootstrap OK (verified all reads of is_leaf in ARM backend are now after
initialization), OK for commit?

ChangeLog:
2016-11-29  Wilco Dijkstra  

* gcc/ira.c (ira_setup_eliminable_regset): Initialize crtl->is_leaf.
(ira): Move initialization of crtl->is_leaf earlier.

OK.
jeff

Re: [PING] [PATCH] Fix PR31096


On 11/22/2016 10:25 PM, Hurugalawadi, Naveen wrote:

Hi,

Please consider this as a personal reminder to review the patch
at following link and let me know your comments on the same.

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01049.html
I believe Richi asked for a small change after which you can consider 
the patch approved:


https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02320.html

jeff

Re: [PATCH] Remove uninitialized reads of is_leaf

Jeff Law wrote:
> On 11/29/2016 04:10 AM, Wilco Dijkstra wrote:
> > GCC caches whether a function is a leaf in crtl->is_leaf. Using this
> > in the backend is best, as leaf_function_p may not work correctly (e.g. while
> > emitting prolog or epilog code).

I forgot to ask, would it be reasonable to add an assert to check we're not in
a sequence in leaf_function_p? I guess this will trigger on several targets
(leaf_function_p is used in several backends) but it's a real bug if 
crtl->is_leaf is true.

Wilco

Re: [PATCH] Remove uninitialized reads of is_leaf


On 11/29/2016 11:39 AM, Wilco Dijkstra wrote:

> Jeff Law wrote:
>
>> On 11/29/2016 04:10 AM, Wilco Dijkstra wrote:
>>
>>> GCC caches whether a function is a leaf in crtl->is_leaf. Using this
>>> in the backend is best, as leaf_function_p may not work correctly (e.g. while
>>> emitting prolog or epilog code).
>
>
> I forgot to ask, would it be reasonable to add an assert to check we're not in
> a sequence in leaf_function_p? I guess this will trigger on several targets
> (leaf_function_p is used in several backends) but it's a real bug if
> crtl->is_leaf is true.
Can it wait for the next stage1?  I'd hate to start tripping the assert 
all over the place at this point in the release cycle.


jeff

Re: [PATCH 7/9] Add RTL-error-handling to host

2016-11-29 Thread David Malcolm

On Tue, 2016-11-29 at 18:23 +0100, Bernd Schmidt wrote:
> On 11/29/2016 06:20 PM, David Malcolm wrote:
> > 
> > if that distinction makes sense.  Clearly we already have a
> > diagnostics subsystem on the host; what this patch is adding is the
> > separate, rtl-specific diagnostic subsystem to cc1 on the host.
> 
> So that still seems odd to me. Why not use the normal diagnostics 
> subsystem, and add whatever you need from it to errors.c for use from
> the generator programs? What exactly makes it "rtl-specific"?

The main issue is that the normal diagnostics subsystem tracks
locations using location_t (aka libcpp's source_location), rather than
read-md.h's struct file_location, so we'd need to start using libcpp
from the generator programs, porting the location tracking to using
libcpp (e.g. creating linemaps for the files).

There would also be various Makefile.in tweaking to build various files
twice; hopefully that wouldn't lead to any unexpected issues.

Quoting from:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00648.html

> There seem to be two ways to do this:
>
>   (A) build the "light" diagnostics system (errors.c) for the host as
> well as build machine, and link it with the RTL reader there, so
there
> are two parallel diagnostics subsystems.
>
>   (B) build the "real" diagnostics system (diagnostics*) for the
> *build* machine as well as the host, and use it from the gen* tools,
> eliminating the "light" system, and porting the gen* tools to use
> libcpp for location tracking.
>
> Approach (A) seems to be simpler, which is what this part of the patch
> does.
>
> I've experimented with approach (B).  I think it's doable, but it's
> much more invasive (perhaps needing a libdiagnostics.a and a
> build/libdiagnostics.a in gcc/Makefile.in), so I hope this can be
> followup work.
>
> I can split the relevant parts out into a separate patch, but I was
> wondering if either of you had a strong opinion on (A) vs (B) before I
> do so?

This patch implements approach (A).

Would you prefer that I went with approach (B), or is approach (A)
acceptable?

Thanks
Dave

[PATCH], PR 78594, Fix storing QImode/HImode on ISA 3.0/power9

2016-11-29 Thread Michael Meissner

I was developing the next round of ISA 3.0 code changes to use the vector
extract byte, half word, and word instructions (VEXTU{B,H,W}{R,L}X) that
deposit the value into a general purpose register instead of a vector register,
and I was running the changes through the simulator.  I discovered that my
previous change to allow QImode/HImode did not work if the value was in a
traditional Altivec register.

This fixes the problem that I noticed.  I didn't bother doing the full
bootstrap and check, since it only affects the power9 target.  Can I check this
in?

2016-11-29  Michael Meissner  

PR target/78594
* config/rs6000/rs6000.md (mov_internal, QHI iterator): Add
'x' to stxsix print pattern, so that QImode and HImode values
residing in traditional altivec registers can be stored
correctly.
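A minimal sketch of why the operand modifier matters, assuming the usual POWER VSX layout in which VSX registers 0-31 overlay the floating-point registers and VSX registers 32-63 overlay the traditional Altivec registers. My understanding, stated here as an assumption, is that the `%x` modifier prints the VSX register number, whereas a plain operand prints the register's index within its own file, so an Altivec source would silently name the overlapping FPR. The enum and both functions below are illustrative only, not rs6000 backend code.

```c
/* Which architected register file an operand lives in (sketch only).  */
enum reg_file { REG_FPR, REG_ALTIVEC };

/* Number printed by a plain operand reference: the register's index
   within its own file (f5 -> 5, v5 -> 5).  */
static int
plain_operand_number (enum reg_file file, int n)
{
  (void) file;
  return n;
}

/* Number a VSX instruction needs: FPRs are VSX 0-31 and Altivec
   registers are VSX 32-63.  */
static int
vsx_operand_number (enum reg_file file, int n)
{
  return file == REG_ALTIVEC ? n + 32 : n;
}
```

Under these assumptions a value in v5 must be emitted as VSX register 37; emitting the plain number 5 would store the contents of f5 instead.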

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 242942)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6863,7 +6863,7 @@ (define_insn "*mov_internal"
lz%U1%X1 %0,%1
lxsizx %x0,%y1
st%U0%X0 %1,%0
-   stxsix %1,%y0
+   stxsix %x1,%y0
li %0,%1
xxlor %x0,%x1,%x1
xxspltib %x0,0

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[PATCH, i386]: Move mask ops from i386.md to sse.md ...

2016-11-29 Thread Uros Bizjak

... and fix gcc.target/i386/avx512f-kmovw-1.c scan-asm failure.

2016-11-29  Uros Bizjak  

* config/i386/sse.md (UNSPEC_MASKOP): Move from i386.md.
(mshift): Ditto.
(SWI1248_AVX512BWDQ): Ditto.
(SWI1248_AVX512BW): Ditto.
(k): Ditto.
(kandn): Ditto.
(kxnor): Ditto.
(knot): Ditto.
(*k): Ditto.
(kortestzhi, kortestchi): Ditto.
(kunpckhi, kunpcksi, kunpckdi): Ditto.

testsuite/ChangeLog:

2016-11-29  Uros Bizjak  

* gcc.target/i386/avx512f-kmovw-1.c (avx512f_test):
Force value through k register.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 242963)
+++ config/i386/i386.md (working copy)
@@ -186,9 +186,6 @@
   UNSPEC_PDEP
   UNSPEC_PEXT
 
-  ;; For AVX512F support
-  UNSPEC_KMASKOP
-
   UNSPEC_BNDMK
   UNSPEC_BNDMK_ADDR
   UNSPEC_BNDSTX
@@ -921,9 +918,6 @@
 (define_code_attr shift [(ashift "sll") (lshiftrt "shr") (ashiftrt "sar")])
 (define_code_attr vshift [(ashift "sll") (lshiftrt "srl") (ashiftrt "sra")])
 
-;; Mask variant left right mnemonics
-(define_code_attr mshift [(ashift "shiftl") (lshiftrt "shiftr")])
-
 ;; Mapping of rotate operators
 (define_code_iterator any_rotate [rotate rotatert])
 
@@ -966,15 +960,6 @@
 ;; All integer modes.
 (define_mode_iterator SWI1248x [QI HI SI DI])
 
-;; All integer modes with AVX512BW/DQ.
-(define_mode_iterator SWI1248_AVX512BWDQ
-  [(QI "TARGET_AVX512DQ") HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
-
-;; All integer modes with AVX512BW, where HImode operation
-;; can be used instead of QImode.
-(define_mode_iterator SWI1248_AVX512BW
-  [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")])
-
 ;; All integer modes without QImode.
 (define_mode_iterator SWI248x [HI SI DI])
 
@@ -2489,11 +2474,6 @@
   ]
   (const_string "SI")))])
 
-(define_expand "kmovw"
-  [(set (match_operand:HI 0 "nonimmediate_operand")
-   (match_operand:HI 1 "nonimmediate_operand"))]
-  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
-
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,k,k ,r,m")
(match_operand:HI 1 "general_operand"  "r ,rn,rm,rn,r,km,k,k"))]
@@ -8061,28 +8041,6 @@
   operands[3] = gen_lowpart (QImode, operands[3]);
 })
 
-(define_insn "k"
-  [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
-   (any_logic:SWI1248_AVX512BW
- (match_operand:SWI1248_AVX512BW 1 "register_operand" "k")
- (match_operand:SWI1248_AVX512BW 2 "register_operand" "k")))
-   (unspec [(const_int 0)] UNSPEC_KMASKOP)]
-  "TARGET_AVX512F"
-{
-  if (get_attr_mode (insn) == MODE_HI)
-return "kw\t{%2, %1, %0|%0, %1, %2}";
-  else
-return "k\t{%2, %1, %0|%0, %1, %2}";
-}
-  [(set_attr "type" "msklog")
-   (set_attr "prefix" "vex")
-   (set (attr "mode")
- (cond [(and (match_test "mode == QImode")
-(not (match_test "TARGET_AVX512DQ")))
-  (const_string "HI")
-  ]
-  (const_string "")))])
-
 ;; %%% This used to optimize known byte-wide and operations to memory,
 ;; and sometimes to QImode registers.  If this is considered useful,
 ;; it should be done with splitters.
@@ -8576,29 +8534,6 @@
   operands[2] = gen_lowpart (QImode, operands[2]);
 })
 
-(define_insn "kandn"
-  [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
-   (and:SWI1248_AVX512BW
- (not:SWI1248_AVX512BW
-   (match_operand:SWI1248_AVX512BW 1 "register_operand" "k"))
- (match_operand:SWI1248_AVX512BW 2 "register_operand" "k")))
-   (unspec [(const_int 0)] UNSPEC_KMASKOP)]
-  "TARGET_AVX512F"
-{
-  if (get_attr_mode (insn) == MODE_HI)
-return "kandnw\t{%2, %1, %0|%0, %1, %2}";
-  else
-return "kandn\t{%2, %1, %0|%0, %1, %2}";
-}
-  [(set_attr "type" "msklog")
-   (set_attr "prefix" "vex")
-   (set (attr "mode")
- (cond [(and (match_test "mode == QImode")
-(not (match_test "TARGET_AVX512DQ")))
- (const_string "HI")
-  ]
-  (const_string "")))])
-
 (define_insn_and_split "*andndi3_doubleword"
   [(set (match_operand:DI 0 "register_operand" "=r")
(and:DI
@@ -8987,92 +8922,6 @@
(set_attr "type" "alu")
(set_attr "modrm" "1")
(set_attr "mode" "QI")])
-
-(define_insn "kxnor"
-  [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
-   (not:SWI1248_AVX512BW
- (xor:SWI1248_AVX512BW
-   (match_operand:SWI1248_AVX512BW 1 "register_operand" "k")
-   (match_operand:SWI1248_AVX512BW 2 "register_operand" "k"
-   (unspec [(const_int 0)] UNSPEC_KMASKOP)]
-  "TARGET_AVX512F"
-{
-  if (get_attr_mode (insn) == MODE_HI)
-return "kxnorw\t{%2, %1, %0|%0, %1, %2}";
-  else
-return "kxnor\t{%2, %1, %0|%0, %1, %2}";
-}
-  [(set_attr "type" "msklog")
-   (set_attr "prefix" "vex")
-   (set (a

Re: [PATCH] combine: Tweak change_zero_ext

2016-11-29 Thread Uros Bizjak

> 2016-11-26  Segher Boessenkool  
>
> * combine.c (change_zero_ext): Also handle extends from a subreg
> to a mode bigger than that of the operand of the subreg.

This patch introduced:

FAIL: gcc.target/i386/pr44578.c (internal compiler error)

on i686 (or x86_64 32bit multi-lib).

./cc1 -O2 -mtune=athlon64 -m32 -quiet pr44578.c
pr44578.c: In function ‘test’:
pr44578.c:18:1: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.c:908
 }
 ^
0x81493b gen_rtx_SUBREG(machine_mode, rtx_def*, int)
/home/uros/gcc-svn/trunk/gcc/emit-rtl.c:908
0x122609f change_zero_ext
/home/uros/gcc-svn/trunk/gcc/combine.c:11260
0x1226207 recog_for_combine
/home/uros/gcc-svn/trunk/gcc/combine.c:11346
0x1236db3 try_combine
/home/uros/gcc-svn/trunk/gcc/combine.c:3501
0x123a3e0 combine_instructions
/home/uros/gcc-svn/trunk/gcc/combine.c:1265
0x123a3e0 rest_of_handle_combine
/home/uros/gcc-svn/trunk/gcc/combine.c:14581
0x123a3e0 execute
/home/uros/gcc-svn/trunk/gcc/combine.c:14626

Uros.

[PATCH] Another debug info stv fix (PR rtl-optimization/78547)

Hi!

The following testcase ICEs because DECL_RTL/DECL_INCOMING_RTL are adjusted
by the stv pass through the PUT_MODE modifications, which means that for
var-tracking.c they contain a bogus mode.

Fixed by wrapping those into TImode subreg or adjusting the MEMs to have the
correct mode.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-29  Jakub Jelinek  

PR rtl-optimization/78547
* config/i386/i386.c (convert_scalars_to_vector): If any
insns have been converted, adjust all parameters' DECL_RTL and
DECL_INCOMING_RTL back from V1TImode to TImode if the parameters have
TImode.

--- gcc/config/i386/i386.c.jj   2016-11-29 08:31:58.0 +0100
+++ gcc/config/i386/i386.c  2016-11-29 12:21:36.867323776 +0100
@@ -4075,6 +4075,39 @@ convert_scalars_to_vector ()
crtl->stack_alignment_needed = 128;
   if (crtl->stack_alignment_estimated < 128)
crtl->stack_alignment_estimated = 128;
+  /* Fix up DECL_RTL/DECL_INCOMING_RTL of arguments.  */
+  if (TARGET_64BIT)
+   for (tree parm = DECL_ARGUMENTS (current_function_decl);
+parm; parm = DECL_CHAIN (parm))
+ {
+   if (TYPE_MODE (TREE_TYPE (parm)) != TImode)
+ continue;
+   if (DECL_RTL_SET_P (parm)
+   && GET_MODE (DECL_RTL (parm)) == V1TImode)
+ {
+   rtx r = DECL_RTL (parm);
+   if (REG_P (r))
+ SET_DECL_RTL (parm, gen_rtx_SUBREG (TImode, r, 0));
+   else
+ {
+   gcc_assert (MEM_P (r));
+   SET_DECL_RTL (parm, adjust_address_nv (r, TImode, 0));
+ }
+ }
+   if (DECL_INCOMING_RTL (parm)
+   && GET_MODE (DECL_INCOMING_RTL (parm)) == V1TImode)
+ {
+   rtx r = DECL_INCOMING_RTL (parm);
+   if (REG_P (r))
+ DECL_INCOMING_RTL (parm) = gen_rtx_SUBREG (TImode, r, 0);
+   else
+ {
+   gcc_assert (MEM_P (r));
+   DECL_INCOMING_RTL (parm)
+ = change_address (r, TImode, NULL_RTX);
+ }
+ }
+ }
 }
 
   return 0;
--- gcc/testsuite/gcc.dg/pr78547.c.jj   2016-11-29 12:26:26.544662630 +0100
+++ gcc/testsuite/gcc.dg/pr78547.c  2016-11-29 12:26:09.0 +0100
@@ -0,0 +1,18 @@
+/* PR rtl-optimization/78547 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-Os -g -freorder-blocks-algorithm=simple -Wno-psabi" } */
+/* { dg-additional-options "-mstringop-strategy=libcall" { target i?86-*-* x86_64-*-* } } */
+
+typedef unsigned __int128 u128;
+typedef unsigned __int128 V __attribute__ ((vector_size (64)));
+
+V
+foo (u128 a, u128 b, u128 c, V d)
+{
+  V e = (V) {a};
+  V f = e & 1;
+  e = 0 != e;
+  c = c;
+  f = f << ((V) {c} & 7);
+  return f + e;
+}

Jakub

[PATCH] Fix format_integer (PR tree-optimization/78586)

Hi!

As mentioned in the PR, the LSHIFT_EXPR computation of the values that
will need the longest or shortest string is both incorrect and unnecessary.
It is incorrect because it shifts integer_one_node left, so for precisions
above the precision of int it returns 0 (not to mention that it is invalid
GENERIC, because the types of the first operand and the result have to
match).  It is unnecessary because every integral type already has
TYPE_MIN_VALUE and TYPE_MAX_VALUE readily available.
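As an illustration of the reasoning (not of GCC's internal tree API): for a signed conversion the longest output comes from the type's minimum value, and for an unsigned conversion from its maximum, which is what TYPE_MIN_VALUE and TYPE_MAX_VALUE give directly. This minimal sketch measures output lengths with `snprintf (NULL, 0, ...)`, assuming a 64-bit `long long`; it deliberately does not compute the bound as `1 << (precision - 1)` in `int`, which cannot represent it.

```c
#include <limits.h>
#include <stdio.h>

/* Characters (excluding the terminating NUL) that "%lld" produces.  */
static int
signed_len (long long v)
{
  return snprintf (NULL, 0, "%lld", v);
}

/* Characters (excluding the terminating NUL) that "%llu" produces.  */
static int
unsigned_len (unsigned long long v)
{
  return snprintf (NULL, 0, "%llu", v);
}
```

For example, `signed_len (LLONG_MIN)` is 20 ("-9223372036854775808"), one more than `signed_len (LLONG_MAX)`, and `unsigned_len (1000)` is 4, the value the new pr78586.c test expects from `%lu`.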

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note that in the PR I've raised various further questions; Martin, can you
look at them?

2016-11-29  Jakub Jelinek  

PR tree-optimization/78586
* gimple-ssa-sprintf.c (format_integer): Use TYPE_MAX_VALUE or
TYPE_MIN_VALUE or build_all_ones_cst instead of folding LSHIFT_EXPR.
Don't build_int_cst min/max twice.  Formatting fix.

* gcc.c-torture/execute/pr78586.c: New test.

--- gcc/gimple-ssa-sprintf.c.jj 2016-11-28 23:50:20.0 +0100
+++ gcc/gimple-ssa-sprintf.c   2016-11-29 15:54:17.605892667 +0100
@@ -1068,7 +1068,8 @@ format_integer (const conversion_spec &s
   tree argmin = NULL_TREE;
   tree argmax = NULL_TREE;
 
-  if (arg && TREE_CODE (arg) == SSA_NAME
+  if (arg
+  && TREE_CODE (arg) == SSA_NAME
   && TREE_CODE (argtype) == INTEGER_TYPE)
 {
   /* Try to determine the range of values of the integer argument
@@ -1090,12 +1091,8 @@ format_integer (const conversion_spec &s
 the upper bound for %i but -3 for %u.  */
  if (wi::neg_p (min) && !wi::neg_p (max))
{
- argmin = build_int_cst (argtype, wi::fits_uhwi_p (min)
- ? min.to_uhwi () : min.to_shwi ());
-
- argmax = build_int_cst (argtype, wi::fits_uhwi_p (max)
- ? max.to_uhwi () : max.to_shwi ());
-
+ argmin = res.argmin;
+ argmax = res.argmax;
  int minbytes = format_integer (spec, res.argmin).range.min;
  int maxbytes = format_integer (spec, res.argmax).range.max;
  if (maxbytes < minbytes)
@@ -1154,21 +1151,25 @@ format_integer (const conversion_spec &s
   int typeprec = TYPE_PRECISION (dirtype);
   int argprec = TYPE_PRECISION (argtype);
 
-  if (argprec < typeprec || POINTER_TYPE_P (argtype))
+  if (argprec < typeprec)
{
- if (TYPE_UNSIGNED (argtype))
+ if (POINTER_TYPE_P (argtype))
argmax = build_all_ones_cst (argtype);
+ else if (TYPE_UNSIGNED (argtype))
+   argmax = TYPE_MAX_VALUE (argtype);
  else
-   argmax = fold_build2 (LSHIFT_EXPR, argtype, integer_one_node,
- build_int_cst (integer_type_node,
-argprec - 1));
+   argmax = TYPE_MIN_VALUE (argtype);
}
   else
{
- argmax = fold_build2 (LSHIFT_EXPR, dirtype, integer_one_node,
-   build_int_cst (integer_type_node,
-  typeprec - 1));
+ if (POINTER_TYPE_P (dirtype))
+   argmax = build_all_ones_cst (dirtype);
+ else if (TYPE_UNSIGNED (dirtype))
+   argmax = TYPE_MAX_VALUE (dirtype);
+ else
+   argmax = TYPE_MIN_VALUE (dirtype);
}
+
   res.argmin = argmin;
   res.argmax = argmax;
 }
--- gcc/testsuite/gcc.c-torture/execute/pr78586.c.jj   2016-11-29 16:11:35.283742461 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr78586.c  2016-11-29 16:11:16.0 +0100
@@ -0,0 +1,17 @@
+/* PR tree-optimization/78586 */
+
+void
+foo (unsigned long x)
+{
+  char a[30];
+  unsigned long b = __builtin_sprintf (a, "%lu", x);
+  if (b != 4)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  foo (1000);
+  return 0;
+}

Jakub

Re: [PATCH] Remove uninitialized reads of is_leaf

Jeff Law wrote:
> On 11/29/2016 11:39 AM, Wilco Dijkstra wrote:
> > I forgot to ask, would it be reasonable to add an assert to check we're not in
> > a sequence in leaf_function_p? I guess this will trigger on several targets
> > (leaf_function_p is used in several backends) but it's a real bug if
> > crtl->is_leaf is true.
> Can it wait for the next stage1?  I'd hate to start tripping the assert 
> all over the place at this point in the release cycle.

Yes, I don't think it is urgent, as the incorrect value returned would likely
make a leaf function save/restore the return address unnecessarily. It starts
to generate incorrect code on ARM if you remove the if (reload_completed) test
in arm_get_frame_offsets (which should just be an optimization to avoid
recomputing the frame layout repeatedly, not essential for correctness).

Wilco

Re: [PATCH] correct handling of non-constant width and precision (pr 78521)

2016-11-29 Thread Martin Sebor


On 11/29/2016 09:56 AM, Martin Sebor wrote:

> On 11/28/2016 05:42 PM, Joseph Myers wrote:
>> On Sun, 27 Nov 2016, Martin Sebor wrote:
>>
>>> Finally, the patch also tightens up the constraint on the upper bound
>>> of bounded functions like snprintf to be INT_MAX.  The functions cannot
>>> produce output in excess of INT_MAX + 1 bytes and some implementations
>>> (e.g., Solaris) fail with EINVAL when the bound is INT_MAX or more.
>>> This is the subject of PR 78520.
>>
>> Note that failing with large bounds is questionable (there is an apparent
>> conflict between ISO C, where passing a large bound seems valid, and
>> POSIX, where large bounds require errors; see
>> ; I'm not sure if any liaison
>> issue for this ever got passed to WG14).
>
> Thanks!  That's useful background.  Let me check with Nick to see
> if he (as the POSIX/WG14 liaison) plans to submit it.  I can also
> write it up for the next WG14 meeting if we or the Austin Group
> feel like WG14 should clarify or change things.


I've been looking at the original BSD sources where snprintf came
from (AFAICT).  The first implementation I could find is in Net/2
from 1988.  It returns EOF when the size after conversion to int
is less than 1.  The same code is still in 4.4BSD.

Early UNIX implementations also had the limitation that the buffer
size maintained by struct FILE is an int.  Since snprintf on these
early implementations usually used vfprintf to do the work (with
the count being set to the snprintf bound), it can't store more than
INT_MAX bytes without overflowing the counter.

http://minnie.tuhs.org/cgi-bin/utree.pl?file=Net2/usr/src/lib/libc/stdio/snprintf.c

It looks to me like the POSIX spec is faithful to the historical
implementations, and C should consider either tightening up its
constraints or making the behavior implementation-defined to allow
for more modern implementations that don't have this restriction.
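Until the standards agree, a caller that wants the same behavior everywhere can keep the bound it hands to the library below INT_MAX. This is a hypothetical sketch (`clamp_bound` and `portable_snprintf` are not from any library). Clamping is safe only because the caller's buffer really is at least `n` bytes, so passing the library a smaller bound never lets it write past the end; output longer than INT_MAX characters cannot be reported through the `int` return value anyway.

```c
#include <limits.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Largest bound forwarded to the C library.  The thread reports that
   some implementations reject a bound of INT_MAX or more, so this
   sketch stays one below it.  */
#define MAX_FORWARDED_BOUND ((size_t) INT_MAX - 1)

static size_t
clamp_bound (size_t n)
{
  return n > MAX_FORWARDED_BOUND ? MAX_FORWARDED_BOUND : n;
}

/* snprintf wrapper: BUF must really be at least N bytes long; only the
   bound handed to vsnprintf is reduced.  */
static int
portable_snprintf (char *buf, size_t n, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int ret = vsnprintf (buf, clamp_bound (n), fmt, ap);
  va_end (ap);
  return ret;
}
```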

Martin

Re: Pretty printers for versioned namespace

2016-11-29 Thread Jonathan Wakely


On 28/11/16 22:19 +0100, François Dumont wrote:

> Hi
>
>    Here is a patch to fix pretty printers when versioned namespace is
> activated.
>
>
>    You will see that I have hesitated in making the fix independent
> of the version being used. In source files you will find (__7::)?
> patterns while in xmethods.py I chose (__\d+::)? making it ready for
> __8 and forward. Do you want to generalize one option ? If so which
> one ?


I don't really mind, but I note that the point of the path
libstdcxx/v6/printers.py was that we'd have different printers for v7,
v8 etc. ... I think it's simpler to keep everything in one place
though. 
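The printers themselves are Python, but the two pattern choices can be compared with POSIX extended regular expressions in C. This is a sketch only: `is_std_string` is a hypothetical helper, and because POSIX EREs have no `\d`, the version-agnostic form of `(__\d+::)?` is written `(__[0-9]+::)?`. It accepts `std::__7::string` and would keep working for `__8` and later, whereas `(__7::)?` would need updating with each new version.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if NAME is std::string, optionally inside a versioned inline
   namespace such as std::__7::.  */
static bool
is_std_string (const char *name)
{
  regex_t re;
  if (regcomp (&re, "^std::(__[0-9]+::)?string$",
               REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool match = regexec (&re, name, 0, NULL, 0) == 0;
  regfree (&re);
  return match;
}
```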

>    At the moment version namespace is visible within gdb, it displays
> for instance 'std::__7::string'. I am pretty sure we could hide it, is
> it preferable ? I would need some time to do so as I am neither a
> python nor regex expert.


It's fine to display it.

>    I am not fully happy with the replication in printers.py of
> StdRbtreeIteratorPrinter and
> StdExpAnyPrinter(SingleObjContainerPrinter in respectively
> StdVersionedRbtreeIteratorPrinter and
> StdExpVerAnyPrinter(SingleObjContainerPrinter just to adapt 2 lines
> where regex is not an option. We could surely keep only one and pass
> it '' or '__7'. But as I said I am not a python expert so any help
> would be appreciated.


We definitely want to avoid that duplication. For
StdRbtreeIteratorPrinter you can just look at 'typename' and see
whether it starts with "std::__7" or not. If it does, you need to look up
std::__7::_Rb_tree_node<...>, otherwise you need to look up
std::_Rb_tree_node<...> instead.

For StdExpAnyPrinter just do two replacements: first replace
std::string with the result of gdb.lookup_type('std::string') and then
replace std::__7::string with the result of looking that up. Are you
sure that's even needed though? Does std::__7::string actually appear
in the manager function's name? I would expect it to appear as
std::__7::basic_string, std::__7::allocator 
> >
which doesn't need to be expanded anyway. So I think you can just
remove your StdExpVerAnyPrinter.



> --- a/libstdc++-v3/testsuite/lib/gdb-test.exp
> +++ b/libstdc++-v3/testsuite/lib/gdb-test.exp
> @@ -74,6 +74,14 @@ proc whatis-test {var result} {
> lappend gdb_tests $var $result whatis
> }
>
> +# A test of 'whatis'.  This tests a type rather than a variable through a
> +# regexp.


Please use "regular expression" here rather than "regexp".


> +proc whatis-regexp-test {var result} {
> +global gdb_tests
> +
> +lappend gdb_tests $var $result whatisrexp
> +}
> +


And something other than "whatisrexp" e.g. "whatis_regexp" would be
OK, but "rexp" is not a conventional abbreviation.

Re: [libstdc++, testsuite] Add dg-require-thread-fence

2016-11-29 Thread Jonathan Wakely


On 16/11/16 22:18 +0100, Christophe Lyon wrote:

On 15 November 2016 at 12:50, Jonathan Wakely  wrote:

On 14/11/16 14:32 +0100, Christophe Lyon wrote:


On 20 October 2016 at 19:40, Jonathan Wakely  wrote:


On 20/10/16 10:33 -0700, Mike Stump wrote:



On Oct 20, 2016, at 9:34 AM, Jonathan Wakely  wrote:




On 20/10/16 09:26 -0700, Mike Stump wrote:



On Oct 20, 2016, at 5:20 AM, Jonathan Wakely 
wrote:




I am considering leaving this in the ARM backend to force people to
think what they want to do about thread safety with statics and C++
on bare-metal systems.




The quoting makes it look like those are my words, but I was quoting
Ramana from https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02751.html


Not quite in the GNU spirit?  The port people should decide the best way
to get as much functionality as possible and everything should just work, no
sharp edges.

Forcing people to think sounds like a sharp edge?




I'm inclined to agree, but we are talking about bare metal systems,




So?  gcc has been doing bare metal systems for more than 2 years now.  It
is pretty good at it.  All my primary targets today are themselves bare
metal systems (I test with newlib).


where there is no one-size-fits-all solution.




Configurations are like ice cream cones.  Everyone gets their flavor no
matter how weird or strange.  Putting nails in a cone because you don't know
if they like vanilla or chocolate isn't reasonable.  If you want, make two
flavors, and vend two, if you want to just do one, pick the flavor and vend
it.  Put an enum #define default_flavor vanilla, and you then have support
for any flavor you want.  Want to add a configure option for the flavor
select, add it.  You want to make a -mflavor=chocolate option, add it.  gcc
is literally littered with these things.




Like I said, you can either build the library with
-fno-threadsafe-statics or you can provide a definition of the missing
symbol.


I gave this a try (using CXXFLAGS_FOR_TARGET=-fno-threadsafe-statics).
It seems to do the trick indeed: almost all tests now pass, the flag is added
to testcase compilation.

Among the 6 remaining failures, I noticed these two:
- experimental/type_erased_allocator/2.cc: still complains about the missing
__sync_synchronize. Does it need dg-require-thread-fence?



Yes, I think that test actually uses atomics directly, so does depend
on the fence.


I've attached the patch to achieve this.
Is it OK?


Yes, OK, thanks.


- abi/header_cxxabi.c complains because the option is not valid for C.
I can see the test is already skipped for other C++-only options: is it OK
if I submit a patch to skip it if -fno-threadsafe-statics is used?



Yes, it makes sense there too.


This one is not as obvious as I hoped. I tried:
-// { dg-skip-if "invalid options for C" { *-*-* } { "-std=c++??" "-std=gnu++??" } }
+// { dg-skip-if "invalid options for C" { *-*-* } { "-std=c++??" "-std=gnu++??" "-fno-threadsafe-statics" } }

but it does not work.

I set CXXFLAGS_FOR_TARGET=-fno-threadsafe-statics
before running GCC's configure.

This results in -fno-threadsafe-statics being used when compiling the tests,
but dg-skip-if does not consider it: it would if I passed it via
runtestflags/target-board, but then it would mean passing this flag
to all tests, not only the c++ ones, leading to errors everywhere.

Am I missing something?


I'm not sure how to deal with that.

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-29 Thread Segher Boessenkool

Hi James, Kyrill,

On Tue, Nov 29, 2016 at 10:57:33AM +, James Greenhalgh wrote:
> > +static sbitmap
> > +aarch64_components_for_bb (basic_block bb)
> > +{
> > +  bitmap in = DF_LIVE_IN (bb);
> > +  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
> > +  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
> > +
> > +  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
> > +  bitmap_clear (components);
> > +
> > +  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
> > +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
> 
> The use of R0_REGNUM and V31_REGNUM scare me a little bit, as we're hardcoding
> where the end of the register file is (does this, for example, fall apart
> with the SVE work that was recently posted). Something like a
> LAST_HARDREG_NUM might work?

Components and registers aren't the same thing (you can have components
for things that aren't just a register save, e.g. the frame setup, stack
alignment, save of some non-GPR via a GPR, PIC register setup, etc.)
The loop here should really only cover the non-volatile registers, and
there should be some translation from register number to component number
(it of course is convenient to have a 1-1 translation for the GPRs and
floating point registers).  For rs6000 many things in the backend already
use non-symbolic numbers for the FPRs and GPRs, so that is easier there.
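The register-to-component translation described above can be sketched with plain 64-bit masks. Everything here is an assumption for illustration: the numbering (x0-x30 as 0-30, v0-v31 as 32-63) only loosely follows the AArch64 backend, the callee-saved set is taken to be x19-x28 and v8-v15, and `component_for_regno` is the convenient 1-1 mapping, kept as a separate function so that non-register components could later be given numbers of their own. None of these names are GCC hooks.

```c
#include <stdint.h>

/* Register numbering assumed for this sketch.  */
enum { FIRST_GPR = 0, FIRST_FPR = 32, NUM_REGS = 64 };

static uint64_t
reg_bit (int regno)
{
  return UINT64_C (1) << regno;
}

/* Assumed callee-saved (non-volatile) registers: x19-x28, v8-v15.  */
static uint64_t
callee_saved_regs (void)
{
  uint64_t mask = 0;
  for (int r = 19; r <= 28; r++)
    mask |= reg_bit (FIRST_GPR + r);
  for (int v = 8; v <= 15; v++)
    mask |= reg_bit (FIRST_FPR + v);
  return mask;
}

/* Translate a register number to a component number (1-1 here).  */
static int
component_for_regno (int regno)
{
  return regno;
}

/* Components a block needs: callee-saved registers that appear in its
   live-in, gen or kill sets.  Volatile registers are ignored.  */
static uint64_t
components_for_bb (uint64_t live_in, uint64_t gen, uint64_t kill)
{
  uint64_t used = (live_in | gen | kill) & callee_saved_regs ();
  uint64_t components = 0;
  for (int regno = 0; regno < NUM_REGS; regno++)
    if (used & reg_bit (regno))
      components |= reg_bit (component_for_regno (regno));
  return components;
}
```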

> > +static void
> > +aarch64_disqualify_components (sbitmap, edge, sbitmap, bool)
> > +{
> > +}
> 
> Is there no default "do nothing" hook for this?

I can make the shrink-wrap code do nothing here if this hook isn't
defined, if you want?

Segher

Re: [v3 PATCH] Implement LWG 2534, Constrain rvalue stream operators.

2016-11-29 Thread Jonathan Wakely


On 27/11/16 20:50 +0200, Ville Voutilainen wrote:

   Implement LWG 2534, Constrain rvalue stream operators.
   * include/std/istream (__is_convertible_to_basic_istream): New.
   (__is_extractable): Likewise.
   (operator>>(basic_istream<_CharT, _Traits>&&, _Tp&&)):
   Turn the stream parameter into a template parameter
   and constrain.
   * include/std/ostream (__is_convertible_to_basic_ostream): New.
   (__is_insertable): Likewise.
   (operator<<(basic_ostream<_CharT, _Traits>&&, const _Tp&)):
   Turn the stream parameter into a template parameter
   and constrain.
   * testsuite/27_io/basic_istream/extractors_other/char/4.cc: New.
   * testsuite/27_io/basic_istream/extractors_other/wchar_t/4.cc:
   Likewise.
   * testsuite/27_io/basic_ostream/inserters_other/char/6.cc: Likewise.
   * testsuite/27_io/basic_ostream/inserters_other/wchar_t/6.cc: Likewise.


OK, thanks.

[PATCH] Fix x86_64 fix_debug_reg_uses (PR rtl-optimization/78575)

Hi!

The x86_64 stv pass uses PUT_MODE to change REGs and MEMs in place to affect
all setters and users, but that is undesirable in debug insns which are
intentionally ignored during the analysis and we should keep using correct
modes (TImode) instead of the new one (V1TImode).

The current fix_debug_reg_uses implementation just assumes such a pseudo
can appear only directly in the VAR_LOCATION's second operand, but it can of
course appear anywhere in the expression, and the whole expression doesn't
have to be TImode either (e.g. on the testcase it is a QImode comparison of
an originally TImode pseudo with a CONST_INT, which stv incorrectly changes
into a comparison of a V1TImode pseudo with a CONST_INT).

The following patch fixes that, and also fixes an issue where, if the pseudo
appears multiple times in the same debug insn, the rescan could break the
traversal of further uses.
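The traversal pattern the patch adopts can be sketched on a plain linked list: before touching the current use, compute the next use that belongs to a different instruction, so that whatever the rescan does to the current instruction's uses cannot derail the walk. The `struct use` type and the `rescan` stub are hypothetical stand-ins for GCC's df_ref chain and df_insn_rescan, not the real DF API.

```c
#include <stdbool.h>
#include <stddef.h>

enum use_mode { MODE_TI, MODE_V1TI, MODE_SUBREG_TI };

/* Hypothetical stand-in for a df_ref on a register's use chain.  */
struct use
{
  int insn;            /* Id of the instruction containing the use.  */
  enum use_mode mode;  /* Mode the use currently has.  */
  bool debug;          /* True if the instruction is a debug insn.  */
  struct use *next;    /* Next use of the same register.  */
};

static int rescans;

/* Stand-in for df_insn_rescan.  A real rescan may rebuild the
   instruction's uses, so this stub clobbers their links to show that
   the outer walk no longer depends on them.  */
static void
rescan (struct use *first, struct use *end)
{
  rescans++;
  while (first != end)
    {
      struct use *n = first->next;
      first->next = NULL;
      first = n;
    }
}

/* Wrap every V1TImode use in a debug insn; return how many changed.  */
static int
fix_debug_uses (struct use *chain)
{
  int total = 0;
  struct use *u, *next;
  for (u = chain; u; u = next)
    {
      /* Skip ahead past all uses in the same instruction.  */
      next = u->next;
      while (next && next->insn == u->insn)
        next = next->next;

      if (!u->debug)
        continue;

      struct use *first = u;
      bool changed = false;
      for (; u != next; u = u->next)
        if (u->mode == MODE_V1TI)
          {
            u->mode = MODE_SUBREG_TI;
            changed = true;
            total++;
          }
      if (changed)
        rescan (first, next);
    }
  return total;
}
```

Both uses in instruction 1 are wrapped with a single rescan, and the walk still reaches later instructions even though the rescan cut the links it would otherwise have followed.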

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-29  Jakub Jelinek  

PR rtl-optimization/78575
* config/i386/i386.c (timode_scalar_chain::fix_debug_reg_uses): Use
DF infrastructure to wrap all V1TImode reg uses into TImode subreg
if not already wrapped in a subreg.  Make sure df_insn_rescan does not
affect further iterations.

* gcc.dg/pr78575.c: New test.

--- gcc/config/i386/i386.c.jj   2016-11-28 10:59:08.0 +0100
+++ gcc/config/i386/i386.c  2016-11-29 08:31:58.061278522 +0100
@@ -3831,30 +3831,32 @@ timode_scalar_chain::fix_debug_reg_uses
   if (!flag_var_tracking)
 return;
 
-  df_ref ref;
-  for (ref = DF_REG_USE_CHAIN (REGNO (reg));
-   ref;
-   ref = DF_REF_NEXT_REG (ref))
+  df_ref ref, next;
+  for (ref = DF_REG_USE_CHAIN (REGNO (reg)); ref; ref = next)
 {
   rtx_insn *insn = DF_REF_INSN (ref);
+  /* Make sure the next ref is for a different instruction,
+ so that we're not affected by the rescan.  */
+  next = DF_REF_NEXT_REG (ref);
+  while (next && DF_REF_INSN (next) == insn)
+   next = DF_REF_NEXT_REG (next);
+
   if (DEBUG_INSN_P (insn))
{
  /* It may be a debug insn with a TImode variable in
 register.  */
- rtx val = PATTERN (insn);
- if (GET_MODE (val) != TImode)
-   continue;
- gcc_assert (GET_CODE (val) == VAR_LOCATION);
- rtx loc = PAT_VAR_LOCATION_LOC (val);
- /* It may have been converted to TImode already.  */
- if (GET_MODE (loc) == TImode)
-   continue;
- gcc_assert (REG_P (loc)
- && GET_MODE (loc) == V1TImode);
- /* Convert V1TImode register, which has been updated by a SET
-insn before, to SUBREG TImode.  */
- PAT_VAR_LOCATION_LOC (val) = gen_rtx_SUBREG (TImode, loc, 0);
- df_insn_rescan (insn);
+ bool changed = false;
+ for (; ref != next; ref = DF_REF_NEXT_REG (ref))
+   {
+ rtx *loc = DF_REF_LOC (ref);
+ if (REG_P (*loc) && GET_MODE (*loc) == V1TImode)
+   {
+ *loc = gen_rtx_SUBREG (TImode, *loc, 0);
+ changed = true;
+   }
+   }
+ if (changed)
+   df_insn_rescan (insn);
}
 }
 }
--- gcc/testsuite/gcc.dg/pr78575.c.jj   2016-11-29 08:36:25.821932436 +0100
+++ gcc/testsuite/gcc.dg/pr78575.c  2016-11-29 08:35:35.0 +0100
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/78575 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -g -Wno-psabi" } */
+
+typedef unsigned __int128 V __attribute__((vector_size(64)));
+
+V g;
+
+void
+foo (V v)
+{
+  unsigned __int128 x = 1;
+  int c = v[1] <= ~x;
+  v &= v[1];
+  g = v;
+}

Jakub

Go patch committed: Merge to gccgo branch

2016-11-29 Thread Ian Lance Taylor

I merged GCC trunk revision 242967 to the gccgo branch.

Ian

Re: [PATCH 7/9] Add RTL-error-handling to host

2016-11-29 Thread Bernd Schmidt


On 11/29/2016 07:53 PM, David Malcolm wrote:
> Would you prefer that I went with approach (B), or is approach (A)
> acceptable?

Well, I was hoping there'd be an approach (C) where the read-rtl code
uses whatever diagnostics framework is available. Maybe that'll turn
out to be too hard. Somehow the current patch looked strange to me, but
if there's no easy alternative, maybe we'll have to go with it.

Bernd

Re: [PATCH] combine: Tweak change_zero_ext

On 29 November 2016 at 20:38, Uros Bizjak  wrote:
>> 2016-11-26  Segher Boessenkool  
>>
>> * combine.c (change_zero_ext): Also handle extends from a subreg
>> to a mode bigger than that of the operand of the subreg.
>
> This patch introduced:
>
> FAIL: gcc.target/i386/pr44578.c (internal compiler error)
>
> on i686 (or x86_64 32bit multi-lib).
>
> ./cc1 -O2 -mtune=athlon64 -m32 -quiet pr44578.c
> pr44578.c: In function ‘test’:
> pr44578.c:18:1: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.c:908
>  }
>  ^
> 0x81493b gen_rtx_SUBREG(machine_mode, rtx_def*, int)
> /home/uros/gcc-svn/trunk/gcc/emit-rtl.c:908
> 0x122609f change_zero_ext
> /home/uros/gcc-svn/trunk/gcc/combine.c:11260
> 0x1226207 recog_for_combine
> /home/uros/gcc-svn/trunk/gcc/combine.c:11346
> 0x1236db3 try_combine
> /home/uros/gcc-svn/trunk/gcc/combine.c:3501
> 0x123a3e0 combine_instructions
> /home/uros/gcc-svn/trunk/gcc/combine.c:1265
> 0x123a3e0 rest_of_handle_combine
> /home/uros/gcc-svn/trunk/gcc/combine.c:14581
> 0x123a3e0 execute
> /home/uros/gcc-svn/trunk/gcc/combine.c:14626
>
> Uros.

Hi,

I'm seeing a similar error on aarch64:
FAIL: gcc.target/aarch64/advsimd-intrinsics/vduph_lane.c   -O1 (internal compiler error)
with the same backtrace.

Christophe

Re: [PATCH] Delete GCJ


On 11/21/2016 04:23 PM, Matthias Klose wrote:

On 21.11.2016 18:16, Rainer Orth wrote:

Hi Matthias,


ahh, didn't see that :-/ Now fixed, is this clearer now?

The options @option{--with-target-bdw-gc-include} and
@option{--with-target-bdw-gc-lib} must always specified together for

   ^ be


Thanks to all for sorting out the documentation issues. Now attaching the updated
diff. OK to commit?

Matthias



2016-11-19  Matthias Klose  

* Makefile.def: Remove reference to boehm-gc target module.
* configure.ac: Include pkg.m4, check for --with-target-bdw-gc
options and for the bdw-gc pkg-config module.
* configure: Regenerate.
* Makefile.in: Regenerate.

gcc/

2016-11-19  Matthias Klose  

* doc/install.texi: Document configure options --enable-objc-gc
and --with-target-bdw-gc.

config/

2016-11-19  Matthias Klose  

* pkg.m4: New file.

libobjc/

2016-11-19  Matthias Klose  

* configure.ac (--enable-objc-gc): Allow to configure with a
system provided boehm-gc.
* configure: Regenerate.
* Makefile.in (OBJC_BOEHM_GC_LIBS): Get value from configure.
* gc.c: Include system bdw-gc headers.
* memory.c: Likewise.
* objects.c: Likewise.

boehm-gc/

2016-11-19  Matthias Klose  

Remove

OK.

Jeff

Re: [PATCH v2] Fix PR78588 - rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type

2016-11-29 Thread Segher Boessenkool

On Tue, Nov 29, 2016 at 05:00:05PM +0100, Markus Trippelsdorf wrote:
> Building gcc with -fsanitize=undefined shows:
>  rtlanal.c:5210:38: runtime error: shift exponent 4294967295 is too large for 64-bit type 'long unsigned int'
> 
> This happens because if_then_else_cond() in combine.c calls
> num_sign_bit_copies() in rtlanal.c with mode==BLKmode.
> 
> 5205   bitwidth = GET_MODE_PRECISION (mode);
> 5206   if (bitwidth > HOST_BITS_PER_WIDE_INT)
> 5207 return 1;
> 5208
> 5209   nonzero = nonzero_bits (x, mode);
> 5210   return nonzero & (HOST_WIDE_INT_1U << (bitwidth - 1))
> 5211  ? 1 : bitwidth - floor_log2 (nonzero) - 1;
> 
> This causes (bitwidth - 1) to wrap around.

Could you also add a gcc_assert here?

>   PR rtl-optimization/78588 
>   * combine.c (if_then_else_cond): Also guard against BLKmode.

Approved, please apply.  Thanks,


Segher

Re: Import libcilkrts Build 4467 (PR target/68945)


On 11/17/2016 06:06 AM, Rainer Orth wrote:

I happened to notice that my libcilkrts SPARC port has been applied
upstream.  So to reach closure on this issue for the GCC 7 release, I'd
like to import upstream into mainline which seems to be covered by the
free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
specific and we have no listed maintainer.

A few issues are worth mentioning:

* Upstream still has a typo in the git URL in many files.  I've
  corrected that during the import to avoid a massive diff:

-#  https://bitbucket.org/intelcilkruntime/itnel-cilk-runtime.git are
+#  https://bitbucket.org/intelcilkruntime/intel-cilk-runtime.git are

* libcilkrts.spec.in is missing upstream.  I've no idea if this is
  intentional.

* A few of my changes have been lost and I can't tell if this is by
  accident:

** Lost whitespace:

--- libcilkrts-old/Makefile.am  2016-05-04 16:44:24.0 +
+++ libcilkrts-new/Makefile.am  2016-11-17 11:35:33.782987017 +
@@ -54,7 +54,7 @@ GENERAL_FLAGS = -I$(top_srcdir)/include
 # Enable Intel Cilk Plus extension
 GENERAL_FLAGS += -fcilkplus

-# Always generate unwind tables
+#Always generate unwind tables
 GENERAL_FLAGS += -funwind-tables

 AM_CFLAGS = $(XCFLAGS) $(GENERAL_FLAGS) -std=c99

** Lost alphabetical order of targets:

diff -rup libcilkrts-old/configure.ac libcilkrts-new/configure.ac
--- libcilkrts-old/configure.ac 2016-11-16 18:34:28.0 +
+++ libcilkrts-new/configure.ac 2016-11-17 11:35:33.800015570 +
@@ -143,14 +145,14 @@ esac
 # contains information on what's needed
 case "${target}" in

-  arm-*-*)
-config_dir="arm"
-;;
-
   i?86-*-* | x86_64-*-*)
 config_dir="x86"
 ;;

+  arm-*-*)
+config_dir="arm"
+;;
+
   sparc*-*-*)
 config_dir="sparc"
 ;;
diff -rup libcilkrts-old/configure.tgt libcilkrts-new/configure.tgt
--- libcilkrts-old/configure.tgt 2016-11-16 18:34:28.0 +
+++ libcilkrts-new/configure.tgt 2016-11-17 11:35:33.807873451 +
@@ -44,10 +44,10 @@

 # Disable Cilk Runtime library for unsupported architectures.
 case "${target}" in
-  arm-*-*)
-;;
   i?86-*-* | x86_64-*-*)
 ;;
+  arm-*-*)
+;;
   sparc*-*-*)
 ;;
   *-*-*)

  I've done nothing about those, just wanted to point them out.

The following patch has passed x86_64-pc-linux-gnu bootstrap without
regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps
are currently running.

Ok for mainline if they pass?

Yes.  Sorry for not getting back to you sooner.

jeff

Re: [PATCH] Introduce -fdump-ipa-clones dump output


On 11/11/2016 07:30 AM, Martin Liška wrote:

Hello.

The motivation for the patch is to dump the IPA clones created by all
inter-procedural optimizations.  Such output can be used to track the set
of functions in which code from another function can eventually end up.
An example use of the dump file can be seen here: [1].

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

[1] https://github.com/marxin/kgraft-analysis-tool


0001-Introduce-fdump-ipa-clones-dump-output.patch


From 700b9833771a5b646d3db44014af81c007dd48f4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Nov 2016 14:23:30 +0100
Subject: [PATCH] Introduce -fdump-ipa-clones dump output

gcc/ChangeLog:

2016-11-11  Martin Liska  

* cgraph.c (symbol_table::initialize): Initialize
ipa_clones_dump_file.
(cgraph_node::remove): Report to ipa_clones_dump_file.
* cgraph.h: Add new argument (suffix) to cloning methods.
* cgraphclones.c (dump_callgraph_transformation): New function.
(cgraph_node::create_clone): New argument.
(cgraph_node::create_virtual_clone): Likewise.
(cgraph_node::create_version_clone): Likewise.
* dumpfile.c: Add .ipa-clones dump file.
* dumpfile.h (enum tree_dump_index): Add TDI_clones.
* ipa-inline-transform.c (clone_inlined_nodes): Report operation
to dump_callgraph_transformation.
---
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cc730d2..2d59291 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -906,13 +906,14 @@ public:
  If the new node is being inlined into another one, NEW_INLINED_TO should be
  the outline function the new one is (even indirectly) inlined to.
  All hooks will see this in node's global.inlined_to, when invoked.
- Can be NULL if the node is not inlined.  */
+ Can be NULL if the node is not inlined.  SUFFIX is string that is appended
+ to the original name.  */
   cgraph_node *create_clone (tree decl, gcov_type count, int freq,
 bool update_original,
 vec redirect_callers,
 bool call_duplication_hook,
 cgraph_node *new_inlined_to,
-bitmap args_to_skip);
+bitmap args_to_skip, const char *sufix = NULL);

s/sufix/suffix/
?

OK with that nit fixed.

Sorry for the delays getting to this.

jeff

[PATCH] libiberty: avoid reading past end of buffer in strndup/xstrndup (PR c/78498)

2016-11-29 Thread David Malcolm

libiberty's implementations of strndup and xstrndup call strlen on
the input string, and hence can read past the end of the input buffer
if it isn't zero-terminated (such as is the case in PR c/78498, where
the input string is from the input.c line cache).

This patch converts them to use strnlen instead (as glibc's
implementation of them does), avoiding reading more than n bytes
from the input buffer.  strnlen is provided by libiberty.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu;
adds 6 PASS results to gcc.sum.

The patch also adds some selftests for this case, which showed
the problem and the fix nicely via "make selftest-valgrind".
Unfortunately I had to put these selftests within the gcc
subdirectory, rather than libiberty, since selftest.h is C++ and
is itself in the gcc subdirectory.  If that's unacceptable, I can
just drop the selftest.c part of the patch.  Alternatively, we could
somehow support selftests from within libiberty itself, though I'm not
sure how to do that if libiberty is meant as a cross-platform compat
library rather than as a base support layer.  The simplest thing to do
seemed to be to put them in the "gcc" subdir.

gcc/ChangeLog:
PR c/78498
* selftest.c (selftest::assert_strndup_eq): New function.
(selftest::test_strndup): New function.
(selftest::test_libiberty): New function.
(selftest::selftest_c_tests): Call test_libiberty.

gcc/testsuite/ChangeLog:
PR c/78498
* gcc.dg/format/pr78494.c: New test case.

libiberty/ChangeLog:
PR c/78498
* strndup.c (strlen): Delete decl.
(strnlen): Add decl.
(strndup): Call strnlen rather than strlen.
* xstrndup.c: Likewise.
---
 gcc/selftest.c| 48 +++
 gcc/testsuite/gcc.dg/format/pr78494.c | 12 +
 libiberty/strndup.c   |  7 ++---
 libiberty/xstrndup.c  |  5 +---
 4 files changed, 63 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/format/pr78494.c

diff --git a/gcc/selftest.c b/gcc/selftest.c
index 2a729be..6df73c2 100644
--- a/gcc/selftest.c
+++ b/gcc/selftest.c
@@ -198,6 +198,53 @@ read_file (const location &loc, const char *path)
   return result;
 }
 
+/* Selftests for libiberty.  */
+
+/* Verify that both strndup and xstrndup generate EXPECTED
+   when called on SRC and N.  */
+
+static void
+assert_strndup_eq (const char *expected, const char *src, size_t n)
+{
+  char *buf = strndup (src, n);
+  if (buf)
+ASSERT_STREQ (expected, buf);
+  free (buf);
+
+  buf = xstrndup (src, n);
+  ASSERT_STREQ (expected, buf);
+  free (buf);
+}
+
+/* Verify that strndup and xstrndup work as expected.  */
+
+static void
+test_strndup ()
+{
+  assert_strndup_eq ("", "test", 0);
+  assert_strndup_eq ("t", "test", 1);
+  assert_strndup_eq ("te", "test", 2);
+  assert_strndup_eq ("tes", "test", 3);
+  assert_strndup_eq ("test", "test", 4);
+  assert_strndup_eq ("test", "test", 5);
+
+  /* Test on an string without zero termination.  */
+  const char src[4] = {'t', 'e', 's', 't'};
+  assert_strndup_eq ("", src, 0);
+  assert_strndup_eq ("t", src, 1);
+  assert_strndup_eq ("te", src, 2);
+  assert_strndup_eq ("tes", src, 3);
+  assert_strndup_eq ("test", src, 4);
+}
+
+/* Run selftests for libiberty.  */
+
+static void
+test_libiberty ()
+{
+  test_strndup ();
+}
+
 /* Selftests for the selftest system itself.  */
 
 /* Sanity-check the ASSERT_ macros with various passing cases.  */
@@ -245,6 +292,7 @@ test_read_file ()
 void
 selftest_c_tests ()
 {
+  test_libiberty ();
   test_assertions ();
   test_named_temp_file ();
   test_read_file ();
diff --git a/gcc/testsuite/gcc.dg/format/pr78494.c b/gcc/testsuite/gcc.dg/format/pr78494.c
new file mode 100644
index 000..4b53a68
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/format/pr78494.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wall -Wextra -fdiagnostics-show-caret" } */
+
+void f (void)
+{
+  __builtin_printf ("%i", ""); /* { dg-warning "expects argument of type" } */
+/* { dg-begin-multiline-output "" }
+   __builtin_printf ("%i", "");
+  ~^   ~~
+  %s
+   { dg-end-multiline-output "" } */
+}
diff --git a/libiberty/strndup.c b/libiberty/strndup.c
index 9e9b4e2..4556b96 100644
--- a/libiberty/strndup.c
+++ b/libiberty/strndup.c
@@ -33,7 +33,7 @@ memory was available.  The result is always NUL terminated.
 #include "ansidecl.h"
 #include 
 
-extern size_t  strlen (const char*);
+extern size_t  strnlen (const char *s, size_t maxlen);
 extern PTR malloc (size_t);
 extern PTR memcpy (PTR, const PTR, size_t);
 
@@ -41,10 +41,7 @@ char *
 strndup (const char *s, size_t n)
 {
   char *result;
-  size_t len = strlen (s);
-
-  if (n < len)
-len = n;
+  size_t len = strnlen (s, n);
 
   result = (char *) malloc (len + 1);
   if (!result)
diff --git a/libiberty/xstrndup.c b/libiberty/xstrndup.c
index 0a41f60..c3d2d83 100644
--- a/libiberty/

Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2016-11-29 Thread Bernd Edlinger

On 11/29/16 16:06, Wilco Dijkstra wrote:
> Bernd Edlinger wrote:
>
> -  "TARGET_32BIT && reload_completed
> +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
> && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
>
> This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since we're
> already excluding NEON.
>

Aehm, no.  This would split the addi_neon insn before it is clear
if the reload pass will assign a VFP register.

With this change the stack usage with -mfpu=neon increases
from 2300 to around 2600 bytes.

> This patch expands ADD and SUB earlier, so shouldn't we do the same obvious
> change for the similar instructions CMP and NEG?
>

Good question.  I think the cmp and neg patterns are more complicated
and typically have a more complicated data flow than the other
patterns.

I tried to create a test case which expands cmpdi and negdi patterns
as follows:

--- pr77308-1.c 2016-11-25 17:53:20.379141465 +0100
+++ pr77308-2.c 2016-11-29 20:46:51.266948631 +0100
@@ -68,10 +68,10 @@
  #define B(x,j)(((SHA_LONG64)(*(((const unsigned char *)(&x))+j)))<<((7-j)*8))
  #define PULL64(x) (B(x,0)|B(x,1)|B(x,2)|B(x,3)|B(x,4)|B(x,5)|B(x,6)|B(x,7))
  #define ROTR(x,s)   (((x)>>s) | (x)<<(64-s))
-#define Sigma0(x)   ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
-#define Sigma1(x)   ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
-#define sigma0(x)   ~(ROTR((x),1)  ^ ROTR((x),8)  ^ ((x)>>7))
-#define sigma1(x)   ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))
+#define Sigma0(x)   (ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39) == (x) ? -(x) : (x))
+#define Sigma1(x)   (ROTR((x),14) ^ ROTR(-(x),18) ^ ROTR((x),41) < (x) ? -(x) : (x))
+#define sigma0(x)   (ROTR((x),1)  ^ ROTR((x),8)  ^ ((x)>>7) <= (x) ? ~(x) : (x))
+#define sigma1(x)   ((long long)(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6)) < (long long)(x) ? -(x) : (x))
  #define Ch(x,y,z)   (((x) & (y)) ^ ((~(x)) & (z)))
  #define Maj(x,y,z)  (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))


This expands *arm_negdi2, *arm_cmpdi_unsigned, *arm_cmpdi_insn.
The stack usage is around 1900 bytes with previous patch,
and 2300 bytes without.

I tried to split *arm_negdi2 and *arm_cmpdi_unsigned early, and it
does indeed give smaller stack sizes in the test case above (~400 bytes).
But when I make *arm_cmpdi_insn split early, it ICEs:

--- arm.md.orig 2016-11-27 09:22:41.794790123 +0100
+++ arm.md  2016-11-29 21:51:51.438163078 +0100
@@ -7432,7 +7432,7 @@
 (clobber (match_scratch:SI 2 "=r"))]
"TARGET_32BIT"
"#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
[(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 0) (match_dup 1)))
 (parallel [(set (reg:CC CC_REGNUM)

ontop of the latest patch, I got:

gcc -S -Os pr77308-2.c -fdump-rtl-all-verbose
pr77308-2.c: In function 'sha512_block_data_order':
pr77308-2.c:169:1: error: unrecognizable insn:
  }
  ^
(insn 4870 4869 1636 87 (set (scratch:SI)
 (minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4)
 (subreg:SI (reg:DI 473 [ X$14 ]) 4))
 (ltu:SI (reg:CC_C 100 cc)
                 (const_int 0 [0])))) "pr77308-2.c":140 -1
  (nil))
pr77308-2.c:169:1: internal compiler error: in extract_insn, at recog.c:2311
0xaf4cd8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../../gcc-trunk/gcc/rtl-error.c:108
0xaf4d09 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc-trunk/gcc/rtl-error.c:116
0xac74ef extract_insn(rtx_insn*)
../../gcc-trunk/gcc/recog.c:2311
0x122427a decompose_multiword_subregs
../../gcc-trunk/gcc/lower-subreg.c:1467
0x122550d execute
../../gcc-trunk/gcc/lower-subreg.c:1734


So it is certainly possible, but not really simple to improve the
stack size even further.  But I would prefer to do that in a
separate patch.

BTW: there are also negd2_compare, *negdi_extendsidi,
*negdi_zero_extendsidi, *thumb2_negdi2.

I think it would be a precondition to have test cases that exercise
each of these patterns before we try to split these instructions.


Bernd.

Re: [PATCH] add common CPP_SPECS for bfin