[PATCH] [MIPS] Fix PR target/91769

2019-09-25 Thread Dragan Mladjenovic
From: "Dragan Mladjenovic" 

This fixes the issue by checking that addr's base reg is not part of the dest
multiword register instead of just checking the first register of dest.
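
For illustration only - a sketch reconstructed from the scan patterns of the
new test, not code from the patch - splitting a 64-bit load on a 32-bit
configuration can produce

  lw	$4,0($5)	# low half of the destination pair
  lw	$5,4($5)	# high half clobbers the address base $5
  ldc1	$f0,0($5)	# hypothetical follow-up load rewritten to use $5

If mips_split_move forwards the original memory address into the next insn,
that insn now reads through the clobbered $5.  The old check only compared
REGNO (addr.reg) against the first register of the pair ($4), so the overlap
with the second register went undetected.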

gcc/ChangeLog:

2019-09-25  Dragan Mladjenovic  

PR target/91769
* config/mips/mips.c (mips_split_move): Use reg_overlap_mentioned_p
instead of REGNO equality check on addr.reg.

gcc/testsuite/ChangeLog:

2019-09-25  Dragan Mladjenovic  

PR target/91769
* gcc.target/mips/pr91769.c: New test.
---

Hi all,

Is this OK for trunk?

The test case is a bit crude, but I guess that is better than having none.

On top of that, I would like to backport this along with r273174 onto gcc 9 
branch.
That should fix BZ91702 and BZ91474 reported against gcc 9.2.

Tested on mips-mti-linux-gnu.

Best regards,
Dragan

 gcc/config/mips/mips.c  |  2 +-
 gcc/testsuite/gcc.target/mips/pr91769.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/pr91769.c

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index c682ebd..aa527b4 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -4862,7 +4862,7 @@ mips_split_move (rtx dest, rtx src, enum mips_split_type split_type, rtx insn_)
{
  rtx tmp = XEXP (src, 0);
  mips_classify_address (&addr, tmp, GET_MODE (tmp), true);
- if (addr.reg && REGNO (addr.reg) != REGNO (dest))
+ if (addr.reg && !reg_overlap_mentioned_p (dest, addr.reg))
validate_change (next, &SET_SRC (set), src, false);
}
  else
diff --git a/gcc/testsuite/gcc.target/mips/pr91769.c b/gcc/testsuite/gcc.target/mips/pr91769.c
new file mode 100644
index 0000000..b856183
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/pr91769.c
@@ -0,0 +1,19 @@
+/* PR target/91769 */
+/* { dg-do compile } */
+/* { dg-skip-if "naming registers makes this a code quality test" { *-*-* } { "-O0" "-g" } { "" } } */
+/* { dg-options "-EL -mgp32 -mhard-float" } */
+
+NOCOMPRESSION double
+foo (void)
+{
+  register double* pf __asm__ ("$a1");
+  __asm__ __volatile__ ("":"=r"(pf));
+  double f = *pf;
+
+  if (f != f)
+    f = -f;
+  return f;
+}
+
+/* { dg-final { scan-assembler-not "lw\t\\\$4,0\\(\\\$5\\)\n\tlw\t\\\$5,4\\(\\\$5\\)\n\tldc1\t\\\$.*,0\\(\\\$5\\)" } } */
+/* { dg-final { scan-assembler "lw\t\\\$4,0\\(\\\$5\\)\n\tlw\t\\\$5,4\\(\\\$5\\)\n\tmtc1\t\\\$4,\\\$.*\n\tmthc1\t\\\$5,\\\$.*" } } */
-- 
1.9.1



Re: [PATCH] Remove vectorizer reduction operand swapping

2019-09-25 Thread Richard Biener
On Tue, 24 Sep 2019, Christophe Lyon wrote:

> On Wed, 18 Sep 2019 at 20:11, Richard Biener  wrote:
> >
> >
> > It shouldn't be necessary.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > (SLP part testing separately)
> >
> > Richard.
> >
> > 2019-09-18  Richard Biener  
> >
> > * tree-vect-loop.c (vect_is_simple_reduction): Remove operand
> > swapping.
> > (vectorize_fold_left_reduction): Remove assert.
> > (vectorizable_reduction): Also expect COND_EXPR non-reduction
> > operand in position 2.  Remove assert.
> >
> 
> Hi,
> 
> Since this was committed (r275898), I've noticed a regression on armeb:
> FAIL: gcc.dg/vect/vect-cond-4.c execution test
> 
> I'm seeing this with qemu, but I do not have the execution traces yet.

Can you open a bugreport please?

Thanks,
Richard.

> Christophe
> 
> > Index: gcc/tree-vect-loop.c
> > ===================================================================
> > --- gcc/tree-vect-loop.c	(revision 275872)
> > +++ gcc/tree-vect-loop.c	(working copy)
> > @@ -3278,56 +3278,8 @@ vect_is_simple_reduction (loop_vec_info
> >   || !flow_bb_inside_loop_p (loop, gimple_bb (def2_info->stmt))
> >   || vect_valid_reduction_input_p (def2_info)))
> >  {
> > -  if (! nested_in_vect_loop && orig_code != MINUS_EXPR)
> > -   {
> > - /* Check if we can swap operands (just for simplicity - so that
> > -the rest of the code can assume that the reduction variable
> > -is always the last (second) argument).  */
> > - if (code == COND_EXPR)
> > -   {
> > - /* Swap cond_expr by inverting the condition.  */
> > - tree cond_expr = gimple_assign_rhs1 (def_stmt);
> > - enum tree_code invert_code = ERROR_MARK;
> > - enum tree_code cond_code = TREE_CODE (cond_expr);
> > -
> > - if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
> > -   {
> > - bool honor_nans = HONOR_NANS (TREE_OPERAND (cond_expr, 0));
> > - invert_code = invert_tree_comparison (cond_code, honor_nans);
> > -   }
> > - if (invert_code != ERROR_MARK)
> > -   {
> > - TREE_SET_CODE (cond_expr, invert_code);
> > - swap_ssa_operands (def_stmt,
> > -gimple_assign_rhs2_ptr (def_stmt),
> > -gimple_assign_rhs3_ptr (def_stmt));
> > -   }
> > - else
> > -   {
> > - if (dump_enabled_p ())
> > -   report_vect_op (MSG_NOTE, def_stmt,
> > -   "detected reduction: cannot swap 
> > operands "
> > -   "for cond_expr");
> > - return NULL;
> > -   }
> > -   }
> > - else
> > -   swap_ssa_operands (def_stmt, gimple_assign_rhs1_ptr (def_stmt),
> > -  gimple_assign_rhs2_ptr (def_stmt));
> > -
> > - if (dump_enabled_p ())
> > -   report_vect_op (MSG_NOTE, def_stmt,
> > -   "detected reduction: need to swap operands: ");
> > -
> > - if (CONSTANT_CLASS_P (gimple_assign_rhs1 (def_stmt)))
> > -   LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
> > -}
> > -  else
> > -{
> > -  if (dump_enabled_p ())
> > -report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
> > -}
> > -
> > +  if (dump_enabled_p ())
> > +   report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
> >return def_stmt_info;
> >  }
> >
> > @@ -5969,7 +5921,6 @@ vectorize_fold_left_reduction (stmt_vec_
> >gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
> >gcc_assert (ncopies == 1);
> >gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
> > -  gcc_assert (reduc_index == (code == MINUS_EXPR ? 0 : 1));
> >gcc_assert (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> >   == FOLD_LEFT_REDUCTION);
> >
> > @@ -6542,9 +6493,9 @@ vectorizable_reduction (stmt_vec_info st
> >   reduc_index = i;
> > }
> >
> > -  if (i == 1 && code == COND_EXPR)
> > +  if (code == COND_EXPR)
> > {
> > - /* Record how value of COND_EXPR is defined.  */
> > + /* Record how the non-reduction-def value of COND_EXPR is defined.  */
> >   if (dt == vect_constant_def)
> > {
> >   cond_reduc_dt = dt;
> > @@ -6622,10 +6573,6 @@ vectorizable_reduction (stmt_vec_info st
> >   return false;
> > }
> >
> > -  /* vect_is_simple_reduction ensured that operand 2 is the
> > -loop-carried operand.  */
> > -  gcc_assert (reduc_index == 2);
> > -
> >/* Loop peeling modifies initial value of reduction PHI, which
> >  makes the reduction stmt to be t

[PATCH] Add TODO_update_ssa for SLP BB vectorization (PR tree-optimization/91885).

2019-09-25 Thread Martin Liška
Hi.

Similarly to the SLP pass, we should probably set TODO_update_ssa
when an SLP BB vectorization happens from the normal vect pass.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-09-25  Martin Liska  

PR tree-optimization/91885
* tree-vectorizer.c (try_vectorize_loop_1): Add TODO_update_ssa
similarly to what the slp pass does.

gcc/testsuite/ChangeLog:

2019-09-25  Martin Liska  

PR tree-optimization/91885
* gcc.dg/pr91885.c: New test.
---
 gcc/testsuite/gcc.dg/pr91885.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-vectorizer.c  |  2 +-
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr91885.c


diff --git a/gcc/testsuite/gcc.dg/pr91885.c b/gcc/testsuite/gcc.dg/pr91885.c
new file mode 100644
index 00000000000..934e8d3e6c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr91885.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fprofile-generate" } */
+/* { dg-require-profiling "-fprofile-generate" } */
+
+typedef signed long int __int64_t;
+typedef unsigned long int __uint64_t;
+typedef __int64_t int64_t;
+typedef __uint64_t uint64_t;
+inline void
+BLI_endian_switch_int64 (int64_t *val)
+{
+  uint64_t tval = *val;
+  *val = ((tval >> 56)) | ((tval << 40) & 0x00ff000000000000ll)
+	 | ((tval << 24) & 0x0000ff0000000000ll)
+	 | ((tval << 8) & 0x000000ff00000000ll)
+	 | ((tval >> 8) & 0x00000000ff000000ll)
+	 | ((tval >> 24) & 0x0000000000ff0000ll)
+	 | ((tval >> 40) & 0x000000000000ff00ll) | ((tval << 56));
+}
+typedef struct anim_index_entry
+{
+  unsigned long long seek_pos_dts;
+  unsigned long long pts;
+} anim_index_entry;
+extern struct anim_index_entry *
+MEM_callocN (int);
+struct anim_index
+{
+  int num_entries;
+  struct anim_index_entry *entries;
+};
+struct anim_index *
+IMB_indexer_open (const char *name)
+{
+  char header[13];
+  struct anim_index *idx;
+  int i;
+  idx->entries = MEM_callocN (8);
+  if (((1 == 0) != (header[8] == 'V')))
+{
+  for (i = 0; i < idx->num_entries; i++)
+	{
+	  BLI_endian_switch_int64 ((int64_t *) &idx->entries[i].seek_pos_dts);
+	  BLI_endian_switch_int64 ((int64_t *) &idx->entries[i].pts);
+	}
+}
+}
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index c3004f6f3a2..b3da0a16dc0 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -943,7 +943,7 @@ try_vectorize_loop_1 (hash_table<simduid_to_vf> *&simduid_to_vf_htab,
 	  fold_loop_internal_call (loop_vectorized_call,
    boolean_true_node);
 	  loop_vectorized_call = NULL;
-	  ret |= TODO_cleanup_cfg;
+	  ret |= TODO_cleanup_cfg | TODO_update_ssa;
 	}
 	}
   /* If outer loop vectorization fails for LOOP_VECTORIZED guarded



Re: [PATCH] Fix ICE when __builtin_calloc has no LHS (PR tree-optimization/91014).

2019-09-25 Thread Martin Liška
On 9/24/19 5:57 PM, Jeff Law wrote:
> Sure, and IMHO moving tests like this should be something that can be
> done without explicit ACKs.

Ok, next time I'll not ask for a confirmation ;)

Thanks,
Martin

> 
> jeff



[PATCH][arm] Implement non-GE-setting SIMD32 intrinsics

2019-09-25 Thread Kyrill Tkachov

Hi all,

This patch is part of a series to implement the SIMD32 ACLE intrinsics [1].
The interesting parts implementation-wise involve adding support for setting
and reading the Q bit for saturation and the GE-bits for the packed SIMD
instructions.  That will come in a later patch.

For now, this patch implements the other intrinsics that don't need anything
special; just a mapping from arm_acle.h function to builtin to RTL
expander+unspec.

I've compressed as many as I could with iterators so that we end up needing
only 3 new define_insns.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Will commit to trunk within the next day or two.

Thanks,

Kyrill

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

2019-09-25  Kyrylo Tkachov  

    * config/arm/arm.md (arm_<simd32_op>): New define_insn.
    (arm_<sup>xtb16): Likewise.
    (arm_usada8): Likewise.
    * config/arm/arm_acle.h (__qadd8, __qsub8, __shadd8, __shsub8,
    __uhadd8, __uhsub8, __uqadd8, __uqsub8, __qadd16, __qasx, __qsax,
    __qsub16, __shadd16, __shasx, __shsax, __shsub16, __uhadd16, __uhasx,
    __uhsax, __uhsub16, __uqadd16, __uqasx, __uqsax, __uqsub16, __sxtab16,
    __sxtb16, __uxtab16, __uxtb16): Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/unspecs.md: Define unspecs for the above.
    * config/arm/iterators.md (SIMD32_NOGE_BINOP): New int_iterator.
    (USXTB16): Likewise.
    (simd32_op): New int_attribute.
    (sup): Handle UNSPEC_SXTB16, UNSPEC_UXTB16.
    * doc/sourcebuild.texi (arm_simd32_ok): Document.

2019-09-25  Kyrylo Tkachov  

    * lib/target-supports.exp
    (check_effective_target_arm_simd32_ok_nocache): New procedure.
    (check_effective_target_arm_simd32_ok): Likewise.
    (add_options_for_arm_simd32): Likewise.
    * gcc.target/arm/acle/simd32.c: New test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7f966b952bb2f394bdad2c742f82d143404458a8..d091f6744b5054428fdd11c6c10a4628c7a52d9e 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5058,6 +5058,36 @@
(set_attr "predicable" "yes")]
 )
 
+(define_insn "arm_xtb16"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")] USXTB16))]
+  "TARGET_INT_SIMD"
+  "xtb16%?\\t%0, %1"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_dsp_reg")])
+
+(define_insn "arm_"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")] SIMD32_NOGE_BINOP))]
+  "TARGET_INT_SIMD"
+  "%?\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_dsp_reg")])
+
+(define_insn "arm_usada8"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")
+	   (match_operand:SI 3 "s_register_operand" "r")] UNSPEC_USADA8))]
+  "TARGET_INT_SIMD"
+  "usada8%?\\t%0, %1, %2, %3"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "alu_dsp_reg")])
+
 (define_expand "extendsfdf2"
   [(set (match_operand:DF  0 "s_register_operand")
 	(float_extend:DF (match_operand:SF 1 "s_register_operand")))]
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index 6857ab1787df0ffa672e5078e5a0b9c9cc52e695..9c6f12d556654b094a23a327c030820172a03a4c 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -173,6 +173,238 @@ __arm_mrrc2 (const unsigned int __coproc, const unsigned int __opc1,
 #endif /*  __ARM_ARCH >= 5.  */
 #endif /* (!__thumb__ || __thumb2__) &&  __ARM_ARCH >= 4.  */
 
+#ifdef __ARM_FEATURE_SIMD32
+typedef int32_t int16x2_t;
+typedef uint32_t uint16x2_t;
+typedef int32_t int8x4_t;
+typedef uint32_t uint8x4_t;
+
+__extension__ extern __inline int16x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__sxtab16 (int16x2_t __a, int8x4_t __b)
+{
+  return __builtin_arm_sxtab16 (__a, __b);
+}
+
+__extension__ extern __inline int16x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__sxtb16 (int8x4_t __a)
+{
+  return __builtin_arm_sxtb16 (__a);
+}
+
+__extension__ extern __inline uint16x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__uxtab16 (uint16x2_t __a, uint8x4_t __b)
+{
+  return __builtin_arm_uxtab16 (__a, __b);
+}
+
+__extension__ extern __inline uint16x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__uxtb16 (uint8x4_t __a)
+{
+  return __builtin_arm_uxtb16 (__a);
+}
+
+__extension__ extern __inline int8x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__qadd8 (int8x4_t __a, int8x4_t __b)
+{
+  return __builtin_arm_qadd8 (__a, __b);
+}
+
+__extension__ extern __inline int8x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__qsub8 (int8x4_t __a, int8x4_t __b)
+{
+  return __builtin_arm_qsub8 (__

[PATCH][arm] Implement DImode SIMD32 intrinsics

2019-09-25 Thread Kyrill Tkachov

Hi all,

This patch implements some more SIMD32, but these ones have a DImode 
result+addend.

Apart from that there's nothing too exciting about them.
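
As a usage sketch (my illustration, assuming a target with
__ARM_FEATURE_SIMD32; not part of the patch):

  #include <arm_acle.h>

  int64_t
  acc_dual_mac (int16x2_t a, int16x2_t b, int64_t acc)
  {
    /* Dual 16x16 multiply-accumulate into a 64-bit accumulator:
       smlald rdlo, rdhi, rn, rm.  */
    return __smlald (a, b, acc);
  }

The 64-bit accumulator lives in a core register pair, which is why the
define_insn below ties operand 3 to operand 0 with the "0" constraint and
prints the result with the %Q/%R register-pair modifiers.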

Bootstrapped and tested on arm-none-linux-gnueabihf.

Will commit to trunk within the next day or two.

Thanks,
Kyrill

2019-09-25  Kyrylo Tkachov  

    * config/arm/arm.md (arm_<simd32_op>): New define_insn.
    * config/arm/arm_acle.h (__smlald, __smlaldx, __smlsld, __smlsldx):
    Define.
    * config/arm/arm_acle_builtins.def: Define builtins for the above.
    * config/arm/iterators.md (SIMD32_DIMODE): New int_iterator.
    (simd32_op): Handle the above.
    * config/arm/unspecs.md: Define unspecs for the above.

2019-09-25  Kyrylo Tkachov  

    * gcc.target/arm/acle/simd32.c: Update test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index d091f6744b5054428fdd11c6c10a4628c7a52d9e..b6db276b87f597ff7611060b9f856311dcd3b98a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5088,6 +5088,17 @@
   [(set_attr "predicable" "yes")
(set_attr "type" "alu_dsp_reg")])
 
+(define_insn "arm_"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(unspec:DI
+	  [(match_operand:SI 1 "s_register_operand" "r")
+	   (match_operand:SI 2 "s_register_operand" "r")
+	   (match_operand:DI 3 "s_register_operand" "0")] SIMD32_DIMODE))]
+  "TARGET_INT_SIMD"
+  "%?\\t%Q0, %R0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "smlald")])
+
 (define_expand "extendsfdf2"
   [(set (match_operand:DF  0 "s_register_operand")
 	(float_extend:DF (match_operand:SF 1 "s_register_operand")))]
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index 9c6f12d556654b094a23a327c030820172a03a4c..248a355d00239a8724e46b9203c818906a4d4908 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -403,8 +403,37 @@ __usada8 (uint8x4_t __a, uint8x4_t __b, uint32_t __c)
   return __builtin_arm_usada8 (__a, __b, __c);
 }
 
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlald (int16x2_t __a, int16x2_t __b, int64_t __c)
+{
+  return __builtin_arm_smlald (__a, __b, __c);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlaldx (int16x2_t __a, int16x2_t __b, int64_t __c)
+{
+  return __builtin_arm_smlaldx (__a, __b, __c);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsld (int16x2_t __a, int16x2_t __b, int64_t __c)
+{
+  return __builtin_arm_smlsld (__a, __b, __c);
+}
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__smlsldx (int16x2_t __a, int16x2_t __b, int64_t __c)
+{
+  return __builtin_arm_smlsldx (__a, __b, __c);
+}
+
 #endif
 
+
 #pragma GCC push_options
 #ifdef __ARM_FEATURE_CRC32
 #ifdef __ARM_FP
diff --git a/gcc/config/arm/arm_acle_builtins.def b/gcc/config/arm/arm_acle_builtins.def
index c675fc46dae6552b8762e9bbb6147d8a6d15133a..0021c0036ad7e1bddef6553a900c9eaf145037b6 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -75,3 +75,7 @@ VAR1 (BINOP, smusd, si)
 VAR1 (BINOP, smusdx, si)
 VAR1 (UBINOP, usad8, si)
 VAR1 (UBINOP, usada8, si)
+VAR1 (TERNOP, smlald, di)
+VAR1 (TERNOP, smlaldx, di)
+VAR1 (TERNOP, smlsld, di)
+VAR1 (TERNOP, smlsldx, di)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 538f5bf6b0116f49b27eef589b0140aa7792e976..8c9f7121951ba319fcb6cf4c73e186f3764917c2 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -443,6 +443,9 @@
  UNSPEC_UQSUB16 UNSPEC_SMUSD UNSPEC_SMUSDX
  UNSPEC_SXTAB16 UNSPEC_UXTAB16 UNSPEC_USAD8])
 
+(define_int_iterator SIMD32_DIMODE [UNSPEC_SMLALD UNSPEC_SMLALDX
+				    UNSPEC_SMLSLD UNSPEC_SMLSLDX])
+
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
 (define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
@@ -1051,7 +1054,9 @@
 			(UNSPEC_UQSAX "uqsax") (UNSPEC_UQSUB16 "uqsub16")
 			(UNSPEC_SMUSD "smusd") (UNSPEC_SMUSDX "smusdx")
 			(UNSPEC_SXTAB16 "sxtab16") (UNSPEC_UXTAB16 "uxtab16")
-			(UNSPEC_USAD8 "usad8")])
+			(UNSPEC_USAD8 "usad8") (UNSPEC_SMLALD "smlald")
+			(UNSPEC_SMLALDX "smlaldx") (UNSPEC_SMLSLD "smlsld")
+			(UNSPEC_SMLSLDX "smlsldx")])
 
 ;; Both kinds of return insn.
 (define_code_iterator RETURNS [return simple_return])
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 08a6cd77ce08d8c9cf42abcf3c9277b769043cfd..78f88d5fa09f424a9ab638053cc4fe068aa19368 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -123,6 +123,10 @@
   UNSPEC_SMUSDX		; Represent the SMUSDX operation.
   UNSPEC_USAD8		; Represent the USAD8 operation.
   UNSPEC_USADA8		; Represent the USADA8 operation.
+  UNSPEC_SMLALD		; Represent the SMLALD operation.
+  UNSPEC_SMLALDX	; Represent the SMLALDX operation.
+  UNSPEC_SMLS

[PATCH] Fix PR91896

2019-09-25 Thread Richard Biener


Removing operand swapping for reduction vectorization has enabled
more pattern recognition which in turn triggers a latent bug
in reduction vectorization.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-09-25  Richard Biener  

PR tree-optimization/91896
* tree-vect-loop.c (vectorizable_reduction): The single
def-use cycle optimization cannot apply when there's more
than one pattern stmt involved.

* gcc.dg/torture/pr91896.c: New testcase.

Index: gcc/testsuite/gcc.dg/torture/pr91896.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/pr91896.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr91896.c  (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+unsigned int
+zj (unsigned int et)
+{
+  signed char jr = 0;
+
+  do {
+    et *= 3;
+    jr += 2;
+  } while (jr >= 0);
+
+  if (et == (unsigned int) jr)
+    et = 0;
+
+  return et;
+}
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 276120)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -6101,6 +6101,8 @@ vectorizable_reduction (stmt_vec_info st
   if (ncopies > 1
  && STMT_VINFO_RELEVANT (reduc_stmt_info) <= vect_used_only_live
  && (use_stmt_info = loop_vinfo->lookup_single_use (phi_result))
+ && (!STMT_VINFO_IN_PATTERN_P (use_stmt_info)
+ || !STMT_VINFO_PATTERN_DEF_SEQ (use_stmt_info))
  && vect_stmt_to_vectorize (use_stmt_info) == reduc_stmt_info)
single_defuse_cycle = true;
 
@@ -6868,6 +6870,8 @@ vectorizable_reduction (stmt_vec_info st
   if (ncopies > 1
   && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
   && (use_stmt_info = loop_vinfo->lookup_single_use (reduc_phi_result))
+  && (!STMT_VINFO_IN_PATTERN_P (use_stmt_info)
+ || !STMT_VINFO_PATTERN_DEF_SEQ (use_stmt_info))
   && vect_stmt_to_vectorize (use_stmt_info) == stmt_info)
 {
   single_defuse_cycle = true;


Re: [PATCH v4] Missed function specialization + partial devirtualization

2019-09-25 Thread Martin Liška
On 9/25/19 5:45 AM, luoxhu wrote:
> Hi,
> 
> Sorry for replying so late due to cauldron conference and other LTO issues
> I was working on.

Hello.

That's fine, we still have plenty of time for patch review.

Not fixed issues which I reported in v3 (and still valid in v4):
- please come up with indirect_target_info::indirect_target_info and use it
- do you need to stream out indirect_call_targets when common_target_id == 0?

Then I'm suggesting to use vec::is_empty (please see my patch).

I see following failures for the tests provided:
FAIL: gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c compilation,  -fprofile-generate -D_PROFILE_GENERATE
FAIL: gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c compilation,  -fprofile-generate -D_PROFILE_GENERATE
FAIL: gcc.dg/tree-prof/indir-call-prof-topn.c compilation,  -fprofile-generate -D_PROFILE_GENERATE

Next comments follow directly in the email body:

> 
> v4 Changes:
>  1. Rebase to trunk.
>  2. Remove num_of_ics and use vector's length to avoid redundancy.
>  3. Update the code in ipa-profile.c to improve review feasibility.
>  4. Add function has_indirect_call_p and has_multiple_indirect_call_p.
>  5. For parameter control, I will leave it to the next patch as it is a
> relatively independent feature.  Currently, the maximum number of
> promotions is GCOV_TOPN_VALUES, as only 4 profiling values are provided
> by profile-generate; therefore the minimum probability is adjusted to
> 25% in value-prof.c (it was hard-coded to 75% for the single indirect
> target case).  There is no control of the minimal number of edge
> executions yet.  What's more, this patch is a bit large now.
> 
> This patch aims to fix PR69678 caused by PGO indirect call profiling
> performance issues.
> The bug that profiling data was never working was fixed by Martin's pull
> back of the topN patches; performance got a GEOMEAN ~1% improvement (+24%
> for 511.povray_r specifically).
> Still, the default profile currently only generates a SINGLE indirect
> target that is called more than 75% of the time.  This patch leverages
> MULTIPLE indirect targets in the LTO-WPA and LTO-LTRANS stages; as a
> result, function specialization, profiling, partial devirtualization,
> inlining and cloning can be done successfully based on it.
> Performance can get improved from 0.70 sec to 0.38 sec on simple tests.
> Details are:
>   1.  PGO with topn is enabled by default now, but only one indirect
>   target edge will be generated in the ipa-profile pass, so add variables
>   to enable multiple speculative edges through passes: speculative_id will
>   record the direct edge index bound to the indirect edge,
>   indirect_call_targets length records how many direct edges are owned by
>   the indirect edge, and gimple_ic is postponed to ipa-profile as by
>   default the inline pass will decide whether it is beneficial to
>   transform the indirect call.
>   2.  Use speculative_id to track and search the reference node matched
>   with the direct edge's callee for multiple targets.  Actually, it is the
>   caller's responsibility to handle the direct edges mapped to the same
>   indirect edge.  speculative_call_info will return one of the direct
>   edges specified; this will leverage the current IPA edge processing
>   framework mostly.
>   3.  Enable LTO WPA/LTRANS stage multiple indirect call target analysis
>   for full profile support in ipa passes and cgraph_edge functions.
>   speculative_id can be set by make_speculative when multiple targets are
>   bound to one indirect edge, and cloned if a new edge is cloned.
>   speculative_id is streamed out and streamed in by lto like lto_stmt_uid.
>   4.  Add 1 in-module testcase and 2 cross-module testcases.
>   5.  Bootstrap and regression test passed on Power8-LE.  No function
>   and performance regression for SPEC2017.
> 
> gcc/ChangeLog
> 
>   2019-09-25  Xiong Hu Luo  
> 
>   PR ipa/69678
>   * cgraph.c (symbol_table::create_edge): Init speculative_id.
>   (cgraph_edge::make_speculative): Add param for setting speculative_id.
>   (cgraph_edge::speculative_call_info): Find reference by
>   speculative_id for multiple indirect targets.
>   (cgraph_edge::resolve_speculation): Decrease the speculations
>   for indirect edge, drop it's speculative if not direct target
>   left.
>   (cgraph_edge::redirect_call_stmt_to_callee): Likewise.
>   (cgraph_node::verify_node): Don't report error if speculative
>   edge not include statement.
>   (cgraph_edge::has_multiple_indirect_call_p): New function.
>   (cgraph_edge::has_indirect_call_p): New function.
>   * cgraph.h (struct indirect_target_info): New struct.
>   (indirect_call_targets): New vector variable.
>   (make_speculative): Add param for setting speculative_id.
>   (cgraph_edge::has_multiple_indirect_call_p): New declare.
>   (cgraph_edge::has_indirect_call_p): New declare.
>   (speculative_id): New variable.
>   * cgraphclones.c (cgraph_node::create_clone): Clone speculative_id.
>   * ip

Re: [PATCH] driver: Also prune joined switches with negation

2019-09-25 Thread Kyrill Tkachov



On 9/24/19 7:47 PM, Matt Turner wrote:

When -march=native is passed via host_detect_local_cpu to the backend,
it overrides all command-line options after it.  That means

$ gcc -march=native -march=armv8-a

is treated as

$ gcc -march=armv8-a -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override a previous -march=native on the command line.

This is the same fix as was applied for i386 in SVN revision 269164, but
for aarch64 and arm.

gcc/

    PR driver/69471
    * config/aarch64/aarch64.opt (march=): Add Negative(march=).
    (mtune=): Add Negative(mtune=). (mcpu=): Add Negative(mcpu=).
    * config/arm/arm.opt: Likewise.



Thanks.

This is ok for arm.  LGTM for aarch64 but you'll need an aarch64
maintainer to approve.

I've bootstrapped and tested this patch on arm-none-linux-gnueabihf and
aarch64-none-linux-gnu and there's no fallout.

I can commit it for you once the aarch64 part is approved.

Kyrill



---
 gcc/config/aarch64/aarch64.opt | 6 +++---
 gcc/config/arm/arm.opt | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 865b6a6d8ca..fc43428b32a 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -119,15 +119,15 @@ EnumValue
 Enum(aarch64_tls_size) String(48) Value(48)

 march=
-Target RejectNegative ToLower Joined Var(aarch64_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
 Use features of architecture ARCH.

 mcpu=
-Target RejectNegative ToLower Joined Var(aarch64_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(aarch64_cpu_string)
 Use features of and optimize for CPU.

 mtune=
-Target RejectNegative ToLower Joined Var(aarch64_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(aarch64_tune_string)
 Optimize for CPU.

 mabi=
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 452f0cf6d67..76c10ab62a2 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ mapcs-stack-check
 Target Report Mask(APCS_STACK) Undocumented

 march=
-Target RejectNegative ToLower Joined Var(arm_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
 Specify the name of the target architecture.

 ; Other arm_arch values are loaded from arm-tables.opt
@@ -107,7 +107,7 @@ Target Report Mask(CALLER_INTERWORKING)
 Thumb: Assume function pointers may go to non-Thumb aware code.

 mcpu=
-Target RejectNegative ToLower Joined Var(arm_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(arm_cpu_string)
 Specify the name of the target CPU.

 mfloat-abi=
@@ -232,7 +232,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
 Thumb: Generate (leaf) stack frames even if not needed.

 mtune=
-Target RejectNegative ToLower Joined Var(arm_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
 Tune code for the given processor.

 mprint-tune-info
--
2.21.0



[PATCH] Implement LWG 3296 for basic_regex::assign

2019-09-25 Thread Jonathan Wakely

* include/bits/regex.h
(basic_regex::assign(const C*, size_t, flag_type)): Add default
argument (LWG 3296).
* testsuite/28_regex/basic_regex/assign/char/lwg3296.cc: New test.
* testsuite/28_regex/basic_regex/assign/wchar_t/lwg3296.cc: New test.

Tested x86_64-linux, committed to trunk.

commit b4ad7c7f6854ced87e46332da5ea49107cfc5318
Author: Jonathan Wakely 
Date:   Wed Sep 25 13:01:04 2019 +0100

Implement LWG 3296 for basic_regex::assign

* include/bits/regex.h
(basic_regex::assign(const C*, size_t, flag_type)): Add default
argument (LWG 3296).
* testsuite/28_regex/basic_regex/assign/char/lwg3296.cc: New test.
* testsuite/28_regex/basic_regex/assign/wchar_t/lwg3296.cc: New 
test.

diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index b30b41a0759..7869c3fd1c1 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -628,8 +628,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
* expression pattern interpreted according to @p __flags.  If
* regex_error is thrown, *this remains unchanged.
*/
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3296. Inconsistent default argument for basic_regex<>::assign
   basic_regex&
-  assign(const _Ch_type* __p, std::size_t __len, flag_type __flags)
+  assign(const _Ch_type* __p, size_t __len, flag_type __flags = ECMAScript)
   { return this->assign(string_type(__p, __len), __flags); }
 
   /**
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/lwg3296.cc b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/lwg3296.cc
new file mode 100644
index 00000000000..29256bbbf03
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/char/lwg3296.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <regex>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  std::regex r("", std::regex_constants::grep);
+  r.assign("(.)[", 3);  // LWG 3296
+  VERIFY( r.flags() == std::regex_constants::ECMAScript );
+  VERIFY( r.mark_count() == 1 );
+}
+
+int
+main()
+{
+  test01();
+}
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/assign/wchar_t/lwg3296.cc b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/wchar_t/lwg3296.cc
new file mode 100644
index 00000000000..302ebd6b4f9
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/assign/wchar_t/lwg3296.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-do run { target c++11 } }
+
+#include <regex>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  std::wregex r(L"", std::regex_constants::grep);
+  r.assign(L"(.)[", 3);  // LWG 3296
+  VERIFY( r.flags() == std::regex_constants::ECMAScript );
+  VERIFY( r.mark_count() == 1 );
+}
+
+int
+main()
+{
+  test01();
+}


Re: [PATCH v2] [AARCH64] Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC

2019-09-25 Thread Kyrill Tkachov

Hi all,

On 9/3/19 9:35 AM, Shaokun Zhang wrote:

The DCache clean & ICache invalidation requirements for
instruction-to-data coherence are discoverable through new fields in
CTR_EL0.  Let's support the two bits: if they are set, the CPU core
will not execute the unnecessary DCache clean or ICache invalidation
instructions.

2019-09-03  Shaokun Zhang 

    * config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC in
__aarch64_sync_cache_range function.



James has approved this offline, so I've committed it on Shaokun's 
behalf with r276122 with a slightly adjusted ChangeLog.


2019-09-25  Shaokun Zhang  

    * config/aarch64/sync-cache.c (__aarch64_sync_cache_range): Add
    support for CTR_EL0.IDC and CTR_EL0.DIC.

Thanks,

Kyrill


---
 libgcc/config/aarch64/sync-cache.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/aarch64/sync-cache.c b/libgcc/config/aarch64/sync-cache.c
index 791f5e42ff44..ea3da4be02b3 100644
--- a/libgcc/config/aarch64/sync-cache.c
+++ b/libgcc/config/aarch64/sync-cache.c
@@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along with this program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */

+#define CTR_IDC_SHIFT   28
+#define CTR_DIC_SHIFT   29
+
 void __aarch64_sync_cache_range (const void *, const void *);

 void
@@ -41,32 +44,44 @@ __aarch64_sync_cache_range (const void *base, const void *end)

   icache_lsize = 4 << (cache_info & 0xF);
   dcache_lsize = 4 << ((cache_info >> 16) & 0xF);

-  /* Loop over the address range, clearing one cache line at once.
- Data cache must be flushed to unification first to make sure the
- instruction cache fetches the updated data.  'end' is exclusive,
- as per the GNU definition of __clear_cache.  */
+  /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of Unification is
+     not required for instruction to data coherence.  */
+
+  if (((cache_info >> CTR_IDC_SHIFT) & 0x1) == 0x0) {
+    /* Loop over the address range, clearing one cache line at once.
+   Data cache must be flushed to unification first to make sure the
+   instruction cache fetches the updated data.  'end' is exclusive,
+   as per the GNU definition of __clear_cache.  */

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));

-  for (; address < (const char *) end; address += dcache_lsize)
-    asm volatile ("dc\tcvau, %0"
- :
- : "r" (address)
- : "memory");
+    for (; address < (const char *) end; address += dcache_lsize)
+  asm volatile ("dc\tcvau, %0"
+   :
+   : "r" (address)
+   : "memory");
+  }

   asm volatile ("dsb\tish" : : : "memory");

-  /* Make the start address of the loop cache aligned.  */
-  address = (const char*) ((__UINTPTR_TYPE__) base
-  & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
+  /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
+     Unification is not required for instruction to data coherence.  */
+
+  if (((cache_info >> CTR_DIC_SHIFT) & 0x1) == 0x0) {
+    /* Make the start address of the loop cache aligned. */
+    address = (const char*) ((__UINTPTR_TYPE__) base
+    & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
+
+    for (; address < (const char *) end; address += icache_lsize)
+  asm volatile ("ic\tivau, %0"
+   :
+   : "r" (address)
+   : "memory");

-  for (; address < (const char *) end; address += icache_lsize)
-    asm volatile ("ic\tivau, %0"
- :
- : "r" (address)
- : "memory");
+    asm volatile ("dsb\tish" : : : "memory");
+  }

-  asm volatile ("dsb\tish; isb" : : : "memory");
+  asm volatile("isb" : : : "memory");
 }
--
2.7.4



Re: [C++ Patch] Use DECL_SOURCE_LOCATION more in name-lookup.c

2019-09-25 Thread Jason Merrill

On 9/24/19 3:34 PM, Marek Polacek wrote:

On Tue, Sep 24, 2019 at 09:07:03PM +0200, Paolo Carlini wrote:

Hi,

Marek's recent fix prompted an audit of name-lookup.c and I found a few
additional straightforward places where we should use a more accurate
location. Tested x86_64-linux.

Thanks, Paolo.

/cp
2019-09-24  Paolo Carlini  

* name-lookup.c (check_extern_c_conflict): Use DECL_SOURCE_LOCATION.
(check_local_shadow): Use it in three additional places.

/testsuite
2019-09-24  Paolo Carlini  

* g++.dg/diagnostic/redeclaration-1.C: New.
* g++.dg/lookup/extern-c-hidden.C: Test location(s) too.
* g++.dg/lookup/extern-c-redecl.C: Likewise.
* g++.dg/lookup/extern-c-redecl6.C: Likewise.
* g++.old-deja/g++.other/using9.C: Likewise.


LGTM.

Marek


OK.

Jason




Re: [PATCH][AArch64] Use implementation namespace consistently in arm_neon.h

2019-09-25 Thread Kyrill Tkachov

Hi all,

On 2/6/19 1:52 PM, Kyrill Tkachov wrote:

[resending with patch compressed]

Hi all,

We're somewhat inconsistent in arm_neon.h when it comes to using the 
implementation namespace for local

identifiers. This means things like:
#define hash_abcd 0
#define hash_e 1
#define wk 2

#include "arm_neon.h"

uint32x4_t
foo (uint32x4_t a, uint32_t b, uint32x4_t c)
{
  return vsha1cq_u32 (a, b, c);
}

don't compile.
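
The underlying reason (a reconstruction for illustration; the exact
pre-patch header text may differ slightly) is that intrinsic parameters
used unreserved names, so a preceding user macro rewrites the header
itself:

__extension__ extern __inline uint32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vsha1cq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
{
  /* ... call the corresponding builtin ...  */
}

With "#define wk 2" in effect the parameter list becomes "uint32x4_t 2",
a syntax error.  Renaming such locals to the reserved spellings
__hash_abcd, __hash_e and __wk removes the clash.
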
This patch fixes these issues throughout the whole of arm_neon.h
Bootstrapped and tested on aarch64-none-linux-gnu.
The advsimd-intrinsics.exp tests pass just fine.

Don't feel sorry for me having to write the ChangeLog. 
./contrib/mklog.pl automated the whole thing.




James has approved this offline so I've committed it with r276125.

Thanks,

Kyrill



Ok for trunk?
Thanks,
Kyrill

2019-02-06  Kyrylo Tkachov  

    * config/aarch64/arm_neon.h (vaba_s8): Use __ in identifiers
    consistently.
    (vaba_s16): Likewise.
    (vaba_s32): Likewise.
    (vaba_u8): Likewise.
    (vaba_u16): Likewise.
    (vaba_u32): Likewise.
    (vabal_high_s8): Likewise.
    (vabal_high_s16): Likewise.
    (vabal_high_s32): Likewise.
    (vabal_high_u8): Likewise.
    (vabal_high_u16): Likewise.
    (vabal_high_u32): Likewise.
    (vabal_s8): Likewise.
    (vabal_s16): Likewise.
    (vabal_s32): Likewise.
    (vabal_u8): Likewise.
    (vabal_u16): Likewise.
    (vabal_u32): Likewise.
    (vabaq_s8): Likewise.
    (vabaq_s16): Likewise.
    (vabaq_s32): Likewise.
    (vabaq_u8): Likewise.
    (vabaq_u16): Likewise.
    (vabaq_u32): Likewise.
    (vabd_s8): Likewise.
    (vabd_s16): Likewise.
    (vabd_s32): Likewise.
    (vabd_u8): Likewise.
    (vabd_u16): Likewise.
    (vabd_u32): Likewise.
    (vabdl_high_s8): Likewise.
    (vabdl_high_s16): Likewise.
    (vabdl_high_s32): Likewise.
    (vabdl_high_u8): Likewise.
    (vabdl_high_u16): Likewise.
    (vabdl_high_u32): Likewise.
    (vabdl_s8): Likewise.
    (vabdl_s16): Likewise.
    (vabdl_s32): Likewise.
    (vabdl_u8): Likewise.
    (vabdl_u16): Likewise.
    (vabdl_u32): Likewise.
    (vabdq_s8): Likewise.
    (vabdq_s16): Likewise.
    (vabdq_s32): Likewise.
    (vabdq_u8): Likewise.
    (vabdq_u16): Likewise.
    (vabdq_u32): Likewise.
    (vaddlv_s8): Likewise.
    (vaddlv_s16): Likewise.
    (vaddlv_u8): Likewise.
    (vaddlv_u16): Likewise.
    (vaddlvq_s8): Likewise.
    (vaddlvq_s16): Likewise.
    (vaddlvq_s32): Likewise.
    (vaddlvq_u8): Likewise.
    (vaddlvq_u16): Likewise.
    (vaddlvq_u32): Likewise.
    (vcvtx_f32_f64): Likewise.
    (vcvtx_high_f32_f64): Likewise.
    (vcvtxd_f32_f64): Likewise.
    (vmla_n_f32): Likewise.
    (vmla_n_s16): Likewise.
    (vmla_n_s32): Likewise.
    (vmla_n_u16): Likewise.
    (vmla_n_u32): Likewise.
    (vmla_s8): Likewise.
    (vmla_s16): Likewise.
    (vmla_s32): Likewise.
    (vmla_u8): Likewise.
    (vmla_u16): Likewise.
    (vmla_u32): Likewise.
    (vmlal_high_n_s16): Likewise.
    (vmlal_high_n_s32): Likewise.
    (vmlal_high_n_u16): Likewise.
    (vmlal_high_n_u32): Likewise.
    (vmlal_high_s8): Likewise.
    (vmlal_high_s16): Likewise.
    (vmlal_high_s32): Likewise.
    (vmlal_high_u8): Likewise.
    (vmlal_high_u16): Likewise.
    (vmlal_high_u32): Likewise.
    (vmlal_n_s16): Likewise.
    (vmlal_n_s32): Likewise.
    (vmlal_n_u16): Likewise.
    (vmlal_n_u32): Likewise.
    (vmlal_s8): Likewise.
    (vmlal_s16): Likewise.
    (vmlal_s32): Likewise.
    (vmlal_u8): Likewise.
    (vmlal_u16): Likewise.
    (vmlal_u32): Likewise.
    (vmlaq_n_f32): Likewise.
    (vmlaq_n_s16): Likewise.
    (vmlaq_n_s32): Likewise.
    (vmlaq_n_u16): Likewise.
    (vmlaq_n_u32): Likewise.
    (vmlaq_s8): Likewise.
    (vmlaq_s16): Likewise.
    (vmlaq_s32): Likewise.
    (vmlaq_u8): Likewise.
    (vmlaq_u16): Likewise.
    (vmlaq_u32): Likewise.
    (vmls_n_f32): Likewise.
    (vmls_n_s16): Likewise.
    (vmls_n_s32): Likewise.
    (vmls_n_u16): Likewise.
    (vmls_n_u32): Likewise.
    (vmls_s8): Likewise.
    (vmls_s16): Likewise.
    (vmls_s32): Likewise.
    (vmls_u8): Likewise.
    (vmls_u16): Likewise.
    (vmls_u32): Likewise.
    (vmlsl_high_n_s16): Likewise.
    (vmlsl_high_n_s32): Likewise.
    (vmlsl_high_n_u16): Likewise.
    (vmlsl_high_n_u32): Likewise.
    (vmlsl_high_s8): Likewise.
    (vmlsl_high_s16): Likewise.
    (vmlsl_high_s32): Likewise.
    (vmlsl_high_u8): Likewise.
    (vmlsl_high_u16): Likewise.
    (vmlsl_high_u32): Likewise.
    (vmlsl_n_s16): Likewise.
    (vmlsl_n_s32): Likewise.
    (vmlsl_n_u16): Likewise.
    (vmlsl_n_u32): Likewise.
    (vmlsl_s8): Likewise.
    (vmlsl_s16): Likewise.
    (vmlsl_s32): Likewise.
    (vmlsl_u8): Likewise.
    (vmlsl_u16): Likewise.
    (vmlsl_u32): Likewise.
    (vmlsq_n_f32): Likewise.
    (vmlsq_n_s16): Likewise.
    (vmlsq_n_s32): Likewise.
    (vmlsq_n_u16): Likewise.
    (vmlsq_n_u32): Likewise.
    (vmlsq_s8): Likewise.
    (vmlsq_s16): Likewise.
    (vmlsq_s32): Likewise.
    (vmlsq_u8): Likewise.
    (vmlsq_u16): Likewise.
    (vmlsq_u32): L

[PATCH] Retain TYPE_MODE more often for BIT_FIELD_REFs in get_inner_reference

2019-09-25 Thread Richard Biener


BIT_FIELD_REFs can extract almost any kind of type but are specifically
used to extract vector elements (that very special case is handled)
and also sub-vectors (that case is missing).  RTL expansion of stores
relies on an appropriate mode to use vector stores, it seems.

The following patch relaxes the condition under which we force
VOIDmode by making all non-integral types where the extraction
size matches the type size (thus isn't "bitfieldish") use the
mode of the extraction type.

The patch leaves alone things like QImode extracts from SImode
since that would need to check the offset as well whereas I
assume we cannot extract non-INTEGRAL entities at non-byte
aligned offsets(?).
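
Concretely (an illustrative GIMPLE shape matching the testcase below, not
dumped output), the V16DF multiply is stored back through sub-vector
BIT_FIELD_REFs such as

  BIT_FIELD_REF <*res_5(D), 256, 0> = vect__6.8_24;

The extraction size (256 bits, a V4DF) equals the size of the extraction
type, so it isn't "bitfieldish"; with the patch the reference keeps
V4DFmode instead of VOIDmode and expansion can use a single 32-byte
vector store.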

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

2019-09-25  Richard Biener  

PR middle-end/91897
* expr.c (get_inner_reference): For BIT_FIELD_REF with
non-integral type and matching access size retain the original
mode.

* gcc.target/i386/pr91897.c: New testcase.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 276123)
+++ gcc/expr.c  (working copy)
@@ -7232,8 +7232,9 @@ get_inner_reference (tree exp, poly_int64 *pbitsize,
 
   /* For vector types, with the correct size of access, use the mode of
 inner type.  */
-  if (TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0))) == VECTOR_TYPE
- && TREE_TYPE (exp) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0)))
+  if (((TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0))) == VECTOR_TYPE
+       && TREE_TYPE (exp) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))))
+  || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
  && tree_int_cst_equal (size_tree, TYPE_SIZE (TREE_TYPE (exp
 mode = TYPE_MODE (TREE_TYPE (exp));
 }
Index: gcc/testsuite/gcc.target/i386/pr91897.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr91897.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr91897.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+typedef double Double16 __attribute__((vector_size(8*16)));
+
+void mult(Double16 *res, const Double16 *v1, const Double16 *v2)
+{
+  *res = *v1 * *v2;
+}
+
+/* We want 4 ymm loads and 4 ymm stores.  */
+/* { dg-final { scan-assembler-times "movapd" 8 } } */


[PATCH] Remove newly unused function and variable in tree-sra.c

2019-09-25 Thread Martin Jambor
Hi,

Martin and his clang warnings discovered that I forgot to remove a
static inline function and a variable when ripping out the old IPA-SRA
from tree-sra.c, and both are now unused.  Thus I am removing them now
with the patch below, which I will commit as obvious (after including
it in a round of bootstrap and testing on x86_64-linux).

Thanks,

Martin

2019-09-25  Martin Jambor  

* tree-sra.c (no_accesses_p): Remove.
(no_accesses_representant): Likewise.
---
 gcc/tree-sra.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 48589323a1e..ba6d5406587 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -327,17 +327,6 @@ static struct obstack name_obstack;
propagated to their assignment counterparts. */
 static struct access *work_queue_head;
 
-/* Representative of no accesses at all. */
-static struct access  no_accesses_representant;
-
-/* Predicate to test the special value.  */
-
-static inline bool
-no_accesses_p (struct access *access)
-{
-  return access == &no_accesses_representant;
-}
-
 /* Dump contents of ACCESS to file F in a human friendly way.  If GRP is true,
representative fields are dumped, otherwise those which only describe the
individual access are.  */
-- 
2.23.0



[PATCH] Fix continue condition in IPA-SRA's process_scan_results

2019-09-25 Thread Martin Jambor
Hi,

On Tue, Sep 24 2019, Martin Jambor wrote:
>
>
> It is the correct thing to do, sorry for the breakage.  I have to run
> now but will prepare a patch tomorrow.
>

and here it is.  The patch fixes the thinko explained in my email
yesterday - basically the test for locally_unused was intended for
unused aggregates which have however not been marked as such yet, and
going this way for unsplittable but unused register-type parameters may
cause problems in some cases, if they are for example big SVE vectors.

Passed bootstrap and testing on x86_64-linux.  OK for trunk?

Thanks,

Martin


2019-09-25  Martin Jambor  

* ipa-sra.c (process_scan_results): Fix continue condition.
---
 gcc/ipa-sra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index 0ccebbd4607..b35fff69472 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -2239,7 +2239,7 @@ process_scan_results (cgraph_node *node, struct function *fun,
desc_index++, parm = DECL_CHAIN (parm))
 {
   gensum_param_desc *desc = &(*param_descriptions)[desc_index];
-  if (!desc->locally_unused && !desc->split_candidate)
+  if (!desc->split_candidate)
continue;
 
   if (flag_checking)
-- 
2.23.0



Re: [PATCH][arm][committed] Fix use of CRC32 intrinsics with Armv8-a and hard-float

2019-09-25 Thread Kyrill Tkachov



On 8/22/19 4:53 PM, Kyrill Tkachov wrote:

Hi all,

We currently have a nasty error when trying to use the __crc*
intrinsics with -mfloat-abi=hard.
That is because the target pragma guarding them uses armv8-a+crc, which
does not include fp by default.

So we get errors like:
error: '-mfloat-abi=hard': selected processor lacks an FPU

This patch fixes that by using an FP-enabled arch target pragma to 
guard these intrinsics when floating-point is available.

That way both the softfloat and hardfloat variants work.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk. Will backport to branches later.

Now backported to GCC 9 and 8 branches after bootstrapping and testing 
there.


Will do GCC 7 once I get more testing cycles.

Thanks,

Kyrill



Thanks,
Kyrill

2019-08-22  Kyrylo Tkachov 

    * config/arm/arm_acle.h: Use arch=armv8-a+crc+simd pragma for CRC32
    intrinsics if __ARM_FP.
    Use __ARM_FEATURE_CRC32 ifdef guard.

2019-08-22  Kyrylo Tkachov 

    * gcc.target/arm/acle/crc_hf_1.c: New test.



[PR 91853] Prevent IPA-SRA ICEs on type-mismatched calls

2019-09-25 Thread Martin Jambor
Hi,

PR 91853 and its duplicate PR 91894 show that IPA-SRA can stumble when
presented with code with mismatched types, whether because it is K&R C
or because it happens through an originally indirect call (or probably
also because of LTO).

The problem is that we try to work with a register value - in this case
an integer constant - like if it was a pointer to a structure and try to
dereference it in the caller, leading to expressions like ADDR_EXPR of a
constant zero.  Old IPA-SRA dealt with these simply by checking type
compatibility which is difficult in an LTO-capable IPA pass, basically
we would at least have to remember and stream a bitmap for each call
telling which arguments are pointers which looks a bit excessive given
that we just don't want to ICE.

So this patch attempts to deal with the situation rather than avoid it.
When an integer is used instead of a pointer, there is some chance that
it actually contains the pointer value and so I create a NOP_EXPR to
convert it to a pointer (which in the testcase is actually a widening
conversion).  For other register types, I don't bother and simply pull
an undefined pointer default definition SSA name and use that.  I wonder
whether I should somehow warn as well.  Hopefully there is no code doing
that that can conceivably work - maybe someone coding for x86_16 and
passing a vector of integers as a segment and offset pointer? :-)
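
As a sketch of what the new code emits (hypothetical SSA names and clone
name; compare the INTEGRAL_TYPE_P branch in the hunk below), for a call
like fn1 (0, winInfo) in the testcase, where an integer stands in for the
split pointer parameter, the caller now converts first and then
dereferences:

  _7 = (void *) winInfo_3(D);
  _8 = MEM[(int *)_7 + 4B];
  fn1.isra.0 (_8);

instead of trying to build an ADDR_EXPR of the integer value.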

What do people think?  In any event, this patch passed bootstrap and
testing and deals with the issue, so if it is OK, I'd like to commit it
to trunk.

Martin



2019-09-23  Martin Jambor  

PR ipa/91853
* ipa-param-manipulation.c (ipa_param_adjustments::modify_call): Deal
with register type mismatches.

testsuite/
* gcc.dg/ipa/pr91853.c: New test.
---
 gcc/ipa-param-manipulation.c       | 22 ++++++++++++++++++++--
 gcc/testsuite/gcc.dg/ipa/pr91853.c | 30 ++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr91853.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 913b96fefa4..bc175a5541a 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -651,8 +651,26 @@ ipa_param_adjustments::modify_call (gcall *stmt,
   bool deref_base = false;
   unsigned int deref_align = 0;
   if (TREE_CODE (base) != ADDR_EXPR
- && POINTER_TYPE_P (TREE_TYPE (base)))
-   off = build_int_cst (apm->alias_ptr_type, apm->unit_offset);
+ && is_gimple_reg_type (TREE_TYPE (base)))
+   {
+ /* Detect (gimple register) type mismatches in calls so that we don't
+ICE.  Make a poor attempt to gracefully treat integers passed in
+place of pointers, for everything else create a proper undefined
+value which it is.  */
+ if (INTEGRAL_TYPE_P (TREE_TYPE (base)))
+   {
+ tree tmp = make_ssa_name (ptr_type_node);
+ gassign *convert = gimple_build_assign (tmp, NOP_EXPR, base);
+ gsi_insert_before (&gsi, convert, GSI_SAME_STMT);
+ base = tmp;
+   }
+ else if (!POINTER_TYPE_P (TREE_TYPE (base)))
+   {
+ tree tmp = create_tmp_var (ptr_type_node);
+ base = get_or_create_ssa_default_def (cfun, tmp);
+   }
+ off = build_int_cst (apm->alias_ptr_type, apm->unit_offset);
+   }
   else
{
  bool addrof;
diff --git a/gcc/testsuite/gcc.dg/ipa/pr91853.c b/gcc/testsuite/gcc.dg/ipa/pr91853.c
new file mode 100644
index 00000000000..4bad7803751
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr91853.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "--param ipa-cp-value-list-size=0 -Os -fno-inline" } */
+
+struct _wincore
+{
+  int y;
+  int width;
+};
+int a;
+void fn2 (void);
+static int fn1 (dpy, winInfo) struct _XDisplay *dpy;
+struct _wincore *winInfo;
+{
+  a = winInfo->width;
+  fn2 ();
+}
+
+void fn4 (int, int, int);
+static int fn3 (dpy, winInfo, visrgn) struct _XDisplay *dpy;
+int winInfo, visrgn;
+{
+  int b = fn1 (0, winInfo);
+  fn4 (0, 0, visrgn);
+}
+
+int
+fn5 (event) struct _XEvent *event;
+{
+  fn3 (0, 0, 0);
+}
-- 
2.23.0



[PATCH] Fix quoting in a call to internal_error

2019-09-25 Thread Martin Jambor
Hi,

it was brought to my attention that my call to internal_error in the new
IPA-SRA makes -Wformat-diag complain because it thinks that everything
with an underscore is an identifier or a keyword and should be quoted.
Well, the string should not contain "IPA_SRA" but "IPA-SRA" in the first
place so this patch corrects that and hopefully the problem should go
away.  While at it I noticed that the %s in the same string should
actually probably be quoted, so I'm replacing it with %qs too.

Bootstrapped and tested on x86_64-linux.  OK for trunk?

Thanks,

Martin



2019-09-25  Martin Jambor  

* ipa-sra.c (verify_splitting_accesses): Fix quoting in a call to
internal_error.
---
 gcc/ipa-sra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index b35fff69472..50dee69e3db 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -2452,7 +2452,7 @@ verify_splitting_accesses (cgraph_node *node, bool certain_must_exist)
 
   bool certain_access_present = !certain_must_exist;
   if (overlapping_certain_accesses_p (desc, &certain_access_present))
-   internal_error ("Function %s, parameter %u, has IPA_SRA accesses "
+   internal_error ("Function %qs, parameter %u, has IPA-SRA accesses "
"which overlap", node->dump_name (), pidx);
   if (!certain_access_present)
internal_error ("Function %s, parameter %u, is used but does not "
-- 
2.23.0



Re: [PATCH] Retain TYPE_MODE more often for BIT_FIELD_REFs in get_inner_reference

2019-09-25 Thread Eric Botcazou
> The following patch relaxes the condition under which we force
> VOIDmode by making all non-integral types where the extraction
> size matches the type size (thus isn't "bitfieldish") use the
> mode of the extraction type.

Wouldn't TREE_CODE (TREE_TYPE (exp)) == VECTOR_TYPE be sufficient?  At least 
this would still be in keeping with the comment...

-- 
Eric Botcazou


Re: [03/32] Add a function for getting the ABI of a call insn target

2019-09-25 Thread Richard Sandiford
Richard Sandiford  writes:
> This patch replaces get_call_reg_set_usage with call_insn_abi,
> which returns the ABI of the target of a call insn.  The ABI's
> full_reg_clobbers corresponds to regs_invalidated_by_call,
> whereas many callers instead passed call_used_or_fixed_regs, i.e.:
>
>   (regs_invalidated_by_call | fixed_reg_set)
>
> The patch slavishly preserves the "| fixed_reg_set" for these callers;
> later patches will clean this up.

On reflection, I think insn_callee_abi would be a better name for the
function than call_insn_abi, since it should make it clearer that the
function returns the ABI of the target function.  In future we could
have expr_callee_abi for CALL_EXPRs.

Also, after Segher's comments for 10/32, I've used "callee_abi" as
the name of temporary variables, instead of just "abi".

I've made the same change for later patches (except where I've posted
new versions instead), but it didn't seem worth spamming the lists
with that.

Tested as before.

Richard

PS. Ping for the series :-)


2019-09-25  Richard Sandiford  

gcc/
* target.def (insn_callee_abi): New hook.
(remove_extra_call_preserved_regs): Delete.
* doc/tm.texi.in (TARGET_INSN_CALLEE_ABI): New macro.
(TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): Delete.
* doc/tm.texi: Regenerate.
* targhooks.h (default_remove_extra_call_preserved_regs): Delete.
* targhooks.c (default_remove_extra_call_preserved_regs): Delete.
* config/aarch64/aarch64.c (aarch64_simd_call_p): Constify the
insn argument.
(aarch64_remove_extra_call_preserved_regs): Delete.
(aarch64_insn_callee_abi): New function.
(TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS): Delete.
(TARGET_INSN_CALLEE_ABI): New macro.
* rtl.h (get_call_fndecl): Declare.
(cgraph_rtl_info): Fix formatting.  Tweak comment for
function_used_regs.  Remove function_used_regs_valid.
* rtlanal.c (get_call_fndecl): Moved from final.c
* function-abi.h (insn_callee_abi): Declare.
(target_function_abi_info): Mention insn_callee_abi.
* function-abi.cc (fndecl_abi): Handle flag_ipa_ra in a similar
way to get_call_reg_set_usage did.
(insn_callee_abi): New function.
* regs.h (get_call_reg_set_usage): Delete.
* final.c: Include function-abi.h.
(collect_fn_hard_reg_usage): Add fixed and stack registers to
function_used_regs before the main loop rather than afterwards.
Use insn_callee_abi instead of get_call_reg_set_usage.  Exit early
if function_used_regs ends up not being useful.
(get_call_fndecl): Move to rtlanal.c
(get_call_cgraph_rtl_info, get_call_reg_set_usage): Delete.
* caller-save.c: Include function-abi.h.
(setup_save_areas, save_call_clobbered_regs): Use insn_callee_abi
instead of get_call_reg_set_usage.
* cfgcleanup.c: Include function-abi.h.
(old_insns_match_p): Use insn_callee_abi instead of
get_call_reg_set_usage.
* cgraph.h (cgraph_node::rtl_info): Take a const_tree instead of
a tree.
* cgraph.c (cgraph_node::rtl_info): Likewise.  Initialize
function_used_regs.
* df-scan.c: Include function-abi.h.
(df_get_call_refs): Use insn_callee_abi instead of
get_call_reg_set_usage.
* ira-lives.c: Include function-abi.h.
(process_bb_node_lives): Use insn_callee_abi instead of
get_call_reg_set_usage.
* lra-lives.c: Include function-abi.h.
(process_bb_lives): Use insn_callee_abi instead of
get_call_reg_set_usage.
* postreload.c: Include function-abi.h.
(reload_combine): Use insn_callee_abi instead of
get_call_reg_set_usage.
* regcprop.c: Include function-abi.h.
(copyprop_hardreg_forward_1): Use insn_callee_abi instead of
get_call_reg_set_usage.
* resource.c: Include function-abi.h.
(mark_set_resources, mark_target_live_regs): Use insn_callee_abi
instead of get_call_reg_set_usage.
* var-tracking.c: Include function-abi.h.
(dataflow_set_clear_at_call): Use insn_callee_abi instead of
get_call_reg_set_usage.

Index: gcc/target.def
===
--- gcc/target.def  2019-09-25 16:23:04.0 +0100
+++ gcc/target.def  2019-09-25 16:23:05.092580444 +0100
@@ -4952,6 +4952,19 @@ interoperability between several ABIs in
  const predefined_function_abi &, (const_tree type),
  NULL)
 
+DEFHOOK
+(insn_callee_abi,
+ "This hook returns a description of the ABI used by the target of\n\
+call instruction @var{insn}; see the definition of\n\
+@code{predefined_function_abi} for details of the ABI descriptor.\n\
+Only the global function @code{insn_callee_abi} should call this hook\n\
+directly.\n\
+\n\
+Targets only need to define this hook if they support\n\
+interoperability between several ABIs in the same translation unit.",
+ const predefined_function_abi &, (const rtx_insn *insn),
+ NULL)

Re: [04/32] [x86] Robustify vzeroupper handling across calls

2019-09-25 Thread Richard Sandiford
Ping

Richard Sandiford  writes:
> One of the effects of the function_abi series is to make -fipa-ra
> work for partially call-clobbered registers.  E.g. if a call preserves
> only the low 32 bits of a register R, we handled the partial clobber
> separately from -fipa-ra, and so treated the upper bits of R as
> clobbered even if we knew that the target function doesn't touch R.
>
> "Fixing" this caused problems for the vzeroupper handling on x86.
> The pass that inserts the vzerouppers assumes that no 256-bit or 512-bit
> values are live across a call unless the call takes a 256-bit or 512-bit
> argument:
>
>   /* Needed mode is set to AVX_U128_CLEAN if there are
>no 256bit or 512bit modes used in function arguments. */
>
> This implicitly relies on:
>
> /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The only ABI that
>saves SSE registers across calls is Win64 (thus no need to check the
>current ABI here), and with AVX enabled Win64 only guarantees that
>the low 16 bytes are saved.  */
>
> static bool
> ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
>unsigned int regno, machine_mode mode)
> {
>   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
> }
>
> The comment suggests that this code is only needed for Win64 and that
> not testing for Win64 is just a simplification.  But in practice it was
> needed for correctness on GNU/Linux and other targets too, since without
> it the RA would be able to keep 256-bit and 512-bit values in SSE
> registers across calls that are known not to clobber them.
>
> This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> that some SSE registers are not touched by a call.  There are then no
> regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> for GNU/Linux (not something we should do, was just for testing).
>
> If in fact we want -fipa-ra to pretend that all functions clobber
> SSE registers above 128 bits, it'd certainly be possible to arrange
> that.  But IMO that would be an optimisation decision, whereas what
> the patch is fixing is a correctness decision.  So I think we should
> have this check even so.

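To make the failure mode concrete, here is a hypothetical example (not
part of the patch; it assumes -mavx and that -fipa-ra can see that bar
touches no SSE registers):

/* Hypothetical illustration, not from the patch.  If -fipa-ra proves
   that bar() touches no SSE registers, the RA may keep "x" in a ymm
   register across the call.  A vzeroupper inserted on the assumption
   that no 256-bit value is live there would then zero the live upper
   128 bits of x.  */
#include <immintrin.h>

extern void bar (void);	/* assumed not to clobber SSE registers */

__m256d
foo (__m256d x)
{
  bar ();		/* 256-bit value live across the call */
  return x;
}
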
2019-09-25  Richard Sandiford  

gcc/
* config/i386/i386.c: Include function-abi.h.
(ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
if they preserve some 256-bit or 512-bit SSE registers.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  2019-09-25 16:47:48.0 +0100
+++ gcc/config/i386/i386.c  2019-09-25 16:47:49.089962608 +0100
@@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
 #include "i386-builtins.h"
 #include "i386-expand.h"
 #include "i386-features.h"
+#include "function-abi.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
}
}
 
+  /* If the function is known to preserve some SSE registers,
+RA and previous passes can legitimately rely on that for
+modes wider than 256 bits.  It's only safe to issue a
+vzeroupper if all SSE registers are clobbered.  */
+  const function_abi &abi = insn_callee_abi (insn);
+  if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
+ abi.mode_clobbers (V4DImode)))
+   return AVX_U128_ANY;
+
   return AVX_U128_CLEAN;
 }
 


Re: [10/32] Remove global call sets: combine.c

2019-09-25 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi Richard,
>
> Sorry this took me so long to get back to.
>
> On Thu, Sep 12, 2019 at 08:51:59AM +0100, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>> > On Wed, Sep 11, 2019 at 08:08:38PM +0100, Richard Sandiford wrote:
>> >>hard_reg_set_iterator hrsi;
>> >> -  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, i, 
>> >> hrsi)
>> >> +  EXECUTE_IF_SET_IN_HARD_REG_SET (abi.full_and_partial_reg_clobbers 
>> >> (),
>> >> +   0, i, hrsi)
>> >
>> > So "abi" in that means calls?
>> 
>> "abi" is the interface of the callee function, taking things like
>> function attributes and -fipa-ra into account.
>> 
>> The register sets are describing what the callee does rather than
>> what calls to it do.  E.g. on targets that allow linker stubs to be
>> inserted between calls, the scratch registers reserved for linker stubs
>> are still call-clobbered, even if the target of the call doesn't use
>> them.  (Those call clobbers are represented separately, at least when
>> TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS is true.  When it's
>> false we don't use -fipa-ra information at all.)
>> 
>> > It is not such a great name like that.  Since its children are
>> > very_long_names, it doesn't need to be only three chars itself,
>> > either?
>> 
>> OK, what name would you prefer?
>
> Maybe call_abi is a good name?  It's difficult to capture the subtleties
> in a short enough name.  As always :-)

The formatting ended up being a bit weird with a longer name,
so how about the attached instead?

Richard


2019-09-25  Richard Sandiford  

gcc/
* combine.c: Include function-abi.h.
(record_dead_and_set_regs): Use insn_callee_abi to get the ABI
of the target of call insns.  Invalidate partially-clobbered
registers as well as fully-clobbered ones.

Index: gcc/combine.c
===
--- gcc/combine.c   2019-09-12 10:52:53.0 +0100
+++ gcc/combine.c   2019-09-25 16:50:21.772865265 +0100
@@ -105,6 +105,7 @@ Software Foundation; either version 3, o
 #include "valtrack.h"
 #include "rtl-iter.h"
 #include "print-rtl.h"
+#include "function-abi.h"
 
 /* Number of attempts to combine instructions in this function.  */
 
@@ -13464,11 +13465,21 @@ record_dead_and_set_regs (rtx_insn *insn
 
   if (CALL_P (insn))
 {
+  HARD_REG_SET callee_clobbers
+   = insn_callee_abi (insn).full_and_partial_reg_clobbers ();
   hard_reg_set_iterator hrsi;
-  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, i, hrsi)
+  EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, i, hrsi)
{
  reg_stat_type *rsp;
 
+ /* ??? We could try to preserve some information from the last
+set of register I if the call doesn't actually clobber
+(reg:last_set_mode I), which might be true for ABIs with
+partial clobbers.  However, it would be difficult to
+update last_set_nonzero_bits and last_sign_bit_copies
+to account for the part of I that actually was clobbered.
+It wouldn't help much anyway, since we rarely see this
+situation before RA.  */
	  rsp = &reg_stat[i];
  rsp->last_set_invalid = 1;
  rsp->last_set = insn;


Re: [11/32] Remove global call sets: cse.c

2019-09-25 Thread Richard Sandiford
Richard Sandiford  writes:
> Like with the combine.c patch, this one keeps things simple by
> invalidating values in partially-clobbered registers, rather than
> trying to tell whether the value in a partially-clobbered register
> is actually clobbered or not.  Again, this is in principle a bug fix,
> but probably never matters in practice.

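To make "partially-clobbered" concrete, a hypothetical AArch64 example
(not from the patch):

/* Hypothetical example, assuming the base AAPCS64: a call preserves
   only the low 64 bits of V8-V15.  If "v" were held in V8, bar()
   could clobber its upper 64 bits, so any cached equivalence for the
   full 128-bit value must be invalidated after the call.  */
#include <arm_neon.h>

extern void bar (void);

int32x4_t
foo (int32x4_t v)
{
  bar ();
  return v;	/* the full value does not survive in V8-V15 */
}
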
Similarly to the combine patch, I've updated this to avoid the
short "abi" name and use a temporary HARD_REG_SET instead.

Richard


2019-09-25  Richard Sandiford  

gcc/
* cse.c: Include regs.h and function-abi.h.
(invalidate_for_call): Take the call insn as an argument.
Use insn_callee_abi to get the ABI of the call and invalidate
partially clobbered registers as well as fully clobbered ones.
(cse_insn): Update call accordingly.

Index: gcc/cse.c
===
--- gcc/cse.c   2019-09-17 15:27:11.338066929 +0100
+++ gcc/cse.c   2019-09-25 16:55:31.202641509 +0100
@@ -42,6 +42,8 @@ Software Foundation; either version 3, o
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "rtl-iter.h"
+#include "regs.h"
+#include "function-abi.h"
 
 /* The basic idea of common subexpression elimination is to go
through the code, keeping a record of expressions that would
@@ -566,7 +568,6 @@ static void remove_invalid_subreg_refs (
machine_mode);
 static void rehash_using_reg (rtx);
 static void invalidate_memory (void);
-static void invalidate_for_call (void);
 static rtx use_related_value (rtx, struct table_elt *);
 
 static inline unsigned canon_hash (rtx, machine_mode);
@@ -2091,23 +2092,29 @@ rehash_using_reg (rtx x)
 }
 
 /* Remove from the hash table any expression that is a call-clobbered
-   register.  Also update their TICK values.  */
+   register in INSN.  Also update their TICK values.  */
 
 static void
-invalidate_for_call (void)
+invalidate_for_call (rtx_insn *insn)
 {
-  unsigned int regno, endregno;
-  unsigned int i;
+  unsigned int regno;
   unsigned hash;
   struct table_elt *p, *next;
   int in_table = 0;
   hard_reg_set_iterator hrsi;
 
-  /* Go through all the hard registers.  For each that is clobbered in
- a CALL_INSN, remove the register from quantity chains and update
+  /* Go through all the hard registers.  For each that might be clobbered
+ in call insn INSN, remove the register from quantity chains and update
  reg_tick if defined.  Also see if any of these registers is currently
- in the table.  */
-  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, regno, hrsi)
+ in the table.
+
+ ??? We could be more precise for partially-clobbered registers,
+ and only invalidate values that actually occupy the clobbered part
+ of the registers.  It doesn't seem worth the effort though, since
+ we shouldn't see this situation much before RA.  */
+  HARD_REG_SET callee_clobbers
+= insn_callee_abi (insn).full_and_partial_reg_clobbers ();
+  EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, regno, hrsi)
 {
   delete_reg_equiv (regno);
   if (REG_TICK (regno) >= 0)
@@ -2132,15 +2139,11 @@ invalidate_for_call (void)
  || REGNO (p->exp) >= FIRST_PSEUDO_REGISTER)
continue;
 
- regno = REGNO (p->exp);
- endregno = END_REGNO (p->exp);
-
- for (i = regno; i < endregno; i++)
-   if (TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
- {
-   remove_from_table (p, hash);
-   break;
- }
+ /* This must use the same test as above rather than the
+more accurate clobbers_reg_p.  */
+ if (overlaps_hard_reg_set_p (callee_clobbers, GET_MODE (p->exp),
+  REGNO (p->exp)))
+   remove_from_table (p, hash);
}
 }
 
@@ -5823,7 +5826,7 @@ cse_insn (rtx_insn *insn)
  if (GET_CODE (XEXP (tem, 0)) == USE
  && MEM_P (XEXP (XEXP (tem, 0), 0)))
invalidate (XEXP (XEXP (tem, 0), 0), VOIDmode);
-  invalidate_for_call ();
+  invalidate_for_call (insn);
 }
 
   /* Now invalidate everything set by this instruction.


Re: [PATCH] Retain TYPE_MODE more often for BIT_FIELD_REFs in get_inner_reference

2019-09-25 Thread Richard Biener
On September 25, 2019 5:29:55 PM GMT+02:00, Eric Botcazou 
 wrote:
>> The following patch relaxes the condition under which we force
>> VOIDmode by making all non-integral types where the extraction
>> size matches the type size (thus isn't "bitfieldish") use the
>> mode of the extraction type.
>
>Wouldn't TREE_CODE (TREE_TYPE (exp)) == VECTOR_TYPE be sufficient?  At
>least 
>this would still be in keeping with the comment...

For the PR it would be good enough, though I wonder what the original
reason for the mode handling was.  Was it to avoid modes that are not
naturally aligned, for strict-alignment targets?  Or modes for
entities whose size does not match any mode?

Richard. 



Re: [17/32] Remove global call sets: gcse.c

2019-09-25 Thread Richard Sandiford
Richard Sandiford  writes:
> This is another case in which we can conservatively treat partial
> kills as full kills.  Again this is in principle a bug fix for
> TARGET_HARD_REGNO_CALL_PART_CLOBBERED targets, but in practice
> it probably doesn't make a difference.

Similarly to the combine patch, I've updated this to avoid the
short "abi" name and use a temporary HARD_REG_SET instead.

Richard


2019-09-25  Richard Sandiford  

gcc/
* gcse.c: Include function-abi.h.
(compute_hash_table_work): Use insn_callee_abi to get the ABI of
the call insn target.  Invalidate partially call-clobbered
registers as well as fully call-clobbered ones.

Index: gcc/gcse.c
===
--- gcc/gcse.c  2019-09-25 17:03:07.0 +0100
+++ gcc/gcse.c  2019-09-25 17:03:07.427363103 +0100
@@ -160,6 +160,7 @@ Software Foundation; either version 3, o
 #include "dbgcnt.h"
 #include "gcse.h"
 #include "gcse-common.h"
+#include "function-abi.h"
 
 /* We support GCSE via Partial Redundancy Elimination.  PRE optimizations
are a superset of those done by classic GCSE.
@@ -1528,8 +1529,13 @@ compute_hash_table_work (struct gcse_has
  if (CALL_P (insn))
{
  hard_reg_set_iterator hrsi;
- EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call,
- 0, regno, hrsi)
+
+ /* We don't track modes of hard registers, so we need
+to be conservative and assume that partial kills
+are full kills.  */
+ HARD_REG_SET callee_clobbers
+   = insn_callee_abi (insn).full_and_partial_reg_clobbers ();
+ EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, regno, hrsi)
record_last_reg_set_info (insn, regno);
 
  if (! RTL_CONST_OR_PURE_CALL_P (insn)


Re: [23/32] Remove global call sets: postreload-gcse.c

2019-09-25 Thread Richard Sandiford
Richard Sandiford  writes:
> This is another case in which we should conservatively treat
> partial kills as full kills.

Similarly to the combine patch, I've updated this to avoid the
short "abi" name and use a temporary HARD_REG_SET instead.

Richard


2019-09-25  Richard Sandiford  

gcc/
* postreload-gcse.c: Include regs.h and function-abi.h.
(record_opr_changes): Use insn_callee_abi to get the ABI of the
call insn target.  Conservatively assume that partially-clobbered
registers are altered.

Index: gcc/postreload-gcse.c
===
--- gcc/postreload-gcse.c   2019-09-12 10:52:50.0 +0100
+++ gcc/postreload-gcse.c   2019-09-25 17:06:55.213726369 +0100
@@ -41,6 +41,8 @@ Software Foundation; either version 3, o
 #include "intl.h"
 #include "gcse-common.h"
 #include "gcse.h"
+#include "regs.h"
+#include "function-abi.h"
 
 /* The following code implements gcse after reload, the purpose of this
pass is to cleanup redundant loads generated by reload and other
@@ -772,7 +774,11 @@ record_opr_changes (rtx_insn *insn)
 {
   unsigned int regno;
   hard_reg_set_iterator hrsi;
-  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, regno, hrsi)
+  /* We don't track modes of hard registers, so we need to be
+conservative and assume that partial kills are full kills.  */
+  HARD_REG_SET callee_clobbers
+   = insn_callee_abi (insn).full_and_partial_reg_clobbers ();
+  EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, regno, hrsi)
record_last_reg_set_info_regno (insn, regno);
 
   if (! RTL_CONST_OR_PURE_CALL_P (insn))


Re: [SVE] PR86753

2019-09-25 Thread Prathamesh Kulkarni
On Mon, 16 Sep 2019 at 08:54, Prathamesh Kulkarni
 wrote:
>
> On Mon, 9 Sep 2019 at 09:36, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > With the patch, only the following FAIL remains for aarch64-sve.exp:
> > > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > > > scan-assembler-times \\tmovprfx\\t 6
> > > > which now contains 14.
> > > > Should I adjust the test, assuming the change isn't a regression?
> > >
> > > Well, it is kind-of a regression, but it really just means that the
> > > integer code is now consistent with the floating-point code in having
> > > an unnecessary MOVPRFX.  So I think adjusting the count is fine.
> > > Presumably any future fix for the existing redundant MOVPRFXs will
> > > apply to the new ones as well.
> > >
> > > The patch looks good to me, just some very minor nits:
> > >
> > > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
> > > >
> > > >  /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
> > > > contain a sequence of NVECTORS masks that each control a vector of 
> > > > type
> > > > -   VECTYPE.  */
> > > > +   VECTYPE. SCALAR_MASK if non-null, represents the mask used for 
> > > > corresponding
> > > > +   load/store stmt.  */
> > >
> > > Should be two spaces between sentences.  Maybe:
> > >
> > >VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
> > >these vector masks with the vector version of SCALAR_MASK.  */
> > >
> > > since the mask isn't necessarily for a load or store statement.
> > >
> > > > [...]
> > > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, 
> > > > tree, stmt_vec_info,
> > > > says how the load or store is going to be implemented and GROUP_SIZE
> > > > is the number of load or store statements in the containing group.
> > > > If the access is a gather load or scatter store, GS_INFO describes
> > > > -   its arguments.
> > > > +   its arguments. SCALAR_MASK is the scalar mask used for corresponding
> > > > +   load or store stmt.
> > >
> > > Maybe:
> > >
> > >its arguments.  If the load or store is conditional, SCALAR_MASK is the
> > >condition under which it occurs.
> > >
> > > since SCALAR_MASK can be null here too.
> > >
> > > > [...]
> > > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info, 
> > > > gimple_stmt_iterator *gsi,
> > > >/* Handle cond expr.  */
> > > >for (j = 0; j < ncopies; j++)
> > > >  {
> > > > +  tree loop_mask = NULL_TREE;
> > > > +  bool swap_cond_operands = false;
> > > > +
> > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > + {
> > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > vectype, j);
> > > > + }
> > > > +   else
> > > > + {
> > > > +   cond.code = invert_tree_comparison (cond.code,
> > > > +   HONOR_NANS (TREE_TYPE 
> > > > (cond.op0)));
> > >
> > > Long line.  Maybe just split it out into a separate assignment:
> > >
> > >   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > >   cond.code = invert_tree_comparison (cond.code, honor_nans);
> > >
> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > vectype, j);
> > >
> > > Long line here too.
> > >
> > > > [...]
> > > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info 
> > > > stmt_info, gimple_stmt_iterator *gsi,
> > > >   }
> > > >   }
> > > >   }
> > > > +
> > > > +   if (loop_mask)
> > > > + {
> > > > +   if (COMPARISON_CLASS_P (vec_compare))
> > > > + {
> > > > +   tree tmp = make_ssa_name (vec_cmp_type);
> > > > +   gassign *g = gimple_build_assign (tmp,
> > > > + TREE_CODE 
> > > > (vec_compare),
> > > > + TREE_OPERAND 
> > > > (vec_compare, 0),
> > > > + TREE_OPERAND 
> > > > (vec_compare, 1));
> > >
> > > Two long lines.
> > >
> > > > +   vect_finish_stmt_generation (stmt_info, g, gsi);
> > > > +   vec_compare = tmp;
> > > > + }
> > > > +
> > > > +   tree tmp2 = make_ssa_name (vec_cmp_type);
> > > > +   gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, 
> > > > vec_compare, loop_mask);
> > >
> > 

Make ira call df_set_regs_ever_live for extra call-clobbered regs

2019-09-25 Thread Richard Sandiford
[This follows on from:
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00778.html]

If we support multiple ABIs in the same translation unit, it can
sometimes be the case that a callee clobbers more registers than
its caller is allowed to.  We need to call df_set_regs_ever_live
on these extra registers so that the prologue and epilogue code
can handle them appropriately.

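A hypothetical AArch64 example of such a situation (illustrative only):

/* Hypothetical example: foo follows the vector PCS and must preserve
   V8-V23, but bar, a base-AAPCS64 function, may clobber them.  Even
   though foo never mentions those registers itself, its prologue has
   to save the ones bar can clobber, hence the need to mark them with
   df_set_regs_ever_live.  */
void bar (void);

void __attribute__ ((aarch64_vector_pcs))
foo (void)
{
  bar ();
}
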
This patch does that in IRA.  I wanted to avoid another full
instruction walk just for this, so I combined it with the existing
set_paradoxical_subreg walk.  This happens before the first
calculation of elimination offsets.

Tested on aarch64-linux-gnu (where with later patches it helps the
vector PCS) and x86_64-linux-gnu (where it's a no-op).  OK to install?

Thanks,
Richard


2019-09-25  Richard Sandiford  

gcc/
* function-abi.h (function_abi_aggregator): New class.
* function-abi.cc (function_abi_aggregator::caller_save_regs): New
function.
* ira.c (update_equiv_regs_prescan): New function.  Call
set_paradoxical_subreg here rather than...
(update_equiv_regs): ...here.
(ira): Call update_equiv_regs_prescan.

Index: gcc/function-abi.h
===
--- gcc/function-abi.h  2019-09-25 17:05:18.454421613 +0100
+++ gcc/function-abi.h  2019-09-25 17:17:22.893216639 +0100
@@ -208,6 +208,27 @@ const size_t NUM_ABI_IDS = 8;
   HARD_REG_SET m_mask;
 };
 
+/* This class collects information about the ABIs of functions that are
+   called in a particular region of code.  It is mostly intended to be
+   used as a local variable during an IR walk.  */
+class function_abi_aggregator
+{
+public:
+  function_abi_aggregator () : m_abi_clobbers () {}
+
+  /* Record that the code region calls a function with the given ABI.  */
+  void
+  note_callee_abi (const function_abi &abi)
+  {
+m_abi_clobbers[abi.id ()] |= abi.full_and_partial_reg_clobbers ();
+  }
+
+  HARD_REG_SET caller_save_regs (const function_abi &) const;
+
+private:
+  HARD_REG_SET m_abi_clobbers[NUM_ABI_IDS];
+};
+
 struct target_function_abi_info
 {
   /* An array of all the target ABIs that are available in this
Index: gcc/function-abi.cc
===
--- gcc/function-abi.cc 2019-09-25 17:05:18.454421613 +0100
+++ gcc/function-abi.cc 2019-09-25 17:17:22.893216639 +0100
@@ -126,6 +126,42 @@ predefined_function_abi::add_full_reg_cl
 SET_HARD_REG_BIT (m_mode_clobbers[i], regno);
 }
 
+/* Return the set of registers that the caller of the recorded functions must
+   save in order to honor the requirements of CALLER_ABI.  */
+
+HARD_REG_SET
+function_abi_aggregator::
+caller_save_regs (const function_abi &caller_abi) const
+{
+  HARD_REG_SET result;
+  CLEAR_HARD_REG_SET (result);
+  for (unsigned int abi_id = 0; abi_id < NUM_ABI_IDS; ++abi_id)
+{
+  const predefined_function_abi &callee_abi = function_abis[abi_id];
+
+  /* Skip cases that clearly aren't problematic.  */
+  if (abi_id == caller_abi.id ()
+ || hard_reg_set_empty_p (m_abi_clobbers[abi_id]))
+   continue;
+
+  /* Collect the set of registers that can be "more clobbered" by
+CALLEE_ABI than by CALLER_ABI.  */
+  HARD_REG_SET extra_clobbers;
+  CLEAR_HARD_REG_SET (extra_clobbers);
+  for (unsigned int i = 0; i < NUM_MACHINE_MODES; ++i)
+   {
+ machine_mode mode = (machine_mode) i;
+ extra_clobbers |= (callee_abi.mode_clobbers (mode)
+& ~caller_abi.mode_clobbers (mode));
+   }
+
+  /* Restrict it to the set of registers that we actually saw
+clobbers for (e.g. taking -fipa-ra into account).  */
+  result |= (extra_clobbers & m_abi_clobbers[abi_id]);
+}
+  return result;
+}
+
 /* Return the set of registers that cannot be used to hold a value of
mode MODE across the calls in a region described by ABIS and MASK, where:
 
Index: gcc/ira.c
===
--- gcc/ira.c   2019-09-25 17:05:18.458421582 +0100
+++ gcc/ira.c   2019-09-25 17:17:22.897216612 +0100
@@ -3362,6 +3362,37 @@ def_dominates_uses (int regno)
   return true;
 }
 
+/* Scan the instructions before update_equiv_regs.  Record which registers
+   are referenced as paradoxical subregs.  Also check for cases in which
+   the current function needs to save a register that one of its call
+   instructions clobbers.
+
+   These things are logically unrelated, but it's more efficient to do
+   them together.  */
+
+static void
+update_equiv_regs_prescan (void)
+{
+  basic_block bb;
+  rtx_insn *insn;
+  function_abi_aggregator callee_abis;
+
+  FOR_EACH_BB_FN (bb, cfun)
+FOR_BB_INSNS (bb, insn)
+  if (NONDEBUG_INSN_P (insn))
+   {
+ set_paradoxical_subreg (insn);
+ if (CALL_P (insn))
+   callee_abis.note_callee_abi (insn_callee_abi (insn));
+   }
+
+  HARD_REG_SET extra_caller_saves = callee_abis.caller_save_regs (*crtl->abi);

Re: [SVE] PR91532

2019-09-25 Thread Prathamesh Kulkarni
On Thu, 19 Sep 2019 at 10:30, Richard Biener  wrote:
>
> On Thu, 19 Sep 2019, Prathamesh Kulkarni wrote:
>
> > Hi,
> > For PR91532, the dead store is trivially deleted if we place dse pass
> > between ifcvt and vect. Would it be OK to add another instance of dse there 
> > ?
> > Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> > will clean up the dead store ?
>
> No, the issue is the same as PR33315 and exists on the non-vectorized
> code as well.
Oh OK, thanks for pointing that out.

Thanks,
Prathamesh
>
> Richard.


Re: [SVE] PR91532

2019-09-25 Thread Prathamesh Kulkarni
On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
>
> On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > Hi,
> > For PR91532, the dead store is trivially deleted if we place the dse
> > pass between ifcvt and vect. Would it be OK to add another instance
> > of dse there?
> > Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> > will clean up the dead store?
> I'd hesitate to add another DSE pass.  If there's one nearby could we
> move the existing pass?
Well, I think the nearest one is just after pass_warn_restrict.  Not
sure if it's a good idea to move it up from there?

Thanks,
Prathamesh
>
>
> Jeff


[ping][PATCH][MSP430] Don't generate 430X insns when handling data in the lower memory region

2019-09-25 Thread Jozef Lawrynowicz
ping

On Wed, 11 Sep 2019 11:25:58 +0100
Jozef Lawrynowicz  wrote:

> The MSP430 target has a "430X" extension which increases the directly
> addressable memory range from 64KB (16-bit) to 1MB (20-bit).
> This 1MB memory range is split into a "lower" region (below address 0x10000)
> and an "upper" region (at or above address 0x10000).
> When data in the upper region is addressed, 430 instructions cannot be used, 
> as
> their 16-bit capability will be exceeded; 430X instructions must be used
> instead. Most 430X instructions require an additional word of op-code, and 
> also
> require more cycles to execute compared to their 430 equivalent.
> 
> Currently, when the large memory model is specified (-mlarge), 430X 
> instructions
> will always be used when addressing a symbol_ref using the absolute addressing
> mode e.g. MOVX #1, &foo.
> The attached patch modifies code generation so that 430X instructions will 
> only
> be used when the symbol being addressed will not be placed in the lower memory
> region. This is determined by checking if -mdata-region=lower (the new 
> default)
> is passed, or if the "lower" attribute is set on the variable.
> 
> Since code will be generated to assume all variables are in the lower memory
> region with -mdata-region=lower, object files built with this option cannot
> be linked with objects files built with other -mdata-region= values.
> To facilitate the checking of this, a patch for binutils (to be submitted
> after this is accepted) is also attached.
> 
> The compiler will now generate assembler directives indicating the ISA, memory
> model and data region the source file was compiled with. The assembler will
> check these directives match the options it has been invoked with, and then
> add the attribute to the object file.
> 
> Successfully regtested for msp430-elf in the small and large memory model, and
> with -mdata-region=upper. Testing with -mdata-region=upper should expose any
> cases where a 430X instruction is not used where it is required, since all 
> data
> is forced into the upper region so a lack of 430X insn would cause a 
> relocation
> overflow. In fact the attached patch fixes some relocation overflows by adding
> missing "%X" operand selectors to some insns. One relocation overflow remains
> (pr65077.c), but that is a separate binutils issue.
> 
> Ok for trunk?

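To illustrate the intended code generation, a hypothetical example
(the attribute use and the mnemonics in the comments are indicative
only):

/* Hypothetical example with -mlarge -mdata-region=lower: "lo" is
   known to live below 0x10000 and can be addressed with a 16-bit 430
   insn, while "hi" is forced into the upper region and needs a 20-bit
   430X insn.  */
int lo;				/* lower region by default */
int __attribute__ ((upper)) hi;	/* placed in the upper region */

void
store (void)
{
  lo = 1;	/* roughly: MOV.W  #1, &lo   (430 insn)  */
  hi = 1;	/* roughly: MOVX.W #1, &hi   (430X insn) */
}
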
>From 91371f9a2721e1459429ff7ebdb258b2ef063b04 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Wed, 14 Aug 2019 13:25:03 +0100
Subject: [PATCH] MSP430: Only generate 430X instructions with -mlarge if data
 will be in the upper region

gcc/ChangeLog:

2019-09-11  Jozef Lawrynowicz  

	* config.in: Regenerate.
	* config/msp430/constraints.md: Fix docstring for "Ys" constraint.
	Add new "Yx" constraint.
	* config/msp430/driver-msp430.c (msp430_propagate_region_opt): New spec
	function.
	* config/msp430/msp430-protos.h (msp430_op_not_in_high_mem): New
	prototype.
	* config/msp430/msp430.c (msp430_option_override): Allow the lower
	code/data region to be selected in the small memory model.
	(msp430_section_attr): Don't warn if the "section" and "lower"
	attributes are used together.
	(msp430_handle_generic_attribute): Likewise.
	(msp430_var_in_low_mem): New function.
	(TARGET_ENCODE_SECTION_INFO): Define.
	(msp430_encode_section_info): New function.
	(gen_prefix): Return early in the small memory model.
	Require TARGET_USE_LOWER_REGION_PREFIX to be set before adding the
	".lower" prefix if -m{code,data}-region=lower have been passed.
	(msp430_output_aligned_decl_common): Emit common symbols when
	-mdata-region=lower is passed unless TARGET_USE_LOWER_REGION_PREFIX is
	set. 
	(TARGET_ASM_FILE_END): Define.
	(msp430_file_end): New function.
	(msp430_do_not_relax_short_jumps): Allow relaxation when
	function will be in the lower region.
	(msp430_op_not_in_high_mem): New function.
	(msp430_print_operand): Check "msp430_op_not_in_high_mem" for
	the 'X' operand selector. 
	Clarify comment for 'x' operand selector.
	* config/msp430/msp430.h (LINK_SPEC): Propagate
	-m{code,data}-region to the linker via spec function
	msp430_propagate_region_opt.
	(msp430_propagate_region_opt): New prototype.
	(EXTRA_SPEC_FUNCTIONS): Add msp430_propagate_region_opt.
	(SYMBOL_FLAG_LOW_MEM): Define.
	* config/msp430/msp430.md (addsipsi3): Add missing "%X" operand
	selector.
	(zero_extendqihi2): Fix operand number used by "%X" selector.
	(zero_extendqisi2): Likewise.
	(zero_extendhisi2): Likewise.
	(movqi): Use "Yx" constraint in place of "%X" operand selector.
	(movhi): Likewise.
	(addqi3): Likewise.
	(addhi3): Likewise.
	(addsi3): Likewise.
	(addhi3_cy): Likewise.
	(addchi4_cy): Likewise.
	(subqi3): Likewise.
	(subhi3): Likewise.
	(subsi3): Likewise.
	(bic<mode>3): Likewise.
	(and<mode>3): Likewise.
	(ior<mode>3): Likewise.
	(xor<mode>3): Likewise.
	(slli_1): Add missing "%X" operand selector.
	(slll_1): Likewise.
	(slll_2): Likewise.
	(srai_1): Likewise.
	(sral_1): Likewise.
	(sral_2): Likewise.
	(srli_1): Likewise.
	(srll_1): Likewise.
	(c

[AArch64] Allow shrink-wrapping of non-leaf vector PCS functions

2019-09-25 Thread Richard Sandiford
[This follows on from:
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00778.html
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01456.html]

With the function ABI stuff, we can now support shrink-wrapping of
non-leaf vector PCS functions.  This is particularly useful if the
vector PCS function calls an ordinary function on an error path,
since we can then keep the extra saves and restores specific to
that path too.

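A hypothetical sketch of the error-path case described above:

/* Hypothetical sketch: vec_fn follows the vector PCS, so the call to
   error_handler (a base-PCS function) forces saves of V8-V23.  With
   shrink-wrapping, those saves can be sunk into the unlikely branch
   instead of being emitted unconditionally in the prologue.  */
void error_handler (void);	/* ordinary AAPCS64 function */

void __attribute__ ((aarch64_vector_pcs))
vec_fn (int failed)
{
  if (__builtin_expect (failed, 0))
    error_handler ();		/* only this path needs the extra saves */
}
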
Tested on aarch64-linux-gnu.  OK to install?

Richard


2019-09-25  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
Delete.
* config/aarch64/aarch64.c (aarch64_components_for_bb): Check
whether the block calls a function that clobbers more registers
than the current function is allowed to.
(aarch64_use_simple_return_insn_p): Delete.
* config/aarch64/aarch64.md (simple_return): Remove condition.

gcc/testsuite/
* gcc.target/aarch64/torture/simd-abi-9.c: New test.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-09-21 13:56:09.08396 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-09-25 17:23:36.770504785 +0100
@@ -516,7 +516,6 @@ bool aarch64_split_dimode_const_store (r
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
-bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-09-25 17:05:44.898231605 +0100
+++ gcc/config/aarch64/aarch64.c2019-09-25 17:23:36.774504754 +0100
@@ -5976,13 +5976,30 @@ aarch64_components_for_bb (basic_block b
   sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
   bitmap_clear (components);
 
+  /* Clobbered registers don't generate values in any meaningful sense,
+ since nothing after the clobber can rely on their value.  And we can't
+ say that partially-clobbered registers are unconditionally killed,
+ because whether they're killed or not depends on the mode of the
+ value they're holding.  Thus partially call-clobbered registers
+ appear in neither the kill set nor the gen set.
+
+ Check manually for any calls that clobber more of a register than the
+ current function can.  */
+  function_abi_aggregator callee_abis;
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+if (CALL_P (insn))
+  callee_abis.note_callee_abi (insn_callee_abi (insn));
+  HARD_REG_SET extra_caller_saves = callee_abis.caller_save_regs (*crtl->abi);
+
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
 if ((!call_used_or_fixed_reg_p (regno)
|| (simd_function && FP_SIMD_SAVED_REGNUM_P (regno)))
-   && (bitmap_bit_p (in, regno)
-  || bitmap_bit_p (gen, regno)
-  || bitmap_bit_p (kill, regno)))
+   && (TEST_HARD_REG_BIT (extra_caller_saves, regno)
+   || bitmap_bit_p (in, regno)
+   || bitmap_bit_p (gen, regno)
+   || bitmap_bit_p (kill, regno)))
   {
unsigned regno2, offset, offset2;
bitmap_set_bit (components, regno);
@@ -6648,19 +6665,6 @@ aarch64_use_return_insn_p (void)
   return known_eq (cfun->machine->frame.frame_size, 0);
 }
 
-/* Return false for non-leaf SIMD functions in order to avoid
-   shrink-wrapping them.  Doing this will lose the necessary
-   save/restore of FP registers.  */
-
-bool
-aarch64_use_simple_return_insn_p (void)
-{
-  if (aarch64_simd_decl_p (cfun->decl) && !crtl->is_leaf)
-return false;
-
-  return true;
-}
-
 /* Generate the epilogue instructions for returning from a function.
This is almost exactly the reverse of the prolog sequence, except
that we need to insert barriers to avoid scheduling loads that read
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2019-09-09 17:02:47.0 +0100
+++ gcc/config/aarch64/aarch64.md   2019-09-25 17:23:36.774504754 +0100
@@ -799,7 +799,7 @@ (define_expand "return"
 
 (define_insn "simple_return"
   [(simple_return)]
-  "aarch64_use_simple_return_insn_p ()"
+  ""
   "ret"
   [(set_attr "type" "branch")]
 )
Index: gcc/testsuite/gcc.target/aarch64/torture/simd-abi-9.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/torture/simd-abi-9.c   2019-09-25 
17:23:36.774504754 +0100
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-fshrink-wrap -ffat-lto-objects" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+/* { dg-final { check-function-bodies "

[AArch64] Make more use of function_abi

2019-09-25 Thread Richard Sandiford
[This follows on from:
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00778.html
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01456.html]

This patch makes more use of the function_abi infrastructure.
We can then avoid checking specifically for the vector PCS in
a few places, and can test it more directly otherwise.

Specifically: we no longer need to call df_set_regs_ever_live
for the extra call-saved registers, since IRA now does that for us.
We also don't need to handle the vector PCS specially in
aarch64_epilogue_uses, because DF now marks the registers
as live on exit.

Tested on aarch64-linux-gnu.  OK to install?

Richard


2019-09-25  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_layout_frame): Use crtl->abi
to test whether we're compiling a vector PCS function and to test
whether the function needs to save a particular register.
Remove the vector PCS handling of df_set_regs_ever_live.
(aarch64_components_for_bb): Use crtl->abi to test whether
the function needs to save a particular register.
(aarch64_process_components): Use crtl->abi to test whether
we're compiling a vector PCS function.
(aarch64_expand_prologue, aarch64_expand_epilogue): Likewise.
(aarch64_epilogue_uses): Remove handling of vector PCS functions.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-09-25 17:23:36.774504754 +0100
+++ gcc/config/aarch64/aarch64.c2019-09-25 17:26:29.433252750 +0100
@@ -5334,7 +5334,7 @@ aarch64_layout_frame (void)
 {
   HOST_WIDE_INT offset = 0;
   int regno, last_fp_reg = INVALID_REGNUM;
-  bool simd_function = aarch64_simd_decl_p (cfun->decl);
+  bool simd_function = (crtl->abi->id () == ARM_PCS_SIMD);
 
   cfun->machine->frame.emit_frame_chain = aarch64_needs_frame_chain ();
 
@@ -5348,17 +5348,6 @@ #define SLOT_REQUIRED (-1)
   cfun->machine->frame.wb_candidate1 = INVALID_REGNUM;
   cfun->machine->frame.wb_candidate2 = INVALID_REGNUM;
 
-  /* If this is a non-leaf simd function with calls we assume that
- at least one of those calls is to a non-simd function and thus
- we must save V8 to V23 in the prologue.  */
-
-  if (simd_function && !crtl->is_leaf)
-{
-  for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
-   if (FP_SIMD_SAVED_REGNUM_P (regno))
- df_set_regs_ever_live (regno, true);
-}
-
   /* First mark all the registers that really need to be saved...  */
   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
 cfun->machine->frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
@@ -5375,14 +5364,15 @@ #define SLOT_REQUIRED (-1)
   /* ... and any callee saved register that dataflow says is live.  */
   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
+   && !fixed_regs[regno]
&& (regno == R30_REGNUM
-   || !call_used_or_fixed_reg_p (regno)))
+   || !crtl->abi->clobbers_full_reg_p (regno)))
   cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
-   && (!call_used_or_fixed_reg_p (regno)
-   || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno
+   && !fixed_regs[regno]
+   && !crtl->abi->clobbers_full_reg_p (regno))
   {
cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
last_fp_reg = regno;
@@ -5971,7 +5961,6 @@ aarch64_components_for_bb (basic_block b
   bitmap in = DF_LIVE_IN (bb);
   bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
   bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
-  bool simd_function = aarch64_simd_decl_p (cfun->decl);
 
   sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
   bitmap_clear (components);
@@ -5994,8 +5983,8 @@ aarch64_components_for_bb (basic_block b
 
   /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
   for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
-if ((!call_used_or_fixed_reg_p (regno)
-   || (simd_function && FP_SIMD_SAVED_REGNUM_P (regno)))
+if (!fixed_regs[regno]
+   && !crtl->abi->clobbers_full_reg_p (regno)
&& (TEST_HARD_REG_BIT (extra_caller_saves, regno)
|| bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
@@ -6100,7 +6089,7 @@ aarch64_process_components (sbitmap comp
 mergeable with the current one into a pair.  */
   if (!satisfies_constraint_Ump (mem)
  || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
- || (aarch64_simd_decl_p (cfun->decl) && FP_REGNUM_P (regno))
+ || (crtl->abi->id () == ARM_PCS_SIMD && FP_REGNUM_P (regno))
  || maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
   GET_MODE_SIZE (mode)))
{
@@ -6432,8 +6421,6 @@ aarch64_epilogue_uses (int regno)
 {
   if (regno == LR_REGNUM)
return 1;
-

Re: [10/32] Remove global call sets: combine.c

2019-09-25 Thread Segher Boessenkool
On Wed, Sep 25, 2019 at 04:52:14PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Thu, Sep 12, 2019 at 08:51:59AM +0100, Richard Sandiford wrote:
> >> Segher Boessenkool  writes:
> >> > It is not such a great name like that.  Since its children are
> >> > very_long_names, it doesn't need to be only three chars itself,
> >> > either?
> >> 
> >> OK, what name would you prefer?
> >
> > Maybe call_abi is a good name?  It's difficult to capture the subtleties
> > in a short enough name.  As always :-)
> 
> The formatting ended up being a bit weird with a longer name,
> so how about the attached instead?

That looks great, thanks!

> +   /* ??? We could try to preserve some information from the last
> +  set of register I if the call doesn't actually clobber
> +  (reg:last_set_mode I), which might be true for ABIs with
> +  partial clobbers.  However, it would be difficult to
> +  update last_set_nonzero_bits and last_sign_bit_copies
> +  to account for the part of I that actually was clobbered.
> +  It wouldn't help much anyway, since we rarely see this
> +  situation before RA.  */

I would like to completely get rid of reg_stat, and have known bits
dealt with by some DF thing instead...  It would work much better and
be much easier to use at the same time.  Also, other passes could use
it as well.

Whether I'll ever find time to do this, I don't know :-/


Segher


Re: [PATCH] PR fortran/91426: Colorize %L text to match diagnostic_show_locus

2019-09-25 Thread Thomas Koenig

Hi David,

> does this look sane?

Yes.

OK for trunk, and thanks a lot!

Regards

Thomas


[AArch64] Make call insns record the callee's arm_pcs

2019-09-25 Thread Richard Sandiford
[This follows on from:
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00778.html
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01456.html]

At the moment we rely on SYMBOL_REF_DECL to get the ABI of the callee
of a call insn, falling back to the default ABI if the decl isn't
available.  I think it'd be cleaner to attach the ABI directly to the
call instruction instead, which would also have the very minor benefit
of handling indirect calls more efficiently.

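A hypothetical sketch of the indirect-call case this helps with:

/* Hypothetical illustration: there is no SYMBOL_REF_DECL to inspect
   for this call, but the vector PCS is part of the function type, so
   recording the ABI on the call insn lets the caller know that
   V8-V23 are preserved even for an indirect call.  */
typedef void (*simd_fn) (void) __attribute__ ((aarch64_vector_pcs));

void
call_indirect (simd_fn fn)
{
  fn ();	/* callee ABI known to be ARM_PCS_SIMD */
}
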
Tested on aarch64-linux-gnu.  OK to install?

Richard


2019-09-25  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_expand_call): Take an
extra callee_abi argument.
* config/aarch64/aarch64.c (aarch64_expand_call): Likewise.
Insert a CALLEE_ABI unspec into the call pattern as the second
element in the PARALLEL.
(aarch64_simd_call_p): Delete.
(aarch64_insn_callee_abi): Get the arm_pcs of the callee from
the new CALLEE_ABI element of the PARALLEL.
(aarch64_init_cumulative_args): Get the arm_pcs of the callee
from the function type, if given.
(aarch64_function_arg_advance): Handle ARM_PCS_SIMD.
(aarch64_function_arg): Likewise.  Return the arm_pcs of the callee
when passed the function_arg_info end marker.
(aarch64_output_mi_thunk): Pass the arm_pcs of the callee as the
final argument of gen_sibcall.
* config/aarch64/aarch64.md (UNSPEC_CALLEE_ABI): New unspec.
(call): Make operand 2 a const_int_operand and pass it to expand_call.
Wrap it in an UNSPEC_CALLEE_ABI unspec for the dummy define_expand
pattern.
(call_value): Likewise operand 3.
(sibcall): Likewise operand 2.  Place the unspec before rather than
after the return.
(sibcall_value): Likewise operand 3.
(*call_insn, *call_value_insn): Include an UNSPEC_CALLEE_ABI.
(tlsgd_small_<mode>, *tlsgd_small_<mode>): Likewise.
(*sibcall_insn, *sibcall_value_insn): Likewise.  Remove empty
constraint strings.
(untyped_call): Pass const0_rtx as the callee ABI to gen_call.

gcc/testsuite/
* gcc.target/aarch64/torture/simd-abi-10.c: New test.
* gcc.target/aarch64/torture/simd-abi-11.c: Likewise.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-09-25 17:23:36.770504785 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-09-25 17:31:04.663257639 +0100
@@ -452,7 +452,7 @@ bool aarch64_const_vec_all_same_in_range
 bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
-void aarch64_expand_call (rtx, rtx, bool);
+void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-09-25 17:31:04.0 +0100
+++ gcc/config/aarch64/aarch64.c2019-09-25 17:31:04.667257609 +0100
@@ -1872,37 +1872,17 @@ aarch64_reg_save_mode (tree fndecl, unsi
   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
-/* Return true if the instruction is a call to a SIMD function, false
-   if it is not a SIMD function or if we do not know anything about
-   the function.  */
-
-static bool
-aarch64_simd_call_p (const rtx_insn *insn)
-{
-  rtx symbol;
-  rtx call;
-  tree fndecl;
-
-  gcc_assert (CALL_P (insn));
-  call = get_call_rtx_from (insn);
-  symbol = XEXP (XEXP (call, 0), 0);
-  if (GET_CODE (symbol) != SYMBOL_REF)
-return false;
-  fndecl = SYMBOL_REF_DECL (symbol);
-  if (!fndecl)
-return false;
-
-  return aarch64_simd_decl_p (fndecl);
-}
-
 /* Implement TARGET_INSN_CALLEE_ABI.  */
 
 const predefined_function_abi &
 aarch64_insn_callee_abi (const rtx_insn *insn)
 {
-  if (aarch64_simd_call_p (insn))
-return aarch64_simd_abi ();
-  return default_function_abi;
+  rtx pat = PATTERN (insn);
+  gcc_assert (GET_CODE (pat) == PARALLEL);
+  rtx unspec = XVECEXP (pat, 0, 1);
+  gcc_assert (GET_CODE (unspec) == UNSPEC
+ && XINT (unspec, 1) == UNSPEC_CALLEE_ABI);
+  return function_abis[INTVAL (XVECEXP (unspec, 0, 0))];
 }
 
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
@@ -4847,10 +4827,11 @@ aarch64_layout_arg (cumulative_args_t pc
 aarch64_function_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
 {
   CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
-  gcc_assert (pcum->pcs_variant == ARM_PCS_AAPCS64);
+  gcc_assert (pcum->pcs_variant == ARM_PCS_AAPCS64
+ || pcum->pcs_variant == ARM_PCS_SIMD);
 
   if (arg.end_marker_p ())
-return NULL_RTX;
+return gen_int_mode (pcum->pcs_variant, DImode);
 
   aarch64_layout_arg (pcum_v, arg.mode, arg.type, arg.named);

Re: [PATCH][RFC] Add new ipa-reorder pass

2019-09-25 Thread Evgeny Kudryashov

On 2019-09-19 11:33, Martin Liška wrote:

Hi.

Function reordering has been around for quite some time and a naive
implementation was also part of my diploma thesis some time ago.
Currently, GCC can reorder functions based on first execution, which
happens with PGO and LTO of course.  A known limitation is that the
order is only partially preserved, as various symbols go into
different LTRANS partitions.

There has been some research in the area and I would point out the
Facebook paper
([1]) and Sony presentation ([2]).  Based on that, I decided to make a
new implementation in GCC that does the same (in a proper way).  The
first part of the enablement is a pair of patches to ld.bfd and
ld.gold that introduce a new section, .text.sorted, that is always
sorted.

Thoughts? I would definitely welcome any interesting measurement on a
bigger load.

Martin



Hi, Martin!

Some time ago I tried to do the same but didn't go that far.

I also used the C3 algorithm, except that I somehow missed the fact
that ipa_fn_summary contains information about size and time.  The
linker option --sort-section=name was used for prototyping: it sorts
sections by name and so allows placing functions in the desired order
(by putting them into named sections .text.sorted.<name>), without
patching linkers or adjusting linker scripts.  For testing my
implementation I used several benchmarks from SPEC2006: perlbench,
sjeng and gobmk.  Unfortunately, no significant positive changes were
obtained.

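For reference, a hypothetical sketch of that prototyping approach (the
section names are illustrative):

/* Hypothetical sketch: emit each function into its own numbered
   .text.sorted.* section and let "ld --sort-section=name" lay them
   out in the computed order, with no linker patching needed.  */
__attribute__ ((section (".text.sorted.0001")))
void hot_callee (void)
{
}

__attribute__ ((section (".text.sorted.0002")))
void hot_caller (void)
{
  hot_callee ();
}
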

I've tested the proposed pass on perlbench with train and reference
input (PGO+LTO as a base) but couldn't obtain stable results (it is
still unclear whether that is due to the environment or to
perlbench-specific reasons).


Evgeny.



[AArch64] Use calls for SVE TLSDEC

2019-09-25 Thread Richard Sandiford
[This follows on from:
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00778.html
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01456.html
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01464.html]

One (unintended) side effect of the patches to support multiple
ABIs is that we can now represent tlsdesc calls as normal calls
on SVE targets.  This is likely to be handled more efficiently than
clobber_high, and for example fixes the long-standing failure in
gcc.target/aarch64/sve/tls_preserve_1.c.

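A hypothetical illustration of the kind of code this affects (assuming
-fpic and TLS descriptors):

/* Hypothetical example: with -fpic this TLS access can expand to a
   tlsdesc call.  Modelled as a normal call with the ARM_PCS_TLSDESC
   ABI, the call is known to clobber little more than x0, the flags
   and the SVE predicate registers, while preserving the low 128 bits
   of the vector registers, so "q" below need not be spilled around it
   (the tls_preserve_1.c failure).  */
__thread int counter;

float
bump (float q)
{
  counter++;
  return q;	/* q can stay in a vector register across the call */
}
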
Tested on aarch64-linux-gnu.  I'll apply if the dependencies above
are approved.

Richard


2019-09-25  Richard Sandiford  

gcc/
PR target/91452
* config/aarch64/aarch64.h (ARM_PCS_TLSDESC): New arm_pcs.
* config/aarch64/aarch64-protos.h (aarch64_tlsdesc_abi_id): Declare.
* config/aarch64/aarch64.c (aarch64_hard_regno_call_part_clobbered):
Handle ARM_PCS_TLSDESC.
(aarch64_tlsdesc_abi_id): New function.
* config/aarch64/aarch64.md (tlsdesc_small_sve_<mode>): Use a call
rtx instead of a list of clobbers and clobber_highs.
(tlsdesc_small_<mode>): Update accordingly.

Index: gcc/config/aarch64/aarch64.h
===
--- gcc/config/aarch64/aarch64.h2019-09-25 16:22:58.464627786 +0100
+++ gcc/config/aarch64/aarch64.h2019-09-25 17:34:36.077725643 +0100
@@ -784,6 +784,7 @@ enum arm_pcs
 {
   ARM_PCS_AAPCS64, /* Base standard AAPCS for 64 bit.  */
   ARM_PCS_SIMD,/* For aarch64_vector_pcs functions.  */
+  ARM_PCS_TLSDESC, /* For targets of tlsdesc calls.  */
   ARM_PCS_UNKNOWN
 };
 
Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2019-09-25 17:31:04.663257639 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2019-09-25 17:34:36.073725674 +0100
@@ -519,6 +519,7 @@ bool aarch64_use_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
+unsigned int aarch64_tlsdesc_abi_id ();
 enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-09-25 17:31:04.667257609 +0100
+++ gcc/config/aarch64/aarch64.c2019-09-25 17:34:36.077725643 +0100
@@ -1896,12 +1896,13 @@ aarch64_hard_regno_call_part_clobbered (
 {
   if (FP_REGNUM_P (regno))
 {
-  bool simd_p = (abi_id == ARM_PCS_SIMD);
   poly_int64 per_register_size = GET_MODE_SIZE (mode);
   unsigned int nregs = hard_regno_nregs (regno, mode);
   if (nregs > 1)
per_register_size = exact_div (per_register_size, nregs);
-  return maybe_gt (per_register_size, simd_p ? 16 : 8);
+  if (abi_id == ARM_PCS_SIMD || abi_id == ARM_PCS_TLSDESC)
+   return maybe_gt (per_register_size, 16);
+  return maybe_gt (per_register_size, 8);
 }
   return false;
 }
@@ -13873,6 +13874,26 @@ aarch64_can_inline_p (tree caller, tree
   return true;
 }
 
+/* Return the ID of the TLSDESC ABI, initializing the descriptor if it hasn't
+   been already.  */
+
+unsigned int
+aarch64_tlsdesc_abi_id ()
+{
+  predefined_function_abi &tlsdesc_abi = function_abis[ARM_PCS_TLSDESC];
+  if (!tlsdesc_abi.initialized_p ())
+{
+  HARD_REG_SET full_reg_clobbers;
+  CLEAR_HARD_REG_SET (full_reg_clobbers);
+  SET_HARD_REG_BIT (full_reg_clobbers, R0_REGNUM);
+  SET_HARD_REG_BIT (full_reg_clobbers, CC_REGNUM);
+  for (int regno = P0_REGNUM; regno <= P15_REGNUM; ++regno)
+   SET_HARD_REG_BIT (full_reg_clobbers, regno);
+  tlsdesc_abi.initialize (ARM_PCS_TLSDESC, full_reg_clobbers);
+}
+  return tlsdesc_abi.id ();
+}
+
 /* Return true if SYMBOL_REF X binds locally.  */
 
 static bool
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2019-09-25 17:31:04.667257609 +0100
+++ gcc/config/aarch64/aarch64.md   2019-09-25 17:34:36.081725616 +0100
@@ -6805,7 +6805,12 @@ (define_expand "tlsdesc_small_<mode>"
   "TARGET_TLS_DESC"
   {
 if (TARGET_SVE)
-  emit_insn (gen_tlsdesc_small_sve_<mode> (operands[0]));
+  {
+   rtx abi = gen_int_mode (aarch64_tlsdesc_abi_id (), DImode);
+   rtx_insn *call
+ = emit_call_insn (gen_tlsdesc_small_sve_<mode> (operands[0], abi));
+   RTL_CONST_CALL_P (call) = 1;
+  }
 else
   emit_insn (gen_tlsdesc_small_advsimd_<mode> (operands[0]));
 DONE;
@@ -6827,67 +6832,20 @@ (define_insn "tlsdesc_small_advsimd_<mode>"
   [(set (reg:PTR R0_REGNUM)
-(unspec:PTR [(match_operand 0 "aarch64_valid_symref" "S")]
-   UNSPEC_TLSDESC))
+   (call (mem:DI (unspec:PTR
+   [(match_

Re: [PATCH] Retain TYPE_MODE more often for BIT_FIELD_REFs in get_inner_reference

2019-09-25 Thread Eric Botcazou
> For the PR it would be good enough. Though I wonder what the original reason
> for the mode handling was. Was it to avoid not naturally aligned modes for
> strict align targets? Or modes for non-mode size entities?

Bit-field extraction ultimately required integer modes before vector modes 
came to light so I think that preserving their original mode was useless.

-- 
Eric Botcazou


Remove clobber_high

2019-09-25 Thread Richard Sandiford
[This follows on from:
 https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01466.html]

The AArch64 SVE tlsdesc patterns were the main motivating reason
for clobber_high.  It's no longer needed now that the patterns use
calls instead.

At the time, one of the possible future uses for clobber_high was for
asm statements.  However, the current code wouldn't handle that case
without modification, so I think we might as well remove it for now.
We can always reapply it in future if it turns out to be useful again.

[Perhaps we should have a syntax for saying asms clobber the same
registers as calls, with a syntax for specifying a particular
ABI where necessary?]

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install
if the prerequisites are?

Richard


2019-09-25  Richard Sandiford  

gcc/
* rtl.def (CLOBBER_HIGH): Delete.
* doc/rtl.texi (clobber_high): Remove documentation.
* rtl.h (SET_DEST): Remove CLOBBER_HIGH from the list of codes.
(reg_is_clobbered_by_clobber_high): Delete.
(gen_hard_reg_clobber_high): Likewise.
* alias.c (record_set): Remove CLOBBER_HIGH handling.
* cfgexpand.c (expand_gimple_stmt): Likewise.
* combine-stack-adj.c (single_set_for_csa): Likewise.
* combine.c (find_single_use_1, set_nonzero_bits_and_sign_copies)
(can_combine_p, is_parallel_of_n_reg_sets, try_combine)
(record_dead_and_set_regs_1, reg_dead_at_p_1): Likewise.
* cse.c (invalidate_reg): Remove clobber_high parameter.
(invalidate): Update call accordingly.
(canonicalize_insn): Remove CLOBBER_HIGH handling.
(invalidate_from_clobbers, invalidate_from_sets_and_clobbers)
(count_reg_usage, insn_live_p): Likewise.
* cselib.h (cselib_invalidate_rtx): Remove sett argument.
* cselib.c (cselib_invalidate_regno, cselib_invalidate_rtx): Likewise.
(cselib_invalidate_rtx_note_stores): Update call accordingly.
(cselib_expand_value_rtx_1): Remove CLOBBER_HIGH handling.
(cselib_invalidate_regno, cselib_process_insn): Likewise.
* dce.c (deletable_insn_p, mark_nonreg_stores_1): Likewise.
(mark_nonreg_stores_2): Likewise.
* df-scan.c (df_find_hard_reg_defs, df_uses_record): Likewise.
(df_get_call_refs): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* emit-rtl.c (verify_rtx_sharing): Likewise.
(copy_insn_1, copy_rtx_if_shared_1): Likewise.
(hard_reg_clobbers_high, gen_hard_reg_clobber_high): Delete.
* genconfig.c (walk_insn_part): Remove CLOBBER_HIGH handling.
* genemit.c (gen_exp, gen_insn): Likewise.
* genrecog.c (validate_pattern, remove_clobbers): Likewise.
* haifa-sched.c (haifa_classify_rtx): Likewise.
* ira-build.c (create_insn_allocnos): Likewise.
* ira-costs.c (scan_one_insn): Likewise.
* ira.c (equiv_init_movable_p, memref_referenced_p): Likewise.
(rtx_moveable_p, interesting_dest_for_shprep): Likewise.
* jump.c (mark_jump_label_1): Likewise.
* lra-int.h (lra_insn_reg::clobber_high): Delete.
* lra-eliminations.c (lra_eliminate_regs_1): Remove CLOBBER_HIGH
handling.
(mark_not_eliminable): Likewise.
* lra-lives.c (process_bb_lives): Likewise.
* lra.c (new_insn_reg): Remove clobber_high parameter.
(collect_non_operand_hard_regs): Likewise.  Update call to new
insn_reg.  Remove CLOBBER_HIGH handling.
(lra_set_insn_recog_data): Remove CLOBBER_HIGH handling.  Update call
to collect_non_operand_hard_regs.
(add_regs_to_insn_regno_info): Remove CLOBBER_HIGH handling.
Update call to new_insn_reg.
(lra_update_insn_regno_info): Remove CLOBBER_HIGH handling.
* postreload.c (reload_cse_simplify, reload_combine_note_use)
(move2add_note_store): Likewise.
* print-rtl.c (print_pattern): Likewise.
* recog.c (store_data_bypass_p_1, store_data_bypass_p): Likewise.
(if_test_bypass_p): Likewise.
* regcprop.c (kill_clobbered_value, kill_set_value): Likewise.
* reginfo.c (reg_scan_mark_refs): Likewise.
* reload1.c (maybe_fix_stack_asms, eliminate_regs_1): Likewise.
(elimination_effects, mark_not_eliminable, scan_paradoxical_subregs)
(forget_old_reloads_1): Likewise.
* reorg.c (find_end_label, try_merge_delay_insns, redundant_insn)
(own_thread_p, fill_simple_delay_slots, fill_slots_from_thread)
(dbr_schedule): Likewise.
* resource.c (update_live_status, mark_referenced_resources)
(mark_set_resources): Likewise.
* rtl.c (copy_rtx): Likewise.
* rtlanal.c (reg_referenced_p, set_of_1, single_set_2, noop_move_p)
(note_pattern_stores): Likewise.
(reg_is_clobbered_by_clobber_high): Delete.
* sched-deps.c (sched_analyze_reg, sched_analyze_insn): Remove
CLOBBER_HIGH handling.

Index: gcc/rtl.

[PATCH] Remove some restrictions from rust-demangle.

2019-09-25 Thread Eduard-Mihai Burtescu
The main change here is in the treatment of $...$ escapes.
I've relaxed the treatment of unknown escapes during
unescaping to continue processing the input string,
leaving the remainder of the current path segment as-is.
Relatedly, the rust_is_mangled function no longer checks
escapes at all (as unknown escapes aren't errors now).

E.g. "a$LT$b$X$c$GT$::d$C$e" would now be demangled to
"a
libiberty/ChangeLog:
* rust-demangle.c (looks_like_rust): Remove.
(rust_is_mangled): Don't check escapes.
(is_prefixed_hash): Allow 0-9a-f permutations.
(rust_demangle_sym): Don't bail on unknown escapes.
* testsuite/rust-demangle-expected: Update 'main::$99$' test.
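
To make the new behaviour concrete, here is a hedged usage sketch of
mine (not part of the patch); it assumes the two declarations below
match this file's, and that, as in cplus_demangle's "auto" format, the
symbol has already been demangled from its _ZN...E form before being
handed to these routines:

  // Sketch only; the input is taken from the updated test below.
  extern "C" {
    int rust_is_mangled (const char *sym);
    void rust_demangle_sym (char *sym);
  }

  void demo (void)
  {
    // v3-demangled form of _ZN4main4$99$17he714a2e23ed7db23E
    char buf[] = "main::$99$::he714a2e23ed7db23";
    if (rust_is_mangled (buf))   // hash suffix and character set are OK
      rust_demangle_sym (buf);   // strips the hash; the unrecognized
                                 // escape "$99$" is now kept as-is,
                                 // so buf becomes "main::$99$"
  }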

diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index da591902db1..6b62e6dbd80 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -85,7 +85,6 @@ static const size_t hash_prefix_len = 3;
 static const size_t hash_len = 16;
 
 static int is_prefixed_hash (const char *start);
-static int looks_like_rust (const char *sym, size_t len);
 static int parse_lower_hex_nibble (char nibble);
 static char parse_legacy_escape (const char **in);
 
@@ -105,16 +104,13 @@ static char parse_legacy_escape (const char **in);
   negative (the rare Rust symbol is not demangled) so this sets
   the balance in favor of false negatives.
 
-   3. There must be no characters other than a-zA-Z0-9 and _.:$
-
-   4. There must be no unrecognized $-sign sequences.
-
-   5. There must be no sequence of three or more dots in a row ("...").  */
+   3. There must be no characters other than a-zA-Z0-9 and _.:$  */
 
 int
 rust_is_mangled (const char *sym)
 {
   size_t len, len_without_hash;
+  const char *end;
 
   if (!sym)
 return 0;
@@ -128,12 +124,22 @@ rust_is_mangled (const char *sym)
   if (!is_prefixed_hash (sym + len_without_hash))
 return 0;
 
-  return looks_like_rust (sym, len_without_hash);
+  end = sym + len_without_hash;
+
+  while (sym < end)
+{
+  if (*sym == '$' || *sym == '.' || *sym == '_' || *sym == ':'
+  || ISALNUM (*sym))
+sym++;
+  else
+return 0;
+}
+
+  return 1;
 }
 
 /* A hash is the prefix "::h" followed by 16 lowercase hex digits. The
-   hex digits must comprise between 5 and 15 (inclusive) distinct
-   digits.  */
+   hex digits must contain at least 5 distinct digits.  */
 
 static int
 is_prefixed_hash (const char *str)
@@ -162,28 +168,7 @@ is_prefixed_hash (const char *str)
 if (seen[i])
   count++;
 
-  return count >= 5 && count <= 15;
-}
-
-static int
-looks_like_rust (const char *str, size_t len)
-{
-  const char *end = str + len;
-
-  while (str < end)
-{
-  if (*str == '$')
-{
-  if (!parse_legacy_escape (&str))
-return 0;
-}
-  else if (*str == '.' || *str == '_' || *str == ':' || ISALNUM (*str))
-str++;
-  else
-return 0;
-}
-
-  return 1;
+  return count >= 5;
 }
 
 /*
@@ -215,8 +200,9 @@ rust_demangle_sym (char *sym)
   if (unescaped)
 *out++ = unescaped;
   else
-/* unexpected escape sequence, not looks_like_rust. */
-goto fail;
+/* unexpected escape sequence, skip the rest of this segment. */
+while (in < end && *in != ':')
+  *out++ = *in++;
 }
   else if (*in == '_')
 {
@@ -248,14 +234,14 @@ rust_demangle_sym (char *sym)
   else if (*in == ':' || ISALNUM (*in))
 *out++ = *in++;
   else
-/* unexpected character in symbol, not looks_like_rust.  */
-goto fail;
+{
+  /* unexpected character in symbol, not rust_is_mangled.  */
+  *out++ = '?'; /* This is pretty lame, but it's hard to do better. */
+  *out = '\0';
+  return;
+}
 }
-  goto done;
 
-fail:
-  *out++ = '?'; /* This is pretty lame, but it's hard to do better. */
-done:
   *out = '\0';
 }
 
diff --git a/libiberty/testsuite/rust-demangle-expected b/libiberty/testsuite/rust-demangle-expected
index c3b03f9f02d..74774794736 100644
--- a/libiberty/testsuite/rust-demangle-expected
+++ b/libiberty/testsuite/rust-demangle-expected
@@ -41,7 +41,7 @@ main::main::he714a2e23ed7db2g
 # $XX$ substitutions should not contain just numbers.
 --format=auto
 _ZN4main4$99$17he714a2e23ed7db23E
-main::$99$::he714a2e23ed7db23
+main::$99$
 # _ at start of path should be removed.
 # ".." translates to "::" "$GT$" to ">" and "$LT$" to "<".
 --format=rust


Re: [04/32] [x86] Robustify vzeroupper handling across calls

2019-09-25 Thread Uros Bizjak
On Wed, Sep 25, 2019 at 5:48 PM Richard Sandiford
 wrote:
>
> Ping
>
> Richard Sandiford  writes:
> > One of the effects of the function_abi series is to make -fipa-ra
> > work for partially call-clobbered registers.  E.g. if a call preserves
> > only the low 32 bits of a register R, we handled the partial clobber
> > separately from -fipa-ra, and so treated the upper bits of R as
> > clobbered even if we knew that the target function doesn't touch R.
> >
> > "Fixing" this caused problems for the vzeroupper handling on x86.
> > The pass that inserts the vzerouppers assumes that no 256-bit or 512-bit
> > values are live across a call unless the call takes a 256-bit or 512-bit
> > argument:
> >
> >   /* Needed mode is set to AVX_U128_CLEAN if there are
> >no 256bit or 512bit modes used in function arguments. */
> >
> > This implicitly relies on:
> >
> > /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The only ABI that
> >saves SSE registers across calls is Win64 (thus no need to check the
> >current ABI here), and with AVX enabled Win64 only guarantees that
> >the low 16 bytes are saved.  */
> >
> > static bool
> > ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> >unsigned int regno, machine_mode mode)
> > {
> >   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
> > }
> >
> > The comment suggests that this code is only needed for Win64 and that
> > not testing for Win64 is just a simplification.  But in practice it was
> > needed for correctness on GNU/Linux and other targets too, since without
> > it the RA would be able to keep 256-bit and 512-bit values in SSE
> > registers across calls that are known not to clobber them.
> >
> > This patch conservatively treats calls as AVX_U128_ANY if the RA can see
> > that some SSE registers are not touched by a call.  There are then no
> > regressions if the ix86_hard_regno_call_part_clobbered check is disabled
> > for GNU/Linux (not something we should do, was just for testing).
> >
> > If in fact we want -fipa-ra to pretend that all functions clobber
> > SSE registers above 128 bits, it'd certainly be possible to arrange
> > that.  But IMO that would be an optimisation decision, whereas what
> > the patch is fixing is a correctness decision.  So I think we should
> > have this check even so.
>
> 2019-09-25  Richard Sandiford  
>
> gcc/
> * config/i386/i386.c: Include function-abi.h.
> (ix86_avx_u128_mode_needed): Treat function calls as AVX_U128_ANY
> if they preserve some 256-bit or 512-bit SSE registers.

OK.

Thanks,
Uros.

>
> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c  2019-09-25 16:47:48.0 +0100
> +++ gcc/config/i386/i386.c  2019-09-25 16:47:49.089962608 +0100
> @@ -95,6 +95,7 @@ #define IN_TARGET_CODE 1
>  #include "i386-builtins.h"
>  #include "i386-expand.h"
>  #include "i386-features.h"
> +#include "function-abi.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -13511,6 +13512,15 @@ ix86_avx_u128_mode_needed (rtx_insn *ins
> }
> }
>
> +  /* If the function is known to preserve some SSE registers,
> +RA and previous passes can legitimately rely on that for
> +modes wider than 256 bits.  It's only safe to issue a
> +vzeroupper if all SSE registers are clobbered.  */
> +  const function_abi &abi = insn_callee_abi (insn);
> +  if (!hard_reg_set_subset_p (reg_class_contents[ALL_SSE_REGS],
> + abi.mode_clobbers (V4DImode)))
> +   return AVX_U128_ANY;
> +
>return AVX_U128_CLEAN;
>  }
>


[patch, fortran] PR 84487

2019-09-25 Thread Thomas Koenig

Hello world,

this patch makes sure that the __def_init variables, which have been
generated for normal allocatable arrays for quite some time, do not fill
up huge amounts of space in the object files with zeros. This is done by
not marking them read-only, which means that they are put into the BSS.
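
For illustration, here is a minimal hedged sketch (in C++, not taken
from the patch) of the section placement this relies on: writable
all-zero data lands in the BSS and costs no file space, while the same
data marked read-only is materialized as zero bytes in .rodata:

  // Hypothetical example: how read-only marking affects object-file
  // size for all-zero data.
  static int writable_zeros[1 << 20];             // .bss: no file-space cost
  static const int readonly_zeros[1 << 20] = {};  // .rodata: ~4 MiB of
                                                  // zeros in the object file
  int peek (int i) { return writable_zeros[i] + readonly_zeros[i]; }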

Setting DECL_ARTIFICIAL on the __def_init variable makes sure it
is handled as predetermined shared in gfc_omp_predetermined_sharing.

This is not an optimal solution. As the newly required xfail shows, we
are now missing out on an optimization, and having large all-zero
variables seems wrong. However, this patch solves the most urgent
problem in this respect.

This is an 8/9/10 regression, so I would like to commit this to
all of these branches (waiting until gcc 9 reopens, of course).

I would then close the PR and open an enhancement PR for the xfail
and the design improvement.

Test case... I'm not sure what to test for.

Regression-tested. OK for all affected branches?

Regards

Thomas

2019-09-25  Thomas Koenig 

PR fortran/84487
* trans-decl.c (gfc_get_symbol_decl): For __def_init, set
DECL_ARTIFICIAL and do not set TREE_READONLY.

2019-09-25  Thomas Koenig 

PR fortran/84487
* gfortran.dg/typebound_call_22.f03: xfail.
Index: fortran/trans-decl.c
===
--- fortran/trans-decl.c	(Revision 275719)
+++ fortran/trans-decl.c	(Arbeitskopie)
@@ -1911,9 +1911,13 @@ gfc_get_symbol_decl (gfc_symbol * sym)
   if (sym->attr.associate_var)
 GFC_DECL_ASSOCIATE_VAR_P (decl) = 1;
 
+  /* We no longer mark __def_init as read-only so it does not take up
+ space in the read-only section and can go into the BSS instead,
+ see PR 84487.  Marking this as artificial means that OpenMP will
+ treat this as predetermined shared.  */
   if (sym->attr.vtab
   || (sym->name[0] == '_' && gfc_str_startswith (sym->name, "__def_init")))
-TREE_READONLY (decl) = 1;
+DECL_ARTIFICIAL (decl) = 1;
 
   return decl;
 }
Index: testsuite/gfortran.dg/typebound_call_22.f03
===
--- testsuite/gfortran.dg/typebound_call_22.f03	(Revision 275713)
+++ testsuite/gfortran.dg/typebound_call_22.f03	(Arbeitskopie)
@@ -26,4 +26,4 @@ program test
   call x%bar ()
 end program
 
-! { dg-final { scan-tree-dump-times "base \\(\\);" 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "base \\(\\);" 1 "optimized" { xfail *-*-* } } }


Re: [PATCH] Help compiler detect invalid code

2019-09-25 Thread François Dumont
Some more tests have revealed a small problem in the is_sorted test.
Debug mode exposed it, but not in a clean way, so for the moment I
prefer to fix it, and I'll add a _neg test when Debug mode is able to
report it correctly.


I've also added a _neg test for equal which doesn't need Debug mode.

OK to commit?

François



On 9/20/19 7:08 AM, François Dumont wrote:
I already realized that the previous patch would be too controversial to 
be accepted.


In this new version I just implement a real memmove in __memmove so 
that in copy_backward there is no need for a shortcut to more 
defensive code.


I'll see if I can do something in Debug mode.

François


On 9/19/19 10:27 PM, François Dumont wrote:

Hi

    I've started working on making the recently added constexpr tests 
work in Debug mode.


    It appears that the compiler is able to detect code mistakes 
pretty well, as long as we don't try to hide the code's intention with 
a defensive approach. This is why I'd like to propose replacing '__n > 
0' conditions with '__n != 0'.


    The result is demonstrated by the constexpr_neg.cc tests. What do 
you think?
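
    To make the idea concrete, here is a hedged sketch of mine (not
the libstdc++ code) of why the relaxed condition helps: with '__n > 0'
a buggy negative count silently copies nothing, whereas with
'__n != 0' constant evaluation keeps running past the end of the array
and the compiler rejects the out-of-bounds access, surfacing the
invalid code:

  #include <cstddef>

  template <typename T>
  constexpr void sketch_copy(T* dst, const T* src, std::ptrdiff_t n)
  {
    for (; n != 0; --n)  // an 'n > 0' guard would mask a negative n here
      *dst++ = *src++;
  }

  constexpr bool oob()
  {
    int a[2]{1, 2};
    int b[2]{};
    sketch_copy(b, a, -1);  // caller bug: negative count
    return true;
  }

  // static_assert(oob());  // rejected at compile time: the loop reads
  //                        // past the end of 'a' during constant evaluation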


    * include/bits/stl_algobase.h (__memmove): Return _Tp*.
    (__memmove): Loop as long as __n is not 0.
    (__copy_move<>::__copy_m): Likewise.
    (__copy_move_backward<>::__copy_move_b): Likewise.
    * testsuite/25_algorithms/copy/constexpr.cc: Add check on copied 
values.

    * testsuite/25_algorithms/copy_backward/constexpr.cc: Likewise.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New.
    * testsuite/25_algorithms/copy_backward/constexpr_neg.cc: New.

    I'll submit the patch to fix Debug mode depending on the decision 
for this one.


François





diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 4eba053ac75..94a79b85d15 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -83,27 +83,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   template
 _GLIBCXX14_CONSTEXPR
-inline void*
-__memmove(_Tp* __dst, const _Tp* __src, size_t __num)
+inline void
+__memmove(_Tp* __dst, const _Tp* __src, ptrdiff_t __num)
 {
 #ifdef __cpp_lib_is_constant_evaluated
   if (std::is_constant_evaluated())
 	{
-	  for(; __num > 0; --__num)
+	  __dst += __num;
+	  __src += __num;
+	  for (; __num != 0; --__num)
 	{
 	  if constexpr (_IsMove)
-		*__dst = std::move(*__src);
+		*--__dst = std::move(*--__src);
 	  else
-		*__dst = *__src;
-	  ++__src;
-	  ++__dst;
+		*--__dst = *--__src;
 	}
-	  return __dst;
 	}
   else
 #endif
-	return __builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
-  return __dst;
+	__builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
 }
 
   /*
@@ -730,12 +728,6 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 			 && __is_pointer<_BI2>::__value
 			 && __are_same<_ValueType1, _ValueType2>::__value);
 
-#ifdef __cpp_lib_is_constant_evaluated
-  if (std::is_constant_evaluated())
-	return std::__copy_move_backward<_IsMove, false, _Category>::__copy_move_b(__first, __last,
-			   __result);
-#endif
   return std::__copy_move_backward<_IsMove, __simple,
    _Category>::__copy_move_b(__first,
  __last,
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/constexpr.cc b/libstdc++-v3/testsuite/25_algorithms/copy/constexpr.cc
index 67910b8773e..a0460603496 100644
--- a/libstdc++-v3/testsuite/25_algorithms/copy/constexpr.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/constexpr.cc
@@ -24,12 +24,12 @@
 constexpr bool
 test()
 {
-  constexpr std::array<int, 12> ca0{{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}};
+  constexpr std::array<int, 12> ca0{{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}};
   std::array<int, 12> ma0{{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}};
 
   const auto out6 = std::copy(ca0.begin(), ca0.begin() + 8, ma0.begin() + 2);
 
-  return out6 == ma0.begin() + 10;
+  return out6 == ma0.begin() + 10 && *(ma0.begin() + 2) == 1 && *out6 == 0;
 }
 
 static_assert(test());
diff --git a/libstdc++-v3/testsuite/25_algorithms/copy/constexpr_neg.cc b/libstdc++-v3/testsuite/25_algorithms/copy/constexpr_neg.cc
new file mode 100644
index 000..49052467409
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/copy/constexpr_neg.cc
@@ -0,0 +1,38 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  I

[PATCH] FreeBSD PowerPC use secure-plt

2019-09-25 Thread Andreas Tobler

Hi all,

the attached patch makes use of the secure-plt for 32-bit PowerPC on 
FreeBSD 13 and upwards. The OS support will arrive in FreeBSD 13.0.


I'd like to commit this patch to head and later to all open branches.

Comments appreciated!

If I do not get any, I'll commit in a few days.

TIA,
Andreas

--- UTC
Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 276112)
+++ gcc/config.gcc  (working copy)
@@ -2687,8 +2687,14 @@
tm_file="${tm_file} rs6000/default64.h rs6000/freebsd64.h"
tmake_file="${tmake_file} rs6000/t-freebsd64"
extra_options="${extra_options} rs6000/linux64.opt"
+   if test $fbsd_major -ge 13; then
+   tm_defines="${tm_defines} TARGET_FREEBSD32_SECURE_PLT=1"
+   fi
;;
 *)
+   if test $fbsd_major -ge 13; then
+   tm_file="rs6000/secureplt.h ${tm_file}"
+   fi
tm_file="${tm_file} rs6000/freebsd.h"
;;
esac
Index: gcc/config/rs6000/t-freebsd64
===
--- gcc/config/rs6000/t-freebsd64   (revision 276090)
+++ gcc/config/rs6000/t-freebsd64   (working copy)
@@ -27,3 +27,6 @@
 MULTILIB_EXCEPTIONS =
 MULTILIB_OSDIRNAMES= ../lib32
 
+SECURE_PLT = $(if $(findstring TARGET_FREEBSD32_SECURE_PLT=1, $(tm_defines)),msecure-plt)
+
+MULTILIB_EXTRA_OPTS += $(SECURE_PLT)


[PATCH 0/2] libada: Installation improvements

2019-09-25 Thread Maciej W. Rozycki
Hi,

 Here's a mini patch series that addresses a couple of long-standing 
installation issues observed with libada.  These have been verified by 
bootstrapping GCC with an `x86_64-linux-gnu' native configuration and 
using that compiler to build a `riscv-linux-gnu' cross-compiler, in both 
cases with and without the `--disable-version-specific-runtime-libs' 
configuration option used.  Also the resulting installed directory tree 
was examined for correct structure, and in particular was unchanged in 
the absence of the option.

 OK to apply?

  Maciej


[PATCH 1/2] libada: Remove racy duplicate gnatlib installation

2019-09-25 Thread Maciej W. Rozycki
For some reason, presumably historical, the `install-gnatlib' target for 
the default multilib is invoked twice, once via the `ada.install-common' 
target in `gcc/ada/gcc-interface/Make-lang.in' invoked from gcc/ and 
again via the `install-libada' target in libada/.

Apart from doing the same thing twice, this is actually harmful in a sufficiently 
parallelized `make' invocation, as the removal of old files performed 
within the `install-gnatlib' recipe in the former case actually races 
with the installation of new files done in the latter case, causing the 
recipe to fail and abort, however non-fatally, having not completed the 
installation of all the built files needed for the newly-built compiler 
to work correctly.

This can be observed with a native `x86_64-linux-gnu' bootstrap:

make[4]: Entering directory '.../gcc/ada'
rm -rf .../lib/gcc/x86_64-linux-gnu/10.0.0/adalib
rm: cannot remove '.../lib/gcc/x86_64-linux-gnu/10.0.0/adalib': Directory not empty
make[4]: *** [gcc-interface/Makefile:512: install-gnatlib] Error 1
make[4]: Leaving directory '.../gcc/ada'
make[3]: *** [.../gcc/ada/gcc-interface/Make-lang.in:853: install-gnatlib] Error 2
make[2]: [.../gcc/ada/gcc-interface/Make-lang.in:829: ada.install-common] Error 2 (ignored)

which then causes missing files to be reported when an attempt is made 
to use the newly-installed non-functional compiler to build a 
`riscv-linux-gnu' cross-compiler:

(cd ada/bldtools/sinfo; gnatmake -q xsinfo ; ./xsinfo sinfo.h )
error: "ada.ali" not found, "ada.ads" must be compiled
error: "s-memory.ali" not found, "s-memory.adb" must be compiled
gnatmake: *** bind failed.
/bin/sh: ./xsinfo: No such file or directory
make[2]: *** [.../gcc/ada/Make-generated.in:45: ada/sinfo.h] Error 127
make[2]: Leaving directory '.../gcc'
make[1]: *** [Makefile:4369: all-gcc] Error 2
make[1]: Leaving directory '...'
make: *** [Makefile:965: all] Error 2

Depending on timing `.../lib/gcc/x86_64-linux-gnu/10.0.0/adainclude' may
cause an installation failure instead and the resulting compiler may be 
non-functional in a different way.

Remove the extraneous `install-gnatlib' invocation from within gcc/ then, 
as all the gnatlib handling ought to be done in libada/ nowadays.

gcc/ada/
* gcc-interface/Make-lang.in (ada.install-common): Remove 
`install-gnatlib' invocation.
---
 gcc/ada/gcc-interface/Make-lang.in |8 
 1 file changed, 8 deletions(-)

gcc-lang-no-install-gnatlib.diff
Index: gcc/gcc/ada/gcc-interface/Make-lang.in
===
--- gcc.orig/gcc/ada/gcc-interface/Make-lang.in
+++ gcc/gcc/ada/gcc-interface/Make-lang.in
@@ -840,14 +840,6 @@ doc/gnat-style.pdf: ada/gnat-style.texi
  $(INSTALL_PROGRAM) gnatdll$(exeext) 
$(DESTDIR)$(bindir)/gnatdll$(exeext); \
fi
 
-#
-# Finally, install the library
-#
-   -if [ -f gnat1$(exeext) ] ; \
-   then \
- $(MAKE) $(COMMON_FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) install-gnatlib; 
\
-   fi
-
 install-gnatlib:
$(MAKE) -C ada $(COMMON_FLAGS_TO_PASS) $(ADA_FLAGS_TO_PASS) 
install-gnatlib$(LIBGNAT_TARGET)
 


Re: [AArch64] Split built-in function codes into major and minor codes

2019-09-25 Thread James Greenhalgh
On Wed, Aug 07, 2019 at 08:28:50PM +0100, Richard Sandiford wrote:
> It was easier to add the SVE ACLE support without enumerating every
> function at build time.  This in turn meant that it was easier if the
> SVE builtins occupied a distinct numberspace from the existing AArch64
> ones, which *are* enumerated at build time.  This patch therefore
> divides the built-in functions codes into "major" and "minor" codes.
> At present the major code is just "general", but the SVE patch will add
> "SVE" as well.
> 
> Also, it was convenient to put the SVE ACLE support in its own file,
> so the patch makes aarch64.c provide the frontline target hooks directly,
> forwarding to the other files for the real work.
> 
> The reason for organising the files this way is that aarch64.c needs
> to define the target hook macros whatever happens, and having aarch64.c
> macros forward to aarch64-builtins.c functions and aarch64-bulitins.c
> functions forward to the SVE file seemed a bit indirect.  Doing things
> the way the patch does them puts aarch64-builtins.c and the SVE code on
> more of an equal footing.
> 
> The aarch64_(general_)gimple_fold_builtin change is mostly just
> reindentation.  I've attached a -b version of the diff as well.
> 
> Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
> OK to install when the ACLE patch itself is ready to install?

OK.

Thanks,
James

> 
> Richard
> 
> 
> 2019-08-07  Richard Sandiford  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_builtin_class): New enum.
>   (AARCH64_BUILTIN_SHIFT, AARCH64_BUILTIN_CLASS): New constants.
>   (aarch64_gimple_fold_builtin, aarch64_mangle_builtin_type)
>   (aarch64_fold_builtin, aarch64_init_builtins, aarch64_expand_builtin):
>   (aarch64_builtin_decl, aarch64_builtin_rsqrt): Delete.
>   (aarch64_general_mangle_builtin_type, aarch64_general_init_builtins):
>   (aarch64_general_fold_builtin, aarch64_general_gimple_fold_builtin):
>   (aarch64_general_expand_builtin, aarch64_general_builtin_decl):
>   (aarch64_general_builtin_rsqrt): Declare.
>   * config/aarch64/aarch64-builtins.c (aarch64_general_add_builtin):
>   New function.
>   (aarch64_mangle_builtin_type): Rename to...
>   (aarch64_general_mangle_builtin_type): ...this.
>   (aarch64_init_fcmla_laneq_builtins, aarch64_init_simd_builtins)
>   (aarch64_init_crc32_builtins, aarch64_init_builtin_rsqrt)
>   (aarch64_init_pauth_hint_builtins, aarch64_init_tme_builtins): Use
>   aarch64_general_add_builtin instead of add_builtin_function.
>   (aarch64_init_builtins): Rename to...
>   (aarch64_general_init_builtins): ...this.  Use
>   aarch64_general_add_builtin instead of add_builtin_function.
>   (aarch64_builtin_decl): Rename to...
>   (aarch64_general_builtin_decl): ...this and remove the unused
>   arguments.
>   (aarch64_expand_builtin): Rename to...
>   (aarch64_general_expand_builtin): ...this and remove the unused
>   arguments.
>   (aarch64_builtin_rsqrt): Rename to...
>   (aarch64_general_builtin_rsqrt): ...this.
>   (aarch64_fold_builtin): Rename to...
>   (aarch64_general_fold_builtin): ...this.  Take the function subcode
>   and return type as arguments.  Remove the "ignored" argument.
>   (aarch64_gimple_fold_builtin): Rename to...
>   (aarch64_general_gimple_fold_builtin): ...this.  Take the function
>   subcode and gcall as arguments, and return the new function call.
>   * config/aarch64/aarch64.c (aarch64_init_builtins)
>   (aarch64_fold_builtin, aarch64_gimple_fold_builtin)
>   (aarch64_expand_builtin, aarch64_builtin_decl): New functions.
>   (aarch64_builtin_reciprocal): Call aarch64_general_builtin_rsqrt
>   instead of aarch64_builtin_rsqrt.
>   (aarch64_mangle_type): Call aarch64_general_mangle_builtin_type
>   instead of aarch64_mangle_builtin_type.
> 


[PATCH 2/2] libada: Respect `--enable-version-specific-runtime-libs'

2019-09-25 Thread Maciej W. Rozycki
Respect the `--enable-version-specific-runtime-libs' configuration 
option in libada/, so that shared gnatlib libraries will be installed 
in non-version-specific $(toolexeclibdir) if requested.  In a 
cross-compilation environment this helps setting up a consistent 
sysroot, which can then be shared between the host and the target 
system.

Update the settings of $(toolexecdir) and $(toolexeclibdir), unused till 
now, to keep the current arrangement in the version-specific case and 
make the new option to be enabled by default, unlike with the other 
target libraries, so as to keep existing people's build infrastructure 
unaffected.

Of course if someone does use `--disable-version-specific-runtime-libs' 
already, then the installation location of shared gnatlib libraries will 
change, but presumably this is what they do want anyway, as the current 
situation, where the option is ignored in libada/ only, is really an 
anomaly rather than something expected or desired.

gcc/ada/
* gcc-interface/Makefile.in (ADA_RTL_DSO_DIR): New variable.
(install-gnatlib): Use it in place of ADA_RTL_OBJ_DIR for shared 
library installation.

libada/
* Makefile.in (toolexecdir, toolexeclibdir): New variables.
(LIBADA_FLAGS_TO_PASS): Add `toolexeclibdir'.
* configure.ac: Add `--enable-version-specific-runtime-libs'.
Update version-specific `toolexecdir' and `toolexeclibdir' from 
ADA_RTL_OBJ_DIR from gcc/ada/gcc-interface/Makefile.in.
* configure: Regenerate.
---
 gcc/ada/gcc-interface/Makefile.in |7 ---
 libada/Makefile.in|3 +++
 libada/configure  |   25 ++---
 libada/configure.ac   |   20 +---
 4 files changed, 46 insertions(+), 9 deletions(-)

gcc-install-sysroot-gnatlib.diff
Index: gcc/gcc/ada/gcc-interface/Makefile.in
===
--- gcc.orig/gcc/ada/gcc-interface/Makefile.in
+++ gcc/gcc/ada/gcc-interface/Makefile.in
@@ -534,15 +534,15 @@ install-gnatlib: ../stamp-gnatlib-$(RTSD
for file in gnat gnarl; do \
   if [ -f $(RTSDIR)/lib$${file}$(hyphen)$(LIBRARY_VERSION)$(soext) ]; 
then \
  $(INSTALL) 
$(RTSDIR)/lib$${file}$(hyphen)$(LIBRARY_VERSION)$(soext) \
-$(DESTDIR)$(ADA_RTL_OBJ_DIR); \
+$(DESTDIR)$(ADA_RTL_DSO_DIR); \
   fi; \
   if [ -f $(RTSDIR)/lib$${file}$(soext) ]; then \
  $(LN_S) lib$${file}$(hyphen)$(LIBRARY_VERSION)$(soext) \
- $(DESTDIR)$(ADA_RTL_OBJ_DIR)/lib$${file}$(soext); \
+ $(DESTDIR)$(ADA_RTL_DSO_DIR)/lib$${file}$(soext); \
   fi; \
   if [ -d 
$(RTSDIR)/lib$${file}$(hyphen)$(LIBRARY_VERSION)$(soext).dSYM ]; then \
  $(CP) -r 
$(RTSDIR)/lib$${file}$(hyphen)$(LIBRARY_VERSION)$(soext).dSYM \
-   $(DESTDIR)$(ADA_RTL_OBJ_DIR); \
+   $(DESTDIR)$(ADA_RTL_DSO_DIR); \
   fi; \
done
 # This copy must be done preserving the date on the original file.
@@ -882,6 +882,7 @@ b_gnatm.o : b_gnatm.adb
 
 ADA_INCLUDE_DIR = $(libsubdir)/adainclude
 ADA_RTL_OBJ_DIR = $(libsubdir)/adalib
+ADA_RTL_DSO_DIR = $(toolexeclibdir)
 
 # Special flags
 
Index: gcc/libada/Makefile.in
===
--- gcc.orig/libada/Makefile.in
+++ gcc/libada/Makefile.in
@@ -38,6 +38,8 @@ target = @target@
 prefix = @prefix@
 
 # Nonstandard autoconf-set variables.
+toolexecdir = @toolexecdir@
+toolexeclibdir = @toolexeclibdir@
 enable_shared = @enable_shared@
 
 LN_S=@LN_S@
@@ -88,6 +90,7 @@ LIBADA_FLAGS_TO_PASS = \
 "TRACE=$(TRACE)" \
 "MULTISUBDIR=$(MULTISUBDIR)" \
 "libsubdir=$(libsubdir)" \
+"toolexeclibdir=$(toolexeclibdir)" \
 "objext=$(objext)" \
 "prefix=$(prefix)" \
 "exeext=.exeext.should.not.be.used " \
Index: gcc/libada/configure
===
--- gcc.orig/libada/configure
+++ gcc/libada/configure
@@ -702,6 +702,7 @@ ac_subst_files=''
 ac_user_opts='
 enable_option_checking
 with_build_libsubdir
+enable_version_specific_runtime_libs
 enable_maintainer_mode
 enable_multilib
 enable_shared
@@ -1325,6 +1326,9 @@ if test -n "$ac_init_help"; then
   --disable-option-checking  ignore unrecognized --enable/--with options
   --disable-FEATURE   do not include FEATURE (same as --enable-FEATURE=no)
   --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
+  --enable-version-specific-runtime-libs
+  specify that runtime libraries should be installed
+  in a compiler-specific directory
   --enable-maintainer-mode
   enable make rules and dependencies not useful (and
   sometimes confusing) to the casual installer
@@ -2215,6 +2219,22 @@ target_subdir=${target_noncanoni

Re: [PATCH][AArch64] Don't split 64-bit constant stores to volatile location

2019-09-25 Thread James Greenhalgh
On Tue, Sep 24, 2019 at 02:40:20PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> On 8/22/19 10:16 AM, Kyrill Tkachov wrote:
> > Hi all,
> >
> > The optimisation to optimise:
> >    typedef unsigned long long u64;
> >
> >    void bar(u64 *x)
> >    {
> >  *x = 0xabcdef10abcdef10;
> >    }
> >
> > from:
> >     mov x1, 61200
> >     movk    x1, 0xabcd, lsl 16
> >     movk    x1, 0xef10, lsl 32
> >     movk    x1, 0xabcd, lsl 48
> >     str x1, [x0]
> >
> > into:
> >     mov w1, 61200
> >     movk    w1, 0xabcd, lsl 16
> >     stp w1, w1, [x0]
> >
> > ends up producing two distinct stores if the destination is volatile:
> >   void bar(u64 *x)
> >   {
> >     *(volatile u64 *)x = 0xabcdef10abcdef10;
> >   }
> >     mov w1, 61200
> >     movk    w1, 0xabcd, lsl 16
> >     str w1, [x0]
> >     str w1, [x0, 4]
> >
> > because we end up not merging the strs into an stp. It's questionable 
> > whether the use of STP is valid for volatile in the first place.
> > To avoid unnecessary pain in a context where it's unlikely to be 
> > performance critical [1] (use of volatile), this patch avoids this
> > transformation for volatile destinations, so we produce the original 
> > single STR-X.
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu.
> >
> > Ok for trunk (and eventual backports)?
> >
> This has been approved by James offline.
> 
> Committed to trunk with r276098.

Does this need backporting?

Thanks,
James

> 
> Thanks,
> 
> Kyrill
> 
> > Thanks,
> > Kyrill
> >
> > [1] 
> > https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/
> >
> >
> > gcc/
> > 2019-08-22  Kyrylo Tkachov 
> >
> >     * config/aarch64/aarch64.md (mov): Don't call
> >     aarch64_split_dimode_const_store on volatile MEM.
> >
> > gcc/testsuite/
> > 2019-08-22  Kyrylo Tkachov 
> >
> >     * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.
> >


Re: [AArch64] Fix cost of (plus ... (const_int -C))

2019-09-25 Thread James Greenhalgh
On Mon, Sep 23, 2019 at 10:45:29AM +0100, Richard Sandiford wrote:
> The PLUS handling in aarch64_rtx_costs only checked for nonnegative
> constants, meaning that simple immediate subtractions like:
> 
>   (set (reg R1) (plus (reg R2) (const_int -8)))
> 
> had a cost of two instructions.
> 
> Tested on aarch64-linux-gnu (with and without SVE).  OK to install?

OK.

Thanks,
James

> 
> Richard
> 
> 
> 2019-09-23  Richard Sandiford  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_rtx_costs): Use
>   aarch64_plus_immediate rather than aarch64_uimm12_shift
>   to test for valid PLUS immediates.
> 


Re: [PATCH] FreeBSD PowerPC use secure-plt

2019-09-25 Thread Segher Boessenkool
Hi!

On Wed, Sep 25, 2019 at 10:46:57PM +0200, Andreas Tobler wrote:
> --- gcc/config/rs6000/t-freebsd64 (revision 276090)
> +++ gcc/config/rs6000/t-freebsd64 (working copy)
> @@ -27,3 +27,6 @@
>  MULTILIB_EXCEPTIONS =
>  MULTILIB_OSDIRNAMES  = ../lib32
>  
> +SECURE_PLT = $(if $(findstring TARGET_FREEBSD32_SECURE_PLT=1, $(tm_defines)),msecure-plt)
> +
> +MULTILIB_EXTRA_OPTS += $(SECURE_PLT)

$(findstring) isn't super great: it looks for substrings, so it would
also match "TARGET_FREEBSD32_SECURE_PLT=123"; you can use $(filter) instead?

Looks fine to me either way.


Segher


Re: [PATCH, AArch64] PR target/91833

2019-09-25 Thread Joseph Myers
On Fri, 20 Sep 2019, Richard Henderson wrote:

> Tested on aarch64-linux (glibc) and aarch64-elf (installed newlib).
> 
> The existing configure claims to be generated by 2.69, but there
> are changes wrt the autoconf distributed with Ubuntu 18.  Nothing
> that seems untoward though.

They're meant to be generated with *unmodified* 2.69 (they were when I did 
the move to 2.69).  Not with a distribution version that may have some 
patches, such as the runstatedir patch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, AArch64] PR target/91833

2019-09-25 Thread Richard Henderson
On 9/25/19 3:54 PM, Joseph Myers wrote:
> On Fri, 20 Sep 2019, Richard Henderson wrote:
> 
>> Tested on aarch64-linux (glibc) and aarch64-elf (installed newlib).
>>
>> The existing configure claims to be generated by 2.69, but there
>> are changes wrt the autoconf distributed with Ubuntu 18.  Nothing
>> that seems untoward though.
> 
> They're meant to be generated with *unmodified* 2.69 (they were when I did 
> the move to 2.69).  Not with a distribution version that may have some 
> patches, such as the runstatedir patch.

Oops.  Well, I'll re-re-generate with stock 2.69.

That still retains the _DARWIN_USE_64_BIT_INODE, which
wasn't there before.  I presume that's an artifact of
a previous rebuild.


r~


Re: [PATCH, AArch64] PR target/91833

2019-09-25 Thread Richard Henderson
On 9/25/19 3:54 PM, Joseph Myers wrote:
> On Fri, 20 Sep 2019, Richard Henderson wrote:
> 
>> Tested on aarch64-linux (glibc) and aarch64-elf (installed newlib).
>>
>> The existing configure claims to be generated by 2.69, but there
>> are changes wrt the autoconf distributed with Ubuntu 18.  Nothing
>> that seems untoward though.
> 
> They're meant to be generated with *unmodified* 2.69 (they were when I did 
> the move to 2.69).  Not with a distribution version that may have some 
> patches, such as the runstatedir patch.

For the record, the first attachment here is the adjustment patch that I
committed over my incorrect rebuild.  The second attachment is the composite
autoconf diff against r276133.

Sorry for the noise.


r~


* config.in, configure: Re-rebuild with stock autoconf 2.69,
not the ubuntu modified 2.69.

diff --git a/libgcc/configure b/libgcc/configure
index 28c7394b3f9..117e9c97e57 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -675,7 +675,6 @@ infodir
 docdir
 oldincludedir
 includedir
-runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -766,7 +765,6 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
-runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
@@ -1019,15 +1017,6 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
-  -runstatedir | --runstatedir | --runstatedi | --runstated \
-  | --runstate | --runstat | --runsta | --runst | --runs \
-  | --run | --ru | --r)
-ac_prev=runstatedir ;;
-  -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \
-  | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \
-  | --run=* | --ru=* | --r=*)
-runstatedir=$ac_optarg ;;
-
   -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb)
 ac_prev=sbindir ;;
   -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \
@@ -1165,7 +1154,7 @@ fi
 for ac_var in  exec_prefix prefix bindir sbindir libexecdir datarootdir \
datadir sysconfdir sharedstatedir localstatedir includedir \
oldincludedir docdir infodir htmldir dvidir pdfdir psdir \
-   libdir localedir mandir runstatedir
+   libdir localedir mandir
 do
   eval ac_val=\$$ac_var
   # Remove trailing slashes.
@@ -1318,7 +1307,6 @@ Fine tuning of the installation directories:
   --sysconfdir=DIRread-only single-machine data [PREFIX/etc]
   --sharedstatedir=DIRmodifiable architecture-independent data [PREFIX/com]
   --localstatedir=DIR modifiable single-machine data [PREFIX/var]
-  --runstatedir=DIR   modifiable per-process data [LOCALSTATEDIR/run]
   --libdir=DIRobject code libraries [EPREFIX/lib]
   --includedir=DIRC header files [PREFIX/include]
   --oldincludedir=DIR C header files for non-gcc [/usr/include]
@@ -4185,7 +4173,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4231,7 +4219,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4255,7 +4243,7 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 : -1];
@@ -4300,7 +4288,7 @@ else
 We can't simply define LARGE_OFF_T to be 9223372036854775807,
 since some C++ compilers masquerading as C compilers
 incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
+#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
   int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
   && LARGE_OFF_T % 2147483647 == 1)
  ? 1 

Re: [libcpp] Issue a pedantic warning for UCNs outside UCS codespace

2019-09-25 Thread Joseph Myers
On Tue, 24 Sep 2019, Florian Weimer wrote:

> I think this has to depend on the C standards version.  I think each C
> standard needs to be read against the edition of ISO 10646 current at
> the time of standards approval (the references are sadly not
> versioned, so the version is implied).  Early versions of ISO 10646
> definitely do not have the codespace restriction you mention.

Undated references aren't implicitly dated to the version when the 
standard was published.  The ISO/IEC Directives, Part 2 (Principles and 
rules for the structure and drafting of ISO and IEC documents) (2018 
edition, subclause 10.4) 

 
say:

  Undated references may be made:

  * only to a complete document;

  * if it will be possible to use all future changes of the referenced 
  document for the purposes of the referring document;

  * when it is understood that the reference will include all amendments 
  to and revisions of the referenced document.

I think that's clear that the latest version at the time the standard is 
used applies (so if the document in the undated normative reference is 
revised, that effectively changes the requirements of the standad version 
referencing it).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [libcpp] Issue a pedantic warning for UCNs outside UCS codespace

2019-09-25 Thread Joseph Myers
On Tue, 24 Sep 2019, Eric Botcazou wrote:

> Hi,
> 
> the Universal Character Names accepted by the C family of compilers are 
> mapped 
> to those of ISO/IEC 10646, which defines the Universal Character Set 
> codespace 
> as the range 0-0x10FFFF inclusive.  The upper bound is already enforced for 
> identifiers but not for literals, so the following code is accepted in C99:
> 
> #include <wchar.h>
> 
> wchar_t a = L'\U00110000';
> 
> whereas it is rejected with an error by other compilers (Clang, MSVC).
> 
> I'm not sure whether the compiler is really required to issue a diagnostic in 
> this case.  Moreover a few tests in the testsuite manipulate UCNs outside the 
> UCS codespace.  That's why I suggest issuing a pedantic warning.

For C, I think such UCNs violate the Semantics but not the Constraints on 
UCNs, so no diagnostic is actually required in C, although it is permitted 
as a pedwarn / error.

However, while C++ doesn't have that Semantics / Constraints division, 
it's also the case that before C++2a, C++ only has a dated normative 
reference to ISO/IEC 10646-1:1993 (C++2a adds an undated reference and 
says the dated one is only for deprecated features, as well as explicitly 
making such UCNs outside the ISO 10646 code point range ill-formed).  So I 
think that for C++, this is only correct as an error / pedwarn in the 
C++2a case.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [libcpp] Issue a pedantic warning for UCNs outside UCS codespace

2019-09-25 Thread Joseph Myers
On Tue, 24 Sep 2019, Eric Botcazou wrote:

> > I think this has to depend on the C standards version.  I think each C
> > standard needs to be read against the edition of ISO 10646 current at
> > the time of standards approval (the references are sadly not
> > versioned, so the version is implied).  Early versions of ISO 10646
> > definitely do not have the codespace restriction you mention.
> 
> Note the already existing hardcoded check in ucn_valid_in_identifier though.

No C or C++ standard version allows characters outside that range in 
identifiers, so that check is independent of the ISO 10646 version being 
used.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] The inline keyword is supported in all new C standards

2019-09-25 Thread Joseph Myers
On Tue, 24 Sep 2019, Palmer Dabbelt wrote:

> The documentation used to indicate that the inline keyword was only
> supported by c99 and c11, whereas in fact it is supported by c99 and all
> newer standards.
> 
> gcc/ChangeLog
> 
> 2019-09-24  Palmer Dabbelt  
> 
> * doc/extended.texi (Alternate Keywords): Change "-std=c11" to "a
> later standard."

OK with the filename corrected in the ChangeLog entry.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] Move hash-table.h and related files to libiberty

2019-09-25 Thread Christian Biesinger via gcc-patches
On Sat, Sep 21, 2019 at 7:41 AM Richard Biener
 wrote:
>
> On September 21, 2019 12:28:57 PM GMT+02:00, Christian Biesinger 
>  wrote:
> >On Sat, Sep 21, 2019 at 7:22 PM Richard Biener
> > wrote:
> >>
> >> On September 21, 2019 11:12:38 AM GMT+02:00, Christian Biesinger via
> >gcc-patches  wrote:
> >> >Hello,
> >> >
> >> >I would like to move hash-table.h, hash-map.h and related files
> >> >to libiberty, so that GDB can make use of it.
> >> >
> >> >I see that gcc already has a C++ file in include/ (unique-ptr.h),
> >> >which I understand is libiberty.
> >> >
> >> >However, this patch is not complete yet (for a start, it doesn't
> >> >compile). Before I go further down this road, is this acceptable
> >> >in principle to the gcc/libiberty maintainers?
> >> >
> >> >(the bulk of the patch is including vec.h in a lot of files,
> >> >because hash-table.h previously included it. It doesn't
> >> >actually use it, and I didn't think it was necessary to
> >> >move that to libiberty as well, so I removed that include
> >> >and instead am adding it to all the files that now don't
> >> >compile.)
> >>
> >> The bulk seems to be hash_table to hash_table_ggc renaming. Can you
> >explain?
> >
> >Yeah, sure. If hash-table.h lives in libiberty, I wanted to reduce the
> >dependencies on other headers. GCC's garbage collector seems like
> >something that does not belong there, so I moved this create function
> >to a separate header, which required renaming it since it now can't be
> >part of the same class. (the other option would be some kind of #ifdef
> >GCC thing, but that seemed ugly to me)
>
> As long as gengtype can still pick up everything correctly via the GTY 
> annotations that's probably OK.

OK, I've decided to give up on this project for now -- there are too
many GCC dependencies in this file. But I may try forking the file for
GDB.

Christian


Re: [PATCH] Remove unused #include "vec.h" from hash-table.h

2019-09-25 Thread Christian Biesinger via gcc-patches
On Mon, Sep 23, 2019 at 3:15 PM Jason Merrill  wrote:
>
> On Mon, Sep 23, 2019 at 3:52 PM Christian Biesinger via gcc-patches
>  wrote:
> >
> > From: Christian Biesinger 
> >
> > Removes an unused include as a cleanup. Requires updating
> > lots of files who previously relied on this transitive include.
> >
> > I have only been able to test this on x86_64 because I failed
> > at building a cross compiler.
> >
> > gcc/ChangeLog:
> >
> > 2019-09-23  Christian Biesinger  
> >
> > * bitmap.c: Include vec.h.
> > * common/common-target.h: Likewise.
> > * common/common-targhooks.h: Likewise.
> > * config/aarch64/aarch64-protos.h: Likewise.
> > * config/aarch64/aarch64.c: Likewise.
> > * config/aarch64/cortex-a57-fma-steering.c: Likewise.
> > * config/arc/arc.c: Likewise.
> > * config/avr/avr-c.c: Likewise.
> > * config/c6x/c6x.c: Likewise.
> > * config/cris/cris.c: Likewise.
> > * config/darwin.c: Likewise.
> > * config/epiphany/resolve-sw-modes.c: Likewise.
> > * config/i386/i386-features.c: Likewise.
> > * config/i386/i386.c: Likewise.
> > * config/ia64/ia64.c: Likewise.
> > * config/mips/mips.c: Likewise.
> > * config/mn10300/mn10300.c: Likewise.
> > * config/nds32/nds32-relax-opt.c: Likewise.
> > * config/nds32/nds32.c: Likewise.
> > * config/nvptx/nvptx.c: Likewise.
> > * config/pa/pa.c: Likewise.
> > * config/pdp11/pdp11.c: Likewise.
> > * config/rs6000/rs6000-c.c: Likewise.
> > * config/rx/rx.c: Likewise.
> > * config/s390/s390-c.c: Likewise.
> > * config/s390/s390.c: Likewise.
> > * config/sparc/sparc.c: Likewise.
> > * config/visium/visium.c: Likewise.
> > * config/vms/vms.c: Likewise.
> > * config/vxworks.c: Likewise.
> > * dbgcnt.c: Likewise.
> > * diagnostic-show-locus.c: Likewise.
> > * edit-context.c: Likewise.
> > * fibonacci_heap.h: Likewise.
> > * function.h: Likewise.
> > * genmatch.c: Likewise.
> > * ggc-common.c: Likewise.
> > * graphds.h: Likewise.
> > * hash-table.h: Remove unused include of vec.h.
> > * input.c: Include vec.h.
> > * json.h: Likewise.
> > * opt-suggestions.h: Likewise.
> > * opts.h: Likewise.
> > * rtl.h: Likewise.
> > * target.h: Likewise.
> > * timevar.c: Likewise.
> > * tree-core.h: Likewise.
> > * typed-splay-tree.c: Likewise.
> > * vec.c: Likewise.
> > * vector-builder.h: Likewise.
> > * vtable-verify.h: Likewise.
>
> This is a surprising list of files.  For instance, common-target.h
> uses nothing from vec.h, and most of these files include tree-core.h,
> so adding the include there should avoid the need in most other files.

I did add it to tree-core.h, but it did not help. (Note: I added it to
the files in config/ without verifying if it is strictly needed,
because I couldn't try compiling most of them; they do all use vec<>
or auto_vec<>. However, all the other files did need the include)

I added it to common-target.h because of:
In file included from ../../gcc/gcc/params.c:24:
../../gcc/gcc/common/common-target.def:98:2: error: ‘vec’ does not name a type
  vec<const char *>, (int option_code, const char *prefix),
  ^~~
../../gcc/gcc/common/common-target.h:62:48: note: in definition of
macro ‘DEFHOOK’
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
^~~~

Including it in common-target.def doesn't seem to work; I'm not
entirely sure why.

It sounds like you're saying that in GCC, it's common and OK to rely
on transitive includes for the classes you use?

Bernhard: Thanks, I will look into that

Segher: reduce-headers seems to not work for header files, or maybe I
used it wrong. (it couldn't find the output file for hash-table.h)

Thanks,
Christian


Re: [PATCH] Remove unused #include "vec.h" from hash-table.h

2019-09-25 Thread Jason Merrill
On Wed, Sep 25, 2019 at 8:01 PM Christian Biesinger
 wrote:
> On Mon, Sep 23, 2019 at 3:15 PM Jason Merrill  wrote:
> > On Mon, Sep 23, 2019 at 3:52 PM Christian Biesinger via gcc-patches
> >  wrote:
> > >
> > > From: Christian Biesinger 
> > >
> > > Removes an unused include as a cleanup. Requires updating
> > > lots of files who previously relied on this transitive include.
> > >
> > > I have only been able to test this on x86_64 because I failed
> > > at building a cross compiler.
> > >
> > > gcc/ChangeLog:
> > >
> > > 2019-09-23  Christian Biesinger  
> > >
> > > * bitmap.c: Include vec.h.
> > > * common/common-target.h: Likewise.
> > > * common/common-targhooks.h: Likewise.
> > > * config/aarch64/aarch64-protos.h: Likewise.
> > > * config/aarch64/aarch64.c: Likewise.
> > > * config/aarch64/cortex-a57-fma-steering.c: Likewise.
> > > * config/arc/arc.c: Likewise.
> > > * config/avr/avr-c.c: Likewise.
> > > * config/c6x/c6x.c: Likewise.
> > > * config/cris/cris.c: Likewise.
> > > * config/darwin.c: Likewise.
> > > * config/epiphany/resolve-sw-modes.c: Likewise.
> > > * config/i386/i386-features.c: Likewise.
> > > * config/i386/i386.c: Likewise.
> > > * config/ia64/ia64.c: Likewise.
> > > * config/mips/mips.c: Likewise.
> > > * config/mn10300/mn10300.c: Likewise.
> > > * config/nds32/nds32-relax-opt.c: Likewise.
> > > * config/nds32/nds32.c: Likewise.
> > > * config/nvptx/nvptx.c: Likewise.
> > > * config/pa/pa.c: Likewise.
> > > * config/pdp11/pdp11.c: Likewise.
> > > * config/rs6000/rs6000-c.c: Likewise.
> > > * config/rx/rx.c: Likewise.
> > > * config/s390/s390-c.c: Likewise.
> > > * config/s390/s390.c: Likewise.
> > > * config/sparc/sparc.c: Likewise.
> > > * config/visium/visium.c: Likewise.
> > > * config/vms/vms.c: Likewise.
> > > * config/vxworks.c: Likewise.
> > > * dbgcnt.c: Likewise.
> > > * diagnostic-show-locus.c: Likewise.
> > > * edit-context.c: Likewise.
> > > * fibonacci_heap.h: Likewise.
> > > * function.h: Likewise.
> > > * genmatch.c: Likewise.
> > > * ggc-common.c: Likewise.
> > > * graphds.h: Likewise.
> > > * hash-table.h: Remove unused include of vec.h.
> > > * input.c: Include vec.h.
> > > * json.h: Likewise.
> > > * opt-suggestions.h: Likewise.
> > > * opts.h: Likewise.
> > > * rtl.h: Likewise.
> > > * target.h: Likewise.
> > > * timevar.c: Likewise.
> > > * tree-core.h: Likewise.
> > > * typed-splay-tree.c: Likewise.
> > > * vec.c: Likewise.
> > > * vector-builder.h: Likewise.
> > > * vtable-verify.h: Likewise.
> >
> > This is a surprising list of files.  For instance, common-target.h
> > uses nothing from vec.h, and most of these files include tree-core.h,
> > so adding the include there should avoid the need in most other files.
>
> I did add it to tree-core.h, but it did not help.

Curious, but I now think coretypes.h would be a better place; it
currently includes hash-table.h, and comes before common-target.h.

> It sounds like you're saying that in GCC, it's common and OK to rely
> on transitive includes for the classes you use?

In many cases, I think so.

Jason



Re: [PATCH] Remove vectorizer reduction operand swapping

2019-09-25 Thread Christophe Lyon
On Wed, 25 Sep 2019 at 09:33, Richard Biener  wrote:
>
> On Tue, 24 Sep 2019, Christophe Lyon wrote:
>
> > On Wed, 18 Sep 2019 at 20:11, Richard Biener  wrote:
> > >
> > >
> > > It shouldn't be neccessary.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > > (SLP part testing separately)
> > >
> > > Richard.
> > >
> > > 2019-09-18  Richard Biener  
> > >
> > > * tree-vect-loop.c (vect_is_simple_reduction): Remove operand
> > > swapping.
> > > (vectorize_fold_left_reduction): Remove assert.
> > > (vectorizable_reduction): Also expect COND_EXPR non-reduction
> > > operand in position 2.  Remove assert.
> > >
> >
> > Hi,
> >
> > Since this was committed (r275898), I've noticed a regression on armeb:
> > FAIL: gcc.dg/vect/vect-cond-4.c execution test
> >
> > I'm seeing this with qemu, but I do not have the execution traces yet.
>
> Can you open a bugreport please?
>
Sure, I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91909


> Thanks,
> Richard.
>
> > Christophe
> >
> > > Index: gcc/tree-vect-loop.c
> > > ===
> > > --- gcc/tree-vect-loop.c(revision 275872)
> > > +++ gcc/tree-vect-loop.c(working copy)
> > > @@ -3278,56 +3278,8 @@ vect_is_simple_reduction (loop_vec_info
> > >   || !flow_bb_inside_loop_p (loop, gimple_bb (def2_info->stmt))
> > >   || vect_valid_reduction_input_p (def2_info)))
> > >  {
> > > -  if (! nested_in_vect_loop && orig_code != MINUS_EXPR)
> > > -   {
> > > - /* Check if we can swap operands (just for simplicity - so that
> > > -the rest of the code can assume that the reduction variable
> > > -is always the last (second) argument).  */
> > > - if (code == COND_EXPR)
> > > -   {
> > > - /* Swap cond_expr by inverting the condition.  */
> > > - tree cond_expr = gimple_assign_rhs1 (def_stmt);
> > > - enum tree_code invert_code = ERROR_MARK;
> > > - enum tree_code cond_code = TREE_CODE (cond_expr);
> > > -
> > > - if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
> > > -   {
> > > - bool honor_nans = HONOR_NANS (TREE_OPERAND (cond_expr, 
> > > 0));
> > > - invert_code = invert_tree_comparison (cond_code, 
> > > honor_nans);
> > > -   }
> > > - if (invert_code != ERROR_MARK)
> > > -   {
> > > - TREE_SET_CODE (cond_expr, invert_code);
> > > - swap_ssa_operands (def_stmt,
> > > -gimple_assign_rhs2_ptr (def_stmt),
> > > -gimple_assign_rhs3_ptr (def_stmt));
> > > -   }
> > > - else
> > > -   {
> > > - if (dump_enabled_p ())
> > > -   report_vect_op (MSG_NOTE, def_stmt,
> > > -   "detected reduction: cannot swap 
> > > operands "
> > > -   "for cond_expr");
> > > - return NULL;
> > > -   }
> > > -   }
> > > - else
> > > -   swap_ssa_operands (def_stmt, gimple_assign_rhs1_ptr 
> > > (def_stmt),
> > > -  gimple_assign_rhs2_ptr (def_stmt));
> > > -
> > > - if (dump_enabled_p ())
> > > -   report_vect_op (MSG_NOTE, def_stmt,
> > > -   "detected reduction: need to swap operands: 
> > > ");
> > > -
> > > - if (CONSTANT_CLASS_P (gimple_assign_rhs1 (def_stmt)))
> > > -   LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
> > > -}
> > > -  else
> > > -{
> > > -  if (dump_enabled_p ())
> > > -report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
> > > -}
> > > -
> > > +  if (dump_enabled_p ())
> > > +   report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
> > >return def_stmt_info;
> > >  }
> > >
> > > @@ -5969,7 +5921,6 @@ vectorize_fold_left_reduction (stmt_vec_
> > >gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
> > >gcc_assert (ncopies == 1);
> > >gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
> > > -  gcc_assert (reduc_index == (code == MINUS_EXPR ? 0 : 1));
> > >gcc_assert (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> > >   == FOLD_LEFT_REDUCTION);
> > >
> > > @@ -6542,9 +6493,9 @@ vectorizable_reduction (stmt_vec_info st
> > >   reduc_index = i;
> > > }
> > >
> > > -  if (i == 1 && code == COND_EXPR)
> > > +  if (code == COND_EXPR)
> > > {
> > > - /* Record how value of COND_EXPR is defined.  */
> > > + /* Record how the non-reduction-def value of COND_EXPR is 
> > > defined.  */
> > >   if (dt == vect_constant_def)
> > > {
> > >   cond_reduc_dt = dt;
> > > @@ -66

Re: new x86 cmpmemsi expander, and adjustments for cmpstrn*

2019-09-25 Thread Alexandre Oliva
On Sep 10, 2019, Alexandre Oliva  wrote:

> This patchset fixes some latent problems in cmpstrn* patterns for x86,
> and introduces cmpmemsi for short fixed-size memcmp.

Ping?  https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00701.html


> Would it make sense to install the test program somewhere?


> make cmpstrnsi patterns safer

> for  gcc/ChangeLog

>   * config/i386/i386.md (cmpstrnsi): Create separate output
> count to pass to cmpstrnqi_nz_1 and ...
>   (cmpstrnqi_1): ... this.  Do not use a match_dup of the count
> input as an output.  Preserve FLAGS_REG when length is zero.
>   (*cmpstrnqi_1): Preserve FLAGS_REG when length is zero.


> x86 cmpmemsi pattern - single compare

> for  gcc/ChangeLog

>   * config/i386/i386.md (cmpmemsi): New pattern.


> extend x86 cmpmemsi to use loops

> for  gcc/ChangeLog

>   * config/i386/i386.md (cmpmemsi): Expand more than one
>   fragment compare sequence depending on optimization level.
>   (subcmpsi3): New expand pattern.


-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: [PATCH] DWARF array bounds missing from C++ array definitions

2019-09-25 Thread Alexandre Oliva
On Sep 13, 2019, Richard Biener  wrote:

> On Fri, Sep 13, 2019 at 1:32 AM Alexandre Oliva  wrote:
>> On Sep 12, 2019, Richard Biener  wrote:

>> > So - maybe we can have the patch a bit cleaner by adding
>> > a flag to add_type_attribute saying we only want it if it's
>> > different from that already present (on the specification DIE)?

>> That's exactly what I meant completing_type_p to check.  Do you mean it
>> should be stricter, do it more cheaply, or what?

> I meant to do it more cheaply

How?  If it's the same type, lookup will find the DIE and be done with
it.  If it's not, we'll want to build a new DIE anyway.  What
computation is there to avoid that is not already avoided?

> also the name completing_type_p is
> misleading IMHO since it simply checks (re-)creating the type DIE
> will yield a different one

True.  The expectation is that the type of a decl will not transition
from complete to incomplete.  different_type_p would be more accurate
for unexpected frontends that behaved so weirdly, but not quite as
intuitive as to the intent.  I still like completing_type_p better, but
if you insist, I'll change it.  Bonus points for a better name ;-)

> sure how that would look like (create a new variable DIE and
> splice out "common" parts into an abstract DIE used by
> both the old and the new DIE?)

In my exhaustive verification of all hits of completing_type_p in an
all-languages bootstrap, we had either a new DIE for the outermost array
type, now with bounds, sharing the remaining array dimensions; an
alternate symbolic name ultimately referring to the same type
definition; or an extra const-qualifying DIE for a const array type
whose base type is already const-qualified.  None of these seemed
excessive to me, though the last one might be desirable and not too hard
to avoid.

I haven't seen any case of transitioning to less-defined types.

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Problem exposed by recent ARM multiply changes

2019-09-25 Thread Jeff Law
> commit fa761b10d40aaa71e62fbc0c9f2ab8fc07a98b49 (HEAD, refs/bisect/bad)
> Author: wilco 
> Date:   Wed Sep 18 18:33:30 2019 +
> 
> [ARM] Cleanup 64-bit multiplies
> 
> Cleanup 64-bit multiplies.  Combine the expanders using iterators.
> Merge the signed/unsigned multiplies as well as the pre-Armv6 and Armv6
> variants.  Split DImode operands early into parallel sets inside the
> MULL/MLAL instructions - this improves register allocation and avoids
> subreg issues due to other DImode operations splitting early.
> 
> gcc/
> * config/arm/arm.md (maddsidi4): Remove expander.
> (mulsidi3adddi): Remove pattern.
> (mulsidi3adddi_v6): Likewise.
> (mulsidi3_nov6): Likewise.
> (mulsidi3_v6): Likewise.
> (umulsidi3): Remove expander.
> (umulsidi3_nov6): Remove pattern.
> (umulsidi3_v6): Likewise.
> (umulsidi3adddi): Likewise.
> (umulsidi3adddi_v6): Likewise.
> (mulsidi3): Add combined expander.
> (maddsidi4): Likewise.
> (mull): Add combined umull and smull pattern.
> (mlal): Likewise.
> * config/arm/iterators.md (Us): Add new iterator.

Is causing the linux kernel to fail to build for arm-linux-gnueabi.

-O2 with this testcase:

__extension__ typedef unsigned long long __u64;
typedef __u64 u64;
static inline unsigned long
__timespec64_to_jiffies (u64 sec, long nsec)
{
  return ((sec *
	   ((unsigned
	     long) ((((u64) 1000000000L << (32 - 7)) +
		     ((1000000000L + 100 / 2) / 100) -
		     1) / (u64) ((1000000000L + 100 / 2) / 100)))) +
	  (((u64) nsec *
	    ((unsigned
	      long) ((((u64) 1 << ((32 - 7) + 29)) +
		      ((1000000000L + 100 / 2) / 100) -
		      1) / (u64) ((1000000000L + 100 / 2) / 100)))) >>
	   (((32 - 7) + 29) - (32 - 7)))) >> (32 - 7);
}

unsigned long
__timespec_to_jiffies (unsigned long sec, long nsec)
{
  return __timespec64_to_jiffies ((u64) sec, nsec);
}

But I think your change has just exposed a latent bug in cse.c

A bit of review of how CONST_INT nodes work.  They are modeless, which
is a historical wart and tends to cause all kinds of not-so-fun
problems, primarily because the validity of any particular constant is
context dependent.

Essentially if the constant is used in a mode smaller than a
HOST_WIDE_INT, then the constant must be sign extended from that mode to
a HOST_WIDE_INT.

So a node like

(const_int 0xc8000000) is valid if it is used in a DImode context, but
not in an SImode context (where it would have to be 0xffffffffc8000000).
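
To make the invariant concrete, here is a standalone sketch in plain C
(illustration only, not GCC internals); the two decimal values are the
ones that show up in the insns below:

  #include <stdio.h>
  #include <stdint.h>

  int
  main (void)
  {
    /* Canonical SImode form: sign-extended into the host wide int.  */
    int64_t simode_form = (int32_t) 0xc8000000u;  /* -939524096 */
    /* Canonical DImode form: the zero-extended 64-bit value.  */
    int64_t dimode_form = (uint32_t) 0xc8000000u; /* 3355443200 */
    printf ("%lld %lld\n", (long long) simode_form,
            (long long) dimode_form);
    return 0;
  }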


So with that background we have the following key insns:



(insn 13 12 14 2 (set (reg:SI 124)
(const_int -939524096 [0xffffffffc8000000])) "j.c":10:54 161
{*arm_movsi_insn}
 (nil))

(insn 14 13 16 2 (parallel [
            (set (reg:SI 132)
                (plus:SI (mult:SI (zero_extend:DI (reg/v:SI 115 [ sec ]))
                        (zero_extend:DI (reg:SI 124)))
                    (reg:SI 130)))
            (set (reg:SI 133 [+4 ])
                (plus:SI (truncate:SI (lshiftrt:DI (plus:DI (mult:DI (zero_extend:DI (reg/v:SI 115 [ sec ]))
                                    (zero_extend:DI (reg:SI 124)))
                                (zero_extend:DI (reg:SI 130)))
                            (const_int 32 [0x20])))
                    (reg:SI 131 [+4 ])))
        ]) "j.c":10:54 60 {umlal}
     (expr_list:REG_DEAD (reg:SI 131 [+4 ])
        (expr_list:REG_DEAD (reg:SI 130)
            (expr_list:REG_DEAD (reg:SI 124)
                (expr_list:REG_DEAD (reg/v:SI 115 [ sec ])
                    (nil))))))


So the (const_int -939524096) node in insn 13 is absolutely correct.
When we substitute it into insn 14 and simplify the zero_extend, the
form changes into (const_int 3355443200) since it's used in a DImode
context.

Things have already gone wrong at this point, but let's follow things
further to see what eventually happens.  Eventually we want to
simplify the MULT and call simplify_binary_operation_1.


code = MULT
mode = E_SImode
op0/trueop0 = (zero_extend:DI (reg/v:SI 115 [ sec ]))
op1/trueop1 = (const_int 3355443200 [0xc8000000])

It's critical to note that MODE here is the mode of the *output
operand*, so SImode is the right value.

Inside simplify_binary_operation_1 we want to convert a multiply by a
constant power of two into a shift via this code:


>   /* Convert multiply by constant power of two into shift.  */
>   if (CONST_SCALAR_INT_P (trueop1))
> {
>   val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
>   if (val >= 0)
> return simplify_gen_binary (ASHIFT, mode, op0,
> gen_int_shift_amount (mode, val));
> }
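
For intuition, here is a standalone sketch (illustration only, not
GCC's wide-int code) of what exact_log2 computes on a 32-bit value:

  #include <stdio.h>
  #include <stdint.h>

  /* Return log2 (x) if x is a power of two, else -1.  */
  static int
  exact_log2_u32 (uint32_t x)
  {
    return (x != 0 && (x & (x - 1)) == 0) ? (int) __builtin_ctz (x) : -1;
  }

  int
  main (void)
  {
    printf ("%d\n", exact_log2_u32 (0x00100000u)); /* 20: becomes a shift */
    printf ("%d\n", exact_log2_u32 (0xc8000000u)); /* -1: not a power of 2 */
    return 0;
  }

A well-formed call would simply return -1 here; the fault described
next comes from handing wi::exact_log2 a constant that is not in
canonical form for the mode.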


And we promptly fault inside wi::exact_log2 because we passed it a
constant (3355443200 [0xc8000000]) that is not in canonical SImode form.

Re: [PATCH v4] Missed function specialization + partial devirtualization

2019-09-25 Thread luoxhu

Thanks Martin,


On 2019/9/25 18:57, Martin Liška wrote:

On 9/25/19 5:45 AM, luoxhu wrote:

Hi,

Sorry for replying so late due to cauldron conference and other LTO issues
I was working on.


Hello.

That's fine, we still have plenty of time for patch review.

Issues which I reported in v3 and are still not fixed (still valid in v4):
- please come up with indirect_target_info::indirect_target_info and use it

Sorry for missing that.



- do you need to stream out indirect_call_targets when common_target_id == 0?


No need to stream out items with common_target_id == 0, removed the if 
condition in lto-cgraph.c.




Then I'm suggesting to use vec::is_empty (please see my patch).
OK.  But should has_multiple_indirect_call_p return something different
than has_indirect_call_p, as it checks for more than one target?


gcc/cgraph.c
/* Return true if this edge has multiple indirect call targets.  */
 bool
 cgraph_edge::has_multiple_indirect_call_p (void)
 {
-  return indirect_info && indirect_info->indirect_call_targets
-&& indirect_info->indirect_call_targets->length () > 1;
+  return (indirect_info && indirect_info->indirect_call_targets
+ && indirect_info->indirect_call_targets->length () > 1);
 }

 /* Return true if this edge has at least one indirect call target.  */
 bool
 cgraph_edge::has_indirect_call_p (void)
 {
-  return indirect_info && indirect_info->indirect_call_targets
-&& indirect_info->indirect_call_targets->length ();
+  return (indirect_info && indirect_info->indirect_call_targets
+ && !indirect_info->indirect_call_targets->is_empty ());
 }



I see the following failures for the tests provided:
FAIL: gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c compilation,  
-fprofile-generate -D_PROFILE_GENERATE
FAIL: gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c compilation,  
-fprofile-generate -D_PROFILE_GENERATE
FAIL: gcc.dg/tree-prof/indir-call-prof-topn.c compilation,  -fprofile-generate 
-D_PROFILE_GENERATE


Sorry that I forgot to remove the deprecated build option in the 3 cases
(also updated the scan exp check):
-/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate --param 
indir-call-topn-profile=1" } */

+/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate" } */


The new patch is attached.  Thanks.


Xiong Hu



Next comments follow directly in the email body:



v4 Changes:
  1. Rebase to trunk.
  2. Remove num_of_ics and use vector's length to avoid redundancy.
  3. Update the code in ipa-profile.c to improve review feasibility.
  4. Add function has_indirect_call_p and has_multiple_indirect_call_p.
  5. For parameter control, I will leave it to the next patch as it is a
 relatively independent piece of work.  Currently, the maximum number
 of promotions is GCOV_TOPN_VALUES, as profile-generate collects only
 4 profiling values; the minimum probability is therefore adjusted to
 25% in value-prof.c (it was previously hard-coded to 75% for the
 single indirect target).  There is no control over the minimal number
 of edge executions yet.  What's more, this patch is a bit large now.

This patch aims to fix PR69678, caused by PGO indirect call profiling
performance issues.
The bug that profiling data was never used was fixed by Martin's pulling
back of the topN patches; performance got a GEOMEAN ~1% improvement
(+24% for 511.povray_r specifically).
Still, the default profile currently records only a SINGLE indirect
target that is called more than 75% of the time.  This patch enables the
use of MULTIPLE indirect targets in the LTO-WPA and LTO-LTRANS stages;
as a result, function specialization, profiling, partial
devirtualization, inlining and cloning can all be done successfully
based on it.
Performance improves from 0.70 sec to 0.38 sec on simple tests.
Details are:
   1.  PGO with topn is enabled by default now, but only one indirect
   target edge will be generated in the ipa-profile pass, so add variables to
   enable multiple speculative edges through the passes: speculative_id records
   the direct edge index bound to the indirect edge, the length of
   indirect_call_targets records how many direct edges are owned by the
   indirect edge, and gimple_ic is postponed to ipa-profile as in the default
   path, since the inline pass will decide whether it is beneficial to
   transform the indirect call.
   2.  Use speculative_id to track and search the reference node matched
   with the direct edge's callee for multiple targets.  It is the
   caller's responsibility to handle the direct edges mapped to the same
   indirect edge.  speculative_call_info will return one of the specified
   direct edges; this mostly leverages the current IPA edge processing
   framework.
   3.  Enable multiple indirect call targets analysis at the LTO WPA/LTRANS
   stage for full profile support in the IPA passes and cgraph_edge functions.
   speculative_id can be set by make_speculative when multiple targets are
   bound to one indirect edge, and is cloned when a new edge is cloned.
   speculative_id is streamed out and streamed in by LTO like lto_stmt_uid.
   4.  Add 1 in module testcase and

Re: [PATCH] FreeBSD PowerPC use secure-plt

2019-09-25 Thread Andreas Tobler

Hi Segher,

On 26.09.19 00:49, Segher Boessenkool wrote:


On Wed, Sep 25, 2019 at 10:46:57PM +0200, Andreas Tobler wrote:

--- gcc/config/rs6000/t-freebsd64   (revision 276090)
+++ gcc/config/rs6000/t-freebsd64   (working copy)
@@ -27,3 +27,6 @@
  MULTILIB_EXCEPTIONS =
  MULTILIB_OSDIRNAMES   = ../lib32
  
+SECURE_PLT = $(if $(findstring TARGET_FREEBSD32_SECURE_PLT=1, $(tm_defines)),msecure-plt)

+
+MULTILIB_EXTRA_OPTS += $(SECURE_PLT)


$(findstring) isn't super great, it looks for substrings, so it would
also match "TARGET_FREEBSD32_SECURE_PLT=123"; you can use $(filter) instead?


Thank you for the feedback. Indeed, filter looks better. Testing again :)

And here for completeness the CL.

2019-09-25  Andreas Tobler  

* config.gcc: Use the secure-plt on FreeBSD 13 and upwards for
32-bit PowerPC.
Define TARGET_FREEBSD32_SECURE_PLT for 64-bit PowerPC.
* config/rs6000/t-freebsd64: Make use of the above define and build
the 32-bit libraries with secure-plt.


Looks fine to me either way.


Thank you.

Andreas


Re: [PATCH] Retain TYPE_MODE more often for BIT_FIELD_REFs in get_inner_reference

2019-09-25 Thread Richard Biener
On Wed, 25 Sep 2019, Eric Botcazou wrote:

> > For the PR it would be good enough. Though I wonder what the original reason
> > for the mode handling was. Was it to avoid not naturally aligned modes for
> > strict align targets? Or modes for non-mode size entities?
> 
> Bit-field extraction ultimately required integer modes before vector modes 
> came to light so I think that preserving their original mode was useless.

I see.  So I misremembered seeing aggregate typed BIT_FIELD_REFs
(that was probably VIEW_CONVERTs then...).  Still the GIMPLE verifier
only has

  else if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
	   && TYPE_MODE (TREE_TYPE (expr)) != BLKmode
	   && maybe_ne (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (expr))),
			size))
    {
      error ("mode size of non-integral result does not "
	     "match field size of %qs",
	     code_name);
      return true;
    }

it doesn't verify that besides integral typed expressions only
vector typed expressions are allowed.

Anyhow - the original patch succeeded bootstrapping and testing.
The way I proposed it:

  /* For vector types, with the correct size of access, use the mode of
     inner type.  */
  if (((TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0))) == VECTOR_TYPE
	&& TREE_TYPE (exp) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0))))
       || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
      && tree_int_cst_equal (size_tree, TYPE_SIZE (TREE_TYPE (exp))))
    mode = TYPE_MODE (TREE_TYPE (exp));

matches insofar as we only restrict integer types (not modes) and, for
integer types, allow extracts from vectors (the preexisting check for a
matching component type is a bit too strict, I guess).

Thus, OK with adjusting the comment to reflect the change?

Thanks,
Richard.


[PATCH] Use build_clobber some more

2019-09-25 Thread Jakub Jelinek
Hi!

I wasn't aware of this new build_clobber function added last year,
apparently we still have tons of spots that build clobbers by hand, and using
build_clobber will make it clear what we are actually building.
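
For reference, a sketch of the hand-rolled pattern every hunk below
replaces (assuming GCC's tree.h is in scope; build_clobber wraps
exactly this):

  static tree
  build_clobber_by_hand (tree type)
  {
    /* An empty CONSTRUCTOR marked volatile is GIMPLE's clobber.  */
    tree clobber = build_constructor (type, NULL);
    TREE_THIS_VOLATILE (clobber) = 1;
    return clobber;
  }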

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-09-26  Jakub Jelinek  

* function.c (gimplify_parameters): Use build_clobber function.
* tree-ssa.c (execute_update_addresses_taken): Likewise.
* tree-inline.c (expand_call_inline): Likewise.
* tree-sra.c (clobber_subtree): Likewise.
* tree-ssa-ccp.c (insert_clobber_before_stack_restore): Likewise.
* omp-low.c (lower_rec_simd_input_clauses, lower_rec_input_clauses,
lower_omp_single, lower_depend_clauses, lower_omp_taskreg,
lower_omp_target): Likewise.
* omp-expand.c (expand_omp_for_generic): Likewise.
* omp-offload.c (ompdevlow_adjust_simt_enter): Likewise.

--- gcc/function.c.jj   2019-09-11 10:27:40.170772183 +0200
+++ gcc/function.c  2019-09-25 18:22:43.414133650 +0200
@@ -3892,9 +3892,8 @@ gimplify_parameters (gimple_seq *cleanup
  if (!is_gimple_reg (local)
  && flag_stack_reuse != SR_NONE)
{
- tree clobber = build_constructor (type, NULL);
+ tree clobber = build_clobber (type);
  gimple *clobber_stmt;
- TREE_THIS_VOLATILE (clobber) = 1;
  clobber_stmt = gimple_build_assign (local, clobber);
  gimple_seq_add_stmt (cleanup, clobber_stmt);
}
--- gcc/tree-ssa.c.jj   2019-07-19 11:56:10.438964997 +0200
+++ gcc/tree-ssa.c  2019-09-25 18:24:27.077559222 +0200
@@ -2016,9 +2016,7 @@ execute_update_addresses_taken (void)
/* In ASAN_MARK (UNPOISON, &b, ...) the variable
   is uninitialized.  Avoid dependencies on
   previous out of scope value.  */
-   tree clobber
- = build_constructor (TREE_TYPE (var), NULL);
-   TREE_THIS_VOLATILE (clobber) = 1;
+   tree clobber = build_clobber (TREE_TYPE (var));
gimple *g = gimple_build_assign (var, clobber);
gsi_replace (&gsi, g, GSI_SAME_STMT);
  }
--- gcc/tree-inline.c.jj2019-09-20 12:25:48.187387060 +0200
+++ gcc/tree-inline.c   2019-09-25 18:23:35.633340550 +0200
@@ -5016,9 +5016,8 @@ expand_call_inline (basic_block bb, gimp
  tree *varp = id->decl_map->get (p);
  if (varp && VAR_P (*varp) && !is_gimple_reg (*varp))
{
- tree clobber = build_constructor (TREE_TYPE (*varp), NULL);
+ tree clobber = build_clobber (TREE_TYPE (*varp));
  gimple *clobber_stmt;
- TREE_THIS_VOLATILE (clobber) = 1;
  clobber_stmt = gimple_build_assign (*varp, clobber);
  gimple_set_location (clobber_stmt, gimple_location (stmt));
  gsi_insert_before (&stmt_gsi, clobber_stmt, GSI_SAME_STMT);
@@ -5086,9 +5085,8 @@ expand_call_inline (basic_block bb, gimp
  && !is_gimple_reg (id->retvar)
  && !stmt_ends_bb_p (stmt))
{
- tree clobber = build_constructor (TREE_TYPE (id->retvar), NULL);
+ tree clobber = build_clobber (TREE_TYPE (id->retvar));
  gimple *clobber_stmt;
- TREE_THIS_VOLATILE (clobber) = 1;
  clobber_stmt = gimple_build_assign (id->retvar, clobber);
  gimple_set_location (clobber_stmt, gimple_location (old_stmt));
  gsi_insert_after (&stmt_gsi, clobber_stmt, GSI_SAME_STMT);
@@ -5134,9 +5132,8 @@ expand_call_inline (basic_block bb, gimp
   && !TREE_THIS_VOLATILE (id->retvar)
   && !is_gimple_reg (id->retvar))
{
- tree clobber = build_constructor (TREE_TYPE (id->retvar), NULL);
+ tree clobber = build_clobber (TREE_TYPE (id->retvar));
  gimple *clobber_stmt;
- TREE_THIS_VOLATILE (clobber) = 1;
  clobber_stmt = gimple_build_assign (id->retvar, clobber);
  gimple_set_location (clobber_stmt, gimple_location (stmt));
  gsi_replace (&stmt_gsi, clobber_stmt, false);
--- gcc/tree-sra.c.jj   2019-09-20 12:25:46.832408059 +0200
+++ gcc/tree-sra.c  2019-09-25 18:23:59.899971991 +0200
@@ -3039,8 +3039,7 @@ clobber_subtree (struct access *access,
   if (access->grp_to_be_replaced)
 {
   tree rep = get_access_replacement (access);
-  tree clobber = build_constructor (access->type, NULL);
-  TREE_THIS_VOLATILE (clobber) = 1;
+  tree clobber = build_clobber (access->type);
   gimple *stmt = gimple_build_assign (rep, clobber);
 
   if (insert_after)
--- gcc/tree-ssa-ccp.c.jj   2019-09-24 14:39:04.685508253 +0200
+++ gcc/tree-ssa-ccp.c  2019-09-25 18:24:51.572187200 +020

Re: [SVE] PR91532

2019-09-25 Thread Richard Biener
On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:

> On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> >
> > On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > > Hi,
> > > For PR91532, the dead store is trivially deleted if we place dse pass
> > > between ifcvt and vect. Would it be OK to add another instance of dse 
> > > there ?
> > > Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> > > will clean up the dead store ?
> > I'd hesitate to add another DSE pass.  If there's one nearby could we
> > move the existing pass?
> Well I think the nearest one is just after pass_warn_restrict. Not
> sure if it's a good
> idea to move it up from there ?

You'll need it in between ifcvt and vect so it would be disabled
w/o vectorization, so no, that doesn't work.

ifcvt already invokes SEME region value-numbering so if we had
MESE region DSE it could use that.  Not sure if you feel like
refactoring DSE to work on regions - it currently uses a DOM
walk which isn't suited for that.

if-conversion has a little "local" dead predicate compute removal
thingy (not that I like that), eventually it can be enhanced to
do the DSE you want?  Eventually it should be moved after the local
CSE invocation though.

Richard.


Re: Problem exposed by recent ARM multiply changes

2019-09-25 Thread Jakub Jelinek
On Wed, Sep 25, 2019 at 10:06:13PM -0600, Jeff Law wrote:
> (insn 13 12 14 2 (set (reg:SI 124)
> (const_int -939524096 [0xffffffffc8000000])) "j.c":10:54 161
> {*arm_movsi_insn}
>  (nil))
> 
> (insn 14 13 16 2 (parallel [
> (set (reg:SI 132)
> (plus:SI (mult:SI (zero_extend:DI (reg/v:SI 115 [ sec ]))
> (zero_extend:DI (reg:SI 124)))
> (reg:SI 130)))

IMNSHO the bug is just in the backend; the above is not valid RTL.
SImode MULT has to have SImode operands, not DImode operands.

> (set (reg:SI 133 [+4 ])
> (plus:SI (truncate:SI (lshiftrt:DI (plus:DI (mult:DI
> (zero_extend:DI (reg/v:SI 115 [ sec ]))
> (zero_extend:DI (reg:SI 124)))
> (zero_extend:DI (reg:SI 130)))
> (const_int 32 [0x20])))
> (reg:SI 131 [+4 ])))

From the rest of the pattern, I'd say the right fix is just to:
--- gcc/config/arm/arm.md.jj2019-09-20 23:17:28.786629241 +0200
+++ gcc/config/arm/arm.md   2019-09-26 08:47:40.068517793 +0200
@@ -1812,8 +1812,8 @@
   [(set (match_operand:SI 0 "s_register_operand" "=r,&r")
(plus:SI
 (mult:SI
- (SE:DI (match_operand:SI 4 "s_register_operand" "%r,r"))
- (SE:DI (match_operand:SI 5 "s_register_operand" "r,r")))
+ (match_operand:SI 4 "s_register_operand" "%r,r")
+ (match_operand:SI 5 "s_register_operand" "r,r"))
 (match_operand:SI 1 "s_register_operand" "0,0")))
(set (match_operand:SI 2 "s_register_operand" "=r,&r")
(plus:SI
because it really only cares about the DImode zext values in the second part
of the instruction, but I don't have spare cycles to test this right now nor
write testcases.

Jakub