[PATCH] Fix up wrong-code caused by store_expr (PR middle-end/90025)

2019-04-10 Thread Jakub Jelinek
Hi!

In r268957 aka PR66152 fix I've slightly changed the store_expr
code which does store_by_pieces for just a part of an array and
clear_storage for the rest and since then the following testcase
is miscompiled on s390x-linux.  I believe one could construct other testcases
that would have been miscompiled before this change as well, though.

The problem is that store_by_pieces with RETURN_END returns a MEM object
with QImode, and thus also a MEM_SIZE of 1.  We then adjust_address it to
BLKmode, which doesn't change MEM_SIZE, and nothing in the generic
clear_storage code tweaks that MEM_SIZE later on (perhaps some target
code does, but s390 does not).  Because in this case the size is really
known and constant, the following patch makes sure we do what
adjust_address does, but in addition sets MEM_SIZE properly.  When
MEM_SIZE is 1 and, as in the testcase, there is a store right after it
to the first few bytes of the remainder, RTL DSE happily removes the
memory-clear insn emitted for clear_storage: if that clear really had
size 1 it would indeed be redundant, but the 20 bytes after it are not.

Bootstrapped/regtested on {x86_64,i686,s390x,powerpc64le}-linux, ok for
trunk?

2019-04-10  Jakub Jelinek  

PR middle-end/90025
* expr.c (store_expr): Set the size properly on the MEM passed to
clear_storage.

* gcc.c-torture/execute/pr90025.c: New test.

--- gcc/expr.c.jj   2019-02-21 00:01:59.418057433 +0100
+++ gcc/expr.c  2019-04-09 12:49:20.704457867 +0200
@@ -5658,7 +5658,8 @@ store_expr (tree exp, rtx target, int ca
   dest_mem = store_by_pieces (target, str_copy_len, string_cst_read_str,
  (void *) str, MEM_ALIGN (target), false,
  RETURN_END);
-  clear_storage (adjust_address (dest_mem, BLKmode, 0),
+  clear_storage (adjust_address_1 (dest_mem, BLKmode, 0, 1, 1, 0,
+  exp_len - str_copy_len),
 GEN_INT (exp_len - str_copy_len), BLOCK_OP_NORMAL);
   return NULL_RTX;
 }
--- gcc/testsuite/gcc.c-torture/execute/pr90025.c.jj	2019-04-09 12:59:35.876449686 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr90025.c	2019-04-09 12:59:20.156705430 +0200
@@ -0,0 +1,28 @@
+/* PR middle-end/90025 */
+
+__attribute__((noipa)) void
+bar (char *p)
+{
+  int i;
+  for (i = 0; i < 6; i++)
+if (p[i] != "foobar"[i])
+  __builtin_abort ();
+  for (; i < 32; i++)
+if (p[i] != '\0')
+  __builtin_abort ();
+}
+
+__attribute__((noipa)) void
+foo (unsigned int x)
+{
+  char s[32] = { 'f', 'o', 'o', 'b', 'a', 'r', 0 };
+  ((unsigned int *) s)[2] = __builtin_bswap32 (x);
+  bar (s);
+}
+
+int
+main ()
+{
+  foo (0);
+  return 0;
+}

Jakub


Re: [Patch, fortran] PRs 89843 and 90022 - C Fortran Interop fixes.

2019-04-10 Thread Bader, Reinhold
Hi Gilles,

>
> I also found an other potential issue with copy-in.
>
> If in Fortran, we
>
> call foo(buf(0,0))
>
> then the C subroutine can only access buf(0,0), and other elements such
> as buf(1025,1025) cannot be accessed.
>
> Such elements are valid in buf, but out of bounds in the copy (that
> contains a single element).
>
> Strictly speaking, I cannot say whether this is a violation of the
> standard or not, but I can see how this will
>
> break a lot of existing apps (once again, those apps might be incorrect
> in the first place, but most of us got used to them working).


The above call will only be conforming if the dummy argument is declared 
assumed or explicit size.
Otherwise, the compiler should reject it due to rank mismatch. For assumed 
rank, the call would be
legitimate, but the rank of the dummy argument is then zero. Even if no 
copy-in is performed,
accessing data beyond the address range of that scalar is not strictly 
allowed.

Of more interest is the situation where the dummy argument in Fortran is 
declared, e.g.,

TYPE(*), ASYNCHRONOUS, INTENT(IN) :: BUF(..)

The standard's semantics *forbids* performing copy-in/out in this case, IIRC. 
Otherwise
ASYNCHRONOUS semantics would not work, and non-blocking MPI calls would fail 
due
to buffers vanishing into thin air.

Regards
Reinhold

>
> To me, this is a second reason why copy-in is not desirable (at least as
> a default option).
>
>
>
> Cheers,
>
>
> Gilles
>
> On 4/9/2019 7:18 PM, Paul Richard Thomas wrote:
> > The major part of this patch is concerned with implementing calls from
> > C of Fortran BIND(C) procedures with assumed-shape or assumed-rank
> > dummies, to completely fix PR89843. The conversion of the descriptors
> > from CFI to gfc occurs on entry to the procedure and is reversed on
> > exit.
> >
> > This patch is safe for trunk, even at this late stage, because its
> > effects are barricaded behind the tests for CFI descriptors. I believe
> > that it appropriately rewards the bug reporters to have this feature
> > work as well as possible at release.
> >
> > Between comments and the ChangeLogs, this patch is self explanatory.
> >
> > Bootstrapped and regtested on FC29/x86_64 - OK for trunk?
> >
> > Paul
> >
> > 2019-04-09  Paul Thomas  
> >
> >  PR fortran/89843
> >  * trans-decl.c (gfc_get_symbol_decl): Assumed shape and assumed
> >  rank dummies of bind C procs require deferred initialization.
> >  (convert_CFI_desc): New procedure to convert incoming CFI
> >  descriptors to gfc types and back again.
> >  (gfc_trans_deferred_vars): Call it.
> >  * trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Null the CFI
> >  descriptor pointer. Free the descriptor in all cases.
> >
> >  PR fortran/90022
> >  * trans-decl.c (gfc_get_symbol_decl): Make sure that the se
> >  expression is a pointer type before converting it to the symbol
> >  backend_decl type.
> >
> > 2019-04-09  Paul Thomas  
> >
> >  PR fortran/89843
> >  * gfortran.dg/ISO_Fortran_binding_4.f90: Modify the value of x
> >  in ctg. Test the conversion of the descriptor types in the main
> >  program.
> >  * gfortran.dg/ISO_Fortran_binding_10.f90: New test.
> >  * gfortran.dg/ISO_Fortran_binding_10.c: Called by it.
> >
> >  PR fortran/90022
> >  * gfortran.dg/ISO_Fortran_binding_1.c: Correct the indexing for
> >  the computation of 'ans'. Also, change the expected results for
> >  CFI_is_contiguous to comply with the standard.
> >  * gfortran.dg/ISO_Fortran_binding_1.f90: Correct the expected
> >  results for CFI_is_contiguous to comply with the standard.
> >  * gfortran.dg/ISO_Fortran_binding_9.f90: New test.
> >  * gfortran.dg/ISO_Fortran_binding_9.c: Called by it.
> >
> > 2019-04-09  Paul Thomas  
> >
> >  PR fortran/89843
> >  * runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc): Only
> >  return immediately if the source pointer is null. Bring
> >  forward the extraction of the gfc type. Extract the kind so
> >  that the element size can be correctly computed for sections
> >  and components of derived type arrays. Remove the free of the
> >  CFI descriptor since this is now done in trans-expr.c.
> >  (gfc_desc_to_cfi_desc): Only allocate the CFI descriptor if it
> >  is not null.
> >  (CFI_section): Normalise the difference between the upper and
> >  lower bounds by the stride to correctly calculate the extents
> >  of the section.
> >
> >  PR fortran/90022
> >  * runtime/ISO_Fortran_binding.c (CFI_is_contiguous): Return
> >  1 for true and 0 otherwise to comply with the standard. Correct
> >  the contiguity check for rank 3 and greater by using the stride
> >  measure of the lower dimension rather than the element length.




Re: [PATCH] Fix up wrong-code caused by store_expr (PR middle-end/90025)

2019-04-10 Thread Richard Biener
On Wed, 10 Apr 2019, Jakub Jelinek wrote:

> Hi!
> 
> In r268957 aka PR66152 fix I've slightly changed the store_expr
> code which does store_by_pieces for just a part of an array and
> clear_storage for the rest and since then the following testcase
> is miscompiled on s390x-linux.  I believe one could construct other testcases
> that would have been miscompiled before this change as well, though.
> 
> The problem is that store_by_pieces with RETURN_END returns a MEM object
> with QImode, and thus also a MEM_SIZE of 1.  We then adjust_address it to
> BLKmode, which doesn't change MEM_SIZE, and nothing in the generic
> clear_storage code tweaks that MEM_SIZE later on (perhaps some target
> code does, but s390 does not).  Because in this case the size is really
> known and constant, the following patch makes sure we do what
> adjust_address does, but in addition sets MEM_SIZE properly.  When
> MEM_SIZE is 1 and, as in the testcase, there is a store right after it
> to the first few bytes of the remainder, RTL DSE happily removes the
> memory-clear insn emitted for clear_storage: if that clear really had
> size 1 it would indeed be redundant, but the 20 bytes after it are not.
> 
> Bootstrapped/regtested on {x86_64,i686,s390x,powerpc64le}-linux, ok for
> trunk?

OK.

Richard.

> 2019-04-10  Jakub Jelinek  
> 
>   PR middle-end/90025
>   * expr.c (store_expr): Set the size properly on the MEM passed to
>   clear_storage.
> 
>   * gcc.c-torture/execute/pr90025.c: New test.
> 
> --- gcc/expr.c.jj 2019-02-21 00:01:59.418057433 +0100
> +++ gcc/expr.c2019-04-09 12:49:20.704457867 +0200
> @@ -5658,7 +5658,8 @@ store_expr (tree exp, rtx target, int ca
>dest_mem = store_by_pieces (target, str_copy_len, string_cst_read_str,
> (void *) str, MEM_ALIGN (target), false,
> RETURN_END);
> -  clear_storage (adjust_address (dest_mem, BLKmode, 0),
> +  clear_storage (adjust_address_1 (dest_mem, BLKmode, 0, 1, 1, 0,
> +exp_len - str_copy_len),
>GEN_INT (exp_len - str_copy_len), BLOCK_OP_NORMAL);
>return NULL_RTX;
>  }
> --- gcc/testsuite/gcc.c-torture/execute/pr90025.c.jj	2019-04-09 12:59:35.876449686 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr90025.c	2019-04-09 12:59:20.156705430 +0200
> @@ -0,0 +1,28 @@
> +/* PR middle-end/90025 */
> +
> +__attribute__((noipa)) void
> +bar (char *p)
> +{
> +  int i;
> +  for (i = 0; i < 6; i++)
> +if (p[i] != "foobar"[i])
> +  __builtin_abort ();
> +  for (; i < 32; i++)
> +if (p[i] != '\0')
> +  __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) void
> +foo (unsigned int x)
> +{
> +  char s[32] = { 'f', 'o', 'o', 'b', 'a', 'r', 0 };
> +  ((unsigned int *) s)[2] = __builtin_bswap32 (x);
> +  bar (s);
> +}
> +
> +int
> +main ()
> +{
> +  foo (0);
> +  return 0;
> +}
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Re: [PATCH][stage1] Support profile (BB counts and edge probabilities) in GIMPLE FE.

2019-04-10 Thread Martin Liška
On 4/9/19 3:19 PM, Jan Hubicka wrote:
>> Hi.
>>
>> There's updated version that supports profile quality for both counts
>> and probabilities. I'm wondering whether ENTRY and EXIT BBs needs to
>> have set probability. Apparently, I haven't seen any verifier that
>> would complain.
> 
> Well, you do not need to define it but then several cases will
> degenerate. In particular BB frequencies (for callgraph profile or
> hot/cold decisions) are calculated as ratios of entry BB and given BB
> count. If entry BB is undefined you will get those undefined and
> heuristics will resort to conservative answers.
> 
> I do not think we use exit block count.
> 
> Honza
> 

Thank you, Honza, for the explanation.  I'm sending a version of the patch
that supports the entry BB count.

I'm testing the patch right now.

Martin
From cc041144e55b0d002f43a1a2a61364b0e085c4a0 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 4 Apr 2019 14:46:15 +0200
Subject: [PATCH] Support profile (BB counts and edge probabilities) in GIMPLE
 FE.

gcc/ChangeLog:

2019-04-05  Martin Liska  

	* tree-cfg.c (dump_function_to_file): Dump entry BB count.
	* gimple-pretty-print.c (dump_gimple_bb_header):
	Dump BB count.
	(pp_cfg_jump): Dump edge probability.
	* profile-count.c (profile_quality_as_string): Simplify
	with a static array.
	(parse_profile_quality): New function.
	(profile_count::dump): Simplify with a static array.
	(profile_count::from_gcov_type): Add new argument.
	* profile-count.h (parse_profile_quality): Likewise.

gcc/c/ChangeLog:

2019-04-05  Martin Liska  

	* gimple-parser.c (struct gimple_parser): Add probability field
	for gimple_parser_edge.
	(gimple_parser::push_edge): Add new argument probability.
	(c_parser_gimple_parse_bb_spec): Parse also probability
	if present.
	(c_parser_parse_gimple_body): Set edge probability.
	(c_parser_gimple_compound_statement): Consume token
	before calling c_parser_gimple_goto_stmt.
	Parse BB counts.
	(c_parser_gimple_statement): Pass new argument.
	(c_parser_gimple_goto_stmt): Likewise.
	(c_parser_gimple_if_stmt): Likewise.
	(c_parser_gimple_or_rtl_pass_list): Parse hot_bb_threshold.
	* c-parser.c (c_parser_declaration_or_fndef): Pass
	hot_bb_threshold argument.
	* c-tree.h (struct c_declspecs): Add hot_bb_threshold
	field.

gcc/testsuite/ChangeLog:

2019-04-05  Martin Liska  

	* gcc.dg/gimplefe-37.c: New test.
	* gcc.dg/gimplefe-38.c: Likewise.
---
 gcc/c/c-parser.c   |   4 +-
 gcc/c/c-tree.h |   4 +
 gcc/c/gimple-parser.c  | 195 +
 gcc/c/gimple-parser.h  |   3 +-
 gcc/gimple-pretty-print.c  |  13 ++
 gcc/profile-count.c|  88 +++--
 gcc/profile-count.h|  22 +++-
 gcc/testsuite/gcc.dg/gimplefe-37.c |  27 
 gcc/testsuite/gcc.dg/gimplefe-38.c |  27 
 gcc/tree-cfg.c |  11 +-
 10 files changed, 322 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-37.c
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-38.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 741d172ff30..8a6dea4ba06 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -2347,7 +2347,9 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 	  bool saved = in_late_binary_op;
 	  in_late_binary_op = true;
 	  c_parser_parse_gimple_body (parser, specs->gimple_or_rtl_pass,
-  specs->declspec_il);
+  specs->declspec_il,
+  specs->hot_bb_threshold,
+  specs->entry_bb_count);
 	  in_late_binary_op = saved;
 	}
   else
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 9393f6d1545..9294fd4106e 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -317,6 +317,10 @@ struct c_declspecs {
   tree attrs;
   /* The pass to start compiling a __GIMPLE or __RTL function with.  */
   char *gimple_or_rtl_pass;
+  /* Hotness threshold for __GIMPLE FE.  */
+  gcov_type hot_bb_threshold;
+  /* ENTRY BB count.  */
+  profile_count entry_bb_count;
   /* The base-2 log of the greatest alignment required by an _Alignas
  specifier, in bytes, or -1 if no such specifiers with nonzero
  alignment.  */
diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index fff34606dae..8a4262d6642 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -81,20 +81,23 @@ struct gimple_parser
 int src;
 int dest;
 int flags;
+profile_probability probability;
   };
   auto_vec edges;
   basic_block current_bb;
 
-  void push_edge (int, int, int);
+  void push_edge (int, int, int, profile_probability);
 };
 
 void
-gimple_parser::push_edge (int src, int dest, int flags)
+gimple_parser::push_edge (int src, int dest, int flags,
+			  profile_probability prob)
 {
   gimple_parser_edge e;
   e.src = src;
   e.dest = dest;
   e.flags = flags;
+  e.probability = prob;
   edges.safe_push (e);
 }
 
@@ -120,10 +123,12 @@ static void c_parser_gimple_expr_list (gimple_parser &, vec *);
 
 
 /* See if VAL is an identif

[aarch64] PR90016 - aarch64: reference to undeclared N in help for command line option

2019-04-10 Thread Richard Earnshaw (lists)
'to N' is now redundant and misleading given the earlier change to use
.

Removed

* config/aarch64/aarch64.opt (msve-vector-bits): Remove redundant and
obsolete reference to N.

R.
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 5a1e687091c..3dc768fab39 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -216,7 +216,7 @@ Enum(sve_vector_bits) String(2048) Value(SVE_2048)
 
 msve-vector-bits=
 Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE)
--msve-vector-bits=	Set the number of bits in an SVE vector register to N.
+-msve-vector-bits=	Set the number of bits in an SVE vector register
 
 mverbose-cost-dump
 Target Undocumented Var(flag_aarch64_verbose_cost)


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-04-10 Thread Richard Earnshaw (lists)
On 01/04/2019 18:23, Steve Ellcey wrote:
> This is a ping**3 for a patch to fix one of the test failures in PR 87763.
> It fixes the gcc.target/aarch64/combine_bfi_1.c failure, but not the other
> ones.
> 
> Could one of the Aarch64 maintainers take a look at it?  This version of
> the patch was originally submitted on February 11 after incorporating some
> changes suggested by Wilco Dijkstra but I have not gotten any comments
> since then despite pinging it.  I updated it to apply cleanly on ToT but
> did not make any other changes since the February 11th version.
> 
> If we want to encourage people to fix bugs in the run up to a release it
> would help to get feedback on bugfix patches.
> 
> Steve Ellcey
> sell...@marvell.com
> 
> 
> 2018-04-01  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
>   New prototype.
>   * config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
>   New function.
>   * config/aarch64/aarch64.md (*aarch64_bfi5_shift):
>   New instruction.
>   (*aarch64_bfi4_noand): Ditto.
>   (*aarch64_bfi4_noshift): Ditto.
>   (*aarch64_bfi4_noshift_alt): Ditto.
> 
> 
> 2018-04-01  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
>   to bfi.
>   * gcc.target/aarch64/combine_bfi_2.c: New test.
> 
> 
> combine.patch
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index b035e35..b6c0d0a 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
>  void aarch64_declare_function_name (FILE *, const char*, tree);
>  bool aarch64_legitimate_pic_operand_p (rtx);
>  bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
> +bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
> + unsigned HOST_WIDE_INT,
> + unsigned HOST_WIDE_INT);
>  bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
>  bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
>  opt_machine_mode aarch64_sve_pred_mode (unsigned int);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b38505b..3017e99 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
>& ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
>  }
>  
> +/* Return true if the masks and a shift amount from an RTX of the form
> +   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
> +   a BFI instruction of mode MODE.  See *aarch64_bfi patterns.  */
> +
> +bool
> +aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
> +unsigned HOST_WIDE_INT mask1,
> +unsigned HOST_WIDE_INT shft_amnt,
> +unsigned HOST_WIDE_INT mask2)
> +{
> +  unsigned HOST_WIDE_INT t;
> +
> +  /* Verify that there is no overlap in what bits are set in the two masks.  */
> +  if (mask1 != ~mask2)
> +return false;
> +
> +  /* Verify that mask2 is not all zeros or ones.  */
> +  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
> +return false;
> +
> +  /* The shift amount should always be less than the mode size.  */
> +  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
> +
> +  /* Verify that the mask being shifted is contiguous and would be in the
> + least significant bits after shifting by shft_amnt.  */
> +  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
> +  return (t == (t & -t));
> +}
> +
>  /* Calculate the cost of calculating X, storing it in *COST.  Result
> is true if the total cost of the operation has now been calculated.  */
>  static bool
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 70f0418..baa8983 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5490,6 +5490,76 @@
>[(set_attr "type" "bfm")]
>  )
>  
> +;;  Match a bfi instruction where the shift of OP3 means that we are
> +;;  actually copying the least significant bits of OP3 into OP0 by way
> +;;  of the AND masks and the IOR instruction.  A similar instruction
> +;;  with the two parts of the IOR swapped around was never triggered
> +;;  in a bootstrap build and test of GCC so it was not included.
> +
> +(define_insn "*aarch64_bfi5_shift"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
> +  (match_operand:GPI 2 "const_int_operand" "n"))
> + (and:GPI (ashift:GPI
> +   (match_operand:GPI 3 "register_operand" "r")
> +  

Re: [PATCH] Add PSTL internal namespace qualifications

2019-04-10 Thread Jonathan Wakely

On 09/04/19 14:48 -0700, Thomas Rodgers wrote:


Committed to trunk.


Thanks!

For the record, I approved the patch over IRC but I forgot to say so
on the list.




[PATCH] Clearly document behaviour of multiple -g options

2019-04-10 Thread Jonathan Wakely

This copies the wording from the -O options to clearly state what
happens if more than one -g option is used.

* doc/invoke.texi (Debugging Options): Explicitly state the semantics
of using multiple -g options.

OK for trunk?


commit 9d950d2398b0a01358520a1c0a69bbe46ba7d997
Author: Jonathan Wakely 
Date:   Wed Apr 10 12:15:29 2019 +0100

Clearly document behaviour of multiple -g options

This copies the wording from the -O options to clearly state what
happens if more than one -g option is used.

* doc/invoke.texi (Debugging Options): Explicitly state the 
semantics
of using multiple -g options.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cfc3063929b..73a7789d879 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7818,6 +7818,9 @@ Level 3 includes extra information, such as all the macro definitions
 present in the program.  Some debuggers support macro expansion when
 you use @option{-g3}.
 
+If you use multiple @option{-g} options, with or without level numbers,
+the last such option is the one that is effective.
+
 @option{-gdwarf} does not accept a concatenated debug level, to avoid
 confusion with @option{-gdwarf-@var{level}}.
 Instead use an additional @option{-g@var{level}} option to change the


Re: [PATCH] Remove unused DR_GROUP_SAME_DR_STMT

2019-04-10 Thread Richard Biener
On Tue, 9 Apr 2019, Richard Biener wrote:

> 
> While looking at PR90018 I figured DR_GROUP_SAME_DR_STMT has never
> been set since GCC 4.9.  The following removes it, to confuse the
> occasional reader of the vectorizer code less.
> 
> Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Testing showed an issue in the new group splitting (it compared the
INTEGER_CSTs 8(OVF) and 8 by pointer equality) and a leftover
write-only variable.

I've applied the following instead.

Bootstrapped/tested on x86_64-unknown-linux-gnu.

Richard.

2019-04-10  Richard Biener  

* tree-vectorizer.h (_stmt_vec_info): Remove same_dr_stmt
member.
(DR_GROUP_SAME_DR_STMT): Remove.
* tree-vect-stmts.c (vectorizable_load): Remove unreachable code.
* tree-vect-data-refs.c (vect_analyze_group_access_1): Likewise,
replace with assert.
(vect_analyze_data_ref_accesses): Fix INTEGER_CST comparison.
(vect_record_grouped_load_vectors): Remove unreachable code.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   (revision 270223)
+++ gcc/tree-vectorizer.h   (working copy)
@@ -872,9 +872,6 @@ struct _stmt_vec_info {
   stmt_vec_info first_element;
   /* Pointer to the next element in the group.  */
   stmt_vec_info next_element;
-  /* For data-refs, in case that two or more stmts share data-ref, this is the
- pointer to the previously detected stmt with the same dr.  */
-  stmt_vec_info same_dr_stmt;
   /* The size of the group.  */
   unsigned int size;
   /* For stores, number of stores from this group seen. We vectorize the last
@@ -1044,8 +1041,6 @@ STMT_VINFO_BB_VINFO (stmt_vec_info stmt_
   (gcc_checking_assert ((S)->dr_aux.dr), (S)->store_count)
 #define DR_GROUP_GAP(S) \
   (gcc_checking_assert ((S)->dr_aux.dr), (S)->gap)
-#define DR_GROUP_SAME_DR_STMT(S) \
-  (gcc_checking_assert ((S)->dr_aux.dr), (S)->same_dr_stmt)
 
 #define REDUC_GROUP_FIRST_ELEMENT(S) \
   (gcc_checking_assert (!(S)->dr_aux.dr), (S)->first_element)
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 270223)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -7704,19 +7727,6 @@ vectorizable_load (stmt_vec_info stmt_in
 "group loads with negative dependence distance\n");
  return false;
}
-
-  /* Similarly when the stmt is a load that is both part of a SLP
- instance and a loop vectorized stmt via the same-dr mechanism
-we have to give up.  */
-  if (DR_GROUP_SAME_DR_STMT (stmt_info)
- && (STMT_SLP_TYPE (stmt_info)
- != STMT_SLP_TYPE (DR_GROUP_SAME_DR_STMT (stmt_info
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"conflicting SLP types for CSEd load\n");
- return false;
-   }
 }
   else
 group_size = 1;
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 270245)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2523,40 +2562,15 @@ vect_analyze_group_access_1 (dr_vec_info
   struct data_reference *data_ref = dr;
   unsigned int count = 1;
   tree prev_init = DR_INIT (data_ref);
-  stmt_vec_info prev = stmt_info;
   HOST_WIDE_INT diff, gaps = 0;
 
   /* By construction, all group members have INTEGER_CST DR_INITs.  */
   while (next)
 {
-  /* Skip same data-refs.  In case that two or more stmts share
- data-ref (supported only for loads), we vectorize only the first
- stmt, and the rest get their vectorized loads from the first
- one.  */
-  if (!tree_int_cst_compare (DR_INIT (data_ref),
-DR_INIT (STMT_VINFO_DATA_REF (next
-{
-  if (DR_IS_WRITE (data_ref))
-{
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "Two store stmts share the same dr.\n");
-  return false;
-}
-
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"Two or more load stmts share the same dr.\n");
+  /* We never have the same DR multiple times.  */
+  gcc_assert (tree_int_cst_compare (DR_INIT (data_ref),
+   DR_INIT (STMT_VINFO_DATA_REF (next))) != 0);
 
- /* For load use the same data-ref load.  */
- DR_GROUP_SAME_DR_STMT (next) = prev;
-
- prev = next;
- next = DR_GROUP_NEXT_ELEMENT (next);
- continue;
-}
-
- prev = next;
  data_ref = STMT_VINFO_DATA_REF (next);
 
  

PR90030 "Fortran OpenACC subarray data alignment" (was: [PATCH] Fortran OpenMP 4.0 target support)

2019-04-10 Thread Thomas Schwinge
Hi Jakub!

In context of PR90030 "Fortran OpenACC subarray data alignment" (which
actually is reproducible for OpenMP with nvptx offloading in the very
same way, see below), can you please explain the reason for the seven
"[var] = fold_convert (build_pointer_type (char_type_node), [var])"
instances that you've added as part of your 2014 trunk r211768 "Fortran
OpenMP 4.0 target support" commit?

Replacing all these with "gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)))"
(see the attached WIP patch, which also includes an OpenMP test case), I
don't see any ill effects for 'check-gcc-fortran', and
'check-target-libgomp' with nvptx offloading, and the errors 'libgomp:
cuStreamSynchronize error: misaligned address' are gone.  I added these
'gcc_assert's just for checking; Cesar in
, and Julian in
 propose to
simply drop (a subset of) these casts.  Do we need (a) all, (b) some, (c)
none of these casts?  And do we want to replace them with 'gcc_assert's,
or not do that?

If approving such a patch (for all release branches), please respond with
"Reviewed-by: NAME " so that your effort will be recorded in the
commit log, see .

For reference, see the seven 'char_type_node' instances:

On Tue, 17 Jun 2014 23:03:47 +0200, Jakub Jelinek  wrote:
> --- gcc/fortran/trans-openmp.c.jj 2014-06-16 10:06:39.164099047 +0200
> +++ gcc/fortran/trans-openmp.c2014-06-17 19:32:58.939176877 +0200
> @@ -873,6 +873,110 @@ gfc_omp_clause_dtor (tree clause, tree d
>  }
>  
>  
> +void
> +gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
> +{
> +  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
> +return;
> +
> +  tree decl = OMP_CLAUSE_DECL (c);
> +  tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
> +  if (POINTER_TYPE_P (TREE_TYPE (decl)))
> +{
> +  if (!gfc_omp_privatize_by_reference (decl)
> +   && !GFC_DECL_GET_SCALAR_POINTER (decl)
> +   && !GFC_DECL_GET_SCALAR_ALLOCATABLE (decl)
> +   && !GFC_DECL_CRAY_POINTEE (decl)
> +   && !GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (TREE_TYPE (decl
> + return;
> +  c4 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
> +  OMP_CLAUSE_MAP_KIND (c4) = OMP_CLAUSE_MAP_POINTER;
> +  OMP_CLAUSE_DECL (c4) = decl;
> +  OMP_CLAUSE_SIZE (c4) = size_int (0);
> +  decl = build_fold_indirect_ref (decl);
> +  OMP_CLAUSE_DECL (c) = decl;
> +}
> +  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
> +{
> +  stmtblock_t block;
> +  gfc_start_block (&block);
> +  tree type = TREE_TYPE (decl);
> +  tree ptr = gfc_conv_descriptor_data_get (decl);
> +  ptr = fold_convert (build_pointer_type (char_type_node), ptr);
> +  ptr = build_fold_indirect_ref (ptr);
> +  OMP_CLAUSE_DECL (c) = ptr;
> +  c2 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
> +  OMP_CLAUSE_MAP_KIND (c2) = OMP_CLAUSE_MAP_TO_PSET;
> +  OMP_CLAUSE_DECL (c2) = decl;
> +  OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (type);
> +  c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
> +  OMP_CLAUSE_MAP_KIND (c3) = OMP_CLAUSE_MAP_POINTER;
> +  OMP_CLAUSE_DECL (c3) = gfc_conv_descriptor_data_get (decl);
> +  OMP_CLAUSE_SIZE (c3) = size_int (0);
> +  tree size = create_tmp_var (gfc_array_index_type, NULL);
> +  tree elemsz = TYPE_SIZE_UNIT (gfc_get_element_type (type));
> +  elemsz = fold_convert (gfc_array_index_type, elemsz);
> +  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
> +   || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER_CONT)
> + {
> +   stmtblock_t cond_block;
> +   tree tem, then_b, else_b, zero, cond;
> +
> +   gfc_init_block (&cond_block);
> +   tem = gfc_full_array_size (&cond_block, decl,
> +  GFC_TYPE_ARRAY_RANK (type));
> +   gfc_add_modify (&cond_block, size, tem);
> +   gfc_add_modify (&cond_block, size,
> +   fold_build2 (MULT_EXPR, gfc_array_index_type,
> +size, elemsz));
> +   then_b = gfc_finish_block (&cond_block);
> +   gfc_init_block (&cond_block);
> +   zero = build_int_cst (gfc_array_index_type, 0);
> +   gfc_add_modify (&cond_block, size, zero);
> +   else_b = gfc_finish_block (&cond_block);
> +   tem = gfc_conv_descriptor_data_get (decl);
> +   tem = fold_convert (pvoid_type_node, tem);
> +   cond = fold_build2_loc (input_location, NE_EXPR,
> +   boolean_type_node, tem, null_pointer_node);
> +   gfc_add_expr_to_block (&block, build3_loc (input_location, COND_EXPR,
> +  void_type_node, cond,
> +  then_b, else_b));
> + }
> +  else
> + {
> +   gfc_add_modify (&block, size,
> +   gfc_full_array_size 

[PATCH] Fix PR90018

2019-04-10 Thread Richard Biener


The following patch will fix the 527.cam4_r regression on the
branch (backport in testing, 2nd patch).  The issue is latent
on trunk; trunk avoids the offending grouping in a more intelligent
way.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-04-10  Richard Biener  

PR tree-optimization/90018
* tree-vect-data-refs.c (vect_preserves_scalar_order_p):
Test both SLP and interleaving variants.

* gcc.dg/vect/pr90018.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 270252)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -234,26 +234,60 @@ vect_preserves_scalar_order_p (dr_vec_in
 return true;
 
   /* STMT_A and STMT_B belong to overlapping groups.  All loads in a
- group are emitted at the position of the last scalar load and all
- stores in a group are emitted at the position of the last scalar store.
+ SLP group are emitted at the position of the last scalar load and
+ all loads in an interleaving group are emitted at the position
+ of the first scalar load.
+ Stores in a group are emitted at the position of the last scalar store.
  Compute that position and check whether the resulting order matches
- the current one.  */
-  stmt_vec_info last_a = DR_GROUP_FIRST_ELEMENT (stmtinfo_a);
+ the current one.
+ We have not yet decided between SLP and interleaving so we have
+ to conservatively assume both.  */
+  stmt_vec_info il_a;
+  stmt_vec_info last_a = il_a = DR_GROUP_FIRST_ELEMENT (stmtinfo_a);
   if (last_a)
-for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (last_a); s;
-s = DR_GROUP_NEXT_ELEMENT (s))
-  last_a = get_later_stmt (last_a, s);
+{
+  for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (last_a); s;
+  s = DR_GROUP_NEXT_ELEMENT (s))
+   last_a = get_later_stmt (last_a, s);
+  if (!DR_IS_WRITE (STMT_VINFO_DATA_REF (stmtinfo_a)))
+   {
+ for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (il_a); s;
+  s = DR_GROUP_NEXT_ELEMENT (s))
+   if (get_later_stmt (il_a, s) == il_a)
+ il_a = s;
+   }
+  else
+   il_a = last_a;
+}
   else
-last_a = stmtinfo_a;
-  stmt_vec_info last_b = DR_GROUP_FIRST_ELEMENT (stmtinfo_b);
+last_a = il_a = stmtinfo_a;
+  stmt_vec_info il_b;
+  stmt_vec_info last_b = il_b = DR_GROUP_FIRST_ELEMENT (stmtinfo_b);
   if (last_b)
-for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (last_b); s;
-s = DR_GROUP_NEXT_ELEMENT (s))
-  last_b = get_later_stmt (last_b, s);
+{
+  for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (last_b); s;
+  s = DR_GROUP_NEXT_ELEMENT (s))
+   last_b = get_later_stmt (last_b, s);
+  if (!DR_IS_WRITE (STMT_VINFO_DATA_REF (stmtinfo_b)))
+   {
+ for (stmt_vec_info s = DR_GROUP_NEXT_ELEMENT (il_b); s;
+  s = DR_GROUP_NEXT_ELEMENT (s))
+   if (get_later_stmt (il_b, s) == il_b)
+ il_b = s;
+   }
+  else
+   il_b = last_b;
+}
   else
-last_b = stmtinfo_b;
-  return ((get_later_stmt (last_a, last_b) == last_a)
- == (get_later_stmt (stmtinfo_a, stmtinfo_b) == stmtinfo_a));
+last_b = il_b = stmtinfo_b;
+  bool a_after_b = (get_later_stmt (stmtinfo_a, stmtinfo_b) == stmtinfo_a);
+  return (/* SLP */
+ (get_later_stmt (last_a, last_b) == last_a) == a_after_b
+ /* Interleaving */
+ && (get_later_stmt (il_a, il_b) == il_a) == a_after_b
+ /* Mixed */
+ && (get_later_stmt (il_a, last_b) == il_a) == a_after_b
+ && (get_later_stmt (last_a, il_b) == last_a) == a_after_b);
 }
 
 /* A subroutine of vect_analyze_data_ref_dependence.  Handle
Index: gcc/testsuite/gcc.dg/vect/pr90018.c
===
--- gcc/testsuite/gcc.dg/vect/pr90018.c (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/pr90018.c (working copy)
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+
+#include "tree-vect.h"
+
+void __attribute__((noinline,noclone))
+foo (double *a4, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  /* We may not apply interleaving to the group (a), (b) because of (c).
+ Instead group (d) and (b).  */
+  double tem1 = a4[i*4] + a4[i*4+n*4] /* (a) */;
+  double tem2 = a4[i*4+2*n*4+1];
+  a4[i*4+n*4+1] = tem1; /* (c) */
+  a4[i*4+1] = tem2;
+  double tem3 = a4[i*4] - tem2;
+  double tem4 = tem3 + a4[i*4+n*4] /* (d) */;
+  a4[i*4+n*4+1] = tem4 + a4[i*4+n*4+1] /* (b) */;
+}
+}
+int main(int argc, char **argv)
+{
+  int n = 11;
+  double a4[4 * n * 3];
+  double a42[4 * n * 3];
+  check_vect ();
+  for (int i = 0; i < 4 * n * 3; ++i)
+{
+  a4[i] = a42[i] = i;
+  __asm__ volatile ("": : : "memory");
+}
+  foo (a4, n);
+  for (int i = 0; i < n; ++i)
+{
+  double tem1 = a42[i*4] + a42[i

Re: [PATCH] Clearly document behaviour of multiple -g options

2019-04-10 Thread Sandra Loosemore

On 4/10/19 5:18 AM, Jonathan Wakely wrote:

This copies the wording from the -O options to clearly state what
happens if more than one -g option is used.

 * doc/invoke.texi (Debugging Options): Explicitly state the semantics
 of using multiple -g options.

OK for trunk?


Looks fine to me, assuming the behavior you're documenting is indeed 
what happens.  Thanks.


-Sandra




Re: [PATCH][Version 3]Come up with -flive-patching master option.

2019-04-10 Thread Jonathan Wakely

On 20/11/18 09:32 -0600, Qing Zhao wrote:

Hi,

this is the newest version of the patch.
major changes:
1. format fixes raised by Martin;
2. output an error when IPA optimizations disabled by -flive-patching 
are explicitly turned on by the user at the same time,
based on Honza's suggestions.

the new changes have been bootstrapped and tested on both aarch64 and x86, no 
regressions.

Okay for commit?

thanks.

Qing.

===
gcc/ChangeLog:

2018-11-20  qing zhao  

* cif-code.def (EXTERN_LIVE_ONLY_STATIC): New CIF code.
* common.opt: Add -flive-patching flag.
* doc/invoke.texi: Document -flive-patching.
* flag-types.h (enum live_patching_level): New enum.
* ipa-inline.c (can_inline_edge_p): Disable external functions from
inlining when flag_live_patching is LIVE_PATCHING_INLINE_ONLY_STATIC.
* opts.c (control_options_for_live_patching): New function.
(finish_options): Make flag_live_patching incompatible with flag_lto.
Control IPA optimizations based on different levels of
flag_live_patching.

gcc/testsuite/ChangeLog:

2018-11-20  qing zhao  

* gcc.dg/live-patching-1.c: New test.
* gcc.dg/live-patching-2.c: New test.
* gcc.dg/live-patching-3.c: New test.
* gcc.dg/tree-ssa/writeonly-3.c: New test.
* gcc.target/i386/ipa-stack-alignment-2.c: New test.




Hi Qing Zhao,

I was looking at the new docs for -flive-patching and have some
questions.


--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -416,6 +416,7 @@ Objective-C and Objective-C++ Dialects}.
-fipa-bit-cp -fipa-vrp @gol
-fipa-pta  -fipa-profile  -fipa-pure-const  -fipa-reference  
-fipa-reference-addressable @gol
-fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm} @gol
+-flive-patching=@var{level} @gol
-fira-region=@var{region}  -fira-hoist-pressure @gol
-fira-loop-pressure  -fno-ira-share-save-slots @gol
-fno-ira-share-spill-slots @gol
@@ -9045,6 +9046,65 @@ equivalences that are found only by GCC and equivalences 
found only by Gold.

This flag is enabled by default at @option{-O2} and @option{-Os}.

+@item -flive-patching=@var{level}
+@opindex flive-patching
+Control GCC's optimizations to provide a safe compilation for live-patching.


"provide a safe compilation" isn't very clear to me. I don't know what
it means to "provide a compilation", let alone a safe one.

Could we say something like "Control GCC’s optimizations to produce
output suitable for live-patching." ?



+If the compiler's optimization uses a function's body or information extracted
+from its body to optimize/change another function, the latter is called an
+impacted function of the former.  If a function is patched, its impacted
+functions should be patched too.
+
+The impacted functions are decided by the compiler's interprocedural


decided or determined?


+optimizations.  For example, inlining a function into its caller, cloning
+a function and changing its caller to call this new clone, or extracting
+a function's pureness/constness information to optimize its direct or
+indirect callers, etc.


I don't know what the second sentence is saying. I can read it two
different ways:

1) Those are examples of interprocedural optimizations which
participate in the decision making, but the actual details of how the
decisions are made are not specified here.

2) Performing those optimizations causes a function to be impacted.

If 1) is the intended meaning, then I think it should say "For
example, when inlining a function into its caller, ..."

If 2) is the intended meaning, then I think it should say "For
example, a caller is impacted when inlining a function
into its caller ...".

Does either of those suggestions match the intended meaning? Or do you
have a better way to rephrase it?


+Usually, the more IPA optimizations enabled, the larger the number of
+impacted functions for each function.  In order to control the number of
+impacted functions and computed the list of impacted function easily,


Should be "and more easily compute the list of impacted functions".


+we provide control to partially enable IPA optimizations on two different
+levels.


We don't usually say "we provide" like this. I suggest simply "IPA
optimizations can be partially enabled at two different levels."


+
+The @var{level} argument should be one of the following:
+
+@table @samp
+
+@item inline-clone
+
+Only enable inlining and cloning optimizations, which includes inlining,
+cloning, interprocedural scalar replacement of aggregates and partial inlining.
+As a result, when patching a function, all its callers and its clones'
+callers need to be patched as well.


Since you've defined the term "impacted" could this just say "all its
callers and its clones' callers are impacted."?


+@option{-flive-patching=inline-clone} disables the following optimization 
flags:
+@gccoptlist{-fwhole-program  -fipa-pta  -fipa-reference  -fipa-ra @gol
+-fi

Re: [Patch, libstdc++.exp]Update the usage of cached result, rebuild nlocale wrapper for each variant.

2019-04-10 Thread Jonathan Wakely

On 11/03/19 11:54 +, Jonathan Wakely wrote:

Hi, I've just noticed this in your r266209 change from November:

On 16/11/18 10:42 +, Renlin Li wrote:

+return [check_v3_target_prop_cached et_parallel_mode {
+   global cxxflags
+   global v3-libgomp
# If 'make check-parallel' is running the test succeeds.
if { ${v3-libgomp} == 1 && [regexp "libgomp" $cxxflags] } {
-   set et_parallel_mode 1
+   return1 1


Is this a typo? It should be "return 1" not "return1 1" right?



I've committed this fix to trunk.


commit 116f85727b49103ff3c8d4fb152c7a6047eb3546
Author: Jonathan Wakely 
Date:   Wed Apr 10 15:44:35 2019 +0100

Fix typo in effective-target check

* testsuite/lib/libstdc++.exp (check_v3_target_parallel_mode): Fix
typo.

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 5a12ac0242d..d0efc90a1ba 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -1048,7 +1048,7 @@ proc check_v3_target_parallel_mode { } {
 	global v3-libgomp
 	# If 'make check-parallel' is running the test succeeds.
 	if { ${v3-libgomp} == 1 && [regexp "libgomp" $cxxflags] } {
-	return1 1
+	return 1
 	}
 	return 0
 }]


[PATCH] Change wording of -fipa-icf documentation

2019-04-10 Thread Jonathan Wakely

* doc/invoke.texi (Optimize Options): Change "Nevertheless" to
"Although" in -fipa-icf documentation.

The old wording doesn't make sense, so committed to trunk as obvious.

commit a2a452bc8191a7ec0ec910a44484c1aab89154d6
Author: Jonathan Wakely 
Date:   Wed Apr 10 15:49:50 2019 +0100

Change wording of -fipa-icf documentation

* doc/invoke.texi (Optimize Options): Change "Nevertheless" to
"Although" in -fipa-icf documentation.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73a7789d879..755b9f754a1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9359,7 +9359,7 @@ The optimization reduces code size and may disturb unwind 
stacks by replacing
 a function by equivalent one with a different name. The optimization works
 more effectively with link-time optimization enabled.
 
-Nevertheless the behavior is similar to Gold Linker ICF optimization, GCC ICF
+Although the behavior is similar to the Gold Linker's ICF optimization, GCC ICF
 works on different levels and thus the optimizations are not same - there are
 equivalences that are found only by GCC and equivalences found only by Gold.
 


[PATCH] S/390: Add arch13 pipeline description

2019-04-10 Thread Robin Dapp
Hi,

this patch adds the pipeline description and the cpu model number for
arch13.

Bootstrapped and regtested on s390x.

Regards
 Robin

--

gcc/ChangeLog:

2019-04-10  Robin Dapp  

* config/s390/8561.md: New file.
* config/s390/driver-native.c (s390_host_detect_local_cpu): Add arch13
cpu model.
* config/s390/s390-opts.h (enum processor_type): Likewise.
* config/s390/s390.c (s390_get_sched_attrmask): Add arch13.
(s390_get_unit_mask): Likewise.
(s390_is_fpd): Likewise.
(s390_is_fxd): Likewise.
* config/s390/s390.h (s390_tune_attr): Likewise.
* config/s390/s390.md: Include arch13 pipeline description.
* config/s390/s390.opt: Add arch13.
diff --git a/gcc/config/s390/8561.md b/gcc/config/s390/8561.md
new file mode 100644
index 000..e5a345f4dba
--- /dev/null
+++ b/gcc/config/s390/8561.md
@@ -0,0 +1,287 @@
+;; Scheduling description for arch13.
+;;   Copyright (C) 2019 Free Software Foundation, Inc.
+;;   Contributed by Robin Dapp (rd...@linux.ibm.com)
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+;; for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_attr "arch13_unit_fpd" ""
+(cond [(eq_attr "mnemonic" "ddb,ddbr,deb,debr,dxbr,sqdb,sqdbr,sqeb,\
+sqebr,sqxbr,vfddb,vfdsb,vfsqdb,vfsqsb,wfddb,wfdsb,wfdxb,wfsqdb,wfsqxb")
+ (const_int 1)] (const_int 0)))
+
+(define_attr "arch13_unit_fxa" ""
+(cond [(eq_attr "mnemonic" "a,afi,ag,agf,agfi,agfr,agh,aghi,aghik,\
+agr,agrk,ah,ahi,ahik,ahy,al,alc,alcg,alcgr,alcr,alfi,alg,algf,algfi,algfr,\
+alghsik,algr,algrk,alhsik,alr,alrk,aly,ar,ark,ay,bras,brasl,etnd,exrl,flogr,\
+ic,icm,icmh,icmy,icy,iihf,iilf,ipm,la,larl,lay,lb,lbr,lcgr,lcr,lgb,lgbr,\
+lgf,lgfi,lgfr,lgfrl,lgh,lghi,lghr,lghrl,lgr,lh,lhi,lhr,lhrl,lhy,llcr,\
+llgcr,llgfr,llghr,llgtr,llhr,llihf,llihh,llihl,llilf,llilh,llill,lngr,lnr,\
+loc,locg,locghi,locgr,lochi,locr,lpgr,lpr,lr,lrv,lrvg,lrvgr,lrvh,lrvr,lt,\
+ltg,ltgf,ltgfr,ltgr,ltr,m,mfy,mg,mgh,mghi,mgrk,mh,mhi,mhy,ml,mlg,mlgr,mlr,\
+mr,ms,msfi,msg,msgf,msgfi,msgfr,msgr,msgrkc,msr,msrkc,msy,n,ng,ngr,ngrk,\
+nihf,nihh,nihl,nilf,nilh,nill,nr,nrk,ny,o,og,ogr,ogrk,oihf,oihh,oihl,oilf,\
+oilh,oill,or,ork,oy,pfpo,popcnt,risbg,risbgn,rll,rllg,s,sg,sgf,sgfr,sgh,\
+sgr,sgrk,sh,shy,sl,slb,slbg,slbgr,slbr,slfi,slg,slgf,slgfi,slgfr,slgr,\
+slgrk,sll,sllg,sllk,slr,slrk,sly,sr,sra,srag,srak,srk,srl,srlg,srlk,sy,x,xg,\
+xgr,xgrk,xihf,xilf,xr,xrk,xy")
+ (const_int 1)] (const_int 0)))
+
+(define_attr "arch13_unit_fxb" ""
+(cond [(eq_attr "mnemonic" "agsi,algsi,alsi,asi,b,bc,bcr,bi,br,brcl,\
+c,cfi,cg,cgf,cgfi,cgfr,cgfrl,cgh,cghi,cghrl,cghsi,cgit,cgr,cgrl,cgrt,ch,\
+chi,chrl,chsi,chy,cit,cl,clfhsi,clfi,clfit,clg,clgf,clgfi,clgfr,clgfrl,\
+clghrl,clghsi,clgit,clgr,clgrl,clgrt,clgt,clhhsi,clhrl,cli,cliy,clm,clmy,clr,\
+clrl,clrt,clt,cly,cr,crl,crt,cy,j,jg,laa,laag,lan,lang,lao,laog,lat,lax,\
+laxg,lcdfr,ldgr,ldr,lgat,lgdr,lndfr,lpdfr,lzdr,lzer,mvghi,mvhhi,mvhi,mvi,\
+mviy,ni,niy,nop,nopr,ntstg,oi,oiy,ppa,st,stc,stcy,std,stdy,ste,stey,stg,\
+stgrl,sth,sthrl,sthy,stoc,stocg,strl,strv,strvg,strvh,sty,tend,tm,tmh,tmhh,\
+tmhl,tml,tmlh,tmll,tmy,vlgvb,vlgvf,vlgvg,vlgvh,vlr,vlvgb,vlvgf,vlvgg,vlvgh,\
+vlvgp,vst,vstef,vsteg,vstl,vstrl,vstrlr,xi,xiy")
+ (const_int 1)] (const_int 0)))
+
+(define_attr "arch13_unit_fxd" ""
+(cond [(eq_attr "mnemonic" "dlgr,dlr,dr,dsgfr,dsgr")
+ (const_int 1)] (const_int 0)))
+
+(define_attr "arch13_unit_lsu" ""
+(cond [(eq_attr "mnemonic" "a,adb,aeb,ag,agf,agh,agsi,ah,ahy,al,alc,\
+alcg,alg,algf,algsi,alsi,aly,asi,ay,c,cdb,ceb,cg,cgf,cgfrl,cgh,cghrl,cghsi,\
+cgrl,ch,chrl,chsi,chy,cl,clc,clfhsi,clg,clgf,clgfrl,clghrl,clghsi,clgrl,\
+clgt,clhhsi,clhrl,cli,cliy,clm,clmy,clrl,clt,cly,crl,cy,ddb,deb,ear,ic,icm,\
+icmh,icmy,icy,l,laa,laag,lan,lang,lao,laog,lat,lax,laxg,lb,lcbb,ld,lde,\
+ldeb,ldy,le,ley,lg,lgat,lgb,lgf,lgfrl,lgh,lghrl,lgrl,lh,lhrl,lhy,llc,llgc,\
+llgf,llgfrl,llgh,llghrl,llgt,llh,llhrl,loc,locg,lrl,lrv,lrvg,lrvh,lt,ltg,\
+ltgf,ly,m,madb,maeb,mdb,meeb,mfy,mg,mgh,mh,mhy,ml,mlg,ms,msdb,mseb,msg,\
+msgf,msy,mvghi,mvhhi,mvhi,mvi,mviy,n,ng,ni,niy,ntstg,ny,o,og,oi,oiy,oy,s,\
+sar,sdb,seb,sfpc,sg,sgf,sgh,sh,shy,sl,slb,slbg,slg,slgf,sly,sqdb,sqeb,st,\
+stc,stcy,std,stdy,ste,stey,stg,stgrl,sth,sthrl,sthy,stoc,stocg,strl,strv,\
+strvg,strvh,sty,sy,tabort,tm,tmy,vl,vlbb,vleb,vlef,vleg,vleh,vll,vllezb,\
+vllezf,vllezg,vllezh,vllezlf,vlrepb,vlrepf,vlrepg,vlreph,vlrl,vlrlr,vst,\
+vstef,vsteg,vstl,vstrl,vstrlr,x,xg,xi

Re: [PATCH] D support for RISC-V

2019-04-10 Thread Iain Buclaw
On Wed, 10 Apr 2019 at 01:58, Jim Wilson  wrote:
>
> On Tue, Apr 9, 2019 at 10:36 AM Iain Buclaw  wrote:
> > Any objection if I upstream the non-asm bits?
>
> Doesn't all of it, except maybe the configure.tgt patch need to go
> upstream first?  And do you need some paperwork or button click from
> David for D language patches for the copyright assignment?
>

There is nothing to be signed; only the compiler front-end itself has
such a process in place.

> Is there a problem with the asm bits?  The fenv.h functions like
> fegetenv could be used for this, assuming you can include a C header
> file in D code.  But if this is the D language implementation of
> fenv.h, then you probably can't avoid the assembly code.
>

The asm syntax is specific to gdc, and there's not presently a policy
in place to upstream it (each module that contains deviations is
appropriately marked at the top of the file).  Ideally the asm bits
should be in separate modules; then it would just be a case of
swapping the upstream modules for our own.

The FloatingPointControl struct is more closely related to
fpu-control.h.  And no, you can't just include C headers from D
code, though that would make quite a lot of things simpler if you
could.

Iain.


Re: [PATCH] S/390: Add arch13 pipeline description

2019-04-10 Thread Andreas Krebbel
On 10.04.19 17:19, Robin Dapp wrote:
> 2019-04-10  Robin Dapp  
> 
>   * config/s390/8561.md: New file.
>   * config/s390/driver-native.c (s390_host_detect_local_cpu): Add arch13
> cpu model.
>   * config/s390/s390-opts.h (enum processor_type): Likewise.
>   * config/s390/s390.c (s390_get_sched_attrmask): Add arch13.
>   (s390_get_unit_mask): Likewise.
>   (s390_is_fpd): Likewise.
>   (s390_is_fxd): Likewise.
>   * config/s390/s390.h (s390_tune_attr): Likewise.
>   * config/s390/s390.md: Include arch13 pipeline description.
>   * config/s390/s390.opt: Add arch13.
> 

Ok. Thanks!

Andreas



[PATCH] Fix __patchable_function_entries section flags

2019-04-10 Thread Joao Moreira
When -fpatchable-function-entry is used, gcc places nops in the
prologue of each compiled function and creates a section named
__patchable_function_entries which holds relocation entries for the
positions in which the nops were placed. As is, gcc creates this
section without the proper section flags, causing crashes when the
compiled program is loaded.

Given the above, fix the problem by creating the section with the
SECTION_WRITE and SECTION_RELRO flags.

The problem was noticed while compiling glibc with
-fpatchable-function-entry compiler flag. After applying the patch,
this issue was solved.

This was also tested on x86-64 arch without visible problems under
the gcc standard tests.

2019-04-10  Joao Moreira 

* gcc/targhooks.c (default_print_patchable_function_entry): emit
__patchable_function_entries section with writable flags to allow
relocation resolution.

---
 gcc/targhooks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 318f7e9784a..9fbaa255747 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1814,7 +1814,7 @@ default_print_patchable_function_entry (FILE *file,
   ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE", patch_area_number);
 
   switch_to_section (get_section ("__patchable_function_entries",
- 0, NULL));
+ SECTION_WRITE | SECTION_RELRO , NULL));
   fputs (asm_op, file);
   assemble_name_raw (file, buf);
   fputc ('\n', file);
-- 
2.16.4



Re: [PATCH] Fix __patchable_function_entries section flags

2019-04-10 Thread Marek Polacek
On Wed, Apr 10, 2019 at 03:00:43PM -0300, Joao Moreira wrote:
> When -fpatchable-function-entry is used, gcc places nops in the
> prologue of each compiled function and creates a section named
> __patchable_function_entries which holds relocation entries for the
> positions in which the nops were placed. As is, gcc creates this
> section without the proper section flags, causing crashes in the
> compiled program during its load.
> 
> Given the above, fix the problem by creating the section with the
> SECTION_WRITE and SECTION_RELRO flags.
> 
> The problem was noticed while compiling glibc with
> -fpatchable-function-entry compiler flag. After applying the patch,
> this issue was solved.
> 
> This was also tested on x86-64 arch without visible problems under
> the gcc standard tests.

Thanks for the patch.  Just some nits:

> 2019-04-10  Joao Moreira 

Two spaces after the name.

> 
>   * gcc/targhooks.c (default_print_patchable_function_entry): emit
>   __patchable_function_entries section with writable flags to allow
>   relocation resolution.

Please drop the gcc/ prefix and use capital E in "emit".

> ---
>  gcc/targhooks.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 318f7e9784a..9fbaa255747 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1814,7 +1814,7 @@ default_print_patchable_function_entry (FILE *file,
>ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE", patch_area_number);
>  
>switch_to_section (get_section ("__patchable_function_entries",
> -   0, NULL));
> +   SECTION_WRITE | SECTION_RELRO , NULL));

Redundant space before ,.

Marek


Re: [PATCH][Version 3]Come up with -flive-patching master option.

2019-04-10 Thread Qing Zhao
Hi, Jonathan,

thanks for your review on the documentation change for -flive-patching option.

> 
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -416,6 +416,7 @@ Objective-C and Objective-C++ Dialects}.
>> -fipa-bit-cp -fipa-vrp @gol
>> -fipa-pta  -fipa-profile  -fipa-pure-const  -fipa-reference  
>> -fipa-reference-addressable @gol
>> -fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm} @gol
>> +-flive-patching=@var{level} @gol
>> -fira-region=@var{region}  -fira-hoist-pressure @gol
>> -fira-loop-pressure  -fno-ira-share-save-slots @gol
>> -fno-ira-share-spill-slots @gol
>> @@ -9045,6 +9046,65 @@ equivalences that are found only by GCC and 
>> equivalences found only by Gold.
>> This flag is enabled by default at @option{-O2} and @option{-Os}.
>> +@item -flive-patching=@var{level}
>> +@opindex flive-patching
>> +Control GCC's optimizations to provide a safe compilation for live-patching.
> 
> "provide a safe compilation" isn't very clear to me. I don't know what
> it means to "provide a compilation", let alone a safe one.
> 
> Could we say something like "Control GCC’s optimizations to produce
> output suitable for live-patching.” ?

yes, this is better.

> 
> 
>> +If the compiler's optimization uses a function's body or information 
>> extracted
>> +from its body to optimize/change another function, the latter is called an
>> +impacted function of the former.  If a function is patched, its impacted
>> +functions should be patched too.
>> +
>> +The impacted functions are decided by the compiler's interprocedural
> 
> decided or determined?
determined is better.

> 
>> +optimizations.  For example, inlining a function into its caller, cloning
>> +a function and changing its caller to call this new clone, or extracting
>> +a function's pureness/constness information to optimize its direct or
>> +indirect callers, etc.
> 
> I don't know what the second sentence is saying. I can read it two
> different ways:
> 
> 1) Those are examples of interprocedural optimizations which
> participate in the decision making, but the actual details of how the
> decisions are made are not specified here.
> 
> 2) Performing those optimizations causes a function to be impacted.
> 
> If 1) is the intended meaning, then I think it should say "For
> example, when inlining a function into its caller, ..."
> 
> If 2) is the intended meaning, then I think it should say "For
> example, a caller is impacted when inlining a function
> into its caller …".

2) is the intended meaning.

> 
> Does either of those suggestions match the intended meaning? Or do you
> have a better way to rephrase it?
> 
>> +Usually, the more IPA optimizations enabled, the larger the number of
>> +impacted functions for each function.  In order to control the number of
>> +impacted functions and computed the list of impacted function easily,
> 
> Should be "and more easily compute the list of impacted functions”.

this is good.
> 
>> +we provide control to partially enable IPA optimizations on two different
>> +levels.
> 
> We don't usually say "we provide" like this. I suggest simply "IPA
> optimizations can be partially enabled at two different levels.”

Okay.
> 
>> +
>> +The @var{level} argument should be one of the following:
>> +
>> +@table @samp
>> +
>> +@item inline-clone
>> +
>> +Only enable inlining and cloning optimizations, which includes inlining,
>> +cloning, interprocedural scalar replacement of aggregates and partial 
>> inlining.
>> +As a result, when patching a function, all its callers and its clones'
>> +callers need to be patched as well.
> 
> Since you've defined the term "impacted" could this just say "all its
> callers and its clones' callers are impacted.”?

I think that the following might be better:

when patching a function, all its callers and its clones’ callers are impacted, 
therefore need to be patched as well.

> 
>> +@option{-flive-patching=inline-clone} disables the following optimization 
>> flags:
>> +@gccoptlist{-fwhole-program  -fipa-pta  -fipa-reference  -fipa-ra @gol
>> +-fipa-icf  -fipa-icf-functions  -fipa-icf-variables @gol
>> +-fipa-bit-cp  -fipa-vrp  -fipa-pure-const  -fipa-reference-addressable @gol
>> +-fipa-stack-alignment}
>> +
>> +@item inline-only-static
>> +
>> +Only enable inlining of static functions.
>> +As a result, when patching a static function, all its callers need to be
>> +patches as well.
> 
> "Typo: "patches" should be "patched", but I'd suggest "are impacted"
> here too.

Okay.
> 
>> +In addition to all the flags that -flive-patching=inline-clone disables,
>> +@option{-flive-patching=inline-only-static} disables the following 
>> additional
>> +optimization flags:
>> +@gccoptlist{-fipa-cp-clone  -fipa-sra  -fpartial-inlining  -fipa-cp}
>> +
>> +@end table
>> +
>> +When -flive-patching specified without any value, the default value
>> +is "inline-clone".
> 
> This should use @option{} and @var{} and is missing the word "is”.
Okay.
> 
>> +This flag is disabled by defa

[PATCH] Update documentation regarding bogus memory leaks in libstdc++

2019-04-10 Thread Jonathan Wakely

* doc/xml/faq.xml: Add information about emergency EH pool.
* doc/xml/manual/debug.xml: Update list of memory debugging tools.
Move outdated information on mt_allocator to a separate section.
* doc/xml/manual/evolution.xml: Clarify that GLIBCXX_FORCE_NEW
doesn't affect the default allocator.

I haven't regenerated the HTML docs, as further doc patches are coming
soon.

Committed to trunk.

commit 0d4f98089277ae49ea56471e74658cb367284f7f
Author: Jonathan Wakely 
Date:   Wed Apr 10 20:20:22 2019 +0100

Update documentation regarding bogus memory leaks in libstdc++

* doc/xml/faq.xml: Add information about emergency EH pool.
* doc/xml/manual/debug.xml: Update list of memory debugging tools.
Move outdated information on mt_allocator to a separate section.
* doc/xml/manual/evolution.xml: Clarify that GLIBCXX_FORCE_NEW
doesn't affect the default allocator.

diff --git a/libstdc++-v3/doc/xml/faq.xml b/libstdc++-v3/doc/xml/faq.xml
index edc07f16acb..b4bf333e26a 100644
--- a/libstdc++-v3/doc/xml/faq.xml
+++ b/libstdc++-v3/doc/xml/faq.xml
@@ -1001,21 +1001,31 @@
 
   
 
-  Memory leaks in containers
+  Memory leaks in libstdc++
 
   
   
-
-  This answer is old and probably no longer be relevant.
-
 
-A few people have reported that the standard containers appear
+Since GCC 5.1.0, libstdc++ automatically allocates a pool
+of a few dozen kilobytes on startup. This pool is used to ensure it's
+possible to throw exceptions (such as bad_alloc)
+even when malloc is unable to allocate any more memory.
+With some versions of valgrind (http://valgrind.org/)
+this pool will be shown as "still reachable" when the process exits, e.g.
+still reachable: 72,704 bytes in 1 blocks.
+This memory is not a leak, because it's still in use by libstdc++,
+and the memory will be returned to the OS when the process exits.
+Later versions of valgrind know how to free this
+pool as the process exits, and so won't show any "still reachable" memory.
+
+
+In the past, a few people reported that the standard containers appear
 to leak memory when tested with memory checkers such as
valgrind (http://valgrind.org/).
 Under some (non-default) configurations the library's allocators keep
 free memory in a
-pool for later reuse, rather than returning it to the OS.  Although
-this memory is always reachable by the library and is never
+pool for later reuse, rather than deallocating it with delete.
+Although this memory is always reachable by the library and is never
 lost, memory debugging tools can report it as a leak.  If you
 want to test the library for memory leaks please read
 Tips for memory leak hunting
diff --git a/libstdc++-v3/doc/xml/manual/debug.xml 
b/libstdc++-v3/doc/xml/manual/debug.xml
index 37e330d3ed2..091e0b6914c 100644
--- a/libstdc++-v3/doc/xml/manual/debug.xml
+++ b/libstdc++-v3/doc/xml/manual/debug.xml
@@ -94,50 +94,35 @@
 
 Memory Leak Hunting
 
+
+  On many targets GCC supports AddressSanitizer, a fast memory error detector,
+  which is enabled by the -fsanitize=address option.
+
 
 
-  There are various third party memory tracing and debug utilities
+  There are also various third party memory tracing and debug utilities
   that can be used to provide detailed memory allocation information
   about C++ code. An exhaustive list of tools is not going to be
   attempted, but includes mtrace, valgrind,
-  mudflap, and the non-free commercial product
-  purify. In addition, libcwd has a
-  replacement for the global new and delete operators that can track
-  memory allocation and deallocation and provide useful memory
-  statistics.
-
-
-
-  Regardless of the memory debugging tool being used, there is one
-  thing of great importance to keep in mind when debugging C++ code
-  that uses new and delete: there are
-  different kinds of allocation schemes that can be used by 
-  std::allocator. For implementation details, see the mt allocator documentation and
-  look specifically for GLIBCXX_FORCE_NEW.
-
-
-
-  In a nutshell, the optional mt_allocator
-  is a high-performance pool allocator, and can
-  give the mistaken impression that in a suspect executable, memory is
-  being leaked, when in reality the memory "leak" is a pool being used
-  by the library's allocator and is reclaimed after program
-  termination.
+  mudflap (no longer supported since GCC 4.9.0), ElectricFence,
+  and the non-free commercial product purify.
+  In addition, libcwd, jemalloc and TCMalloc have replacements
+  for the global new and delete operators
+  that can track memory allocation and deallocation and provide useful
+  memory statistics.
 
 
 
   For valgrind, there are some specific items to keep in mind. First
   of all

Re: [PATCH][Version 3]Come up with -flive-patching master option.

2019-04-10 Thread Jonathan Wakely

On 10/04/19 13:55 -0500, Qing Zhao wrote:

Hi, Jonathan,

thanks for your review on the documentation change for -flive-patching option.




--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -416,6 +416,7 @@ Objective-C and Objective-C++ Dialects}.
-fipa-bit-cp -fipa-vrp @gol
-fipa-pta  -fipa-profile  -fipa-pure-const  -fipa-reference  
-fipa-reference-addressable @gol
-fipa-stack-alignment  -fipa-icf  -fira-algorithm=@var{algorithm} @gol
+-flive-patching=@var{level} @gol
-fira-region=@var{region}  -fira-hoist-pressure @gol
-fira-loop-pressure  -fno-ira-share-save-slots @gol
-fno-ira-share-spill-slots @gol
@@ -9045,6 +9046,65 @@ equivalences that are found only by GCC and equivalences 
found only by Gold.
This flag is enabled by default at @option{-O2} and @option{-Os}.
+@item -flive-patching=@var{level}
+@opindex flive-patching
+Control GCC's optimizations to provide a safe compilation for live-patching.


"provide a safe compilation" isn't very clear to me. I don't know what
it means to "provide a compilation", let alone a safe one.

Could we say something like "Control GCC’s optimizations to produce
output suitable for live-patching.” ?


yes, this is better.





+If the compiler's optimization uses a function's body or information extracted
+from its body to optimize/change another function, the latter is called an
+impacted function of the former.  If a function is patched, its impacted
+functions should be patched too.
+
+The impacted functions are decided by the compiler's interprocedural


decided or determined?

determined is better.




+optimizations.  For example, inlining a function into its caller, cloning
+a function and changing its caller to call this new clone, or extracting
+a function's pureness/constness information to optimize its direct or
+indirect callers, etc.


I don't know what the second sentence is saying. I can read it two
different ways:

1) Those are examples of interprocedural optimizations which
participate in the decision making, but the actual details of how the
decisions are made are not specified here.

2) Performing those optimizations causes a function to be impacted.

If 1) is the intended meaning, then I think it should say "For
example, when inlining a function into its caller, ..."

If 2) is the intended meaning, then I think it should say "For
example, a caller is impacted when inlining a function
into its caller …".


2) is the intended meaning.



Does either of those suggestions match the intended meaning? Or do you
have a better way to rephrase it?


+Usually, the more IPA optimizations enabled, the larger the number of
+impacted functions for each function.  In order to control the number of
+impacted functions and computed the list of impacted function easily,


Should be "and more easily compute the list of impacted functions”.


this is good.



+we provide control to partially enable IPA optimizations on two different
+levels.


We don't usually say "we provide" like this. I suggest simply "IPA
optimizations can be partially enabled at two different levels.”


Okay.



+
+The @var{level} argument should be one of the following:
+
+@table @samp
+
+@item inline-clone
+
+Only enable inlining and cloning optimizations, which includes inlining,
+cloning, interprocedural scalar replacement of aggregates and partial inlining.
+As a result, when patching a function, all its callers and its clones'
+callers need to be patched as well.


Since you've defined the term "impacted" could this just say "all its
callers and its clones' callers are impacted.”?


I think that the following might be better:

when patching a function, all its callers and its clones’ callers are impacted,
and therefore need to be patched as well.


Agreed.




+@option{-flive-patching=inline-clone} disables the following optimization 
flags:
+@gccoptlist{-fwhole-program  -fipa-pta  -fipa-reference  -fipa-ra @gol
+-fipa-icf  -fipa-icf-functions  -fipa-icf-variables @gol
+-fipa-bit-cp  -fipa-vrp  -fipa-pure-const  -fipa-reference-addressable @gol
+-fipa-stack-alignment}
+
+@item inline-only-static
+
+Only enable inlining of static functions.
+As a result, when patching a static function, all its callers need to be
+patches as well.


Typo: "patches" should be "patched", but I'd suggest "are impacted"
here too.


Okay.



+In addition to all the flags that -flive-patching=inline-clone disables,
+@option{-flive-patching=inline-only-static} disables the following additional
+optimization flags:
+@gccoptlist{-fipa-cp-clone  -fipa-sra  -fpartial-inlining  -fipa-cp}
+
+@end table
+
+When -flive-patching specified without any value, the default value
+is "inline-clone".


This should use @option{} and @var{} and is missing the word "is”.

Okay.



+This flag is disabled by default.
+
+Note that -flive-patching is not supported with link-time optimizer.


s/optimizer./optimization/

Okay.



+(@option{-flto}).
+
@item -fisolate-erroneous-paths-dereference
@opindex fisolate-erroneous-paths-dere

Re: [PATCH] Replace direct PSTL uses of assert() with a macro

2019-04-10 Thread Jonathan Wakely

On 09/04/19 15:23 -0700, Thomas Rodgers wrote:

This also replaces calls to __TBB_ASSERT so that there are two macro
definitions provided by c++config -
__PSTL_ASSERT(_Condition)
__PSTL_ASSERT_MSG(_Condition, _Message)

* include/bits/c++config:
Add definition for __PSTL_ASSERT.
Add definition for __PSTL_ASSERT_MSG.
* include/pstl/algorithm_impl.h: Replace use of assert().
* include/pstl/numeric_impl.h: Replace use of assert().
* include/pstl/parallel_backend_tbb.h:
Replace use of assert().
Replace use of __TBB_ASSERT().

* include/pstl/parallel_backend_utils.h: Replace use of assert().




From d95934a0f325e0934ada829378c3c0dfd6b3628c Mon Sep 17 00:00:00 2001
From: Thomas Rodgers 
Date: Fri, 5 Apr 2019 15:27:35 -0700
Subject: [PATCH] Replace direct PSTL uses of assert() with a macro

This also replaces calls to __TBB_ASSERT so that there are two macro
definitions provided by c++config -
__PSTL_ASSERT(_Condition)
__PSTL_ASSERT_MSG(_Condition, _Message)

* include/bits/c++config:
Add definition for __PSTL_ASSERT.
Add definition for __PSTL_ASSERT_MSG.
* include/pstl/algorithm_impl.h: Replace use of assert().
* include/pstl/numeric_impl.h: Replace use of assert().
* include/pstl/parallel_backend_tbb.h:
Replace use of assert().
Replace use of __TBB_ASSERT().

* include/pstl/parallel_backend_utils.h: Replace use of assert().
---
libstdc++-v3/include/bits/c++config   |  4 
libstdc++-v3/include/pstl/algorithm_impl.h| 14 +++---
libstdc++-v3/include/pstl/numeric_impl.h  |  8 
libstdc++-v3/include/pstl/parallel_backend_tbb.h  | 11 ++-
.../include/pstl/parallel_backend_utils.h | 15 +++
5 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 66420a9a3f2..8dd04f218b4 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -690,6 +690,10 @@ namespace std
#  undef __PSTL_PAR_BACKEND_TBB
# endif

+# define __PSTL_ASSERT(_Condition) (__glibcxx_assert(_Condition))
+# define __PSTL_ASSERT_MSG(_Condition, _Message) (__glibcxx_assert(_Condition))


I don't think these extra parens around the assert will work. When
_GLIBCXX_ASSERTIONS is defined, __glibcxx_assert expands to a do-while
statement, which isn't valid inside an (expression).


# define __PSTL_PRAGMA(x) _Pragma (#x)

# define __PSTL_STRING_AUX(x) #x
diff --git a/libstdc++-v3/include/pstl/algorithm_impl.h b/libstdc++-v3/include/pstl/algorithm_impl.h
index e06bf60151e..a42d3993d1b 100644
--- a/libstdc++-v3/include/pstl/algorithm_impl.h
+++ b/libstdc++-v3/include/pstl/algorithm_impl.h
@@ -1309,7 +1309,7 @@ __pattern_unique_copy(_ExecutionPolicy&& __exec, 
_RandomAccessIterator __first,
return __internal::__except_handler([&__exec, __n, __first, __result, 
__pred, __is_vector, &__mask_buf]() {
bool* __mask = __mask_buf.get();
_DifferenceType __m{};
-__par_backend::parallel_strict_scan(
+__par_backend::__parallel_strict_scan(
std::forward<_ExecutionPolicy>(__exec), __n, 
_DifferenceType(0),
[=](_DifferenceType __i, _DifferenceType __len) -> 
_DifferenceType { // Reduce
_DifferenceType __extra = 0;
@@ -2731,8 +2731,8 @@ __pattern_includes(_ExecutionPolicy&& __exec, 
_ForwardIterator1 __first1, _Forwa
 return !__internal::__parallel_or(
std::forward<_ExecutionPolicy>(__exec), __first2, __last2,
[__first1, __last1, __first2, __last2, &__comp](_ForwardIterator2 
__i, _ForwardIterator2 __j) {
-assert(__j > __i);
-//assert(__j - __i > 1);
+__PSTL_ASSERT(__j > __i);
+//__PSTL_ASSERT(__j - __i > 1);

//1. moving boundaries to "consume" subsequence of equal 
elements
auto __is_equal = [&__comp](_ForwardIterator2 __a, 
_ForwardIterator2 __b) -> bool {
@@ -2756,8 +2756,8 @@ __pattern_includes(_ExecutionPolicy&& __exec, 
_ForwardIterator1 __first1, _Forwa
//2. testing is __a subsequence of the second range included 
into the first range
auto __b = std::lower_bound(__first1, __last1, *__i, __comp);

-assert(!__comp(*(__last1 - 1), *__b));
-assert(!__comp(*(__j - 1), *__i));
+__PSTL_ASSERT(!__comp(*(__last1 - 1), *__b));
+__PSTL_ASSERT(!__comp(*(__j - 1), *__i));
return !std::includes(__b, __last1, __i, __j, __comp);
});
});
@@ -2801,7 +2801,7 @@ __parallel_set_op(_ExecutionPolicy&& __exec, 
_ForwardIterator1 __first1, _Forwar
 

Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-04-10 Thread Steve Ellcey
On Wed, 2019-04-10 at 11:10 +0100, Richard Earnshaw (lists) wrote:
> 
> OK with those changes.
> 
> R.

I made the changes you suggested and checked in the patch.  Just to be
complete, here is the final version of the patch that I checked in.

2019-04-10  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
* config/aarch64/aarch64.md (*aarch64_bfi5_shift):
New instruction.
(*aarch64_bfi5_shift_alt): Ditto.
(*aarch64_bfi4_noand): Ditto.
(*aarch64_bfi4_noand_alt): Ditto.
(*aarch64_bfi4_noshift): Ditto.

2019-04-10  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
to bfi.
* gcc.target/aarch64/combine_bfi_2.c: New test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35..b6c0d0a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 95e5b03..9be7548 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9336,6 +9336,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *aarch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft_amnt,
+   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ab8786a..e0df975 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5491,6 +5491,94 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never triggered
+;;  in a bootstrap build and test of GCC so it was not included.
+
+(define_insn "*aarch64_bfi5_shift"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "0")
+  (match_operand:GPI 2 "const_int_operand" "n"))
+ (and:GPI (ashift:GPI
+   (match_operand:GPI 3 "register_operand" "r")
+   (match_operand:GPI 4 "aarch64_simd_shift_imm_" "n"))
+  (match_operand:GPI 5 "const_int_operand" "n"]
+  "aarch64_masks_and_shift_for_bfi_p (mode, UINTVAL (operands[2]),
+  UINTVAL (operands[4]),
+  UINTVAL(operands[5]))"
+  "bfi\t%0, %3, %4, %P5"
+  [(set_attr "type" "bfm")]
+)
+
+(define_insn "*aarch64_bfi5_shift_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+(ior:GPI (and:GPI (ashift:GPI
+   (match_operand:GPI 1 "register_operand" "r")
+   (match_operand:GPI 2 "aarch64_simd_shift_imm_" "n"))
+  (match_operand:GPI 3 "const_int_operand" "n"))
+		 (and:GPI (match_operand:GPI 4 "register_operand" "0")
+  (match_operand:GPI 5 "const_in

Re: [PATCH] Implement std::visit for C++2a (P0655R1)

2019-04-10 Thread Jonathan Wakely

On 08/04/19 19:54 +0100, Jonathan Wakely wrote:

On 08/04/19 17:36 +0100, Jonathan Wakely wrote:

On 08/04/19 19:20 +0300, Ville Voutilainen wrote:

On Mon, 8 Apr 2019 at 19:12, Ville Voutilainen
 wrote:


On Mon, 8 Apr 2019 at 19:02, Jonathan Wakely  wrote:

The attached patch implements the same thing with totally separate
__gen_vtable_r and __gen_vtable_r_impl class templates, instead of
adding all the visit functionality into the existing code (and then
needing to tease it apart again with if-constexpr).

The visit implementation doesn't need to care about the
__variant_cookie or __variant_idx_cookie cases, which simplifies
things.

This also adjusts some whitespace, for correct indentation and for
readability. And removes a redundant && from a type.

What do you think?


I hate the duplication of __gen_vtable with a burning passion, because
*that* is the part that causes me heartburn,
not the compile-time ifs in the other bits. But if this is what you
want to ship, I can live with it.


A bit of elaboration: in all of this, the most dreadful parts to
understand were _Multi_array and __gen_vtable.


Completely agreed, that's why I found extending __gen_vtable_impl to
handle a new feature made it even harder to understand.


Whereas the attached patch *removes* code from __gen_vtable, but fixes
the bug in my original std::visit implementation, *and* fixes a bug
in __visitor_result_type (it doesn't use INVOKE).

The downside of this patch is that it fails to diagnose some
ill-formed visitors where not all invocations return the same type. So
it's not acceptable. But I still think there's a simpler design
struggling to get out.


Here's a version that doesn't have that problem.

This adds std::__invoke_r to implement INVOKE, and then uses it in
std::bind, std::function, std::packaged_task and std::visit. Having
a helper for INVOKE<R> simplifies a number of things, including the
visitation code in <variant>. It also means that we'll only need to
change one place to implement https://wg21.link/p0932r0 once a revised
version of that gets accepted into the draft.

I'll commit this (in a series of patches) in stage 1.


diff --git a/libstdc++-v3/include/bits/invoke.h b/libstdc++-v3/include/bits/invoke.h
index a5278a59f0c..59e22da84d4 100644
--- a/libstdc++-v3/include/bits/invoke.h
+++ b/libstdc++-v3/include/bits/invoke.h
@@ -96,6 +96,65 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	std::forward<_Args>(__args)...);
 }
 
+#if __cplusplus >= 201703L
+  // INVOKE: Invoke a callable object and convert the result to R.
+  template
+constexpr enable_if_t, _Res>
+__invoke_r(_Callable&& __fn, _Args&&... __args)
+noexcept(is_nothrow_invocable_r_v<_Res, _Callable, _Args...>)
+{
+  using __result = __invoke_result<_Callable, _Args...>;
+  using __type = typename __result::type;
+  using __tag = typename __result::__invoke_type;
+  if constexpr (is_void_v<_Res>)
+	std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
+	std::forward<_Args>(__args)...);
+  else
+	return std::__invoke_impl<__type>(__tag{},
+	  std::forward<_Callable>(__fn),
+	  std::forward<_Args>(__args)...);
+}
+#else // C++11
+  template
+using __can_invoke_as_void = __enable_if_t<
+  __and_, __is_invocable<_Callable, _Args...>>::value,
+  _Res
+>;
+
+  template
+using __can_invoke_as_nonvoid = __enable_if_t<
+  __and_<__not_>,
+	 is_convertible::type,
+			_Res>
+  >::value,
+  _Res
+>;
+
+  // INVOKE: Invoke a callable object and convert the result to R.
+  template
+constexpr __can_invoke_as_nonvoid<_Res, _Callable, _Args...>
+__invoke_r(_Callable&& __fn, _Args&&... __args)
+{
+  using __result = __invoke_result<_Callable, _Args...>;
+  using __type = typename __result::type;
+  using __tag = typename __result::__invoke_type;
+  return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
+	std::forward<_Args>(__args)...);
+}
+
+  // INVOKE when R is cv void
+  template
+constexpr __can_invoke_as_void<_Res, _Callable, _Args...>
+__invoke_r(_Callable&& __fn, _Args&&... __args)
+{
+  using __result = __invoke_result<_Callable, _Args...>;
+  using __type = typename __result::type;
+  using __tag = typename __result::__invoke_type;
+  std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
+ std::forward<_Args>(__args)...);
+}
+#endif // C++11
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 
diff --git a/libstdc++-v3/include/bits/std_function.h b/libstdc++-v3/include/bits/std_function.h
index b70ed564d11..5733bf5f3f9 100644
--- a/libstdc++-v3/include/bits/std_function.h
+++ b/libstdc++-v3/include/bits/std_function.h
@@ -109,21 +109,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __destroy_functor
   };
 
-  // Simple type wrapper that helps avoid annoying const problems
-  // when casting between void pointers and pointers-to-pointers.
-  template
-struct _Simp

[PATCH 0/3] OpenRISC floating point support + fixes

2019-04-10 Thread Stafford Horne
Hello,

This is a set of patches to bring FPU support to the OpenRISC backend.  The
backend also add support for 64-bit floating point operations on 32-bit cores
using register pairs, see orfpx64a32 [0].

This depends on binutils patches, which have also been submitted for review. [1]

The toolchain has been tested using the gcc and binutils testsuites as well as
floating point test suites running on sim and an fpga soft core or1k_marocchino.
[2]

There is also an unrelated, but trivial patch to fix a code quality issue with
volatile memory loads.

This whole patch series can be found on my github repo [3] as well.

-Stafford

[0] https://openrisc.io/proposals/orfpx64a32
[1] g...@github.com:stffrdhrn/binutils-gdb.git orfpx64a32-2
[2] https://github.com/openrisc/or1k_marocchino
[3] g...@github.com:stffrdhrn/gcc.git or1k-fpu-1a

Stafford Horne (3):
  or1k: Initial support for FPU
  or1k: Allow volatile memory for sign/zero extend loads
  or1k: only force reg for immediates

 gcc/config.gcc|   1 +
 gcc/config/or1k/or1k.c|  10 ++--
 gcc/config/or1k/or1k.md   | 109 --
 gcc/config/or1k/or1k.opt  |  15 -
 gcc/config/or1k/predicates.md |  16 +
 gcc/doc/invoke.texi   |  15 +
 6 files changed, 156 insertions(+), 10 deletions(-)

-- 
2.19.1



[PATCH 1/3] or1k: Initial support for FPU

2019-04-10 Thread Stafford Horne
Or1k only supports ordered compares so we fall back to lib functions
for unordered operations.

Doubles work on this 32-bit architecture by using register pairing as
specified in: https://openrisc.io/proposals/orfpx64a32

Or1k does not support sf/df or df/sf conversions.

gcc/ChangeLog:

* config.gcc (or1k*-*-*): Add mhard-float and mdouble-float validations.
* config/or1k/or1k.md (type): Add fpu.
(fpu): New instruction reservation.
(F, f, fi, FI, FOP, fop): New.
(3): New ALU instruction definition.
(float2): New conversion instruction definition.
(fix_trunc2): New conversion instruction definition.
(fpcmpcc): New code iterator.
(*sf_fp_insn): New instruction definition.
(cstore4): New expand definition.
(cbranch4): New expand definition.
* config/or1k/or1k.opt (msoft-float, mhard-float, mdouble-float): New
options.
* doc/invoke.texi: Document msoft-float, mhard-float and mdouble-float.
---
 gcc/config.gcc   |   1 +
 gcc/config/or1k/or1k.md  | 103 ++-
 gcc/config/or1k/or1k.opt |  15 +-
 gcc/doc/invoke.texi  |  15 ++
 4 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ebbab5d8b6a..8017851922e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2541,6 +2541,7 @@ or1k*-*-*)
for or1k_multilib in ${or1k_multilibs}; do
case ${or1k_multilib} in
mcmov | msext | msfimm | \
+   mhard-float | mdouble-float | \
mhard-div | mhard-mul | \
msoft-div | msoft-mul )

TM_MULTILIB_CONFIG="${TM_MULTILIB_CONFIG},${or1k_multilib}"
diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md
index 2dad51cd46b..202493c5ab9 100644
--- a/gcc/config/or1k/or1k.md
+++ b/gcc/config/or1k/or1k.md
@@ -60,7 +60,7 @@
 (define_attr "length" "" (const_int 4))
 
 (define_attr "type"
-  "alu,st,ld,control,multi"
+  "alu,st,ld,control,multi,fpu"
   (const_string "alu"))
 
 (define_attr "insn_support" "class1,sext,sfimm,shftimm" (const_string 
"class1"))
@@ -93,6 +93,10 @@
 (define_insn_reservation "control" 1
   (eq_attr "type" "control")
   "cpu")
+(define_insn_reservation "fpu" 2
+  (eq_attr "type" "fpu")
+  "cpu")
+
 
 ; Define delay slots for any branch
 (define_delay (eq_attr "type" "control")
@@ -155,6 +159,46 @@
   ""
   "l.sub\t%0, %r1, %2")
 
+;; -
+;; Floating Point Arithmetic instructions
+;; -
+
+;; Mode iterator for single/double float
+(define_mode_iterator F [(SF "TARGET_HARD_FLOAT")
+(DF "TARGET_DOUBLE_FLOAT")])
+(define_mode_attr f [(SF "s") (DF "d")])
+(define_mode_attr fi [(SF "si") (DF "di")])
+(define_mode_attr FI [(SF "SI") (DF "DI")])
+
+;; Basic arithmetic instructions
+(define_code_iterator FOP [plus minus mult div])
+(define_code_attr fop [(plus "add") (minus "sub") (mult "mul") (div "div")])
+
+(define_insn "3"
+  [(set (match_operand:F 0 "register_operand" "=r")
+   (FOP:F (match_operand:F 1 "register_operand" "r")
+  (match_operand:F 2 "register_operand" "r")))]
+  "TARGET_HARD_FLOAT"
+  "lf..\t%0, %1, %2"
+  [(set_attr "type" "fpu")])
+
+;; Basic float<->int conversion
+(define_insn "float2"
+  [(set (match_operand:F 0 "register_operand" "=r")
+   (float:F
+   (match_operand: 1 "register_operand" "r")))]
+  "TARGET_HARD_FLOAT"
+  "lf.itof.\t%0, %1"
+  [(set_attr "type" "fpu")])
+
+(define_insn "fix_trunc2"
+  [(set (match_operand: 0 "register_operand" "=r")
+   (fix:
+   (match_operand:F 1 "register_operand" "r")))]
+  "TARGET_HARD_FLOAT"
+  "lf.ftoi.\t%0, %1"
+  [(set_attr "type" "fpu")])
+
 ;; -
 ;; Logical operators
 ;; -
@@ -388,6 +432,31 @@
l.sfi\t%r0, %1"
   [(set_attr "insn_support" "*,sfimm")])
 
+;; Support FP comparisons too
+
+;; The OpenRISC FPU supports these comparisons:
+;;
+;;lf.sfeq.{d,s} - equality, r r, double or single precision
+;;lf.sfge.{d,s} - greater than or equal, r r, double or single precision
+;;lf.sfgt.{d,s} - greater than, r r, double or single precision
+;;lf.sfle.{d,s} - less than or equal, r r, double or single precision
+;;lf.sflt.{d,s} - less than, r r, double or single precision
+;;lf.sfne.{d,s} - not equal, r r, double or single precision
+;;
+;; Double precision is only supported on some hardware.  Only register/register
+;; comparisons are supported.  All comparisons are signed.
+
+(define_code_iterator fpcmpcc [ne eq lt gt ge le])
+
+(define_insn "*sf_fp_insn"
+  [(set (reg:BI SR_F_REGNUM)
+   (fpcmpcc:BI (match_operand:F 0 "register_operand" "r")
+   (match_o

[PATCH 2/3] or1k: Allow volatile memory for sign/zero extend loads

2019-04-10 Thread Stafford Horne
Volatile memory does not match the memory_operand predicate.  This
causes extra extend/mask instructions when reading
from volatile memory.  On OpenRISC this can be treated the same
as regular memory.

gcc/ChangeLog:

* config/or1k/or1k.md (zero_extendsi2): Update predicate.
(extendsi2): Update predicate.
* gcc/config/or1k/predicates.md (volatile_mem_operand): New.
(reg_or_mem_operand): New.
---
 gcc/config/or1k/or1k.md   |  6 +++---
 gcc/config/or1k/predicates.md | 16 
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md
index 202493c5ab9..23ded94feb3 100644
--- a/gcc/config/or1k/or1k.md
+++ b/gcc/config/or1k/or1k.md
@@ -372,11 +372,11 @@
 ;; Sign Extending
 ;; -
 
-;; Zero extension can always be done with AND and an extending load.
+;; Zero extension can always be done with AND or an extending load.
 
 (define_insn "zero_extendsi2"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-   (zero_extend:SI (match_operand:I12 1 "nonimmediate_operand" "r,m")))]
+   (zero_extend:SI (match_operand:I12 1 "reg_or_mem_operand" "r,m")))]
   ""
   "@
l.andi\t%0, %1, 
@@ -388,7 +388,7 @@
 
 (define_insn "extendsi2"
   [(set (match_operand:SI 0 "register_operand"  "=r,r")
-   (sign_extend:SI (match_operand:I12 1 "nonimmediate_operand"  "r,m")))]
+   (sign_extend:SI (match_operand:I12 1 "reg_or_mem_operand"  "r,m")))]
   "TARGET_SEXT"
   "@
l.exts\t%0, %1
diff --git a/gcc/config/or1k/predicates.md b/gcc/config/or1k/predicates.md
index 879236bca49..17599d0ee3b 100644
--- a/gcc/config/or1k/predicates.md
+++ b/gcc/config/or1k/predicates.md
@@ -82,3 +82,19 @@
 
 (define_predicate "equality_comparison_operator"
   (match_code "ne,eq"))
+
+;; Borrowed from rs6000
+;  Return 1 if the operand is in volatile memory.  Note that during the
+;; RTL generation phase, memory_operand does not return TRUE for volatile
+;; memory references.  So this function allows us to recognize volatile
+;; references where it's safe.
+(define_predicate "volatile_mem_operand"
+  (and (and (match_code "mem")
+   (match_test "MEM_VOLATILE_P (op)"))
+   (if_then_else (match_test "reload_completed")
+(match_operand 0 "memory_operand")
+(match_test "memory_address_p (mode, XEXP (op, 0))"
+
+(define_predicate "reg_or_mem_operand"
+  (ior (match_operand 0 "nonimmediate_operand")
+   (match_operand 0 "volatile_mem_operand")))
-- 
2.19.1



[PATCH 3/3] or1k: only force reg for immediates

2019-04-10 Thread Stafford Horne
The force_reg in or1k_expand_compare is hard-coded for SImode, which was fine
as it used to be used only on SI expands.  However, with FP support this will
cause issues.  In general we should only force the right-hand operand into a
register if it is an immediate.  This patch adds a condition to check for that.

gcc/ChangeLog:

* config/or1k/or1k.c (or1k_expand_compare): Check for int before
force_reg.
---
 gcc/config/or1k/or1k.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/or1k/or1k.c b/gcc/config/or1k/or1k.c
index fc10fcfabde..35d984533be 100644
--- a/gcc/config/or1k/or1k.c
+++ b/gcc/config/or1k/or1k.c
@@ -1435,11 +1435,13 @@ void
 or1k_expand_compare (rtx *operands)
 {
   rtx sr_f = gen_rtx_REG (BImode, SR_F_REGNUM);
+  rtx righthand_op = XEXP (operands[0], 1);
 
-  /* The RTL may receive an immediate in argument 1 of the compare, this is not
- supported unless we have l.sf*i instructions, force them into registers.  
*/
-  if (!TARGET_SFIMM)
-XEXP (operands[0], 1) = force_reg (SImode, XEXP (operands[0], 1));
+  /* Integer RTL may receive an immediate in argument 1 of the compare, this is
+ not supported unless we have l.sf*i instructions, force them into
+ registers.  */
+  if (!TARGET_SFIMM && CONST_INT_P (righthand_op))
+XEXP (operands[0], 1) = force_reg (SImode, righthand_op);
 
   /* Emit the given comparison into the Flag bit.  */
   PUT_MODE (operands[0], BImode);
-- 
2.19.1



Re: [PATCH 1/3] [ARC] Emit blockage regardless to avoid delay slot scheduling.

2019-04-10 Thread Andrew Burgess
* Claudiu Zissulescu  [2019-03-25 12:03:11 +0100]:

> 1. The delay slot scheduler can reschedule some of the frame-related
> instructions, resulting in incorrect CFI information. This patch
> introduces a scheduling blockage to avoid this problem.
> 
> 2. There are cases when an interrupt may happen before all the current
> function's stack operations are done, which may result in stack
> corruption. An example is accessing and returning a local structure's
> members, which are allocated on the stack. The stack adjustment and
> the access of the struct members can be reordered when they do not
> both use the SP register for the access.
> 
> 3. Also, do not save/restore SP when in an interrupt. The SP is
> switched by the core IRQ machinery.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_expand_prologue): Emit blockage regardless
>   to avoid delay slot scheduling.
>   * config/arc/arc.md (stack_tie): Remove.
>   (UNSPEC_ARC_STKTIE): Likewise.
>   (arc_must_save_register): Don't save SP.
>   (arc_expand_epilogue): Emit blockage.

This looks fine, just one minor issue...

> ---
>  gcc/config/arc/arc.c  | 17 +
>  gcc/config/arc/arc.md | 13 -
>  2 files changed, 9 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index fa49c562b46..62f435b0a1d 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -2658,6 +2658,7 @@ arc_must_save_register (int regno, struct function 
> *func)
>  
>if ((regno) != RETURN_ADDR_REGNUM
>&& (regno) != FRAME_POINTER_REGNUM
> +  && (regno) != STACK_POINTER_REGNUM
>&& df_regs_ever_live_p (regno)
>&& (!call_used_regs[regno]
> || ARC_INTERRUPT_P (fn_type))

The comment at the head of this function mentions the special handling
for ra and fp, but wasn't updated with this change.  Would it make
sense to do that?

Thanks,
Andrew

> @@ -3731,14 +3732,10 @@ arc_expand_prologue (void)
>  
>/* Allocate the stack frame.  */
>if (frame_size_to_allocate > 0)
> -{
> -  frame_stack_add ((HOST_WIDE_INT) 0 - frame_size_to_allocate);
> -  /* If the frame pointer is needed, emit a special barrier that
> -  will prevent the scheduler from moving stores to the frame
> -  before the stack adjustment.  */
> -  if (arc_frame_pointer_needed ())
> - emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
> -}
> +frame_stack_add ((HOST_WIDE_INT) 0 - frame_size_to_allocate);
> +
> +  /* Emit a blockage to avoid delay slot scheduling.  */
> +  emit_insn (gen_blockage ());
>  }
>  
>  /* Do any necessary cleanup after a function to restore stack, frame,
> @@ -3775,6 +3772,10 @@ arc_expand_epilogue (int sibcall_p)
>if (!can_trust_sp_p)
>  gcc_assert (arc_frame_pointer_needed ());
>  
> +  /* Emit a blockage to avoid/flush all pending sp operations.  */
> +  if (size)
> +emit_insn (gen_blockage ());
> +
>if (TARGET_CODE_DENSITY
>&& TARGET_CODE_DENSITY_FRAME
>&& !ARC_AUTOFP_IRQ_P (fn_type)
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 0482980122d..3a903d4224a 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -136,7 +136,6 @@
>UNSPEC_ARC_VMAC2HU
>UNSPEC_ARC_VMPY2H
>UNSPEC_ARC_VMPY2HU
> -  UNSPEC_ARC_STKTIE
>  
>VUNSPEC_ARC_RTIE
>VUNSPEC_ARC_SYNC
> @@ -6301,18 +6300,6 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>(set_attr "predicable" "yes,no,no,yes,no")
>(set_attr "cond" "canuse,nocond,nocond,canuse_limm,nocond")])
>  
> -(define_insn "stack_tie"
> -  [(set (mem:BLK (scratch))
> - (unspec:BLK [(match_operand:SI 0 "register_operand" "r")
> -  (match_operand:SI 1 "register_operand" "r")]
> - UNSPEC_ARC_STKTIE))]
> -  ""
> -  ""
> -  [(set_attr "length" "0")
> -   (set_attr "iscompact" "false")
> -   (set_attr "type" "block")]
> -  )
> -
>  (define_insn "*add_shift"
>[(set (match_operand:SI 0 "register_operand" "=q,r,r")
>   (plus:SI (ashift:SI (match_operand:SI 1 "register_operand" "q,r,r")
> -- 
> 2.20.1
> 
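The scheduling hazard discussed above (stores to the frame being moved across the stack adjustment) has a rough C-level analogue in a compiler memory barrier. The sketch below is a generic illustration of the ordering guarantee, not ARC-specific code; the COMPILER_BARRIER name is invented for the example:

```cpp
// A "memory" clobber in an empty asm forbids the compiler from moving
// memory accesses across it, much as a blockage insn forbids the
// backend scheduler from moving insns across it.
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

// Two stores that must stay in program order around the barrier.
int ordered_stores (volatile int *slot)
{
  *slot = 1;             // store to the frame
  COMPILER_BARRIER ();   // no memory access may be moved past this point
  *slot = 2;             // analogue of the stack adjustment
  return *slot;
}
```

This uses GNU extended asm, so it compiles with GCC and Clang but not with compilers lacking that extension.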


Re: [PATCH 2/3] [ARC] Refurb eliminate regs.

2019-04-10 Thread Andrew Burgess
* Claudiu Zissulescu  [2019-03-25 12:03:12 +0100]:

> Refurbish the eliminable-registers handling by introducing a fake
> FRAME_POINTER_REGNUM, with the purpose of releasing the FP register
> to be used freely by the register allocator.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_hard_regno_modes): Add two missing modes
>   for last two fake registers.
>   (arc_conditional_register_usage): Make sure fake frame and arg
>   pointer regs are in general regs class.
>   (FRAME_POINTER_MASK): Remove.
>   (RETURN_ADDR_MASK): Remove.
>   (arc_must_save_register): Use hard frame regnum.
>   (frame_restore_reg): Use hard_frame_pointer_rtx.
>   (arc_save_callee_saves): Likewise.
>   (arc_restore_callee_saves): Likewise.
>   (arc_save_callee_enter): Likewise.
>   (arc_restore_callee_leave): Likewise.
>   (arc_save_callee_milli): Likewise.
>   (arc_eh_return_address_location): Likewise.
>   (arc_check_multi): Use hard frame regnum.
>   (arc_can_eliminate): Likewise.
>   * config/arc/arc.h (FIXED_REGISTERS): Make FP register available
>   for register allocator.
>   (REG_CLASS_CONTENTS): Update GENERAL_REGS.
>   (REGNO_OK_FOR_BASE_P): Consider FRAME_POINTER_REGNUM.
>   (FRAME_POINTER_REGNUM): Change it to a fake register.
>   (HARD_FRAME_POINTER_REGNUM): Defined.
>   (ARG_POINTER_REGNUM): Change it to a new fake register.
>   (ELIMINABLE_REGS): Update.
>   (REGISTER_NAMES): Update names.
>   * config/arc/arc.md (LP_START): Remove.
>   (LP_END): Likewise.
>   (shift_si3_loop): Update pattern.

This is fine, thanks.

Andrew


> ---
>  gcc/config/arc/arc.c  | 173 +++---
>  gcc/config/arc/arc.h  |  31 
>  gcc/config/arc/arc.md |  12 +--
>  3 files changed, 115 insertions(+), 101 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 62f435b0a1d..9938a774d91 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -1651,7 +1651,8 @@ static unsigned int arc_hard_regno_modes[] = {
>V_MODES, V_MODES, V_MODES, V_MODES, V_MODES, V_MODES, V_MODES, V_MODES,
>  
>S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES,
> -  S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES
> +  S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES, S_MODES,
> +  S_MODES, S_MODES
>  };
>  
>  static unsigned int arc_mode_class [NUM_MACHINE_MODES];
> @@ -1886,7 +1887,8 @@ arc_conditional_register_usage (void)
>  
>/* Handle Special Registers.  */
>arc_regno_reg_class[CC_REG] = NO_REGS;  /* CC_REG: must be NO_REGS.  */
> -  arc_regno_reg_class[62] = GENERAL_REGS;
> +  arc_regno_reg_class[FRAME_POINTER_REGNUM] = GENERAL_REGS;
> +  arc_regno_reg_class[ARG_POINTER_REGNUM] = GENERAL_REGS;
>  
>if (TARGET_DPFP)
>  for (i = R40_REG; i < R44_REG; ++i)
> @@ -2616,8 +2618,53 @@ arc_compute_function_type (struct function *fun)
>return fun->machine->fn_type = fn_type;
>  }
>  
> -#define FRAME_POINTER_MASK (1 << (FRAME_POINTER_REGNUM))
> -#define RETURN_ADDR_MASK (1 << (RETURN_ADDR_REGNUM))
> +/* Helper function to wrap FRAME_POINTER_NEEDED.  We do this as
> +   FRAME_POINTER_NEEDED will not be true until the IRA (Integrated
> +   Register Allocator) pass, while we want to get the frame size
> +   correct earlier than the IRA pass.
> +
> +   When a function uses eh_return we must ensure that the fp register
> +   is saved and then restored so that the unwinder can restore the
> +   correct value for the frame we are going to jump to.
> +
> +   To do this we force all frames that call eh_return to require a
> +   frame pointer (see arc_frame_pointer_required); this
> +   will ensure that the previous frame pointer is stored on entry to
> +   the function, and will then be reloaded at function exit.
> +
> +   As the frame pointer is handled as a special case in our prologue
> +   and epilogue code it must not be saved and restored using the
> +   MUST_SAVE_REGISTER mechanism otherwise we run into issues where GCC
> +   believes that the function is not using a frame pointer and that
> +   the value in the fp register is the frame pointer, while the
> +   prologue and epilogue are busy saving and restoring the fp
> +   register.
> +
> +   During compilation of a function the frame size is evaluated
> +   multiple times; it is not until the reload pass is complete that the
> +   frame size is considered fixed (it is at this point that space for
> +   all spills has been allocated).  However, the frame_pointer_needed
> +   variable is not set true until the register allocation pass; as a
> +   result, in the early stages the frame size does not include space
> +   for the frame pointer to be spilled.
> +
> +   The problem that this causes is that the rtl generated for
> +   EH_RETURN_HANDLER_RTX uses the details of the frame size to compute
> +   the offset from the frame pointer at which t

Re: [PATCH 3/3] [ARC] Remove Rs5 constraint.

2019-04-10 Thread Andrew Burgess
* Claudiu Zissulescu  [2019-03-25 12:03:13 +0100]:

> New LRA algorithms require all register constraints to be
> defined using the define_register_constraint keyword. However, the
> Rs5 constraint was not LRA-proof. Remove it and replace it with the
> equivalent Rcd constraint.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.md (sibcall_insn): Use Rcd constraint.
>   (sibcall_value_insn): Likewise.
>   * config/arc/constraints.md (Rs5): Remove.

This is fine.

thanks,
Andrew


> ---
>  gcc/config/arc/arc.md | 24 +++
>  gcc/config/arc/constraints.md | 10 --
>  gcc/testsuite/gcc.target/arc/long-calls.c |  4 ++--
>  3 files changed, 14 insertions(+), 24 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 7ac5a1b5785..54d073107a8 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -4703,17 +4703,17 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>  
>  (define_insn "*sibcall_insn"
>   [(call (mem:SI (match_operand:SI 0 "call_address_operand"
> -  "Cbp,Cbr,Rs5,Rsc,Cal"))
> +  "Cbp,Cbr,!Rcd,Rsc,Cal"))
>   (match_operand 1 "" ""))
>(simple_return)
>(use (match_operand 2 "" ""))]
>""
>"@
> -   b%!%* %P0
> -   b%!%* %P0
> -   j%!%* [%0]%&
> -   j%!%* [%0]
> -   j%! %P0"
> +   b%!%*\\t%P0
> +   b%!%*\\t%P0
> +   j%!%*\\t[%0]
> +   j%!%*\\t[%0]
> +   j%!\\t%P0"
>[(set_attr "type" "call,call,call,call,call_no_delay_slot")
> (set_attr "predicable" "yes,no,no,yes,yes")
> (set_attr "iscompact" "false,false,maybe,false,false")
> @@ -4723,17 +4723,17 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>  (define_insn "*sibcall_value_insn"
>   [(set (match_operand 0 "dest_reg_operand" "")
> (call (mem:SI (match_operand:SI 1 "call_address_operand"
> -   "Cbp,Cbr,Rs5,Rsc,Cal"))
> +   "Cbp,Cbr,!Rcd,Rsc,Cal"))
>(match_operand 2 "" "")))
>(simple_return)
>(use (match_operand 3 "" ""))]
>""
>"@
> -   b%!%* %P1
> -   b%!%* %P1
> -   j%!%* [%1]%&
> -   j%!%* [%1]
> -   j%! %P1"
> +   b%!%*\\t%P1
> +   b%!%*\\t%P1
> +   j%!%*\\t[%1]
> +   j%!%*\\t[%1]
> +   j%!\\t%P1"
>[(set_attr "type" "call,call,call,call,call_no_delay_slot")
> (set_attr "predicable" "yes,no,no,yes,yes")
> (set_attr "iscompact" "false,false,maybe,false,false")
> diff --git a/gcc/config/arc/constraints.md b/gcc/config/arc/constraints.md
> index 523210432da..494e4792316 100644
> --- a/gcc/config/arc/constraints.md
> +++ b/gcc/config/arc/constraints.md
> @@ -480,16 +480,6 @@
>(and (match_code "reg")
> (match_test "REGNO (op) == 31")))
>  
> -(define_constraint "Rs5"
> -  "@internal
> -   sibcall register - only allow one of the five available 16 bit isnsn.
> -   Registers usable in ARCompact 16-bit instructions: @code{r0}-@code{r3},
> -   @code{r12}"
> -  (and (match_code "reg")
> -   (match_test "!arc_ccfsm_cond_exec_p ()")
> -   (ior (match_test "(unsigned) REGNO (op) <= 3")
> -  (match_test "REGNO (op) == 12"))))
> -
>  (define_constraint "Rcc"
>"@internal
>Condition Codes"
> diff --git a/gcc/testsuite/gcc.target/arc/long-calls.c b/gcc/testsuite/gcc.target/arc/long-calls.c
> index 63fafbcc674..9ae36ca0df3 100644
> --- a/gcc/testsuite/gcc.target/arc/long-calls.c
> +++ b/gcc/testsuite/gcc.target/arc/long-calls.c
> @@ -5,7 +5,7 @@ int g (void);
>  
>  int f (void)
>  {
> -g();
> +  g();
>  }
>  
> -/* { dg-final { scan-assembler "j @g" } } */
> +/* { dg-final { scan-assembler "j\\t@g" } } */
> -- 
> 2.20.1
> 
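For context, the kind of call these sibcall patterns handle can be shown with a tiny example: a call in tail position may be emitted as a sibcall, i.e. a jump instead of a call-and-return pair. The names below are illustrative, not from the patch:

```cpp
// A helper the compiler can jump to directly from `caller`.
static int callee (int x) { return x + 1; }

// The call sits in tail position: the caller's frame can be reused,
// so the compiler may emit a jump to `callee` (a sibcall) instead of
// a call followed by a return -- the situation the *sibcall_insn and
// *sibcall_value_insn patterns cover.
int caller (int x)
{
  return callee (x);  // tail call: eligible for sibcall optimization
}
```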


[Patch] [Aarch64] PR rtl-optimization/87763 - this patch fixes gcc.target/aarch64/lsl_asr_sbfiz.c

2019-04-10 Thread Steve Ellcey

Here is another patch to fix one of the failures
listed in PR rtl-optimization/87763. This change
fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
an alternative version of *ashiftsi_extv_bfiz that
has a subreg in it.

Tested with bootstrap and regression test run.

OK for checkin?

Steve Ellcey


2019-04-10  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
New instruction.


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e0df975..04dc06f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5634,6 +5634,22 @@
   [(set_attr "type" "bfx")]
 )
 
+(define_insn "*ashiftsi_extv_bfiz_alt"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (ashift:SI
+ (subreg:SI
+   (sign_extract:DI
+ (subreg:DI (match_operand:SI 1 "register_operand" "r") 0)
+ (match_operand 2 "aarch64_simd_shift_imm_offset_si" "n")
+ (const_int 0))
+   0)
+ (match_operand 3 "aarch64_simd_shift_imm_si" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+1, GET_MODE_BITSIZE (SImode) - 1)"
+  "sbfiz\\t%w0, %w1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
 ;; When the bit position and width of the equivalent extraction add up to 32
 ;; we can use a W-reg LSL instruction taking advantage of the implicit
 ;; zero-extension of the X-reg.
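As context for the pattern, the shape of source code it targets (a reconstruction in the spirit of gcc.target/aarch64/lsl_asr_sbfiz.c, not the test's exact contents) is a sign-extract of the low bits followed by a left shift, which AArch64 can do in one sbfiz instruction:

```cpp
// (x << 29) >> 10 sign-extracts the low 3 bits of x and shifts the
// result left by 19; with the new pattern this should match a single
// "sbfiz w0, w0, #19, #3" on AArch64.  Assumes arithmetic right
// shift of signed ints (true for GCC).
int sbfiz_sketch (int x)
{
  return (x << 29) >> 10;
}
```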


Re: [Patch] [Aarch64] PR rtl-optimization/87763 - this patch fixes gcc.target/aarch64/lsl_asr_sbfiz.c

2019-04-10 Thread Steve Ellcey
On Wed, 2019-04-10 at 15:03 -0700, Steve Ellcey wrote:
> Here is another patch to fix one of the failures
> listed in PR rtl-optimization/87763. This change
> fixes gcc.target/aarch64/lsl_asr_sbfiz.c by adding
> an alternative version of *ashiftsi_extv_bfiz that
> has a subreg in it.
> 
> Tested with bootstrap and regression test run.
> 
> OK for checkin?
> 
> Steve Ellcey
> 
> 
> 2018-04-10  Steve Ellcey  
> 
>   PR rtl-optimization/87763
>   * config/aarch64/aarch64.md (*ashiftsi_extv_bfiz_alt):
>   New Instruction.

I forgot I sent this out before (in January).  It generated some
discussion then but no action.  So I guess this is a ping.

https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01694.html

Steve Ellcey
sell...@marvell.com


Re: [PATCH] Replace direct PSTL uses of assert() with a macro

2019-04-10 Thread Thomas Rodgers
Ok, let's try this again.

> On 09/04/19 15:23 -0700, Thomas Rodgers wrote:
>>This also replaces calls to __TBB_ASSERT so that there are two macro
>>definitions provided by c++config -
>>  __PSTL_ASSERT(_Condition)
>>  __PSTL_ASSERT_MSG(_Condition, _Message)
>>
>>  * include/bits/c++config:
>>  Add definition for __PSTL_ASSERT.
>>  Add definition for __PSTL_ASSERT_MSG.
>>  * include/pstl/algorithm_impl.h: Replace use of assert().
>>  * include/pstl/numeric_impl.h: Replace use of assert().
>>  * include/pstl/parallel_backend_tbb.h:
>>  Replace use of assert().
>>  Replace use of __TBB_ASSERT().
>>
>>  * include/pstl/parallel_backend_utils.h: Replace use of assert().

Jonathan Wakely writes:
> ... you fix now ...

From fdd06789266d7703c48f53c23a85a36144649334 Mon Sep 17 00:00:00 2001
From: Thomas Rodgers 
Date: Fri, 5 Apr 2019 15:27:35 -0700
Subject: [PATCH] Replace direct PSTL uses of assert() with a macro

This also replaces calls to __TBB_ASSERT so that there are two macro
definitions provided by c++config -
	__PSTL_ASSERT(_Condition)
	__PSTL_ASSERT_MSG(_Condition, _Message)

	* include/bits/c++config:
	Add definition for __PSTL_ASSERT.
	Add definition for __PSTL_ASSERT_MSG.
	* include/pstl/algorithm_impl.h: Replace use of assert().
	* include/pstl/numeric_impl.h: Replace use of assert().
	* include/pstl/parallel_backend_tbb.h:
	Replace use of assert().
	Replace use of __TBB_ASSERT().

	* include/pstl/parallel_backend_utils.h: Replace use of assert().
---
 libstdc++-v3/include/bits/c++config   |  4 
 libstdc++-v3/include/pstl/algorithm_impl.h| 10 +-
 libstdc++-v3/include/pstl/numeric_impl.h  |  4 ++--
 libstdc++-v3/include/pstl/parallel_backend_tbb.h  |  8 
 .../include/pstl/parallel_backend_utils.h | 15 +++
 5 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 66420a9a3f2..ef8ba96737b 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -690,6 +690,10 @@ namespace std
 #  undef __PSTL_PAR_BACKEND_TBB
 # endif
 
+# define __PSTL_ASSERT(_Condition) __glibcxx_assert(_Condition)
+# define __PSTL_ASSERT_MSG(_Condition, _Message) __glibcxx_assert(_Condition)
+
+
 # define __PSTL_PRAGMA(x) _Pragma (#x)
 
 # define __PSTL_STRING_AUX(x) #x
diff --git a/libstdc++-v3/include/pstl/algorithm_impl.h b/libstdc++-v3/include/pstl/algorithm_impl.h
index e06bf60151e..b0d60baae14 100644
--- a/libstdc++-v3/include/pstl/algorithm_impl.h
+++ b/libstdc++-v3/include/pstl/algorithm_impl.h
@@ -2731,8 +2731,8 @@ __pattern_includes(_ExecutionPolicy&& __exec, _ForwardIterator1 __first1, _Forwa
  return !__internal::__parallel_or(
 std::forward<_ExecutionPolicy>(__exec), __first2, __last2,
 [__first1, __last1, __first2, __last2, &__comp](_ForwardIterator2 __i, _ForwardIterator2 __j) {
-assert(__j > __i);
-//assert(__j - __i > 1);
+__PSTL_ASSERT(__j > __i);
+//__PSTL_ASSERT(__j - __i > 1);
 
 //1. moving boundaries to "consume" subsequence of equal elements
 auto __is_equal = [&__comp](_ForwardIterator2 __a, _ForwardIterator2 __b) -> bool {
@@ -2756,8 +2756,8 @@ __pattern_includes(_ExecutionPolicy&& __exec, _ForwardIterator1 __first1, _Forwa
 //2. testing is __a subsequence of the second range included into the first range
 auto __b = std::lower_bound(__first1, __last1, *__i, __comp);
 
-assert(!__comp(*(__last1 - 1), *__b));
-assert(!__comp(*(__j - 1), *__i));
+__PSTL_ASSERT(!__comp(*(__last1 - 1), *__b));
+__PSTL_ASSERT(!__comp(*(__j - 1), *__i));
 return !std::includes(__b, __last1, __i, __j, __comp);
 });
 });
@@ -2948,7 +2948,7 @@ __parallel_set_union_op(_ExecutionPolicy&& __exec, _ForwardIterator1 __first1, _
 }
 
 const auto __m2 = __left_bound_seq_2 - __first2;
-assert(__m1 == 0 || __m2 == 0);
+__PSTL_ASSERT(__m1 == 0 || __m2 == 0);
 if (__m2 > __set_algo_cut_off)
 {
 auto __res_or = __result;
diff --git a/libstdc++-v3/include/pstl/numeric_impl.h b/libstdc++-v3/include/pstl/numeric_impl.h
index 49a4abf5a95..738a61d92f6 100644
--- a/libstdc++-v3/include/pstl/numeric_impl.h
+++ b/libstdc++-v3/include/pstl/numeric_impl.h
@@ -314,7 +314,7 @@ _ForwardIterator2
 __brick_adjacent_difference(_ForwardIterator1 __first, _ForwardIterator1 __last, _ForwardIterator2 __d_first,
 BinaryOperation __op, /*is_vector=*/std::true_type) noexcept
 {
-assert(__first != __last);
+__PSTL_ASSERT(__first != __last);
 
 typedef typename std::iterator_traits<_ForwardIterator1>::reference _ReferenceType1;
   

Re: [PATCH] Replace direct PSTL uses of assert() with a macro

2019-04-10 Thread Jonathan Wakely

On 10/04/19 15:57 -0700, Thomas Rodgers wrote:

> Ok, let's try this again.
>
> [...earlier patch description quoted...]


Looks good, OK for trunk, thanks.




Re: [PATCH] Replace direct PSTL uses of assert() with a macro

2019-04-10 Thread Jonathan Wakely

On 10/04/19 23:59 +0100, Jonathan Wakely wrote:

> [...]
>
> Looks good, OK for trunk, thanks.


Looks like parallel_backend_tbb.h still includes <cassert> after this
patch. Assuming tests still pass with that removed, that tweak to the
patch is pre-approved.
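As a stand-alone sketch of the macro indirection this patch introduces (the names mimic the patch; plain assert stands in for __glibcxx_assert, which in libstdc++ only performs the check when _GLIBCXX_ASSERTIONS is defined):

```cpp
#include <cassert>

// Two wrappers, as in the patch: the message argument of the _MSG
// variant is intentionally dropped, and both forward to one
// underlying check that the library controls centrally.
#define PSTL_ASSERT(cond) assert(cond)
#define PSTL_ASSERT_MSG(cond, msg) assert(cond)

// Example use mirroring the precondition check in
// __brick_adjacent_difference (illustrative names).
int checked_length (int first, int last)
{
  PSTL_ASSERT (first != last);
  PSTL_ASSERT_MSG (last > first, "range must be non-empty");
  return last - first;
}
```

The point of the indirection is that the PSTL sources no longer hard-code assert() or __TBB_ASSERT(), so the definition can be swapped in one place (c++config).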