date:20200303

[PATCH] PR target/93995 ICE in patch_jump_insn, at cfgrtl.c:1290 on riscv64-linux-gnu

2020-03-03 Thread Kito Cheng

The last code-generation change for LTGT did not consider a cbranch with LTGT;
the branch patterns support only a few comparison codes.

gcc/ChangeLog

* config/riscv/riscv.c (riscv_emit_float_compare): Use NE to compare
the result of IOR.

gcc/testsuite/ChangeLog

* gcc.dg/pr93995.c: New.
---
 gcc/config/riscv/riscv.c   |  7 +++---
 gcc/testsuite/gcc.dg/pr93995.c | 46 ++
 2 files changed, 50 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr93995.c

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index d45b19d861b..94b5ac01762 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2299,9 +2299,10 @@ riscv_emit_float_compare (enum rtx_code *code, rtx *op0, rtx *op1)
 
 case LTGT:
   /* (a < b) | (a > b) */
-  *code = IOR;
-  *op0 = riscv_force_binary (word_mode, LT, cmp_op0, cmp_op1);
-  *op1 = riscv_force_binary (word_mode, GT, cmp_op0, cmp_op1);
+  tmp0 = riscv_force_binary (word_mode, LT, cmp_op0, cmp_op1);
+  tmp1 = riscv_force_binary (word_mode, GT, cmp_op0, cmp_op1);
+  *op0 = riscv_force_binary (word_mode, IOR, tmp0, tmp1);
+  *op1 = const0_rtx;
   break;
 
 default:
diff --git a/gcc/testsuite/gcc.dg/pr93995.c b/gcc/testsuite/gcc.dg/pr93995.c
new file mode 100644
index 000..b89c85db10a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr93995.c
@@ -0,0 +1,46 @@
+/* PR target/93995 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-trapping-math" } */
+
+double s1[4], s2[4], s3[64];
+
+int
+main (void)
+{
+  int i;
+  asm volatile ("" : : : "memory");
+  for (i = 0; i < 4; i++)
+    s3[0 * 4 + i] = __builtin_isgreater (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[1 * 4 + i] = (!__builtin_isgreater (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[2 * 4 + i] = __builtin_isgreaterequal (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[3 * 4 + i] = (!__builtin_isgreaterequal (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[4 * 4 + i] = __builtin_isless (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[5 * 4 + i] = (!__builtin_isless (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[6 * 4 + i] = __builtin_islessequal (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[7 * 4 + i] = (!__builtin_islessequal (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[8 * 4 + i] = __builtin_islessgreater (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[9 * 4 + i] = (!__builtin_islessgreater (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[10 * 4 + i] = __builtin_isunordered (s1[i], s2[i]) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[11 * 4 + i] = (!__builtin_isunordered (s1[i], s2[i])) ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[12 * 4 + i] = s1[i] > s2[i] ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[13 * 4 + i] = s1[i] >= s2[i] ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[14 * 4 + i] = s1[i] < s2[i] ? -1.0 : 0.0;
+  for (i = 0; i < 4; i++)
+    s3[15 * 4 + i] = s1[i] <= s2[i] ? -1.0 : 0.0;
+  asm volatile ("" : : : "memory");
+  return 0;
+}
-- 
2.25.1
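
For readers following the LTGT change, here is a minimal sketch in plain C of what the patched lowering computes. It is only a model: `ltgt_lowered` and `ltgt_check` are hypothetical names, ordinary `int` values stand in for the word_mode temporaries, and nothing here uses GCC's RTL API.

```c
#include <assert.h>
#include <math.h>

/* Model of the patched LTGT lowering: compute (a < b) | (a > b) into a
   single value and compare it against zero with NE, so the caller only
   ever sees a comparison code that a conditional branch can handle.  */
int
ltgt_lowered (double a, double b)
{
  int tmp0 = a < b;       /* stands in for the LT result */
  int tmp1 = a > b;       /* stands in for the GT result */
  int op0 = tmp0 | tmp1;  /* stands in for the IOR result */
  int op1 = 0;            /* stands in for const0_rtx */
  return op0 != op1;      /* the NE comparison */
}

/* Cross-check the model against the C99 islessgreater macro.  */
void
ltgt_check (double a, double b)
{
  assert (ltgt_lowered (a, b) == (islessgreater (a, b) != 0));
}
```

With a NaN operand both `a < b` and `a > b` are false, so the model returns 0, which matches what `islessgreater` reports for unordered operands.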

Re: [PATCH] Clear --help=language and --help=common interaction.

2020-03-03 Thread Martin Liška


On 3/2/20 11:52 PM, Joseph Myers wrote:

On Mon, 2 Mar 2020, Martin Liška wrote:


+version of GCC@.  If an option is supported by all languages, one needs
+to use @var{common} qualifier instead.


"common" is literal text, so it should be @samp{common} not @var{common},
and the existing documentation here describes it as a "class" with other
things such as "undocumented" or "joined" being "qualifiers"



Thank you for the comments. I've got an updated version of the patch.

Martin
From 4fc81d25275a98493eaa9494e77dae9691fdbd20 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 2 Mar 2020 14:25:05 +0100
Subject: [PATCH] Clear --help=language and --help=common interaction.

gcc/ChangeLog:

2020-03-02  Martin Liska  

	PR c/93886
	PR c/93887
	* doc/invoke.texi: Clarify --help=language and --help=common
	interaction.
---
 gcc/doc/invoke.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4f88fe68999..98102020f55 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1665,7 +1665,8 @@ option.
 @item @var{language}
 Display the options supported for @var{language}, where
 @var{language} is the name of one of the languages supported in this
-version of GCC@.
+version of GCC@.  If an option is supported by all languages, one needs
+to select @samp{common} class.
 
 @item @samp{common}
 Display the options that are common to all languages.
-- 
2.25.1

Re: [PATCH 4/4] arc: Update legitimate small data address.

2020-03-03 Thread Claudiu Zissulescu

Thank you for your review. All 4 patches are pushed.

Claudiu

From: Jeff Law 
Sent: Friday, February 28, 2020 11:15 PM
To: Claudiu Zissulescu ; gcc-patches@gcc.gnu.org
Cc: Claudiu Zissulescu ; Francois Bedard ; andrew.burg...@embecosm.com
Subject: Re: [PATCH 4/4] arc: Update legitimate small data address.

On Wed, 2020-02-26 at 16:59 +0200, Claudiu Zissulescu wrote:
> All of ARC's small data addressing uses the address scaling feature of the
> load/store instructions (i.e., the address is made of a general
> pointer plus a shifted offset; the shift amount depends on the
> addressing mode).  This patch checks whether the offset of an address
> fits the scaled constraint.  If so, a small data access is
> generated.  This patch fixes the execute pr93249 failure.

>
> gcc/
> -xx-xx  Claudiu Zissulescu  
>
>* config/arc/arc.c (leigitimate_small_data_address_p): Check if an
>address has an offset which fits the scaling constraint for a
>load/store operation.
>(legitimate_scaled_address_p): Update use
>leigitimate_small_data_address_p.
>(arc_print_operand): Likewise.
>(arc_legitimate_address_p): Likewise.
>(legitimate_small_data_address_p): Likewise.
OK.
jeff
>

[PATCH] Wrap array in ctor with braces.

2020-03-03 Thread Martin Liška


Hi.

This patch silences a few clang warnings:

/home/marxin/Programming/gcc/gcc/cp/method.c:903:26: warning: suggest braces around initialization of subobject [-Wmissing-braces]
   { "partial_ordering", "equivalent", "greater", "less", "unordered" },
 ^~~~
 {   }
/home/marxin/Programming/gcc/gcc/cp/method.c:904:23: warning: suggest braces around initialization of subobject [-Wmissing-braces]
   { "weak_ordering", "equivalent", "greater", "less" },
  ^~~
  {  }
/home/marxin/Programming/gcc/gcc/cp/method.c:905:25: warning: suggest braces around initialization of subobject [-Wmissing-braces]
   { "strong_ordering", "equal", "greater", "less" }
^~
{ }

Ready to be installed?
Thanks,
Martin

gcc/cp/ChangeLog:

2020-03-03  Martin Liska  

* method.c: Wrap array in ctor with braces in order
to silence clang warnings.
---
 gcc/cp/method.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 790d5704092..f10cfecaae8 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -900,9 +900,9 @@ struct comp_cat_info_t
 };
 static const comp_cat_info_t comp_cat_info[cc_last]
 = {
-   { "partial_ordering", "equivalent", "greater", "less", "unordered" },
-   { "weak_ordering", "equivalent", "greater", "less" },
-   { "strong_ordering", "equal", "greater", "less" }
+   { "partial_ordering", { "equivalent", "greater", "less", "unordered" } },
+   { "weak_ordering", { "equivalent", "greater", "less" } },
+   { "strong_ordering", { "equal", "greater", "less" } }
 };
 
 /* A cache of the category types to speed repeated lookups.  */
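
As an illustration of the brace change, here is a minimal sketch in plain C. The struct layout (a name followed by a four-element array of member names) is an assumption inferred from the initializers, since the definition of `comp_cat_info_t` is not shown in the patch; `cat_table` and `comp_cat_member_count` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: a category name followed by an array of member names.  */
struct comp_cat_info
{
  const char *name;
  const char *members[4];
};

/* The inner braces delimit the array subobject explicitly, which is what
   -Wmissing-braces asks for.  Elements not listed are null pointers.  */
static const struct comp_cat_info cat_table[3] = {
  { "partial_ordering", { "equivalent", "greater", "less", "unordered" } },
  { "weak_ordering", { "equivalent", "greater", "less" } },
  { "strong_ordering", { "equal", "greater", "less" } }
};

/* Return the number of non-null member names for category IDX.  */
int
comp_cat_member_count (int idx)
{
  int n = 0;
  assert (idx >= 0 && idx < 3);
  while (n < 4 && cat_table[idx].members[n] != NULL)
    n++;
  return n;
}
```

Both spellings initialize the same elements; the braced form only makes the nesting explicit, so the generated table should be unchanged.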

[PATCH] [COMMITTED] arc: Add ARC entry for gcc-10/changes.html

2020-03-03 Thread Claudiu Zissulescu

Add ARC entry for gcc-10/changes.html

---
 htdocs/gcc-10/changes.html | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 53d0ca08..4e27c05b 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -557,7 +557,19 @@ a work-in-progress.
   much improved.
 
 
-
+ARC
+
+  The interrupt service routine functions saves all used
+  registers, including extension registers and auxiliary registers
+  used by Zero Overhead Loops.
+  Improve code-size by using multiple short instructions instead
+  of a single long mov or ior instruction when its long immediate
+  constant is known.
+  Fix usage of the accumulator register for ARC600.
+  Fix issues with uncached attribute.
+  Remove -mq-class option.
+  Improve 64-bit integer addition and subtraction operations.
+
 
 arm
 
-- 
2.24.1

Re: [PATCH] explow: Fix ICE caused by plus_constant [PR94002]

2020-03-03 Thread Richard Biener

On Tue, 3 Mar 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs in cross to riscv64-linux.  The problem is
> that we have a DImode integral constant (that doesn't fit into SImode),
> which is pushed into a constant pool and later we access just the first half
> of it using a MEM.  When plus_constant is called on such a MEM, if the constant
> has a mode, we verify the mode, but if it doesn't, we don't and ICE later on
> when we think the CONST_INT is a valid SImode constant.
> 
> Fixed thusly, tested with cross to riscv64-linux and bootstrapped/regtested
> on x86_64-linux and i686-linux, ok for trunk?

Looks OK, but I guess we could do better by appropriately sub-regging
the constant (which might not always be possible of course).

Thanks,
Richard.

> 2020-03-03  Jakub Jelinek  
> 
>   PR rtl-optimization/94002
>   * explow.c (plus_constant): Punt if cst has VOIDmode and
>   get_pool_mode is different from mode.
> 
>   * gcc.dg/pr94002.c: New test.
> 
> --- gcc/explow.c.jj   2020-01-12 11:54:36.564411130 +0100
> +++ gcc/explow.c  2020-03-02 22:09:19.544380020 +0100
> @@ -128,6 +128,9 @@ plus_constant (machine_mode mode, rtx x,
> cst = gen_lowpart (mode, cst);
> gcc_assert (cst);
>   }
> +  else if (GET_MODE (cst) == VOIDmode
> +&& get_pool_mode (XEXP (x, 0)) != mode)
> + break;
> if (GET_MODE (cst) == VOIDmode || GET_MODE (cst) == mode)
>   {
> tem = plus_constant (mode, cst, c);
> --- gcc/testsuite/gcc.dg/pr94002.c.jj 2020-03-02 22:05:58.508338170 +0100
> +++ gcc/testsuite/gcc.dg/pr94002.c  2020-03-02 22:05:32.864715503 +0100
> @@ -0,0 +1,13 @@
> +/* PR rtl-optimization/94002 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fno-tree-dce -fno-tree-reassoc" } */
> +/* { dg-additional-options "-fPIC" { target fpic } } */
> +
> +unsigned a, b;
> +
> +void
> +foo (void)
> +{
> +  __builtin_sub_overflow (b, 44852956282LL, &a);
> +  a += ~b;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
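
To make the constant in the testcase concrete, here is a minimal sketch in plain C. It is a model only: 64-bit and 32-bit integers stand in for DImode and SImode, the low 32 bits stand in for "the first half" (which assumes a little-endian target), and `fits_simode`, `low_part_simode` and `plus_constant_simode` are hypothetical names, not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if VALUE is representable as a signed 32-bit integer,
   standing in for "is a valid SImode constant".  */
int
fits_simode (int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}

/* Low 32 bits of VALUE, standing in for taking the SImode low part of
   a DImode constant.  */
uint32_t
low_part_simode (int64_t value)
{
  return (uint32_t) ((uint64_t) value & 0xffffffffu);
}

/* Model of adding C to a constant that must already be valid in the
   narrower mode; the assert is the precondition the ICE exposed.  */
uint32_t
plus_constant_simode (int64_t cst, uint32_t c)
{
  assert (fits_simode (cst));
  return (uint32_t) (uint64_t) cst + c;
}
```

The constant from the testcase, 44852956282LL, fails `fits_simode`, so it cannot be used directly in a 32-bit addition; taking its low part first gives a value that can, which is roughly the alternative suggested in the reply.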

Re: [PATCH PR93674]Avoid introducing IV of enumeral type in case of -fstrict-enums

2020-03-03 Thread Richard Biener

On Mon, Mar 2, 2020 at 6:14 PM Andrew Pinski  wrote:
>
> On Mon, Mar 2, 2020 at 1:40 AM Richard Biener
>  wrote:
> >
> > On Mon, Mar 2, 2020 at 9:07 AM bin.cheng wrote:
> > >
> > > Hi,
> > > This is a simple fix for PR93674.  It adds a candidate carefully for an
> > > enumeral-type iv_use when -fstrict-enums is in effect; it also avoids
> > > computing or replacing the iv_use with the candidate, so that no IV of
> > > enumeral type is introduced with the -fstrict-enums option.
> > >
> > > Testcase is also added.  Bootstrap and test on x86_64.  Any comment?
> >
> > I think we should avoid enum-typed (or bool-typed) IVs in general, not just
> > with -fstrict-enums.  That said, the ENUMERAL_TYPE checks should be
> > !(INTEGER_TYPE || POINTER_TYPE_P) checks.
>
> Maybe even check type_has_mode_precision_p or that
> TYPE_MIN_VALUE/TYPE_MAX_VALUE are the same as the min/max for that
> precision/signedness.

Indeed we don't want non-mode precision INTEGER_TYPE IVs either.  I wouldn't
check TYPE_MIN/MAX_VALUE here though.

Richard.

> Thanks,
> Andrew
>
> >
> > +  /* Check if cand can represent values of use for strict enums.  */
> > +  else if (TREE_CODE (ctype) == ENUMERAL_TYPE && flag_strict_enums)
> > +{
> >
> > if we don't have enum-typed IV candidates then the computation should
> > be carried out in INTEGER_TYPE and then be converted to enum type.
> > So why's this and the may_eliminate_iv hunks necessary?
> >
> > Richard.
> >
> > > Thanks,
> > > bin
> > > 2020-03-02  Bin Cheng  
> > >
> > > PR tree-optimization/93674
> > > * tree-ssa-loop-ivopts.c (add_iv_candidate_for_use): Add candidate
> > > for enumeral type iv_use converted from other iv.
> > > (get_computation_cost, may_eliminate_iv): Avoid compute, eliminate
> > > iv_use with enumeral type iv_cand in case of -fstrict-enums.
> > >
> > > gcc/testsuite
> > > 2020-03-02  Bin Cheng  
> > >
> > > PR tree-optimization/93674
> > > * g++.dg/pr93674.C: New test.
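
The type check discussed in this thread can be sketched in plain C as follows. This is a hypothetical model, not GCC code: `struct type_desc`, `enum type_kind` and `suitable_iv_type_p` are invented for illustration, and the "mode precision" test is reduced to comparing two bit counts.

```c
#include <assert.h>

/* Hypothetical stand-ins for the tree codes mentioned in the thread.  */
enum type_kind { KIND_INTEGER, KIND_POINTER, KIND_ENUMERAL, KIND_BOOLEAN };

struct type_desc
{
  enum type_kind kind;
  unsigned precision;  /* bits the type can represent */
  unsigned mode_bits;  /* bits of the machine mode holding it */
};

/* Accept only integer or pointer types whose precision fills the whole
   mode; enum, bool and non-mode-precision integer types are rejected.  */
int
suitable_iv_type_p (const struct type_desc *t)
{
  assert (t->precision <= t->mode_bits);
  if (t->kind != KIND_INTEGER && t->kind != KIND_POINTER)
    return 0;
  return t->precision == t->mode_bits;
}
```

Under this model an enum type is rejected regardless of -fstrict-enums, which follows the suggestion to avoid enum-typed IVs in general.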

Re: [Ping][PATCH][Arm] ACLE intrinsics: AdvSIMD BFloat16 convert instructions

2020-03-03 Thread Kyrill Tkachov


Hi Dennis,

On 3/2/20 5:41 PM, Dennis Zhang wrote:

Hi all,

On 17/01/2020 16:46, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on Arm BFMode patch
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch implements intrinsics to convert between bfloat16 and float32
> formats.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested.
>
> Is it OK for trunk please?



Ok.

Thanks,

Kyrill



>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * config/arm/arm_bf16.h (vcvtah_f32_bf16, vcvth_bf16_f32): New.
>  * config/arm/arm_neon.h (vcvt_f32_bf16, vcvtq_low_f32_bf16): New.
>  (vcvtq_high_f32_bf16, vcvt_bf16_f32): New.
>  (vcvtq_low_bf16_f32, vcvtq_high_bf16_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfcvt, vbfcvt_high): New entries.
>  (vbfcvtv4sf, vbfcvtv4sf_high): Likewise.
>  * config/arm/iterators.md (VBFCVT, VBFCVTM): New mode iterators.
>  (V_bf_low, V_bf_cvt_m): New mode attributes.
>  * config/arm/neon.md (neon_vbfcvtv4sf): New.
>  (neon_vbfcvtv4sf_highv8bf, neon_vbfcvtsf): New.
>  (neon_vbfcvt, neon_vbfcvt_highv8bf): New.
>  (neon_vbfcvtbf_cvtmode, neon_vbfcvtbf): New
>  * config/arm/unspecs.md (UNSPEC_BFCVT, UNSPEC_BFCVT_HIG): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_cvt_1.c: New test.
>
>

The tests are updated in this patch for assembly test.
Rebased to trunk top.

Is it OK to commit please?

Cheers
Dennis

[PATCH 1/3] [ARC] Remove mmixed-code option.

2020-03-03 Thread Claudiu Zissulescu

The mmixed-code option is obsolete, remove it.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.c (arc_override_options): Remove
TARGET_MIXED_CODE reference.
* config/arc/arc.md (abssi2_mixed): Remove pattern.
* config/arc/arc.h (TARGET_MIXED_CODE): Remove macro.
(INDEX_REG_CLASS): Only refer to GENERAL_REGS.
* config/arc/arc.opt (mmixed-code): Remove option.
* doc/invoke.texi (ARC): Remove mmixed-code doc.
---
 gcc/config/arc/arc.h   | 4 +---
 gcc/config/arc/arc.md  | 8 
 gcc/config/arc/arc.opt | 8 
 gcc/doc/invoke.texi| 8 +---
 4 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index b7fa7ba8fa3..21ffeee9ad2 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -115,8 +115,6 @@ extern const char *arc_cpu_to_as (int argc, const char **argv);
 
 /* Run-time compilation parameters selecting different hardware subsets.  */
 
-#define TARGET_MIXED_CODE (TARGET_MIXED_CODE_SET)
-
 #define TARGET_SPFP (TARGET_SPFP_FAST_SET || TARGET_SPFP_COMPACT_SET)
 #define TARGET_DPFP (TARGET_DPFP_FAST_SET || TARGET_DPFP_COMPACT_SET   \
 || TARGET_FP_DP_AX)
@@ -571,7 +569,7 @@ extern enum reg_class arc_regno_reg_class[];
a scale factor or added to another register (as well as added to a
displacement).  */
 
-#define INDEX_REG_CLASS (TARGET_MIXED_CODE ? ARCOMPACT16_REGS : GENERAL_REGS)
+#define INDEX_REG_CLASS GENERAL_REGS
 
 /* The class value for valid base registers. A base register is one used in
an address which is the register value plus a displacement.  */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index a26a7a4dd5f..e1958fda2e6 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -2001,14 +2001,6 @@ archs4x, archs4xd"
 
 ;; Absolute instructions
 
-(define_insn "*abssi2_mixed"
-  [(set (match_operand:SI 0 "compact_register_operand" "=q")
-   (abs:SI (match_operand:SI 1 "compact_register_operand" "q")))]
-  "TARGET_MIXED_CODE"
-  "abs%? %0,%1%&"
-  [(set_attr "type" "two_cycle_core")
-   (set_attr "iscompact" "true")])
-
 (define_insn "abssi2"
   [(set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,w,w")
(abs:SI (match_operand:SI 1 "nonmemory_operand" "Rcq#q,cL,Cal")))]
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index deaf306739e..2b2b947ca08 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -135,14 +135,6 @@ mcode-density
 Target Report Mask(CODE_DENSITY)
 Enable code density instructions for ARCv2.
 
-mmixed-code
-Target Report Mask(MIXED_CODE_SET)
-Tweak register allocation to help 16-bit instruction generation.
-; originally this was:
-;Generate ARCompact 16-bit instructions intermixed with 32-bit instructions
-; but we do that without -mmixed-code, too, it's just a different instruction
-; count / size tradeoff.
-
 ; We use an explict definition for the negative form because that is the
 ; actually interesting option, and we want that to have its own comment.
 mvolatile-cache
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8c6c90217f4..7627e889b5d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -727,7 +727,7 @@ Objective-C and Objective-C++ Dialects}.
 -mcase-vector-pcrel  -mcompact-casesi  -mno-cond-exec  -mearly-cbranchsi @gol
 -mexpand-adddi  -mindexed-loads  -mlra  -mlra-priority-none @gol
 -mlra-priority-compact mlra-priority-noncompact  -mmillicode @gol
--mmixed-code  -mq-class  -mRcq  -mRcw  -msize-level=@var{level} @gol
+-mq-class  -mRcq  -mRcw  -msize-level=@var{level} @gol
 -mtune=@var{cpu}  -mmultcost=@var{num}  -mcode-density-frame @gol
 -munalign-prob-threshold=@var{probability}  -mmpy-option=@var{multo} @gol
 -mdiv-rem  -mcode-density  -mll64  -mfpu=@var{fpu}  -mrf16  -mbranch-index}
@@ -17956,12 +17956,6 @@ This option enable the compiler to emit @code{enter} and @code{leave}
 instructions.  These instructions are only valid for CPUs with
 code-density feature.
 
-@item -mmixed-code
-@opindex mmixed-code
-Tweak register allocation to help 16-bit instruction generation.
-This generally has the effect of decreasing the average instruction size
-while increasing the instruction count.
-
 @item -mq-class
 @opindex mq-class
 Ths option is deprecated.  Enable @samp{q} instruction alternatives.
-- 
2.24.1

[PATCH 3/3] [ARC] Remove munalign-prob-threshold.

2020-03-03 Thread Claudiu Zissulescu

The munalign-prob-threshold option is obsolete, remove it.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.opt (munalign-prob-threshold): Remove option.
* doc/invoke.texi (ARC): Remove munalign-prob-threshold doc.
* config/arc/arc.c (arc_unalign_branch_p): Remove unused function.
---
 gcc/config/arc/arc.c   | 23 ---
 gcc/config/arc/arc.opt |  6 --
 gcc/doc/invoke.texi| 11 +--
 3 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index c57febd3783..af26e5b9711 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -9920,29 +9920,6 @@ gen_acc2 (void)
   return gen_rtx_REG (SImode, TARGET_BIG_ENDIAN ? 57: 56);
 }
 
-/* FIXME: a parameter should be added, and code added to final.c,
-   to reproduce this functionality in shorten_branches.  */
-#if 0
-/* Return nonzero iff BRANCH should be unaligned if possible by upsizing
-   a previous instruction.  */
-int
-arc_unalign_branch_p (rtx branch)
-{
-  rtx note;
-
-  if (!TARGET_UNALIGN_BRANCH)
-return 0;
-  /* Do not do this if we have a filled delay slot.  */
-  if (get_attr_delay_slot_filled (branch) == DELAY_SLOT_FILLED_YES
-  && !NEXT_INSN (branch)->deleted ())
-return 0;
-  note = find_reg_note (branch, REG_BR_PROB, 0);
-  return (!note
- || (arc_unalign_prob_threshold && !br_prob_note_reliable_p (note))
- || INTVAL (XEXP (note, 0)) < arc_unalign_prob_threshold);
-}
-#endif
-
 /* When estimating sizes during arc_reorg, when optimizing for speed, there
are three reasons why we need to consider branches to be length 6:
- annull-false delay slot insns are implemented using conditional execution,
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index a8af5138183..244c3abd86d 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -287,12 +287,6 @@ mmul32x16
 Target Report Mask(MULMAC_32BY16_SET)
 Generate 32x16 multiply and mac instructions.
 
-; the initializer is supposed to be: Init(REG_BR_PROB_BASE/2) ,
-; alas, basic-block.h is not included in options.c .
-munalign-prob-threshold=
-Target RejectNegative Joined UInteger Var(arc_unalign_prob_threshold) Init(1/2)
-Set probability threshold for unaligning branches.
-
 mmedium-calls
 Target Var(TARGET_MEDIUM_CALLS) Init(TARGET_MMEDIUM_CALLS_DEFAULT)
 Don't use less than 25 bit addressing range for calls.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 802d36d4098..51c2d6d04de 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -729,7 +729,7 @@ Objective-C and Objective-C++ Dialects}.
 -mlra-priority-compact mlra-priority-noncompact  -mmillicode @gol
 -mq-class  -mRcq  -mRcw  -msize-level=@var{level} @gol
 -mtune=@var{cpu}  -mmultcost=@var{num}  -mcode-density-frame @gol
--munalign-prob-threshold=@var{probability}  -mmpy-option=@var{multo} @gol
+-mmpy-option=@var{multo} @gol
 -mdiv-rem  -mcode-density  -mll64  -mfpu=@var{fpu}  -mrf16  -mbranch-index}
 
 @emph{ARM Options}
@@ -18031,15 +18031,6 @@ Tune for ARC4x release 3.10a.
 Cost to assume for a multiply instruction, with @samp{4} being equal to a
 normal instruction.
 
-@item -munalign-prob-threshold=@var{probability}
-@opindex munalign-prob-threshold
-Set probability threshold for unaligning branches.
-When tuning for @samp{ARC700} and optimizing for speed, branches without
-filled delay slot are preferably emitted unaligned and long, unless
-profiling indicates that the probability for the branch to be taken
-is below @var{probability}.  @xref{Cross-profiling}.
-The default is (REG_BR_PROB_BASE/2), i.e.@: 5000.
-
 @end table
 
 The following options are maintained for backward compatibility, but
-- 
2.24.1

[PATCH 2/3] [ARC] Remove malign-call

2020-03-03 Thread Claudiu Zissulescu

The malign-call option is obsolete, remove it.

gcc/
-xx-xx  Claudiu Zissulescu  

* config/arc/arc.opt (malign-call): Remove option.
* doc/invoke.texi (ARC): Remove malign-call doc.
* common/config/arc/arc-common.c (arc_option_optimization_table):
Remove malign-call.
---
 gcc/common/config/arc/arc-common.c | 1 -
 gcc/config/arc/arc.opt | 4 
 gcc/doc/invoke.texi| 6 +-
 3 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/gcc/common/config/arc/arc-common.c b/gcc/common/config/arc/arc-common.c
index 14c20123c70..7f46f547e30 100644
--- a/gcc/common/config/arc/arc-common.c
+++ b/gcc/common/config/arc/arc-common.c
@@ -62,7 +62,6 @@ static const struct default_options arc_option_optimization_table[] =
 { OPT_LEVELS_SIZE, OPT_fif_conversion, NULL, 0 },
 { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
 { OPT_LEVELS_3_PLUS_SPEED_ONLY, OPT_msize_level_, NULL, 0 },
-{ OPT_LEVELS_3_PLUS_SPEED_ONLY, OPT_malign_call, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
index 2b2b947ca08..a8af5138183 100644
--- a/gcc/config/arc/arc.opt
+++ b/gcc/config/arc/arc.opt
@@ -301,10 +301,6 @@ mannotate-align
 Target Var(TARGET_ANNOTATE_ALIGN)
 Explain what alignment considerations lead to the decision to make an insn short or long.
 
-malign-call
-Target Var(TARGET_ALIGN_CALL)
-Do alignment optimizations for call instructions.
-
 mRcq
 Target Var(TARGET_Rcq)
 Enable Rcq constraint handling - most short code generation depends on this.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7627e889b5d..802d36d4098 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -723,7 +723,7 @@ Objective-C and Objective-C++ Dialects}.
 -mlong-calls  -mmedium-calls  -msdata  -mirq-ctrl-saved @gol
 -mrgf-banked-regs  -mlpc-width=@var{width}  -G @var{num} @gol
 -mvolatile-cache  -mtp-regno=@var{regno} @gol
--malign-call  -mauto-modify-reg  -mbbit-peephole  -mno-brcc @gol
+-mauto-modify-reg  -mbbit-peephole  -mno-brcc @gol
 -mcase-vector-pcrel  -mcompact-casesi  -mno-cond-exec  -mearly-cbranchsi @gol
 -mexpand-adddi  -mindexed-loads  -mlra  -mlra-priority-none @gol
 -mlra-priority-compact mlra-priority-noncompact  -mmillicode @gol
@@ -17861,10 +17861,6 @@ Enable cache bypass for volatile references.
 The following options fine tune code generation:
 @c code generation tuning options
 @table @gcctabopt
-@item -malign-call
-@opindex malign-call
-Do alignment optimizations for call instructions.
-
 @item -mauto-modify-reg
 @opindex mauto-modify-reg
 Enable the use of pre/post modify with register displacement.
-- 
2.24.1

Re: [PATCH] use all same precision in wide_int arguments (PR 93986)

2020-03-03 Thread Richard Biener

On Tue, Mar 3, 2020 at 12:04 AM Martin Sebor  wrote:
>
> The wide_int APIs expect operands to have the same precision and
> abort when they don't.  This is especially insidious in code where
> the operands normally do have the same precision but where mixed
> precision arguments can come up as a result of unusual combinations of
> optimization options.  That is also what precipitated pr93986.

If you want sth like (signed) arbitrary precision arithmetic then you can
use widest_int instead.  Or, since you're working with offsets, offset_int
is another good choice.

> The attached patch adjusts the code to extend all wide_int operands
> to the same precision to avoid the ICE.
>
> Besides the usual bootstrap/testing I also compiled all string tests
> in gcc.dg with the same options as in the test case in pr93986 in
> an effort to weed out any lingering bugs like it (found none).
>
> Martin
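
The same-precision requirement can be illustrated with a small model in plain C. This is a sketch under stated assumptions: `struct fixed_int` and the `fixed_*` helpers are hypothetical, are limited to 64 bits, and only imitate the idea of extending operands to a common precision before combining them; they are not the wide_int, widest_int or offset_int interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* A value together with its precision in bits (1..64).  */
struct fixed_int
{
  uint64_t bits;
  unsigned precision;
};

static uint64_t
mask_for (unsigned precision)
{
  return precision >= 64 ? ~(uint64_t) 0 : (((uint64_t) 1 << precision) - 1);
}

/* Build a value of the given precision from VALUE, truncating as needed.  */
struct fixed_int
fixed_from (int64_t value, unsigned precision)
{
  struct fixed_int r;
  assert (precision >= 1 && precision <= 64);
  r.bits = (uint64_t) value & mask_for (precision);
  r.precision = precision;
  return r;
}

/* Sign-extend X to a precision that is at least as wide.  */
struct fixed_int
fixed_sext (struct fixed_int x, unsigned precision)
{
  struct fixed_int r;
  uint64_t v = x.bits;
  assert (precision >= x.precision && precision <= 64);
  if (v & ((uint64_t) 1 << (x.precision - 1)))
    v |= ~mask_for (x.precision);
  r.bits = v & mask_for (precision);
  r.precision = precision;
  return r;
}

/* Add two values; like the real API, this refuses mixed precisions.  */
struct fixed_int
fixed_add (struct fixed_int a, struct fixed_int b)
{
  struct fixed_int r;
  assert (a.precision == b.precision);
  r.bits = (a.bits + b.bits) & mask_for (a.precision);
  r.precision = a.precision;
  return r;
}
```

For example, adding a 32-bit -1 to a 64-bit 5 would trip the assertion in `fixed_add` unless the first operand is passed through `fixed_sext (x, 64)` first. Using one wide precision everywhere, as suggested in the reply, removes the need for such per-operand extension.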

Re: [PATCH] sccvn: Improve handling of load masked with integer constant [PR93582]

2020-03-03 Thread Richard Biener

On Mon, 2 Mar 2020, Jakub Jelinek wrote:

> On Mon, Mar 02, 2020 at 12:46:30PM +0100, Richard Biener wrote:
> > > +   void *r = data.push_partial_def (pd, 0, prec);
> > > +   if (r == (void *) -1)
> > > + return NULL_TREE;
> > > +   gcc_assert (r == NULL_TREE);
> > > + }
> > > +   pos += tz;
> > > +   if (pos == prec)
> > > + break;
> > > +   w = wi::lrshift (w, tz);
> > > +   tz = wi::ctz (wi::bit_not (w));
> > > +   if (pos + tz > prec)
> > > + tz = prec - pos;
> > > +   pos += tz;
> > > +   w = wi::lrshift (w, tz);
> > > + }
> > 
> > I'd do this in the vn_walk_cb_data CTOR instead - you pass mask !=
> > NULL_TREE anyway so you can as well pass mask.
> 
> I've tried, but have no idea how to handle the case where
> data.push_partial_def (pd, 0, prec); fails above if it is done in the
> constructor.
> Though, the BIT_AND_EXPR case already checks:
> + && CHAR_BIT == 8
> + && BITS_PER_UNIT == 8
> + && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
> and also checks the pathological cases of mask being all ones or all zeros,
> so it is just the theoretical case of
> maxsizei > bufsize * BITS_PER_UNIT
> so maybe it is moot and we can just assert that push_partial_def
> returned NULL.
> 
> > I wonder if we can instead make the above return NULL (finish
> > return (void *)-1) and do sth like
> > 
> >  if (!wvnresult && mask)
> >return data.masked_result;
> > 
> > and thus avoid the type-"unsafe" return frobbing by storing the
> > result value in an extra member of the vn_walk_cb_data struct.
> 
> Done that way.
> 
> > Any reason you piggy-back on visit_reference_op_load instead of using
> > vn_reference_lookup directly?  I'd very much prefer that since it
> > doesn't even try to mess with the SSA lattice.
> 
> I didn't want to duplicate the VCE case, but it isn't that long.

Ah, I'm not sure this will ever trigger though, does it?

> So, like this if it passes bootstrap/regtest?

Yes, this is OK (remove the VCE case at your discretion).

Thanks,
Richard.
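
The mask-walking loop quoted at the top of this message can be modelled in plain C as follows. This is a sketch, not the patch itself: it works on a `uint64_t` instead of a wide_int, records each run of zero bits in an array instead of calling `push_partial_def`, and the names `struct bit_run`, `ctz64`, `lrshift64` and `zero_runs` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A run of consecutive zero bits in the mask.  */
struct bit_run
{
  unsigned offset;
  unsigned size;
};

/* Count trailing zero bits; returns 64 when W is zero.  */
static unsigned
ctz64 (uint64_t w)
{
  unsigned n = 0;
  if (w == 0)
    return 64;
  while ((w & 1) == 0)
    {
      w >>= 1;
      n++;
    }
  return n;
}

/* Logical right shift that is also defined for a shift count of 64.  */
static uint64_t
lrshift64 (uint64_t w, unsigned n)
{
  return n >= 64 ? 0 : w >> n;
}

/* Store the zero-bit runs of the low PREC bits of MASK into OUT, which
   must have room for (PREC + 1) / 2 entries, and return their count.  */
unsigned
zero_runs (uint64_t mask, unsigned prec, struct bit_run *out)
{
  unsigned pos = 0, count = 0;
  uint64_t w = mask;
  assert (prec >= 1 && prec <= 64);
  while (pos < prec)
    {
      /* Length of the zero run starting at POS, clamped to PREC.  */
      unsigned tz = ctz64 (w);
      if (pos + tz > prec)
        tz = prec - pos;
      if (tz)
        {
          out[count].offset = pos;
          out[count].size = tz;
          count++;
        }
      pos += tz;
      if (pos == prec)
        break;
      w = lrshift64 (w, tz);
      /* Skip the following run of one bits.  */
      tz = ctz64 (~w);
      if (pos + tz > prec)
        tz = prec - pos;
      pos += tz;
      w = lrshift64 (w, tz);
    }
  return count;
}
```

Each recorded run corresponds to a bit range whose contents do not matter after the AND, which is why the patch can treat those ranges as artificial definitions.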

> 2020-03-02  Jakub Jelinek  
> 
>   PR tree-optimization/93582
>   * tree-ssa-sccvn.h (vn_reference_lookup): Add mask argument.
>   * tree-ssa-sccvn.c (struct vn_walk_cb_data): Add mask and masked_result
>   members, initialize them in the constructor and if mask is non-NULL,
>   artificially push_partial_def {} for the portions of the mask that
>   contain zeros.
>   (vn_walk_cb_data::finish): If mask is non-NULL, set masked_result to
>   val and return (void *)-1.  Formatting fix.
>   (vn_reference_lookup_pieces): Adjust vn_walk_cb_data initialization.
>   Formatting fix.
>   (vn_reference_lookup): Add mask argument.  If non-NULL, don't call
>   fully_constant_vn_reference_p nor vn_reference_lookup_1 and return
>   data.mask_result.
>   (visit_nary_op): Handle BIT_AND_EXPR of a memory load and INTEGER_CST
>   mask.
>   (visit_stmt): Formatting fix.
> 
>   * gcc.dg/tree-ssa/pr93582-10.c: New test.
>   * gcc.dg/pr93582.c: New test.
>   * gcc.c-torture/execute/pr93582.c: New test.
> 
> --- gcc/tree-ssa-sccvn.h.jj   2020-02-28 17:32:56.391363613 +0100
> +++ gcc/tree-ssa-sccvn.h  2020-03-02 13:52:17.488680037 +0100
> @@ -256,7 +256,7 @@ tree vn_reference_lookup_pieces (tree, a
>vec ,
>vn_reference_t *, vn_lookup_kind);
>  tree vn_reference_lookup (tree, tree, vn_lookup_kind, vn_reference_t *, bool,
> -   tree * = NULL);
> +   tree * = NULL, tree = NULL_TREE);
>  void vn_reference_lookup_call (gcall *, vn_reference_t *, vn_reference_t);
>  vn_reference_t vn_reference_insert_pieces (tree, alias_set_type, tree,
>  vec ,
> --- gcc/tree-ssa-sccvn.c.jj   2020-02-28 17:32:56.390363628 +0100
> +++ gcc/tree-ssa-sccvn.c  2020-03-02 15:48:12.982620557 +0100
> @@ -1686,15 +1686,55 @@ struct pd_data
>  struct vn_walk_cb_data
>  {
>vn_walk_cb_data (vn_reference_t vr_, tree orig_ref_, tree *last_vuse_ptr_,
> -vn_lookup_kind vn_walk_kind_, bool tbaa_p_)
> +vn_lookup_kind vn_walk_kind_, bool tbaa_p_, tree mask_)
>  : vr (vr_), last_vuse_ptr (last_vuse_ptr_), last_vuse (NULL_TREE),
> -  vn_walk_kind (vn_walk_kind_), tbaa_p (tbaa_p_),
> -  saved_operands (vNULL), first_set (-2), known_ranges (NULL)
> -   {
> - if (!last_vuse_ptr)
> -   last_vuse_ptr = &last_vuse;
> - ao_ref_init (&orig_ref, orig_ref_);
> -   }
> +  mask (mask_), masked_result (NULL_TREE), vn_walk_kind (vn_walk_kind_),
> +  tbaa_p (tbaa_p_), saved_operands (vNULL), first_set (-2),
> +  known_ranges (NULL)
> +  {
> +if (!last_vuse_ptr)
> +  last_vuse_ptr = &last_vuse;
> +ao_ref_init (&orig_ref, orig_ref_);
> +if (mask)
> +  {
> + wide_int w = wi::to_wide (mask);
> + unsi

Re: [PATCH 1/3] [ARC] Remove mmixed-code option.

2020-03-03 Thread Richard Biener

On Tue, Mar 3, 2020 at 10:41 AM Claudiu Zissulescu  wrote:
>
> The mmixed-code option is obsolete, remove it.

You might want to preserve the option and ignore it like we do
for some in common.opt:

fargument-alias
Common Ignore
Does nothing. Preserved for backward compatibility.

this avoids compiler errors when updating the compiler but not
adjusting flags.

Richard.
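
For illustration only, an ignored entry in arc.opt could look roughly like the record below. This is a sketch modelled on the common.opt example above; it assumes the `Ignore` property is accepted for a `Target` option in the same way as for a `Common` one, and it is not taken from any posted patch.

```
mmixed-code
Target Ignore
Does nothing.  Preserved for backward compatibility.
```

With such a record, build scripts that still pass -mmixed-code would keep working after a compiler update, at the cost of keeping one extra entry in the options file.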

> gcc/
> -xx-xx  Claudiu Zissulescu  
>
> * config/arc/arc.c (arc_override_options): Remove
> TARGET_MIXED_CODE reference.
> * config/arc/arc.md (abssi2_mixed): Remove pattern.
> * config/arc/arc.h (TARGET_MIXED_CODE): Remove macro.
> (INDEX_REG_CLASS): Only refer to GENERAL_REGS.
> * config/arc/arc.opt (mmixed-code): Remove option.
> * doc/invoke.texi (ARC): Remove mmixed-code doc.
> ---
>  gcc/config/arc/arc.h   | 4 +---
>  gcc/config/arc/arc.md  | 8 
>  gcc/config/arc/arc.opt | 8 
>  gcc/doc/invoke.texi| 8 +---
>  4 files changed, 2 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index b7fa7ba8fa3..21ffeee9ad2 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -115,8 +115,6 @@ extern const char *arc_cpu_to_as (int argc, const char 
> **argv);
>
>  /* Run-time compilation parameters selecting different hardware subsets.  */
>
> -#define TARGET_MIXED_CODE (TARGET_MIXED_CODE_SET)
> -
>  #define TARGET_SPFP (TARGET_SPFP_FAST_SET || TARGET_SPFP_COMPACT_SET)
>  #define TARGET_DPFP (TARGET_DPFP_FAST_SET || TARGET_DPFP_COMPACT_SET   \
>  || TARGET_FP_DP_AX)
> @@ -571,7 +569,7 @@ extern enum reg_class arc_regno_reg_class[];
> a scale factor or added to another register (as well as added to a
> displacement).  */
>
> -#define INDEX_REG_CLASS (TARGET_MIXED_CODE ? ARCOMPACT16_REGS : GENERAL_REGS)
> +#define INDEX_REG_CLASS GENERAL_REGS
>
>  /* The class value for valid base registers. A base register is one used in
> an address which is the register value plus a displacement.  */
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index a26a7a4dd5f..e1958fda2e6 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -2001,14 +2001,6 @@ archs4x, archs4xd"
>
>  ;; Absolute instructions
>
> -(define_insn "*abssi2_mixed"
> -  [(set (match_operand:SI 0 "compact_register_operand" "=q")
> -   (abs:SI (match_operand:SI 1 "compact_register_operand" "q")))]
> -  "TARGET_MIXED_CODE"
> -  "abs%? %0,%1%&"
> -  [(set_attr "type" "two_cycle_core")
> -   (set_attr "iscompact" "true")])
> -
>  (define_insn "abssi2"
>[(set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,w,w")
> (abs:SI (match_operand:SI 1 "nonmemory_operand" "Rcq#q,cL,Cal")))]
> diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
> index deaf306739e..2b2b947ca08 100644
> --- a/gcc/config/arc/arc.opt
> +++ b/gcc/config/arc/arc.opt
> @@ -135,14 +135,6 @@ mcode-density
>  Target Report Mask(CODE_DENSITY)
>  Enable code density instructions for ARCv2.
>
> -mmixed-code
> -Target Report Mask(MIXED_CODE_SET)
> -Tweak register allocation to help 16-bit instruction generation.
> -; originally this was:
> -;Generate ARCompact 16-bit instructions intermixed with 32-bit instructions
> -; but we do that without -mmixed-code, too, it's just a different instruction
> -; count / size tradeoff.
> -
>  ; We use an explict definition for the negative form because that is the
>  ; actually interesting option, and we want that to have its own comment.
>  mvolatile-cache
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 8c6c90217f4..7627e889b5d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -727,7 +727,7 @@ Objective-C and Objective-C++ Dialects}.
>  -mcase-vector-pcrel  -mcompact-casesi  -mno-cond-exec  -mearly-cbranchsi @gol
>  -mexpand-adddi  -mindexed-loads  -mlra  -mlra-priority-none @gol
>  -mlra-priority-compact mlra-priority-noncompact  -mmillicode @gol
> --mmixed-code  -mq-class  -mRcq  -mRcw  -msize-level=@var{level} @gol
> +-mq-class  -mRcq  -mRcw  -msize-level=@var{level} @gol
>  -mtune=@var{cpu}  -mmultcost=@var{num}  -mcode-density-frame @gol
>  -munalign-prob-threshold=@var{probability}  -mmpy-option=@var{multo} @gol
>  -mdiv-rem  -mcode-density  -mll64  -mfpu=@var{fpu}  -mrf16  -mbranch-index}
> @@ -17956,12 +17956,6 @@ This option enable the compiler to emit @code{enter} 
> and @code{leave}
>  instructions.  These instructions are only valid for CPUs with
>  code-density feature.
>
> -@item -mmixed-code
> -@opindex mmixed-code
> -Tweak register allocation to help 16-bit instruction generation.
> -This generally has the effect of decreasing the average instruction size
> -while increasing the instruction count.
> -
>  @item -mq-class
>  @opindex mq-class
>  Ths option is deprecated.  Enable @samp{q} instruction alternatives.
> --
> 2.24.1
>

[PATCH] tree-optimization/93946 - fix bogus redundant store removal in FRE, DSE and DOM

2020-03-03 Thread Richard Biener

This fixes a common mistake in removing a store that looks redundant but
is not, because it changes the dynamic type of the memory and thus makes
a difference for following loads with TBAA.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.
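
As a rough illustration of the kind of check involved (a toy model only, not
GCC's data structures): the sketch below represents alias sets as plain
integers and uses a made-up child map in place of the real alias-set
hierarchy.  ToyRef, ToyChildren, toy_subset_of, toy_covers and
toy_same_for_tbaa are hypothetical names introduced for this example.  It
assumes, following refs_same_for_tbaa_p in the patch, that the earlier store
has to cover the later one for both the alias set of the access and the alias
set of the base before the later store may be treated as redundant.

```cpp
#include <map>
#include <set>

// Toy stand-in for a memory reference: only the two alias sets matter here.
struct ToyRef
{
  int alias_set;       // alias set of the access itself
  int base_alias_set;  // alias set of the base object
};

// Hypothetical hierarchy: maps an alias set to the sets it contains.
typedef std::map<int, std::set<int> > ToyChildren;

// True if SUB is contained in SUPER.  Set 0 is modelled as containing
// every other set.
static bool
toy_subset_of (const ToyChildren &children, int sub, int super)
{
  if (super == 0)
    return true;
  ToyChildren::const_iterator it = children.find (super);
  return it != children.end () && it->second.count (sub) != 0;
}

// True if the EARLIER set is the same as, or a superset of, the LATER set.
static bool
toy_covers (const ToyChildren &children, int earlier, int later)
{
  return earlier == later || toy_subset_of (children, later, earlier);
}

// The later store only counts as redundant if the earlier one covers it
// for both the access alias set and the base alias set.
static bool
toy_same_for_tbaa (const ToyChildren &children, ToyRef earlier, ToyRef later)
{
  return (toy_covers (children, earlier.alias_set, later.alias_set)
	  && toy_covers (children, earlier.base_alias_set,
			 later.base_alias_set));
}
```

In this model a later store whose sets are unrelated to those of the earlier
store is never considered redundant, even if it writes the same value.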

2020-03-03  Richard Biener  

PR tree-optimization/93946
* alias.h (refs_same_for_tbaa_p): Declare.
* alias.c (refs_same_for_tbaa_p): New function.
* tree-ssa-alias.c (ao_ref_alias_set): For a NULL ref return
zero.
* tree-ssa-scopedtables.h
(avail_exprs_stack::lookup_avail_expr): Add output argument
giving access to the hashtable entry.
* tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr):
Likewise.
* tree-ssa-dom.c: Include alias.h.
(dom_opt_dom_walker::optimize_stmt): Validate TBAA state before
removing redundant store.
* tree-ssa-sccvn.h (vn_reference_s::base_set): New member.
(ao_ref_init_from_vn_reference): Adjust prototype.
(vn_reference_lookup_pieces): Likewise.
(vn_reference_insert_pieces): Likewise.
* tree-ssa-sccvn.c: Track base alias set in addition to alias
set everywhere.
(eliminate_dom_walker::eliminate_stmt): Also check base alias
set when removing redundant stores.
(visit_reference_op_store): Likewise.
* dse.c (record_store): Adjust validity check for redundant
store removal.

* gcc.dg/torture/pr93946-1.c: New testcase.
* gcc.dg/torture/pr93946-2.c: Likewise.
---
 gcc/alias.c  |  20 +++
 gcc/alias.h  |   1 +
 gcc/dse.c|   9 +-
 gcc/testsuite/gcc.dg/torture/pr93946-1.c |  27 
 gcc/testsuite/gcc.dg/torture/pr93946-2.c |  28 
 gcc/tree-ssa-alias.c |   2 +
 gcc/tree-ssa-dom.c   |  10 +-
 gcc/tree-ssa-dse.c   |  34 -
 gcc/tree-ssa-pre.c   |  24 ++--
 gcc/tree-ssa-sccvn.c | 222 +--
 gcc/tree-ssa-sccvn.h |  11 +-
 gcc/tree-ssa-scopedtables.c  |   5 +-
 gcc/tree-ssa-scopedtables.h  |   2 +-
 13 files changed, 272 insertions(+), 123 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr93946-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr93946-2.c

diff --git a/gcc/alias.c b/gcc/alias.c
index fe75e22cdb5..d38e386d0e8 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -368,6 +368,26 @@ rtx_refs_may_alias_p (const_rtx x, const_rtx mem, bool 
tbaa_p)
 && MEM_ALIAS_SET (mem) != 0);
 }
 
+/* Return true if the ref EARLIER behaves the same as LATER with respect
+   to TBAA for every memory reference that might follow LATER.  */
+
+bool
+refs_same_for_tbaa_p (tree earlier, tree later)
+{
+  ao_ref earlier_ref, later_ref;
+  ao_ref_init (&earlier_ref, earlier);
+  ao_ref_init (&later_ref, later);
+  alias_set_type earlier_set = ao_ref_alias_set (&earlier_ref);
+  alias_set_type later_set = ao_ref_alias_set (&later_ref);
+  if (!(earlier_set == later_set
+   || alias_set_subset_of (later_set, earlier_set)))
+return false;
+  alias_set_type later_base_set = ao_ref_base_alias_set (&later_ref);
+  alias_set_type earlier_base_set = ao_ref_base_alias_set (&earlier_ref);
+  return (earlier_base_set == later_base_set
+ || alias_set_subset_of (later_base_set, earlier_base_set));
+}
+
 /* Returns a pointer to the alias set entry for ALIAS_SET, if there is
such an entry, or NULL otherwise.  */
 
diff --git a/gcc/alias.h b/gcc/alias.h
index 781b040fec1..4453d9723ce 100644
--- a/gcc/alias.h
+++ b/gcc/alias.h
@@ -38,6 +38,7 @@ extern void dump_alias_stats_in_alias_c (FILE *s);
 tree reference_alias_ptr_type (tree);
 bool alias_ptr_types_compatible_p (tree, tree);
 int compare_base_decls (tree, tree);
+bool refs_same_for_tbaa_p (tree, tree);
 
 /* This alias set can be used to force a memory to conflict with all
other memories, creating a barrier across which no memory reference
diff --git a/gcc/dse.c b/gcc/dse.c
index fe1181b502f..0d96bd43457 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -1540,9 +1540,12 @@ record_store (rtx body, bb_info_t bb_info)
 width)
  /* We can only remove the later store if the earlier aliases
 at least all accesses the later one.  */
- && (MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
- || alias_set_subset_of (MEM_ALIAS_SET (mem),
- MEM_ALIAS_SET (s_info->mem
+ && ((MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
+  || alias_set_subset_of (MEM_ALIAS_SET (mem),
+  MEM_ALIAS_SET (s_info->mem)))
+ && (!MEM_EXPR (s_info->mem)
+ || refs_same_fo

[committed] libstdc++: Micro-optimisations for lexicographical_compare_three_way

2020-03-03 Thread Jonathan Wakely

As noted in LWG 3410 the specification in the C++20 draft performs more
iterator comparisons than necessary when the end of either range is
reached. Our implementation followed that specification. This removes
the redundant comparisons so that we do no unnecessary work as soon as
we find that we've reached the end of either range.

The odd-looking return statement is because it generates better code
than the original version that copied the global constants.

* include/bits/stl_algobase.h (lexicographical_compare_three_way):
Avoid redundant iterator comparisons (LWG 3410).

Tested powerpc64le-linux, committed to master.
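
For readers who want to see where the saved comparisons come from, here is a
small sketch that counts iterator comparisons for the two loop shapes.  It is
not the libstdc++ code: it works on std::vector<int> only, returns -1/0/1 in
place of std::strong_ordering so that it does not need C++20, and the names
ToyResult, toy_compare_old and toy_compare_new are invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Result of a toy three-way comparison: ORDER is -1, 0 or 1, and
// ITER_COMPARES counts how many iterator (in)equality tests were made.
struct ToyResult
{
  int order;
  std::size_t iter_compares;
};

typedef std::vector<int>::const_iterator ToyIter;

// Loop shape before the change: both ranges are tested in the loop
// condition and then tested again after the loop.
static ToyResult
toy_compare_old (const std::vector<int> &a, const std::vector<int> &b)
{
  std::size_t n = 0;
  ToyIter first1 = a.begin (), last1 = a.end ();
  ToyIter first2 = b.begin (), last2 = b.end ();
  while ((++n, first1 != last1) && (++n, first2 != last2))
    {
      if (*first1 != *first2)
	{
	  ToyResult r = { *first1 < *first2 ? -1 : 1, n };
	  return r;
	}
      ++first1;
      ++first2;
    }
  ToyResult r = { 0, n };
  if (++r.iter_compares, first1 != last1)
    r.order = 1;
  else if (++r.iter_compares, first2 != last2)
    r.order = -1;
  return r;
}

// Loop shape after the change: leave as soon as either range is exhausted,
// and test only the second range once the first one has ended.
static ToyResult
toy_compare_new (const std::vector<int> &a, const std::vector<int> &b)
{
  std::size_t n = 0;
  ToyIter first1 = a.begin (), last1 = a.end ();
  ToyIter first2 = b.begin (), last2 = b.end ();
  while (++n, first1 != last1)
    {
      if (++n, first2 == last2)
	{
	  ToyResult r = { 1, n };
	  return r;
	}
      if (*first1 != *first2)
	{
	  ToyResult r = { *first1 < *first2 ? -1 : 1, n };
	  return r;
	}
      ++first1;
      ++first2;
    }
  ++n;
  ToyResult r = { first2 == last2 ? 0 : -1, n };
  return r;
}
```

Both functions give the same ordering; the counts differ only when the end of
a range is reached, which is the case the change is aimed at.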


commit 9b4f00dd3f799337d8b8ef5e79f5a682c8059ab9
Author: Jonathan Wakely 
Date:   Tue Mar 3 11:06:26 2020 +

libstdc++: Micro-optimisations for lexicographical_compare_three_way

As noted in LWG 3410 the specification in the C++20 draft performs more
iterator comparisons than necessary when the end of either range is
reached. Our implementation followed that specification. This removes
the redundant comparisons so that we do no unnecessary work as soon as
we find that we've reached the end of either range.

The odd-looking return statement is because it generates better code
than the original version that copied the global constants.

* include/bits/stl_algobase.h (lexicographical_compare_three_way):
Avoid redundant iterator comparisons (LWG 3410).

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 7a9d932b421..4b63086965d 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1711,15 +1711,16 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
return __lencmp;
  }
 #endif // is_constant_evaluated
-  while (__first1 != __last1 && __first2 != __last2)
+  while (__first1 != __last1)
{
+ if (__first2 == __last2)
+   return strong_ordering::greater;
  if (auto __cmp = __comp(*__first1, *__first2); __cmp != 0)
return __cmp;
  ++__first1;
  ++__first2;
}
-  return __first1 != __last1 ? strong_ordering::greater
-   : __first2 != __last2 ? strong_ordering::less : strong_ordering::equal;
+  return (__first2 == __last2) <=> true; // See PR 94006
 }
 
   template

[committed] Darwin, libsanitizer: Update minimum supported system version.

2020-03-03 Thread Iain Sandoe

Hi,

The imported sources from 'upstream' on the 9 branch do not
support Darwin10 or earlier, so do not enable these by default.

tested on x86_64-darwin and linux,
applied to the branch
thanks
Iain

libsanitizer/ChangeLog:

2020-03-03  Iain Sandoe  

* configure.tgt (x86_64-*-darwin*, i?86-*-darwin*): Enable by
default only for Darwin versions greater than or equal to 11
(macOS 10.7).
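
To see which targets the old and new case patterns select, the following
sketch feeds them to POSIX fnmatch.  This assumes that fnmatch with no flags
is a fair stand-in for shell case pattern matching, and the target triples
used with it are only illustrative; toy_matches is a helper name made up for
this example.

```cpp
#include <fnmatch.h>
#include <string>

// Return true if TRIPLE matches the glob PATTERN, using fnmatch with no
// flags as an approximation of shell "case" matching.
static bool
toy_matches (const char *pattern, const std::string &triple)
{
  return fnmatch (pattern, triple.c_str (), 0) == 0;
}
```

With the old pattern "x86_64-*-darwin[1]*" a triple such as
x86_64-apple-darwin10 is accepted; with the new "x86_64-*-darwin1[1-9]*" it
is not, while x86_64-apple-darwin11 is accepted by both.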

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index b241ddbfec4..424fb17a45c 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -60,7 +60,7 @@ case "${target}" in
TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_aarch64.lo
fi
;;
-  x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
+  x86_64-*-darwin1[1-9]* | i?86-*-darwin1[1-9]*)
TSAN_SUPPORTED=no
;;
   x86_64-*-solaris2.11* | i?86-*-solaris2.11*)

Re: [PATCH coroutines] Handle component_ref in captures_temporary

2020-03-03 Thread Nathan Sidwell


On 3/3/20 12:42 AM, JunMa wrote:

On 2020/3/2 10:49 PM, Nathan Sidwell wrote:

On 2/12/20 2:23 AM, JunMa wrote:



Hi nathan

Here is the updated patch


This is ok, with a correction in a comment:
+  /* This isn't a temporary or argument.  */
   /* This isn't a temporary.  */
 is sufficient.  Otherwise it reads as 'this is neither a temporary nor 
an argument' which isn't the case.


nathan

--
Nathan Sidwell

Re: [PATCH 2/4 GCC11] Add target hook stride_dform_valid_p

2020-03-03 Thread Kewen.Lin

>> Hi Segher and Richard S.,
>>
>> Sorry for late response.  Thanks for your comments on legitimate_address_p
>> hook and function addr_offset_valid_p.  I updated the IVOPTs part with
>> addr_offset_valid_p, although rs6000_legitimate_offset_address_p doesn't
>> check strictly all the time (like worst_case is false), it works well with
>> SPEC2017.
>> Based on it, the hook is simplified as attached patch.
> 
> Thanks for the update.  I think it would be better to add a --param
> rather than a bool hook though.  Targets can then change the default
> (if necessary) using SET_OPTION_IF_UNSET.  The user can override the
> default if they want to.
> 
> It might also be better to start with an opt-out rather than an opt-in
> (i.e. with the default param value being true rather than false).
> With a default-off option, it's much harder to tell whether something
> has been deliberately turned off or whether no-one's thought about it
> either way.  We can always flip the default later if it turns out that
> nothing other than rs6000 benefits.
> 
> Richard
> 

Hi Richard,

Thanks for your comments!  It's a good idea to use a param for the
flexibility.  And yes, it sounds good to have more targets try it and
make it better.  But I have some concern about turning it on by default.
Since it relies on unroll factor estimation, as part 1/4 shows, it
calls targetm.loop_unroll_adjust if the target supports it, which used to
work on RTL level only.  To avoid possible ICEs, I intend to turn it
off for those targets (s390 & i386) with that hook, since without a good
understanding of those targets, it's hard for me to extend them with
gimple level support.  Does it make sense?
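
A minimal sketch of the "set if unset" behaviour relied on here, using a toy
options struct and not GCC's gcc_options: the target default is applied only
when the user has not given the parameter explicitly.  ToyOptions and
toy_set_if_unset are hypothetical names for this example, and the single
field stands in for the new param.

```cpp
// Toy option storage.  In OPTS the field holds the current value; in
// OPTS_SET a nonzero field means the user set that option explicitly.
struct ToyOptions
{
  int iv_consider_reg_offset_for_unroll;
};

// Apply the target's preferred VALUE unless the user already chose one.
static void
toy_set_if_unset (ToyOptions *opts, const ToyOptions *opts_set, int value)
{
  if (!opts_set->iv_consider_reg_offset_for_unroll)
    opts->iv_consider_reg_offset_for_unroll = value;
}
```

So a target can switch the param off by default while an explicit setting on
the command line still wins.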

The updated patch has been attached.

BR,
Kewen
-

gcc/ChangeLog

2020-03-03  Kewen Lin  

* doc/invoke.texi (iv-consider-reg-offset-for-unroll): Document new 
option.
* params.opt (iv-consider-reg-offset-for-unroll): New.
* config/s390/s390.c (s390_option_override_internal): Disable parameter
iv-consider-reg-offset-for-unroll by default.
* config/i386/i386-options.c (ix86_option_override_internal): Likewise.
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index e0be493..41c99b3 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -2902,6 +2902,12 @@ ix86_option_override_internal (bool main_args_p,
   if (ix86_indirect_branch != indirect_branch_keep)
 SET_OPTION_IF_UNSET (opts, opts_set, flag_jump_tables, 0);
 
+  /* Disable this for now till loop_unroll_adjust supports gimple level checks,
+ to avoid possible ICE.  */
+  if (opts->x_optimize >= 1)
+SET_OPTION_IF_UNSET (opts, opts_set,
+param_iv_consider_reg_offset_for_unroll, 0);
+
   return true;
 }
 
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ebba670..ae4c2bd 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -15318,6 +15318,12 @@ s390_option_override_internal (struct gcc_options 
*opts,
  not the case when the code runs before the prolog. */
   if (opts->x_flag_fentry && !TARGET_64BIT)
 error ("%<-mfentry%> is supported only for 64-bit CPUs");
+
+  /* Disable this for now till loop_unroll_adjust supports gimple level checks,
+ to avoid possible ICE.  */
+  if (opts->x_optimize >= 1)
+SET_OPTION_IF_UNSET (opts, opts_set,
+param_iv_consider_reg_offset_for_unroll, 0);
 }
 
 static void
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fa98e2f..502031c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12220,6 +12220,15 @@ If the number of candidates in the set is smaller than 
this value,
 always try to remove unnecessary ivs from the set
 when adding a new one.
 
+@item iv-consider-reg-offset-for-unroll
+When RTL unrolling performs on a loop, the duplicated loop iterations introduce
+appropriate induction variable step update expressions.  But if an induction
+variable is derived from address object, it is profitable to fill its required
+offset updates into appropriate memory access expressions if target memory
+accessing supports the register offset mode and the resulted offset is in the
+valid range.  The induction variable optimizations consider this information
+for better unrolling code.  It requires unroll factor estimation in middle-end.
+
 @item avg-loop-niter
 Average number of iterations of a loop.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 8e4217d..31424cf 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -270,6 +270,10 @@ Bound on number of candidates below that all candidates 
are considered in iv opt
 Common Joined UInteger Var(param_iv_max_considered_uses) Init(250) Param 
Optimization
 Bound on number of iv uses in loop optimized in iv optimizations.
 
+-param=iv-consider-reg-offset-for-unroll=
+Common Joined UInteger Var(param_iv_consider_reg_offset_for_unroll) Init(1) 
Optimization IntegerRange(0, 1) Param
+Whether iv optimizatio

[PATCH] c++: Fix non-constant TARGET_EXPR constexpr handling [PR93998]

2020-03-03 Thread Jakub Jelinek

Hi!

We ICE on the following testcase since I've added the SAVE_EXPR-like
constexpr handling where the TARGET_EXPR initializer (and cleanup) is
evaluated only once (because it might have side-effects like new or delete
expressions in it).
The problem arises if the TARGET_EXPR (but I guess in theory SAVE_EXPR too)
initializer is *non_constant_p.  We still remember the result, but no longer
the fact that it is *non_constant_p.  Normally that wouldn't be a big problem:
if something is *non_constant_p, we only ever OR into it and so the whole
expression will be non-constant too.  Except that in the builtins handling,
we try to evaluate the arguments with non_constant_p pointing into a dummy1
bool which we ignore.  This is because some builtins might fold into a
constant even if they don't have a constexpr argument.  Unfortunately if
we evaluate the TARGET_EXPR first in the argument of such a builtin and then
once again, we don't set *non_constant_p the second time.

So, either we don't remember the TARGET_EXPR/SAVE_EXPR result if it wasn't
constant, like the following patch does, or we could remember it, but in
some way that would make it clear that it is non-constant (e.g. by
pushing into the global->values SAVE_EXPR, SAVE_EXPR entry and perhaps
for TARGET_EXPR don't remember it on TARGET_EXPR_SLOT, but the TARGET_EXPR
itself and similarly push TARGET_EXPR, TARGET_EXPR and if we see those
after the lookup, diagnose + set *non_constant_p).  Or we could perhaps
during the builtin argument evaluation push expressions into a different
save_expr vec and undo them afterwards.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
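
The failure mode can be pictured with a toy evaluator that remembers results
by expression id.  This is only a sketch under simplifying assumptions, not
the constexpr.c code: ToyExpr, ToyEvaluator and the skip_non_constant switch
are invented for this example, and the "dummy" flag plays the role of the
ignored bool used during builtin argument evaluation.

```cpp
#include <map>

// Hypothetical expression: VALUE is what evaluation yields, CONSTANT says
// whether that evaluation counts as a constant expression.
struct ToyExpr
{
  int id;
  int value;
  bool constant;
};

struct ToyEvaluator
{
  std::map<int, int> cache;   // remembered results, keyed by expression id
  bool skip_non_constant;     // true models the behaviour after the patch

  int
  eval (const ToyExpr &e, bool *non_constant_p)
  {
    std::map<int, int>::const_iterator it = cache.find (e.id);
    if (it != cache.end ())
      return it->second;      // cache hit: the flag is not touched
    if (!e.constant)
      {
	*non_constant_p = true;
	if (skip_non_constant)
	  return e.value;     // do not remember a non-constant result
      }
    cache[e.id] = e.value;
    return e.value;
  }
};
```

With skip_non_constant false, a first evaluation into a dummy flag followed
by a second evaluation into the real flag leaves the real flag clear; with it
set to true the second evaluation sets the flag again.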

2020-03-03  Jakub Jelinek  

PR c++/93998
* constexpr.c (cxx_eval_constant_expression)
: Don't record anything if
*non_constant_p is true.

* g++.dg/ext/pr93998.C: New test.

--- gcc/cp/constexpr.c.jj   2020-02-27 09:28:46.227958669 +0100
+++ gcc/cp/constexpr.c  2020-03-02 18:29:38.014333067 +0100
@@ -5474,9 +5474,10 @@ cxx_eval_constant_expression (const cons
   r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1),
false,
non_constant_p, overflow_p);
-  if (!*non_constant_p)
-   /* Adjust the type of the result to the type of the temporary.  */
-   r = adjust_temp_type (TREE_TYPE (t), r);
+  if (*non_constant_p)
+   break;
+  /* Adjust the type of the result to the type of the temporary.  */
+  r = adjust_temp_type (TREE_TYPE (t), r);
   if (TARGET_EXPR_CLEANUP (t) && !CLEANUP_EH_ONLY (t))
ctx->global->cleanups->safe_push (TARGET_EXPR_CLEANUP (t));
   r = unshare_constructor (r);
@@ -5528,6 +5529,8 @@ cxx_eval_constant_expression (const cons
{
  r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0), false,
non_constant_p, overflow_p);
+ if (*non_constant_p)
+   break;
  ctx->global->values.put (t, r);
  if (ctx->save_exprs)
ctx->save_exprs->safe_push (t);
--- gcc/testsuite/g++.dg/ext/pr93998.C.jj   2020-03-02 18:40:14.843965039 
+0100
+++ gcc/testsuite/g++.dg/ext/pr93998.C  2020-03-02 18:39:27.486661682 +0100
@@ -0,0 +1,14 @@
+// PR c++/93998
+// { dg-do compile { target c++11 } }
+
+struct C
+{
+  constexpr bool operator== (C x) const noexcept { return v == x.v; }
+  int v;
+};
+
+int
+foo (const C a, const C b, bool c)
+{
+  return __builtin_expect (!!(a == b || c), 1) ? 0 : 1;
+}

Jakub

Re: [Ping][PATCH][Arm] ACLE intrinsics: AdvSIMD BFloat16 convert instructions

2020-03-03 Thread Dennis Zhang


Hi Kyrill

On 03/03/2020 09:39, Kyrill Tkachov wrote:

Hi Dennis,

On 3/2/20 5:41 PM, Dennis Zhang wrote:

Hi all,

On 17/01/2020 16:46, Dennis Zhang wrote:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It depends on Arm BFMode patch
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> This patch implements intrinsics to convert between bfloat16 and float32
> formats.
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regression tested.
>
> Is it OK for trunk please?



Ok.

Thanks,

Kyrill


Thanks for the approval.
It's pushed as 8e6d0dba166324f4b257329bd4b4ddc2b4522359.

Cheers
Dennis





>
> Thanks,
> Dennis
>
> gcc/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * config/arm/arm_bf16.h (vcvtah_f32_bf16, vcvth_bf16_f32): New.
>  * config/arm/arm_neon.h (vcvt_f32_bf16, vcvtq_low_f32_bf16): New.
>  (vcvtq_high_f32_bf16, vcvt_bf16_f32): New.
>  (vcvtq_low_bf16_f32, vcvtq_high_bf16_f32): New.
>  * config/arm/arm_neon_builtins.def (vbfcvt, vbfcvt_high): New entries.
>  (vbfcvtv4sf, vbfcvtv4sf_high): Likewise.
>  * config/arm/iterators.md (VBFCVT, VBFCVTM): New mode iterators.
>  (V_bf_low, V_bf_cvt_m): New mode attributes.
>  * config/arm/neon.md (neon_vbfcvtv4sf): New.
>  (neon_vbfcvtv4sf_highv8bf, neon_vbfcvtsf): New.
>  (neon_vbfcvt, neon_vbfcvt_highv8bf): New.
>  (neon_vbfcvtbf_cvtmode, neon_vbfcvtbf): New
>  * config/arm/unspecs.md (UNSPEC_BFCVT, UNSPEC_BFCVT_HIG): New.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-17  Dennis Zhang  
>
>  * gcc.target/arm/simd/bf16_cvt_1.c: New test.
>
>

The tests are updated in this patch for assembly test.
Rebased to trunk top.

Is it OK to commit please?

Cheers
Dennis

Re: [PATCH coroutines] Handle component_ref in captures_temporary

2020-03-03 Thread JunMa


On 2020/3/3 8:15 PM, Nathan Sidwell wrote:

On 3/3/20 12:42 AM, JunMa wrote:

On 2020/3/2 10:49 PM, Nathan Sidwell wrote:

On 2/12/20 2:23 AM, JunMa wrote:



Hi nathan

Here is the updated patch


This is ok, with a correction in a comment:
+  /* This isn't a temporary or argument.  */
   /* This isn't a temporary.  */
 is sufficient.  Otherwise it reads as 'this is neither a temporary 
nor an argument' which isn't the case.



Thanks, will check in later.

Regards
JunMa

nathan

devel/omp/gcc-9 branch (was: [wwwdocs] Document existence of openacc-gcc-9-branch)

2020-03-03 Thread Thomas Schwinge

Hi!

On 2019-06-04T23:05:53+0100, Julian Brown  wrote:
> I've pushed a new branch "openacc-gcc-9-branch" to the Git
> mirror (i.e. as a Git-only branch), for development of OpenACC and
> related functionality

(Later also for OpenMP support, in particular for AMD GCN offloading;
hence the decision to tag this "omp" instead of just "openacc" going
forward.)

> on top of the GCC 9 branch.

Based on the releases/gcc-9 branch point, I've now rebased/recreated this
for the new Git world, with only minor changes:

  - for easy cross-referencing, in the commit logs point to the original
Git-mirror og9 commits: "(cherry picked from openacc-gcc-9-branch
commit [...])"
  - use 'ChangeLog.omp' files
  - merge the "Add missing ChangeLog.openacc entry" commit into what it's
fixing up -- to avoid having to edit the commit log for the changed
'ChangeLog.omp' filename ;-)
  - fix a handful of "git-svn" commit author IDs, which the server-side
Git hooks would probably have rejected on push

Given that nobody objected to my suggestion in

"Git branch name to use for development branch based on release branch,
not master", I've pushed this as devel/omp/gcc-9, and pushed to
gcc-wwwdocs the attached commit 6b92a4c7033db33ed5b6827e826a561077ed2181
"Document devel/omp/gcc-9 branch".


Grüße
 Thomas


From 6b92a4c7033db33ed5b6827e826a561077ed2181 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 3 Mar 2020 14:37:36 +0100
Subject: [PATCH] Document devel/omp/gcc-9 branch

..., and retire openacc-gcc-9-branch.
---
 htdocs/git.html | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/htdocs/git.html b/htdocs/git.html
index 7fd22a9b..bec93ead 100644
--- a/htdocs/git.html
+++ b/htdocs/git.html
@@ -280,19 +280,16 @@ in Git.
   Makarov vmaka...@redhat.com.
   
 
-  https://gcc.gnu.org/wiki/OpenACC";>openacc-gcc-9-branch
-  This https://gcc.gnu.org/wiki/GitMirror";>Git-only branch is
-  used for collaborative development
-  of https://gcc.gnu.org/wiki/OpenACC";>OpenACC support and related
+  https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=shortlog;h=refs/heads/devel/omp/gcc-9";>devel/omp/gcc-9
+  This branch is for collaborative development of
+  https://gcc.gnu.org/wiki/OpenACC";>OpenACC and
+  https://gcc.gnu.org/wiki/openmp";>OpenMP support and related
   functionality, such
-  as https://gcc.gnu.org/wiki/Offloading";>offloading support.  The
-  branch is based on gcc-9-branch.  Find it
-  at git://gcc.gnu.org/git/gcc.git,
-  https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/openacc-gcc-9-branch>,
-  or
-  https://github.com/gcc-mirror/gcc/tree/openacc-gcc-9-branch>.
+  as https://gcc.gnu.org/wiki/Offloading";>offloading support (OMP:
+  offloading and multi processing).
+  The branch is based on releases/gcc-9.
   Please send patch emails with a short-hand [og9] tag in the
-  subject line, and use ChangeLog.openacc files.
+  subject line, and use ChangeLog.omp files.
 
   unified-autovect
   This branch is for work on improving effectiveness and generality of GCC's
@@ -943,10 +940,12 @@ merged.
 
   openacc-gcc-7-branch
   openacc-gcc-8-branch
+  openacc-gcc-9-branch
   These branches were used for development of
   https://gcc.gnu.org/wiki/OpenACC";>OpenACC support and related
-  functionality, based on gcc-7-branch and gcc-8-branch respectively.
-  Work is now proceeding on the openacc-gcc-9-branch.
+  functionality, based on gcc-7-branch, gcc-8-branch, and gcc-9-branch
+  respectively.
+  Work is now proceeding on the devel/omp/gcc-9 branch.
 
   hammer-3_3-branch
   The goal of this branch was to have a stable compiler based on GCC 3.3
-- 
2.25.1

openacc-gcc-9-branch rebased onto f47f687a97260b1a1305cbf2d7ee3d74b2916a74

git log --reverse --stat -p upstream--git-old/openacc-gcc-9-branch 
^upstream--git-old/gcc-9-branch

while read s x; do git cherry-pick -x "$s" && for f in ChangeLog.openacc 
*/ChangeLog.openacc */*/ChangeLog.openacc; do if test -f "$f"; then git mv "$f" 
"${f%%openacc}omp"; fi; done && git commit --amend -v; done < ~/tmp/tmp

-(cherry picked from commit [...]
+(cherry picked from openacc-gcc-9-branch commit [...]
+(cherry picked from openacc-gcc-9-branch commit [...], commit [...]

789c1d022a87 [PATCH] Forward -foffload=[...] from the driver (compile-time) to 
libgomp (run-time)
b7d99d5bc7b4 Async rework (v6)
dc9711235460 Host-to-device transfer coalescing & magic offset value 
self-documentation
e7f6d3b9a0b1 Factor out duplicate code in gimplify_scan_omp_clauses
1242d5e65813 OpenACC 2.6 manual deep copy support (att

Remove unnecessary XFAILs from existing testcase 20050603-3.c

2020-03-03 Thread will schmidt

Hi,
  The XFAILs in this testcase (20050603-3.c) are no longer necessary
  since the fix to PR68803 was committed with svn revision r242681.

OK for master?

Thanks
-Will

2020-03-03  Will Schmidt  

testsuite

* gcc.target/powerpc/20050603-3.c: Remove XFAILS.

diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
index 1884241..4017d34 100644
--- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
@@ -10,11 +10,10 @@ struct Q
 void rotins (unsigned int x)
 {
   b.y = (x<<12) | (x>>20);
 }
 
-/* The XFAILs are PR68803.  */
-/* { dg-final { scan-assembler-not {\mrlwinm} { xfail powerpc64le-*-* } } } */
+/* { dg-final { scan-assembler-not {\mrlwinm} } } */
 /* { dg-final { scan-assembler-not {\mrldic} } } */
 /* { dg-final { scan-assembler-not {\mrot[lr]} } } */
 /* { dg-final { scan-assembler-not {\ms[lr][wd]} } } */
-/* { dg-final { scan-assembler-times {\mrl[wd]imi} 1 { xfail powerpc64le-*-* } } } */
+/* { dg-final { scan-assembler-times {\mrl[wd]imi} 1 } } */

Re: [committed][ARM] Fix minor testsuite fallout on ARM due to recent IRA changes

2020-03-03 Thread Richard Earnshaw (lists)


On 02/03/2020 16:44, Jeff Law wrote:

On Mon, 2020-03-02 at 16:40 +, Richard Earnshaw (lists) wrote:

On 02/03/2020 15:46, Jeff Law wrote:

More minor fallout from Vlad's IRA changes.

Previously this test used r3 to hold a value across a call (it's an ipa-ra
test).  After Vlad's changes we're using r1 instead.

This patch makes the obvious change to the pattern we scan for, which should
bring the test back to a passing status.

There's a note about r3 being special on thumb1 and the pattern check is
skipped for thumb1.  That special casing may not be necessary anymore -- I
leave that to the ARM maintainers to resolve one way or the other.

Committing on the trunk momentarily.

jeff



Any of r1, r2, r3 could be chosen for the 'save' register, so why not
put that in the regexp?

Something like:

+/* { dg-final { scan-assembler-times "mov\tr[123], r0" 1 { target { ! arm_thumb1 } } } } */

And then we are future-proof.
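
As a rough illustration of what the widened character class buys, here is a small C++ sketch. It uses std::regex as a stand-in for the DejaGnu (Tcl) matcher, and the helper name count_save_moves and any sample assembly lines fed to it are made up for this example rather than taken from the test.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Count assembly lines that save r0 into one of r1, r2 or r3.
// Hypothetical helper for illustration; not part of the GCC testsuite.
int count_save_moves(const std::vector<std::string>& asm_lines)
{
  // Same pattern text as the suggested directive: a tab after the
  // mnemonic, then any one of r1, r2, r3 as the destination register.
  static const std::regex save_move("mov\tr[123], r0");
  int count = 0;
  for (const std::string& line : asm_lines)
    if (std::regex_search(line, save_move))
      ++count;
  return count;
}
```

With this pattern a save into r1 and a save into r3 both count as a match, while a save into r4 does not, so the check no longer depends on which of the three registers the allocator happens to pick.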

Seems reasonable.  I'll do that later today once the tester is finished with
its current run of arm-linux-gnueabi.

Any thoughts on the thumb1 issue?  I guess leaving it as-is just means slightly
less coverage for thumb1...

jeff



My reading of the comment (I haven't looked at the test output) is that 
this is something that the thumb1 backend doesn't support, probably
because the regs are all in CLASS_LIKELY_SPILLED.  I'm not sure if 
that's a hang-over from the old reload days, or if it's still relevant 
even today.


R.

Re: [PATCH] use all same precision in wide_int arguments (PR 93986)

2020-03-03 Thread Martin Sebor


On 3/3/20 2:42 AM, Richard Biener wrote:

On Tue, Mar 3, 2020 at 12:04 AM Martin Sebor  wrote:


The wide_int APIs expect operands to have the same precision and
abort when they don't.  This is especially insidious in code where
the operands normally do have the same precision but where mixed
precision arguments can come up as a result of unusual combinations of
optimization options.  That is also what precipitated pr93986.


If you want something like (signed) arbitrary precision arithmetic then you can
use widest_int instead.  Or, since you're working with offsets, offset_int
is another good choice.


Yes, I would much prefer not to have to do all this myself (and
risk getting it wrong).  Unfortunately, the APIs that obtain
the ranges all use wide_int, so I'd have to convert them one way
or the other.  I could change some of the APIs but not all of
them (e.g., get_range_info).

Martin





The attached patch adjusts the code to extend all wide_int operands
to the same precision to avoid the ICE.
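
To make the idea concrete, here is a toy C++ sketch of "extend both operands to a common precision before combining them". The type toy_wide_int and the functions below are hypothetical stand-ins invented for this illustration; they are not GCC's wide_int API and do not reflect the code in the attached patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy stand-in for a fixed-precision integer; NOT GCC's wide_int.
struct toy_wide_int
{
  int64_t value;       // payload, assumed to fit in `precision` bits
  unsigned precision;  // width in bits
};

// In this toy model, sign-extending to a wider precision keeps the value
// and only changes the recorded width.
toy_wide_int extend_to(toy_wide_int x, unsigned precision)
{
  assert(precision >= x.precision);
  return {x.value, precision};
}

// Models the rule described above: both operands must agree on precision.
toy_wide_int add_same_precision(toy_wide_int a, toy_wide_int b)
{
  assert(a.precision == b.precision);  // a mismatch is treated as a bug
  return {a.value + b.value, a.precision};
}

// Extend both operands to the wider of the two precisions, then add.
toy_wide_int add_mixed(toy_wide_int a, toy_wide_int b)
{
  unsigned prec = std::max(a.precision, b.precision);
  return add_same_precision(extend_to(a, prec), extend_to(b, prec));
}
```

Calling add_mixed with a 32-bit and a 64-bit operand yields a 64-bit result, whereas passing the same pair straight to add_same_precision would trip the assertion.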

Besides the usual bootstrap/testing I also compiled all string tests
in gcc.dg with the same options as in the test case in pr93986 in
an effort to weed out any lingering bugs like it (found none).

Martin

[PATCH] [rs6000] Fix a wrong GC issue

2020-03-03 Thread Bin Bin Lv

Hi,

The source file rs6000.c was split up into several smaller source files
through commit 1acf024.  However, the variables "altivec_builtin_mask_for_load"
and "builtin_mode_to_type[MAX_MACHINE_MODE][2]" were marked with the wrong
syntax "GTY(([options])) type name", so these two variables were not marked as
roots correctly and were wrongly GCed.  When "altivec_builtin_mask_for_load"
was wrongly GCed, compiling openJDK with precompiled headers enabled under
-mcpu=power7 failed with ICEs.  Roots must be declared using one of the
following syntaxes: "extern GTY(([options])) type name;" and "static
GTY(([options])) type name;".
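
The failure mode can be pictured with a toy collector in which an object survives only if a registered root variable points at it. This is a simplified sketch for illustration; toy_gc and its members are hypothetical and are not GCC's garbage collector or the tables gengtype generates.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy collector: an object survives only if a registered root points at it.
struct toy_gc
{
  std::set<int*> heap;       // every object allocated through this collector
  std::vector<int**> roots;  // addresses of variables registered as roots

  int* allocate(int v)
  {
    int* p = new int(v);
    heap.insert(p);
    return p;
  }

  void register_root(int** var) { roots.push_back(var); }

  // Free every object not referenced by a registered root and return how
  // many objects were freed.
  int collect()
  {
    std::set<int*> live;
    for (int** r : roots)
      if (*r)
        live.insert(*r);
    int freed = 0;
    for (auto it = heap.begin(); it != heap.end(); )
      if (!live.count(*it))
        {
          delete *it;
          it = heap.erase(it);
          ++freed;
        }
      else
        ++it;
    return freed;
  }

  ~toy_gc() { for (int* p : heap) delete p; }
};
```

A variable that still holds a pointer but was never registered is in the same position as a global whose GTY marker was not picked up: the collector frees the object and the variable is left dangling.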

And the following patch adds variable "altivec_builtin_mask_for_load" and
"builtin_mode_to_type[MAX_MACHINE_MODE][2]" into the roots array.

Bootstrap and regression tests were done on powerpc64le-linux-gnu (LE) with no
regressions.  Is it OK for trunk?

Thanks,
Bin Bin Lv

gcc/ChangeLog

2020-03-03  Bin Bin Lv  

* config/rs6000/rs6000-internal.h (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Remove GTY(()).
* config/rs6000/rs6000.h (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Add an extern GTY(())
declaration.
* config/rs6000/rs6000.h (MAX_MACHINE_MODE): Include the header file
for MAX_MACHINE_MODE.
* config/rs6000/rs6000.c (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Remove the GTY(())
declaration and add the definition.
---
 gcc/config/rs6000/rs6000-internal.h | 4 ++--
 gcc/config/rs6000/rs6000.c  | 4 ++--
 gcc/config/rs6000/rs6000.h  | 6 ++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-internal.h b/gcc/config/rs6000/rs6000-internal.h
index a23e956..202b2b2 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -187,7 +187,7 @@ extern bool rs6000_passes_long_double;
 extern bool rs6000_passes_vector;
 extern bool rs6000_returns_struct;
 extern bool cpu_builtin_p;
-extern GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
-extern GTY(()) tree altivec_builtin_mask_for_load;
+extern tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
+extern tree altivec_builtin_mask_for_load;
 
 #endif
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9910b27..0faf44b 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -99,7 +99,7 @@
 #endif
 
 /* Support targetm.vectorize.builtin_mask_for_load.  */
-GTY(()) tree altivec_builtin_mask_for_load;
+tree altivec_builtin_mask_for_load;
 
 #ifdef USING_ELFOS_H
 /* Counter for labels which are to be placed in .fixup.  */
@@ -196,7 +196,7 @@ enum reg_class rs6000_constraints[RS6000_CONSTRAINT_MAX];
 int rs6000_vector_align[NUM_MACHINE_MODES];
 
 /* Map selected modes to types for builtins.  */
-GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
+tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
 
 /* What modes to automatically generate reciprocal divide estimate (fre) and
reciprocal sqrt (frsqrte) for.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 1697186..3844bec 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -35,6 +35,10 @@
 #include "config/rs6000/rs6000-modes.h"
 #endif
 
+#ifndef MAX_MACHINE_MODE
+#include "insn-modes.h"
+#endif
+
 /* Definitions for the object file format.  These are set at
compile-time.  */
 
@@ -2488,6 +2492,8 @@ enum rs6000_builtin_type_index
 
 extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
 extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
+extern GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
+extern GTY(()) tree altivec_builtin_mask_for_load;
 
 #ifndef USED_FOR_TARGET
 /* A C structure for machine-specific, per-function data.
-- 
1.8.3.1

Add dg-require to existing powerpc/pr93122.c test

2020-03-03 Thread will schmidt

Add dg-require to existing pr93122.c test

Hi,
  This test (gcc.target/powerpc/pr93122.c) uses the
-mcpu=future option.  It should also ensure that the
target supports it.
Thus, add a dg-require-effective-target directive to indicate
that a future target must be supported on the platform.

Sniff tested successfully.  (mostly obvious).

OK for master?

Thanks
-Will


2020-03-03  Will Schmidt  

testsuite/

* gcc.target/powerpc/pr93122.c: Add dg-require.

diff --git a/gcc/testsuite/gcc.target/powerpc/pr93122.c b/gcc/testsuite/gcc.target/powerpc/pr93122.c
index 701711a..a440a05 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr93122.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr93122.c
@@ -1,6 +1,7 @@
 /* PR target/93122 */
+/* { dg-require-effective-target powerpc_future_ok } */
 /* { dg-do compile { target lp64 } } */
 /* { dg-options "-fstack-clash-protection -mprefixed -mfuture" } */
 
 void bar (char *);

Re: [PATCH][AARCH64] Fix for PR86901

2020-03-03 Thread Wilco Dijkstra

Hi Modi,

> The zero extract now matching against other modes would generate a test +
> branch rather than the combined instruction which led to the code size
> regression. I've updated the patch so that tbnz etc. matches GPI and that
> brings code size down to <0.2% in spec2017 and <0.4% in spec2006.

That's looking better indeed. I notice there are still differences, e.g.
tbz/tbnz counts are significantly different in perlbench, with ~350 missed
cases overall (mostly tbz reg, #7).

There are also more uses of uxtw, ubfiz, sbfiz - for example I see cases like 
this in namd:

  42c7dc:       13007400        sbfx    w0, w0, #0, #30
  42c7e0:       937c7c00        sbfiz   x0, x0, #4, #32

So it would be a good idea to check any benchmarks where there is still a
non-trivial codesize difference. You can get a quick idea of what is happening
by grepping for instructions like this:

grep -c sbfiz out1.txt out2.txt
out1.txt:872
out2.txt:934

grep -c tbnz out1.txt out2.txt
out1.txt:5189
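out2.txt:4989

> @Wilco I've gotten instruction on my side to set up an individual contributor's
> license for the time being. Can you send me the necessary documents to make
> that happen? Thanks!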

That's something you need to sort out with the fsf. There is a mailing list for 
this:
ass...@gnu.org.

Cheers,
Wilco

[committed] libgcc: arm: convert thumb1 code to unified syntax

2020-03-03 Thread Richard Earnshaw (lists)

Unified syntax has been the official syntax for thumb1 assembly for over 
10 years now.  It's time we made preparations for that becoming the 
default in the assembler.  But before we can start doing that we really 
need to clean up some laggards from the olden days.  Libgcc support for 
thumb1 is one such example.


This patch converts all of the legacy (disjoint) syntax that I could 
find over to unified code.  The identification was done by using a trick 
version of gas that defaulted to unified mode which then faults if 
legacy syntax is encountered.  The code produced was then compared 
against the old code to check for differences.  One such difference does 
exist, but that is because in unified syntax 'movs rd, rn' is encoded as 
'lsls rd, rn, #0', rather than 'adds rd, rn, #0'; but that is a 
deliberate change that was introduced because the lsls encoding more 
closely reflects the behaviour of 'movs' in arm state (where only some 
of the condition flags are modified).
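
The flag difference between the two encodings can be sketched with a toy model in C++. The struct and function names are invented for this illustration, and the flag behaviour shown is a simplified reading of the Thumb-1 rules (adding zero clears C and V, a zero-amount shift leaves C alone and never writes V), not an excerpt from the architecture manual.

```cpp
#include <cassert>
#include <cstdint>

// Condition flags of a toy Thumb-1-like core.
struct flags
{
  bool n, z, c, v;
};

// Models 'adds rd, rn, #0': all four flags are written.  Adding zero can
// never carry out or overflow, so C and V end up cleared.
uint32_t adds_imm0(uint32_t rn, flags& f)
{
  f.n = (rn >> 31) != 0;
  f.z = rn == 0;
  f.c = false;
  f.v = false;
  return rn;
}

// Models 'lsls rd, rn, #0': only N and Z are written.  A zero shift amount
// leaves C unchanged, and shifts do not write V.
uint32_t lsls_imm0(uint32_t rn, flags& f)
{
  f.n = (rn >> 31) != 0;
  f.z = rn == 0;
  return rn;
}
```

Starting from a state with C and V set, the adds form clears both while the lsls form preserves them, which is the partial flag update that 'movs' has in arm state.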


libgcc:
* config/arm/bpabi-v6m.S (aeabi_lcmp): Convert thumb1 code
to unified syntax.
(aeabi_ulcmp, aeabi_ldivmod, aeabi_uldivmod): Likewise.
(aeabi_frsub, aeabi_cfcmpeq, aeabi_fcmpeq): Likewise.
(aeabi_fcmp, aeabi_drsub, aeabi_cdrcmple): Likewise.
(aeabi_cdcmpeq, aeabi_dcmpeq, aeabi_dcmp): Likewise.
* config/arm/lib1funcs.S (Lend_fde): Convert thumb1 code
to unified syntax.
(divsi3, modsi3): Likewise.
(clzdi2, ctzsi2): Likewise.
* config/arm/libunwind.S (restore_core_regs): Convert
thumb1 code to unified syntax.
(UNWIND_WRAPPER): Likewise.

Committed to trunk.

R.
diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S
index 29fe8faa6e6..1a403efc872 100644
--- a/libgcc/config/arm/bpabi-v6m.S
+++ b/libgcc/config/arm/bpabi-v6m.S
@@ -39,21 +39,21 @@ FUNC_START aeabi_lcmp
 	cmp	xxh, yyh
 	beq	1f
 	bgt	2f
-	mov	r0, #1
-	neg	r0, r0
+	movs	r0, #1
+	negs	r0, r0
 	RET
 2:
-	mov	r0, #1
+	movs	r0, #1
 	RET
 1:
-	sub	r0, xxl, yyl
+	subs	r0, xxl, yyl
 	beq	1f
 	bhi	2f
-	mov	r0, #1
-	neg	r0, r0
+	movs	r0, #1
+	negs	r0, r0
 	RET
 2:
-	mov	r0, #1
+	movs	r0, #1
 1:
 	RET
 	FUNC_END aeabi_lcmp
@@ -65,15 +65,15 @@ FUNC_START aeabi_lcmp
 FUNC_START aeabi_ulcmp
 	cmp	xxh, yyh
 	bne	1f
-	sub	r0, xxl, yyl
+	subs	r0, xxl, yyl
 	beq	2f
 1:
 	bcs	1f
-	mov	r0, #1
-	neg	r0, r0
+	movs	r0, #1
+	negs	r0, r0
 	RET
 1:
-	mov	r0, #1
+	movs	r0, #1
 2:
 	RET
 	FUNC_END aeabi_ulcmp
@@ -91,29 +91,29 @@ FUNC_START aeabi_ulcmp
 	cmp	xxl, #0
 2:
 	beq	3f
-	mov	xxh, #0
-	mvn	xxh, xxh		@ 0x
-	mov	xxl, xxh
+	movs	xxh, #0
+	mvns	xxh, xxh		@ 0x
+	movs	xxl, xxh
 3:
 	.else
 	blt	6f
 	bgt	4f
 	cmp	xxl, #0
 	beq	5f
-4:	mov	xxl, #0
-	mvn	xxl, xxl		@ 0x
-	lsr	xxh, xxl, #1		@ 0x7fff
+4:	movs	xxl, #0
+	mvns	xxl, xxl		@ 0x
+	lsrs	xxh, xxl, #1		@ 0x7fff
 	b	5f
-6:	mov	xxh, #0x80
-	lsl	xxh, xxh, #24		@ 0x8000
-	mov	xxl, #0
+6:	movs	xxh, #0x80
+	lsls	xxh, xxh, #24		@ 0x8000
+	movs	xxl, #0
 5:
 	.endif
 	@ tailcalls are tricky on v6-m.
 	push	{r0, r1, r2}
 	ldr	r0, 1f
 	adr	r1, 1f
-	add	r0, r1
+	adds	r0, r1
 	str	r0, [sp, #8]
 	@ We know we are not on armv4t, so pop pc is safe.
 	pop	{r0, r1, pc}
@@ -128,15 +128,15 @@ FUNC_START aeabi_ulcmp
 FUNC_START aeabi_ldivmod
 	test_div_by_zero signed
 
-	push {r0, r1}
-	mov r0, sp
-	push {r0, lr}
-	ldr r0, [sp, #8]
-	bl SYM(__gnu_ldivmod_helper)
-	ldr r3, [sp, #4]
-	mov lr, r3
-	add sp, sp, #8
-	pop {r2, r3}
+	push	{r0, r1}
+	mov	r0, sp
+	push	{r0, lr}
+	ldr	r0, [sp, #8]
+	bl	SYM(__gnu_ldivmod_helper)
+	ldr	r3, [sp, #4]
+	mov	lr, r3
+	add	sp, sp, #8
+	pop	{r2, r3}
 	RET
 	FUNC_END aeabi_ldivmod
 
@@ -147,15 +147,15 @@ FUNC_START aeabi_ldivmod
 FUNC_START aeabi_uldivmod
 	test_div_by_zero unsigned
 
-	push {r0, r1}
-	mov r0, sp
-	push {r0, lr}
-	ldr r0, [sp, #8]
-	bl SYM(__udivmoddi4)
-	ldr r3, [sp, #4]
-	mov lr, r3
-	add sp, sp, #8
-	pop {r2, r3}
+	push	{r0, r1}
+	mov	r0, sp
+	push	{r0, lr}
+	ldr	r0, [sp, #8]
+	bl	SYM(__udivmoddi4)
+	ldr	r3, [sp, #4]
+	mov	lr, r3
+	add	sp, sp, #8
+	pop	{r2, r3}
 	RET
 	FUNC_END aeabi_uldivmod
 	
@@ -166,9 +166,9 @@ FUNC_START aeabi_uldivmod
 FUNC_START aeabi_frsub
 
   push	{r4, lr}
-  mov	r4, #1
-  lsl	r4, #31
-  eor	r0, r0, r4
+  movs	r4, #1
+  lsls	r4, #31
+  eors	r0, r0, r4
   bl	__aeabi_fadd
   pop	{r4, pc}
 
@@ -181,7 +181,7 @@ FUNC_START aeabi_frsub
 FUNC_START aeabi_cfrcmple
 
 	mov	ip, r0
-	mov	r0, r1
+	movs	r0, r1
 	mov	r1, ip
 	b	6f
 
@@ -196,8 +196,8 @@ FUNC_ALIAS aeabi_cfcmple aeabi_cfcmpeq
 	cmp	r0, #0
 	@ Clear the C flag if the return value was -1, indicating
 	@ that the first operand was smaller than the second.
-	bmi 1f
-	mov	r1, #0
+	bmi	1f
+	movs	r1, #0
 	cmn	r0, r1
 1:
 	pop	{r0, r1, r2, r3, r4, pc}
@@ -210,8 +210,8 @@ FUNC_START	aeabi_fcmpeq
 
 	push	{r4, lr}
 	bl	__eqsf2
-	neg	r0, r0
-	add	r0, r0, #1
+	negs	r0, r0
+	adds	r0, r0, #1
 	pop	{r4, pc}
 
 	FUNC_END aeabi_fcmpeq
@@ -223,10 +22

[PATCH] [rs6000] Rewrite the declaration of a variable

2020-03-03 Thread Bin Bin Lv

Hi,

Move the declaration of toc_section from the source file rs6000.c to its
header file rs6000.h to standardize the code.

Bootstrap and regression tests were done on powerpc64le-linux-gnu (LE) with no
regressions.  Is it OK for trunk?

Thanks,
Bin Bin Lv

gcc/ChangeLog

2020-03-03  Bin Bin Lv  

* config/rs6000/rs6000.c (toc_section): Remove its declaration.
* config/rs6000/rs6000.h (toc_section): Add its declaration.
---
 gcc/config/rs6000/rs6000.c | 1 -
 gcc/config/rs6000/rs6000.h | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0faf44b..c0a6e86 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -181,7 +181,6 @@ static GTY(()) section *tls_private_data_section;
 static GTY(()) section *read_only_private_data_section;
 static GTY(()) section *sdata2_section;
 
-extern GTY(()) section *toc_section;
 section *toc_section = 0;
 
 /* Describe the vector unit used for modes.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3844bec..e77a84a 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2494,6 +2494,7 @@ extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
 extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
 extern GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
 extern GTY(()) tree altivec_builtin_mask_for_load;
+extern union GTY(()) section *toc_section;
 
 #ifndef USED_FOR_TARGET
 /* A C structure for machine-specific, per-function data.
-- 
1.8.3.1

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-03-03 Thread Delia Burduv


Hi,

I made a mistake in the previous patch. This is the latest version. 
Please let me know if it is ok.


Thanks,
Delia

On 2/21/20 3:18 PM, Delia Burduv wrote:

Hi Kyrill,

The arm_bf16.h is only used for scalar operations. That is how the 
aarch64 versions are implemented too.


Thanks,
Delia

On 2/21/20 2:06 PM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst{q}_bf16 as part of the BFloat16 extension.
>> 
>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>
>> The intrinsics are declared in arm_neon.h.
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patch.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>> have commit rights, so if this is ok can someone please commit it for me?
>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>  (bfloat16x4x2_t): New typedef.
>>  (bfloat16x8x2_t): New typedef.
>>  (bfloat16x4x3_t): New typedef.
>>  (bfloat16x8x3_t): New typedef.
>>  (bfloat16x4x4_t): New typedef.
>>  (bfloat16x8x4_t): New typedef.
>>  (vst2_bf16): New.
>>  (vst2q_bf16): New.
>>  (vst3_bf16): New.
>>  (vst3q_bf16): New.
>>  (vst4_bf16): New.
>>  (vst4q_bf16): New.
>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>  (VAR13): New.
>>  (arm_simd_types[Bfloat16x2_t]):New type.
>>  * config/arm/arm-modes.def (V2BF): New mode.
>>  * config/arm/arm-simd-builtin-types.def
>>  (Bfloat16x2_t): New entry.
>>  * config/arm/arm_neon_builtins.def
>>  (vst2): Changed to VAR13 and added v4bf, v8bf
>>  (vst3): Changed to VAR13 and added v4bf, v8bf
>>  (vst4): Changed to VAR13 and added v4bf, v8bf
>>  * config/arm/iterators.md (VDXBF): New iterator.
>>  (VQ2BF): New iterator.
>>  (V_elem): Added V4BF, V8BF.
>>  (V_sz_elem): Added V4BF, V8BF.
>>  (V_mode_nunits): Added V4BF, V8BF.
>>  (q): Added V4BF, V8BF.
>>  *config/arm/neon.md (vst2): Used new iterators.
>>  (vst3): Used new iterators.
>>  (vst3qa): Used new iterators.
>>  (vst3qb): Used new iterators.
>>  (vst4): Used new iterators.
>>  (vst4qa): Used new iterators.
>>  (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * gcc.target/arm/simd/bf16_vstn_1.c: New test.


One thing I just noticed in this and the other arm bfloat16 patches...

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,

    return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;


These should be in a new arm_bf16.h file that gets included in the 
main arm_neon.h file, right?

I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


  +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-03-03 Thread Delia Burduv


Sorry, I forgot the attachment.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 7f279cca6688c6f11948159666ee647ae533c61d..44c6f46fd63d5eaa1c3c84340d9acd017bb663e4 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -318,6 +318,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v4bf_UP  E_V4BFmode
 #define v2si_UP  E_V2SImode
 #define v2sf_UP  E_V2SFmode
+#define v2bf_UP  E_V2BFmode
 #define di_UP    E_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP  E_V8HImode
@@ -381,6 +382,9 @@ typedef struct {
 #define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
+#define VAR13(T, N, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR12 (T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR1 (T, N, M)
 
 /* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def

[PATCH 2/4] libstdc++: Add a move-only testsuite iterator type

2020-03-03 Thread Patrick Palka

This adds a move-only testsuite iterator type to <testsuite_iterators.h>, which
will be used in the tests that verify LWG 3355 and is already needed in
the tests for LWG 3389 and 3390.
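
For readers unfamiliar with the idea, a move-only iterator can be sketched in a few lines of standalone C++. The type move_only_int_iterator and the helper first_then_second are hypothetical names made up for this illustration; they are much simpler than the testsuite wrapper added by this patch and use plain type traits instead of the C++20 iterator concepts.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical minimal move-only input iterator over an int array.
struct move_only_int_iterator
{
  const int* ptr = nullptr;

  move_only_int_iterator() = default;
  explicit move_only_int_iterator(const int* p) : ptr(p) { }

  // Copying is disabled; only moves are allowed.
  move_only_int_iterator(const move_only_int_iterator&) = delete;
  move_only_int_iterator& operator=(const move_only_int_iterator&) = delete;
  move_only_int_iterator(move_only_int_iterator&&) = default;
  move_only_int_iterator& operator=(move_only_int_iterator&&) = default;

  const int& operator*() const { return *ptr; }
  move_only_int_iterator& operator++() { ++ptr; return *this; }
  void operator++(int) { ++ptr; }
};

static_assert(!std::is_copy_constructible<move_only_int_iterator>::value,
              "the iterator must not be copyable");
static_assert(std::is_move_constructible<move_only_int_iterator>::value,
              "the iterator must be movable");

// Read the first element, advance, then pass the iterator on by move and
// read the second element through the moved-to object.
inline int first_then_second(const int* data)
{
  move_only_int_iterator it(data);
  int sum = *it;
  ++it;
  move_only_int_iterator moved = std::move(it);
  return sum + *moved;
}
```

Library code that accidentally copies its iterator argument fails to compile against such a type, which is what makes it useful for checking the LWG resolutions mentioned above.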

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_iterators.h (input_iterator_wrapper_nocopy):
New testsuite iterator.
* testsuite/24_iterators/counted_iterator/lwg3389.cc: Use it.
* testsuite/24_iterators/move_iterator/lwg3390.cc: Likewise.
---
 .../24_iterators/counted_iterator/lwg3389.cc  | 35 ++-
 .../24_iterators/move_iterator/lwg3390.cc | 35 ++-
 .../testsuite/util/testsuite_iterators.h  | 28 +++
 3 files changed, 32 insertions(+), 66 deletions(-)

diff --git a/libstdc++-v3/testsuite/24_iterators/counted_iterator/lwg3389.cc b/libstdc++-v3/testsuite/24_iterators/counted_iterator/lwg3389.cc
index cf74fd47bec..8b9bf53f6c5 100644
--- a/libstdc++-v3/testsuite/24_iterators/counted_iterator/lwg3389.cc
+++ b/libstdc++-v3/testsuite/24_iterators/counted_iterator/lwg3389.cc
@@ -23,44 +23,13 @@
 #include 
 
 using __gnu_test::test_range;
-using __gnu_test::input_iterator_wrapper;
-
-template
-struct move_only_wrapper : input_iterator_wrapper
-{
-  using input_iterator_wrapper::input_iterator_wrapper;
-
-  move_only_wrapper()
-: input_iterator_wrapper(nullptr, nullptr)
-  { }
-
-  move_only_wrapper(const move_only_wrapper&) = delete;
-  move_only_wrapper&
-  operator=(const move_only_wrapper&) = delete;
-
-  move_only_wrapper(move_only_wrapper&&) = default;
-  move_only_wrapper&
-  operator=(move_only_wrapper&&) = default;
-
-  using input_iterator_wrapper::operator++;
-
-  move_only_wrapper&
-  operator++()
-  {
-input_iterator_wrapper::operator++();
-return *this;
-  }
-};
-
-static_assert(std::input_iterator>);
-static_assert(!std::forward_iterator>);
-static_assert(!std::copyable>);
+using __gnu_test::input_iterator_wrapper_nocopy;
 
 // LWG 3389
 void
 test01()
 {
   int x[] = {1,2,3,4};
-  test_range rx(x);
+  test_range rx(x);
   auto it = std::counted_iterator(rx.begin(), 2);
 }
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3390.cc b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3390.cc
index 1df7caccece..7e9f4a0d0cc 100644
--- a/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3390.cc
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3390.cc
@@ -23,44 +23,13 @@
 #include 
 
 using __gnu_test::test_range;
-using __gnu_test::input_iterator_wrapper;
-
-template
-struct move_only_wrapper : input_iterator_wrapper
-{
-  using input_iterator_wrapper::input_iterator_wrapper;
-
-  move_only_wrapper()
-: input_iterator_wrapper(nullptr, nullptr)
-  { }
-
-  move_only_wrapper(const move_only_wrapper&) = delete;
-  move_only_wrapper&
-  operator=(const move_only_wrapper&) = delete;
-
-  move_only_wrapper(move_only_wrapper&&) = default;
-  move_only_wrapper&
-  operator=(move_only_wrapper&&) = default;
-
-  using input_iterator_wrapper::operator++;
-
-  move_only_wrapper&
-  operator++()
-  {
-input_iterator_wrapper::operator++();
-return *this;
-  }
-};
-
-static_assert(std::input_iterator>);
-static_assert(!std::forward_iterator>);
-static_assert(!std::copyable>);
+using __gnu_test::input_iterator_wrapper_nocopy;
 
 // LWG 3390
 void
 test01()
 {
   int x[] = {1,2,3,4};
-  test_range rx(x);
+  test_range rx(x);
   auto it = std::make_move_iterator(rx.begin());
 }
diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index 417dff23c50..e47b2b03e40 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -674,6 +674,34 @@ namespace __gnu_test
   { return iter -= n; }
 };
 
+  // A move-only input iterator type.
+  template
+struct input_iterator_wrapper_nocopy : input_iterator_wrapper
+{
+  using input_iterator_wrapper::input_iterator_wrapper;
+
+  input_iterator_wrapper_nocopy()
+   : input_iterator_wrapper(nullptr, nullptr)
+  { }
+
+  input_iterator_wrapper_nocopy(const input_iterator_wrapper_nocopy&) = delete;
+  input_iterator_wrapper_nocopy&
+  operator=(const input_iterator_wrapper_nocopy&) = delete;
+
+  input_iterator_wrapper_nocopy(input_iterator_wrapper_nocopy&&) = default;
+  input_iterator_wrapper_nocopy&
+  operator=(input_iterator_wrapper_nocopy&&) = default;
+
+  using input_iterator_wrapper::operator++;
+
+  input_iterator_wrapper_nocopy&
+  operator++()
+  {
+   input_iterator_wrapper::operator++();
+   return *this;
+  }
+};
+
   // A type meeting the minimum std::range requirements
   template class Iter>
 class test_range
-- 
2.25.1.377.g2d2118b814

[PATCH 1/4] libstdc++: Fix use of is_nothrow_assignable_v in <bits/ranges_uninitialized.h>

2020-03-03 Thread Patrick Palka

We are passing a value type as the first argument to is_nothrow_assignable_v,
but the result of that is always false.  Since this predicate is a part of the
condition that guards the corresponding optimizations for these algorithms, this
bug means these optimizations are never used.  We should be passing a reference
type to is_nothrow_assignable_v instead.
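
The effect is easy to reproduce with a scalar type in a few lines of standalone C++. The helper can_use_fast_path is a hypothetical name used only in this sketch; it mimics the shape of the guard condition but is not the library's code. The `::value` spelling is used so the snippet also builds before C++17; it is equivalent to the `_v` form.

```cpp
#include <type_traits>

// With a value type as the first argument the trait asks whether a prvalue
// int can be assigned to, which is ill-formed, so the result is false.
static_assert(!std::is_nothrow_assignable<int, int>::value,
              "value type on the left: not assignable");

// With an lvalue reference the trait models a real assignment target.
static_assert(std::is_nothrow_assignable<int&, int>::value,
              "reference type on the left: assignable and noexcept");
static_assert(std::is_nothrow_assignable<int&, const int&>::value,
              "assignment from a const lvalue also works");

// Hypothetical guard in the spirit of the fixed condition: the output type
// is trivial and an lvalue of it can be assigned from In without throwing.
template<typename Out, typename In>
constexpr bool can_use_fast_path()
{
  return std::is_trivial<Out>::value
         && std::is_nothrow_assignable<Out&, In>::value;
}
```

For example, can_use_fast_path<int, int>() is true, whereas the same check written with Out instead of Out& would be false for every scalar output type.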

libstdc++-v3/ChangeLog:

* include/bits/ranges_uninitialized.h
(uninitialized_copy_fn::operator()): Pass a reference type as the first
argument to is_nothrow_assignable_v.
(uninitialized_copy_fn::operator()): Likewise.
(uninitialized_move_fn::operator()): Likewise.  Return an in_out_result
with the input iterator stripped of its move_iterator.
(uninitialized_move_n_fn::operator()): Likewise.
(uninitialized_fill_fn::operator()): Pass a reference type as the first
argument to is_nothrow_assignable_v.
(uninitialized_fill_n_fn::operator()): Likewise.
---
 .../include/bits/ranges_uninitialized.h   | 24 +++
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_uninitialized.h b/libstdc++-v3/include/bits/ranges_uninitialized.h
index 01e1cad646c..f97a07a9b4a 100644
--- a/libstdc++-v3/include/bits/ranges_uninitialized.h
+++ b/libstdc++-v3/include/bits/ranges_uninitialized.h
@@ -269,7 +269,7 @@ namespace ranges
if constexpr (sized_sentinel_for<_ISent, _Iter>
  && sized_sentinel_for<_OSent, _Out>
  && is_trivial_v<_OutType>
- && is_nothrow_assignable_v<_OutType,
+ && is_nothrow_assignable_v<_OutType&,
 iter_reference_t<_Iter>>)
  {
auto __d1 = ranges::distance(__ifirst, __ilast);
@@ -316,7 +316,7 @@ namespace ranges
using _OutType = remove_reference_t>;
if constexpr (sized_sentinel_for<_Sent, _Out>
  && is_trivial_v<_OutType>
- && is_nothrow_assignable_v<_OutType,
+ && is_nothrow_assignable_v<_OutType&,
 iter_reference_t<_Iter>>)
  {
auto __d = ranges::distance(__ofirst, __olast);
@@ -354,13 +354,15 @@ namespace ranges
if constexpr (sized_sentinel_for<_ISent, _Iter>
  && sized_sentinel_for<_OSent, _Out>
  && is_trivial_v<_OutType>
- && is_nothrow_assignable_v<_OutType,
+ && is_nothrow_assignable_v<_OutType&,
 iter_rvalue_reference_t<_Iter>>)
  {
auto __d1 = ranges::distance(__ifirst, __ilast);
auto __d2 = ranges::distance(__ofirst, __olast);
-   return ranges::copy_n(std::make_move_iterator(__ifirst),
- std::min(__d1, __d2), __ofirst);
+   auto [__in, __out]
+ = ranges::copy_n(std::make_move_iterator(__ifirst),
+  std::min(__d1, __d2), __ofirst);
+   return {std::move(__in).base(), __out};
  }
else
  {
@@ -404,12 +406,14 @@ namespace ranges
using _OutType = remove_reference_t>;
if constexpr (sized_sentinel_for<_Sent, _Out>
  && is_trivial_v<_OutType>
- && is_nothrow_assignable_v<_OutType,
+ && is_nothrow_assignable_v<_OutType&,
 iter_rvalue_reference_t<_Iter>>)
  {
auto __d = ranges::distance(__ofirst, __olast);
-   return ranges::copy_n(std::make_move_iterator(__ifirst),
- std::min(__n, __d), __ofirst);
+   auto [__in, __out]
+ = ranges::copy_n(std::make_move_iterator(__ifirst),
+  std::min(__n, __d), __ofirst);
+   return {std::move(__in).base(), __out};
  }
else
  {
@@ -436,7 +440,7 @@ namespace ranges
   {
using _ValueType = remove_reference_t>;
if constexpr (is_trivial_v<_ValueType>
- && is_nothrow_assignable_v<_ValueType, const _Tp&>)
+ && is_nothrow_assignable_v<_ValueType&, const _Tp&>)
  return ranges::fill(__first, __last, __x);
else
  {
@@ -469,7 +473,7 @@ namespace ranges
   {
using _ValueType = remove_reference_t>;
if constexpr (is_trivial_v<_ValueType>
- && is_nothrow_assignable_v<_ValueType, const _Tp&>)
+ && is_nothrow_assignable_v<_ValueType&, const _Tp&>)
  return ranges::fill_n(__first, __n, __x);
else
  {
-- 
2.25.1.377.g2d2118b814

[PATCH 3/4] libstdc++: Add a test range type that has a sized sentinel

2020-03-03 Thread Patrick Palka

This adds a test range type whose end() is a sized sentinel to
<testsuite_iterators.h>, which will be used in the tests that verify LWG 3355.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_iterators.h (test_range::get_iterator): Make
protected instead of private.
(test_sized_range_sized_sent): New.
---
 .../testsuite/util/testsuite_iterators.h  | 32 +++
 1 file changed, 32 insertions(+)

diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index e47b2b03e40..756940ed092 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -735,6 +735,7 @@ namespace __gnu_test
  { return i.ptr - s.end; }
};
 
+protected:
   auto
   get_iterator(T* p)
   {
@@ -812,6 +813,37 @@ namespace __gnu_test
 using test_output_sized_range
   = test_sized_range;
 
+  // A type meeting the minimum std::sized_range requirements, and whose end()
+  // returns a size sentinel.
+  template class Iter>
+struct test_sized_range_sized_sent : test_sized_range
+{
+  using test_sized_range::test_sized_range;
+
+  template
+   struct sentinel
+   {
+ T* end;
+
+ friend bool operator==(const sentinel& s, const I& i) noexcept
+ { return s.end == i.ptr; }
+
+ friend std::iter_difference_t
+ operator-(const sentinel& s, const I& i) noexcept
+ { return s.end - i.ptr; }
+
+ friend std::iter_difference_t
+ operator-(const I& i, const sentinel& s) noexcept
+ { return i.ptr - s.end; }
+   };
+
+  auto end() &
+  {
+   using I = decltype(this->get_iterator(this->bounds.last));
+   return sentinel{this->bounds.last};
+  }
+};
+
 // test_range and test_sized_range do not own their elements, so they model
 // std::ranges::borrowed_range.  This file does not define specializations of
 // std::ranges::enable_borrowed_range, so that individual tests can decide
-- 
2.25.1.377.g2d2118b814

[PATCH 4/4] libstdc++: Move-only input iterator support in algorithms (LWG 3355)

2020-03-03 Thread Patrick Palka

This adds support for move-only input iterators in the ranges::uninitialized_*
algorithms defined in <memory>, as per LWG 3355.  The only changes needed are to
add calls to std::move in the appropriate places and to use operator-() instead
of ranges::distance() because the latter cannot be used with a move-only
iterator with a sized sentinel as is the case here.  (This issue with
ranges::distance is LWG 3392.)
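
To make the std::move part of this concrete, here is a minimal sketch.  It is
not the libstdc++ code: move_only_iter, copy_result, copy_some and
demo_move_only_copy are hypothetical names used only for illustration.  It
shows that an iterator whose copy operations are deleted can still be passed
by value and handed back in a result struct, provided every transfer is a move:

```cpp
#include <utility>

// Hypothetical move-only input iterator over ints (copying is deleted).
struct move_only_iter
{
  const int* ptr = nullptr;

  move_only_iter() = default;
  constexpr explicit move_only_iter(const int* p) : ptr(p) { }

  move_only_iter(const move_only_iter&) = delete;
  move_only_iter& operator=(const move_only_iter&) = delete;
  move_only_iter(move_only_iter&&) = default;
  move_only_iter& operator=(move_only_iter&&) = default;

  constexpr int operator*() const { return *ptr; }
  constexpr move_only_iter& operator++() { ++ptr; return *this; }
};

// Hypothetical stand-in for an in_out_result-style return type.
struct copy_result
{
  move_only_iter in;
  int* out;
};

// Copies n ints from first to out and returns both advanced positions.
constexpr copy_result copy_some(move_only_iter first, int n, int* out)
{
  for (; n > 0; --n, ++first, ++out)
    *out = *first;
  // `return {first, out};` would try to copy `first` and fail to compile.
  return {std::move(first), out};
}

// Small self-check: copies three ints and verifies both end positions.
constexpr bool demo_move_only_copy()
{
  int src[3] = {1, 2, 3};
  int dst[3] = {0, 0, 0};
  copy_result r = copy_some(move_only_iter(src), 3, dst);
  return dst[0] == 1 && dst[1] == 2 && dst[2] == 3
         && r.in.ptr == src + 3 && r.out == dst + 3;
}
```

The same reasoning applies wherever the iterator is forwarded to another
function: the argument has to be an rvalue, since no copy is available.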

libstdc++-v3/ChangeLog:

LWG 3355 The memory algorithms should support move-only input iterators
introduced by P1207
* include/bits/ranges_uninitialized.h
(__uninitialized_copy_fn::operator()): Use std::move to avoid attempting
to copy __ifirst, which could be a move-only input iterator.  Use
operator- instead of ranges::distance to compute distance from a sized
sentinel.
(__uninitialized_copy_n_fn::operator()): Likewise.
(__uninitialized_move_fn::operator()): Likewise.
(__uninitialized_move_n_fn::operator()): Likewise.
(__uninitialized_destroy_fn::operator()): Use std::move to avoid
attempting to copy __first.
(__uninitialized_destroy_n_fn::operator()): Likewise.
* testsuite/20_util/specialized_algorithms/destroy/constrained.cc:
Augment test.
* .../specialized_algorithms/uninitialized_copy/constrained.cc:
Likewise.
* .../specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.
---
 .../include/bits/ranges_uninitialized.h   | 34 ++-
 .../destroy/constrained.cc| 15 
 .../uninitialized_copy/constrained.cc | 25 ++
 .../uninitialized_move/constrained.cc | 25 ++
 4 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_uninitialized.h b/libstdc++-v3/include/bits/ranges_uninitialized.h
index f97a07a9b4a..d758078fc03 100644
--- a/libstdc++-v3/include/bits/ranges_uninitialized.h
+++ b/libstdc++-v3/include/bits/ranges_uninitialized.h
@@ -272,9 +272,10 @@ namespace ranges
  && is_nothrow_assignable_v<_OutType&,
 iter_reference_t<_Iter>>)
  {
-   auto __d1 = ranges::distance(__ifirst, __ilast);
-   auto __d2 = ranges::distance(__ofirst, __olast);
-   return ranges::copy_n(__ifirst, std::min(__d1, __d2), __ofirst);
+   auto __d1 = __ilast - __ifirst;
+   auto __d2 = __olast - __ofirst;
+   return ranges::copy_n(std::move(__ifirst), std::min(__d1, __d2),
+ __ofirst);
  }
else
  {
@@ -283,7 +284,7 @@ namespace ranges
 ++__ofirst, (void)++__ifirst)
  ::new (__detail::__voidify(*__ofirst)) _OutType(*__ifirst);
__guard.release();
-   return {__ifirst, __ofirst};
+   return {std::move(__ifirst), __ofirst};
  }
   }
 
@@ -319,8 +320,9 @@ namespace ranges
  && is_nothrow_assignable_v<_OutType&,
 iter_reference_t<_Iter>>)
  {
-   auto __d = ranges::distance(__ofirst, __olast);
-   return ranges::copy_n(__ifirst, std::min(__n, __d), __ofirst);
+   auto __d = __olast - __ofirst;
+   return ranges::copy_n(std::move(__ifirst), std::min(__n, __d),
+ __ofirst);
  }
else
  {
@@ -329,7 +331,7 @@ namespace ranges
 ++__ofirst, (void)++__ifirst, (void)--__n)
  ::new (__detail::__voidify(*__ofirst)) _OutType(*__ifirst);
__guard.release();
-   return {__ifirst, __ofirst};
+   return {std::move(__ifirst), __ofirst};
  }
   }
   };
@@ -357,10 +359,10 @@ namespace ranges
  && is_nothrow_assignable_v<_OutType&,
 iter_rvalue_reference_t<_Iter>>)
  {
-   auto __d1 = ranges::distance(__ifirst, __ilast);
-   auto __d2 = ranges::distance(__ofirst, __olast);
+   auto __d1 = __ilast - __ifirst;
+   auto __d2 = __olast - __ofirst;
auto [__in, __out]
- = ranges::copy_n(std::make_move_iterator(__ifirst),
+ = ranges::copy_n(std::make_move_iterator(std::move(__ifirst)),
   std::min(__d1, __d2), __ofirst);
return {std::move(__in).base(), __out};
  }
@@ -372,7 +374,7 @@ namespace ranges
  ::new (__detail::__voidify(*__ofirst))
_OutType(ranges::iter_move(__ifirst));
__guard.release();
-   return {__ifirst, __ofirst};
+   return {std::move(__ifirst), __ofirst};
  }
   }
 
@@ -409,9 +411,9 @@ namespace ranges
  && is_nothrow_assignable_v<_OutType&,
 
iter_rvalue_reference_t<_Ite

[PATCH v2 3/3] Keep .GCC.command.line sections of LTO objects.

2020-03-03 Thread Egeyar Bagcioglu

This patch is for .GCC.command.line sections in LTO objects to be copied
into the final objects as in the following example:

[egeyar@localhost lto]$ gcc -flto -O3 demo.c -c -g --record-gcc-command-line
[egeyar@localhost lto]$ gcc -flto -O2 demo2.c -c -g --record-gcc-command-line -DFORTIFY=2
[egeyar@localhost lto]$ gcc demo.o demo2.o -o a.out  
[egeyar@localhost lto]$ readelf -p .GCC.command.line a.out 

String dump of section '.GCC.command.line':
  [ 0]  10.0.1 20200227 (experimental) : gcc -flto -O3 demo.c -c -g --record-gcc-command-line
  [56]  10.0.1 20200227 (experimental) : gcc -flto -O2 demo2.c -c -g --record-gcc-command-line -DFORTIFY=2

Regards
Egeyar

libiberty:
2020-02-27  Egeyar Bagcioglu  

* simple-object.c (handle_lto_debug_sections): Name
".GCC.command.line" among debug sections to be copied over
from lto objects.
---
 libiberty/simple-object.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libiberty/simple-object.c b/libiberty/simple-object.c
index d9c648a..3b3ca9c 100644
--- a/libiberty/simple-object.c
+++ b/libiberty/simple-object.c
@@ -298,6 +298,9 @@ handle_lto_debug_sections (const char *name, int rename)
  COMDAT sections in objects produced by GCC.  */
   else if (strcmp (name, ".comment") == 0)
 return strcpy (newname, name);
+  /* Copy over .GCC.command.line section under the same name if present.  */
+  else if (strcmp (name, ".GCC.command.line") == 0)
+return strcpy (newname, name);
   free (newname);
   return NULL;
 }
-- 
1.8.3.1

[PATCH v2 0/3] Introduce a new GCC option, --record-gcc-command-line

2020-03-03 Thread Egeyar Bagcioglu

Hello,

I would like to propose the second version of the patches which introduce a 
compile option --record-gcc-command-line. When passed to gcc, it saves the 
command line invoking gcc into the produced object file. The option makes it 
trivial to trace back with which command a file was compiled and by which 
version of gcc. It helps with debugging, reproducing bugs and repeating the 
build process.

The reviews addressed in this version include indentation changes, error 
handling and corner case coverage pointed out by Segher Boessenkool in the 
first two patches; while the new third patch is another corner case (lto) 
coverage requested by Martin Liska.

Although we discussed after the submission of the first version that there are 
several other options performing similar tasks, I believe we established that 
there is still a need for this specific functionality. Therefore, I am skipping 
in this email the comparison between this option and the existing options with 
similarities.

This functionality works as follows: the driver saves gcc's argv into a 
temporary file and passes --record-gcc-command-line, followed by the name of 
that file, to cc1 or cc1plus. The functionality of the backend is implemented 
via a hook. This patch includes an example implementation of the hook for elf 
targets: the elf_record_gcc_command_line function. This function reads the 
given file and writes gcc's version and the command line into a mergeable 
string section, .GCC.command.line. It creates one entry per invocation. By 
doing so, it makes it clear which options were used together in a single gcc 
invocation, even after linking.
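
To show what "one entry per invocation" amounts to, here is a small sketch.
It is not the code from the patch: make_command_line_entry is a hypothetical
helper, and the "version : arguments" layout is only inferred from the readelf
output shown in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the single string recorded for one driver
// invocation, laid out as "<version> : <arg0> <arg1> ...".
inline std::string
make_command_line_entry(const std::string& version,
                        const std::vector<std::string>& args)
{
  std::string entry = version + " :";
  for (const std::string& arg : args)
    {
      // Arguments are appended verbatim, without any option processing.
      entry += ' ';
      entry += arg;
    }
  return entry;
}
```

For example, calling the helper with the version string
"10.0.1 20200227 (experimental)" and the arguments gcc, main.c and
--record-gcc-command-line yields the same text as the first readelf dump in
this email.  Keeping a whole invocation in one string is what lets the options
stay grouped after the mergeable sections are combined at link time.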

Here is an *example usage* of the option:
[egeyar@localhost save-commandline]$ gcc main.c --record-gcc-command-line
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line a.out

String dump of section '.GCC.command.line':
  [ 0]  10.0.1 20200227 (experimental) : gcc main.c --record-gcc-command-line


The following is a *second example* calling g++ with -save-temps and a 
repetition of options, where -save-temps saves the intermediate file, 
main.cmdline in this case. You can see that the options are recorded 
unprocessed:

[egeyar@localhost save-commandline]$ g++ main.c -save-temps --record-gcc-command-line -O0 -O2 -O3 -DFORTIFY=2 --record-gcc-command-line
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line a.out

String dump of section '.GCC.command.line':
  [ 0]  10.0.1 20200227 (experimental) : g++ main.c -save-temps --record-gcc-command-line -O0 -O2 -O3 -DFORTIFY=2 --record-gcc-command-line


The first patch of this three-patch series only extends the testsuite 
machinery, while the second patch implements the functionality and adds a test 
case for it. The third patch, which alters libiberty, makes sure that the 
.GCC.command.line sections in LTO objects survive linking and appear in the 
linked object.

In addition to the new test case, I built binutils as my test case after 
passing this option to CFLAGS. The added .GCC.command.line section of ld.bfd 
listed many compile commands as expected. Tested on x86_64-pc-linux-gnu.

Please review the patches, let me know what you think and apply if appropriate.

Regards
Egeyar

Egeyar Bagcioglu (3):
  Introduce dg-require-target-object-format
  Introduce the gcc option --record-gcc-command-line
  Keep .GCC.command.line sections of LTO objects.

 gcc/common.opt |  4 +++
 gcc/config/elfos.h |  5 +++
 gcc/doc/tm.texi| 22 
 gcc/doc/tm.texi.in |  4 +++
 gcc/gcc.c  | 41 ++
 gcc/gcc.h  |  1 +
 gcc/target.def | 30 
 gcc/target.h   |  3 ++
 .../c-c++-common/record-gcc-command-line.c |  8 +
 gcc/testsuite/lib/target-supports-dg.exp   | 11 ++
 gcc/toplev.c   | 13 +++
 gcc/varasm.c   | 40 +
 libiberty/simple-object.c  |  3 ++
 13 files changed, 185 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/record-gcc-command-line.c

-- 
1.8.3.1

[PATCH v2 2/3] Introduce the gcc option --record-gcc-command-line

2020-03-03 Thread Egeyar Bagcioglu

gcc:
2020-02-27  Egeyar Bagcioglu  

* common.opt (--record-gcc-command-line): New option.
* config/elfos.h (TARGET_ASM_RECORD_GCC_COMMAND_LINE): Define as
elf_record_gcc_command_line.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_ASM_RECORD_GCC_COMMAND_LINE): Introduce.
(TARGET_ASM_RECORD_GCC_COMMAND_LINE_SECTION): Likewise.
* gcc.c (_gcc_argc): New static variable.
(_gcc_argv): Likewise.
(record_gcc_command_line_spec_function): New function.
(cc1_options): Handle --record-gcc-command-line.
(static_spec_functions): Add record_gcc_command_line_spec_function
with pseudo name record-gcc-command-line.
(driver::main): Call set_commandline.
(driver::set_commandline): Declare.
* gcc.h (driver::set_commandline): Declare.
* target.def (record_gcc_command_line): A new hook.
(record_gcc_command_line_section): A new hookpod.
* target.h (elf_record_gcc_command_line): Declare.
* toplev.c (init_asm_output): Check for gcc_command_line_file and
call record_gcc_command_line if necessary.
* varasm.c: Include "version.h".
(elf_record_gcc_command_line): Define.

gcc/testsuite:
2020-02-27  Egeyar Bagcioglu  

* c-c++-common/record-gcc-command-line.c: New.
---
 gcc/common.opt |  4 +++
 gcc/config/elfos.h |  5 +++
 gcc/doc/tm.texi| 22 
 gcc/doc/tm.texi.in |  4 +++
 gcc/gcc.c  | 41 ++
 gcc/gcc.h  |  1 +
 gcc/target.def | 30 
 gcc/target.h   |  3 ++
 .../c-c++-common/record-gcc-command-line.c |  8 +
 gcc/toplev.c   | 13 +++
 gcc/varasm.c   | 40 +
 11 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/record-gcc-command-line.c

diff --git a/gcc/common.opt b/gcc/common.opt
index fa9da50..1bacded 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -391,6 +391,10 @@ Driver Alias(print-sysroot-headers-suffix)
 -profile
 Common Alias(p)
 
+-record-gcc-command-line
+Common NoDriverArg Separate Var(gcc_command_line_file)
+Record the command line making this gcc call in the produced object file.
+
 -save-temps
 Driver Alias(save-temps)
 
diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h
index 74a3eaf..1d5f447 100644
--- a/gcc/config/elfos.h
+++ b/gcc/config/elfos.h
@@ -462,6 +462,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #undef  TARGET_ASM_RECORD_GCC_SWITCHES
 #define TARGET_ASM_RECORD_GCC_SWITCHES elf_record_gcc_switches
 
+/* Allow the use of the --record-gcc-command-line switch via the
+   elf_record_gcc_command_line function defined in varasm.c.  */
+#undef  TARGET_ASM_RECORD_GCC_COMMAND_LINE
+#define TARGET_ASM_RECORD_GCC_COMMAND_LINE elf_record_gcc_command_line
+
 /* A C statement (sans semicolon) to output to the stdio stream STREAM
any text necessary for declaring the name of an external symbol
named NAME which is referenced in this compilation but not defined.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 19985ad..0a8ef03 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -8113,6 +8113,28 @@ ELF implementation of the @code{TARGET_ASM_RECORD_GCC_SWITCHES} target
 hook.
 @end deftypevr
 
+@deftypefn {Target Hook} int TARGET_ASM_RECORD_GCC_COMMAND_LINE ()
+Provides the target with the ability to record the command line that
+has been passed to the compiler driver. The @var{gcc_command_line_file}
+variable specifies the intermediate file that holds the command line.
+
+The return value must be zero.  Other return values may be supported
+in the future.
+
+By default this hook is set to NULL, but an example implementation,
+@var{elf_record_gcc_command_line}, is provided for ELF based targets.
+It records the command line as ASCII text inside a new, mergeable string
+section in the assembler output file.  The name of the new section is
+provided by the @code{TARGET_ASM_RECORD_GCC_COMMAND_LINE_SECTION}
+target hook.
+@end deftypefn
+
+@deftypevr {Target Hook} {const char *} TARGET_ASM_RECORD_GCC_COMMAND_LINE_SECTION
+This is the name of the section that will be created by the example
+ELF implementation of the @code{TARGET_ASM_RECORD_GCC_COMMAND_LINE}
+target hook.
+@end deftypevr
+
 @need 2000
 @node Data Output
 @subsection Output of Data
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1a16150..174840b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -5192,6 +5192,10 @@ It must not be modified by command-line option processing.
 
 @hook TARGET_ASM_RECORD_GCC_SWITCHES_SECTION
 
+@hook TARGET_ASM_RECORD_GCC_

[PATCH v2 1/3] Introduce dg-require-target-object-format

2020-03-03 Thread Egeyar Bagcioglu

gcc/testsuite/:
2020-02-27  Egeyar Bagcioglu  

* lib/target-supports-dg.exp (dg-require-target-object-format): New.
---
 gcc/testsuite/lib/target-supports-dg.exp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports-dg.exp b/gcc/testsuite/lib/target-supports-dg.exp
index 2a21424..9678a66 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -164,6 +164,17 @@ proc dg-require-dll { args } {
 set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
 }
 
+# If this target does not produce the given object format skip this test.
+
+proc dg-require-target-object-format { args } {
+if [string equal [gcc_target_object_format] [lindex $args 1] ] {
+   return
+}
+
+upvar dg-do-what dg-do-what
+set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+}
+
 # If this host does not support an ASCII locale, skip this test.
 
 proc dg-require-ascii-locale { args } {
-- 
1.8.3.1

Re: [GCC][PATCH][AArch64] ACLE intrinsics for BFCVTN, BFCVTN2 (AArch64 AdvSIMD) and BFCVT (AArch64 FP)

2020-03-03 Thread Delia Burduv


Hi,

Here is the latest version of the patch.

On 2/18/20 1:51 PM, Richard Sandiford wrote:

Tamar Christina  writes:

Hi Richard,


..ffb5305e2e5ea1aadae07e82f
d8e

d6f9f247c1a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-compil
+++ e.c
@@ -0,0 +1,48 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */


The { target ... } isn't necessary here.  (Missed that in the other review, 
sorry.)



Why not? The advsimd-intrinsics tests are shared between both AArch32 and 
AArch64.


Ah, so they are.  Think it would be better to move them to a new
gcc.target/arm-common or something in that case.  Tests in
gcc.target/aarch64 really ought to be specific to aarch64.

Thanks,
Richard



I left the advsimd-intrinsics tests shared since creating a new 
gcc.target/arm-common should probably be a separate patch.


Let me know if this patch is ok. And if it is, can someone please commit 
it for me?


Thanks,
Delia



Tamar.


+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+
+#include 
+
+/*
+**test_bfcvtn:
+** bfcvtn\tv0.4h, v0.4s


Like with the other review, I think the literal tab you had in the original 
patch
looks better than \t.


[...]
diff --git
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c
new file mode 100644
index


..8d7dffe16275de60e884c449af
a0

fea0b1af6081
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd
+++ .c
@@ -0,0 +1,15 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */


This needs:

/* { dg-require-effective-target aarch64_asm_bf16_ok } */

(Doesn't exist yet, but I hope to post a patch soon.)

Looks good otherwise, thanks.

Richard
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index a118f4f121de067c0a80f691b852247b0ab27f7a..c1e364b4d1cb7a207c1de5a409a08e18a405a107 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -692,3 +692,9 @@
   VAR2 (TERNOP, bfdot, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_lane, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_laneq, 0, v2sf, v4sf)
+
+  /* Implemented by aarch64_bfcvtn{q}{2}  */
+  VAR1 (UNOP, bfcvtn, 0, v4bf)
+  VAR1 (UNOP, bfcvtn_q, 0, v8bf)
+  VAR1 (BINOP, bfcvtn2, 0, v8bf)
+  VAR1 (UNOP, bfcvt, 0, bf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 97f46f96968a6bc2f93bbc812931537b819b3b19..111e48ea6b70548158ba696d997a2f2fc3cb2769 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7091,3 +7091,32 @@
 }
   [(set_attr "type" "neon_dot")]
 )
+
+;; bfcvtn
+(define_insn "aarch64_bfcvtn"
+  [(set (match_operand:V4SF_TO_BF 0 "register_operand" "=w")
+(unspec:V4SF_TO_BF [(match_operand:V4SF 1 "register_operand" "w")]
+UNSPEC_BFCVTN))]
+  "TARGET_BF16_SIMD"
+  "bfcvtn\\t%0.4h, %1.4s"
+  [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
+)
+
+(define_insn "aarch64_bfcvtn2v8bf"
+  [(set (match_operand:V8BF 0 "register_operand" "=w")
+(unspec:V8BF [(match_operand:V8BF 1 "register_operand" "0")
+  (match_operand:V4SF 2 "register_operand" "w")]
+  UNSPEC_BFCVTN2))]
+  "TARGET_BF16_SIMD"
+  "bfcvtn2\\t%0.8h, %2.4s"
+  [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
+)
+
+(define_insn "aarch64_bfcvtbf"
+  [(set (match_operand:BF 0 "register_operand" "=w")
+(unspec:BF [(match_operand:SF 1 "register_operand" "w")]
+UNSPEC_BFCVT))]
+  "TARGET_BF16_FP"
+  "bfcvt\\t%h0, %s1"
+  [(set_attr "type" "f_cvt")]
+)
diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
index 3759c0d1cb449a7f0125cc2a1433127564d66622..fa7080c2953bc3254f01d842a8afef917d469080 100644
--- a/gcc/config/aarch64/arm_bf16.h
+++ b/gcc/config/aarch64/arm_bf16.h
@@ -27,6 +27,19 @@
 #ifndef _AARCH64_BF16_H_
 #define _AARCH64_BF16_H_
 
+#pragma GCC push_options
+#pragma GCC target ("+nothing+bf16")
+
 typedef __bf16 bfloat16_t;
+typedef float float32_t;
+
+__extension__ extern __inline bfloat16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcvth_bf16_f32 (float32_t __a)
+{
+  return __builtin_aarch64_bfcvtbf (__a);
+}
+
+#pragma GCC pop_options
 
 #endif
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 7f05c3f9eca844b0e7b824a191223a4906c825b1..36f82743231a7160050695267e75a08e0cd73e03 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34660,6 +34660,27 @@ vbfdotq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t

Re: [PATCH] [COMMITTED] arc: Add ARC entry for gcc-10/changes.html

2020-03-03 Thread Martin Sebor


On 3/3/20 2:12 AM, Claudiu Zissulescu wrote:

Add ARC entry for gcc-10/changes.html

---
  htdocs/gcc-10/changes.html | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 53d0ca08..4e27c05b 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -557,7 +557,19 @@ a work-in-progress.
much improved.
  
  
-

+ARC
+
+  The interrupt service routine functions saves all used

 ^
Just a small typo: they "save".


+  registers, including extension registers and auxiliary registers
+  used by Zero Overhead Loops.
+  Improve code-size by using multiple short instructions instead


Not terribly important but I don't think the hyphen in code size is
called for (it's not used as an adjective).

Martin


+  of a single long mov or ior instruction when its long immediate
+  constant is known.
+  Fix usage of the accumulator register for ARC600.
+  Fix issues with uncached attribute.
+  Remove -mq-class option.
+  Improve 64-bit integer addition and subtraction operations.
+
  
  arm

Re: [PATCH 1/3] [ARC] Remove mmixed-code option.

2020-03-03 Thread Jeff Law

On Tue, 2020-03-03 at 10:54 +0100, Richard Biener wrote:
> On Tue, Mar 3, 2020 at 10:41 AM Claudiu Zissulescu  wrote:
> > The mmixed-code option is obsolete, remove it.
> 
> You might want to preserve the option and ignore it like we do
> for some in common.opt:
> 
> fargument-alias
> Common Ignore
> Does nothing. Preserved for backward compatibility.
> 
> this avoids compiler errors when updating the compiler but not
> adjusting flags.
Yea, I'd recommend this as well.

jeff
>

Re: [PATCH 2/3] [ARC] Remove malign-call

2020-03-03 Thread Jeff Law

On Tue, 2020-03-03 at 11:40 +0200, Claudiu Zissulescu wrote:
> The malign-call option is obsolete, remove it.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.opt (malign-call): Remove option.
>   * doc/invoke.texi (ARC): Remove malign-call doc.
>   * common/config/arc/arc-common.c (arc_option_optimization_table):
>   Remove malign-call.
Similarly.  No problem removing the code, but standard operating procedure is
to leave the option.  Also applies to patch 3/3 of this series.

jeff
>

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

2020-03-03 Thread Delia Burduv


Hi,

I noticed that the patch doesn't apply cleanly. I fixed it and this is 
the latest version.


Thanks,
Delia

On 3/3/20 4:23 PM, Delia Burduv wrote:

Sorry, I forgot the attachment.

On 3/3/20 4:20 PM, Delia Burduv wrote:

Hi,

I made a mistake in the previous patch. This is the latest version. 
Please let me know if it is ok.


Thanks,
Delia

On 2/21/20 3:18 PM, Delia Burduv wrote:

Hi Kyrill,

The arm_bf16.h is only used for scalar operations. That is how the 
aarch64 versions are implemented too.


Thanks,
Delia

On 2/21/20 2:06 PM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst{q}_bf16 as part of the BFloat16 extension.
>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patch.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>> have commit rights, so if this is ok can someone please commit it for me?
>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>  (bfloat16x4x2_t): New typedef.
>>  (bfloat16x8x2_t): New typedef.
>>  (bfloat16x4x3_t): New typedef.
>>  (bfloat16x8x3_t): New typedef.
>>  (bfloat16x4x4_t): New typedef.
>>  (bfloat16x8x4_t): New typedef.
>>  (vst2_bf16): New.
>>  (vst2q_bf16): New.
>>  (vst3_bf16): New.
>>  (vst3q_bf16): New.
>>  (vst4_bf16): New.
>>  (vst4q_bf16): New.
>>  * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>  (VAR13): New.
>>  (arm_simd_types[Bfloat16x2_t]): New type.
>>  * config/arm/arm-modes.def (V2BF): New mode.
>>  * config/arm/arm-simd-builtin-types.def
>>  (Bfloat16x2_t): New entry.
>>  * config/arm/arm_neon_builtins.def
>>  (vst2): Changed to VAR13 and added v4bf, v8bf
>>  (vst3): Changed to VAR13 and added v4bf, v8bf
>>  (vst4): Changed to VAR13 and added v4bf, v8bf
>>  * config/arm/iterators.md (VDXBF): New iterator.
>>  (VQ2BF): New iterator.
>>  (V_elem): Added V4BF, V8BF.
>>  (V_sz_elem): Added V4BF, V8BF.
>>  (V_mode_nunits): Added V4BF, V8BF.
>>  (q): Added V4BF, V8BF.
>>  * config/arm/neon.md (vst2): Used new iterators.
>>  (vst3): Used new iterators.
>>  (vst3qa): Used new iterators.
>>  (vst3qb): Used new iterators.
>>  (vst4): Used new iterators.
>>  (vst4qa): Used new iterators.
>>  (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv 
>>
>>  * gcc.target/arm/simd/bf16_vstn_1.c: New test.


One thing I just noticed in this and the other arm bfloat16 patches...

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,

    return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;


These should be in a new arm_bf16.h file that gets included in the 
main arm_neon.h file, right?

I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


  +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 4d31405cf6e09e3a61faa3e8142940bbdb23c60a..e0561c58fb3367876ce0164880df76f7331ec4e8 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -342,6 +342,7 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v4bf_UP  E_V4BFmode
 #define v2si_UP  E_V2SImode
 #define v2sf_UP  E_V2SFmode
+#define v2bf_UP  E_V2BFmode
 #define di_UPE_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP  E_V8HImode
@@ -405,6 +406,9 @@ typedef struct {
 #define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, L)
+#define VAR13(T, N, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  V

Re: [PATCH v2 0/3] Introduce a new GCC option, --record-gcc-command-line

2020-03-03 Thread Egeyar Bagcioglu




On 3/3/20 3:44 PM, Egeyar Bagcioglu wrote:

In addition to the new test case, I built binutils as my test case after 
passing this option to CFLAGS. The added .GCC.command.line section of ld.bfd 
listed many compile commands as expected. Tested on x86_64-pc-linux-gnu.


As mentioned above, I used binutils as the sample project and compiled it
with --record-gcc-command-line. Attached to this email is the file
produced by the command "readelf -p .GCC.command.line ld/ld-new >
ld.commandline.out". It lists all the gcc invocations whose output is
merged into ld.bfd.


Regards
Egeyar

String dump of section '.GCC.command.line':
  [ 0]  10.0.1 20200227 (experimental) : gcc -DHAVE_CONFIG_H -I. 
-I../../binutils-catools4/ld -I. -I../../binutils-catools4/ld -I../bfd 
-I../../binutils-catools4/ld/../bfd -I../../binutils-catools4/ld/../include 
-I../../binutils-catools4/ld/../zlib --record-gcc-command-line -DENABLE_PLUGINS 
-DLOCALEDIR="/home/egeyar/catools4repos/binutils-catools4-build/share/locale" 
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 
-Werror -DELF_LIST_OPTIONS=TRUE -DELF_SHLIB_LIST_OPTIONS=TRUE 
-DELF_PLT_UNWIND_LIST_OPTIONS=TRUE --record-gcc-command-line -MT ldgram.o -MD 
-MP -MF .deps/ldgram.Tpo -c -o ldgram.o ldgram.c -Wno-error
  [   282]  10.0.1 20200227 (experimental) : gcc -DHAVE_CONFIG_H -I. 
-I../../binutils-catools4/ld -I. -I../../binutils-catools4/ld -I../bfd 
-I../../binutils-catools4/ld/../bfd -I../../binutils-catools4/ld/../include 
-I../../binutils-catools4/ld/../zlib --record-gcc-command-line -DENABLE_PLUGINS 
-DLOCALEDIR="/home/egeyar/catools4repos/binutils-catools4-build/share/locale" 
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 
-Werror -DELF_LIST_OPTIONS=TRUE -DELF_SHLIB_LIST_OPTIONS=TRUE 
-DELF_PLT_UNWIND_LIST_OPTIONS=TRUE --record-gcc-command-line -MT 
ldlex-wrapper.o -MD -MP -MF .deps/ldlex-wrapper.Tpo -c -o ldlex-wrapper.o 
../../binutils-catools4/ld/ldlex-wrapper.c -Wno-error
  [   53b]  10.0.1 20200227 (experimental) : gcc -DHAVE_CONFIG_H -I. 
-I../../binutils-catools4/ld -I. -I../../binutils-catools4/ld -I../bfd 
-I../../binutils-catools4/ld/../bfd -I../../binutils-catools4/ld/../include 
-I../../binutils-catools4/ld/../zlib --record-gcc-command-line -DENABLE_PLUGINS 
-DLOCALEDIR="/home/egeyar/catools4repos/binutils-catools4-build/share/locale" 
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 
-Werror -DELF_LIST_OPTIONS=TRUE -DELF_SHLIB_LIST_OPTIONS=TRUE 
-DELF_PLT_UNWIND_LIST_OPTIONS=TRUE --record-gcc-command-line -MT lexsup.o -MD 
-MP -MF .deps/lexsup.Tpo -c -o lexsup.o ../../binutils-catools4/ld/lexsup.c
  [   7cd]  10.0.1 20200227 (experimental) : gcc -DHAVE_CONFIG_H -I. 
-I../../binutils-catools4/ld -I. -I../../binutils-catools4/ld -I../bfd 
-I../../binutils-catools4/ld/../bfd -I../../binutils-catools4/ld/../include 
-I../../binutils-catools4/ld/../zlib --record-gcc-command-line -DENABLE_PLUGINS 
-DLOCALEDIR="/home/egeyar/catools4repos/binutils-catools4-build/share/locale" 
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 
-Werror -DELF_LIST_OPTIONS=TRUE -DELF_SHLIB_LIST_OPTIONS=TRUE 
-DELF_PLT_UNWIND_LIST_OPTIONS=TRUE --record-gcc-command-line -MT ldlang.o -MD 
-MP -MF .deps/ldlang.Tpo -c -o ldlang.o ../../binutils-catools4/ld/ldlang.c
  [   a5f]  10.0.1 20200227 (experimental) : gcc -DHAVE_CONFIG_H -I. 
-I../../binutils-catools4/ld -I. -I../../binutils-catools4/ld -I../bfd 
-I../../binutils-catools4/ld/../bfd -I../../binutils-catools4/ld/../include 
-I../../binutils-catools4/ld/../zlib --record-gcc-command-line -DENABLE_PLUGINS 
-DLOCALEDIR="/home/egeyar/catools4repos/binutils-catools4-build/share/locale" 
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 
-Werror -DELF_LIST_OPTIONS=TRUE -DELF_SHLIB_LIST_OPTIONS=TRUE 
-DELF_PLT_UNWIND_LIST_OPTIONS=TRUE --record-gcc-command-line -MT mri.o -MD -MP 
-MF .deps/mri.Tpo -c -o mri.o ../../binutils-catools4/ld/mri.c
  [   ce5]  10.0.1 20200227 (experimental) : gcc -DHAVE_CONFIG_H -I. 
-I../../binutils-catools4/ld -I. -I../../binutils-catools4/ld -I../bfd 
-I../../binutils-catools4/ld/../bfd -I../../binutils-catools4/ld/../include 
-I../../binutils-catools4/ld/../zlib --record-gcc-command-line -DENABLE_PLUGINS 
-DLOCALEDIR="/home/egeyar/catools4repos/binutils-catools4-build/share/locale" 
-W -Wall -Wstrict-prototypes -Wmissing-prototypes -Wshadow -Wstack-usage=262144 
-Werror -DELF_LIST_OPTIONS=TRUE -DELF_SHLIB_LIST_OPTIONS=TRUE 
-DELF_PLT_UNWIND_LIST_OPTIONS=TRUE --record-gcc-command-line -MT ldctor.o -MD 
-MP -MF .deps/ldctor.Tpo -c -o ldctor.o ../../binutils-catools4/ld/ldctor.c
  [   f77]  10.0.1 20200227 (experimental) : gcc -DHAVE_CONFIG_H -I. 
-I../../binutils-catools4/ld -I. -I../../binutils-catools4/ld -I../bfd 
-I../../binutils-catools4/ld/../bfd -I../../binutils-catools4/ld/../include 
-I../../binutils-catools4/ld/../zlib --record-gcc-comm

Re: [committed] Fix STATIC_CHAIN_REGNUM for v850 port

2020-03-03 Thread Hans-Peter Nilsson

On Sat, 29 Feb 2020, Jeff Law wrote:
>
> Wow, I think I wrote the v850 port back in circa 1997 and this bug has been
> latent all this time.  Vlad's IRA changes twiddled register allocation in just
> the right way to expose this bug.
>
> I'm not sure what I was thinking, but apparently I made a spectacularly bad
> choice for the STATIC_CHAIN_REGNUM in choosing a call-saved register (r20).
>
> It's simply wrong to use a call-saved register for the static chain.

Heh.  I made that mistake too, for CRIS. :/

A comment from RTH below my (incorrect) comment in cris.c above
cris_asm_trampoline_template alludes to there being an
ABI-neutral solution: "??? See the i386 regparm=3 implementation
that pushes the static chain value to the stack in the
trampoline, and uses a call-saved register when called
directly." ... but IIRC it either didn't apply for CRIS or I
didn't look into it thoroughly enough.  Or that's also buggy.

brgds, H-P
PS. Perhaps a doc update with a warning is a suitable penance? :)

Re: [PATCH] Wrap array in ctor with braces.

2020-03-03 Thread Jason Merrill


On 3/3/20 4:04 AM, Martin Liška wrote:

Hi.

This patch silences a few clang warnings:

/home/marxin/Programming/gcc/gcc/cp/method.c:903:26: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    { "partial_ordering", "equivalent", "greater", "less", "unordered" },
  ^~~~
  {   }
/home/marxin/Programming/gcc/gcc/cp/method.c:904:23: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    { "weak_ordering", "equivalent", "greater", "less" },
   ^~~
   {  }
/home/marxin/Programming/gcc/gcc/cp/method.c:905:25: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    { "strong_ordering", "equal", "greater", "less" }
     ^~
     { }

Ready to be installed?
Thanks,
Martin

gcc/cp/ChangeLog:

2020-03-03  Martin Liska  

 * method.c: Wrap array in ctor with braces in order
 to silence clang warnings.
---
  gcc/cp/method.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
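
For readers unfamiliar with the warning, here is a minimal sketch of what
-Wmissing-braces is about.  The type and table names are hypothetical
stand-ins, not the actual declarations in method.c.  An aggregate with an
array member may have the array's initializers written inline (brace
elision); that is valid, but clang suggests an explicit inner pair of
braces.

```cpp
#include <cstring>

// Hypothetical stand-in for a record holding a category name followed by
// an array of member names; not the real type from gcc/cp/method.c.
struct category_info
{
  const char *name;
  const char *members[4];
};

// Brace elision: the member strings initialize `members` without their
// own braces.  Valid C++, but clang's -Wmissing-braces flags it.
static const category_info elided[] = {
  { "partial_ordering", "equivalent", "greater", "less", "unordered" },
  { "weak_ordering", "equivalent", "greater", "less" },
};

// Same contents with the array wrapped in its own braces, which is the
// form the warning suggests.
static const category_info braced[] = {
  { "partial_ordering", { "equivalent", "greater", "less", "unordered" } },
  { "weak_ordering", { "equivalent", "greater", "less" } },
};

// Returns true when both tables hold identical data.
bool tables_match ()
{
  for (int i = 0; i < 2; i++)
    {
      if (std::strcmp (elided[i].name, braced[i].name) != 0)
        return false;
      for (int j = 0; j < 4; j++)
        {
          const char *a = elided[i].members[j];
          const char *b = braced[i].members[j];
          if ((a == nullptr) != (b == nullptr))
            return false;
          if (a && std::strcmp (a, b) != 0)
            return false;
        }
    }
  return true;
}
```

Both tables describe the same objects; only the spelling of the
initializer differs, so a change of this kind should not alter behaviour.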




OK.

Re: [PATCH] c++: Fix non-constant TARGET_EXPR constexpr handling [PR93998]

2020-03-03 Thread Jason Merrill


On 3/3/20 8:03 AM, Jakub Jelinek wrote:

Hi!

We ICE on the following testcase since I've added the SAVE_EXPR-like
constexpr handling where the TARGET_EXPR initializer (and cleanup) is
evaluated only once (because it might have side-effects like new or delete
expressions in it).
The problem is if the TARGET_EXPR (but I guess in theory SAVE_EXPR too)
initializer is *non_constant_p.  We still remember the result, but no
longer the fact that it is *non_constant_p.  Normally that wouldn't be a big problem,
if something is *non_constant_p, we only or into it and so the whole
expression will be non-constant too.  Except in the builtins handling,
we try to evaluate the arguments with non_constant_p pointing into a dummy1
bool which we ignore.  This is because some builtins might fold into a
constant even if they don't have a constexpr argument.  Unfortunately if
we evaluate the TARGET_EXPR first in the argument of such a builtin and then
once again, we don't set *non_constant_p.

So, either we don't remember the TARGET_EXPR/SAVE_EXPR result if it wasn't
constant, like the following patch does, or we could remember it, but in
some way that would make it clear that it is non-constant (e.g. by
pushing into the global->values a SAVE_EXPR, SAVE_EXPR entry, and perhaps
for TARGET_EXPR not remembering it on TARGET_EXPR_SLOT but on the TARGET_EXPR
itself, similarly pushing TARGET_EXPR, TARGET_EXPR; if we see those
after the lookup, diagnose + set *non_constant_p).  Or we could perhaps
during the builtin argument evaluation push expressions into a different
save_expr vec and undo them afterwards.
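
The failure mode described above can be modelled with a toy memoizing
evaluator.  This is only a sketch under loud assumptions: ToyEvaluator,
its members and the "negative id means non-constant" convention are
invented for illustration and are not GCC's data structures.  It shows
why a result cached during an evaluation whose flag is ignored hides the
non-constant state from a later evaluation, and how not recording such
results avoids that.

```cpp
#include <map>

// Toy model only; none of these names exist in GCC.
struct ToyEvaluator
{
  std::map<int, int> values;   // stands in for the remembered results
  bool skip_non_constant;      // true models "don't record if non-constant"

  // Evaluate expression `id`.  Negative ids model initializers that are
  // not constant expressions.
  int eval (int id, bool *non_constant_p)
  {
    auto it = values.find (id);
    if (it != values.end ())
      return it->second;       // cached: the flag is never touched
    if (id < 0)
      *non_constant_p = true;
    int r = id * 2;            // arbitrary "result" of the evaluation
    if (*non_constant_p && skip_non_constant)
      return r;                // do not remember non-constant results
    values[id] = r;
    return r;
  }
};

// Evaluate the same expression twice: first with a dummy flag that is
// ignored (as for a builtin argument), then with the real flag.  Returns
// whether the second evaluation still reports "non-constant".
bool second_eval_reports_non_constant (bool skip_non_constant)
{
  ToyEvaluator ev{{}, skip_non_constant};
  bool dummy = false;
  ev.eval (-1, &dummy);
  bool non_constant = false;
  ev.eval (-1, &non_constant);
  return non_constant;
}
```

In this model the second evaluation loses the flag when everything is
cached and keeps it when non-constant results are skipped.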

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2020-03-03  Jakub Jelinek  

PR c++/93998
* constexpr.c (cxx_eval_constant_expression): Don't record anything if
*non_constant_p is true.

* g++.dg/ext/pr93998.C: New test.

--- gcc/cp/constexpr.c.jj   2020-02-27 09:28:46.227958669 +0100
+++ gcc/cp/constexpr.c  2020-03-02 18:29:38.014333067 +0100
@@ -5474,9 +5474,10 @@ cxx_eval_constant_expression (const cons
r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 1),
false,
non_constant_p, overflow_p);
-  if (!*non_constant_p)
-   /* Adjust the type of the result to the type of the temporary.  */
-   r = adjust_temp_type (TREE_TYPE (t), r);
+  if (*non_constant_p)
+   break;
+  /* Adjust the type of the result to the type of the temporary.  */
+  r = adjust_temp_type (TREE_TYPE (t), r);
if (TARGET_EXPR_CLEANUP (t) && !CLEANUP_EH_ONLY (t))
ctx->global->cleanups->safe_push (TARGET_EXPR_CLEANUP (t));
r = unshare_constructor (r);
@@ -5528,6 +5529,8 @@ cxx_eval_constant_expression (const cons
{
  r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0), false,
non_constant_p, overflow_p);
+ if (*non_constant_p)
+   break;
  ctx->global->values.put (t, r);
  if (ctx->save_exprs)
ctx->save_exprs->safe_push (t);
--- gcc/testsuite/g++.dg/ext/pr93998.C.jj   2020-03-02 18:40:14.843965039 +0100
+++ gcc/testsuite/g++.dg/ext/pr93998.C  2020-03-02 18:39:27.486661682 +0100
@@ -0,0 +1,14 @@
+// PR c++/93998
+// { dg-do compile { target c++11 } }
+
+struct C
+{
+  constexpr bool operator== (C x) const noexcept { return v == x.v; }
+  int v;
+};
+
+int
+foo (const C a, const C b, bool c)
+{
+  return __builtin_expect (!!(a == b || c), 1) ? 0 : 1;
+}

Jakub

Re: [PATCH] use all same precision in wide_int arguments (PR 93986)

2020-03-03 Thread Richard Biener

On March 3, 2020 4:39:34 PM GMT+01:00, Martin Sebor  wrote:
>On 3/3/20 2:42 AM, Richard Biener wrote:
>> On Tue, Mar 3, 2020 at 12:04 AM Martin Sebor wrote:
>>>
>>> The wide_int APIs expect operands to have the same precision and
>>> abort when they don't.  This is especially insidious in code where
>>> the operands normally do have the same precision but where mixed
>>> precision arguments can come up as a result of unusual combinations
>>> of optimization options.  That is also what precipitated pr93986.
>> 
>> If you want sth like (signed) arbitrary precision arithmetic then you can
>> use widest_int instead.  Or, since you're working with offsets, offset_int
>> is another good choice.
>
>Yes, I would much prefer not to have to do all this myself (and
>risk getting it wrong).  Unfortunately, the APIs that obtain
>the ranges all use wide_int, so I'd have to convert them one way
>or the other.  I could change some of the APIs but not all of
>them (e.g., get_range_info).

You can convert wide_int to both offset_int and widest_int.

Richard. 

>Martin
>
>
>> 
>>> The attached patch adjusts the code to extend all wide_int operands
>>> to the same precision to avoid the ICE.
>>>
>>> Besides the usual bootstrap/testing I also compiled all string tests
>>> in gcc.dg with the same options as in the test case in pr93986 in
>>> an effort to weed out any lingering bugs like it (found none).
>>>
>>> Martin
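
To make the precision issue concrete, here is a toy sketch.  ToyWide,
sign_extend and toy_less are invented names and this is not the wide_int
API; it only illustrates why operands should be brought to a common
precision before they are compared.  The same bit pattern means different
values at different precisions.

```cpp
#include <cstdint>

// Toy fixed-precision integer: `precision` low bits of `bits` are
// significant.  Not GCC's wide_int; precision must be in 1..64.
struct ToyWide
{
  std::uint64_t bits;
  unsigned precision;
};

// Interpret the value as signed and extend it to 64 bits.
std::int64_t sign_extend (ToyWide x)
{
  if (x.precision >= 64)
    return static_cast<std::int64_t> (x.bits);
  std::uint64_t mask = (std::uint64_t{1} << x.precision) - 1;
  std::uint64_t v = x.bits & mask;
  std::uint64_t sign = std::uint64_t{1} << (x.precision - 1);
  // Classic xor/subtract sign extension; the arithmetic wraps modulo 2^64
  // and the final conversion relies on two's complement behaviour.
  return static_cast<std::int64_t> ((v ^ sign) - sign);
}

// Compare two values only after both are extended to the same width.
bool toy_less (ToyWide a, ToyWide b)
{
  return sign_extend (a) < sign_extend (b);
}
```

In this model 0xff is -1 at 8 bits but 255 at 16 bits, so comparing the
raw bits of mixed-precision operands would give the wrong answer.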

[PATCH] inliner: Copy DECL_BY_REFERENCE in copy_decl_to_var [PR93888]

2020-03-03 Thread Jakub Jelinek

Hi!

In the following testcase we emit wrong debug info for the karg
parameter in the DW_TAG_inlined_subroutine into main.
The problem is that the karg PARM_DECL is DECL_BY_REFERENCE and thus
in the IL has const K & type, but in the source just const K.
When the function is inlined, we create a VAR_DECL for it, but don't
set DECL_BY_REFERENCE, so when emitting DW_AT_location, we treat it like
a const K & typed variable, but it has DW_AT_abstract_origin which has
just the const K type and thus the debugger thinks the variable has
const K type.

Fixed by copying the DECL_BY_REFERENCE flag.  Not doing it in
copy_decl_for_dup_finish, because copy_decl_no_change already copies
that flag through copy_node and in copy_result_decl_to_var it is
undesirable, as we handle DECL_BY_REFERENCE in that case instead
by changing the type.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-03-03  Jakub Jelinek  

PR debug/93888
* tree-inline.c (copy_decl_to_var): Copy DECL_BY_REFERENCE flag.

* g++.dg/guality/pr93888.C: New test.

--- gcc/tree-inline.c.jj   2020-02-07 19:11:57.444981885 +0100
+++ gcc/tree-inline.c   2020-03-03 13:27:57.811046011 +0100
@@ -5929,6 +5929,7 @@ copy_decl_to_var (tree decl, copy_body_d
   TREE_READONLY (copy) = TREE_READONLY (decl);
   TREE_THIS_VOLATILE (copy) = TREE_THIS_VOLATILE (decl);
   DECL_GIMPLE_REG_P (copy) = DECL_GIMPLE_REG_P (decl);
+  DECL_BY_REFERENCE (copy) = DECL_BY_REFERENCE (decl);
 
   return copy_decl_for_dup_finish (id, decl, copy);
 }
--- gcc/testsuite/g++.dg/guality/pr93888.C.jj   2020-03-03 13:38:16.273935942 +0100
+++ gcc/testsuite/g++.dg/guality/pr93888.C  2020-03-03 13:40:15.890174469 +0100
@@ -0,0 +1,24 @@
+// PR debug/93888
+// { dg-do run }
+// { dg-options "-g -fvar-tracking -fno-inline" }
+// { dg-skip-if "" { *-*-* }  { "*" } { "-O0" } }
+
+struct K
+{
+  K () {}
+  K (K const &rhs) { k[0] = 'C'; }
+  char k[8] = {'B','B','B','B','B','B','B','B'};
+};
+
+__attribute__((always_inline)) inline bool
+foo (const K karg)
+{
+  return karg.k[0] != 'C'; // { dg-final { gdb-test 16 "karg.k[0]" "'C'" } }
+}  // { dg-final { gdb-test 16 "karg.k[1]" "'B'" } }
+
+int
+main ()
+{
+  K x;
+  return foo (x);
+}

Jakub

[PATCH] sccvn: Avoid overflows in push_partial_def

2020-03-03 Thread Jakub Jelinek

Hi!

The following patch attempts to avoid dangerous overflows in the various
push_partial_def HOST_WIDE_INT computations.
This is achieved by performing the subtraction offset2i - offseti in
the push_partial_def function and before doing that doing some tweaks.
If a constant store (non-CONSTRUCTOR) is too large (perhaps just
hypothetical case), native_encode_expr would fail for it, but we don't
necessarily need to fail right away, instead we can treat it like
non-constant store and if it is already shadowed, we can ignore it.
Otherwise, if it is at most 64 bytes, the caller has ensured that there is
a range overlap, and push_partial_def ensures the load is at most 64 bytes,
I think we should be fine: offset (relative to the load) can only be from
-64*8+1 to 64*8-1 and size at most 64*8, so there is no risk of
overflowing HOST_WIDE_INT computations.
For CONSTRUCTOR (or non-constant) stores, those can be indeed arbitrarily
large, the caller just checks that both the absolute offset and size fit
into signed HWI.  But, we store the same bytes in that case over and over
(both in the {} case where it is all 0, and in the hypothetical future case
where we handle in push_partial_def also memset (, 123, )), so we can tweak
the write range for our purposes.  For {} store we could just cap it at the
start offset and/or offset+size because all the bits are 0, but I wrote it
in anticipation of the memset case and so the relative offset can now be
down to -7 and similarly size can grow up to 64 bytes + 14 bits, all this
trying to preserve the offset difference % BITS_PER_UNIT or end as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I've tried to construct a testcase and came up with:
/* PR tree-optimization/93582 */
/* { dg-do compile { target lp64 } } */

union U { struct A { unsigned long long a : 1, b : 62, c : 1; } a; unsigned long long i; };

unsigned long long
foo (char *p)
{
  __builtin_memset (p - 0xfffULL, 0, 0xffeULL);
  __builtin_memset (p + 1, 0, 0xffeULL);
  union U *q = (union U *) (void *) (p - 4);
  q->a.b = -1;
  return q->i;
}
With this testcase, one can see signed integer overflows in the compiler
without the patch.  But unfortunately even with the patch it isn't optimized
as it should be.  I believe the problem is in:
  gimple *def = SSA_NAME_DEF_STMT (ref2);
  if (is_gimple_assign (def)
  && gimple_assign_rhs_code (def) == POINTER_PLUS_EXPR
  && gimple_assign_rhs1 (def) == TREE_OPERAND (base, 0)
  && poly_int_tree_p (gimple_assign_rhs2 (def))
  && (wi::to_poly_offset (gimple_assign_rhs2 (def))
  << LOG2_BITS_PER_UNIT).to_shwi (&offset2))
{
where the last operand of POINTER_PLUS_EXPR has sizetype type, thus unsigned,
and in the testcase gimple_assign_rhs2 (def) is thus 0xf001ULL
which multiplied by 8 doesn't fit into signed HWI.  If it were treated
as a signed offset instead, it would fit (-0xfffLL, multiplied
by 8 is -0x7ff8LL).  Unfortunately with the poly_int obfuscation
I'm not sure how to convert it from unsigned to signed poly_int.
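
The arithmetic behind that observation can be sketched outside the
compiler.  The helper names below are hypothetical, the offset value is
picked for illustration, and plain 64-bit integers stand in for
poly_int/HOST_WIDE_INT.  The sketch uses the GCC/clang
__builtin_mul_overflow builtin to check whether a byte offset times 8
fits into a signed 64-bit value, once with the offset taken as unsigned
and once reinterpreted as signed.

```cpp
#include <cstdint>

// Does byte_offset * 8 fit into a signed 64-bit integer when the offset
// is taken as an unsigned quantity (as with sizetype)?
bool unsigned_times_8_fits (std::uint64_t byte_offset)
{
  std::int64_t bits;
  return !__builtin_mul_overflow (byte_offset, std::uint64_t{8}, &bits);
}

// Same check after reinterpreting the offset as signed.  The conversion
// relies on two's complement wraparound, which GCC provides.  On success
// the bit offset is stored in *bits.
bool signed_times_8 (std::uint64_t byte_offset, std::int64_t *bits)
{
  std::int64_t s = static_cast<std::int64_t> (byte_offset);
  return !__builtin_mul_overflow (s, std::int64_t{8}, bits);
}
```

For a small negative pointer adjustment stored in an unsigned type, the
unsigned product overflows while the signed one does not.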

2020-03-03  Jakub Jelinek  

* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Add offseti
argument.  Change pd argument so that it can be modified.  Turn
constant non-CONSTRUCTOR store into non-constant if it is too large.
Adjust offset and size of CONSTRUCTOR or non-constant store to avoid
overflows.
(vn_walk_cb_data::vn_walk_cb_data, vn_reference_lookup_3): Adjust
callers.

--- gcc/tree-ssa-sccvn.c.jj 2020-03-03 11:20:52.761545034 +0100
+++ gcc/tree-ssa-sccvn.c   2020-03-03 16:22:49.387657379 +0100
@@ -1716,7 +1716,7 @@ struct vn_walk_cb_data
else
  pd.offset = pos;
pd.size = tz;
-   void *r = push_partial_def (pd, 0, 0, prec);
+   void *r = push_partial_def (pd, 0, 0, 0, prec);
gcc_assert (r == NULL_TREE);
  }
pos += tz;
@@ -1733,8 +1733,9 @@ struct vn_walk_cb_data
   }
   ~vn_walk_cb_data ();
   void *finish (alias_set_type, alias_set_type, tree);
-  void *push_partial_def (const pd_data& pd,
- alias_set_type, alias_set_type, HOST_WIDE_INT);
+  void *push_partial_def (pd_data pd,
+ alias_set_type, alias_set_type, HOST_WIDE_INT,
+ HOST_WIDE_INT);
 
   vn_reference_t vr;
   ao_ref orig_ref;
@@ -1817,8 +1818,9 @@ pd_tree_dealloc (void *, void *)
on failure.  */
 
 void *
-vn_walk_cb_data::push_partial_def (const pd_data &pd,
+vn_walk_cb_data::push_partial_def (pd_data pd,
   alias_set_type set, alias_set_type base_set,
+  HOST_WIDE_INT offseti,
   HOST_WIDE_INT maxsizei)
 {
   const HOST_WIDE_INT bufsize = 64;
@@ -1831

[PATCH] Ada: gcc-interface: fixed assertion for aliased entities

2020-03-03 Thread Richard Wai

Hi,

I discovered this error when attempting to allocate from a
Root_Storage_Pool_With_Subpools derived type. If the type is derived in the
current compilation unit, and Allocate is not overridden on derivation (as
is typically the case with Root_Storage_Pool_With_Subpools), the entity for
Allocate for the derived type is then an alias to
System.Storage_Pools.Subpools.Allocate. When the allocator is built,
gnat_to_gnu_entity is called with definition == false for the derived
storage pool's allocate operation. An assertion in gnat_to_gnu_entity fails
in this case, since it is not a definition, and Is_Public is false. If the
storage pool type was instead derived in a different compilation unit, this
assertion is not triggered since the aliased entity has the Public property.

This patch adds an extra check in the assertion (decl.c: gnat_to_gnu_entity)
that the entity has the Aliased property. It also includes a comment that
describes the special case as per the description above.

Bootstrapped and tested on x86_64-unknown-freebsd12.1 with no regressions.

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 871a309ab7d..5ea930f4f65 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -447,6 +447,15 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
   /* If we get here, it means we have not yet done anything with this entity.
      If we are not defining it, it must be a type or an entity that is defined
      elsewhere or externally, otherwise we should have defined it already.  */
+
+  /* One exception relates to an entity, typically an inherited operation,
+     which has an alias pointing to the parent's operation. Often such an
+     aliased entity will also carry with it the Is_Public property if it was
+     declared in a separate compilation unit, but when a type is extended
+     within the current unit, the aliased entity will not pass this
+     assertion. It is neither defined (since it is an inherited operation,
+     and is not Public, since it is within the current compilation unit. */
+
   gcc_assert (definition
               || is_type
               || kind == E_Discriminant
@@ -454,6 +463,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
               || kind == E_Label
               || (kind == E_Constant && Present (Full_View (gnat_entity)))
               || Is_Public (gnat_entity)
+              || Present (Alias (gnat_entity))
               || type_annotate_only);
 
   /* Get the name of the entity and set up the line number and filename of

Thanks,
Richard Wai

[PATCH] tree-ssa-strlen: Fix up count_nonzero_bytes* [PR94015]

2020-03-03 Thread Jakub Jelinek

Hi!

As I said already yesterday in another PR, I'm afraid the mixing of apples
and oranges (what we are actually computing, whether what bytes are zero or
non-zero in the native representation of EXP itself or what EXP points to)
in a single function where it performs some handling which must be specific
to one or the other case unconditionally and only from time to time
determines something based on if nbytes is 0 or not will continue to bite us
again and again.
So, this patch performs at least a partial cleanup to separate those two
cases into two functions.
In addition to the separation, the patch uses e.g. ctor_for_folding, so that
it handles volatile loads and various other checks properly instead of
directly using DECL_INITIAL, and it guards the native_encode_expr call the way
it is guarded elsewhere (checking that host and target byte sizes are as expected).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I've left other issues I found as is for now, like the *allnonnul being IMHO
wrongly computed (if we don't know anything about the bytes, such as if
_1 = MEM[s_2(D)];
MEM[whatever] = _1;
where nothing really is known about strlen(s) etc., the code right now
clears *nulterm and *allnul, but keeps *allnonnul set), but the callers
seem to never use that value for anything (so the question is why is it
computed and how exactly should it be defined).  Another thing I find quite
weird is the distinction between count_nonzero_bytes failing (return false)
and when it succeeds, but sets values to a don't know state (the warning is
only issued if it succeeds), plus what lenrange[2] is for.  The size of the
store should be visible already from the store statement.  Also the looking
at the type of the MEM_REF first operand to determine if it is is_char_store
is really weird, because both in user code and through sccvn where pointer
conversions are useless the type of the MEM_REF operand doesn't have to have
anything to do with what the code actually does.
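
As a point of reference for the flags discussed above, here is a
self-contained sketch of one possible reading of them for the simple
case where the bytes themselves are known.  ByteSummary and
summarize_bytes are invented names, and the definitions chosen here are
an assumption, not what tree-ssa-strlen.c implements (the question of
how *allnonnul should be defined is exactly what is left open above).

```cpp
#include <cstddef>

// Summary of a known byte sequence; an illustrative model only.
struct ByteSummary
{
  std::size_t leading_nonzero;  // number of non-zero bytes before the first 0
  bool nulterm;                 // some byte is zero
  bool allnul;                  // every byte is zero
  bool allnonnul;               // every byte is non-zero
};

// Scan N bytes at P and classify them.  For N == 0 both allnul and
// allnonnul are vacuously true.
ByteSummary summarize_bytes (const unsigned char *p, std::size_t n)
{
  ByteSummary s{0, false, true, true};
  bool seen_zero = false;
  for (std::size_t i = 0; i < n; i++)
    {
      if (p[i] == 0)
        {
          seen_zero = true;
          s.allnonnul = false;
        }
      else
        {
          s.allnul = false;
          if (!seen_zero)
            s.leading_nonzero++;
        }
    }
  s.nulterm = seen_zero;
  return s;
}
```

Under this reading, when nothing is known about the bytes none of the
three flags could be asserted, which is the inconsistency pointed out
above.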

2020-03-03  Jakub Jelinek  

PR tree-optimization/94015
* tree-ssa-strlen.c (count_nonzero_bytes): Split portions of the
function where EXP is address of the bytes being stored rather than
the bytes themselves into count_nonzero_bytes_addr.  Punt on zero
sized MEM_REF.  Use VAR_P macro and handle CONST_DECL like VAR_DECLs.
Use ctor_for_folding instead of looking at DECL_INITIAL.  Punt before
calling native_encode_expr if host or target doesn't have 8-bit
chars.  Formatting fixes.
(count_nonzero_bytes_addr): New function.

* gcc.dg/pr94015.c: New test.

--- gcc/tree-ssa-strlen.c.jj   2020-03-03 07:57:22.324124042 +0100
+++ gcc/tree-ssa-strlen.c   2020-03-03 18:37:29.382722923 +0100
@@ -4585,6 +4585,11 @@ int ssa_name_limit_t::next_ssa_name (tre
   return 0;
 }
 
+static bool
+count_nonzero_bytes_addr (tree, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+ unsigned [3], bool *, bool *, bool *,
+ const vr_values *, ssa_name_limit_t &);
+
 /* Determines the minimum and maximum number of leading non-zero bytes
in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
to each.
@@ -4607,102 +4612,6 @@ count_nonzero_bytes (tree exp, unsigned
 bool *allnul, bool *allnonnul, const vr_values *rvals,
 ssa_name_limit_t &snlim)
 {
-  int idx = get_stridx (exp);
-  if (idx > 0)
-{
-  strinfo *si = get_strinfo (idx);
-  if (!si)
-   return false;
-
-  /* Handle both constant lengths as well non-constant lengths
-in some range.  */
-  unsigned HOST_WIDE_INT minlen, maxlen;
-  if (tree_fits_shwi_p (si->nonzero_chars))
-   minlen = maxlen = tree_to_shwi (si->nonzero_chars);
-  else if (nbytes
-  && si->nonzero_chars
-  && TREE_CODE (si->nonzero_chars) == SSA_NAME)
-   {
- const value_range_equiv *vr
-   = CONST_CAST (class vr_values *, rvals)
-   ->get_value_range (si->nonzero_chars);
- if (vr->kind () != VR_RANGE
- || !range_int_cst_p (vr))
-   return false;
-
-minlen = tree_to_uhwi (vr->min ());
-maxlen = tree_to_uhwi (vr->max ());
-   }
-  else
-   return false;
-
-  if (maxlen < offset)
-   return false;
-
-  minlen = minlen < offset ? 0 : minlen - offset;
-  maxlen -= offset;
-  if (maxlen + 1 < nbytes)
-   return false;
-
-  if (!nbytes
- && TREE_CODE (si->ptr) == SSA_NAME
- && !POINTER_TYPE_P (TREE_TYPE (si->ptr)))
-   {
- /* SI->PTR is an SSA_NAME with a DEF_STMT like
-  _1 = MEM  [(char * {ref-all})s_4(D)];  */
- gimple *stmt = SSA_NAME_DEF_STMT (exp);
- if (gimple_assign_single_p (stmt)
- && gimple_assign_rhs_code (stmt) == MEM_REF)
-   {
- tree rhs = gimple_assign_rhs1 (stmt);
- if (tree refsize

Re: [PATCH] Ada: gcc-interface: fixed assertion for aliased entities

2020-03-03 Thread Eric Botcazou

> Discovered this error when attempting to allocate from a
> Root_Storage_Pool_With_Subpools derived type. If the type is derived in the
> current compilation unit, and Allocate is not overridden on derivation (as
> is typically the case with Root_Storage_Pool_With_Subpools), the entity for
> Allocate for the derived type is then an alias to
> System.Storage_Pools.Subpools.Allocate. When the allocator is built,
> gnat_to_gnu_entity is called with definition == false for the derived
> storage pool's allocate operation. An assertion is gnat_to_gnu_entity fails
> in this case, since it is not a definition, and Is_Public is false. If the
> storage pool type was instead derived in a different compilation unit, this
> assertion is not triggered since the aliased entity has the Public property.

We need a testcase here, we cannot relax assertions without testcases.

> This patch adds an extra check in the assertion (decl.c: gnat_to_gnu_entity)
> that the entity has the Aliased property. Also included a comment that
> describes the special case as per the description above.

I don't really understand the new condition, did you forget to test Is_Public?

-- 
Eric Botcazou

RE: [PATCH] Ada: gcc-interface: fixed assertion for aliased entities

2020-03-03 Thread Richard Wai





> -Original Message-
> From: Eric Botcazou 
> Sent: March 3, 2020 3:50 PM
> To: Richard Wai 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Ada: gcc-interface: fixed assertion for aliased entities
> 
> > Discovered this error when attempting to allocate from a
> > Root_Storage_Pool_With_Subpools derived type. If the type is derived
> > in the current compilation unit, and Allocate is not overridden on
> > derivation (as is typically the case with
> > Root_Storage_Pool_With_Subpools), the entity for Allocate for the
> > derived type is then an alias to
> > System.Storage_Pools.Subpools.Allocate. When the allocator is built,
> > gnat_to_gnu_entity is called with definition == false for the derived
> > storage pool's allocate operation. An assertion is gnat_to_gnu_entity
> > fails in this case, since it is not a definition, and Is_Public is
> > false. If the storage pool type was instead derived in a different
> > compilation unit, this assertion is not triggered since the aliased
> > entity has the Public property.
> 
> We need a testcase here, we cannot relax assertions without testcases.
> 

I'll have to look into this. Any pointers? This assertion is not a language
rule assertion.

> > This patch adds an extra check in the assertion (decl.c:
> > gnat_to_gnu_entity) that the entity has the Aliased property. Also
> > included a comment that describes the special case as per the
> > description above.
>
> I don't really understand the new condition, did you forget to test
> Is_Public?
> 

As you can see, the assertion being modified already tests for "Is_Public". So
the issue is precisely that this assertion wrongly fails for cases where the
entity is not Public. The specific case we ran into is where you have a
derived Root_Storage_Pool_With_Subpools type where Allocate (resp.
Deallocate) is inherited. If that derived type is anywhere except a package
specification, the N_Defining_Identifier for Allocate for the derived
subpool will both be an alias to System.Storage_Pools.Subpools.Allocate and
will NOT be Public. It will then cause this assertion to fail when the
allocator or deallocator is built.

> --
> Eric Botcazou 

Richard

Fix Debug mode Undefined Behavior

2020-03-03 Thread François Dumont

After the fix of PR 91910 I tried to consider other possible race
conditions and I think we still have a problem.

As stated in the PR, when a container is destroyed all associated
iterators are made singular. If at the same time another thread tries to
access such an iterator, the _M_singular check will face a data race when
accessing the _M_sequence member. In case of a race condition the program is
likely to abort, but maybe because of a memory access violation rather than
a clear singular iterator assertion.

To avoid this I reworked _M_sequence manipulation to use an atomic read when
necessary and to make sure that otherwise the container mutex is locked.
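
The idea of reading the parent pointer once, atomically, before using it
can be sketched in isolation.  This is an assumption-laden toy, not the
libstdc++ implementation: ToySequence and ToyIterator are invented, and
std::atomic is used here in place of the __atomic_load_n builtin used in
the patch.  It also does not address the lifetime of the sequence or the
non-atomic version fields.

```cpp
#include <atomic>

// Toy stand-ins for a debug-mode container and iterator.
struct ToySequence
{
  unsigned version = 1;
};

struct ToyIterator
{
  std::atomic<ToySequence*> sequence{nullptr};
  unsigned version = 0;

  // Record the sequence's version, then publish the pointer.
  void attach (ToySequence *s)
  {
    version = s->version;
    sequence.store (s, std::memory_order_release);
  }

  // Make the iterator singular again.
  void detach ()
  {
    sequence.store (nullptr, std::memory_order_release);
    version = 0;
  }

  // Load the pointer once and use only the local copy, so a concurrent
  // detach cannot turn it into a null dereference between the test and
  // the version comparison.
  bool singular () const
  {
    ToySequence *s = sequence.load (std::memory_order_acquire);
    return !s || version != s->version;
  }
};
```

The point is only the single acquire load in singular(); everything that
modifies the pointer is assumed to run with the container mutex held.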


    * src/c++11/debug.cc
    (_Safe_sequence_base::_M_attach_single): Set attached iterator
    sequence pointer and version.
    (_Safe_sequence_base::_M_detach_single): Reset detached iterator.
    (_Safe_iterator_base::_M_attach): Remove attached iterator sequence
    pointer and version assignments.
    (_Safe_iterator_base::_M_attach_single): Likewise.
    (_Safe_iterator_base::_M_detach_single): Remove detached iterator
    reset.
    (_Safe_iterator_base::_M_singular): Use atomic load to access parent
    sequence.
    (_Safe_iterator_base::_M_can_compare): Likewise.
    (_Safe_iterator_base::_M_get_mutex): Likewise.
    (_Safe_local_iterator_base::_M_attach): Remove attached iterator
    container pointer and version assignments.
    (_Safe_local_iterator_base::_M_attach_single): Likewise.
    (_Safe_unordered_container_base::_M_attach_local_single): Set attached
    iterator container pointer and version.
    (_Safe_unordered_container_base::_M_detach_local_single): Reset detached
    iterator.

Tests are currently running in Debug mode.

OK to commit if successful?

François

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 18da9da9c52..711ba558eb2 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -318,6 +318,8 @@ namespace __gnu_debug
   _Safe_sequence_base::
   _M_attach_single(_Safe_iterator_base* __it, bool __constant) throw ()
   {
+__it->_M_sequence = this;
+__it->_M_version = _M_version;
 _Safe_iterator_base*& __its =
   __constant ? _M_const_iterators : _M_iterators;
 __it->_M_next = __its;
@@ -341,6 +343,7 @@ namespace __gnu_debug
   {
 // Remove __it from this sequence's list
 __it->_M_unlink();
+__it->_M_reset();
 if (_M_const_iterators == __it)
   _M_const_iterators = __it->_M_next;
 if (_M_iterators == __it)
@@ -355,11 +358,7 @@ namespace __gnu_debug
 
 // Attach to the new sequence (if there is one)
 if (__seq)
-  {
-	_M_sequence = __seq;
-	_M_version = _M_sequence->_M_version;
-	_M_sequence->_M_attach(this, __constant);
-  }
+  __seq->_M_attach(this, __constant);
   }
 
   void
@@ -370,11 +369,7 @@ namespace __gnu_debug
 
 // Attach to the new sequence (if there is one)
 if (__seq)
-  {
-	_M_sequence = __seq;
-	_M_version = _M_sequence->_M_version;
-	_M_sequence->_M_attach_single(this, __constant);
-  }
+  __seq->_M_attach_single(this, __constant);
   }
 
   void
@@ -400,10 +395,7 @@ namespace __gnu_debug
   _M_detach_single() throw ()
   {
 if (_M_sequence)
-  {
-	_M_sequence->_M_detach_single(this);
-	_M_reset();
-  }
+  _M_sequence->_M_detach_single(this);
   }
 
   void
@@ -419,20 +411,32 @@ namespace __gnu_debug
   bool
   _Safe_iterator_base::
   _M_singular() const throw ()
-  { return !_M_sequence || _M_version != _M_sequence->_M_version; }
+  {
+auto seq = __atomic_load_n(&_M_sequence, __ATOMIC_ACQUIRE);
+return !seq || _M_version != seq->_M_version;
+  }
 
   bool
   _Safe_iterator_base::
   _M_can_compare(const _Safe_iterator_base& __x) const throw ()
   {
-return (!_M_singular()
-	&& !__x._M_singular() && _M_sequence == __x._M_sequence);
+auto seq = __atomic_load_n(&_M_sequence, __ATOMIC_ACQUIRE);
+if (seq && _M_version == seq->_M_version)
+  {
+	auto xseq = __atomic_load_n(&__x._M_sequence, __ATOMIC_ACQUIRE);
+	return xseq && __x._M_version == xseq->_M_version && seq == xseq;
+  }
+
+return false;
   }
 
   __gnu_cxx::__mutex&
   _Safe_iterator_base::
   _M_get_mutex() throw ()
-  { return _M_sequence->_M_get_mutex(); }
+  {
+auto seq = __atomic_load_n(&_M_sequence, __ATOMIC_ACQUIRE);
+return get_safe_base_mutex(seq);
+  }
 
   _Safe_unordered_container_base*
   _Safe_local_iterator_base::
@@ -447,11 +451,8 @@ namespace __gnu_debug
 
 // Attach to the new container (if there is one)
 if (__cont)
-  {
-	_M_sequence = __cont;
-	_M_version = _M_sequence->_M_version;
-	_M_get_container()->_M_attach_local(this, __constant);
-  }
+  static_cast<_Safe_unordered_container_base*>(__cont)
+	->_M_attach_local(this, __constant);
   }
 
   void
@@ -462,11 +463,8 @@ namespace __gnu_d

Re: Fix Debug mode Undefined Behavior

2020-03-03 Thread Jonathan Wakely


On 03/03/20 22:11 +0100, François Dumont wrote:
After the fix of PR 91910 I tried to consider other possible race
conditions, and I think we still have a problem.


As stated in the PR, when a container is destroyed all associated
iterators are made singular. If another thread tries to access such an
iterator at the same time, the _M_singular check faces a data race when
accessing the _M_sequence member. When the race occurs the program is
likely to abort, but possibly because of a memory access violation
rather than a clear singular-iterator assertion.


To avoid this I reworked the _M_sequence manipulation to use an atomic
read where necessary, and made sure the container mutex is locked otherwise.


    * src/c++11/debug.cc
    (_Safe_sequence_base::_M_attach_single): Set attached iterator
    sequence pointer and version.
    (_Safe_sequence_base::_M_detach_single): Reset detached iterator.
    (_Safe_iterator_base::_M_attach): Remove attached iterator sequence
    pointer and version assignments.
    (_Safe_iterator_base::_M_attach_single): Likewise.
    (_Safe_iterator_base::_M_detach_single): Remove detached iterator
    reset.
    (_Safe_iterator_base::_M_singular): Use atomic load to access parent
    sequence.
    (_Safe_iterator_base::_M_can_compare): Likewise.
    (_Safe_iterator_base::_M_get_mutex): Likewise.
    (_Safe_local_iterator_base::_M_attach): Remove attached iterator
    container pointer and version assignments.
    (_Safe_local_iterator_base::_M_attach_single): Likewise.
    (_Safe_unordered_container_base::_M_attach_local_single): Set
    attached iterator container pointer and version.
    (_Safe_unordered_container_base::_M_detach_local_single): Reset
    detached iterator.

Running tests in Debug mode.

Ok to commit if successful ?


I don't think we want to change this so close to the end of the GCC 10
cycle. Let's revisit it in a few weeks.

Re: [PATCH] PR libstdc++/91620 Implement DR 526 for std::[forward_]list::remove_if/unique

2020-03-03 Thread Jonathan Wakely


On 03/03/20 06:42 +0100, François Dumont wrote:

Hi

    Isn't it something to fix before gcc 10 release ?


No, I don't think so. It's not a regression.

Re: [PATCH 1/4] libstdc++: Fix use of is_nothrow_assignable_v in

2020-03-03 Thread Jonathan Wakely


On 03/03/20 11:30 -0500, Patrick Palka wrote:

We are passing a value type as the first argument to is_nothrow_assignable_v,
but the result of that is always false.  Since this predicate is a part of the
condition that guards the corresponding optimizations for these algorithms, this
bug means these optimizations are never used.  We should be passing a reference
type to is_nothrow_assignable_v instead.
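
A small illustration of the difference, using `int` as the element type. The helper name `can_use_fast_path` is hypothetical and is not the guard used in the library; the sketch only shows what the trait reports for a scalar value type versus a reference type as its first argument.

```cpp
#include <cassert>
#include <type_traits>

// Assigning to a prvalue of scalar type is ill-formed, so with a scalar
// value type as the first argument the trait reports false.
static_assert(!std::is_nothrow_assignable<int, int>::value,
              "value type as target: not assignable");

// With an lvalue reference as the first argument the trait asks the
// intended question: can an int be assigned to an int lvalue without
// throwing?
static_assert(std::is_nothrow_assignable<int&, int>::value,
              "reference type as target: assignable");

// Hypothetical helper showing how such a guard might be spelled.
template<typename Out, typename In>
constexpr bool
can_use_fast_path()
{ return std::is_nothrow_assignable<Out&, In>::value; }
```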

libstdc++-v3/ChangeLog:

* include/bits/ranges_uninitialized.h
(uninitialized_copy_fn::operator()): Pass a reference type as the first
argument to is_nothrow_assignable_v.
(uninitialized_copy_fn::operator()): Likewise.
(uninitialized_move_fn::operator()): Likewise.  Return an in_out_result
with the input iterator stripped of its move_iterator.
(uninitialized_move_n_fn::operator()): Likewise.
(uninitialized_fill_fn::operator()): Pass a reference type as the first
argument to is_nothrow_assignable_v.
(uninitialized_fill_n_fn::operator()): Likewise.


OK.

Re: [PATCH 2/4] libstdc++: Add a move-only testsuite iterator type

2020-03-03 Thread Jonathan Wakely


On 03/03/20 11:30 -0500, Patrick Palka wrote:

This adds a move-only testsuite iterator type to <testsuite_iterators.h>, which
will be used in the tests that verify LWG 3355 and is already needed by
the tests for LWG 3389 and 3390.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_iterators.h (input_iterator_wrapper_nocopy):
New testsuite iterator.
* testsuite/24_iterators/counted_iterator/lwg3389.cc: Use it.
* testsuite/24_iterators/move_iterator/lwg3390.cc: Likewise.


OK, thanks.

Re: GLIBC libmvec status

2020-03-03 Thread GT

‐‐‐ Original Message ‐‐‐
On Monday, March 2, 2020 4:59 PM, Jakub Jelinek  wrote:

> Indeed, there aren't any yet on the vectorizer side, I thought I've
> implemented it already in the vectorizer but apparently didn't, just
> the omp-simd-clone.c part is implemented (the more important part, as
> it matters for the ABI).

What is in omp-simd-clone.c? What is missing from the vectorizer? My
assumption was that the implementation of vector function masking was
complete, but no test was created to verify the functionality.

> A testcase could
> be something along the lines of
> #pragma omp declare simd
> int foo (int, int);
>
> void
> bar (int *a, int *b, int *c)
> {
> #pragma omp simd
> for (int i = 0; i < 1024; i++)
> {
> int d = b[i], e = c[i], f;
> if (b[i] < 20)
> f = foo (d, e);
> else
> f = d + e;
> }
> }

I thought the test would be more like:

#pragma omp declare simd
int
foo (int *x, int *y)
{
  *y = *x + 2;
}

void
bar (int *a, float *b, float *c)
{
#pragma omp simd
for (int i = 0; i < 1024; i++)
{
int d = b[i], e = c[i], f;
if ( i % 2)
  f = foo (d, e);
}
}

The point being that only items at odd indices are updated. That would require
masking to avoid altering items at even indices.
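
One way the odd-index idea could be written so that it compiles and runs is sketched below. This is not the testcase either poster proposed: the signatures, the `inbranch` clause and the arithmetic in `foo` are assumptions chosen for illustration. Without OpenMP support enabled the pragmas are ignored and the code simply runs as scalar code.

```cpp
#include <cassert>

// `inbranch` tells the compiler the function is always called from
// inside a conditional in a SIMD loop, i.e. a masked clone is wanted.
#pragma omp declare simd inbranch
int
foo(int x, int y)
{
  return x + y + 2;
}

// Only odd-indexed elements of `a` are written; even-indexed elements
// must keep their old values, which is what requires masking once the
// loop is vectorized.
void
bar(int* a, const int* b, const int* c, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    {
      if (i % 2)
        a[i] = foo(b[i], c[i]);
    }
}
```

A runtime check then only has to confirm that the even-indexed elements of `a` still hold their initial values after the call.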

Bert.

Re: [PATCH] rs6000: Fix -mpower9-vector -mno-altivec ICE (PR87560)

2020-03-03 Thread Segher Boessenkool

Hi!

On Mon, Mar 02, 2020 at 04:01:11PM -0600, Bill Schmidt wrote:
> PR87560   reports an ICE when a test case is compiled with -mpower9-vector
> and -mno-altivec.  This   patch terminates compilation with an error when
> this combination (and other unreasonable ones) are requested.

(If this is your commit message: there are a lot of stray tabs in it).

> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Reported error is now:
> 
> f951: Error: '-mno-altivec' turns off '-mpower9-vector'

I'm not happy at all with this mechanism, it is yet another place we
put this same info down.  It should be possible to just handle this in
rs6000.opt .  But, this patch just adds an extra entry.

> Is this okay for master, and for backport to releases/gcc-9 after the
> 9.3 release?  There's no urgency in getting this in 9.3.

Okay for both.  Thanks!

One thing: please add a PR tag to the changelog:

> 2020-03-02  Bill Schmidt  

PR target/87560

> * rs6000-cpus.def (OTHER_ALTIVEC_MASKS): New #define.
> * rs6000.c (rs6000_disable_incompatible_switches): Add table entry
>   for OPTION_MASK_ALTIVEC.

(and indent the rest with tabs as well?)


Segher

Re: [PATCH] PR libstdc++/91620 Implement DR 526 for std::[forward_]list::remove_if/unique

2020-03-03 Thread Jonathan Wakely


On 03/03/20 21:32 +, Jonathan Wakely wrote:

On 03/03/20 06:42 +0100, François Dumont wrote:

Hi

    Isn't it something to fix before gcc 10 release ?


No, I don't think so. It's not a regression.


(And is not experimental C++20 stuff, and is not just changing tests
or docs).

Re: [PATCH 3/4] libstdc++: Add a test range type that has a sized sentinel

2020-03-03 Thread Jonathan Wakely


On 03/03/20 11:30 -0500, Patrick Palka wrote:

This adds a test range type whose end() is a sized sentinel to
<testsuite_iterators.h>, which will be used in the tests that verify LWG 3355.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_iterators.h (test_range::get_iterator): Make
protected instead of private.
(test_sized_range_sized_sent): New.
---
.../testsuite/util/testsuite_iterators.h  | 32 +++
1 file changed, 32 insertions(+)

diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h 
b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index e47b2b03e40..756940ed092 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -735,6 +735,7 @@ namespace __gnu_test
  { return i.ptr - s.end; }
};

+protected:
  auto
  get_iterator(T* p)
  {
@@ -812,6 +813,37 @@ namespace __gnu_test
using test_output_sized_range
  = test_sized_range;

+  // A type meeting the minimum std::sized_range requirements, and whose end()
+  // returns a size sentinel.


s/size/sized/ here, no?

OK for master.

Re: [PATCH 4/4] libstdc++: Move-only input iterator support in algorithms (LWG 3355)

2020-03-03 Thread Jonathan Wakely


On 03/03/20 11:30 -0500, Patrick Palka wrote:

This adds support for move-only input iterators in the ranges::uninitialized_*
algorithms defined in <memory>, as per LWG 3355.  The only changes needed are to
add calls to std::move in the appropriate places and to use operator-() instead
of ranges::distance() because the latter cannot be used with a move-only
iterator with a sized sentinel as is the case here.  (This issue with
ranges::distance is LWG 3392.)
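
The two adjustments can be seen in a reduced form that does not depend on the ranges library. Everything here (`MoveOnlyIter`, `Sentinel`, `copy_n_impl`, `copy_all`) is a hypothetical stand-in, not the library code: the distance is taken with `operator-`, which reads its operands by reference, and the iterator is handed on with `std::move` because it cannot be copied.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical move-only input iterator over ints.
struct MoveOnlyIter
{
  const int* p;

  explicit MoveOnlyIter(const int* q) : p(q) { }
  MoveOnlyIter(MoveOnlyIter&&) = default;
  MoveOnlyIter& operator=(MoveOnlyIter&&) = default;
  MoveOnlyIter(const MoveOnlyIter&) = delete;
  MoveOnlyIter& operator=(const MoveOnlyIter&) = delete;

  int operator*() const { return *p; }
  MoveOnlyIter& operator++() { ++p; return *this; }
};

// Hypothetical sized sentinel: the distance is available via operator-.
struct Sentinel
{
  const int* end;
};

inline std::ptrdiff_t
operator-(const Sentinel& s, const MoveOnlyIter& i)
{ return s.end - i.p; }

// Takes the iterator by value, so callers must move it in.
inline int*
copy_n_impl(MoveOnlyIter first, std::ptrdiff_t n, int* out)
{
  for (; n > 0; --n, ++first, ++out)
    *out = *first;
  return out;
}

inline int*
copy_all(MoveOnlyIter first, Sentinel last, int* out)
{
  // operator- takes its operands by reference, so no copy is needed.
  const std::ptrdiff_t n = last - first;
  // Passing `first` on by value would need a copy; std::move avoids it.
  return copy_n_impl(std::move(first), n, out);
}
```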

libstdc++-v3/ChangeLog:

LWG 3355 The memory algorithms should support move-only input iterators
introduced by P1207
* include/bits/ranges_uninitialized.h
(__uninitialized_copy_fn::operator()): Use std::move to avoid attempting
to copy __ifirst, which could be a move-only input iterator.  Use
operator- instead of ranges::distance to compute distance from a sized
sentinel.
(__uninitialized_copy_n_fn::operator()): Likewise.
(__uninitialized_move_fn::operator()): Likewise.
(__uninitialized_move_n_fn::operator()): Likewise.
(__uninitialized_destroy_fn::operator()): Use std::move to avoid
attempting to copy __first.
(__uninitialized_destroy_n_fn::operator()): Likewise.
* testsuite/20_util/specialized_algorithms/destroy/constrained.cc:
Augment test.
* .../specialized_algorithms/uninitialized_copy/constrained.cc:
Likewise.
* .../specialized_algorithms/uninitialized_move/constrained.cc:
Likewise.


OK, thanks.

[PATCH] libstdc++: Fix incorrect use of memset in ranges::fill_n (PR 94017)

2020-03-03 Thread Patrick Palka

When deciding whether to perform the memset optimization in ranges::fill_n, we
were crucially neglecting to check whether the output pointer's value type is a
byte type.  This patch adds such a check to the problematic condition in
ranges::fill_n.

I think the __is_byte<_Tp>::__value check, which checks that the fill type is a
byte type, is too restrictive.  It means that we won't enable the memset
optimization in the following example:

  char c[100];
  ranges::fill(c, 37);

since the fill type is deduced to be int here.  It seems we could get away
with instead just checking that _Tp is an integral type; I've added a TODO
about this in the code.

libstdc++-v3/ChangeLog:

PR libstdc++/94017
* include/bits/ranges_algobase.h (__fill_n_fn::operator()): Fix
condition for when to use memset.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_fill_n/94017.cc:
New test.
* testsuite/25_algorithms/fill/94017.cc: New test.
* testsuite/25_algorithms/fill_n/94017.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algobase.h   |  9 ++-
 .../uninitialized_fill/94017.cc   | 74 +++
 .../uninitialized_fill_n/94017.cc | 74 +++
 .../testsuite/25_algorithms/fill/94017.cc | 73 ++
 .../testsuite/25_algorithms/fill_n/94017.cc   | 73 ++
 5 files changed, 301 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/94017.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill/94017.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill_n/94017.cc

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index feb6c5723dd..35309986e53 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -531,8 +531,13 @@ namespace ranges
if (__n <= 0)
  return __first;
 
-   // TODO: is __is_byte the best condition?
-   if constexpr (is_pointer_v<_Out> && __is_byte<_Tp>::__value)
+   // TODO: Generalize this optimization to contiguous iterators.
+   if constexpr (is_pointer_v<_Out>
+ // Note that __is_byte already implies !is_volatile.
+ && __is_byte<remove_pointer_t<_Out>>::__value
+ // TODO: Can we relax this next condition to just
+ // integral<_Tp>?
+ && __is_byte<_Tp>::__value)
  {
__builtin_memset(__first, static_cast<unsigned char>(__value), __n);
return __first + __n;
diff --git 
a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
new file mode 100644
index 000..c039935d78e
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
@@ -0,0 +1,74 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+#include 
+#include 
+#include 
+#include 
+
+using __gnu_test::test_output_range;
+
+namespace ranges = std::ranges;
+
+template
+void
+test01()
+{
+{
+  Out x[5];
+  ranges::uninitialized_fill(x, value);
+  VERIFY( ranges::count(x, static_cast(value)) == ranges::size(x) );
+}
+
+{
+  Out x[5];
+  test_output_range rx(x);
+  ranges::uninitialized_fill(x, value);
+  VERIFY( ranges::count(x, static_cast(value)) == ranges::size(x) );
+}
+}
+
+int
+main()
+{
+  test01();
+  test01();
+  test01();
+  test01();
+  test01();
+
+  test01();
+  test01();
+  test01();
+
+  test01();
+  test01();
+  test01();
+
+  test01();
+  test01();
+  test01();
+  test01();
+
+  test01();
+  test01();
+  test01();
+  test01();
+}
diff --git 
a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/94017.cc
 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/94017.cc
ne

Re: [PATCH] libstdc++: Fix incorrect use of memset in ranges::fill_n (PR 94017)

2020-03-03 Thread Patrick Palka

On Tue, 3 Mar 2020, Patrick Palka wrote:

> When deciding whether to perform the memset optimization in ranges::fill_n, we
> were crucially neglecting to check whether the output pointer's value type is
> a byte type.  This patch adds such a check to the problematic condition in
> ranges::fill_n.
> 
> I think the __is_byte<_Tp>::__value check, which checks that the fill type is
> a byte type, is too restrictive.  It means that we won't enable the memset
> optimization in the following example:
> 
>   char c[100];
>   ranges::fill(c, 37);
> 
> since the fill type is deduced to be int here.  It seems we could get away
> with instead just checking that _Tp is an integral type; I've added a TODO
> about this in the code.

Here's v2 of the patch which actually replaces the aforementioned
conservative condition with integral<_Tp>, following discussion on IRC.

-- >8 --

Subject: [PATCH] libstdc++: Fix incorrect use of memset in ranges::fill_n (PR
 94017)

When deciding whether to perform the memset optimization in ranges::fill_n, we
were crucially neglecting to check whether the output pointer's value type is a
byte type.  This patch adds such a check to the problematic condition in
ranges::fill_n.

At the same time, this patch relaxes the overly conservative
__is_byte<_Tp>::__value check that requires the fill type be a byte
type.  It's overly conservative because it means we won't enable the
memset optimization in the following example

  char c[100];
  ranges::fill(c, 37);

because the fill type is deduced to be int here.  Rather than requiring that the
fill type be a byte type, it seems safe to instead require the fill type be
an integral type, which is what this patch does.
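
A reduced sketch of the refined condition, independent of the library internals. `is_byte_like` is a simplified stand-in for the internal `__is_byte` trait and `fill_n_sketch` is a hypothetical raw-pointer fill; the sketch assumes a compiler that accepts `if constexpr`. It shows both halves of the condition: the output element must be a byte, while the fill value only needs to be integral.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Simplified stand-in for libstdc++'s internal __is_byte trait.
template<typename T>
struct is_byte_like
  : std::integral_constant<bool,
                           std::is_same<T, char>::value
                           || std::is_same<T, signed char>::value
                           || std::is_same<T, unsigned char>::value>
{ };

// Hypothetical fill_n over raw pointers.
template<typename Out, typename T>
Out*
fill_n_sketch(Out* first, std::ptrdiff_t n, const T& value)
{
  if (n <= 0)
    return first;

  // memset writes bytes, so it is only valid when each output element
  // is a single non-volatile byte.  The fill value may be any integral
  // type; it is converted to unsigned char before being written.
  if constexpr (is_byte_like<Out>::value && std::is_integral<T>::value)
    {
      std::memset(first, static_cast<unsigned char>(value),
                  static_cast<std::size_t>(n));
      return first + n;
    }
  else
    {
      // General path: plain element-wise assignment.
      for (; n > 0; --n, ++first)
        *first = value;
      return first;
    }
}
```

With a `char` buffer and an `int` fill value the memset path is taken; with an `int` buffer the element-wise path is taken, so each element receives the value itself rather than a repeated byte pattern.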

libstdc++-v3/ChangeLog:

PR libstdc++/94017
* include/bits/ranges_algobase.h (__fill_n_fn::operator()): Refine
condition for when to use memset, making sure to additionally check that
the output pointer's value type is a non-volatile byte type.  Instead of
requiring that the fill type is a byte type, just require that it's an
integral type.
* testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc:
New test.
* testsuite/20_util/specialized_algorithms/uninitialized_fill_n/94017.cc:
New test.
* testsuite/25_algorithms/fill/94013.cc: Uncomment part of test that was
blocked by PR 94017.
* testsuite/25_algorithms/fill/94017.cc: New test.
* testsuite/25_algorithms/fill_n/94017.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algobase.h   |  7 +-
 .../uninitialized_fill/94017.cc   | 77 +++
 .../uninitialized_fill_n/94017.cc | 77 +++
 .../testsuite/25_algorithms/fill/94013.cc |  5 +-
 .../testsuite/25_algorithms/fill/94017.cc | 76 ++
 .../testsuite/25_algorithms/fill_n/94017.cc   | 76 ++
 6 files changed, 313 insertions(+), 5 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill_n/94017.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill/94017.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill_n/94017.cc

diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index c0102f5ab11..80c9a774301 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -516,8 +516,11 @@ namespace ranges
if (__n <= 0)
  return __first;
 
-   // TODO: is __is_byte the best condition?
-   if constexpr (is_pointer_v<_Out> && __is_byte<_Tp>::__value)
+   // TODO: Generalize this optimization to contiguous iterators.
+   if constexpr (is_pointer_v<_Out>
+ // Note that __is_byte already implies !is_volatile.
+ && __is_byte<remove_pointer_t<_Out>>::__value
+ && integral<_Tp>)
  {
__builtin_memset(__first, static_cast<unsigned char>(__value), __n);
return __first + __n;
diff --git 
a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
new file mode 100644
index 000..1686d1ba8d5
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/94017.cc
@@ -0,0 +1,77 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied war

Re: [PATCH], PR target/93937, Fix variable vec_extract insn that will never match

2020-03-03 Thread Segher Boessenkool

On Mon, Mar 02, 2020 at 07:41:42PM -0500, Michael Meissner wrote:
> On Fri, Feb 28, 2020 at 06:45:25AM -0600, Segher Boessenkool wrote:
> > On Fri, Feb 28, 2020 at 12:32:06AM -0500, Michael Meissner wrote:
> > > There is a wider issue to optimize all cases of vec_extract to do the
> > > sign, zero, and float extension automatically when we are loading from
> > > memory, which is PR target/93230.  I have patches for all of the cases
> > > for 93230, but they will need to wait until GCC 11 opens up.
> > 
> > If you don't use reload_completed in the split condition you do not have
> > this problem (in the normal case).  Please work on that?
> 
> No.  I tend to think that if we do the split before reload, that it will cause
> some regressions, because the register allocator will take the opportunity to
> change loads to vector registers to be loads to GPRs and direct moves.

It will cause better optimisations, yes.  Earlier optimisations.

The compiler will use a different register set if it thinks that is
cheaper to do.  We need to get that right *anyway*, because it is used
from many more places.

> One of
> the original motivations for some of these patches is to avoid direct moves.

You only need to do all of this manually because you split after reload.
That is only a good thing to do if you *have* to, usually because some
datum can end up in memory or in a register, and those are significantly
different for the machine code you need.

This is not true here, because you do *not* allow both regs and mem (in
the normal case).  If you split earlier, you do not have to do all the
main rtl optimisations (say cprop, fwprop, combine, insn selection in
general) manually, as you do have to do to not get terrible code if you
split after reload.

> I also worry that things like having to use SUBREG's before RA (instead of
> just changing the mode and/or the register number that we can do after
> reload) will not work because generally vectors and scalars aren't tieable.

You have pseudos before reload.  Subregs work fine.

If they weren't tieable you could not do this on hard regs either, so I
don't see your point at all here?

Segher

Re: [PATCH] re PR tree-optimization/90883 (Generated code is worse if returned struct is unnamed)

2020-03-03 Thread Jim Wilson

On Mon, Mar 2, 2020 at 11:34 PM Kito Cheng  wrote:
> PR tree-optimization/90883
> * g++.dg/tree-ssa/pr90883.c: Add --param max-inline-insns-size=1.
> Add aarch64-*-* mips*-*-* to XFAIL.

This looks good to me.

Jim

Re: [PATCH] PR target/93995 ICE in patch_jump_insn, at cfgrtl.c:1290 on riscv64-linux-gnu

2020-03-03 Thread Jim Wilson

On Tue, Mar 3, 2020 at 12:03 AM Kito Cheng  wrote:
> gcc/ChangeLog
> * config/riscv/riscv.c (riscv_emit_float_compare): Using NE to compare
> the result of IOR.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/pr93995.c: New.

Thanks.  This looks good to me.

Jim
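
For readers following the ChangeLog entry quoted above, the shape of the fix can be modelled in ordinary C++ rather than RTL. This is only an analogy under that assumption: the function name `ltgt` is made up, and the real change is in how `riscv_emit_float_compare` builds its operands. The two comparisons are combined into one value, and the branch condition then compares that value against zero with NE instead of using IOR as the comparison code.

```cpp
#include <cassert>
#include <cmath>

// LTGT is "less than or greater than": false for equal operands and
// for unordered operands (NaN).
inline bool
ltgt(double a, double b)
{
  const int lt = a < b;        // first comparison, materialized as 0/1
  const int gt = a > b;        // second comparison, materialized as 0/1
  const int either = lt | gt;  // the IOR is computed into a value...
  return either != 0;          // ...and the branch tests it with NE
}
```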

Re: [PATCH] libstdc++: Fix incorrect use of memset in ranges::fill_n (PR 94017)

2020-03-03 Thread Jonathan Wakely


On 03/03/20 17:13 -0500, Patrick Palka wrote:

On Tue, 3 Mar 2020, Patrick Palka wrote:


When deciding whether to perform the memset optimization in ranges::fill_n, we
were crucially neglecting to check whether the output pointer's value type is a
byte type.  This patch adds such a check to the problematic condition in
ranges::fill_n.

I think the __is_byte<_Tp>::__value check, which checks that the fill type is a
byte type, is too restrictive.  It means that we won't enable the memset
optimization in the following example:

  char c[100];
  ranges::fill(c, 37);

since the fill type is deduced to be int here.  It seems we could get away
with instead just checking that _Tp is an integral type; I've added a TODO
about this in the code.


Here's v2 of the patch which actually replaces the aforementioned
conservative condition with integral<_Tp>, following discussion on IRC.


OK for master, thanks.

Re: [PATCH] Clear --help=language and --help=common interaction.

2020-03-03 Thread Joseph Myers

On Tue, 3 Mar 2020, Martin Liška wrote:

> On 3/2/20 11:52 PM, Joseph Myers wrote:
> > On Mon, 2 Mar 2020, Martin Liška wrote:
> > 
> > > +version of GCC@.  If an option is supported by all languages, one needs
> > > +to use @var{common} qualifier instead.
> > 
> > "common" is literal text, so it should be @samp{common} not @var{common},
> > and the existing documentation here describes it as a "class" with other
> > things such as "undocumented" or "joined" being "qualifiers"
> > 
> 
> Thank you for the comments. I've got an updated version of the patch.

This version is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH v2 0/3] Introduce a new GCC option, --record-gcc-command-line

2020-03-03 Thread Joseph Myers

On Tue, 3 Mar 2020, Egeyar Bagcioglu wrote:

> Although we discussed after the submission of the first version that 
> there are several other options performing similar tasks, I believe we 
> established that there is still a need for this specific functionality. 
> Therefore, I am skipping in this email the comparison between this 
> option and the existing options with similarities.

We're now using git-style commit messages with self-contained explanation 
/ justification of the change being committed.

This means that one of the commit messages (not just message 0, whose 
contents don't go in a commit message) for an individual patch should have 
the explanation, which should include the self-contained justification by 
reference to comparison with other existing similar options.  People 
should be able to find the relevant information in the commit without 
needing to search the list archives for reviews of a previous patch 
version.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH v2 0/3] Introduce a new GCC option, --record-gcc-command-line

2020-03-03 Thread Fangrui Song


On 2020-03-03, Joseph Myers wrote:

On Tue, 3 Mar 2020, Egeyar Bagcioglu wrote:


Although we discussed after the submission of the first version that
there are several other options performing similar tasks, I believe we
established that there is still a need for this specific functionality.
Therefore, I am skipping in this email the comparison between this
option and the existing options with similarities.


Mentioning -frecord-gcc-switches will be much appreciated.

How is the new .GCC.command.line different?

Does it still have the SHF_MERGE | SHF_STRINGS flags?
If you change the flags, the .GCC.command.line section may not play well
with another object file (generated by -frecord-gcc-switches) whose
.GCC.command.line is SHF_MERGE | SHF_STRINGS.

When both -frecord-gcc-switches and --record-command-line are specified,
is it an error?


We're now using git-style commit messages with self-contained explanation
/ justification of the change being committed.

This means that one of the commit messages (not just message 0, whose
contents don't go in a commit message) for an individual patch should have
the explanation, which should include the self-contained justification by
reference to comparison with other existing similar options.  People
should be able to find the relevant information in the commit without
needing to search the list archives for reviews of a previous patch
version.

Re: [PATCH] use all same precision in wide_int arguments (PR 93986)

2020-03-03 Thread Martin Sebor


On 3/3/20 11:50 AM, Richard Biener wrote:

On March 3, 2020 4:39:34 PM GMT+01:00, Martin Sebor  wrote:

On 3/3/20 2:42 AM, Richard Biener wrote:

On Tue, Mar 3, 2020 at 12:04 AM Martin Sebor wrote:


The wide_int APIs expect operands to have the same precision and
abort when they don't.  This is especially insidious in code where
the operands normally do have the same precision but where mixed
precision arguments can come up as a result of unusual combinations
of optimization options.  That is also what precipitated pr93986.


If you want something like (signed) arbitrary precision arithmetic then
you can use widest_int instead.  Or, since you're working with offsets,
offset_int is another good choice.


Yes, I would much prefer not to have to do all this myself (and
risk getting it wrong).  Unfortunately, the APIs that obtain
the ranges all use wide_int, so I'd have to convert them one way
or the other.  I could change some of the APIs but not all of
them (e.g., get_range_info).


You can convert wide_int to both offset and widest int.


Yes, I realize that.  But it seems like six of one vs half a dozen
of the other.  Either way some variables need converting.  I don't
really have a preference for either approach.  I just copied
the solution I already used in gimple_call_alloc_size, and I chose
that one there because the get_range calls take wide_int, and
because gimple_call_alloc_size's caller (the function I'm changing
now) also uses wide_int.

Everything could be changed to widest_int instead but I'm not sure
it would make sense (e.g., get_range_info).  And unless everything
is changed, the APIs that interoperate need to convert between one
another.

I went ahead and rewrote the patch to use widest_int.  It let me get
rid of some conversions but it introduced others.  Most of them look
pretty much the same between the two approaches but there are more
of them with widest_int because of the extra variables.  The updated
patch is also about one and a half times longer.

Is one approach significantly better than the other or were you just
pointing out another way of doing it?

Either way, please choose one and approve.

Thanks
Martin



Richard.


Martin





The attached patch adjusts the code to extend all wide_int operands
to the same precision to avoid the ICE.
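
The failure mode and the remedy can be pictured with a toy model. This is not GCC's wide_int: `ToyInt`, `toy_add` and `toy_extend` are invented for illustration, the value is simply kept in a 64-bit integer, and the precision is only a tag. The model refuses mixed-precision operands, and widening both operands to a common precision first makes the operation legal.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Toy model only: a value tagged with a bit precision.
struct ToyInt
{
  std::int64_t value;
  unsigned precision;
};

// Mimics the "operands must agree" rule by refusing mixed precisions.
inline ToyInt
toy_add(ToyInt a, ToyInt b)
{
  if (a.precision != b.precision)
    throw std::invalid_argument("mixed precision operands");
  return ToyInt{a.value + b.value, a.precision};
}

// Widen to a common precision before arithmetic.  The stored value is
// already sign-extended in this model, so only the tag changes.
inline ToyInt
toy_extend(ToyInt a, unsigned precision)
{
  assert(precision >= a.precision);
  return ToyInt{a.value, precision};
}
```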

Besides the usual bootstrap/testing I also compiled all string tests
in gcc.dg with the same options as in the test case in pr93986 in
an effort to weed out any lingering bugs like it (found none).

Martin




PR tree-optimization/93986 - ICE on mixed-precision wide_int arguments

gcc/testsuite/ChangeLog:

PR tree-optimization/93986
* gcc.dg/pr93986.c: New test.

gcc/ChangeLog:

PR tree-optimization/93986
* tree-ssa-strlen.c (maybe_warn_overflow): Convert all wide_int
operands to the same precision widest_int to avoid ICEs.

diff --git a/gcc/testsuite/gcc.dg/pr93986.c b/gcc/testsuite/gcc.dg/pr93986.c
new file mode 100644
index 000..bdbc192a01d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr93986.c
@@ -0,0 +1,16 @@
+/* PR tree-optimization/93986 - ICE in decompose, at wide-int.h:984
+   { dg-do compile }
+   { dg-options "-O1 -foptimize-strlen -ftree-slp-vectorize" } */
+
+int dd (void);
+
+void ya (int cm)
+{
+  char s2[cm];
+
+  s2[cm-12] = s2[cm-11] = s2[cm-10] = s2[cm-9]
+= s2[cm-8] = s2[cm-7] = s2[cm-6] = s2[cm-5] = ' ';
+
+  if (dd ())
+__builtin_exit (0);
+}
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index b76b54efbd8..356aada5fe0 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -1924,11 +1924,21 @@ maybe_warn_overflow (gimple *stmt, tree len,
   if (TREE_NO_WARNING (dest))
 return;
 
+  /* Use maximum precision to avoid overflow in the addition below.
+ Make sure all operands have the same precision to keep wide_int
+ from ICE'ing.  */
+
+  /* Convenience constants.  */
+  const widest_int diff_min
+= wi::to_widest (TYPE_MIN_VALUE (ptrdiff_type_node));
+  const widest_int diff_max
+= wi::to_widest (TYPE_MAX_VALUE (ptrdiff_type_node));
+  const widest_int size_max
+= wi::to_widest (TYPE_MAX_VALUE (size_type_node));
+
   /* The offset into the destination object computed below and not
  reflected in DESTSIZE.  */
-  wide_int offrng[2];
-  const int off_prec = TYPE_PRECISION (ptrdiff_type_node);
-  offrng[0] = offrng[1] = wi::zero (off_prec);
+  widest_int offrng[2] = { 0, 0 };
 
   if (!si)
 {
@@ -1941,15 +1951,17 @@ maybe_warn_overflow (gimple *stmt, tree len,
 	 ARRAY_REF (MEM_REF (vlaptr, 0), N].  */
 	  tree off = TREE_OPERAND (ref, 1);
 	  ref = TREE_OPERAND (ref, 0);
-	  if (get_range (off, offrng, rvals))
+	  wide_int rng[2];
+	  if (get_range (off, rng, rvals))
 	{
-	  offrng[0] = offrng[0].from (offrng[0], off_prec, SIGNED);
-	  offrng[1] = offrng[1].from (offrng[1], off_prec, SIGNED);
+	  /* Convert offsets to the expected precision.  */
+	  offrng[0] = widest_int::from (rng[0], SIGNED);
+

Re: [PATCH] [rs6000] Fix a wrong GC issue

2020-03-03 Thread Segher Boessenkool

Hi!

On Tue, Mar 03, 2020 at 09:40:47AM -0600, Bin Bin Lv wrote:
> The source file rs6000.c was split up into several smaller source files
> through commit 1acf024.  However, the variables
> "altivec_builtin_mask_for_load" and
> "builtin_mode_to_type[MAX_MACHINE_MODE][2]" were marked with the wrong syntax
> "GTY(([options])) type name", so they were not registered as roots and were
> wrongly GCed.  When "altivec_builtin_mask_for_load" was wrongly GCed,
> compiling openJDK with precompiled headers enabled under -mcpu=power7 failed
> with ICEs.  Roots must be declared using one of the following syntaxes:
> "extern GTY(([options])) type name;" or "static GTY(([options])) type name;".
> 
> The following patch adds the variables "altivec_builtin_mask_for_load" and
> "builtin_mode_to_type[MAX_MACHINE_MODE][2]" to the roots array.

> 2020-03-03  Bin Bin Lv  
> 
>   * config/rs6000/rs6000-internal.h (altivec_builtin_mask_for_load,
>   builtin_mode_to_type[MAX_MACHINE_MODE][2]): Remove GTY(()).
>   * config/rs6000/rs6000.h (altivec_builtin_mask_for_load,
>   builtin_mode_to_type[MAX_MACHINE_MODE][2]): Add an extern GTY(())
>   declaration.

Why in both of the header files?  Can you just remove the declaration
from rs6000-internal.h?

>   * config/rs6000/rs6000.h (MAX_MACHINE_MODE): Include the header file
>   for MAX_MACHINE_MODE.

The changelog entry should say *what* file is included, and under what
condition.  It doesn't have to say why (that belongs in the commit
message).

But, can't you just include it unconditionally?  Don't we already,
anyway, via coretypes.h -> machmode.h -> insn-modes.h?


Segher

Re: Remove unnecessary XFAILs from existing testcase 20050603-3.c

2020-03-03 Thread Segher Boessenkool

Hi!

On Tue, Mar 03, 2020 at 08:59:52AM -0600, will schmidt wrote:
> Remove unnecessary XFAILs from existing testcase 20050603-3.c.
> 
>   The XFAILs in this testcase (20050603-3.c) are no longer necessary
>   since the fix to PR68803 was committed with svn revision r242681.
> 
> OK for master?

Yes please.  Thanks!

(On 9 branch as well, if it is a fix there?  If it misses 9.3, that's
not a big deal of course, but if it is tested before the RC spins, it
is a very safe patch).

Segher

Re: Add dg-require to existing powerpc/pr93122.c test

2020-03-03 Thread Segher Boessenkool

Hi!

On Tue, Mar 03, 2020 at 09:42:10AM -0600, will schmidt wrote:
>   This test (gcc.target/powerpc/pr93122.c) uses the
> -mcpu=future option.  It should also ensure the
> target can support the same.
> Thus, add a dg-requires clause to indicate
> a future target is supported on the platform.
> 
> Sniff tested successfully.  (mostly obvious).
> 
> OK for master?

Okay for trunk.  Thanks!


Segher

Re: [PATCH] [rs6000] Rewrite the declaration of a variable

2020-03-03 Thread Segher Boessenkool

Hi!

On Tue, Mar 03, 2020 at 10:13:56AM -0600, Bin Bin Lv wrote:
> Move the declaration of toc_section from the source file rs6000.c to its
> header file to standardize the code.

> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 0faf44b..c0a6e86 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -181,7 +181,6 @@ static GTY(()) section *tls_private_data_section;
>  static GTY(()) section *read_only_private_data_section;
>  static GTY(()) section *sdata2_section;
>  
> -extern GTY(()) section *toc_section;
>  section *toc_section = 0;
>  
>  /* Describe the vector unit used for modes.  */
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 3844bec..e77a84a 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2494,6 +2494,7 @@ extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
>  extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
>  extern GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
>  extern GTY(()) tree altivec_builtin_mask_for_load;
> +extern union GTY(()) section *toc_section;

Why does this add "union"?


Segher

[PATCH] c++: Fix mismatch in template argument deduction [PR90505]

2020-03-03 Thread Marek Polacek

My GCC 9 patch for C++20 P0846R0 (ADL and function templates) tweaked
cp_parser_template_name to only return an identifier if name lookup
didn't find anything.  In the deduce4.C case it means that we now
return an OVERLOAD.  That means that cp_parser_template_id will call
lookup_template_function, thereby producing a TEMPLATE_ID_EXPR with
unknown_type_node.  Previously, we created a TEMPLATE_ID_EXPR with
no type, making it type-dependent.  What we have now is no longer
type-dependent.  And so, when we call finish_call_expr after we've
parsed "foo(10)", even though we're in a template, we still do
the normal processing, thus perform overload resolution.  When adding
the template candidate foo we need to deduce the template arguments,
and that is where things go downhill.

When fn_type_unification sees that we have explicit template arguments,
but they aren't complete, it will use them to substitute the function
type.  So we substitute e.g. "void  (U)".  But the explicit
template argument was for a different parameter so we don't actually
substitute anything.  But the problem here was that we reduced the
template level of 'U' anyway.  So then when we're actually deducing
the template arguments via type_unification_real, we fail in unify:
  if (TEMPLATE_TYPE_LEVEL (parm)
      != template_decl_level (tparm))
    /* The PARM is not one we're trying to unify.  Just check
       to see if it matches ARG.  */
because 'parm' has been reduced but 'tparm' has not yet.

Therefore we shouldn't reduce the template level of template parameters
when tf_partial is set, i.e. during template argument deduction
substitution.  But we can only return after performing the
cp_build_qualified_type etc. business, otherwise things break horribly.
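
For illustration only, here is a sketch of the kind of code that reaches this path: a member template called with an incomplete explicit template argument list, so the remaining parameter must be deduced.  The template parameter lists (T, U, V), the explicit argument `char`, and the helper demo are assumptions made for this sketch; they are not taken from the new tests.

```cpp
template <typename T>
struct S
{
  // Two template parameters: U can only be given explicitly, V is
  // deduced from the function argument.
  template <typename U, typename V>
  static int foo (V v)
  { return static_cast<int> (sizeof (U)) + static_cast<int> (v); }

  // Only U is specified here, so the explicit argument list is
  // incomplete and V still has to be deduced from the argument 10.
  int bar () { return foo<char> (10); }
};

// Instantiate the class template and trigger the call.
inline int
demo ()
{
  S<int> s;
  return s.bar ();
}
```

The call in bar is resolved inside the template S, which is where the explicit-but-incomplete arguments and the template level of V both matter.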

Bootstrapped/regtested on x86_64-linux, ok for trunk and 9?  I'd like
to put this in before 9.3, if possible.

2020-03-03  Jason Merrill  
Marek Polacek  

PR c++/90505 - mismatch in template argument deduction.
* pt.c (tsubst): Don't reduce the template level of template
parameters when tf_partial.

* g++.dg/template/deduce4.C: New test.
* g++.dg/template/deduce5.C: New test.
* g++.dg/template/deduce6.C: New test.
* g++.dg/template/deduce7.C: New test.
---
 gcc/cp/pt.c | 14 --
 gcc/testsuite/g++.dg/template/deduce4.C | 17 +
 gcc/testsuite/g++.dg/template/deduce5.C | 17 +
 gcc/testsuite/g++.dg/template/deduce6.C | 17 +
 gcc/testsuite/g++.dg/template/deduce7.C | 10 ++
 5 files changed, 69 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/deduce4.C
 create mode 100644 gcc/testsuite/g++.dg/template/deduce5.C
 create mode 100644 gcc/testsuite/g++.dg/template/deduce6.C
 create mode 100644 gcc/testsuite/g++.dg/template/deduce7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 230331f60cb..1c721b31176 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15057,12 +15057,6 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
int levels;
tree arg = NULL_TREE;
 
-   /* Early in template argument deduction substitution, we don't
-  want to reduce the level of 'auto', or it will be confused
-  with a normal template parm in subsequent deduction.  */
-   if (is_auto (t) && (complain & tf_partial))
- return t;
-
r = NULL_TREE;
 
gcc_assert (TREE_VEC_LENGTH (args) > 0);
@@ -15193,6 +15187,14 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 about the template parameter in question.  */
  return t;
 
+   /* Early in template argument deduction substitution, we don't
+  want to reduce the level of 'auto', or it will be confused
+  with a normal template parm in subsequent deduction.
+  Similarly, don't reduce the level of template parameters to
+  avoid mismatches when deducing their types.  */
+   if (complain & tf_partial)
+ return t;
+
/* If we get here, we must have been looking at a parm for a
   more deeply nested template.  Make a new version of this
   template parameter, but with a lower level.  */
diff --git a/gcc/testsuite/g++.dg/template/deduce4.C b/gcc/testsuite/g++.dg/template/deduce4.C
new file mode 100644
index 000..e2c165dc788
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/deduce4.C
@@ -0,0 +1,17 @@
+// PR c++/90505 - mismatch in template argument deduction.
+// { dg-do compile }
+
+template 
+struct S {
+  template 
+  static void foo(V) { }
+
+  void bar () { foo(10); }
+};
+
+void
+test ()
+{
+  S s;
+  s.bar ();
+}
diff --git a/gcc/testsuite/g++.dg/template/deduce5.C b/gcc/testsuite/g++.dg/template/deduce5.C
new file mode 100644
index 000..9d382bfe03a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/deduce5.C
@@ -0,0 +1,17 @@
+// PR c++/90505 - mismatch in template argu

Re: [PATCH] c++: Fix mismatch in template argument deduction [PR90505]

2020-03-03 Thread Jason Merrill


On 3/3/20 7:50 PM, Marek Polacek wrote:

My GCC 9 patch for C++20 P0846R0 (ADL and function templates) tweaked
cp_parser_template_name to only return an identifier if name lookup
didn't find anything.  In the deduce4.C case it means that we now
return an OVERLOAD.  That means that cp_parser_template_id will call
lookup_template_function, thereby producing a TEMPLATE_ID_EXPR with
unknown_type_node.  Previously, we created a TEMPLATE_ID_EXPR with
no type, making it type-dependent.  What we have now is no longer
type-dependent.  And so, when we call finish_call_expr after we've
parsed "foo(10)", even though we're in a template, we still do
the normal processing, thus perform overload resolution.  When adding
the template candidate foo we need to deduce the template arguments,
and that is where things go downhill.

When fn_type_unification sees that we have explicit template arguments,
but they aren't complete, it will use them to substitute the function
type.  So we substitute e.g. "void  (U)".  But the explicit
template argument was for a different parameter so we don't actually
substitute anything.  But the problem here was that we reduced the
template level of 'U' anyway.  So then when we're actually deducing
the template arguments via type_unification_real, we fail in unify:
  if (TEMPLATE_TYPE_LEVEL (parm)
      != template_decl_level (tparm))
    /* The PARM is not one we're trying to unify.  Just check
       to see if it matches ARG.  */
because 'parm' has been reduced but 'tparm' has not yet.

Therefore we shouldn't reduce the template level of template parameters
when tf_partial is set, i.e. during template argument deduction
substitution.  But we can only return after performing the
cp_build_qualified_type etc. business, otherwise things break horribly.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 9?  I'd like
to put this in before 9.3, if possible.


OK.


2020-03-03  Jason Merrill  
Marek Polacek  

PR c++/90505 - mismatch in template argument deduction.
* pt.c (tsubst): Don't reduce the template level of template
parameters when tf_partial.

* g++.dg/template/deduce4.C: New test.
* g++.dg/template/deduce5.C: New test.
* g++.dg/template/deduce6.C: New test.
* g++.dg/template/deduce7.C: New test.
---
  gcc/cp/pt.c | 14 --
  gcc/testsuite/g++.dg/template/deduce4.C | 17 +
  gcc/testsuite/g++.dg/template/deduce5.C | 17 +
  gcc/testsuite/g++.dg/template/deduce6.C | 17 +
  gcc/testsuite/g++.dg/template/deduce7.C | 10 ++
  5 files changed, 69 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/deduce4.C
  create mode 100644 gcc/testsuite/g++.dg/template/deduce5.C
  create mode 100644 gcc/testsuite/g++.dg/template/deduce6.C
  create mode 100644 gcc/testsuite/g++.dg/template/deduce7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 230331f60cb..1c721b31176 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15057,12 +15057,6 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
int levels;
tree arg = NULL_TREE;
  
-	/* Early in template argument deduction substitution, we don't

-  want to reduce the level of 'auto', or it will be confused
-  with a normal template parm in subsequent deduction.  */
-   if (is_auto (t) && (complain & tf_partial))
- return t;
-
r = NULL_TREE;
  
  	gcc_assert (TREE_VEC_LENGTH (args) > 0);

@@ -15193,6 +15187,14 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 about the template parameter in question.  */
  return t;
  
+	/* Early in template argument deduction substitution, we don't

+  want to reduce the level of 'auto', or it will be confused
+  with a normal template parm in subsequent deduction.
+  Similarly, don't reduce the level of template parameters to
+  avoid mismatches when deducing their types.  */
+   if (complain & tf_partial)
+ return t;
+
/* If we get here, we must have been looking at a parm for a
   more deeply nested template.  Make a new version of this
   template parameter, but with a lower level.  */
diff --git a/gcc/testsuite/g++.dg/template/deduce4.C b/gcc/testsuite/g++.dg/template/deduce4.C
new file mode 100644
index 000..e2c165dc788
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/deduce4.C
@@ -0,0 +1,17 @@
+// PR c++/90505 - mismatch in template argument deduction.
+// { dg-do compile }
+
+template 
+struct S {
+  template 
+  static void foo(V) { }
+
+  void bar () { foo(10); }
+};
+
+void
+test ()
+{
+  S s;
+  s.bar ();
+}
diff --git a/gcc/testsuite/g++.dg/template/deduce5.C b/gcc/testsuite/g++.dg/template/deduce5.C
new file mode 100644
index 000..9d382bfe03a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/deduce5.C
@@ -0,0 +1,17

Ping^1 [PATCH v3] ipa-cp: Fix PGO regression caused by r278808

2020-03-03 Thread luoxhu

Hi Honza,

The performance regression still exists.  For exchange2, performance
is about 28% slower with the options:
"-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param ipa-cp-unit-growth=80 -fno-inline".

r278808:
commit ad06966f6677d55c11214d9c7b6d5518f915e341
Author: hubicka 
Date:   Thu Nov 28 14:16:29 2019 +

* ipa-cp.c (update_profiling_info): Fix scaling.


Fix v3 patch and logs are here:
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00764.html


Thanks
Xionghu

On 2020/1/14 14:45, luoxhu wrote:
> Hi,
> 
> On 2020/1/3 00:58, Jan Hubicka wrote:
>>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>>> index 14064ae0034..947bf7c7199 100644
>>> --- a/gcc/ipa-cp.c
>>> +++ b/gcc/ipa-cp.c
>>> @@ -4272,6 +4272,31 @@ update_profiling_info (struct cgraph_node 
>>> *orig_node,
>>>     false);
>>>     new_sum = stats.count_sum;
>>> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);
>>> +  if (info && info->node_is_self_scc)
>>> +    {
>>> +  profile_count self_recursive_count;
>>> +
>>> +  /* The self recursive edge is on the orig_node.  */
>>> +  for (cs = orig_node->callees; cs; cs = cs->next_callee)
>>> +    if (ipa_edge_within_scc (cs))
>>> +  {
>>> +    cgraph_node *callee = cs->callee->function_symbol ();
>>> +    if (callee && cs->caller == cs->callee)
>>> +  {
>>> +    self_recursive_count = cs->count;
>>> +    break;
>>> +  }
>>> +  }
>>
>> What happens here when there are multiple self recursive calls or when
>> the SCC has two mutually recursive functions?
>>
>> I am still confused by the logic of this function.  I will take a deeper
>> look at your previous email.
>>> +
>>> +  /* Proportion count for self recursive node from all callers.  */
>>> +  new_sum
>>> +    = (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
>>> +
>>> +  /* Proportion count for specialized cloned node.  */
>>> +  new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
>>> +    }
>>> +
>>>     if (orig_node_count < orig_sum + new_sum)
>>>   {
>>>     if (dump_file)
>>> diff --git a/gcc/params.opt b/gcc/params.opt
>>> index d88ae0c468b..40a03b04580 100644
>>> --- a/gcc/params.opt
>>> +++ b/gcc/params.opt
>>> @@ -199,7 +199,7 @@ Common Joined UInteger 
>>> Var(param_ipa_cp_loop_hint_bonus) Init(64) Param
>>>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds 
>>> or strides known.
>>>   -param=ipa-cp-max-recursive-depth=
>>> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param
>>> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) 
>>> IntegerRange(1, 10) Param
>>>   Maximum depth of recursive cloning for self-recursive function.
>>
>> The values start from 0, but I also wonder why 10 is a good upper bound?
>> If the function calls itself with one type of value (like depth-1) then
>> we may clone well over 10 times, if it calls itself with two different
>> sets then 8 is already quite high I would say...
>>
>> I suppose the call probabilities will eventually drop to be very low,
>> but I am not quite sure about that because we do not handle any IP
>> frequency propagation.  Do we have some way to treat wide trees? Or do
>> we clone only if all self-recursive calls are the same?
>>
>> Honza
> 
> Update v3 patch.  This regression can only be reproduced when building with
> "-fprofile-generate/-fprofile-use --param ipa-cp-eval-threshold=0 --param
> ipa-cp-unit-growth=80" on exchange_2; on some platforms -fno-inline may
> also be needed.  I attached 3 files (compressed to exchange.tar.gz):
> exchange2_gcc.use.orig.wpa.071i.cp.old, exchange2_gcc.use.orig.wpa.071i.cp.new
> and cp.diff, to show the profile count difference of the specialized nodes
> digits_2.constprop/152 to digits_2.constprop/159 without/with this patch.
> 
> With this patch the profile count decreases slowly instead of staying very
> small from the first clone (a very low count is treated as cold, which keeps
> complete unrolling from working).  It still differs from the real execution
> (exchange2.png), but this average model takes the recursive edge as
> feedback.  Thanks.
> 
> 
> v3:
> 1. Enable proportioning orig_sum to the new nodes for a self-recursive node
>    (k means multiple self-recursive calls):
>    new_sum = (orig_sum + new_sum) * 1 / k \
>    * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
> 2. Two mutually recursive functions are not yet supported in self-recursive
>    cloning, so they are not supported in update_profiling_info here either.
> 3. Improve the value range for param_ipa_cp_max_recursive_depth to (0, 8).
>    If the function calls itself with two different sets of values, the
>    recursion boundary limit will usually stop the specialization first;
>    otherwise it is slow even without recursive specialization.
> 
> The performance of exchange2 built with PGO will decrease ~28% by r278808
> due to profile count set incorrectly.  The cloned nodes are updated to a
> very small coun

Re: [PATCH 3/4] libstdc++: Add a test range type that has a sized sentinel

2020-03-03 Thread Patrick Palka

On Tue, 3 Mar 2020, Jonathan Wakely wrote:

> On 03/03/20 11:30 -0500, Patrick Palka wrote:
> > This adds a test range type whose end() is a sized sentinel to
> > <testsuite_iterators.h>, which will be used in the tests that verify LWG
> > 3355.
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * testsuite/util/testsuite_iterators.h (test_range::get_iterator):
> > Make
> > protected instead of private.
> > (test_sized_range_sized_sent): New.
> > ---
> > .../testsuite/util/testsuite_iterators.h  | 32 +++
> > 1 file changed, 32 insertions(+)
> > 
> > diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h
> > b/libstdc++-v3/testsuite/util/testsuite_iterators.h
> > index e47b2b03e40..756940ed092 100644
> > --- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
> > +++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
> > @@ -735,6 +735,7 @@ namespace __gnu_test
> >   { return i.ptr - s.end; }
> > };
> > 
> > +protected:
> >   auto
> >   get_iterator(T* p)
> >   {
> > @@ -812,6 +813,37 @@ namespace __gnu_test
> > using test_output_sized_range
> >   = test_sized_range;
> > 
> > +  // A type meeting the minimum std::sized_range requirements, and whose
> > end()
> > +  // returns a size sentinel.
> 
> s/size/sized/ here, no?
> 
> OK for master.

Thanks for the review.  I committed this series with that change just
now.

Re: [PATCH] PR target/93995 ICE in patch_jump_insn, at cfgrtl.c:1290 on riscv64-linux-gnu

2020-03-03 Thread Kito Cheng

Committed, thanks :)


On Wed, Mar 4, 2020 at 6:23 AM Jim Wilson  wrote:
>
> On Tue, Mar 3, 2020 at 12:03 AM Kito Cheng  wrote:
> > gcc/ChangeLog
> > * config/riscv/riscv.c (riscv_emit_float_compare): Using NE to 
> > compare
> > the result of IOR.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.dg/pr93995.c: New.
>
> Thanks.  This looks good to me.
>
> Jim

Re: [PATCH] [rs6000] Fix a wrong GC issue

2020-03-03 Thread binbin


Hi

On 2020/3/4 8:25 AM, Segher Boessenkool wrote:

Hi!

On Tue, Mar 03, 2020 at 09:40:47AM -0600, Bin Bin Lv wrote:

The source file rs6000.c was split up into several smaller source files
through commit 1acf024.  However, the variables
"altivec_builtin_mask_for_load" and
"builtin_mode_to_type[MAX_MACHINE_MODE][2]" were marked with the wrong syntax
"GTY(([options])) type name", so they were not registered as roots and were
wrongly GCed.  When "altivec_builtin_mask_for_load" was wrongly GCed,
compiling openJDK with precompiled headers enabled under -mcpu=power7 failed
with ICEs.  Roots must be declared using one of the following syntaxes:
"extern GTY(([options])) type name;" or "static GTY(([options])) type name;".

The following patch adds the variables "altivec_builtin_mask_for_load" and
"builtin_mode_to_type[MAX_MACHINE_MODE][2]" to the roots array.



2020-03-03  Bin Bin Lv  

* config/rs6000/rs6000-internal.h (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Remove GTY(()).
* config/rs6000/rs6000.h (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Add an extern GTY(())
declaration.


Why in both of the header files?  Can you just remove the declaration
from rs6000-internal.h?


OK, removed.  Thanks.




* config/rs6000/rs6000.h (MAX_MACHINE_MODE): Include the header file
for MAX_MACHINE_MODE.


The changelog entry should say *what* file is included, and under what
condition.  It doesn't have to say why (that belongs in the commit
message).

But, can't you just include it unconditionally?  Don't we already,
anyway, via coretypes.h -> machmode.h -> insn-modes.h?


Segher



OK, changed it to be unconditional.  Thanks for your suggestion.
gcc/ChangeLog

2020-03-04  Bin Bin Lv  

* config/rs6000/rs6000-internal.h (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Remove GTY(()).
* config/rs6000/rs6000.h (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Add an extern GTY(())
declaration.
* config/rs6000/rs6000.h (MAX_MACHINE_MODE): Include insn-modes.h.
* config/rs6000/rs6000.c (altivec_builtin_mask_for_load,
builtin_mode_to_type[MAX_MACHINE_MODE][2]): Remove the GTY(())
declaration and add the definition.
---
 gcc/config/rs6000/rs6000-internal.h | 2 --
 gcc/config/rs6000/rs6000.c  | 4 ++--
 gcc/config/rs6000/rs6000.h  | 4 
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-internal.h b/gcc/config/rs6000/rs6000-internal.h
index a23e956..d331b9e 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -187,7 +187,5 @@ extern bool rs6000_passes_long_double;
 extern bool rs6000_passes_vector;
 extern bool rs6000_returns_struct;
 extern bool cpu_builtin_p;
-extern GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
-extern GTY(()) tree altivec_builtin_mask_for_load;
 
 #endif
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9910b27..0faf44b 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -99,7 +99,7 @@
 #endif
 
 /* Support targetm.vectorize.builtin_mask_for_load.  */
-GTY(()) tree altivec_builtin_mask_for_load;
+tree altivec_builtin_mask_for_load;
 
 #ifdef USING_ELFOS_H
 /* Counter for labels which are to be placed in .fixup.  */
@@ -196,7 +196,7 @@ enum reg_class rs6000_constraints[RS6000_CONSTRAINT_MAX];
 int rs6000_vector_align[NUM_MACHINE_MODES];
 
 /* Map selected modes to types for builtins.  */
-GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
+tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
 
 /* What modes to automatically generate reciprocal divide estimate (fre) and
reciprocal sqrt (frsqrte) for.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 1697186..cd3d054 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -35,6 +35,8 @@
 #include "config/rs6000/rs6000-modes.h"
 #endif
 
+#include "insn-modes.h"
+
 /* Definitions for the object file format.  These are set at
compile-time.  */
 
@@ -2488,6 +2490,8 @@ enum rs6000_builtin_type_index
 
 extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
 extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
+extern GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
+extern GTY(()) tree altivec_builtin_mask_for_load;
 
 #ifndef USED_FOR_TARGET
 /* A C structure for machine-specific, per-function data.
-- 
1.8.3.1

[testsuite] Fix PR94023 to guard case under vect_hw_misalign

2020-03-03 Thread Kewen.Lin

Hi,

As PR94023 shows, the expected SLP requires misaligned vector access
support.  This patch is to guard the check under the target condition
vect_hw_misalign to ensure that.

Verified it on ppc64-redhat-linux (Power7 BE).

Is it ok for trunk, and backport to GCC 9 after some burn-in time?


BR,
Kewen


gcc/testsuite/ChangeLog

2020-03-04  Kewen Lin  

PR testsuite/94023
* gcc.dg/vect/slp-perm-12.c: Expect loop vectorized messages only on
vect_hw_misalign targets.


diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
index 4d4c534..113223a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-12.c
@@ -49,4 +49,4 @@ int main()
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_perm && vect_hw_misalign } } } } */

[testsuite] Fix PR94019 to allow one vector char when !vect_hw_misalign

2020-03-03 Thread Kewen.Lin

Hi,

As PR94019 shows, without misaligned vector access support but with
realign load, the vectorized loop ends up using the realign scheme.
It generates a mask (control vector) whose return type is vector signed
char, which breaks the scan-tree-dump-not check.

The fix is to differentiate powerpc vect_hw_misalign from powerpc
!vect_hw_misalign, permit one vector char occurrence for powerpc
!vect_hw_misalign, and keep other targets the same as before.

Verified it on ppc64-redhat-linux (Power7 BE).

Is it ok for trunk, and backport to GCC 9 after some burn-in time?


BR,
Kewen


gcc/testsuite/ChangeLog

2020-03-04  Kewen Lin  

PR testsuite/94019
* gcc.dg/vect/vect-over-widen-17.c: Expect one vector char if it's on
POWER and without misaligned vector access support.



--- a/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-over-widen-17.c
@@ -41,6 +41,10 @@ main (void)
 }

 /* { dg-final { scan-tree-dump-not {vect_recog_over_widening_pattern: detected} "vect" } } */
-/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector[^\n]*char} "vect" { target { { ! powerpc*-*-* } || { powerpc*-*-* && vect_hw_misalign } } } } }
+/* On Power, if there is no vect_hw_misalign support, unaligned vector access
+   adopts realign_load scheme.  It requires rs6000_builtin_mask_for_load to
+   generate mask whose return type is vector char.  */
+/* { dg-final { scan-tree-dump-times {vector[^\n]*char} 1 "vect" { target { powerpc*-*-* && { ! vect_hw_misalign } } } } } */
 /* { dg-final { scan-tree-dump-not {vector[^ ]* int} "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" } } */

[PATCH] tailcall: Fix up process_assignment [PR94001]

2020-03-03 Thread Jakub Jelinek

Hi!

When a function returns void or the return value is ignored, ass_var
is NULL_TREE.  The tail recursion handling generally assumes DCE has been
performed and so doesn't expect to encounter useless assignments after the
call and expects them to be part of the return value adjustment that need
to be changed into tail recursion additions/multiplications.
process_assignment does some verification and has a way to tell the caller
to try to move dead or whatever other stmts that don't participate in the
return value modifications before it is returned.
For binary rhs assignments it is just fine, neither op0 nor op1 will be
NULL_TREE and thus if *ass_var is NULL_TREE, it will not match, but unary
rhs is handled by only setting op0 to rhs1 and setting op1 to NULL_TREE.
And at this point, NULL_TREE == NULL_TREE and thus we think e.g. the
  c_2 = -e_3(D);
dead stmt is actually a return value modification, so we queue it as
multiplication and then create a void type SSA_NAME accumulator for it
and ICE shortly after.

Fixed by making sure the op1 == *ass_var comparison is done only if *ass_var
is non-NULL.
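
The false match can be sketched outside the compiler.  This is only a model under stated assumptions: node is a hypothetical stand-in for a tree, and matches_before / matches_after are made-up helpers that isolate the one comparison being changed.

```cpp
#include <cassert>

// Hypothetical stand-in for a GCC tree node.
struct node { int id; };

// Comparison as it was: plain pointer equality.  When both the second
// operand of a unary rhs and the assigned variable are null, this
// reports a match even though nothing is being matched.
inline bool
matches_before (const node *op1, const node *ass_var)
{
  return op1 == ass_var;
}

// Comparison after the fix: a null ass_var can never match.
inline bool
matches_after (const node *op1, const node *ass_var)
{
  return ass_var != nullptr && op1 == ass_var;
}

// Exercise both predicates on the interesting cases.
inline void
self_check ()
{
  node n = { 1 };
  assert (matches_before (nullptr, nullptr));   // the bogus match
  assert (!matches_after (nullptr, nullptr));   // rejected after the fix
  assert (matches_after (&n, &n));              // real matches still work
  assert (!matches_after (nullptr, &n));
}
```

The unary case in the description corresponds to the (nullptr, nullptr) call: op1 is unset for a unary rhs and ass_var is unset for a void function.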

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-03-04  Jakub Jelinek  

PR tree-optimization/94001
* tree-tailcall.c (process_assignment): Before comparing op1 to
*ass_var, verify *ass_var is non-NULL.

* gcc.dg/pr94001.c: New test.

--- gcc/tree-tailcall.c.jj  2020-01-12 11:54:38.517381665 +0100
+++ gcc/tree-tailcall.c 2020-03-03 20:38:54.282458700 +0100
@@ -339,7 +339,8 @@ process_assignment (gassign *stmt,
   && (non_ass_var = independent_of_stmt_p (op1, stmt, call,
to_move)))
 ;
-  else if (op1 == *ass_var
+  else if (*ass_var
+  && op1 == *ass_var
   && (non_ass_var = independent_of_stmt_p (op0, stmt, call,
to_move)))
 ;
--- gcc/testsuite/gcc.dg/pr94001.c.jj   2020-03-03 20:40:20.848184911 +0100
+++ gcc/testsuite/gcc.dg/pr94001.c  2020-03-03 20:34:13.415591577 +0100
@@ -0,0 +1,11 @@
+/* PR tree-optimization/94001 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-dce" } */
+
+void
+bar (int e)
+{
+  bar (3);
+  int c;
+  c = -e;
+}

Jakub

Re: [PATCH] [rs6000] Rewrite the declaration of a variable

2020-03-03 Thread binbin


Hi

On 2020/3/4 8:33 AM, Segher Boessenkool wrote:

Hi!

On Tue, Mar 03, 2020 at 10:13:56AM -0600, Bin Bin Lv wrote:

Move the declaration of toc_section from the source file rs6000.c to its
header file, rs6000.h, to standardize the code.



diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0faf44b..c0a6e86 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -181,7 +181,6 @@ static GTY(()) section *tls_private_data_section;
  static GTY(()) section *read_only_private_data_section;
  static GTY(()) section *sdata2_section;
  
-extern GTY(()) section *toc_section;

  section *toc_section = 0;
  
  /* Describe the vector unit used for modes.  */

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3844bec..e77a84a 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2494,6 +2494,7 @@ extern GTY(()) tree rs6000_builtin_types[RS6000_BTI_MAX];
  extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];
  extern GTY(()) tree builtin_mode_to_type[MAX_MACHINE_MODE][2];
  extern GTY(()) tree altivec_builtin_mask_for_load;
+extern union GTY(()) section *toc_section;


Why does this add "union"?


Segher



If "union" is not added, the build reports the error "unknown type name
‘section’" for the declaration

extern GTY(()) section *toc_section;

in file included from ../../host-powerpc64le-unknown-linux-gnu/gcc/tm.h:25,
from ../.././libgcc/generic-morestack-thread.c:29

so I added "union" to fix this.
Thanks.
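
For readers unfamiliar with the error, here is a minimal sketch of the C rule that seems to be involved. It assumes that `section` is a union tag with no typedef visible in the failing translation unit; the names `toc_section_demo`, `section_t` and `toc_section_alias` are made up for the illustration and are not GCC declarations. In C, unlike C++, a union tag is not a type name by itself, so a declaration has to spell `union section` unless a typedef is in scope.

```c
#include <assert.h>
#include <stddef.h>

/* Declares only the tag; in C this does not create a type name
   `section` (in C++ it would).  */
union section;

/* Valid C: the type is named through its tag.  Writing
   `section *toc_section_demo;` here would fail with
   "unknown type name 'section'".  */
union section *toc_section_demo = NULL;

/* The bare-name spelling works only once a typedef is visible.
   `section_t` is a hypothetical alias for this sketch.  */
typedef union section section_t;
section_t *toc_section_alias = NULL;
```

Both pointers have the same type, so they can be assigned to each other freely; only the spelling of the declaration differs.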

Re: [PATCH] tailcall: Fix up process_assignment [PR94001]

2020-03-03 Thread Richard Biener

On Wed, 4 Mar 2020, Jakub Jelinek wrote:

> Hi!
> 
> When a function returns void or the return value is ignored, ass_var
> is NULL_TREE.  The tail recursion handling generally assumes DCE has been
> performed and so doesn't expect to encounter useless assignments after the
> call and expects them to be part of the return value adjustment that need
> to be changed into tail recursion additions/multiplications.
> process_assignment does some verification and has a way to tell the caller
> to try to move dead or whatever other stmts that don't participate in the
> return value modifications before it is returned.
> For binary rhs assignments it is just fine, neither op0 nor op1 will be
> NULL_TREE and thus if *ass_var is NULL_TREE, it will not match, but unary
> rhs is handled by only setting op0 to rhs1 and setting op1 to NULL_TREE.
> And at this point, NULL_TREE == NULL_TREE and thus we think e.g. the
>   c_2 = -e_3(D);
> dead stmt is actually a return value modification, so we queue it as
> multiplication and then create a void type SSA_NAME accumulator for it
> and ICE shortly after.
> 
> Fixed by making sure the op1 == *ass_var comparison is done only if *ass_var
> is non-NULL.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-03-04  Jakub Jelinek  
> 
>   PR tree-optimization/94001
>   * tree-tailcall.c (process_assignment): Before comparing op1 to
>   *ass_var, verify *ass_var is non-NULL.
> 
>   * gcc.dg/pr94001.c: New test.
> 
> --- gcc/tree-tailcall.c.jj  2020-01-12 11:54:38.517381665 +0100
> +++ gcc/tree-tailcall.c   2020-03-03 20:38:54.282458700 +0100
> @@ -339,7 +339,8 @@ process_assignment (gassign *stmt,
>  && (non_ass_var = independent_of_stmt_p (op1, stmt, call,
>   to_move)))
>  ;
> -  else if (op1 == *ass_var
> +  else if (*ass_var
> +&& op1 == *ass_var
>  && (non_ass_var = independent_of_stmt_p (op0, stmt, call,
>   to_move)))
>  ;
> --- gcc/testsuite/gcc.dg/pr94001.c.jj 2020-03-03 20:40:20.848184911 +0100
> +++ gcc/testsuite/gcc.dg/pr94001.c  2020-03-03 20:34:13.415591577 +0100
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/94001 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-dce" } */
> +
> +void
> +bar (int e)
> +{
> +  bar (3);
> +  int c;
> +  c = -e;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
