Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread H.J. Lu
On 10/17/18, Marc Glisse  wrote:
> On Wed, 17 Oct 2018, H.J. Lu wrote:
>
>> We may simplify
>>
>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>
>> to X when the mode of X is the same as the mode of the subreg.
>
> Hello,
>
> we already have code to simplify vec_select(vec_merge):
>
>  /* If we select elements in a vec_merge that all come from the same
> operand, select from that operand directly.  */
>
> It would make sense to me to make the subreg transform as similar to it as
> possible, in particular you don't need to special case vec_duplicate, the
> transformation would see that everything comes from the first vector,
> produce (subreg (vec_duplicate X) 0), and let another transformation
> optimize that.

What do you mean by another transformation? If simplify_subreg doesn't
return X for

  (subreg (vec_merge (vec_duplicate X)
                     (vector)
                     (const_int ((1 << N) | M)))
          (N * sizeof (X)))


no further transformation will be done.

-- 
H.J.


Re: [C++ PATCH] Allow __ prefix+suffix on C++11 attribute namespaces (PR c++/86288)

2018-10-18 Thread Jakub Jelinek
On Thu, Oct 18, 2018 at 12:45:00AM +0200, Jakub Jelinek wrote:
> Is a partial backport (just add
>   attr_id = canonicalize_attr_name (attr_id);
> in the else if (attr_ns) case plus the non-__gnu__ lines from the testcase)
> ok for 7/8 release branches where it ICEs?

Small clarification: this only needs to go to 8.x.  7.x compiled it fine; it
got broken with r250911.

Jakub


Re: [PATCH] Reset insn priority after inc/ref replacement in haifa sched

2018-10-18 Thread Robin Dapp
Hi,

I added a check before calling priority () in restore_pattern.  In the
previous version, the missing check could lead to an assertion failure in
priority (), since the insn might already have been scheduled.
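
For readers skimming the patch below, here is a rough standalone sketch of
the idea: a cached priority with an optional forced recompute, guarded so
that already-scheduled insns are left alone.  ToyInsn, toy_priority and
refresh_after_change are hypothetical names made up for this illustration,
and the cost model is a simple longest-path stand-in, not what
haifa-sched.c actually computes.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for an insn; all fields are illustrative assumptions.
struct ToyInsn {
  int latency = 1;
  std::vector<ToyInsn *> successors;
  bool priority_known = false;
  int cached_priority = 0;
  bool scheduled = false;
};

// Return the cached priority, recomputing it when it is not yet known
// or when the caller explicitly asks for a recompute.
int toy_priority(ToyInsn &insn, bool force_recompute = false)
{
  if (force_recompute || !insn.priority_known) {
    int best = insn.latency;
    for (ToyInsn *succ : insn.successors)
      best = std::max(best, insn.latency + toy_priority(*succ));
    insn.cached_priority = best;
    insn.priority_known = true;
  }
  return insn.cached_priority;
}

// After changing an insn, refresh its priority unless it has already
// been scheduled (mirroring the QUEUE_INDEX check in restore_pattern).
void refresh_after_change(ToyInsn &insn)
{
  if (!insn.scheduled)
    toy_priority(insn, true);
}
```

Without the forced recompute, a second call simply returns the stale cached
value, which is the situation the patch is meant to avoid.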

Bootstrapped and regtested on x86_64 and ppc8, regtested on s390x.

Regards
 Robin

--

gcc/ChangeLog:

2018-10-16  Robin Dapp  

* haifa-sched.c (priority): Add force_recompute parameter.
(apply_replacement): Call priority () with force_recompute = true.
(restore_pattern): Likewise.
diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index 1fdc9df9fb2..2c84ce38143 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -830,7 +830,7 @@ add_delay_dependencies (rtx_insn *insn)
 
 /* Forward declarations.  */
 
-static int priority (rtx_insn *);
+static int priority (rtx_insn *, bool force_recompute = false);
 static int autopref_rank_for_schedule (const rtx_insn *, const rtx_insn *);
 static int rank_for_schedule (const void *, const void *);
 static void swap_sort (rtx_insn **, int);
@@ -1590,7 +1590,7 @@ bool sched_fusion;
 
 /* Compute the priority number for INSN.  */
 static int
-priority (rtx_insn *insn)
+priority (rtx_insn *insn, bool force_recompute)
 {
   if (! INSN_P (insn))
 return 0;
@@ -1598,7 +1598,7 @@ priority (rtx_insn *insn)
   /* We should not be interested in priority of an already scheduled insn.  */
   gcc_assert (QUEUE_INDEX (insn) != QUEUE_SCHEDULED);
 
-  if (!INSN_PRIORITY_KNOWN (insn))
+  if (force_recompute || !INSN_PRIORITY_KNOWN (insn))
 {
   int this_priority = -1;
 
@@ -4713,7 +4713,12 @@ apply_replacement (dep_t dep, bool immediately)
   success = validate_change (desc->insn, desc->loc, desc->newval, 0);
   gcc_assert (success);
 
+  rtx_insn *insn = DEP_PRO (dep);
+
+  /* Recompute priority since dependent priorities may have changed.  */
+  priority (insn, true);
   update_insn_after_change (desc->insn);
+
   if ((TODO_SPEC (desc->insn) & (HARD_DEP | DEP_POSTPONED)) == 0)
 	fix_tick_ready (desc->insn);
 
@@ -4767,7 +4772,17 @@ restore_pattern (dep_t dep, bool immediately)
 
   success = validate_change (desc->insn, desc->loc, desc->orig, 0);
   gcc_assert (success);
+
+  rtx_insn *insn = DEP_PRO (dep);
+
+  if (QUEUE_INDEX (insn) != QUEUE_SCHEDULED)
+	{
+	  /* Recompute priority since dependent priorities may have changed.  */
+	  priority (insn, true);
+	}
+
   update_insn_after_change (desc->insn);
+
   if (backtrack_queue != NULL)
 	{
 	  backtrack_queue->replacement_deps.safe_push (dep);


Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread Richard Sandiford
"H.J. Lu"  writes:
> We can simplify
>
>   (subreg (vec_merge (vec_duplicate X)
>(vector)
>(const_int ((1 << N) | M)))
> (N * sizeof (X)))
>
> to X when the mode of X is the same as the mode of the subreg.
>
> gcc/
>
>   PR target/87537
>   * simplify-rtx.c (simplify_subreg): Simplify subreg of vec_merge
>   of vec_duplicate.
>   (test_vector_ops_duplicate): Add test for a scalar subreg of a
>   VEC_MERGE of a VEC_DUPLICATE.
>
> gcc/testsuite/
>
>   PR target/87537
>   * gcc.target/i386/pr87537-1.c: New test.

OK, thanks.

Richard


Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread Richard Sandiford
"H.J. Lu"  writes:
> On 10/17/18, Marc Glisse  wrote:
>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>
>>> We may simplify
>>>
>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>
>>> to X when the mode of X is the same as the mode of the subreg.
>>
>> Hello,
>>
>> we already have code to simplify vec_select(vec_merge):
>>
>>  /* If we select elements in a vec_merge that all come from the same
>> operand, select from that operand directly.  */
>>
>> It would make sense to me to make the subreg transform as similar to it as
>> possible, in particular you don't need to special case vec_duplicate, the
>> transformation would see that everything comes from the first vector,
>> produce (subreg (vec_duplicate X) 0), and let another transformation
>> optimize that.

Sorry, didn't see this before the OK.

> What do you mean by another transformation? If simplify_subreg doesn't
> return X for
>
>   (subreg (vec_merge (vec_duplicate X)
>(vector)
>(const_int ((1 << N) | M)))
> (N * sizeof (X)))
>
>
> no further transformation will be done.

I think the point was that we should transform:

  (subreg (vec_merge X
                     (vector)
                     (const_int ((1 << N) | M)))
          (N * sizeof (X)))

into:

  simplify_gen_subreg (outermode, X, innermode, byte)

which should further simplify when X is a vec_duplicate.

Thanks,
Richard


[PATCH] Revert fix for PR84204

2018-10-18 Thread Richard Biener


I have tested the following patch to revert the fix for PR84204.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
and branch.

Richard.

2018-10-18  Richard Biener  

PR middle-end/87087
Revert
2018-02-07  Richard Biener  

PR tree-optimization/84204
* tree-chrec.c (chrec_fold_plus_1): Remove size limiting in
this place.

* gcc.dg/torture/pr87087.c: New testcase.
* gcc.dg/graphite/pr84204.c: XFAIL.
* gcc.dg/graphite/pr85935.c: Likewise.

Index: gcc/testsuite/gcc.dg/graphite/pr84204.c
===
--- gcc/testsuite/gcc.dg/graphite/pr84204.c (revision 265234)
+++ gcc/testsuite/gcc.dg/graphite/pr84204.c (working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O -floop-parallelize-all -fno-tree-loop-im --param scev-max-expr-size=3" } */
+/* The fix for PR84204 was reverted.  */
+/* { dg-additional-options "--param graphite-allow-codegen-errors=1" } */
 
 int oc;
 
Index: gcc/testsuite/gcc.dg/graphite/pr85935.c
===
--- gcc/testsuite/gcc.dg/graphite/pr85935.c (revision 265234)
+++ gcc/testsuite/gcc.dg/graphite/pr85935.c (working copy)
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O -floop-parallelize-all -fno-tree-loop-im --param scev-max-expr-size=3" } */
+/* The fix for PR84204 was reverted.  */
+/* { dg-additional-options "--param graphite-allow-codegen-errors=1" } */
 
 typedef int dq;
 
Index: gcc/testsuite/gcc.dg/torture/pr87087.c
===
--- gcc/testsuite/gcc.dg/torture/pr87087.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr87087.c  (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+
+int b;
+int d;
+void e()
+{
+  unsigned f;
+  unsigned g;
+  int h;
+  long i = 901380;
+  for (;;) {
+  d = 0;
+  for (; d; d++) {
+ h = 143366620;
+ f = 0;
+ for (; f < 15; f += 3) {
+ g = 0;
+ for (; g < 9; g++)
+   b = h = i - (h << 5) + h;
+ }
+  }
+  i = 0;
+  }
+}
Index: gcc/tree-chrec.c
===
--- gcc/tree-chrec.c(revision 265234)
+++ gcc/tree-chrec.c(working copy)
@@ -375,10 +375,12 @@ chrec_fold_plus_1 (enum tree_code code,
 
default:
  {
-   if (tree_contains_chrecs (op0, NULL)
-   || tree_contains_chrecs (op1, NULL))
+   int size = 0;
+   if ((tree_contains_chrecs (op0, &size)
+|| tree_contains_chrecs (op1, &size))
+   && size < PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE))
  return build2 (code, type, op0, op1);
-   else
+   else if (size < PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE))
  {
if (code == POINTER_PLUS_EXPR)
  return fold_build_pointer_plus (fold_convert (type, op0),
@@ -388,6 +390,8 @@ chrec_fold_plus_1 (enum tree_code code,
  fold_convert (type, op0),
  fold_convert (type, op1));
  }
+   else
+ return chrec_dont_know;
  }
}
 }


[PATCH] S/390: Add loc patterns for QImode and HImode

2018-10-18 Thread Robin Dapp
Hi,

this enables QImode and HImode for load on condition.  For SPEC2006 this
reduces code size overall; the performance impact is negligible.
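
As a side note on the qhi_si_offset values ("3" for QI, "2" for HI) in the
patch below: they match the byte offset of the low-order part of a 4-byte
SImode value on a big-endian target such as s390.  The helper here is a
hypothetical illustration of that arithmetic only, not GCC code (GCC has its
own subreg_lowpart_offset for this purpose).

```cpp
// Byte offset of the low-order `inner_size` bytes inside an
// `outer_size`-byte value.  On big-endian the low part sits at the end,
// on little-endian at offset 0.  Assumes inner_size <= outer_size.
constexpr unsigned lowpart_byte_offset(unsigned outer_size,
                                       unsigned inner_size,
                                       bool big_endian)
{
  return big_endian ? outer_size - inner_size : 0u;
}

// QImode (1 byte) and HImode (2 bytes) inside SImode (4 bytes), big-endian.
static_assert(lowpart_byte_offset(4, 1, true) == 3, "QI lowpart of SI");
static_assert(lowpart_byte_offset(4, 2, true) == 2, "HI lowpart of SI");
```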

Regtested on s390x.

Regards
 Robin

--

gcc/ChangeLog:

2018-10-18  Robin Dapp  

* config/s390/s390.md: Add movcc for QImode and HImode.
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 70a619f06f5..6c687a1416b 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -614,6 +614,9 @@
 (define_mode_iterator DD_DF [DF DD])
 (define_mode_iterator TD_TF [TF TD])
 
+(define_mode_iterator QHI [QI HI])
+(define_mode_attr qhi_si_offset [(QI "3") (HI "2")])
+
 ;; These mode iterators allow 31-bit and 64-bit GPR patterns to be generated
 ;; from the same template.
 (define_mode_iterator GPR [(DI "TARGET_ZARCH") SI])
@@ -6593,6 +6596,49 @@
  XEXP (operands[1], 1));
 })
 
+;;
+;; - Allow QImode and HImode
+(define_expand "movcc"
+ [(set (match_dup 4) (match_operand:QHI 2 "nonimmediate_operand" ""))
+ (set (match_dup 5) (match_operand:QHI 3 "loc_operand" ""))
+ (set (match_dup 6) (if_then_else:SI (match_operand 1 "comparison_operator" "")
+		 (match_dup 4) (match_dup 5)))
+ (set (match_operand:QHI 0 "nonimmediate_operand" "") (subreg:QHI (match_dup 6) ))]
+ "TARGET_Z196"
+{
+  operands[4] = gen_reg_rtx (E_SImode);
+  operands[5] = gen_reg_rtx (E_SImode);
+  operands[6] = gen_reg_rtx (E_SImode);
+
+  if (!CONSTANT_P (operands[2]) && !MEM_P (operands[2]))
+{
+  operands[2] = simplify_gen_subreg (E_SImode, operands[2], mode, 0);
+}
+  else if (MEM_P (operands[2]))
+{
+  rtx tmp = gen_reg_rtx (E_SImode);
+  if (mode == E_QImode)
+	emit_insn (gen_zero_extendqisi2 (tmp, operands[2]));
+  else if (mode == E_HImode)
+	emit_insn (gen_zero_extendhisi2 (tmp, operands[2]));
+  operands[2] = tmp;
+}
+
+  if (!CONSTANT_P (operands[3]) && !MEM_P (operands[3]))
+{
+  operands[3] = simplify_gen_subreg (E_SImode, operands[3], mode, 0);
+}
+
+  /* Emit the comparison insn in case we do not already have a comparison
+ result. */
+  if (!s390_comparison (operands[1], VOIDmode))
+operands[1] = s390_emit_compare (GET_CODE (operands[1]),
+			  XEXP (operands[1], 0),
+			  XEXP (operands[1], 1));
+})
+
+
+
 ; locr, loc, stoc, locgr, locg, stocg, lochi, locghi
 (define_insn "*movcc"
   [(set (match_operand:GPR 0 "nonimmediate_operand"   "=d,d,d,d,d,d,S,S")


Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread H.J. Lu
On 10/18/18, Richard Sandiford  wrote:
> "H.J. Lu"  writes:
>> On 10/17/18, Marc Glisse  wrote:
>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>
>>>> We may simplify
>>>>
>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>
>>>> to X when mode of X is the same as of mode of subreg.
>>>
>>> Hello,
>>>
>>> we already have code to simplify vec_select(vec_merge):
>>>
>>>  /* If we select elements in a vec_merge that all come from the same
>>> operand, select from that operand directly.  */
>>>
>>> It would make sense to me to make the subreg transform as similar to it
>>> as
>>> possible, in particular you don't need to special case vec_duplicate,
>>> the
>>> transformation would see that everything comes from the first vector,
>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>> optimize that.
>
> Sorry, didn't see this before the OK.
>
>> What do you mean by another transformation? If simplify_subreg doesn't
>> return X for
>>
>>   (subreg (vec_merge (vec_duplicate X)
>>   (vector)
>>   (const_int ((1 << N) | M)))
>>(N * sizeof (X)))
>>
>>
>> no further transformation will be done.
>
> I think the point was that we should transform:
>
>   (subreg (vec_merge X
>(vector)
>(const_int ((1 << N) | M)))
> (N * sizeof (X)))
>
> into:
>
>   simplify_gen_subreg (outermode, X, innermode, byte)
>
> which should further simplify when X is a vec_duplicate.

But sizeof (X) is the size of the scalar of the vec_duplicate.  How do we
check the mask of the vec_merge?

-- 
H.J.


[PATCH] i386: Enable AVX512 memory broadcast for FMA

2018-10-18 Thread H.J. Lu
Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FMA operations.
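
For illustration only, the semantics of an FMA whose third operand is
broadcast from a scalar in memory can be modelled in plain C++ as below,
matching the shape (fma op1 op2 (vec_duplicate op3)) of the first new
pattern.  fmadd_broadcast is a hypothetical helper name; this sketch models
the arithmetic and does not show which instructions any particular GCC
version emits.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// r[i] = a[i] * b[i] + *c for every lane: the same scalar, loaded from
// memory, is used in all N lanes (the "{1toN}" broadcast).
template <std::size_t N>
std::array<float, N> fmadd_broadcast(const std::array<float, N> &a,
                                     const std::array<float, N> &b,
                                     const float *c)
{
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = std::fma(a[i], b[i], *c);
  return r;
}
```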

gcc/

PR target/72782
* config/i386/sse.md (VF_AVX512): New.
(avx512bcst): Likewise.
(*fma_fmadd__bcst_1):
Likewise.
(*fma_fmadd__bcst_2):
Likewise.
(*fma_fmadd__bcst_3):
Likewise.

gcc/testsuite/

PR target/72782
* gcc.target/i386/avx512-fma-1.h: New file.
* gcc.target/i386/avx512-fma-2.h: Likewise.
* gcc.target/i386/avx512-fma-3.h: Likewise.
* gcc.target/i386/avx512-fma-4.h: Likewise.
* gcc.target/i386/avx512-fma-5.h: Likewise.
* gcc.target/i386/avx512-fma-6.h: Likewise.
* gcc.target/i386/avx512-fma-7.h: Likewise.
* gcc.target/i386/avx512f-fmadd-df-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-fmadd-sf-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-fmadd-sf-zmm-2.c: Likewise.
* gcc.target/i386/avx512f-fmadd-sf-zmm-3.c: Likewise.
* gcc.target/i386/avx512f-fmadd-sf-zmm-4.c: Likewise.
* gcc.target/i386/avx512f-fmadd-sf-zmm-5.c: Likewise.
* gcc.target/i386/avx512f-fmadd-sf-zmm-6.c: Likewise.
* gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Likewise.
* gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c: Likewise.
* gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c: Likewise.
---
 gcc/config/i386/sse.md| 50 +++
 gcc/testsuite/gcc.target/i386/avx512-fma-1.h  | 12 +
 gcc/testsuite/gcc.target/i386/avx512-fma-2.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-3.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-4.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-5.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-6.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-7.h  | 13 +
 .../gcc.target/i386/avx512f-fmadd-df-zmm-1.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-1.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-2.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-3.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-4.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-5.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-6.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c  | 11 
 .../gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c | 12 +
 .../gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c | 12 +
 18 files changed, 259 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-1.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-2.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-3.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-4.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-5.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-6.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-7.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-df-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 13dc7370fd3..594975a8b80 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -654,6 +654,16 @@
(V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 (define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
 
+(define_mode_iterator VF_AVX512
+  [(V4SF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
+   (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
+   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
+
+(define_mode_attr avx512bcst
+  [(V4SF "%{1to4%}") (V2DF "%{1to2%}")
+   (V8SF "%{1to8%}") (V4DF "%{1to4%}")
+   (V16SF "%{1to16%}") (V8DF "%{1to8%}")])
+
 ;; Mapping from float mode to required SSE level
 (define_mode_attr sse
   [(SF "sse") (DF "sse2")
@@ -3740,6 +3750,46 @@
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
+(define_insn "*fma_fmadd__bcst_1"
+  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
+   (fma:VF_AVX512
+ (match_operand:VF_AVX512 1 "nonimmediate_operand" "0,v")
+ (match_operand:VF_AVX512 2 "nonimmediate_operand" "v,0")
+ (vec_duplicate:VF_AVX512
+   (match_operand: 3 "nonimmediate_operand" "m,m"]
+  "TARGET_AVX512F && "
+  "vfmadd213\t{%3, %2, 
%0|%0, %2, %3}"
+  [(set_attr "type" "ssemuladd"

Re: [PATCH] i386: Enable AVX512 memory broadcast for FMA

2018-10-18 Thread Uros Bizjak
On Thu, Oct 18, 2018 at 11:11 AM H.J. Lu  wrote:
>
> Many AVX512 vector operations can broadcast from a scalar memory source.
> This patch enables memory broadcast for FMA operations.
>
> gcc/
>
> PR target/72782
> * config/i386/sse.md (VF_AVX512): New.
> (avx512bcst): Likewise.
> (*fma_fmadd__bcst_1):
> Likewise.
> (*fma_fmadd__bcst_2):
> Likewise.
> (*fma_fmadd__bcst_3):
> Likewise.
>
> gcc/testsuite/
>
> PR target/72782
> * gcc.target/i386/avx512-fma-1.h: New file.
> * gcc.target/i386/avx512-fma-2.h: Likewise.
> * gcc.target/i386/avx512-fma-3.h: Likewise.
> * gcc.target/i386/avx512-fma-4.h: Likewise.
> * gcc.target/i386/avx512-fma-5.h: Likewise.
> * gcc.target/i386/avx512-fma-6.h: Likewise.
> * gcc.target/i386/avx512-fma-7.h: Likewise.
> * gcc.target/i386/avx512f-fmadd-df-zmm-1.c: Likewise.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-1.c: Likewise.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-2.c: Likewise.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-3.c: Likewise.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-4.c: Likewise.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-5.c: Likewise.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-6.c: Likewise.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Likewise.
> * gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c: Likewise.
> * gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c: Likewise.
> ---
>  gcc/config/i386/sse.md| 50 +++
>  gcc/testsuite/gcc.target/i386/avx512-fma-1.h  | 12 +
>  gcc/testsuite/gcc.target/i386/avx512-fma-2.h  | 13 +
>  gcc/testsuite/gcc.target/i386/avx512-fma-3.h  | 13 +
>  gcc/testsuite/gcc.target/i386/avx512-fma-4.h  | 13 +
>  gcc/testsuite/gcc.target/i386/avx512-fma-5.h  | 13 +
>  gcc/testsuite/gcc.target/i386/avx512-fma-6.h  | 13 +
>  gcc/testsuite/gcc.target/i386/avx512-fma-7.h  | 13 +
>  .../gcc.target/i386/avx512f-fmadd-df-zmm-1.c  | 12 +
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-1.c  | 12 +
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-2.c  | 12 +
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-3.c  | 12 +
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-4.c  | 12 +
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-5.c  | 12 +
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-6.c  | 12 +
>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c  | 11 
>  .../gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c | 12 +
>  .../gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c | 12 +
>  18 files changed, 259 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-1.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-2.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-3.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-4.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-5.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-6.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-7.h
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-df-zmm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 13dc7370fd3..594975a8b80 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -654,6 +654,16 @@
> (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
>  (define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
>
> +(define_mode_iterator VF_AVX512
> +  [(V4SF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
> +   (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
> +   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])

No need for TARGET_AVX512F conditions, since TARGET_AVX512F is
baseline for these modes and is expressed in insn condition.

> +(define_mode_attr avx512bcst
> +  [(V4SF "%{1to4%}") (V2DF "%{1to2%}")
> +   (V8SF "%{1to8%}") (V4DF "%{1to4%}")
> +   (V16SF "%{1to16%}") (V8DF "%{1to8%}")])
> +
>  ;; Mapping from float mode to required SSE level
>  (define_mode_attr sse
>[(SF "sse") (DF "sse2")
> @@ -3740,6 +3750,46 @@
>[(set_attr "type" "ssemuladd")
> (set_attr "mode" "")])
>
> +(define_insn "*fma_fmadd__bcst_1"
> +  [(set (match_operand:VF_AVX512 0 "register_operan

Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread Richard Sandiford
"H.J. Lu"  writes:
> On 10/18/18, Richard Sandiford  wrote:
>> "H.J. Lu"  writes:
>>> On 10/17/18, Marc Glisse  wrote:
>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>
>>>>> We may simplify
>>>>>
>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>
>>>>> to X when mode of X is the same as of mode of subreg.
>>>>
>>>> Hello,
>>>>
>>>> we already have code to simplify vec_select(vec_merge):
>>>>
>>>>  /* If we select elements in a vec_merge that all come from the same
>>>> operand, select from that operand directly.  */
>>>>
>>>> It would make sense to me to make the subreg transform as similar to it as
>>>> possible, in particular you don't need to special case vec_duplicate, the
>>>> transformation would see that everything comes from the first vector,
>>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>>> optimize that.
>>
>> Sorry, didn't see this before the OK.
>>
>>> What do you mean by another transformation? If simplify_subreg doesn't
>>> return X for
>>>
>>>   (subreg (vec_merge (vec_duplicate X)
>>>  (vector)
>>>  (const_int ((1 << N) | M)))
>>>   (N * sizeof (X)))
>>>
>>>
>>> no further transformation will be done.
>>
>> I think the point was that we should transform:
>>
>>   (subreg (vec_merge X
>>   (vector)
>>   (const_int ((1 << N) | M)))
>>(N * sizeof (X)))
>>
>> into:
>>
>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>
>> which should further simplify when X is a vec_duplicate.
>
> But sizeof (X) is the size of scalar of vec_dup.  How do we
> check the mask of vec_merge?

Yeah, it should be sizeof (outermode) (which was the same thing
in the original pattern, but not here).

Richard
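
To make the mask check under discussion concrete, here is a small standalone
model (not GCC code): in (vec_merge A B mask), element i of the result comes
from A when bit i of the mask is set.  A scalar subreg at byte offset
N * sizeof (outermode) reads element N, so it can be redirected to A only
when bit N is set.  The function names are hypothetical, and sizes are plain
byte counts rather than machine modes.

```cpp
#include <cstdint>

// Compute which element a subreg at `byte` selects, given the size in
// bytes of the outer (scalar) mode.  Returns false when the offset is
// not a multiple of the element size.
bool selected_element(unsigned byte, unsigned outer_size, unsigned *idx)
{
  if (outer_size == 0 || byte % outer_size != 0)
    return false;
  *idx = byte / outer_size;
  return true;
}

// True when the selected element comes from the first vec_merge operand,
// i.e. when the corresponding mask bit is set.
bool subreg_reads_first_operand(std::uint64_t mask, unsigned byte,
                                unsigned outer_size)
{
  unsigned idx = 0;
  if (!selected_element(byte, outer_size, &idx) || idx >= 64)
    return false;
  return ((mask >> idx) & 1u) != 0;
}
```

With a mask of (1 << 2) | 1 and 4-byte elements, byte offsets 0 and 8 select
elements of the first operand, while offset 4 does not.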


Re: [Patch, fortran] PR58618 - Wrong code with character substring and ASSOCIATE

2018-10-18 Thread Paul Richard Thomas
I do not think that there will be a PR for the ICE. This is a
regression introduced by my patch for PR70149 (September 30th). A
patch is attached. I will commit it as 'obvious' as soon as it has
finished regtesting. I will also commit the patch for PR58618 shortly
afterwards. Thanks for the review.

Paul

On Wed, 17 Oct 2018 at 22:17, Tobias Burnus  wrote:
>
> Hi Paul,
>
> Paul Richard Thomas wrote:
> > This problem concerned associate targets being substrings. It turns
> > out that they are returned as pointer types (with a different cast for
> > unity based substrings ***sigh***) and so can be assigned directly to
> > the associate name. The patch quite simply removed the condition that
> > such targets be allocatable, pointer or dummy.
> > I noticed in the course of working up the testcase that
> >  character (:), pointer :: ptr => NULL()
> >  character (6), target :: tgt = 'lmnopq'
> >  ptr => tgt
> >  print *, len (ptr), ptr
> > end
> > ICEs on the NULL initialization of the pointer but works fine if this
> > is removed. Has this already been posted as a PR?
>
>
> I leave it to Dominique to search for a PR; otherwise, I believe the
> attached patch fixes the issue. – It just needs someone to package it with
> a test case, regtest and commit it.
>
>
> > Bootstrapped and regtested on FC28/x86_64 - OK for trunk?
>
> OK – thanks for the fix.
>
> Tobias
>
> > 2018-10-17  Paul Thomas  
> >
> >  PR fortran/58618
> >  * trans-stmt.c (trans_associate_var): All strings that return
> >  as pointer types can be assigned directly to the associate
> >  name so remove 'attr' and the condition that uses it.
> >
> > 2018-10-17  Paul Thomas  
> >
> >  PR fortran/58618
> >  * gfortran.dg/associate_45.f90 : New test.



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
Index: gcc/fortran/trans-decl.c
===
*** gcc/fortran/trans-decl.c	(revision 265231)
--- gcc/fortran/trans-decl.c	(working copy)
*** gfc_get_symbol_decl (gfc_symbol * sym)
*** 1762,1768 
gfc_finish_var_decl (length, sym);
if (!sym->attr.associate_var
  	  && TREE_CODE (length) == VAR_DECL
! 	  && sym->value && sym->value->ts.u.cl->length)
  	{
  	  gfc_expr *len = sym->value->ts.u.cl->length;
  	  DECL_INITIAL (length) = gfc_conv_initializer (len, &len->ts,
--- 1762,1769 
gfc_finish_var_decl (length, sym);
if (!sym->attr.associate_var
  	  && TREE_CODE (length) == VAR_DECL
! 	  && sym->value && sym->value->expr_type != EXPR_NULL
! 	  && sym->value->ts.u.cl->length)
  	{
  	  gfc_expr *len = sym->value->ts.u.cl->length;
  	  DECL_INITIAL (length) = gfc_conv_initializer (len, &len->ts,
*** gfc_get_symbol_decl (gfc_symbol * sym)
*** 1772,1778 
  		DECL_INITIAL (length));
  	}
else
! 	gcc_assert (!sym->value);
  }
  
gfc_finish_var_decl (decl, sym);
--- 1773,1779 
  		DECL_INITIAL (length));
  	}
else
! 	gcc_assert (!sym->value || sym->value->expr_type == EXPR_NULL);
  }
  
gfc_finish_var_decl (decl, sym);
Index: gcc/testsuite/gfortran.dg/deferred_character_30.f90
===
*** gcc/testsuite/gfortran.dg/deferred_character_30.f90	(nonexistent)
--- gcc/testsuite/gfortran.dg/deferred_character_30.f90	(working copy)
***
*** 0 
--- 1,9 
+ ! { dg-do compile }
+ !
+ ! Fix a regression introduced by the patch for PR70149.
+ !
+ character (:), pointer :: ptr => NULL() ! The NULL () caused an ICE.
+ character (6), target :: tgt = 'lmnopq'
+ ptr => tgt
+ print *, len (ptr), ptr
+ end


Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread H.J. Lu
On 10/18/18, Richard Sandiford  wrote:
> "H.J. Lu"  writes:
>> On 10/18/18, Richard Sandiford  wrote:
>>> "H.J. Lu"  writes:
>>>> On 10/17/18, Marc Glisse wrote:
>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>>
>>>>>> We may simplify
>>>>>>
>>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>>
>>>>>> to X when mode of X is the same as of mode of subreg.
>>>>>
>>>>> Hello,
>>>>>
>>>>> we already have code to simplify vec_select(vec_merge):
>>>>>
>>>>>  /* If we select elements in a vec_merge that all come from the same
>>>>> operand, select from that operand directly.  */
>>>>>
>>>>> It would make sense to me to make the subreg transform as similar to it
>>>>> as possible, in particular you don't need to special case vec_duplicate,
>>>>> the transformation would see that everything comes from the first vector,
>>>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>>>> optimize that.
>>>
>>> Sorry, didn't see this before the OK.
>>>
>>>> What do you mean by another transformation? If simplify_subreg doesn't
>>>> return X for
>>>>
>>>>   (subreg (vec_merge (vec_duplicate X)
>>>>                      (vector)
>>>>                      (const_int ((1 << N) | M)))
>>>>           (N * sizeof (X)))
>>>>
>>>>
>>>> no further transformation will be done.
>>>
>>> I think the point was that we should transform:
>>>
>>>   (subreg (vec_merge X
>>>  (vector)
>>>  (const_int ((1 << N) | M)))
>>>   (N * sizeof (X)))
>>>
>>> into:
>>>
>>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>>
>>> which should further simplify when X is a vec_duplicate.
>>
>> But sizeof (X) is the size of scalar of vec_dup.  How do we
>> check the mask of vec_merge?
>
> Yeah, should be sizeof (outermode) (which was the same thing
> in the original pattern, but not here).
>
> Richard
>

Like this

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index b0cf3bbb2a9..e12b5c0e165 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx op,
   return NULL_RTX;
 }

-  /* Return X for
-  (subreg (vec_merge (vec_duplicate X)
+  /* Simplify
+  (subreg (vec_merge (X)
(vector)
(const_int ((1 << N) | M)))
- (N * sizeof (X)))
+ (N * sizeof (outermode)))
+ to
+  (subreg ((X) (N * sizeof (outermode)))
*/
   unsigned int idx;
   if (constant_multiple_p (byte, GET_MODE_SIZE (outermode), &idx)
   && GET_CODE (op) == VEC_MERGE
-  && GET_CODE (XEXP (op, 0)) == VEC_DUPLICATE
-  && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
+  && GET_MODE_INNER (innermode) == outermode
   && CONST_INT_P (XEXP (op, 2))
   && (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0)
-return XEXP (XEXP (op, 0), 0);
+return simplify_gen_subreg (outermode, XEXP (op, 0), innermode, byte);

   /* A SUBREG resulting from a zero extension may fold to zero if
  it extracts higher bits that the ZERO_EXTEND's source bits.  */

-- 
H.J.


Re: [Patch, fortran] PR58618 - Wrong code with character substring and ASSOCIATE

2018-10-18 Thread Paul Richard Thomas
Patch for the PR70149 regression committed as revision 265263.

Likewise the patch for PR58618 has been committed as revision 265264.

Cheers

Paul

On Wed, 17 Oct 2018 at 22:17, Tobias Burnus  wrote:
>
> Hi Paul,
>
> Paul Richard Thomas wrote:
> > This problem concerned associate targets being substrings. It turns
> > out that they are returned as pointer types (with a different cast for
> > unity based substrings ***sigh***) and so can be assigned directly to
> > the associate name. The patch quite simply removed the condition that
> > such targets be allocatable, pointer or dummy.
> > I noticed in the course of working up the testcase that
> >  character (:), pointer :: ptr => NULL()
> >  character (6), target :: tgt = 'lmnopq'
> >  ptr => tgt
> >  print *, len (ptr), ptr
> > end
> > ICEs on the NULL initialization of the pointer but works fine if this
> > is removed. Has this already been posted as a PR?
>
>
> I leave it to Dominique to search for a PR; otherwise, I believe the
> attached patch fixes the issue. – It just needs someone to package it with
> a test case, regtest and commit it.
>
>
> > Bootstrapped and regtested on FC28/x86_64 - OK for trunk?
>
> OK – thanks for the fix.
>
> Tobias
>
> > 2018-10-17  Paul Thomas  
> >
> >  PR fortran/58618
> >  * trans-stmt.c (trans_associate_var): All strings that return
> >  as pointer types can be assigned directly to the associate
> >  name so remove 'attr' and the condition that uses it.
> >
> > 2018-10-17  Paul Thomas  
> >
> >  PR fortran/58618
> >  * gfortran.dg/associate_45.f90 : New test.



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[PATCH] Fix some EVRP stupidness

2018-10-18 Thread Richard Biener


At some point we decided to not simply intersect all ranges we get
via register_edge_assert_for.  Instead we simply register them
in-order.  That causes things like replacing [64, +INF] with ~[0, 0].

The following patch avoids replacing a range with a larger one
as an obvious improvement.
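
As a rough illustration of that check (a simplified sketch, not the GCC code: `Interval`, `intersect` and `refines` are hypothetical names, ranges are plain closed unsigned intervals that are assumed to overlap, and for an unsigned value ~[0, 0] is modelled as [1, UINT32_MAX]), a new range is only worth recording if intersecting it with the old one actually changes the old one:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a value range: the closed interval [min, max].
struct Interval {
  uint32_t min;
  uint32_t max;
};

// Intersection of two intervals that are assumed to overlap.
static Interval intersect(Interval a, Interval b) {
  return Interval{std::max(a.min, b.min), std::min(a.max, b.max)};
}

// True if `incoming` narrows `old_range`; false if the intersection is
// just `old_range` again, i.e. recording `incoming` could only weaken it.
static bool refines(Interval old_range, Interval incoming) {
  Interval tem = intersect(old_range, incoming);
  return tem.min != old_range.min || tem.max != old_range.max;
}
```

With an old range of [64, +INF] and an incoming ~[0, 0], `refines` returns false, so the weaker range would be skipped rather than pushed.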

Compared to assert_expr based VRP we lack the ability to put down
actual assert_exprs and thus multiple SSA names with ranges we
could link via equivalences.  In the end we need sth similar,
for example by keeping a stack of active ranges for each SSA name.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-10-18  Richard Biener  

* gimple-ssa-evrp-analyze.c
(evrp_range_analyzer::record_ranges_from_incoming_edge): Be
smarter about what ranges to use.
* tree-vrp.c (add_assert_info): Dump here.
(register_edge_assert_for_2): Instead of here at multiple but
not all places.

* gcc.dg/tree-ssa/evrp12.c: New testcase.
* gcc.dg/predict-6.c: Adjust.
* gcc.dg/tree-ssa/vrp33.c: Disable EVRP.
* gcc.dg/tree-ssa/vrp02.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.

diff --git a/gcc/gimple-ssa-evrp-analyze.c b/gcc/gimple-ssa-evrp-analyze.c
index e9afa80e191..0748a53cdb8 100644
--- a/gcc/gimple-ssa-evrp-analyze.c
+++ b/gcc/gimple-ssa-evrp-analyze.c
@@ -206,6 +206,17 @@ evrp_range_analyzer::record_ranges_from_incoming_edge (basic_block bb)
 ordering issues that can lead to worse ranges.  */
  for (unsigned i = 0; i < vrs.length (); ++i)
{
+ /* But make sure we do not weaken ranges like when
+getting first [64, +INF] and then ~[0, 0] from
+conditions like (s & 0x3cc0) == 0).  */
+ value_range *old_vr = get_value_range (vrs[i].first);
+ value_range tem = *old_vr;
+ tem.equiv = NULL;
+ vrp_intersect_ranges (&tem, vrs[i].second);
+ if (tem.type == old_vr->type
+ && tem.min == old_vr->min
+ && tem.max == old_vr->max)
+   continue;
  push_value_range (vrs[i].first, vrs[i].second);
  if (is_fallthru
  && all_uses_feed_or_dominated_by_stmt (vrs[i].first, stmt))
diff --git a/gcc/testsuite/gcc.dg/predict-6.c b/gcc/testsuite/gcc.dg/predict-6.c
index 5d6fbf158f2..08ce5cdb81d 100644
--- a/gcc/testsuite/gcc.dg/predict-6.c
+++ b/gcc/testsuite/gcc.dg/predict-6.c
@@ -10,9 +10,9 @@ void foo (int base, int bound)
   int i, ret = 0;
   for (i = base; i <= bound; i++)
 {
-  if (i < base)
+  if (i <= base)
global += bar (i);
-  if (i < base + 1)
+  if (i < base + 2)
global += bar (i);
   if (i <= base + 3)
global += bar (i);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
index 0e4407dcbd7..886dc147ad1 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdisable-tree-evrp" } */
 void abort (void);
 int q (void);
 int a[10];
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/evrp12.c b/gcc/testsuite/gcc.dg/tree-ssa/evrp12.c
new file mode 100644
index 000..b3906c23465
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/evrp12.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+extern void link_error ();
+
+void
+f3 (unsigned int s)
+{
+  if ((s & 0x3cc0) == 0)
+{
+  if (s >= -15552U)
+   link_error ();
+}
+  else
+{
+  if (s <= 0x3f)
+   link_error ();
+}
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp02.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp02.c
index 8d14feadb6a..4be538f5944 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp02.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp02.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdelete-null-pointer-checks" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdelete-null-pointer-checks -fdisable-tree-evrp" } */
 
 struct A
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp33.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp33.c
index 75fefa49925..f1d3863943e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp33.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp33.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-fre" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-fre -fdisable-tree-evrp" } */
 
 /* This is from PR14052.  */
 
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index cbc2ea2f26b..6f5ec43670e 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -2133,6 +2133,17 @@ add_assert_info (vec &asserts,
   info.val = val;
   info.expr = expr;
   asserts.safe_push (info);
+
+  if (dump_file && (dump

Re: [PATCH] Initial commit of Networking TS implementation

2018-10-18 Thread Renlin Li

Hi Jonathan,

I saw those tests failed to compile on baremetal targets with the following 
error:
```
libstdc++-v3/include/experimental/io_context:45: fatal error: poll.h: No such file or directory
```

Should we add a check to prevent it from running on unsupported platforms?

Thanks!
Renlin

On 10/16/2018 05:15 PM, Jonathan Wakely wrote:

On 16/10/18 17:12 +0100, Jonathan Wakely wrote:

On 16/10/18 16:36 +0100, Jonathan Wakely wrote:

On 16/10/18 16:24 +0100, Jonathan Wakely wrote:

On 12/10/18 11:50 +0100, Jonathan Wakely wrote:

This implementation is very incomplete (see the various TODO comments
in the code) but rather than keeping it out of tree any longer I'm
committing it to trunk. This will allow others to experiment with it
and (I hope) work on finishing it. Either way we'll ship something for
gcc 9. It works OK for some synchronous operations, but most of the
async ops are not done yet.

* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/experimental/bits/net.h: New header for common
implementation details of Networking TS.
* include/experimental/buffer: New header.
* include/experimental/executor: New header.
* include/experimental/internet: New header.
* include/experimental/io_context: New header.
* include/experimental/net: New header.
* include/experimental/netfwd: New header.
* include/experimental/socket: New header.
* include/experimental/timer: New header.
* testsuite/experimental/net/buffer/arithmetic.cc: New test.
* testsuite/experimental/net/buffer/const.cc: New test.
* testsuite/experimental/net/buffer/creation.cc: New test.
* testsuite/experimental/net/buffer/mutable.cc: New test.
* testsuite/experimental/net/buffer/size.cc: New test.
* testsuite/experimental/net/buffer/traits.cc: New test.
* testsuite/experimental/net/execution_context/use_service.cc: New
test.
* testsuite/experimental/net/headers.cc: New test.
* testsuite/experimental/net/internet/address/v4/comparisons.cc: New
test.
* testsuite/experimental/net/internet/address/v4/cons.cc: New test.
* testsuite/experimental/net/internet/address/v4/creation.cc: New
test.
* testsuite/experimental/net/internet/address/v4/members.cc: New
test.
* testsuite/experimental/net/internet/resolver/base.cc: New test.
* testsuite/experimental/net/internet/resolver/ops/lookup.cc: New
test.
* testsuite/experimental/net/internet/resolver/ops/reverse.cc: New
test.
* testsuite/experimental/net/timer/waitable/cons.cc: New test.
* testsuite/experimental/net/timer/waitable/dest.cc: New test.
* testsuite/experimental/net/timer/waitable/ops.cc: New test.


A minor correction. Committed to trunk.


The tests were written three years ago, before we used effective
targets to control the C++14 dialect used for tests. This fixes them
to use the modern style.


And this makes it a bit more portable (but still a long way from
compiling for mingw).


This fixes a name collision in a test, because various systems (at
least GNU and AIX) define struct ip in .

Tested x86_64-linux and powerpc-aix, committed to trunk.



Re: [PATCH] Initial commit of Networking TS implementation

2018-10-18 Thread Jonathan Wakely

On 18/10/18 12:06 +0100, Renlin Li wrote:

Hi Jonathan,

I saw those tests failed to compile on baremetal targets with the following 
error:
```
libstdc++-v3/include/experimental/io_context:45: fatal error: poll.h: No such file or directory
```


That error should be fixed at r265203 but the test will still fail
with a different error.


Should we add a check to prevent it from running on unsupported platforms?


Yes, I've been trying to decide what the right target selector is.

Those tests will fail for baremetal and *-*-mingw* and probably
other targets too.
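
One way such a check could look at the source level is a preprocessor guard on the header. This is only a sketch: it is not the DejaGnu target selector being discussed here, and `HAVE_POLL_H_SKETCH` and `have_poll_h` are made-up names.

```cpp
#include <cassert>

// __has_include is available in GCC 5 and later and in C++17.
#ifdef __has_include
# if __has_include(<poll.h>)
#  define HAVE_POLL_H_SKETCH 1
# endif
#endif

#ifndef HAVE_POLL_H_SKETCH
# define HAVE_POLL_H_SKETCH 0
#endif

// Lets callers (or a test) branch on header availability at compile time.
constexpr bool have_poll_h() { return HAVE_POLL_H_SKETCH != 0; }
```

Code that needs poll.h could then be compiled only when `have_poll_h()` is true, instead of failing with a fatal error on targets that lack the header.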



Re: [PATCH] i386: Enable AVX512 memory broadcast for FMA

2018-10-18 Thread H.J. Lu
On 10/18/18, Uros Bizjak  wrote:
> On Thu, Oct 18, 2018 at 11:11 AM H.J. Lu  wrote:
>>
>> Many AVX512 vector operations can broadcast from a scalar memory source.
>> This patch enables memory broadcast for FMA operations.
>>
>> gcc/
>>
>> PR target/72782
>> * config/i386/sse.md (VF_AVX512): New.
>> (avx512bcst): Likewise.
>> (*fma_fmadd__bcst_1):
>> Likewise.
>> (*fma_fmadd__bcst_2):
>> Likewise.
>> (*fma_fmadd__bcst_3):
>> Likewise.
>>
>> gcc/testsuite/
>>
>> PR target/72782
>> * gcc.target/i386/avx512-fma-1.h: New file.
>> * gcc.target/i386/avx512-fma-2.h: Likewise.
>> * gcc.target/i386/avx512-fma-3.h: Likewise.
>> * gcc.target/i386/avx512-fma-4.h: Likewise.
>> * gcc.target/i386/avx512-fma-5.h: Likewise.
>> * gcc.target/i386/avx512-fma-6.h: Likewise.
>> * gcc.target/i386/avx512-fma-7.h: Likewise.
>> * gcc.target/i386/avx512f-fmadd-df-zmm-1.c: Likewise.
>> * gcc.target/i386/avx512f-fmadd-sf-zmm-1.c: Likewise.
>> * gcc.target/i386/avx512f-fmadd-sf-zmm-2.c: Likewise.
>> * gcc.target/i386/avx512f-fmadd-sf-zmm-3.c: Likewise.
>> * gcc.target/i386/avx512f-fmadd-sf-zmm-4.c: Likewise.
>> * gcc.target/i386/avx512f-fmadd-sf-zmm-5.c: Likewise.
>> * gcc.target/i386/avx512f-fmadd-sf-zmm-6.c: Likewise.
>> * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Likewise.
>> * gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c: Likewise.
>> * gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c: Likewise.
>> ---
>>  gcc/config/i386/sse.md| 50 +++
>>  gcc/testsuite/gcc.target/i386/avx512-fma-1.h  | 12 +
>>  gcc/testsuite/gcc.target/i386/avx512-fma-2.h  | 13 +
>>  gcc/testsuite/gcc.target/i386/avx512-fma-3.h  | 13 +
>>  gcc/testsuite/gcc.target/i386/avx512-fma-4.h  | 13 +
>>  gcc/testsuite/gcc.target/i386/avx512-fma-5.h  | 13 +
>>  gcc/testsuite/gcc.target/i386/avx512-fma-6.h  | 13 +
>>  gcc/testsuite/gcc.target/i386/avx512-fma-7.h  | 13 +
>>  .../gcc.target/i386/avx512f-fmadd-df-zmm-1.c  | 12 +
>>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-1.c  | 12 +
>>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-2.c  | 12 +
>>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-3.c  | 12 +
>>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-4.c  | 12 +
>>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-5.c  | 12 +
>>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-6.c  | 12 +
>>  .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c  | 11 
>>  .../gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c | 12 +
>>  .../gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c | 12 +
>>  18 files changed, 259 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-1.h
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-2.h
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-3.h
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-4.h
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-5.h
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-6.h
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-7.h
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-df-zmm-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-3.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-4.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-5.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-6.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c
>>
>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> index 13dc7370fd3..594975a8b80 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -654,6 +654,16 @@
>> (V2DI "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
>>  (define_mode_iterator VI48F_256 [V8SI V8SF V4DI V4DF])
>>
>> +(define_mode_iterator VF_AVX512
>> +  [(V4SF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")
>> +   (V8SF "TARGET_AVX512VL") (V4DF "TARGET_AVX512VL")
>> +   (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")])
>
> No need for TARGET_AVX512F conditions, since TARGET_AVX512F is
> baseline for these modes and is expressed in insn condition.

Fixed.

>> +(define_mode_attr avx512bcst
>> +  [(V4SF "%{1to4%}") (V2DF "%{1to2%}")
>> +   (V8SF "%{1to8%}") (V4DF "%{1to4%}")
>> +   (V16SF "%{1to16%}") (V8DF "%{1to8%}")])
>> +
>>  ;; Mapping from float mode to required SSE level
>>  (define_mode_attr sse
>>[(SF "sse") (DF "sse2")
>> @@ -3740

Re: [PATCH][i386] Fix vec_construct cost, remove unused ix86_vec_cost arg

2018-10-18 Thread Jan Hubicka
> 
> The following fixes vec_construct cost calculation to properly consider
> that the inserts will happen to SSE regs thus forgo the multiplication
> done in ix86_vec_cost which is passed the wrong mode.  This gets rid of
> the only call passing false to ix86_vec_cost (so consider the patch
> amended to remove the arg if approved).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?

OK, thanks!
> 
> I am considering to make the factor we apply in ix86_vec_cost
> which currently depends on X86_TUNE_AVX128_OPTIMAL and
> X86_TUNE_SSE_SPLIT_REGS part of the actual cost tables since
> the reason we apply them are underlying CPU architecture details.
> Was the original reason of doing the multiplication based on
> those tunings to be able to "share" the same basic cost table
> across architectures that differ in this important detail?

No, just to have fewer entries in the table (since they are rather big
and painful to fill in as they are already)

> I see X86_TUNE_SSE_SPLIT_REGS is only used for m_ATHLON_K8
> and X86_TUNE_AVX128_OPTIMAL is used for m_BDVER, m_BTVER2
> and m_ZNVER1.  Those all have (multiple) exclusive processor_cost_table
> entries.
> 
> As a first step I'd like to remove the use of ix86_vec_cost for
> the entries that already have entries for multiple modes
> (loads and stores) and apply the factor there.  For example
> Zen can do two 128bit loads per cycle but only one 128bit store.

That sounds like a good plan (I think I introduced the entries in cost
table only after introducing the vec_cost thingy)

> With multiplying AVX256 costs by two we seem to cost sth like
> # instructions to dispatch * instruction latency which is an
> odd thing.  I'd have expected # instructions to dispatch / instruction 
> throughput * instruction latency - so a AVX256 add would cost
> the same as a AVX128 add, likewise for loads but stores would be
> more expensive because of the throughput issue.  This all
> ignores resource utilization across multiple insns but that's
> how the cost model works ...

Yep, cost model simply uses latencies because it originated at a time
when CPUs were not parallel.  I know that LLVM backend goes the other way
and uses throughputs only.
Correct thing would be to build the dependence dag and guess how CPU
will schedule but that would be fun to implement at gimple level...
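
To make the difference concrete, here is a small sketch of the two costing schemes discussed above. It is not GCC code: `OpInfo`, `latency_cost` and `throughput_cost` are hypothetical names, and the numbers one would feed in are illustrative placeholders, apart from the two-loads/one-store per cycle ratio quoted for Zen.

```cpp
#include <cassert>

// Hypothetical description of one 128-bit operation on some CPU.
struct OpInfo {
  int latency;    // cycles until the result is available
  int per_cycle;  // how many such 128-bit ops can issue per cycle
};

// Scheme 1: number of 128-bit ops to dispatch times latency.
static int latency_cost(const OpInfo &op, int n_ops) {
  return n_ops * op.latency;
}

// Scheme 2: (ops to dispatch / throughput) * latency, rounding the
// number of issue cycles up.
static int throughput_cost(const OpInfo &op, int n_ops) {
  int issue_cycles = (n_ops + op.per_cycle - 1) / op.per_cycle;
  return issue_cycles * op.latency;
}
```

Taking a placeholder latency of 4 for both, a load with `per_cycle = 2` and a store with `per_cycle = 1`, a 256-bit access split into two 128-bit ops costs 8 under the first scheme either way. Under the second scheme the load costs 4 (the same as a single 128-bit load) while the store costs 8.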

Honza
> 
> Thanks,
> Richard.
> 
> 2018-10-11  Richard Biener  
> 
>   * config/i386/i386.c (ix86_vec_cost): Remove !parallel path
>   and argument.
>   (ix86_builtin_vectorization_cost): For vec_construct properly
>   cost insertion into SSE regs.
>   (...): Adjust calls to ix86_vec_cost.
> 
> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c(revision 265022)
> +++ gcc/config/i386/i386.c(working copy)
> @@ -39846,11 +39846,10 @@ ix86_set_reg_reg_cost (machine_mode mode
>  static int
>  ix86_vec_cost (machine_mode mode, int cost, bool parallel)
>  {
> +  gcc_assert (parallel);
>if (!VECTOR_MODE_P (mode))
>  return cost;
> - 
> -  if (!parallel)
> -return cost * GET_MODE_NUNITS (mode);
> +
>if (GET_MODE_BITSIZE (mode) == 128
>&& TARGET_SSE_SPLIT_REGS)
>  return cost * 2;
> @@ -45190,8 +45189,9 @@ ix86_builtin_vectorization_cost (enum ve
>  
>case vec_construct:
>   {
> -   /* N element inserts.  */
> -   int cost = ix86_vec_cost (mode, ix86_cost->sse_op, false);
> +   gcc_assert (VECTOR_MODE_P (mode));
> +   /* N element inserts into SSE vectors.  */
> +   int cost = GET_MODE_NUNITS (mode) * ix86_cost->sse_op;
> /* One vinserti128 for combining two SSE vectors for AVX256.  */
> if (GET_MODE_BITSIZE (mode) == 256)
>   cost += ix86_vec_cost (mode, ix86_cost->addss, true);
> 


Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

2018-10-18 Thread Richard Sandiford
Joseph Myers  writes:
> On Wed, 17 Oct 2018, Richard Sandiford wrote:
>
>> > But as shown in the related discussions, there are other possible features 
>> > that might also involve non-VLA types whose size is not a compile-time 
>> > constant.  And so it's necessary to work with the people interested in 
>> > those features in order to clarify what the underlying concepts ought to 
>> > look like to support different such features.
>> 
>> Could you give pointers to the specific proposals/papers you mean?
>
> They're generally reflector discussions rather than written up as papers, 
> exploring the space of problems and solutions in various areas (including 
> bignums and runtime introspection of types).  I think the first message in 
> those discussions is number 15529 
>  and then relevant 
> discussions continue for much of the next 200 messages or so.

OK, thanks.  I've read from there to the latest message at the time
of writing (15720).  There seemed to be various ideas:

- a new int128_t, which started the discussion off.

- support for parameterised fixed-size integers like _Int(40), which
  seemed to be a C version of C++ template and wouldn't need
  variable-length types.

- bignums that extend as necessary.  On that I agree with what you said in:


A bignum type, in the sense of one that grows its storage if you
store a too-big number in it (as opposed to fixed-width int where
you can specify an arbitrary integer constant expression for N),
cannot meet other requirements for C integer types such as being
directly represented in binary - it has to, effectively, be a fixed
size but contain a pointer to allocated storage (and then there are
considerations of how such a type should handle errors for
allocation failure).

  and Hans Boehm said in:


2) Provide an integral type that is reasonably efficient for small
integers, but gracefully overflows to something along the lines of
(1). A common way to do that in other languages is to represent
e.g. 63-bit integers directly by adding a zero bit on the right.
On overflow a more complex result is represented by e.g. a 64-bit
aligned pointer with the low bit set to one. That way integer
addition is just an add instruction followed by an overflow check in
the normal case. Probably a better way to do integer arithmetic in
many, maybe even most, cases. Especially since such integers need to
be usable as array elements, I don't see how to avoid memory
allocation under the covers, along the slow path.

  This IIRC is how LLVM's APInt is implemented.  It doesn't need
  variable-length types, and although it would need some kind of
  memory management support for C, it doesn't need any language
  changes at all for C++.

  It's also similar to what GCC does with auto_vec and LLVM does
  with SmallVector: the types have embedded room for common cases and
  fall back to separately-allocated storage if the contents get too big.
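
  The tagged small-integer fast path from Hans Boehm's description can be
  sketched roughly as follows.  This is an illustration only: `Tagged`,
  `tag_small`, `untag_small`, `is_small` and `add_small` are hypothetical
  names, the pointer-tagged slow path is not modelled, and the overflow
  check uses the GCC/Clang builtin `__builtin_add_overflow`.

```cpp
#include <cassert>
#include <cstdint>

// A tagged word: low bit 0 means "small integer stored shifted left by
// one"; low bit 1 would mean "pointer to a heap-allocated bignum" (not
// modelled in this sketch).
typedef int64_t Tagged;

static bool is_small(Tagged t) { return (t & 1) == 0; }

// Encode a value that fits in 63 bits.
static Tagged tag_small(int64_t v) {
  return static_cast<Tagged>(static_cast<uint64_t>(v) << 1);
}

// Decode; relies on >> of a negative value being an arithmetic shift,
// which GCC documents as its behaviour.
static int64_t untag_small(Tagged t) { return t >> 1; }

// Fast path: one add plus an overflow check.  Returns false when the
// caller would have to fall back to the allocating slow path.
static bool add_small(Tagged a, Tagged b, Tagged *out) {
  return !__builtin_add_overflow(a, b, out);
}
```

  Because both operands carry a zero tag bit, their sum does too, and a
  64-bit signed overflow of the tagged values corresponds exactly to the
  untagged sum not fitting in 63 bits.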

  There was talk about having it as a true variable-length type in:


(2) is difficult because of the requirements for memory management and
the necessity to deal with allocation failures.

For avoiding integer overflow vulnerabilities, there is a variant of (2)
which is not possible to implement in a library, where expressions are
evaluated with a sufficient number of bits to obtain the mathematically
correct result.  GNAT has implemented something in this direction
(MINIMIZED and ELIMINATED):




I think that for expressions which do not involve shifts by
non-constants, it should be possible to determine the required storage
at compile time, so it would avoid the memory allocation issue.  Unlike
Ada, C doesn't have a power operator, so the storage requirements would
grow with the size of the expression (still under the assumption that
left shifts are excluded).

  But AIUI that was intended to be more special purpose, for
  intermediate results while evaluating an expression.  It solves
  the memory allocation issue because the (stack) memory used for
  evaluating the expression could be recovered after evaluation is
  complete.

  This approach wouldn't work if it was extended to an assignable bignum
  object type.  E.g. prohibiting left shifts wouldn't then help since:

 bignum x = ...;
 x <<= var; // invalid

  would be equivalent to:

 bignum x = ...;
 for (int i = 0; i < var; ++i)
   x += x; // valid

  Thus it would be easy to create what are effectively allocas of O(1<> ...and here is that any size changes come only from changes in the
>> implementation-defined built-in sizeless ty

Re: [PATCH] i386: Enable AVX512 memory broadcast for FMA

2018-10-18 Thread Uros Bizjak
On Thu, Oct 18, 2018 at 1:19 PM H.J. Lu  wrote:

> >> +(define_insn "*fma_fmadd__bcst_1"
> >> +  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
> >> +   (fma:VF_AVX512
> >> + (match_operand:VF_AVX512 1 "nonimmediate_operand" "0,v")
> >> + (match_operand:VF_AVX512 2 "nonimmediate_operand" "v,0")
> >> + (vec_duplicate:VF_AVX512
> >> +   (match_operand: 3 "nonimmediate_operand"
> >> "m,m"]
> >
> > Please note that having "nonimmediate_operand" predicate with "m"
> > constraint will force scalar value that lives in any register to
> > memory. So, scalar value will be pushed from either integer or SSE
> > register to memory, and will be broadcast to SSE register from here. I
> > guess this is not the optimal way, and we still want (eventual movq
> > from integer reg) + broadcast insn in this case.
> >
> > If this predicate is changed to "memory_operand", then only scalars
> > that live in memory will be considered.
>
> Using "memory_operand" causes:
>
> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-times
> vfmadd...ps[ \\t]+[^\n\r]+\\{1to[1-8]+\\}, %zmm[0-9]+, %zmm0 1
> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-not
> vbroadcastss[^\n]*%zmm[0-9]+
>
> __m512
> foo (__m512 x, __m512 y)
> {
>   return _mm512_fmadd_ps (x, y, _mm512_set1_ps (2.f));
> }
>
> Combiner:
>
> Failed to match this instruction:
> (set (reg:V16SF 91)
> (fma:V16SF (reg/v:V16SF 85 [ x ])
> (reg:V16SF 21 xmm1 [ y ])
> (vec_duplicate:V16SF (reg:SF 88

This is expected, there is no memory operand there. Can you check what
prevents combiner from propagating memory into the insn?

Uros.


Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread Richard Sandiford
"H.J. Lu"  writes:
> On 10/18/18, Richard Sandiford  wrote:
>> "H.J. Lu"  writes:
>>> On 10/18/18, Richard Sandiford  wrote:
>>>> "H.J. Lu"  writes:
>>>>> On 10/17/18, Marc Glisse  wrote:
>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>>>
>>>>>>> We may simplify
>>>>>>>
>>>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>>>
>>>>>>> to X when mode of X is the same as of mode of subreg.
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> we already have code to simplify vec_select(vec_merge):
>>>>>>
>>>>>>  /* If we select elements in a vec_merge that all come from the same
>>>>>> operand, select from that operand directly.  */
>>>>>>
>>>>>> It would make sense to me to make the subreg transform as similar to it as
>>>>>> possible, in particular you don't need to special case vec_duplicate, the
>>>>>> transformation would see that everything comes from the first vector,
>>>>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>>>>> optimize that.
>>>>
>>>> Sorry, didn't see this before the OK.
>>>>
>>>>> What do you mean by another transformation? If simplify_subreg doesn't
>>>>> return X for
>>>>>
>>>>>   (subreg (vec_merge (vec_duplicate X)
>>>>>(vector)
>>>>>(const_int ((1 << N) | M)))
>>>>> (N * sizeof (X)))
>>>>>
>>>>>
>>>>> no further transformation will be done.
>>>>
>>>> I think the point was that we should transform:
>>>>
>>>>   (subreg (vec_merge X
>>>> (vector)
>>>> (const_int ((1 << N) | M)))
>>>>  (N * sizeof (X)))
>>>>
>>>> into:
>>>>
>>>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>>>
>>>> which should further simplify when X is a vec_duplicate.
>>>
>>> But sizeof (X) is the size of scalar of vec_dup.  How do we
>>> check the mask of vec_merge?
>>
>> Yeah, should be sizeof (outermode) (which was the same thing
>> in the original pattern, but not here).
>>
>> Richard
>>
>
> Like this
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index b0cf3bbb2a9..e12b5c0e165 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx op,
>return NULL_RTX;
>  }
>
> -  /* Return X for
> -  (subreg (vec_merge (vec_duplicate X)
> +  /* Simplify
> +  (subreg (vec_merge (X)
> (vector)
> (const_int ((1 << N) | M)))
> - (N * sizeof (X)))
> + (N * sizeof (outermode)))
> + to
> +  (subreg ((X) (N * sizeof (outermode)))

Stray "(": (subreg (X) (N * sizeof (outermode)))

OK with that change if it passes testing.

Thanks,
Richard

> */
>unsigned int idx;
>if (constant_multiple_p (byte, GET_MODE_SIZE (outermode), &idx)
>&& GET_CODE (op) == VEC_MERGE
> -  && GET_CODE (XEXP (op, 0)) == VEC_DUPLICATE
> -  && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
> +  && GET_MODE_INNER (innermode) == outermode
>&& CONST_INT_P (XEXP (op, 2))
>&& (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0)
> -return XEXP (XEXP (op, 0), 0);
> +return simplify_gen_subreg (outermode, XEXP (op, 0), innermode, byte);
>
>/* A SUBREG resulting from a zero extension may fold to zero if
>   it extracts higher bits that the ZERO_EXTEND's source bits.  */
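
The condition tested in the patch above can be modelled outside of GCC as follows. This is a sketch under stated assumptions: `lane_from_first_operand` is a hypothetical helper, the mask is a plain 64-bit integer, and sizes are in bytes. It mirrors the idea that a subreg at a byte offset that is a multiple of the element size selects one lane, and that a set bit in the vec_merge mask means the lane comes from the first operand.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the lane selected by a subreg at `byte` (element size
// `elem_size`) is taken from the first vec_merge operand according to
// `mask`.  On success *idx holds the lane number.
static bool lane_from_first_operand(uint64_t mask, unsigned byte,
                                    unsigned elem_size, unsigned *idx) {
  // The byte offset must be a whole number of elements.
  if (elem_size == 0 || byte % elem_size != 0)
    return false;
  *idx = byte / elem_size;
  if (*idx >= 64)
    return false;  // avoid shifting past the width of the mask
  return (mask & (UINT64_C(1) << *idx)) != 0;
}
```

For example, with mask 0x5 and 4-byte elements, byte offset 8 selects lane 2, which comes from the first operand, whereas byte offset 4 selects lane 1, which does not.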


Re: [PATCH][i386] Fix vec_construct cost, remove unused ix86_vec_cost arg

2018-10-18 Thread Jan Hubicka
> 
> So like the following which removes the use of ix86_vec_cost
> for SSE loads and stores since we have per-mode costs already.
> I've applied the relevant factor to the individual cost tables
> (noting that for X86_TUNE_SSE_SPLIT_REGS we only apply the
> multiplication for size == 128, not size >= 128 ...)
> 
> There's a ??? hunk in inline_memory_move_cost where we
> failed to apply the scaling thus in that place we'd now have
> a behavior change.  Alternatively I could leave the cost
> tables unaltered if that costing part is more critical than
> the vectorizer one.

Changing the behaviour (applying the scale there) seems like
the right way to go to me...
> 
> I've also spotted, when reviewing ix86_vec_cost uses, a bug
> in ix86_rtx_cost which keys on SFmode which doesn't work
> for SSE modes, thus use GET_MODE_INNER.
> 
> Also I've changed X86_TUNE_AVX128_OPTIMAL to also apply
> to BTVER1 - everywhere else we glob BTVER1 and BTVER2 so
> this must surely be an omission.

BTVER1 did not have AVX :)
> 
> Honza - is a patch like this OK?

Looks OK to me.  Splitting up individual changes is up to you.
I think it is not that dramatic change so hopefully regressions
won't be that hard to analyze.
> 
> Should I split out individual fixes to make bisection possible?
> 
> Should I update the cost tables or instead change the vectorizer
> costing when considering the inline_memory_move_cost "issue"?

Looks like memory move cost should do the right thing now after your patch?
Having larger loads/stores more expensive seems correct to me.

Patch is OK, without the ??? comment ;)
Honza
> 
> Thanks,
> Richard.
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 0cf4152acb2..f5392232f61 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -39432,6 +39432,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass,
>int index = sse_store_index (mode);
>if (index == -1)
>   return 100;
> +  /* ??? */
>if (in == 2)
>  return MAX (ix86_cost->sse_load [index], ix86_cost->sse_store [index]);
>return in ? ix86_cost->sse_load [index] : ix86_cost->sse_store [index];
> @@ -40183,7 +40181,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>  gcc_assert (TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F);
>  
>  *total = ix86_vec_cost (mode,
> - mode == SFmode ? cost->fmass : cost->fmasd,
> + GET_MODE_INNER (mode) == SFmode
> + ? cost->fmass : cost->fmasd,
>   true);
>   *total += rtx_cost (XEXP (x, 1), mode, FMA, 1, speed);
>  
> @@ -45122,18 +45121,14 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>   /* See PR82713 - we may end up being called on non-vector type.  */
>   if (index < 0)
> index = 2;
> -return ix86_vec_cost (mode,
> -   COSTS_N_INSNS (ix86_cost->sse_load[index]) / 2,
> -   true);
> +return COSTS_N_INSNS (ix86_cost->sse_load[index]) / 2;
>  
>case vector_store:
>   index = sse_store_index (mode);
>   /* See PR82713 - we may end up being called on non-vector type.  */
>   if (index < 0)
> index = 2;
> -return ix86_vec_cost (mode,
> -   COSTS_N_INSNS (ix86_cost->sse_store[index]) / 2,
> -   true);
> +return COSTS_N_INSNS (ix86_cost->sse_store[index]) / 2;
>  
>case vec_to_scalar:
>case scalar_to_vec:
> @@ -45146,20 +45141,14 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>   /* See PR82713 - we may end up being called on non-vector type.  */
>   if (index < 0)
> index = 2;
> -return ix86_vec_cost (mode,
> -   COSTS_N_INSNS
> -  (ix86_cost->sse_unaligned_load[index]) / 2,
> -   true);
> +return COSTS_N_INSNS (ix86_cost->sse_unaligned_load[index]) / 2;
>  
>case unaligned_store:
>   index = sse_store_index (mode);
>   /* See PR82713 - we may end up being called on non-vector type.  */
>   if (index < 0)
> index = 2;
> -return ix86_vec_cost (mode,
> -   COSTS_N_INSNS
> -  (ix86_cost->sse_unaligned_store[index]) / 2,
> -   true);
> +return COSTS_N_INSNS (ix86_cost->sse_unaligned_store[index]) / 2;
>  
>case vector_gather_load:
>  return ix86_vec_cost (mode,
> diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
> index 9c8ae0a7841..59d0a8b17d0 100644
> --- a/gcc/config/i386/x86-tune-costs.h
> +++ b/gcc/config/i386/x86-tune-costs.h
> @@ -795,12 +795,12 @@ struct processor_costs athlon_cost = {
>{4, 4},/* cost of storing 

Re: [PATCH] i386: Enable AVX512 memory broadcast for FMA

2018-10-18 Thread H.J. Lu
On 10/18/18, Uros Bizjak  wrote:
> On Thu, Oct 18, 2018 at 1:19 PM H.J. Lu  wrote:
>
>> >> +(define_insn
>> >> "*fma_fmadd__bcst_1"
>> >> +  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
>> >> +   (fma:VF_AVX512
>> >> + (match_operand:VF_AVX512 1 "nonimmediate_operand" "0,v")
>> >> + (match_operand:VF_AVX512 2 "nonimmediate_operand" "v,0")
>> >> + (vec_duplicate:VF_AVX512
>> >> +   (match_operand: 3 "nonimmediate_operand"
>> >> "m,m"]
>> >
>> > Please note that having "nonimmediate_operand" predicate with "m"
>> > constraint will force scalar value that lives in any register to
>> > memory. So, scalar value will be pushed from either integer or SSE
>> > register to memory, and will be broadcast to SSE register from here. I
>> > guess this is not the optimal way, and we still want (eventual movq
>> > from integer reg) + broadcast insn in this case.
>> >
>> > If this predicate is changed to "memory_operand", then only scalars
>> > that live in memory will be considered.
>>
>> Using "memory_operand" causes:
>>
>> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-times
>> vfmadd...ps[ \\t]+[^\n\r]+\\{1to[1-8]+\\}, %zmm[0-9]+, %zmm0 1
>> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-not
>> vbroadcastss[^\n]*%zmm[0-9]+
>>
>> __m512
>> foo (__m512 x, __m512 y)
>> {
>>   return _mm512_fmadd_ps (x, y, _mm512_set1_ps (2.f));
>> }
>>
>> Combiner:
>>
>> Failed to match this instruction:
>> (set (reg:V16SF 91)
>> (fma:V16SF (reg/v:V16SF 85 [ x ])
>> (reg:V16SF 21 xmm1 [ y ])
>> (vec_duplicate:V16SF (reg:SF 88
>
> This is expected, there is no memory operand there. Can you check what
> prevents combiner from propagating memory into the insn?

Failed to match this instruction:
(set (reg:V16SF 91)
(fma:V16SF (reg/v:V16SF 85 [ x ])
(reg:V16SF 21 xmm1 [ y ])
(const_vector:V16SF [
(const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16
])))

I don't know if we want to fix it.
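
For reference, the case the new patterns do cover is a scalar that already
lives in memory and is broadcast to every lane of the FMA. A plain C++ model
of that semantics might look like the sketch below. It is only a scalar
illustration: the name fmadd_bcst and the use of std::array are assumptions
made for the example and are not part of the patch or of the intrinsics API.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar model of a 16-lane single-precision FMA whose third operand is
// one float loaded from memory and broadcast to every lane.
// Hypothetical helper for illustration only; not GCC or intrinsics API.
std::array<float, 16> fmadd_bcst (const std::array<float, 16> &x,
                                  const std::array<float, 16> &y,
                                  const float *scalar_in_memory)
{
  assert (scalar_in_memory != nullptr);
  const float c = *scalar_in_memory;        // the single scalar load
  std::array<float, 16> result{};
  for (std::size_t i = 0; i < result.size (); ++i)
    result[i] = std::fma (x[i], y[i], c);   // x[i] * y[i] + c in each lane
  return result;
}
```

In the failing test the third operand is the constant 2.f rather than a load,
which is why combine presents it as a const_vector instead.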

Here is the updated patch with "memory_operand".  OK for trunk?

Thanks.

-- 
H.J.
From 0606516cfc962b4c41a0f1ca18ba72cae67c22db Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 2 Oct 2018 12:34:40 -0700
Subject: [PATCH] i386: Enable AVX512 memory broadcast for FMA

Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FMA operations.

gcc/

	PR target/72782
	* config/i386/sse.md (VF_AVX512): New.
	(avx512bcst): Likewise.
	(*fma_fmadd__bcst_1):
	Likewise.
	(*fma_fmadd__bcst_2):
	Likewise.
	(*fma_fmadd__bcst_3):
	Likewise.

gcc/testsuite/

	PR target/72782
	* gcc.target/i386/avx512-fma-1.h: New file.
	* gcc.target/i386/avx512-fma-2.h: Likewise.
	* gcc.target/i386/avx512-fma-3.h: Likewise.
	* gcc.target/i386/avx512-fma-4.h: Likewise.
	* gcc.target/i386/avx512-fma-5.h: Likewise.
	* gcc.target/i386/avx512-fma-6.h: Likewise.
	* gcc.target/i386/avx512-fma-7.h: Likewise.
	* gcc.target/i386/avx512-fma-8.h: Likewise.
	* gcc.target/i386/avx512f-fmadd-df-zmm-1.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-1.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-2.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-3.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-4.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-5.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-6.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Likewise.
	* gcc.target/i386/avx512f-fmadd-sf-zmm-8.c: Likewise.
	* gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c: Likewise.
	* gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c: Likewise.
---
 gcc/config/i386/sse.md| 50 +++
 gcc/testsuite/gcc.target/i386/avx512-fma-1.h  | 12 +
 gcc/testsuite/gcc.target/i386/avx512-fma-2.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-3.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-4.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-5.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-6.h  | 13 +
 gcc/testsuite/gcc.target/i386/avx512-fma-7.h  | 16 ++
 gcc/testsuite/gcc.target/i386/avx512-fma-8.h  | 13 +
 .../gcc.target/i386/avx512f-fmadd-df-zmm-1.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-1.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-2.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-3.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-4.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-5.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-6.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c  | 12 +
 .../gcc.target/i386/avx512f-fmadd-sf-zmm-8.c  | 12 +
 .../gcc.target/i386/avx512vl-fmadd-sf-xmm-1.c | 12 +
 .../gcc.target/i386/avx512vl-fmadd-sf-ymm-1.c | 12 +
 20 files changed, 288 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-1.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-fma-2.h
 create mode 100644 gc

Re: [PATCH] i386: Enable AVX512 memory broadcast for FMA

2018-10-18 Thread Uros Bizjak
On Thu, Oct 18, 2018 at 1:54 PM H.J. Lu  wrote:
>
> On 10/18/18, Uros Bizjak  wrote:
> > On Thu, Oct 18, 2018 at 1:19 PM H.J. Lu  wrote:
> >
> >> >> +(define_insn
> >> >> "*fma_fmadd__bcst_1"
> >> >> +  [(set (match_operand:VF_AVX512 0 "register_operand" "=v,v")
> >> >> +   (fma:VF_AVX512
> >> >> + (match_operand:VF_AVX512 1 "nonimmediate_operand" "0,v")
> >> >> + (match_operand:VF_AVX512 2 "nonimmediate_operand" "v,0")
> >> >> + (vec_duplicate:VF_AVX512
> >> >> +   (match_operand: 3 "nonimmediate_operand"
> >> >> "m,m"]
> >> >
> >> > Please note that having "nonimmediate_operand" predicate with "m"
> >> > constraint will force scalar value that lives in any register to
> >> > memory. So, scalar value will be pushed from either integer or SSE
> >> > register to memory, and will be broadcast to SSE register from here. I
> >> > guess this is not the optimal way, and we still want (eventual movq
> >> > from integer reg) + broadcast insn in this case.
> >> >
> >> > If this predicate is changed to "memory_operand", then only scalars
> >> > that live in memory will be considered.
> >>
> >> Using "memory_operand" causes:
> >>
> >> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-times
> >> vfmadd...ps[ \\t]+[^\n\r]+\\{1to[1-8]+\\}, %zmm[0-9]+, %zmm0 1
> >> FAIL: gcc.target/i386/avx512f-fmadd-sf-zmm-7.c scan-assembler-not
> >> vbroadcastss[^\n]*%zmm[0-9]+
> >>
> >> __m512
> >> foo (__m512 x, __m512 y)
> >> {
> >>   return _mm512_fmadd_ps (x, y, _mm512_set1_ps (2.f));
> >> }
> >>
> >> Combiner:
> >>
> >> Failed to match this instruction:
> >> (set (reg:V16SF 91)
> >> (fma:V16SF (reg/v:V16SF 85 [ x ])
> >> (reg:V16SF 21 xmm1 [ y ])
> >> (vec_duplicate:V16SF (reg:SF 88
> >
> > This is expected, there is no memory operand there. Can you check what
> > prevents combiner from propagating memory into the insn?
>
> Failed to match this instruction:
> (set (reg:V16SF 91)
> (fma:V16SF (reg/v:V16SF 85 [ x ])
> (reg:V16SF 21 xmm1 [ y ])
> (const_vector:V16SF [
> (const_double:SF 2.0e+0 [0x0.8p+2]) repeated x16
> ])))
>
> I don't know if we want to fix it.
>
> Here is the updated patch with "memory_operand".  OK for trunk?

LGTM.

Thanks,
Uros.


Re: PING^1 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-10-18 Thread Jan Hubicka
> we need to generate
> 
>   vxorp[ds]   %xmmN, %xmmN, %xmmN
>   ...
>   vcvtss2sd   f(%rip), %xmmN, %xmmX
>   ...
>   vcvtsi2ss   i(%rip), %xmmN, %xmmY
>
> to avoid partial XMM register stall.  This patch adds a pass to generate
> a single
> 
>   vxorps  %xmmN, %xmmN, %xmmN
> 
> at function entry, which is shared by all SF and DF conversions, instead
> of generating one
> 
>   vxorp[ds]   %xmmN, %xmmN, %xmmN
> 
> for each SF/DF conversion.
> 
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
> 
> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
> 
> are
> 
> 1. On Broadwell server:
> 
> 500.perlbench_r (-0.82%)
> 502.gcc_r (0.73%)
> 505.mcf_r (-0.24%)
> 520.omnetpp_r (-2.22%)
> 523.xalancbmk_r (-1.47%)
> 525.x264_r (0.31%)
> 531.deepsjeng_r (0.27%)
> 541.leela_r (0.85%)
> 548.exchange2_r (-0.11%)
> 557.xz_r (-0.34%)
> Geomean: (-0.23%)
> 
> 503.bwaves_r (0.00%)
> 507.cactuBSSN_r (-1.88%)
> 508.namd_r (0.00%)
> 510.parest_r (-0.56%)
> 511.povray_r (0.49%)
> 519.lbm_r (-1.28%)
> 521.wrf_r (-0.28%)
> 526.blender_r (0.55%)
> 527.cam4_r (-0.20%)
> 538.imagick_r (2.52%)
> 544.nab_r (-0.18%)
> 549.fotonik3d_r (-0.51%)
> 554.roms_r (-0.22%)
> Geomean: (0.00%)

I wonder why the patch seems to have more effect on specint, which should not
care much about float<->double conversions?

> number of vxorp[ds]:
> 
> before  after   difference
> 14570   4515    -69%
> 
> OK for trunk?

This looks very nice though.
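
As a reading aid for the RTL that the pass builds, here is a small C++ model
of (vec_merge (vec_duplicate X) zero (const_int 1)). It is a sketch under
simplifying assumptions: four float lanes only, and the helper names merely
mirror the RTL codes; none of this is GCC code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4SF = std::array<float, 4>;

// Model of (vec_duplicate:V4SF x): every lane holds x.
V4SF vec_duplicate (float x) { return {x, x, x, x}; }

// Model of (vec_merge:V4SF a b mask): lane i comes from a when bit i of
// mask is set, otherwise from b.
V4SF vec_merge (const V4SF &a, const V4SF &b, std::uint64_t mask)
{
  assert ((mask >> 4) == 0);   // only four lanes exist in this model
  V4SF result{};
  for (std::size_t i = 0; i < result.size (); ++i)
    result[i] = ((mask >> i) & 1) ? a[i] : b[i];
  return result;
}

// Model of the rewritten conversion: the converted scalar lands in lane 0
// and the remaining lanes come from the shared zero vector.
V4SF convert_into_zeroed_vector (double value, const V4SF &zero)
{
  const float converted = static_cast<float> (value);   // DF -> SF
  return vec_merge (vec_duplicate (converted), zero, 1);
}
```

The scalar result is then read back from lane 0, which corresponds to the
(subreg ... 0) that the pass puts into the original insn.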

+/* At function entry, generate a single
+   vxorps %xmmN, %xmmN, %xmmN
+   for all
+   vcvtss2sd  op, %xmmN, %xmmX
+   vcvtsd2ss  op, %xmmN, %xmmX
+   vcvtsi2ss  op, %xmmN, %xmmX
+   vcvtsi2sd  op, %xmmN, %xmmX
+ */
+
+static unsigned int
+remove_partial_avx_dependency (void)
+{
+  timevar_push (TV_MACH_DEP);
+
+  calculate_dominance_info (CDI_DOMINATORS);
+  df_set_flags (DF_DEFER_INSN_RESCAN);
+  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+  df_md_add_problem ();
+  df_analyze ();
+
+  basic_block bb;
+  rtx_insn *insn, *set_insn;
+  rtx set;
+  rtx v4sf_const0 = NULL_RTX;
+
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  FOR_BB_INSNS (bb, insn)
+   {
+ if (!NONDEBUG_INSN_P (insn))
+   continue;
+
+ set = single_set (insn);
+ if (set)
+   {
+ machine_mode dest_vecmode, dest_mode;
+ rtx src = SET_SRC (set);
+ rtx dest, vec, zero;
+
+ /* Check for conversions to SF or DF.  */
+ switch (GET_CODE (src))
+   {
+   case FLOAT_TRUNCATE:
+ /* DF -> SF.  */
+ if (GET_MODE (XEXP (src, 0)) != DFmode)
+   continue;
+ /* Fall through.  */
+   case FLOAT_EXTEND:
+ /* SF -> DF.  */
+   case FLOAT:
+ /* SI -> SF, SI -> DF, DI -> SF, DI -> DF.  */
+ dest = SET_DEST (set);
+ dest_mode = GET_MODE (dest);
+ switch (dest_mode)
+   {
+   case E_SFmode:
+ dest_vecmode = V4SFmode;
+ break;
+   case E_DFmode:
+ dest_vecmode = V2DFmode;
+ break;
+   default:
+ continue;
+   }
+
+ if (!TARGET_64BIT
+ && GET_MODE (XEXP (src, 0)) == DImode)
+   continue;
+
+ if (!v4sf_const0)
+   v4sf_const0 = gen_reg_rtx (V4SFmode);
+
+ if (dest_vecmode == V4SFmode)
+   zero = v4sf_const0;
+ else
+   zero = gen_rtx_SUBREG (V2DFmode, v4sf_const0, 0);
+
+ /* Change source to vector mode.  */
+ src = gen_rtx_VEC_DUPLICATE (dest_vecmode, src);
+ src = gen_rtx_VEC_MERGE (dest_vecmode, src, zero,
+  GEN_INT (HOST_WIDE_INT_1U));
+ /* Change destination to vector mode.  */
+ vec = gen_reg_rtx (dest_vecmode);
+ /* Generate a XMM vector SET.  */
+ set = gen_rtx_SET (vec, src);
+ set_insn = emit_insn_before (set, insn);
+ df_insn_rescan (set_insn);
+
+ src = gen_rtx_SUBREG (dest_mode, vec, 0);
+ set = gen_rtx_SET (dest, src);
+
+ /* Drop possible dead definitions.  */
+ PATTERN (insn) = set;
+
+ INSN_CODE (insn) = -1;
+ recog_memoized (insn);
+ df_insn_rescan (insn);
+ break;
+
+   default:
+ break;
+   }
+   }
+   }
+}
+
+  if (v4sf_const0)
+{
+  /* Generate a single vxorps at function entry and perform df
+rescan. */
+  bb = ENT

Re: [PATCH] Fix some EVRP stupidness

2018-10-18 Thread Richard Biener
On Thu, 18 Oct 2018, Richard Biener wrote:

> 
> At some point we decided to not simply intersect all ranges we get
> via register_edge_assert_for.  Instead we simply register them
> in-order.  That causes things like replacing [64, +INF] with ~[0, 0].
> 
> The following patch avoids replacing a range with a larger one
> as obvious improvement.
> 
> Compared to assert_expr based VRP we lack the ability to put down
> actual assert_exprs and thus multiple SSA names with ranges we
> could link via equivalences.  In the end we need sth similar,
> for example by keeping a stack of active ranges for each SSA name.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Actually not.  Needed to update to the new value_range class and after
that (and its introduction of ->check()) we now ICE during bootstrap
with

during GIMPLE pass: evrp
/space/rguenther/src/svn/trunk/libgcc/config/libbid/bid128_div.c: In function ‘get_BID128’:
/space/rguenther/src/svn/trunk/libgcc/config/libbid/bid128_div.c:1851:1: internal compiler error: in check, at tree-vrp.c:155
 1851 | }
      | ^
0xf3a8b5 value_range::check()
        /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:155
0xf42424 value_range::value_range(value_range_kind, tree_node*, tree_node*, bitmap_head*)
        /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:110
0xf42424 set_value_range_with_overflow
        /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:1422
0xf42424 extract_range_from_binary_expr_1(value_range*, tree_code, tree_node*, value_range const*, value_range const*)
        /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:1679

for a PLUS_EXPR of [12254, expon_43] and [-1, -1] yielding
(temporarily!) [12254, -1] before it is supposed to be adjusted by the
symbolic bound:

  /* Adjust the range for possible overflow.  */
  set_value_range_with_overflow (*vr, expr_type,
 wmin, wmax, min_ovf, max_ovf);
^^^ ICE
  if (vr->varying_p ())
return;

  /* Build the symbolic bounds if needed.  */
  min = vr->min ();
  max = vr->max ();
  adjust_symbolic_bound (min, code, expr_type,
 sym_min_op0, sym_min_op1,
 neg_min_op0, neg_min_op1);
  adjust_symbolic_bound (max, code, expr_type,
 sym_max_op0, sym_max_op1,
 neg_max_op0, neg_max_op1);
  type = vr->kind ();

I think the refactoring that was applied here is simply not suitable
because *vr is _not_ necessarily a valid range before the symbolic
bounds have been adjusted.  A fix would be something like the following,
which I am going to test now.

Richard.
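
The ordering problem can be sketched in isolation. The model below is a
simplification and not the tree-vrp.c code: bounds are a constant plus an
optional symbol name, the constant part of a purely symbolic bound is assumed
to be zero, and checked_range stands in for a range class that validates
itself on construction. All names are hypothetical and the numbers are only
illustrative.

```cpp
#include <cassert>
#include <string>

// A bound is a constant plus an optional symbol (empty string: none).
struct bound { long constant; std::string symbol; };

// -1, 0, 1 for ordered constants; -2 when a symbol makes the order unknown.
int compare_bounds (const bound &a, const bound &b)
{
  if (!a.symbol.empty () || !b.symbol.empty ())
    return -2;
  return a.constant < b.constant ? -1 : (a.constant > b.constant ? 1 : 0);
}

// Stand-in for a range class that checks itself when constructed.
struct checked_range
{
  bound min, max;
  checked_range (bound lo, bound hi) : min (lo), max (hi)
  { assert (compare_bounds (min, max) != 1); }
};

// Add constant C to [lo, hi]: work on plain values first, re-attach the
// symbols, and only then build the checked object.
checked_range plus_constant (const bound &lo, const bound &hi, long c)
{
  const long wmin = lo.constant + c;   // may exceed wmax at this point
  const long wmax = hi.constant + c;
  bound new_min{wmin, lo.symbol};      // symbolic adjustment happens here
  bound new_max{wmax, hi.symbol};
  return checked_range (new_min, new_max);
}
```

Building a checked_range from wmin and wmax before the symbols are
re-attached would trip the assertion, which is the situation the ICE reports.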

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 40d40e5e2fe..c5748a43246 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1328,7 +1328,7 @@ combine_bound (enum tree_code code, wide_int &wi, 
wi::overflow_type &ovf,
underflow.  +1 indicates overflow.  0 indicates neither.  */
 
 static void
-set_value_range_with_overflow (value_range &vr,
+set_value_range_with_overflow (value_range_kind &kind, tree &min, tree 
&max,
   tree type,
   const wide_int &wmin, const wide_int &wmax,
   wi::overflow_type min_ovf,
@@ -1341,7 +1341,7 @@ set_value_range_with_overflow (value_range &vr,
  range covers all values.  */
   if (prec == 1 && wi::lt_p (wmax, wmin, sgn))
 {
-  set_value_range_to_varying (&vr);
+  kind = VR_VARYING;
   return;
 }
 
@@ -1357,13 +1357,15 @@ set_value_range_with_overflow (value_range &vr,
 the entire range.  We have a similar check at the end of
 extract_range_from_binary_expr_1.  */
  if (wi::gt_p (tmin, tmax, sgn))
-   vr.set_varying ();
+   kind = VR_VARYING;
  else
-   /* No overflow or both overflow or underflow.  The
-  range kind stays VR_RANGE.  */
-   vr = value_range (VR_RANGE,
- wide_int_to_tree (type, tmin),
- wide_int_to_tree (type, tmax));
+   {
+ kind = VR_RANGE;
+ /* No overflow or both overflow or underflow.  The
+range kind stays VR_RANGE.  */
+ min = wide_int_to_tree (type, tmin);
+ max = wide_int_to_tree (type, tmax);
+   }
  return;
}
   else if ((min_ovf == wi::OVF_UNDERFLOW && max_ovf == wi::OVF_NONE)
@@ -1384,18 +1386,18 @@ set_value_range_with_overflow (value_range &vr,
 types values.  */
  if (covers || wi::cmp (tmin, tmax, sgn) > 0)
{
- set_value_range_to_varying (&vr);
+ kind = VR_VARYING;
  return;
}
- vr = value_range (VR_ANTI_RANGE,
-   wide_int_to_tree (type, tmin),
-   wide_int_to_tree (type, t

Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread H.J. Lu
On 10/18/18, Richard Sandiford  wrote:
> "H.J. Lu"  writes:
>> On 10/18/18, Richard Sandiford  wrote:
>>> "H.J. Lu"  writes:
>>>> On 10/18/18, Richard Sandiford  wrote:
>>>>> "H.J. Lu"  writes:
>>>>>> On 10/17/18, Marc Glisse  wrote:
>>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>>>>
>>>>>>>> We may simplify
>>>>>>>>
>>>>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>>>>
>>>>>>>> to X when mode of X is the same as of mode of subreg.
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> we already have code to simplify vec_select(vec_merge):
>>>>>>>
>>>>>>>  /* If we select elements in a vec_merge that all come from the
>>>>>>> same
>>>>>>> operand, select from that operand directly.  */
>>>>>>>
>>>>>>> It would make sense to me to make the subreg transform as similar to
>>>>>>> it
>>>>>>> as
>>>>>>> possible, in particular you don't need to special case
>>>>>>> vec_duplicate,
>>>>>>> the
>>>>>>> transformation would see that everything comes from the first
>>>>>>> vector,
>>>>>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>>>>>> optimize that.
>>>>>
>>>>> Sorry, didn't see this before the OK.
>>>>>
>>>>>> What do you mean by another transformation? If simplify_subreg
>>>>>> doesn't
>>>>>> return X for
>>>>>>
>>>>>>   (subreg (vec_merge (vec_duplicate X)
>>>>>>   (vector)
>>>>>>   (const_int ((1 << N) | M)))
>>>>>>    (N * sizeof (X)))
>>>>>>
>>>>>>
>>>>>> no further transformation will be done.
>>>>>
>>>>> I think the point was that we should transform:
>>>>>
>>>>>   (subreg (vec_merge X
>>>>>(vector)
>>>>>(const_int ((1 << N) | M)))
>>>>> (N * sizeof (X)))
>>>>>
>>>>> into:
>>>>>
>>>>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>>>>
>>>>> which should further simplify when X is a vec_duplicate.
>>>>
>>>> But sizeof (X) is the size of scalar of vec_dup.  How do we
>>>> check the mask of vec_merge?
>>>
>>> Yeah, should be sizeof (outermode) (which was the same thing
>>> in the original pattern, but not here).
>>>
>>> Richard
>>>
>>
>> Like this
>>
>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>> index b0cf3bbb2a9..e12b5c0e165 100644
>> --- a/gcc/simplify-rtx.c
>> +++ b/gcc/simplify-rtx.c
>> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx op,
>>return NULL_RTX;
>>  }
>>
>> -  /* Return X for
>> -  (subreg (vec_merge (vec_duplicate X)
>> +  /* Simplify
>> +  (subreg (vec_merge (X)
>> (vector)
>> (const_int ((1 << N) | M)))
>> - (N * sizeof (X)))
>> + (N * sizeof (outermode)))
>> + to
>> +  (subreg ((X) (N * sizeof (outermode)))
>
> Stray "(": (subreg (X) (N * sizeof (outermode)))
>
> OK with that change if it passes testing.

The self-test failed for 32-bit compiler:

expected: (reg:QI 342)
  actual: (subreg:QI (vec_merge:V128QI (vec_duplicate:V128QI (reg:QI 342))
(reg:V128QI 343)
(const_int 65 [0x41])) 64)

since

&& (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0)

works only up to vectors with 64 elements for 32-bit compilers.
Should we limit the self-test to vectors with 64 elements?
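
To make the limit concrete, the mask test can be modeled as below. This is a
sketch that assumes the vec_merge mask is held in a 64-bit integer; the enum
and function names are made up for the example and do not exist in
simplify-rtx.c.

```cpp
#include <cassert>
#include <cstdint>

enum class lane_source { first_operand, second_operand, unknown };

// Decide which vec_merge operand supplies the lane addressed by a subreg
// byte offset.  OUTER_SIZE is the size in bytes of the subreg's mode, so
// the lane index is BYTE_OFFSET / OUTER_SIZE (the N in the comment above).
lane_source lane_source_for (std::uint64_t mask, unsigned byte_offset,
                             unsigned outer_size)
{
  assert (outer_size != 0 && byte_offset % outer_size == 0);
  const unsigned idx = byte_offset / outer_size;
  if (idx >= 64)
    return lane_source::unknown;   // bit idx does not fit in the mask
  return ((mask >> idx) & 1) ? lane_source::first_operand
                             : lane_source::second_operand;
}
```

With the values from the failing self-test, mask 0x41 and byte offset 64 on
QI elements, the lane index is 64 and the model answers unknown, so no
simplification can be justified from the mask.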

-- 
H.J.


Re: [patch] allow target config to state r18 is fixed on aarch64

2018-10-18 Thread Sam Tebbs




On 10/12/2018 07:43 PM, Olivier Hainque wrote:
>> On 12 Oct 2018, at 05:50, Kyrill Tkachov  wrote:
>>
>> CC'ing the aarch64 maintainers as they'll have to approve it.
>> I'm guessing you've tested this in the usual way (bootstrap and test)?
> Sorry, I failed to mention the testing indeed. We don't
> have a native box at hand, so I'm not really able to
> conduct regular bootstrap and dg testing per se.


Hi Olivier,

I managed to run a bootstrap build with your patch applied on one of our 
aarch64 machines, so there shouldn't be any problems in that regard.


Regards,
Sam


Re: [PATCH][i386] Fix vec_construct cost, remove unused ix86_vec_cost arg

2018-10-18 Thread Richard Biener
On Thu, 18 Oct 2018, Jan Hubicka wrote:

> > 
> > So like the following which removes the use of ix86_vec_cost
> > for SSE loads and stores since we have per-mode costs already.
> > I've applied the relevant factor to the individual cost tables
> > (noting that for X86_TUNE_SSE_SPLIT_REGS we only apply the
> > multiplication for size == 128, not size >= 128 ...)
> > 
> > There's a ??? hunk in inline_memory_move_cost where we
> > failed to apply the scaling thus in that place we'd now have
> > a behavior change.  Alternatively I could leave the cost
> > tables unaltered if that costing part is more critical than
> > the vectorizer one.
> 
> Changing the behaviour (applying the scale there) seems like
> right way to go to me...
> > 
> > I've also spotted, when reviewing ix86_vec_cost uses, a bug
> > in ix86_rtx_cost which keys on SFmode which doesn't work
> > for SSE modes, thus use GET_MODE_INNER.
> > 
> > Also I've changed X86_TUNE_AVX128_OPTIMAL to also apply
> > to BTVER1 - everywhere else we glob BTVER1 and BTVER2 so
> > this must surely be a omission.
> 
> BTVER1 did not have AVX :)

Yeah - noticed that as well ;)

> > 
> > Honza - is a patch like this OK?
> 
> Looks OK to me.  Splitting up individual changes is up to you.
> I think it is not that dramatic change so hopefully regressions
> won't be that hard to analyze.

I'll commit the ix86_rtx_costs change separately from the load/store
cost changes just to make bisection easier.
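
The ix86_rtx_costs part is easiest to see with a toy model of modes. The
sketch below is not GCC code: the enum, inner_mode (a stand-in for
GET_MODE_INNER) and the cost struct are hypothetical, and only the two
comparisons correspond to the old and new hunks.

```cpp
#include <cassert>

// Tiny model of machine modes; the enumerators mirror GCC mode names but
// the enum itself is only an illustration.
enum class mode { SF, DF, V4SF, V2DF, V16SF, V8DF };

// Hypothetical stand-in for GET_MODE_INNER: the element mode of a vector
// mode, or the mode itself for a scalar.
mode inner_mode (mode m)
{
  switch (m)
    {
    case mode::V4SF:
    case mode::V16SF:
      return mode::SF;
    case mode::V2DF:
    case mode::V8DF:
      return mode::DF;
    default:
      return m;
    }
}

struct fma_costs { int fmass; int fmasd; };

// Old test: a vector of floats never equals SF, so it gets the DF cost.
int fma_cost_old (mode m, const fma_costs &c)
{ return m == mode::SF ? c.fmass : c.fmasd; }

// New test: look at the element mode instead.
int fma_cost_new (mode m, const fma_costs &c)
{ return inner_mode (m) == mode::SF ? c.fmass : c.fmasd; }
```

For scalar modes both versions agree; they differ only for vector modes with
SF elements.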

> > 
> > Should I split out individual fixes to make bisection possible?
> > 
> > Should I update the cost tables or instead change the vectorizer
> > costing when considering the inline_memory_move_cost "issue"?
> 
> Looks like memory move cost should do the right thing now after your patch?
> Having larger loads/stores more expensive seems correct to me.

Yes.

> Patch is OK, without the ??? comment ;)

Thanks,
Richard.

> Honza
> > 
> > Thanks,
> > Richard.
> > 
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 0cf4152acb2..f5392232f61 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -39432,6 +39432,7 @@ inline_memory_move_cost (machine_mode mode, enum 
> > reg_class regclass,
> >int index = sse_store_index (mode);
> >if (index == -1)
> > return 100;
> > +  /* ??? */
> >if (in == 2)
> >  return MAX (ix86_cost->sse_load [index], ix86_cost->sse_store 
> > [index]);
> >return in ? ix86_cost->sse_load [index] : ix86_cost->sse_store 
> > [index];
> > @@ -40183,7 +40181,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
> > outer_code_i, int opno,
> >  gcc_assert (TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F);
> >  
> >  *total = ix86_vec_cost (mode,
> > -   mode == SFmode ? cost->fmass : cost->fmasd,
> > +   GET_MODE_INNER (mode) == SFmode
> > +   ? cost->fmass : cost->fmasd,
> > true);
> > *total += rtx_cost (XEXP (x, 1), mode, FMA, 1, speed);
> >  
> > @@ -45122,18 +45121,14 @@ ix86_builtin_vectorization_cost (enum 
> > vect_cost_for_stmt type_of_cost,
> > /* See PR82713 - we may end up being called on non-vector type.  */
> > if (index < 0)
> >   index = 2;
> > -return ix86_vec_cost (mode,
> > - COSTS_N_INSNS (ix86_cost->sse_load[index]) / 2,
> > - true);
> > +return COSTS_N_INSNS (ix86_cost->sse_load[index]) / 2;
> >  
> >case vector_store:
> > index = sse_store_index (mode);
> > /* See PR82713 - we may end up being called on non-vector type.  */
> > if (index < 0)
> >   index = 2;
> > -return ix86_vec_cost (mode,
> > - COSTS_N_INSNS (ix86_cost->sse_store[index]) / 2,
> > - true);
> > +return COSTS_N_INSNS (ix86_cost->sse_store[index]) / 2;
> >  
> >case vec_to_scalar:
> >case scalar_to_vec:
> > @@ -45146,20 +45141,14 @@ ix86_builtin_vectorization_cost (enum 
> > vect_cost_for_stmt type_of_cost,
> > /* See PR82713 - we may end up being called on non-vector type.  */
> > if (index < 0)
> >   index = 2;
> > -return ix86_vec_cost (mode,
> > - COSTS_N_INSNS
> > -(ix86_cost->sse_unaligned_load[index]) / 2,
> > - true);
> > +return COSTS_N_INSNS (ix86_cost->sse_unaligned_load[index]) / 2;
> >  
> >case unaligned_store:
> > index = sse_store_index (mode);
> > /* See PR82713 - we may end up being called on non-vector type.  */
> > if (index < 0)
> >   index = 2;
> > -return ix86_vec_cost (mode,
> > - COSTS_N_INSNS
> > -(ix86_cost->sse_unaligned_store[index]) / 2,
> > - true);
> > +return COSTS_N_INSNS (ix86_cost->sse_unaligned_store[index]) / 2;
> >  
> >case 

Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

2018-10-18 Thread Richard Sandiford
Joseph Myers  writes:
> On Wed, 17 Oct 2018, Richard Sandiford wrote:
>> Yeah, can't deny that if you look at it as a general-purpose extension.
>> But that's not really what this is supposed to be.  It's fairly special
>> purpose: there has to be some underlying variable-length/sizeless
>> built-in type that you want to provide via a library.
>> 
>> What the extension allows is enough to support the intended use case,
>> and it does that with no enforced overhead.
>
> Part of my point is that there are various *other* possible cases of 
> non-VLA-variable-size-type people have suggested in WG14 reflector 
> discussions - so any set of concepts for such types ought to take into 
> account more than just the SVE use case (even if other use cases need 
> further concepts added on top of the ones needed for SVE).

[Answered this in the other thread -- sorry, took me a while to go
through the full discussion.]

>> > Surely, the processor knows the size when it computes using these
>> > types, so one could make it available using 'sizeof'.
>> 
>> The argument's similar here: we don't really need sizeof to be available
>> for vector use because the library provides easy ways of getting
>> vector-length-based constants.  Usually what you want to know is
>> "how many elements of type X are there?", with bytes just being one
>> of the available element sizes.
>
> But if having sizeof available makes for a more natural language feature 
> (one where a few places referencing VLAs need to change to reference a 
> more general class of variable-size types, and a few constraints on VLAs 
> and variably modified types need to be relaxed to allow what you want with 
> these types), that may be a case for doing so, even if sizeof won't 
> generally be used.

I agree that might be all that's needed in C.  But since C++ doesn't
even have VLAs yet (and since something less ambituous than VLAs was
rejected) the situation is very different there.

I think we'd need a compelling reason to make sizeof variable in C++.
The fact that it isn't going to be generally used for SVE anyway
would undercut that.

> If the processor in fact knows the size, do you actually need to include 
> it in the object to be able to provide it when sizeof is called?  (With 
> undefined behavior still present if passing the object from a thread with 
> one value of sizeof for that type to a thread with a different value of 
> sizeof for that type, of course - the rule on VLA type compatibility would 
> still need to be extended to apply to sizes of these types, and those they 
> contain, recursively.)

No, if we go the undefined behaviour route, we wouldn't need to store it.
This was just to answer Martin's suggestion that we could make sizeof(x)
do the right thing for a sizeless object x by storing the size with x.

Thanks,
Richard


[PATCH] Add splay-tree "view" for bitmap

2018-10-18 Thread Richard Biener

PR63155 made me pick up this old work from Steven; it turns our
linked-list implementation into a two-mode one, with one mode being a
splay tree featuring amortized O(log N) complexity for find/remove.

On top of Steven's original patch I added a bitmap_tree_to_vec helper
that I use from the debug/print methods to avoid changing view
there.  In theory the bitmap iterator could get a "stack"
as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
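
To illustrate the idea of a tree view, here is a minimal sparse bitmap keyed
by word index. It is a sketch only: std::map (an ordered tree) stands in for
the splay tree, the class and method names are invented for the example, and
to_vec is only loosely analogous to the helper mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Sparse bitmap storing 64-bit words keyed by word index.  Lookup of a
// word is logarithmic in the number of stored words instead of a walk
// along a linked list.
class tree_bitmap
{
  std::map<unsigned, std::uint64_t> words_;

public:
  // Set BIT; return true if it was previously clear.
  bool set_bit (unsigned bit)
  {
    std::uint64_t &word = words_[bit / 64];
    const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
    const bool was_clear = (word & mask) == 0;
    word |= mask;
    return was_clear;
  }

  bool test_bit (unsigned bit) const
  {
    const auto it = words_.find (bit / 64);
    return it != words_.end () && ((it->second >> (bit % 64)) & 1) != 0;
  }

  // Flatten to a sorted vector of set bit numbers, e.g. for printing.
  std::vector<unsigned> to_vec () const
  {
    std::vector<unsigned> bits;
    for (const auto &entry : words_)
      for (unsigned b = 0; b < 64; ++b)
        if ((entry.second >> b) & 1)
          bits.push_back (entry.first * 64 + b);
    return bits;
  }
};
```

Flattening to a vector gives the print and debug paths a stable sequence to
walk without switching the bitmap back to its list form.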

This can be used to fix the two biggest bottlenecks in the PR's
testcase, namely SSA propagator worklist handling and out-of-SSA
coalesce list building.  perf shows the following data, first
unpatched, second patched; also watch the third column (samples)
when comparing percentages.

-O0
-   18.19%  17.35%   407  cc1  cc1   [.] bitmap_set_b
   - bitmap_set_bit
      + 8.77% create_coalesce_list_for_region
      + 4.21% calculate_live_ranges
      + 2.02% build_ssa_conflict_graph
      + 1.66% insert_phi_nodes_for
      + 0.86% coalesce_ssa_name
patched:
-   12.39%  10.48%   129  cc1  cc1   [.] bitmap_set_b
   - bitmap_set_bit
      + 5.27% calculate_live_ranges
      + 2.76% insert_phi_nodes_for
      + 1.90% create_coalesce_list_for_region
      + 1.63% build_ssa_conflict_graph
      + 0.35% coalesce_ssa_name

-O1
-   17.53%  17.53%   842  cc1  cc1   [.] bitmap_set_b
   - bitmap_set_bit
      + 12.39% add_ssa_edge
      + 1.48% create_coalesce_list_for_region
      + 0.82% solve_constraints
      + 0.71% calculate_live_ranges
      + 0.64% add_implicit_graph_edge
      + 0.41% insert_phi_nodes_for
      + 0.34% build_ssa_conflict_graph
patched:
-    5.79%   5.00%   167  cc1  cc1   [.] bitmap_set_b
   - bitmap_set_bit
      + 1.41% add_ssa_edge
      + 0.88% calculate_live_ranges
      + 0.75% add_implicit_graph_edge
      + 0.68% solve_constraints
      + 0.48% insert_phi_nodes_for
      + 0.45% build_ssa_conflict_graph

-O3
-   12.37%  12.34%  1145  cc1  cc1   [.] bitmap_set_b
   - bitmap_set_bit
      + 9.14% add_ssa_edge
      + 0.80% create_coalesce_list_for_region
      + 0.69% add_implicit_graph_edge
      + 0.54% solve_constraints
      + 0.34% calculate_live_ranges
      + 0.27% insert_phi_nodes_for
      + 0.21% build_ssa_conflict_graph
patched:
-    4.36%   3.86%   227  cc1  cc1   [.] bitmap_set_b
   - bitmap_set_bit
      + 0.98% add_ssa_edge
      + 0.86% add_implicit_graph_edge
      + 0.64% solve_constraints
      + 0.57% calculate_live_ranges
      + 0.32% build_ssa_conflict_graph
      + 0.29% mark_all_vars_used_1
      + 0.20% insert_phi_nodes_for
      + 0.16% create_coalesce_list_for_region


Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Any objections?

Thanks,
Richard.

2018-10-18  Steven Bosscher 
Richard Biener  

* bitmap.h: Update data structure documentation, including a
description of bitmap views as either linked-lists or splay trees.
(struct bitmap_element_def

Re: [patch] allow target config to state r18 is fixed on aarch64

2018-10-18 Thread Olivier Hainque


> On 18 Oct 2018, at 14:14, Sam Tebbs  wrote:
> 
> 
> 
> On 10/12/2018 07:43 PM, Olivier Hainque wrote:
>>> On 12 Oct 2018, at 05:50, Kyrill Tkachov  
>>> wrote:
>>> 
>>> CC'ing the aarch64 maintainers as they'll have to approve it.
>>> I'm guessing you've tested this in the usual way (bootstrap and test)?
>> Sorry, I failed to mention the testing indeed. We don't
>> have a native box at hand, so I'm not really able to
>> conduct regular bootstrap and dg testing per se.
> 
> Hi Olivier,
> 
> I managed to run a bootstrap build with your patch applied on one of our 
> aarch64 machines, so there shouldn't be any problems in that regard.

Great ! Thanks a lot Sam ! Much appreciated.

I'm currently building our vxworks port on a gcc-8 source
base and will then most probably retain the static approach
instead of going runtime.

The only difference there would be wrt to this part
is the use of the macro within called_used_regs[] as well,
part of what we discussed with Kyrill.

I'll reapply my testing, and with the one you just made on
top, I think the syntax / logic of the change is straightforward
enough to have confidence.

With Kind Regards,

Olivier





Re: [PATCH v2 1/3] or1k: libgcc: initial support for openrisc

2018-10-18 Thread Sebastian Huber

Hello,

is there a chance to get the or1k support integrated before the GCC 9 
stage 3?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: [patch] allow target config to state r18 is fixed on aarch64

2018-10-18 Thread Olivier Hainque


> On 18 Oct 2018, at 15:10, Olivier Hainque  wrote:
> 
> The only difference there would be wrt to this part
> is the use of the macro within called_used_regs[] as well,
> part of what we discussed with Kyrill.

Ah, no, call_used[r18] is 1 currently.

Will give this some thought ...






Re: [patch] allow target config to state r18 is fixed on aarch64

2018-10-18 Thread Olivier Hainque
Hi Kyrill,

> On 16 Oct 2018, at 18:33, Kyrill Tkachov  wrote:
> 
>> I'm happy to move that part to aarch64_conditional_register_usage
>> if that's considered more canonical of course.
> 
> I don't think it's more canonical, and it is a run-time thing, whereas your 
> patch changes things
> at configure time, so there's no runtime overhead.

Ok.

>> It seems like I might need to set call_used_registers to 1 as well.
>> 
>> STATIC_CHAIN_REGNUM still needs to be adjusted directly I think.
> 
> I think so too, so you'd still need to have these configure-time changes.
> If we could make it all runtime that would be clean, but perhaps it's not 
> worth
> splicing the two approaches.

Agreed. I also thought of triggering the effect of -ffixed-r18
from within one of the vxworks early hooks, but this wouldn't prevent
the use as a static chain AFAICS. Interesting ...

>> I wondered if we could set it to R11 unconditionally and picked
>> the way ensuring no change for !vxworks ports, especially since I
>> don't have means to test more than what I described above.
>> 
>> We're working on a transition to gcc-8 and I can certainly port this
>> and the rest rapidly to verify that we get similar results on the
>> more recent code base.
> 
> So, do you still want to make this change in current trunk? Or will you make 
> the necessary changes
> when contributing the vxworks port?

I'm working on the port and will re-propose
the change as part of that soon, hopefully in the
coming days.

It was very useful to have early feedback on the approach,
thanks !

Olivier



Re: [PATCH] Add splay-tree "view" for bitmap

2018-10-18 Thread Richard Sandiford
Richard Biener  writes:
> PR63155 made me pick up this old work from Steven, it turns our
> linked-list implementation to a two-mode one with one being a
> splay tree featuring O(log N) complexity for find/remove.
>
> Over Steven's original patch I added a bitmap_tree_to_vec helper
> that I use from the debug/print methods to avoid changing view
> there.  In theory the bitmap iterator could get a "stack"
> as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>
> This can be used to fix the two biggest bottlenecks in the PR's
> testcase, namely SSA propagator worklist handling and out-of-SSA
> coalesce list building.  perf shows the following data, first
> unpatched, second patched - also watch the third column (samples)
> when comparing percentages.
>
> -O0
> -   18.19%    17.35%   407  cc1  cc1   [.] bitmap_set_b
>    - bitmap_set_bit
>       + 8.77% create_coalesce_list_for_region
>       + 4.21% calculate_live_ranges
>       + 2.02% build_ssa_conflict_graph
>       + 1.66% insert_phi_nodes_for
>       + 0.86% coalesce_ssa_name
> patched:
> -   12.39%    10.48%   129  cc1  cc1   [.] bitmap_set_b
>    - bitmap_set_bit
>       + 5.27% calculate_live_ranges
>       + 2.76% insert_phi_nodes_for
>       + 1.90% create_coalesce_list_for_region
>       + 1.63% build_ssa_conflict_graph
>       + 0.35% coalesce_ssa_name
>
> -O1
> -   17.53%    17.53%   842  cc1  cc1   [.] bitmap_set_b
>    - bitmap_set_bit
>       + 12.39% add_ssa_edge
>       + 1.48% create_coalesce_list_for_region
>       + 0.82% solve_constraints
>       + 0.71% calculate_live_ranges
>       + 0.64% add_implicit_graph_edge
>       + 0.41% insert_phi_nodes_for
>       + 0.34% build_ssa_conflict_graph
> patched:
> -    5.79%     5.00%   167  cc1  cc1   [.] bitmap_set_b
>    - bitmap_set_bit
>       + 1.41% add_ssa_edge
>       + 0.88% calculate_live_ranges
>       + 0.75% add_implicit_graph_edge
>       + 0.68% solve_constraints
>       + 0.48% insert_phi_nodes_for
>       + 0.45% build_ssa_conflict_graph
>
> -O3
> -   12.37%    12.34%  1145  cc1  cc1   [.] bitmap_set_b
>    - bitmap_set_bit
>       + 9.14% add_ssa_edge
>       + 0.80% create_coalesce_list_for_region
>       + 0.69% add_implicit_graph_edge
>       + 0.54% solve_constraints
>       + 0.34% calculate_live_ranges
>       + 0.27% insert_phi_nodes_for
>       + 0.21% build_ssa_conflict_graph
> -    4.36%     3.86%   227  cc1  cc1   [.] bitmap_set_b
>    - bitmap_set_bit
>       + 0.98% add_ssa_edge
>       + 0.86% add_implicit_graph_edge
>       + 0.64% solve_constraints
>       + 0.57% calculate_live_ranges
>       + 0.32% build_ssa_conflict_graph
>       + 0.29% mark_all_vars_used_1
>       + 0.20% insert_phi_nodes_for
>       + 0.16% create_coalesce_list_for_region
>
> Bootstrapped on x86_64-unknown-lin

Re: [PATCH] Add splay-tree "view" for bitmap

2018-10-18 Thread Richard Biener
On Thu, 18 Oct 2018, Richard Sandiford wrote:

> Richard Biener  writes:
> > PR63155 made me pick up this old work from Steven, it turns our
> > linked-list implementation to a two-mode one with one being a
> > splay tree featuring O(log N) complexity for find/remove.
> >
> > Over Steven's original patch I added a bitmap_tree_to_vec helper
> > that I use from the debug/print methods to avoid changing view
> > there.  In theory the bitmap iterator could get a "stack"
> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> >
> > This can be used to fix the two biggest bottlenecks in the PR's
> > testcase, namely SSA propagator worklist handling and out-of-SSA
> > coalesce list building.  perf shows the following data, first
> > unpatched, second patched - also watch the third column (samples)
> > when comparing percentages.
> >
> > [...snip...]
> 

Re: [PATCH] Add splay-tree "view" for bitmap

2018-10-18 Thread David Malcolm
On Thu, 2018-10-18 at 15:09 +0200, Richard Biener wrote:
> PR63155 made me pick up this old work from Steven, it turns our
> linked-list implementation to a two-mode one with one being a
> splay tree featuring O(log N) complexity for find/remove.
> 
> Over Steven's original patch I added a bitmap_tree_to_vec helper
> that I use from the debug/print methods to avoid changing view
> there.  In theory the bitmap iterator could get a "stack"
> as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> 
> This can be used to fix the two biggest bottlenecks in the PR's
> testcase, namely SSA propagator worklist handling and out-of-SSA
> coalesce list building.  perf shows the following data, first
> unpatched, second patched - also watch the third column (samples)
> when comparing percentages.
> 
[...snip...]

> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Any objections?
> 
> Thanks,
> Richard.
> 
> 2018-10-18  Steven Bosscher 
>   Richard Biener  
> 
>   * bitmap.h: Update data structure documentation, including a
>   description of bitmap views as either linked-lists or splay
> trees.

[...snip...]

From a "correctness" perspective, we have some existing unit-test
coverage for bitmap via selftests in bitmap.c.  Perhaps those tests
could be generalized to verify that the two different implementations
work, and that the conversions work correctly?

e.g. currently we have:

static void
test_clear_bit_in_middle ()
{
  bitmap b = bitmap_gc_alloc ();

  /* Set b to [100..200].  */
  bitmap_set_range (b, 100, 100);
  ASSERT_EQ (100, bitmap_count_bits (b));

  /* Clear a bit in the middle.  */
  bool changed = bitmap_clear_bit (b, 150);
  ASSERT_TRUE (changed);
  ASSERT_EQ (99, bitmap_count_bits (b));
  ASSERT_TRUE (bitmap_bit_p (b, 149));
  ASSERT_FALSE (bitmap_bit_p (b, 150));
  ASSERT_TRUE (bitmap_bit_p (b, 151));
}

Maybe this could change to:

static void
test_clear_bit_in_middle ()
{
  bitmap b = bitmap_gc_alloc ();

  FOR_EACH_BITMAP_IMPL (b)
    {
      /* Set b to [100..200].  */
      bitmap_set_range (b, 100, 100);
      ASSERT_EQ (100, bitmap_count_bits (b));
    }

  bool first_time = true;
  /* Clear a bit in the middle.  */
  FOR_EACH_BITMAP_IMPL (b)
    {
      if (first_time)
        {
          bool changed = bitmap_clear_bit (b, 150);
          ASSERT_TRUE (changed);
          first_time = false;
        }
      ASSERT_EQ (99, bitmap_count_bits (b));
      ASSERT_TRUE (bitmap_bit_p (b, 149));
      ASSERT_FALSE (bitmap_bit_p (b, 150));
      ASSERT_TRUE (bitmap_bit_p (b, 151));
    }
}

...or somesuch, where maybe FOR_EACH_BITMAP_IMPL (b) could try linked-
list, then splay tree, then linked-list, converting "b" as it goes.  
This would hopefully give us a lot of test coverage for the various
operations in both modes, and for the conversion routines (in both
directions, assuming that both directions are supported).
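
To make the "cycle through the views" idea concrete outside of GCC, here is a
rough, self-contained sketch.  It does not use GCC's bitmap at all: toy_bitmap,
toy_set_view and friends are hypothetical stand-ins (a sorted std::vector for
the list view, a std::set for the tree view), and the loop over
list -> tree -> list plays the role that FOR_EACH_BITMAP_IMPL might play.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for a bitmap with two internal views.  NOT GCC's bitmap;
// every name here is hypothetical.
enum class view { list, tree };

struct toy_bitmap
{
  view v = view::list;
  std::vector<unsigned> list_bits;  // sorted; used in the list view
  std::set<unsigned> tree_bits;     // used in the tree view
};

// Convert between the two representations, preserving the set bits.
void toy_set_view (toy_bitmap &b, view to)
{
  if (b.v == to)
    return;
  if (to == view::tree)
    {
      b.tree_bits.insert (b.list_bits.begin (), b.list_bits.end ());
      b.list_bits.clear ();
    }
  else
    {
      b.list_bits.assign (b.tree_bits.begin (), b.tree_bits.end ());
      b.tree_bits.clear ();
    }
  b.v = to;
}

bool toy_bit_p (const toy_bitmap &b, unsigned bit)
{
  if (b.v == view::tree)
    return b.tree_bits.count (bit) != 0;
  return std::binary_search (b.list_bits.begin (), b.list_bits.end (), bit);
}

// Returns true if the bit was newly set.
bool toy_set_bit (toy_bitmap &b, unsigned bit)
{
  if (b.v == view::tree)
    return b.tree_bits.insert (bit).second;
  auto it = std::lower_bound (b.list_bits.begin (), b.list_bits.end (), bit);
  if (it != b.list_bits.end () && *it == bit)
    return false;
  b.list_bits.insert (it, bit);
  return true;
}

// Returns true if the bit was previously set.
bool toy_clear_bit (toy_bitmap &b, unsigned bit)
{
  if (b.v == view::tree)
    return b.tree_bits.erase (bit) != 0;
  auto it = std::lower_bound (b.list_bits.begin (), b.list_bits.end (), bit);
  if (it == b.list_bits.end () || *it != bit)
    return false;
  b.list_bits.erase (it);
  return true;
}

unsigned toy_count_bits (const toy_bitmap &b)
{
  return static_cast<unsigned> (b.v == view::tree ? b.tree_bits.size ()
                                                  : b.list_bits.size ());
}

// Same shape as test_clear_bit_in_middle, but the checks are repeated
// after converting list -> tree -> list.  Returns the final bit count.
unsigned check_clear_bit_in_middle ()
{
  toy_bitmap b;
  for (unsigned i = 100; i < 200; i++)
    toy_set_bit (b, i);
  assert (toy_count_bits (b) == 100);

  bool changed = toy_clear_bit (b, 150);
  assert (changed);

  const view order[] = { view::list, view::tree, view::list };
  for (view v : order)
    {
      toy_set_view (b, v);
      assert (toy_count_bits (b) == 99);
      assert (toy_bit_p (b, 149));
      assert (!toy_bit_p (b, 150));
      assert (toy_bit_p (b, 151));
    }
  return toy_count_bits (b);
}
```

The point of the loop is only that the same assertions run once per view and
once after each conversion; the real selftests would of course use the actual
bitmap API and whatever conversion entry points the patch provides.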

Hope this is constructive
Dave


[PATCH] PR libstdc++/87641 correctly initialize accumulator in valarray::sum()

2018-10-18 Thread Jonathan Wakely

Use the value of the first element as the initial value of the
__valarray_sum accumulator. Value-initialization might not create the
additive identity for the value type.

Make a similar change to __valarray_product even though it's only ever
used internally with a value_type of size_t.

PR libstdc++/87641
* include/bits/valarray_array.h (__valarray_sum): Use first element
to initialize accumulator instead of value-initializing it.
(__valarray_product<_Tp>): Move to ...
* src/c++98/valarray.cc (__valarray_product<_Tp>): Here. Use first
element to initialize accumulator.
(__valarray_product(const valarray<size_t>&)): Remove const_cast made
unnecessary by LWG 389.
* testsuite/26_numerics/valarray/87641.cc: New test.

Tested powerpc64le-linux, committed to trunk.

This seems perfectly safe so I'll backport this to the branches too,
without the changes to __valarray_product.
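
As a rough illustration of why value-initialization is the wrong starting
point, consider the sketch below.  Biased is a made-up value type whose
default-constructed state is not the additive identity; sum_value_init and
sum_from_first are hypothetical names that mirror the old and new accumulator
strategies, not the library's own functions.

```cpp
#include <cassert>

// Hypothetical value type: Biased() holds 10, so a value-initialized
// object is NOT an additive identity for operator+=.
struct Biased
{
  int v;
  Biased () : v (10) {}
  explicit Biased (int x) : v (x) {}
  Biased &operator+= (const Biased &o) { v += o.v; return *this; }
};

// Old strategy: start from a value-initialized accumulator.
template<typename T>
T sum_value_init (const T *f, const T *l)
{
  T r = T ();
  while (f != l)
    r += *f++;
  return r;
}

// New strategy: start from the first element.
// Precondition: the range [f, l) is not empty.
template<typename T>
T sum_from_first (const T *f, const T *l)
{
  T r = *f++;
  while (f != l)
    r += *f++;
  return r;
}

int demo_sum_value_init ()
{
  const Biased a[] = { Biased (1), Biased (2), Biased (3) };
  return sum_value_init (a, a + 3).v;
}

int demo_sum_from_first ()
{
  const Biased a[] = { Biased (1), Biased (2), Biased (3) };
  return sum_from_first (a, a + 3).v;
}

void demo ()
{
  assert (demo_sum_value_init () == 16);  // 10 + 1 + 2 + 3: off by Biased().v
  assert (demo_sum_from_first () == 6);   // 1 + 2 + 3
}
```

For arithmetic types the two strategies agree, which is presumably why this
went unnoticed; the difference only shows up for user-defined value types, and
the price is the new precondition that the range is non-empty.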

commit 7b9d4da1fcf6ea3edc85ee960da1c1135614622c
Author: Jonathan Wakely 
Date:   Thu Oct 18 12:32:39 2018 +0100

PR libstdc++/87641 correctly initialize accumulator in valarray::sum()

Use the value of the first element as the initial value of the
__valarray_sum accumulator. Value-initialization might not create the
additive identity for the value type.

Make a similar change to __valarray_product even though it's only ever
used internally with a value_type of size_t.

PR libstdc++/87641
* include/bits/valarray_array.h (__valarray_sum): Use first element
to initialize accumulator instead of value-initializing it.
(__valarray_product<_Tp>): Move to ...
* src/c++98/valarray.cc (__valarray_product<_Tp>): Here. Use first
element to initialize accumulator.
(__valarray_product(const valarray<size_t>&)): Remove const_cast made
unnecessary by LWG 389.
* testsuite/26_numerics/valarray/87641.cc: New test.

diff --git a/libstdc++-v3/include/bits/valarray_array.h b/libstdc++-v3/include/bits/valarray_array.h
index 6759d6003e9..2dd1ec836ac 100644
--- a/libstdc++-v3/include/bits/valarray_array.h
+++ b/libstdc++-v3/include/bits/valarray_array.h
@@ -338,33 +338,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     }
 
   //
-  // Compute the sum of elements in range [__f, __l)
+  // Compute the sum of elements in range [__f, __l) which must not be empty.
   // This is a naive algorithm.  It suffers from cancelling.
-  // In the future try to specialize
-  // for _Tp = float, double, long double using a more accurate
-  // algorithm.
+  // In the future try to specialize for _Tp = float, double, long double
+  // using a more accurate algorithm.
   //
   template<typename _Tp>
     inline _Tp
     __valarray_sum(const _Tp* __f, const _Tp* __l)
     {
-      _Tp __r = _Tp();
+      _Tp __r = *__f++;
       while (__f != __l)
         __r += *__f++;
       return __r;
     }
 
-  // Compute the product of all elements in range [__f, __l)
-  template<typename _Tp>
-    inline _Tp
-    __valarray_product(const _Tp* __f, const _Tp* __l)
-    {
-      _Tp __r = _Tp(1);
-      while (__f != __l)
-        __r = __r * *__f++;
-      return __r;
-    }
-
   // Compute the min/max of an array-expression
   template<typename _Ta>
     inline typename _Ta::value_type
diff --git a/libstdc++-v3/src/c++98/valarray.cc b/libstdc++-v3/src/c++98/valarray.cc
index 3cec1843be2..284db21e81c 100644
--- a/libstdc++-v3/src/c++98/valarray.cc
+++ b/libstdc++-v3/src/c++98/valarray.cc
@@ -45,15 +45,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template size_t valarray<size_t>::size() const;
   template size_t& valarray<size_t>::operator[](size_t);
 
+  // Compute the product of all elements in the non-empty range [__f, __l)
+  template<typename _Tp>
+    inline _Tp
+    __valarray_product(const _Tp* __f, const _Tp* __l)
+    {
+      _Tp __r = *__f++;
+      while (__f != __l)
+        __r = __r * *__f++;
+      return __r;
+    }
+
   inline size_t
   __valarray_product(const valarray<size_t>& __a)
   {
-    const size_t __n = __a.size();
-    // XXX: This ugly cast is necessary because
-    //      valarray::operator[]() const return a VALUE!
-    //      Try to get the committee to correct that gross error.
-    valarray<size_t>& __t = const_cast<valarray<size_t>&>(__a);
-    return __valarray_product(&__t[0], &__t[0] + __n);
+    return __valarray_product(&__a[0], &__a[0] + __a.size());
   }
 
   // Map a gslice, described by its multidimensional LENGTHS
diff --git a/libstdc++-v3/testsuite/26_numerics/valarray/87641.cc b/libstdc++-v3/testsuite/26_numerics/valarray/87641.cc
new file mode 100644
index 000..eae5440e60b
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/valarray/87641.cc
@@ -0,0 +1,75 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; eith

Re: [PATCH] Fix some EVRP stupidness

2018-10-18 Thread Aldy Hernandez




On 10/18/18 8:11 AM, Richard Biener wrote:
> On Thu, 18 Oct 2018, Richard Biener wrote:
>
>> At some point we decided to not simply intersect all ranges we get
>> via register_edge_assert_for.  Instead we simply register them
>> in-order.  That causes things like replacing [64, +INF] with ~[0, 0].
>>
>> The following patch avoids replacing a range with a larger one
>> as obvious improvement.
>>
>> Compared to assert_expr based VRP we lack the ability to put down
>> actual assert_exprs and thus multiple SSA names with ranges we
>> could link via equivalences.  In the end we need sth similar,
>> for example by keeping a stack of active ranges for each SSA name.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Actually not.  Needed to update to the new value_range class and after
> that (and its introduction of ->check()) we now ICE during bootstrap
> with
>
> during GIMPLE pass: evrp
> /space/rguenther/src/svn/trunk/libgcc/config/libbid/bid128_div.c: In
> function ‘get_BID128’:
> /space/rguenther/src/svn/trunk/libgcc/config/libbid/bid128_div.c:1851:1:
> internal compiler error: in check, at tree-vrp.c:155
>   1851 | }
>        | ^
> 0xf3a8b5 value_range::check()
>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:155
> 0xf42424 value_range::value_range(value_range_kind, tree_node*,
> tree_node*, bitmap_head*)
>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:110
> 0xf42424 set_value_range_with_overflow
>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:1422
> 0xf42424 extract_range_from_binary_expr_1(value_range*, tree_code,
> tree_node*, value_range const*, value_range const*)
>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:1679
>
> for a PLUS_EXPR of [12254, expon_43] and [-1, -1] yielding
> (temporarily!) [12254, -1] before supposed to be adjusted by the
> symbolic bound:
>
>    /* Adjust the range for possible overflow.  */
>    set_value_range_with_overflow (*vr, expr_type,
>                                   wmin, wmax, min_ovf, max_ovf);
> ^^^ ICE
>    if (vr->varying_p ())
>      return;
>
>    /* Build the symbolic bounds if needed.  */
>    min = vr->min ();
>    max = vr->max ();
>    adjust_symbolic_bound (min, code, expr_type,
>                           sym_min_op0, sym_min_op1,
>                           neg_min_op0, neg_min_op1);
>    adjust_symbolic_bound (max, code, expr_type,
>                           sym_max_op0, sym_max_op1,
>                           neg_max_op0, neg_max_op1);
>    type = vr->kind ();
>
> I think the refactoring that was applied here is simply not suitable
> because *vr is _not_ necessarily a valid range before the symbolic
> bounds have been adjusted.  A fix would be sth like the following
> which I am going to test now.


Sounds reasonable.

Is this PR87640?  Because the testcase there is also crashing while 
creating the range right before adjusting the symbolics.


Thanks for looking at this.
Aldy


[PATCH] Improve -dumpversion and -dumpfullversion documentation

2018-10-18 Thread Jonathan Wakely

* doc/invoke.texi (-dumpversion): Improve grammar.
(-dumpfullversion): Make more consistent with -dumpversion.

OK for trunk?


commit 67e1782be13b180e537fcf56aa041cd199b38ae9
Author: Jonathan Wakely 
Date:   Thu Oct 18 16:40:00 2018 +0100

Improve -dumpversion and -dumpfullversion documentation

* doc/invoke.texi (-dumpversion): Improve grammar.
(-dumpfullversion): Make more consistent with -dumpversion.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bf8bcfb2907..57491f1033c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14860,16 +14860,16 @@ Print the compiler's target machine (for example,
 @item -dumpversion
 @opindex dumpversion
 Print the compiler version (for example, @code{3.0}, @code{6.3.0} or @code{7})---and don't do
-anything else.  This is the compiler version used in filesystem paths,
-specs, can be depending on how the compiler has been configured just
-a single number (major version), two numbers separated by dot (major and
+anything else.  This is the compiler version used in filesystem paths and
+specs. Depending on how the compiler has been configured it can be just
+a single number (major version), two numbers separated by a dot (major and
 minor version) or three numbers separated by dots (major, minor and patchlevel
 version).
 
 @item -dumpfullversion
 @opindex dumpfullversion
-Print the full compiler version, always 3 numbers separated by dots,
-major, minor and patchlevel version.
+Print the full compiler version---and don't do anything else. The output is
+always three numbers separated by dots, major, minor and patchlevel version.
 
 @item -dumpspecs
 @opindex dumpspecs


[committed] Fix ICE in substring-handling building 502.gcc_r (PR 87562)

2018-10-18 Thread David Malcolm
In r264887 I broke the build of 502.gcc_r due to an ICE.
The ICE occurs when generating a location for an sprintf warning within
a string literal, where the sprintf call is in a macro.

The root cause is a bug in the original commit of substring locations
(r239175).  get_substring_ranges_for_loc has code to handle the case
where the string literal is in a very long source line that exceeds the
length that the current linemap can represent: the start of the token
is in one line map, but then another line map is started, and the end
of the token is in the new linemap.  get_substring_ranges_for_loc handles
this by using the linemap of the end-point when building location_t
values within the string.  When extracting the linemap for the endpoint
in r239175 I erroneously used LRK_MACRO_EXPANSION_POINT, which should
have instead been LRK_SPELLING_LOCATION.

I believe this bug was dormant due to rejecting macro locations earlier
in the function, but in r264887 I allowed some macro locations in order
to deal with locations coming from the C++ lexer, and this uncovered
the bug: if a string literal was defined in a macro, locations within
the string literal would be looked up using the linemap of the expansion
point of the macro, rather than of the spelling point.  This would lead
to garbage location_t values, and, depending on the precise line numbers
of the two locations, an assertion failure (which was causing the build
failure in 502.gcc_r).

This patch fixes the bug by using LRK_SPELLING_LOCATION, and adds some
bulletproofing to the "two linemaps" case.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu
(g++.sum gained 5 PASS results; gcc.sum gained 3 PASS results).
I also verified that this fixes the build of 502.gcc_r.

Committed to trunk as r265271.

gcc/ChangeLog:
PR tree-optimization/87562
* input.c (get_substring_ranges_for_loc): Use
LRK_SPELLING_LOCATION rather than LRK_MACRO_EXPANSION_POINT when
getting the linemap for the endpoint.  Verify that it's either
in the same linemap as the start point's spelling location, or
at least in the same file.

gcc/testsuite/ChangeLog:
PR tree-optimization/87562
* c-c++-common/substring-location-PR-87562-1-a.h: New file.
* c-c++-common/substring-location-PR-87562-1-b.h: New file.
* c-c++-common/substring-location-PR-87562-1.c: New test.
* gcc.dg/plugin/diagnostic-test-string-literals-1.c: Add test for
PR 87562.
* gcc.dg/plugin/pr87562-a.h: New file.
* gcc.dg/plugin/pr87562-b.h: New file.
---
 gcc/input.c| 10 +-
 .../c-c++-common/substring-location-PR-87562-1-a.h |  7 +++
 .../c-c++-common/substring-location-PR-87562-1-b.h |  0
 .../c-c++-common/substring-location-PR-87562-1.c   | 15 ++
 .../plugin/diagnostic-test-string-literals-1.c | 23 ++
 gcc/testsuite/gcc.dg/plugin/pr87562-a.h|  7 +++
 gcc/testsuite/gcc.dg/plugin/pr87562-b.h|  0
 7 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/substring-location-PR-87562-1-a.h
 create mode 100644 gcc/testsuite/c-c++-common/substring-location-PR-87562-1-b.h
 create mode 100644 gcc/testsuite/c-c++-common/substring-location-PR-87562-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/pr87562-a.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/pr87562-b.h

diff --git a/gcc/input.c b/gcc/input.c
index eeeb11e..57a1a3c 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1457,9 +1457,17 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
         halfway through the token.
         Ensure that the loc_reader uses the linemap of the
         *end* of the token for its start location.  */
+      const line_map_ordinary *start_ord_map;
+      linemap_resolve_location (line_table, src_range.m_start,
+                                LRK_SPELLING_LOCATION, &start_ord_map);
       const line_map_ordinary *final_ord_map;
       linemap_resolve_location (line_table, src_range.m_finish,
-                                LRK_MACRO_EXPANSION_POINT, &final_ord_map);
+                                LRK_SPELLING_LOCATION, &final_ord_map);
+      /* Bulletproofing.  We ought to only have different ordinary maps
+         for start vs finish due to line-length jumps.  */
+      if (start_ord_map != final_ord_map
+          && start_ord_map->to_file != final_ord_map->to_file)
+        return "start and finish are spelled in different ordinary maps";
       location_t start_loc
         = linemap_position_for_line_and_column (line_table, final_ord_map,
                                                 start.line, start.column);
diff --git a/gcc/testsuite/c-c++-common/substring-location-PR-87562-1-a.h b/gcc/testsuite/c-c++-common/substring-location-PR-87562-1-a.h
new file mode 100644
index 000..369c9d0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/substring-location-PR-87562-1-a.h

Re: [PATCH] Fix some EVRP stupidness

2018-10-18 Thread Richard Biener
On October 18, 2018 5:42:56 PM GMT+02:00, Aldy Hernandez wrote:
>
>
>On 10/18/18 8:11 AM, Richard Biener wrote:
>> On Thu, 18 Oct 2018, Richard Biener wrote:
>> 
>>>
>>> At some point we decided to not simply intersect all ranges we get
>>> via register_edge_assert_for.  Instead we simply register them
>>> in-order.  That causes things like replacing [64, +INF] with ~[0, 0].
>>>
>>> The following patch avoids replacing a range with a larger one
>>> as obvious improvement.
>>>
>>> Compared to assert_expr based VRP we lack the ability to put down
>>> actual assert_exprs and thus multiple SSA names with ranges we
>>> could link via equivalences.  In the end we need sth similar,
>>> for example by keeping a stack of active ranges for each SSA name.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>> 
>> Actually not.  Needed to update to the new value_range class and after
>> that (and its introduction of ->check()) we now ICE during bootstrap
>> with
>> 
>> during GIMPLE pass: evrp
>> /space/rguenther/src/svn/trunk/libgcc/config/libbid/bid128_div.c: In
>> function ‘get_BID128’:
>> /space/rguenther/src/svn/trunk/libgcc/config/libbid/bid128_div.c:1851:1:
>> internal compiler error: in check, at tree-vrp.c:155
>>   1851 | }
>>| ^
>> 0xf3a8b5 value_range::check()
>>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:155
>> 0xf42424 value_range::value_range(value_range_kind, tree_node*,
>> tree_node*, bitmap_head*)
>>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:110
>> 0xf42424 set_value_range_with_overflow
>>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:1422
>> 0xf42424 extract_range_from_binary_expr_1(value_range*, tree_code,
>> tree_node*, value_range const*, value_range const*)
>>  /space/rguenther/src/svn/trunk/gcc/tree-vrp.c:1679
>> 
>> for a PLUS_EXPR of [12254, expon_43] and [-1, -1] yielding
>> (temporarily!) [12254, -1] before supposed to be adjusted by the
>> symbolic bound:
>> 
>>    /* Adjust the range for possible overflow.  */
>>    set_value_range_with_overflow (*vr, expr_type,
>>                                   wmin, wmax, min_ovf, max_ovf);
>> ^^^ ICE
>>    if (vr->varying_p ())
>>      return;
>>
>>    /* Build the symbolic bounds if needed.  */
>>    min = vr->min ();
>>    max = vr->max ();
>>    adjust_symbolic_bound (min, code, expr_type,
>>                           sym_min_op0, sym_min_op1,
>>                           neg_min_op0, neg_min_op1);
>>    adjust_symbolic_bound (max, code, expr_type,
>>                           sym_max_op0, sym_max_op1,
>>                           neg_max_op0, neg_max_op1);
>>    type = vr->kind ();
>> 
>> I think the refactoring that was applied here is simply not suitable
>> because *vr is _not_ necessarily a valid range before the symbolic
>> bounds have been adjusted.  A fix would be sth like the following
>> which I am going to test now.
>
>Sounds reasonable.

Doesn't work and miscompiles all over the place. 

>Is this PR87640?  Because the testcase there is also crashing while 
>creating the range right before adjusting the symbolics.

Might be. 

I'll poke some more tomorrow unless you beat me to it. 

Richard. 

>Thanks for looking at this.
>Aldy



Re: [PATCH] v2: Run selftests for C++ as well as C

2018-10-18 Thread David Malcolm
On Thu, 2018-10-18 at 07:59 +0200, Eric Botcazou wrote:
> > Thanks; I've committed this to trunk as r265240.
> 
> You modified gcc-interface/Make-lang.in without ChangeLog entry.

Oops.  I've double-checked my ChangeLog-writing script [1], and it did
generate an entry for ada, but I believe I accidentally deleted it when
creating the patch.

I've added the missing ChangeLog entry as r265272 (using yesterday's
date, since that's when I committed the change).

Sorry about the error
Dave

[1] 
https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/generate-changelog.py
fwiw


[gomp5] Add support for 5 new combined constructs for tasking

2018-10-18 Thread Jakub Jelinek
Hi!

OpenMP 5.0 adds a couple of new combined constructs, mainly to save typing
when using tasking.
In particular
#pragma omp parallel master
#pragma omp master taskloop
#pragma omp master taskloop simd
#pragma omp parallel master taskloop
#pragma omp parallel master taskloop simd
Tested on x86_64-linux, committed to gomp-5_0-branch.
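
As a usage illustration only (not one of the new tests), something along these
lines is what the most deeply combined form saves typing for.  sum_squares is
a made-up function; the sketch assumes a compiler that implements the OpenMP
5.0 combined constructs and is run with OpenMP enabled.  Without OpenMP the
pragma is ignored and the loop simply runs serially, with the same result.

```cpp
#include <cassert>

// Sum of squares 0^2 + ... + (n-1)^2.
//
// The single pragma below stands in for the nested spelling:
//   #pragma omp parallel
//   #pragma omp master
//   #pragma omp taskloop reduction(+ : sum)
long sum_squares (int n)
{
  long sum = 0;
#pragma omp parallel master taskloop reduction(+ : sum)
  for (int i = 0; i < n; i++)
    sum += static_cast<long> (i) * i;
  return sum;
}

void demo_sum_squares ()
{
  assert (sum_squares (0) == 0);
  assert (sum_squares (10) == 285);
}
```

Note that sum is not listed in any data-sharing clause on the parallel part;
making the reduction variable shared on the enclosing parallel is the job of
the gimplify.c change described in the ChangeLog below.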

2018-10-18  Jakub Jelinek  

* gimplify.c (enum omp_region_type): Add ORT_TASKLOOP and
ORT_UNTIED_TASKLOOP.
(omp_default_clause): Print "taskloop" rather than "task" if
ORT_*TASKLOOP.
(gimplify_scan_omp_clauses): Add shared clause on parallel for
combined parallel master taskloop{, simd} if taskloop has
firstprivate, lastprivate or reduction clause.
(gimplify_omp_for): Likewise.  Use ORT_TASKLOOP or
ORT_UNTIED_TASKLOOP instead of ORT_TASK or ORT_UNTIED_TASK.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Add support for combined
#pragma omp parallel master and
#pragma omp {,parallel }master taskloop{, simd} constructs.
gcc/c/
* c-parser.c (c_parser_omp_taskloop): Add forward declaration.
Disallow in_reduction clause when combined with parallel master.
(c_parser_omp_master): Add p_name, mask and cclauses arguments.
Allow to be called while parsing combined parallel master.
Parse combined master taskloop{, simd}.
(c_parser_omp_parallel): Parse combined
parallel master{, taskloop{, simd}} constructs.
(c_parser_omp_construct) : Adjust
c_parser_omp_master caller.
gcc/cp/
* parser.c (cp_parser_omp_taskloop): Add forward declaration.
Disallow in_reduction clause when combined with parallel master.
(cp_parser_omp_master): Add p_name, mask and cclauses arguments.
Allow to be called while parsing combined parallel master.
Parse combined master taskloop{, simd}.
(cp_parser_omp_parallel): Parse combined
parallel master{, taskloop{, simd}} constructs.
(cp_parser_omp_construct) : Adjust
c_parser_omp_master caller.
gcc/testsuite/
* c-c++-common/gomp/clauses-1.c (foo): Add ntm argument and
test if and nontemporal clauses on constructs with simd.
(bar): Add ntm and i3 arguments.  Test if and nontemporal clauses
on constructs with simd.  Change if clauses on some constructs from
specific to the particular constituents to one without a modifier.
Add new tests for combined host teams and for new parallel master
and {,parallel }master taskloop{, simd} combined constructs.
(baz): New function with host teams tests.
* c-c++-common/gomp/default-1.c: New test.
* c-c++-common/gomp/master-combined-1.c: New test.
* c-c++-common/gomp/master-combined-2.c: New test.
libgomp/
* testsuite/libgomp.c-c++-common/master-combined-1.c: New test.
* testsuite/libgomp.c-c++-common/taskloop-reduction-3.c: New test.
* testsuite/libgomp.c-c++-common/taskloop-reduction-4.c: New test.

--- gcc/gimplify.c.jj   2018-10-16 16:12:25.461419030 +0200
+++ gcc/gimplify.c  2018-10-18 11:09:56.180590210 +0200
@@ -130,6 +130,8 @@ enum omp_region_type
 
   ORT_TASK = 0x10,
   ORT_UNTIED_TASK = ORT_TASK | 1,
+  ORT_TASKLOOP  = ORT_TASK | 2,
+  ORT_UNTIED_TASKLOOP = ORT_UNTIED_TASK | 2,
 
   ORT_TEAMS= 0x20,
   ORT_COMBINED_TEAMS = ORT_TEAMS | 1,
@@ -6992,6 +6994,8 @@ omp_default_clause (struct gimplify_omp_
 
if (ctx->region_type & ORT_PARALLEL)
  rtype = "parallel";
+   else if ((ctx->region_type & ORT_TASKLOOP) == ORT_TASKLOOP)
+ rtype = "taskloop";
else if (ctx->region_type & ORT_TASK)
  rtype = "task";
else if (ctx->region_type & ORT_TEAMS)
@@ -8976,6 +8980,31 @@ gimplify_scan_omp_clauses (tree *list_p,
  " or private in outer context", DECL_NAME (decl));
}
do_notice:
+ if ((region_type & ORT_TASKLOOP) == ORT_TASKLOOP
+ && outer_ctx
+ && outer_ctx->region_type == ORT_COMBINED_PARALLEL
+ && (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+ || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE
+ || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE))
+   {
+ splay_tree_node on
+   = splay_tree_lookup (outer_ctx->variables,
+(splay_tree_key)decl);
+ if (on == NULL || (on->value & GOVD_DATA_SHARE_CLASS) == 0)
+   {
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+ && TREE_CODE (OMP_CLAUSE_DECL (c)) == MEM_REF
+ && (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+ || (TREE_CODE (TREE_TYPE (decl)) == REFERENCE_TYPE
+ && (TREE_CODE (TREE_TYPE (TREE_TYPE (decl)))
+ == POINTER_TYPE
+   
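
A side note on the `(region_type & ORT_TASKLOOP) == ORT_TASKLOOP` tests in the
hunks above: since ORT_TASKLOOP is ORT_TASK with an extra bit, a plain
non-zero test of the mask would also match ordinary task regions.  The sketch
below copies only the enumerators visible in the diff; is_taskloop and
is_taskloop_loose are hypothetical helper names used for illustration.

```cpp
#include <cassert>

// Only the enumerators shown in the gimplify.c hunk; the real enum has more.
enum omp_region_type
{
  ORT_TASK = 0x10,
  ORT_UNTIED_TASK = ORT_TASK | 1,
  ORT_TASKLOOP = ORT_TASK | 2,
  ORT_UNTIED_TASKLOOP = ORT_UNTIED_TASK | 2,
  ORT_TEAMS = 0x20
};

// True only when every bit of ORT_TASKLOOP is present.
bool is_taskloop (int region_type)
{
  return (region_type & ORT_TASKLOOP) == ORT_TASKLOOP;
}

// Deliberately too loose: any region sharing a bit with ORT_TASKLOOP matches.
bool is_taskloop_loose (int region_type)
{
  return (region_type & ORT_TASKLOOP) != 0;
}

void demo_region_checks ()
{
  assert (is_taskloop (ORT_TASKLOOP));
  assert (is_taskloop (ORT_UNTIED_TASKLOOP));
  assert (!is_taskloop (ORT_TASK));
  assert (!is_taskloop (ORT_UNTIED_TASK));
  assert (!is_taskloop (ORT_TEAMS));
  // The loose test misclassifies a plain task region as a taskloop.
  assert (is_taskloop_loose (ORT_TASK));
}
```

This also explains the ordering in omp_default_clause: the taskloop check has
to come before the `ctx->region_type & ORT_TASK` check, which matches taskloop
regions as well.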

Re: [Patc, fortran] PR85603 - ICE with character array substring assignment

2018-10-18 Thread Paul Richard Thomas
It turned out that this patch did not quite complete the job (Thanks
Walt): The ICE has gone but reallocation on assignment is not
occurring because the correct string length for the rhs expression was
not being picked up. The fix for this took rather more detective work
than I anticipated but here it is.

Bootstraps and regtests on FC28/x86_64 - OK for trunk?

Cheers

Paul

2018-10-18  Paul Thomas  

PR fortran/85603
* frontend-passes.c (get_len_call): New function to generate a
call to intrinsic LEN.
(create_var): Use this to make length expressions for variable
rhs string lengths.
Clean up some white space issues.

2018-10-18  Paul Thomas  

PR fortran/85603
* gfortran.dg/deferred_character_23.f90: Check reallocation is
occurring as it should.

On Sat, 22 Sep 2018 at 11:23, Paul Richard Thomas
 wrote:
>
> Yet another 'obvious' deferred character fix. Committed to trunk as
> r264502. Will backport in about ten days time.
>
> Paul
>
> 2018-09-22  Paul Thomas  
>
> PR fortran/85603
> * trans-array.c (gfc_alloc_allocatable_for_assignment): Test
> the charlen backend_decl before using the VAR_P macro.
>
> 2018-09-22  Paul Thomas  
>
> PR fortran/85603
> * gfortran.dg/deferred_character_23.f90 : New test.



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
Index: gcc/fortran/frontend-passes.c
===
*** gcc/fortran/frontend-passes.c	(revision 265262)
--- gcc/fortran/frontend-passes.c	(working copy)
*** realloc_string_callback (gfc_code **c, i
*** 280,286 
  	   && (expr2->expr_type != EXPR_OP
  	   || expr2->value.op.op != INTRINSIC_CONCAT))
  return 0;
!   
if (!gfc_check_dependency (expr1, expr2, true))
  return 0;
  
--- 280,286 
  	   && (expr2->expr_type != EXPR_OP
  	   || expr2->value.op.op != INTRINSIC_CONCAT))
  return 0;
! 
if (!gfc_check_dependency (expr1, expr2, true))
  return 0;
  
*** insert_block ()
*** 704,709 
--- 704,744 
return ns;
  }
  
+ 
+ /* Insert a call to the intrinsic len. Use a different name for
+the symbol tree so we don't run into trouble when the user has
+renamed len for some reason.  */
+ 
+ static gfc_expr*
+ get_len_call (gfc_expr *str)
+ {
+   gfc_expr *fcn;
+   gfc_actual_arglist *actual_arglist;
+ 
+   fcn = gfc_get_expr ();
+   fcn->expr_type = EXPR_FUNCTION;
+   fcn->value.function.isym = gfc_intrinsic_function_by_id (GFC_ISYM_LEN);
+   actual_arglist = gfc_get_actual_arglist ();
+   actual_arglist->expr = str;
+ 
+   fcn->value.function.actual = actual_arglist;
+   fcn->where = str->where;
+   fcn->ts.type = BT_INTEGER;
+   fcn->ts.kind = gfc_charlen_int_kind;
+ 
+   gfc_get_sym_tree ("__internal_len", current_ns, &fcn->symtree, false);
+   fcn->symtree->n.sym->ts = fcn->ts;
+   fcn->symtree->n.sym->attr.flavor = FL_PROCEDURE;
+   fcn->symtree->n.sym->attr.function = 1;
+   fcn->symtree->n.sym->attr.elemental = 1;
+   fcn->symtree->n.sym->attr.referenced = 1;
+   fcn->symtree->n.sym->attr.access = ACCESS_PRIVATE;
+   gfc_commit_symbol (fcn->symtree->n.sym);
+ 
+   return fcn;
+ }
+ 
+ 
  /* Returns a new expression (a variable) to be used in place of the old one,
 with an optional assignment statement before the current statement to set
 the value of the variable. Creates a new BLOCK for the statement if that
*** create_var (gfc_expr * e, const char *vn
*** 786,791 
--- 821,828 
length = constant_string_length (e);
if (length)
  	symbol->ts.u.cl->length = length;
+   else if (e->expr_type == EXPR_VARIABLE && e->ts.u.cl->length)
+ 	symbol->ts.u.cl->length = get_len_call (gfc_copy_expr (e));
else
  	{
  	  symbol->attr.allocatable = 1;
*** traverse_io_block (gfc_code *code, bool
*** 1226,1232 
  	{
  	  /* Check for (a(i,i), i=1,3).  */
  	  int j;
! 	  
  	  for (j=0; jvar->symtree == start->symtree)
  		  return false;
--- 1263,1269 
  	{
  	  /* Check for (a(i,i), i=1,3).  */
  	  int j;
! 
  	  for (j=0; jvar->symtree == start->symtree)
  		  return false;
*** traverse_io_block (gfc_code *code, bool
*** 1286,1292 
  		  || var_in_expr (var, iters[j]->end)
  		  || var_in_expr (var, iters[j]->step)))
  		  return false;
! 	}		  
  	}
  }
  
--- 1323,1329 
  		  || var_in_expr (var, iters[j]->end)
  		  || var_in_expr (var, iters[j]->step)))
  		  return false;
! 	}
  	}
  }
  
*** get_len_trim_call (gfc_expr *str, int ki
*** 2019,2024 
--- 2056,2062 
return fcn;
  }
  
+ 
  /* Optimize expressions for equality.  */
  
  static bool
*** do_subscript (gfc_expr **e)
*** 2626,2632 
  
  	  /* If we do not know about the stepsize, the loop may be zero trip.
  		 Do not warn in this case.  */
! 	  
  	  if (dl->ext.iterat

[Patch, Fortran] PR87625 - fix reallocate on assign with polymorphic arrays

2018-10-18 Thread Paul Richard Thomas
Your patch at: https://gcc.gnu.org/ml/fortran/2018-10/msg00079.html is
OK for trunk.

Thanks

Paul


[PATCH, i386]: Improve some i387 sequences

2018-10-18 Thread Uros Bizjak
2018-10-18  Uros Bizjak  

* config/i386/i386.c (ix86_emit_fp_unordered_jump):
Set JUMP_LABEL to the jump insn.
(ix86_emit_i387_log1p): Use ix86_expand_branch to expand branch.
Predict emitted jump and add label to jump insn.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3ab6b205eb61..ef46083b04b9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -43879,6 +43879,7 @@ void
 ix86_emit_fp_unordered_jump (rtx label)
 {
   rtx reg = gen_reg_rtx (HImode);
+  rtx_insn *insn;
   rtx temp;
 
   emit_insn (gen_x86_fnstsw_1 (reg));
@@ -43901,10 +43902,9 @@ ix86_emit_fp_unordered_jump (rtx label)
   temp = gen_rtx_IF_THEN_ELSE (VOIDmode, temp,
  gen_rtx_LABEL_REF (VOIDmode, label),
  pc_rtx);
-  temp = gen_rtx_SET (pc_rtx, temp);
-
-  emit_jump_insn (temp);
+  insn = emit_jump_insn (gen_rtx_SET (pc_rtx, temp));
   predict_jump (REG_BR_PROB_BASE * 10 / 100);
+  JUMP_LABEL (insn) = label;
 }
 
 /* Output code to perform a log1p XFmode calculation.  */
@@ -43915,27 +43915,36 @@ void ix86_emit_i387_log1p (rtx op0, rtx op1)
   rtx_code_label *label2 = gen_label_rtx ();
 
   rtx tmp = gen_reg_rtx (XFmode);
-  rtx tmp2 = gen_reg_rtx (XFmode);
-  rtx test;
+  rtx res = gen_reg_rtx (XFmode);
+  rtx cst, cstln2, cst1;
+  rtx_insn *insn;
+
+  cst = const_double_from_real_value
+(REAL_VALUE_ATOF ("0.29289321881345247561810596348408353", XFmode), XFmode);
+  cstln2 = force_reg (XFmode, standard_80387_constant_rtx (4)); /* fldln2 */
 
   emit_insn (gen_absxf2 (tmp, op1));
-  emit_move_insn (tmp2, standard_80387_constant_rtx (4)); /* fldln2 */
-  test = gen_rtx_GE (VOIDmode, tmp,
-const_double_from_real_value (
-   REAL_VALUE_ATOF ("0.29289321881345247561810596348408353", XFmode),
-   XFmode));
-  emit_jump_insn
-(gen_cbranchxf4 (test, XEXP (test, 0), XEXP (test, 1), label1));
-
-  emit_insn (gen_fyl2xp1xf3_i387 (op0, op1, tmp2));
+
+  cst = force_reg (XFmode, cst);
+  ix86_expand_branch (GE, tmp, cst, label1);
+  predict_jump (REG_BR_PROB_BASE * 10 / 100);
+  insn = get_last_insn ();
+  JUMP_LABEL (insn) = label1;
+
+  emit_insn (gen_fyl2xp1xf3_i387 (res, op1, cstln2));
   emit_jump (label2);
 
   emit_label (label1);
-  emit_move_insn (tmp, CONST1_RTX (XFmode));
-  emit_insn (gen_addxf3 (tmp, op1, tmp));
-  emit_insn (gen_fyl2xxf3_i387 (op0, tmp, tmp2));
+  LABEL_NUSES (label1) = 1;
+
+  cst1 = force_reg (XFmode, CONST1_RTX (XFmode));
+  emit_insn (gen_rtx_SET (tmp, gen_rtx_PLUS (XFmode, op1, cst1)));
+  emit_insn (gen_fyl2xxf3_i387 (res, tmp, cstln2));
 
   emit_label (label2);
+  LABEL_NUSES (label2) = 1;
+
+  emit_move_insn (op0, res);
 }
 
 /* Emit code for round calculation.  */
@@ -43952,7 +43961,8 @@ void ix86_emit_i387_round (rtx op0, rtx op1)
   rtx_code_label *jump_label = gen_label_rtx ();
   rtx (*floor_insn) (rtx, rtx);
   rtx (*neg_insn) (rtx, rtx);
-  rtx insn, tmp;
+  rtx_insn *insn;
+  rtx tmp;
 
   switch (inmode)
 {


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-18 Thread Jeff Law
On 10/17/18 4:21 PM, Giuliano Augusto Faulin Belinassi wrote:
> Oh, please note that the error that I'm talking about is the
> comparison with the result obtained before and after the
> simplification. It is possible that the result obtained after the
> simplification be more precise when compared to an arbitrary precise
> value (example, a 30 digits precise approximation). Well, I will try
> check that.
That would be helpful.  Obviously if we're getting more precise, then
that's a good thing :-)

jeff


Re: [PATCH] Reset insn priority after inc/ref replacement in haifa sched

2018-10-18 Thread Jeff Law
On 10/18/18 2:10 AM, Robin Dapp wrote:
> Hi,
> 
> I added a check before calling priority in restore_pattern.  In the last
> version, not checking that would lead to assertion failure in priority
> since the insn might already have been scheduled.
> 
> Bootstrapped and regtested on x86_64 and ppc8, regtested on s390x.
> 
> Regards
>  Robin
> 
> --
> 
> gcc/ChangeLog:
> 
> 2018-10-16  Robin Dapp  
> 
>   * haifa-sched.c (priority): Add force_recompute parameter.
>   (apply_replacement): Call priority () with force_recompute = true.
>   (restore_pattern): Likewise.
> 
Still OK :-)
jeff


Re: [PATCH v2] lra: fix spill_hard_reg_in_range clobber check

2018-10-18 Thread Jeff Law
On 10/17/18 12:14 PM, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on x86_64-redhat-linux.
> 
> Changes since v1:
> 
> * Added the missing INSN_P () check.
> * Rewrote the commit message.
> 
> FROM..TO range might contain NOTE_INSN_DELETED insns, for which the
> corresponding entries in lra_insn_recog_data[] are NULLs.  Example from
> the problematic code from PR87596:
> 
> (note 148 154 68 7 NOTE_INSN_DELETED)
> 
> lra_insn_recog_data[] is used directly only when the insn in question
> is taken from insn_bitmap, which is not the case here.  In other
> situations lra_get_insn_recog_data () guarded by INSN_P () or another
> stricter predicate is used.  So we need to do this here as well.
> 
> A tiny detail worth noting: I put the INSN_P () check before the
> insn_bitmap check, because I believe that insn_bitmap can contain only
> real insns anyway.
> 
> gcc/ChangeLog:
> 
> 2018-10-16  Ilya Leoshkevich  
> 
>   PR rtl-optimization/87596
>   * lra-constraints.c (spill_hard_reg_in_range): Use INSN_P () +
>   lra_get_insn_recog_data () instead of lra_insn_recog_data[]
>   for instructions in FROM..TO range.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-10-16  Ilya Leoshkevich  
> 
>   PR rtl-optimization/87596
>   * gcc.target/i386/pr87596.c: New test.
OK.
jeff
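
The guard described in the patch above can be pictured with a small stand-alone sketch.  It is only an illustration under stated assumptions: the types, the table and every sketch_* name are hypothetical stand-ins, not GCC's real data structures, and the kind check merely plays the role that INSN_P () plays in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's insn kinds; not the real rtx codes.  */
enum sketch_kind { SKETCH_NOTE, SKETCH_INSN };

struct sketch_insn
{
  int uid;                  /* Index into the per-insn table.  */
  enum sketch_kind kind;    /* Note or real instruction.  */
};

struct sketch_recog_data
{
  int clobbers_hard_reg;    /* Nonzero if the insn clobbers the register.  */
};

/* Return 1 if any real insn in INSNS[0..N) clobbers the hard register.
   TABLE[uid] may be NULL for notes, so the kind check must come before
   the table entry is dereferenced.  */
int
sketch_range_clobbers (const struct sketch_insn *insns, size_t n,
                       struct sketch_recog_data *const *table)
{
  for (size_t i = 0; i < n; i++)
    {
      if (insns[i].kind != SKETCH_INSN)
        continue;           /* Plays the role of !INSN_P (insn).  */
      if (table[insns[i].uid]->clobbers_hard_reg)
        return 1;
    }
  return 0;
}
```

Without the kind check the loop would dereference the NULL entry that belongs to the note, which is the failure mode the commit message describes.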


Re: [patch] allow target config to state r18 is fixed on aarch64

2018-10-18 Thread Wilco Dijkstra
Hi Olivier,

> STATIC_CHAIN_REGNUM still needs to be adjusted directly I think.
>
> I wondered if we could set it to R11 unconditionally and picked
> the way ensuring no change for !vxworks ports, especially since I
> don't have means to test more than what I described above.

Yes, it should always be the same register; there is no gain in switching
it dynamically. I'd suggest using X9, since X8 is the last register used for
arguments (STATIC_CHAIN_REGNUM is passed when calling a nested
function) and some of the higher registers may be used as temporaries in
the prolog/epilog.

Wilco

Re: [PATCH] Add splay-tree "view" for bitmap

2018-10-18 Thread Jeff Law
On 10/18/18 7:09 AM, Richard Biener wrote:
> 
> PR63155 made me pick up this old work from Steven.  It turns our
> linked-list implementation into a two-mode one, with one mode being a
> splay tree featuring O(log N) complexity for find/remove.
> 
> On top of Steven's original patch I added a bitmap_tree_to_vec helper
> that I use from the debug/print methods to avoid changing view
> there.  In theory the bitmap iterator could get a "stack"
> as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
> 
> This can be used to fix the two biggest bottlenecks in the PR's
> testcase, namely SSA propagator worklist handling and out-of-SSA
> coalesce list building.  perf shows the following data, first
> unpatched, second patched - also watch the third column (samples)
> when comparing percentages.
> 
> -O0
> -   18.19%    17.35%   407  cc1  cc1   [.] bitmap_set_bit
>    - bitmap_set_bit
>       + 8.77% create_coalesce_list_for_region
>       + 4.21% calculate_live_ranges
>       + 2.02% build_ssa_conflict_graph
>       + 1.66% insert_phi_nodes_for
>       + 0.86% coalesce_ssa_name
> patched:
> -   12.39%    10.48%   129  cc1  cc1   [.] bitmap_set_bit
>    - bitmap_set_bit
>       + 5.27% calculate_live_ranges
>       + 2.76% insert_phi_nodes_for
>       + 1.90% create_coalesce_list_for_region
>       + 1.63% build_ssa_conflict_graph
>       + 0.35% coalesce_ssa_name
>
> -O1
> -   17.53%    17.53%   842  cc1  cc1   [.] bitmap_set_bit
>    - bitmap_set_bit
>       + 12.39% add_ssa_edge
>       + 1.48% create_coalesce_list_for_region
>       + 0.82% solve_constraints
>       + 0.71% calculate_live_ranges
>       + 0.64% add_implicit_graph_edge
>       + 0.41% insert_phi_nodes_for
>       + 0.34% build_ssa_conflict_graph
> patched:
> -    5.79%     5.00%   167  cc1  cc1   [.] bitmap_set_bit
>    - bitmap_set_bit
>       + 1.41% add_ssa_edge
>       + 0.88% calculate_live_ranges
>       + 0.75% add_implicit_graph_edge
>       + 0.68% solve_constraints
>       + 0.48% insert_phi_nodes_for
>       + 0.45% build_ssa_conflict_graph
>
> -O3
> -   12.37%    12.34%  1145  cc1  cc1   [.] bitmap_set_bit
>    - bitmap_set_bit
>       + 9.14% add_ssa_edge
>       + 0.80% create_coalesce_list_for_region
>       + 0.69% add_implicit_graph_edge
>       + 0.54% solve_constraints
>       + 0.34% calculate_live_ranges
>       + 0.27% insert_phi_nodes_for
>       + 0.21% build_ssa_conflict_graph
> -    4.36%     3.86%   227  cc1  cc1   [.] bitmap_set_bit
>    - bitmap_set_bit
>       + 0.98% add_ssa_edge
>       + 0.86% add_implicit_graph_edge
>       + 0.64% solve_constraints
>       + 0.57% calculate_live_ranges
>       + 0.32% build_ssa_conflict_graph
>       + 0.29% mark_all_vars_used_1
>       + 0.20% insert_phi_nodes_for
>       + 0.16% create_coalesce_list_for_region
> 
> 
> Boots

Re: [PATCH] Improve -dumpversion and -dumpfullversion documentation

2018-10-18 Thread Jeff Law
On 10/18/18 9:43 AM, Jonathan Wakely wrote:
> * doc/invoke.texi (-dumpversion): Improve grammar.
> (-dumpfullversion): Make more consistent with -dumpversion.
> 
> OK for trunk?
> 
> 
> 
> patch.txt
> 
> commit 67e1782be13b180e537fcf56aa041cd199b38ae9
> Author: Jonathan Wakely 
> Date:   Thu Oct 18 16:40:00 2018 +0100
> 
> Improve -dumpversion and -dumpfullversion documentation
> 
> * doc/invoke.texi (-dumpversion): Improve grammar.
> (-dumpfullversion): Make more consistent with -dumpversion.
OK
jeff


Re: [PATCH] add udivhi3, umodhi3 functions to libgcc

2018-10-18 Thread Jeff Law
On 10/17/18 5:48 PM, Paul Koning wrote:
> This is a revision of a patch I proposed a while back, to add udivhi3 and
> umodhi3 functions to libgcc since some platforms (like pdp11) need them.  The
> code is adapted from that of udivsi3.
> 
> In earlier discussion it was pointed out that internal functions need to 
> start with __.  The code I had copied does not do that, so I corrected mine 
> and also changed the existing code to conform to the rules.
> 
> Ok for trunk?
> 
>   paul
> 
> ChangeLog:
> 
> 2018-10-17  Paul Koning  
> 
>   * udivmodsi4.c (__udivmodsi4): Rename to conform to coding
>   standard.
>   * udivmod.c: Update references to __udivmodsi4.
>   * udivhi3.c: New file.
>   * udivmodhi4.c: New file.
>   * config/pdp11/t-pdp11 (LIB2ADD): Add the new files.
I think you need to fix divmod.c as well since it calls udivmodsi4.  OK
with that fixed.

Jeff


Re: [01/10] Expand COMPLETE_TYPE_P in obvious checks for null

2018-10-18 Thread Jeff Law
On 10/15/18 8:31 AM, Richard Sandiford wrote:
> Some tests for COMPLETE_TYPE_P are just protecting against a null
> TYPE_SIZE or TYPE_SIZE_UNIT.  Rather than replace them with a new
> macro, it seemed clearer to write out the underlying test.
> 
> 2018-10-15  Richard Sandiford  
> 
> gcc/
>   * calls.c (initialize_argument_information): Replace COMPLETE_TYPE_P
>   with checks for null.
>   * config/aarch64/aarch64.c (aapcs_vfp_sub_candidate): Likewise.
>   * config/arm/arm.c (aapcs_vfp_sub_candidate): Likewise.
>   * config/powerpcspe/powerpcspe.c (rs6000_aggregate_candidate):
>   Likewise.
>   * config/riscv/riscv.c (riscv_flatten_aggregate_field): Likewise.
>   * config/rs6000/rs6000.c (rs6000_aggregate_candidate): Likewise.
>   * expr.c (expand_assignment, safe_from_p): Likewise.
>   (expand_expr_real_1): Likewise.
>   * tree-data-ref.c (initialize_data_dependence_relation): Likewise.
>   * tree-sra.c (maybe_add_sra_candidate): Likewise.
>   (find_param_candidates): Likewise.
>   * tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
>   * tree-vrp.c (vrp_prop::check_mem_ref): Likewise.
> 
> gcc/lto/
>   * lto-symtab.c (warn_type_compatibility_p): Likewise.
This seems largely independent of the larger changes you're trying to
make and could stand on its own.

I'm going to assume that you evaluated these instances of
COMPLETE_TYPE_P and determined that type completeness isn't really a
factor in the affected code.

OK if nobody objects by Monday.

jeff


Re: [PATCH] add udivhi3, umodhi3 functions to libgcc

2018-10-18 Thread Paul Koning



> On Oct 18, 2018, at 1:18 PM, Jeff Law  wrote:
> 
> On 10/17/18 5:48 PM, Paul Koning wrote:
>> This is a revision of a patch I proposed a while back, to add udivhi3 and
>> umodhi3 functions to libgcc since some platforms (like pdp11) need them.  The
>> code is adapted from that of udivsi3.
>> 
>> In earlier discussion it was pointed out that internal functions need to 
>> start with __.  The code I had copied does not do that, so I corrected mine 
>> and also changed the existing code to conform to the rules.
>> 
>> Ok for trunk?
>> 
>>  paul
>> 
>> ChangeLog:
>> 
>> 2018-10-17  Paul Koning  
>> 
>>  * udivmodsi4.c (__udivmodsi4): Rename to conform to coding
>>  standard.
>>  * udivmod.c: Update references to __udivmodsi4.
>>  * udivhi3.c: New file.
>>  * udivmodhi4.c: New file.
>>  * config/pdp11/t-pdp11 (LIB2ADD): Add the new files.
> I think you need to fix divmod.c as well since it calls udivmodsi4.  OK
> with that fixed.
> 
> Jeff

Thanks.  Committed as shown below.

paul

ChangeLog:

2018-10-18  Paul Koning  

* udivmodsi4.c (__udivmodsi4): Rename to conform to coding
standard.
* divmod.c: Update references to __udivmodsi4.
* udivmod.c: Ditto.
* udivhi3.c: New file.
* udivmodhi4.c: New file.
* config/pdp11/t-pdp11 (LIB2ADD): Add the new files.

Index: config/pdp11/t-pdp11
===
--- config/pdp11/t-pdp11(revision 265276)
+++ config/pdp11/t-pdp11(working copy)
@@ -1,5 +1,7 @@
 LIB2ADD = $(srcdir)/udivmod.c \
  $(srcdir)/udivmodsi4.c \
+ $(srcdir)/udivhi3.c \
+ $(srcdir)/udivmodhi4.c \
  $(srcdir)/memcmp.c \
  $(srcdir)/memcpy.c \
  $(srcdir)/memmove.c \
Index: divmod.c
===
--- divmod.c(revision 265276)
+++ divmod.c(working copy)
@@ -21,7 +21,8 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
-long udivmodsi4 ();
+extern unsigned long __udivmodsi4(unsigned long num, unsigned long den,
+                                 int modwanted);
 
 long
 __divsi3 (long a, long b)
@@ -41,7 +42,7 @@ __divsi3 (long a, long b)
   neg = !neg;
 }
 
-  res = udivmodsi4 (a, b, 0);
+  res = __udivmodsi4 (a, b, 0);
 
   if (neg)
 res = -res;
@@ -64,7 +65,7 @@ __modsi3 (long a, long b)
   if (b < 0)
 b = -b;
 
-  res = udivmodsi4 (a, b, 1);
+  res = __udivmodsi4 (a, b, 1);
 
   if (neg)
 res = -res;
Index: udivhi3.c
===
--- udivhi3.c   (nonexistent)
+++ udivhi3.c   (working copy)
@@ -0,0 +1,38 @@
+/* Copyright (C) 2000-2018 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+extern unsigned short __udivmodhi4(unsigned short num, unsigned short den,
+                                   int modwanted);
+
+unsigned short
+__udivhi3 (unsigned short a, unsigned short b)
+{
+  return __udivmodhi4 (a, b, 0);
+}
+
+unsigned short
+__umodhi3 (unsigned short a, unsigned short b)
+{
+  return __udivmodhi4 (a, b, 1);
+}
+
Index: udivmod.c
===
--- udivmod.c   (revision 265276)
+++ udivmod.c   (working copy)
@@ -21,17 +21,18 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
-long udivmodsi4 ();
+extern unsigned long __udivmodsi4(unsigned long num, unsigned long den,
+                                 int modwanted);
 
 long
 __udivsi3 (long a, long b)
 {
-  return udivmodsi4 (a, b, 0);
+  return __udivmodsi4 (a, b, 0);
 }
 
 long
 __umodsi3 (long a, long b)
 {
-  return udivmodsi4 (a, b, 1);
+  return __udivmodsi4 (a, b, 1);
 }
 
Index: udivmodhi4.c
===
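
The body of udivmodhi4.c is cut off in this archive.  As a rough illustration only, and not the committed file, a 16-bit shift-and-subtract divider in the style the mail describes ("adopted from that of udivsi3") might look like the sketch below.  Every sketch_* name is hypothetical, and the result for a zero divisor is simply whatever the loops happen to produce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not the committed libgcc code: 16-bit unsigned
   shift-and-subtract division.  Returns the quotient when MODWANTED is
   zero and the remainder otherwise.  */
uint16_t
sketch_udivmodhi4 (uint16_t num, uint16_t den, int modwanted)
{
  uint16_t bit = 1;
  uint16_t res = 0;

  /* Line the divisor up with the dividend, stopping before its top bit
     would be shifted out.  */
  while (den < num && bit != 0 && !(den & 0x8000u))
    {
      den = (uint16_t) (den << 1);
      bit = (uint16_t) (bit << 1);
    }

  /* Subtract the shifted divisor wherever it fits, one quotient bit at
     a time.  */
  while (bit != 0)
    {
      if (num >= den)
        {
          num = (uint16_t) (num - den);
          res = (uint16_t) (res | bit);
        }
      bit = (uint16_t) (bit >> 1);
      den = (uint16_t) (den >> 1);
    }

  return modwanted ? num : res;
}

/* Thin wrappers mirroring the shape of udivhi3.c in the patch.  */
uint16_t
sketch_udivhi3 (uint16_t a, uint16_t b)
{
  return sketch_udivmodhi4 (a, b, 0);
}

uint16_t
sketch_umodhi3 (uint16_t a, uint16_t b)
{
  return sketch_udivmodhi4 (a, b, 1);
}
```

For example, sketch_udivhi3 (100, 7) yields 14 and sketch_umodhi3 (100, 7) yields 2.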

[gomp5] Small testcase fix

2018-10-18 Thread Jakub Jelinek
Hi!

I've realized there is UB in this testcase, because taskloop simd
non-collapsed iterator is linear on simd, which implies lastprivate on
taskloop, but with nogroup the last iteration's value might be stored
when bar is out of scope already.

Fixed by declaring it in the construct, so that nothing is written anywhere
(another possibility would be private (i)).

2018-10-18  Jakub Jelinek  

* testsuite/libgomp.c-c++-common/taskloop-reduction-3.c (bar): Define
iterator inside of the construct.

--- libgomp/testsuite/libgomp.c-c++-common/taskloop-reduction-3.c.jj	2018-10-18 13:49:18.531282016 +0200
+++ libgomp/testsuite/libgomp.c-c++-common/taskloop-reduction-3.c	2018-10-18 20:38:55.501725608 +0200
@@ -19,9 +19,8 @@ foo (void)
 __attribute__((noipa)) void
 bar (int x)
 {
-  int i;
   #pragma omp taskloop simd in_reduction (+:n) grainsize (64) nogroup
-  for (i = (x & 1) * (N / 2); i < (x & 1) * (N / 2) + (N / 2); i++)
+  for (int i = (x & 1) * (N / 2); i < (x & 1) * (N / 2) + (N / 2); i++)
 n += 2 * u[i];
 }
 

Jakub


Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread Richard Sandiford
"H.J. Lu"  writes:
> On 10/18/18, Richard Sandiford  wrote:
>> "H.J. Lu"  writes:
>>> On 10/18/18, Richard Sandiford  wrote:
>>>> "H.J. Lu"  writes:
>>>>> On 10/18/18, Richard Sandiford  wrote:
>>>>>> "H.J. Lu"  writes:
>>>>>>> On 10/17/18, Marc Glisse  wrote:
>>>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>>>>>
>>>>>>>>> We may simplify
>>>>>>>>>
>>>>>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>>>>>
>>>>>>>>> to X when mode of X is the same as of mode of subreg.
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> we already have code to simplify vec_select(vec_merge):
>>>>>>>>
>>>>>>>>  /* If we select elements in a vec_merge that all come from the same
>>>>>>>> operand, select from that operand directly.  */
>>>>>>>>
>>>>>>>> It would make sense to me to make the subreg transform as similar to it as
>>>>>>>> possible, in particular you don't need to special case vec_duplicate, the
>>>>>>>> transformation would see that everything comes from the first vector,
>>>>>>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>>>>>>> optimize that.
>>>>>>
>>>>>> Sorry, didn't see this before the OK.
>>>>>>
>>>>>>> What do you mean by another transformation? If simplify_subreg doesn't
>>>>>>> return X for
>>>>>>>
>>>>>>>   (subreg (vec_merge (vec_duplicate X)
>>>>>>>  (vector)
>>>>>>>  (const_int ((1 << N) | M)))
>>>>>>>   (N * sizeof (X)))
>>>>>>>
>>>>>>>
>>>>>>> no further transformation will be done.
>>>>>>
>>>>>> I think the point was that we should transform:
>>>>>>
>>>>>>   (subreg (vec_merge X
>>>>>>   (vector)
>>>>>>   (const_int ((1 << N) | M)))
>>>>>>    (N * sizeof (X)))
>>>>>>
>>>>>> into:
>>>>>>
>>>>>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>>>>>
>>>>>> which should further simplify when X is a vec_duplicate.
>>>>>
>>>>> But sizeof (X) is the size of scalar of vec_dup.  How do we
>>>>> check the mask of vec_merge?
>>>>
>>>> Yeah, should be sizeof (outermode) (which was the same thing
>>>> in the original pattern, but not here).
>>>>
>>>> Richard
>>>>
>>>
>>> Like this
>>>
>>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>>> index b0cf3bbb2a9..e12b5c0e165 100644
>>> --- a/gcc/simplify-rtx.c
>>> +++ b/gcc/simplify-rtx.c
>>> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx op,
>>>return NULL_RTX;
>>>  }
>>>
>>> -  /* Return X for
>>> -  (subreg (vec_merge (vec_duplicate X)
>>> +  /* Simplify
>>> +  (subreg (vec_merge (X)
>>> (vector)
>>> (const_int ((1 << N) | M)))
>>> - (N * sizeof (X)))
>>> + (N * sizeof (outermode)))
>>> + to
>>> +  (subreg ((X) (N * sizeof (outermode)))
>>
>> Stray "(": (subreg (X) (N * sizeof (outermode)))
>>
>> OK with that change if it passes testing.
>
> The self-test failed for 32-bit compiler:
>
> expected: (reg:QI 342)
>   actual: (subreg:QI (vec_merge:V128QI (vec_duplicate:V128QI (reg:QI 342))
> (reg:V128QI 343)
> (const_int 65 [0x41])) 64)
>
> since
>
> && (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0)
>
> works only up to vectors with 64 elements for 32-bit compilers.

Since HOST_WIDE_INT should be 64 bits for both 32-bit and 64-bit
compilers, I guess the test invoked UB and so only worked by chance
for 64-bit compilers?

> Should we limit the self-test to vectors with 64 elements?

HOST_BITS_PER_WIDE_INT elements probably.  I wondered at first whether
we should try to support CONST_WIDE_INT, but that just changes the limit
to WIDE_INT_MAX_PRECISION.

Thanks,
Richard
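
To make the shift problem concrete: in the failing self-test the subreg byte offset is 64, so the element index is 64 and the mask test shifts a 64-bit value by 64 bits, which is undefined in C.  The following is a minimal stand-alone sketch of a guarded test, assuming a 64-bit mask; sketch_mask_bit_set is a hypothetical helper, not a GCC function.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: report whether bit IDX of a
   64-bit vec_merge-style mask is set.  Shifting a 64-bit value by 64
   or more is undefined in C, so out-of-range indices are rejected
   before the shift.  */
int
sketch_mask_bit_set (uint64_t mask, unsigned int idx)
{
  const unsigned int width = sizeof (uint64_t) * CHAR_BIT;

  if (idx >= width)
    return 0;
  return (int) ((mask >> idx) & 1u);
}
```

With the mask 0x41 from the self-test dump, indices 0 and 6 report set, while index 64 is rejected instead of being shifted.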


Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

2018-10-18 Thread Uecker, Martin

Hi Richard,

responding here to a couple of points.

For bignums and for a type-describing type 'type'
there were proposals (including from me) to implement
these as variable-sized types which have some restrictions,
i.e. they cannot be stored in a struct/union.

Most of the restrictions for these types would be the same
as proposed for your sizeless types. 

Because all these types fall into the same overall class
of types which do not have a size known at compile
time, I would suggest to add this concept to the standard
and then define your vector types as a subclass which
may have additional restrictions (no sizeof) instead
of adding a very specific concept which only works for
your proposal.

Best,
Martin





On Thursday, 18.10.2018 at 13:47 +0100, Richard Sandiford wrote:
> Joseph Myers  writes:
> > On Wed, 17 Oct 2018, Richard Sandiford wrote:
> > > Yeah, can't deny that if you look at it as a general-purpose extension.
> > > But that's not really what this is supposed to be.  It's fairly special
> > > purpose: there has to be some underlying variable-length/sizeless
> > > built-in type that you want to provide via a library.
> > > 
> > > What the extension allows is enough to support the intended use case,
> > > and it does that with no enforced overhead.
> > 
> > Part of my point is that there are various *other* possible cases of 
> > non-VLA-variable-size-type people have suggested in WG14 reflector 
> > discussions - so any set of concepts for such types ought to take into 
> > account more than just the SVE use case (even if other use cases need 
> > further concepts added on top of the ones needed for SVE).
> 
> [Answered this in the other thread -- sorry, took me a while to go
> through the full discussion.]
> 
> > > > Surely, the processor knows the size when it computes using these
> > > > types, so one could make it available using 'sizeof'.
> > > 
> > > The argument's similar here: we don't really need sizeof to be available
> > > for vector use because the library provides easy ways of getting
> > > vector-length-based constants.  Usually what you want to know is
> > > "how many elements of type X are there?", with bytes just being one
> > > of the available element sizes.
> > 
> > But if having sizeof available makes for a more natural language feature 
> > (one where a few places referencing VLAs need to change to reference a 
> > more general class of variable-size types, and a few constraints on VLAs 
> > and variably modified types need to be relaxed to allow what you want with 
> > these types), that may be a case for doing so, even if sizeof won't 
> > generally be used.
> 
> I agree that might be all that's needed in C.  But since C++ doesn't
> even have VLAs yet (and since something less ambitious than VLAs was
> rejected) the situation is very different there.
> 
> I think we'd need a compelling reason to make sizeof variable in C++.
> The fact that it isn't going to be generally used for SVE anyway
> would undercut that.
> 
> > If the processor in fact knows the size, do you actually need to include 
> > it in the object to be able to provide it when sizeof is called?  (With 
> > undefined behavior still present if passing the object from a thread with 
> > one value of sizeof for that type to a thread with a different value of 
> > sizeof for that type, of course - the rule on VLA type compatibility would 
> > still need to be extended to apply to sizes of these types, and those they 
> > contain, recursively.)
> 
> No, if we go the undefined behaviour route, we wouldn't need to store it.
> This was just to answer Martin's suggestion that we could make sizeof(x)
> do the right thing for a sizeless object x by storing the size with x.
> 
> Thanks,
> Richard

Go patch committed: Drop semicolons in export data

2018-10-18 Thread Ian Lance Taylor
This is the first in a series of patches to start changing gccgo's
export data to support type indexing and, eventually, cross-package
inlining of simple functions.  (This is an inlining approach different
than LTO that relies on recording the function body in the export
data).  This patch doesn't do much, just drops some unnecessary
semicolons in the export data.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 264985)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e32e9aaee598eeb43f9616cf6ca1d11acaa9d167
+0494dc5737f0c89ad6f45e04e8313e4161678861
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/export.cc
===
--- gcc/go/gofrontend/export.cc (revision 264813)
+++ gcc/go/gofrontend/export.cc (working copy)
@@ -26,14 +26,18 @@ const int Export::magic_len;
 // Current version magic string.
 const char Export::cur_magic[Export::magic_len] =
   {
-'v', '2', ';', '\n'
+'v', '3', ';', '\n'
   };
 
-// Magic string for previous version (still supported)
+// Magic strings for previous versions (still supported).
 const char Export::v1_magic[Export::magic_len] =
   {
 'v', '1', ';', '\n'
   };
+const char Export::v2_magic[Export::magic_len] =
+  {
+'v', '2', ';', '\n'
+  };
 
 const int Export::checksum_len;
 
@@ -147,7 +151,7 @@ Export::export_globals(const std::string
   // The package name.
   this->write_c_string("package ");
   this->write_string(package_name);
-  this->write_c_string(";\n");
+  this->write_c_string("\n");
 
   // The prefix or package path, used for all global symbols.
   if (prefix.empty())
@@ -161,7 +165,7 @@ Export::export_globals(const std::string
   this->write_c_string("prefix ");
   this->write_string(prefix);
 }
-  this->write_c_string(";\n");
+  this->write_c_string("\n");
 
   this->write_packages(packages);
 
@@ -191,7 +195,7 @@ Export::export_globals(const std::string
   dig = c & 0xf;
   s += dig < 10 ? '0' + dig : 'A' + dig - 10;
 }
-  s += ";\n";
+  s += "\n";
   this->stream_->write_checksum(s);
 }
 
@@ -233,7 +237,7 @@ Export::write_packages(const std::mapwrite_string((*p)->pkgpath());
   this->write_c_string(" ");
   this->write_string((*p)->pkgpath_symbol());
-  this->write_c_string(";\n");
+  this->write_c_string("\n");
 }
 }
 
@@ -271,7 +275,7 @@ Export::write_imports(const std::mapwrite_string(p->second->pkgpath());
   this->write_c_string(" \"");
   this->write_string(p->first);
-  this->write_c_string("\";\n");
+  this->write_c_string("\"\n");
 
   this->packages_.insert(p->second);
 }
@@ -347,7 +351,7 @@ Export::write_imported_init_fns(const st
 
   if (imported_init_fns.empty())
 {
-  this->write_c_string(";\n");
+  this->write_c_string("\n");
   return;
 }
 
@@ -394,7 +398,7 @@ Export::write_imported_init_fns(const st
it->second.push_back(ii->init_name());
}
 }
-  this->write_c_string(";\n");
+  this->write_c_string("\n");
 
   // Create the init graph. Start by populating the graph with
   // all the edges we inherited from imported packages.
@@ -494,7 +498,7 @@ Export::write_imported_init_fns(const st
  this->write_unsigned(sink);
}
 }
-  this->write_c_string(";\n");
+  this->write_c_string("\n");
 }
 
 // Write a name to the export stream.
Index: gcc/go/gofrontend/export.h
===
--- gcc/go/gofrontend/export.h  (revision 264813)
+++ gcc/go/gofrontend/export.h  (working copy)
@@ -57,7 +57,8 @@ enum Export_data_version {
   EXPORT_FORMAT_UNKNOWN = 0,
   EXPORT_FORMAT_V1 = 1,
   EXPORT_FORMAT_V2 = 2,
-  EXPORT_FORMAT_CURRENT = EXPORT_FORMAT_V2
+  EXPORT_FORMAT_V3 = 3,
+  EXPORT_FORMAT_CURRENT = EXPORT_FORMAT_V3
 };
 
 // This class manages exporting Go declarations.  It handles the main
@@ -119,9 +120,10 @@ class Export : public String_dump
   // Size of export data magic string (which includes version number).
   static const int magic_len = 4;
 
-  // Magic strings (current version and older v1 version).
+  // Magic strings (current version and older versions).
   static const char cur_magic[magic_len];
   static const char v1_magic[magic_len];
+  static const char v2_magic[magic_len];
 
   // The length of the checksum string.
   static const int checksum_len = 20;
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 264813)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -5391,7 +5391,7 @@ Function::export_func_with_type(Export*
  exp->write_c_string(")");
}
 }
-  exp->write_c_string(";\n");
+  exp->wri

Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

2018-10-18 Thread Richard Sandiford
"Uecker, Martin"  writes:
> Hi Richard,
>
> responding here to a couple of points.
>
> For bignums and for a type-describing type 'type'
> there were proposals (including from me) to implement
> these as variable-sized types which have some restrictions,
> i.e. they cannot be stored in a struct/union.

But do you mean variable-sized types in the sense that they
are completely self-contained and don't refer to separate storage?
I.e. the moral equivalent of:

  1: struct { int size; int contents[size]; };

rather than either:

  2: struct { int size; int *contents; };

or:

  3: union {
   // embedded storage up to N bits (N constant)
   // description of separately-allocated storage (for >N bits)
 };

?  If so, how would that work with the example I gave in the earlier
message:

bignum x = ...;
for (int i = 0; i < var; ++i)
  x += x;

Each time the addition result grows beyond the original size of x,
I assume you'd need to allocate a new stack bignum for the new size,
which would result in a series of ever-increasing allocas.  Won't that
soon blow the stack?

Option 3 (as for LLVM's APInt) seems far less surprising, and can
be made efficient for a chosen N.  What makes it difficult for C
isn't the lack of general variable-length types but the lack of
user-defined constructor, destructor, copy and move operations.
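
To make option 3 concrete, here is a rough C++ sketch of the idea. It is not
LLVM's APInt: the class name SketchBigUInt and its members are made up for
illustration, and a real implementation would overlay the two storage forms
in a union rather than keep both members.  The point is only that sizeof
stays fixed while the value grows, and that the heap words are released by
the destructor, which is the part C has no equivalent for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of "embedded storage up to N bits, separate storage
// beyond that", with N = 64.  Not taken from any existing library.
class SketchBigUInt
{
  std::uint64_t inline_word_ = 0;          // value when it fits in 64 bits
  std::vector<std::uint64_t> heap_words_;  // little-endian limbs otherwise

public:
  explicit SketchBigUInt(std::uint64_t v) : inline_word_(v) { }

  bool is_inline() const { return heap_words_.empty(); }

  std::size_t word_count() const
  { return is_inline() ? 1 : heap_words_.size(); }

  std::uint64_t word(std::size_t i) const
  { return is_inline() ? inline_word_ : heap_words_[i]; }

  // Equivalent of "x += x".  Spills to the heap once 64 bits overflow.
  void double_in_place()
  {
    if (is_inline())
      {
        if ((inline_word_ >> 63) == 0)
          {
            inline_word_ <<= 1;
            return;
          }
        // Top bit is set: the result needs a second limb.
        heap_words_ = { inline_word_ << 1, std::uint64_t{1} };
        return;
      }
    std::uint64_t carry = 0;
    for (std::uint64_t& w : heap_words_)
      {
        std::uint64_t next = w >> 63;
        w = (w << 1) | carry;
        carry = next;
      }
    if (carry)
      heap_words_.push_back(1);
  }
};
```

With this shape the loop above never touches the stack beyond the fixed-size
object itself; growth happens in storage owned by the object.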

Thanks,
Richard

> Most of the restrictions for these types would be the same
> as proposed for your sizeless types. 
>
> Because all these types fall into the same overall class
> of types which do not have a size known at compile
> time, I would suggest to add this concept to the standard
> and then define your vector types as a subclass which
> may have additional restrictions (no sizeof) instead
> of adding a very specific concept which only works for
> your proposal.
>
> Best,
> Martin
>
>
>
>
>
> On Thursday, 18.10.2018 at 13:47 +0100, Richard Sandiford wrote:
>> Joseph Myers  writes:
>> > On Wed, 17 Oct 2018, Richard Sandiford wrote:
>> > > Yeah, can't deny that if you look at it as a general-purpose extension.
>> > > But that's not really what this is supposed to be.  It's fairly special
>> > > purpose: there has to be some underlying variable-length/sizeless
>> > > built-in type that you want to provide via a library.
>> > > 
>> > > What the extension allows is enough to support the intended use case,
>> > > and it does that with no enforced overhead.
>> > 
>> > Part of my point is that there are various *other* possible cases of 
>> > non-VLA-variable-size-type people have suggested in WG14 reflector 
>> > discussions - so any set of concepts for such types ought to take into 
>> > account more than just the SVE use case (even if other use cases need 
>> > further concepts added on top of the ones needed for SVE).
>> 
>> [Answered this in the other thread -- sorry, took me a while to go
>> through the full discussion.]
>> 
>> > > > Surely, the processor knows the size when it computes using these
>> > > > types, so one could make it available using 'sizeof'.
>> > > 
>> > > The argument's similar here: we don't really need sizeof to be available
>> > > for vector use because the library provides easy ways of getting
>> > > vector-length-based constants.  Usually what you want to know is
>> > > "how many elements of type X are there?", with bytes just being one
>> > > of the available element sizes.
>> > 
>> > But if having sizeof available makes for a more natural language feature 
>> > (one where a few places referencing VLAs need to change to reference a 
>> > more general class of variable-size types, and a few constraints on VLAs 
>> > and variably modified types need to be relaxed to allow what you want with 
>> > these types), that may be a case for doing so, even if sizeof won't 
>> > generally be used.
>> 
>> I agree that might be all that's needed in C.  But since C++ doesn't
>> even have VLAs yet (and since something less ambitious than VLAs was
>> rejected) the situation is very different there.
>> 
>> I think we'd need a compelling reason to make sizeof variable in C++.
>> The fact that it isn't going to be generally used for SVE anyway
>> would undercut that.
>> 
>> > If the processor in fact knows the size, do you actually need to include 
>> > it in the object to be able to provide it when sizeof is called?  (With 
>> > undefined behavior still present if passing the object from a thread with 
>> > one value of sizeof for that type to a thread with a different value of 
>> > sizeof for that type, of course - the rule on VLA type compatibility would 
>> > still need to be extended to apply to sizes of these types, and those they 
>> > contain, recursively.)
>> 
>> No, if we go the undefined behaviour route, we wouldn't need to store it.
>> This was just to answer Martin's suggestion that we could make sizeof(x)
>> do the right thing for a sizeless object x by storing the size with x.
>> 
>> Thanks,
>> Richard


[PATCH] PR libstdc++/87642 handle multibyte thousands separators from libc

2018-10-18 Thread Jonathan Wakely

If a locale's THOUSANDS_SEP or MON_THOUSANDS_SEP string is not a
single character we either need to narrow it to a single char or
ignore it (and therefore disable digit grouping for that facet).

PR libstdc++/87642
* config/locale/gnu/monetary_members.cc
(moneypunct::_M_initialize_moneypunct): Use
__narrow_multibyte_chars to convert multibyte thousands separators
to a single char.
* config/locale/gnu/numeric_members.cc
(numpunct::_M_initialize_numpunct): Likewise.
(__narrow_multibyte_chars): New function.

Tested x86_64-linux, committed to trunk.
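
For readers skimming the thread, the decision described above can be sketched
in isolation as follows.  This is only an illustration, not the committed
code: the helper names are hypothetical, the codeset check is left out, and
the iconv-based fallback used in the patch is replaced by simply returning
'\0' (meaning "no separator, so no grouping").  The three UTF-8 byte
sequences are the known cases the patch special-cases.

```cpp
#include <cassert>
#include <cstring>

// True when the separator string holds more than one byte, i.e. it cannot
// be stored directly in a single char.
inline bool is_multibyte_separator(const char* sep)
{
  return sep[0] != '\0' && sep[1] != '\0';
}

// Hypothetical helper: map a few known UTF-8 separators to one ASCII char.
// Returns '\0' when the sequence is not one of the known cases.
inline char narrow_known_utf8_separator(const char* sep)
{
  if (!std::strcmp(sep, "\xE2\x80\xAF")) // U+202F NARROW NO-BREAK SPACE
    return ' ';
  if (!std::strcmp(sep, "\xE2\x80\x99")) // U+2019 RIGHT SINGLE QUOTATION MARK
    return '\'';
  if (!std::strcmp(sep, "\xD9\xAC"))     // U+066C ARABIC THOUSANDS SEPARATOR
    return '\'';
  return '\0';
}

// Single-byte (or empty) separators are used as-is; longer ones are
// narrowed, or dropped if no narrowing is known.
inline char pick_thousands_sep(const char* sep)
{
  return is_multibyte_separator(sep)
    ? narrow_known_utf8_separator(sep)
    : *sep;
}
```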


commit a8278bf69de1e5f5191b5fd434084eac7db2a1cc
Author: Jonathan Wakely 
Date:   Thu Oct 18 16:26:24 2018 +0100

PR libstdc++/87642 handle multibyte thousands separators from libc

If a locale's THOUSANDS_SEP or MON_THOUSANDS_SEP string is not a
single character we either need to narrow it to a single char or
ignore it (and therefore disable digit grouping for that facet).

PR libstdc++/87642
* config/locale/gnu/monetary_members.cc
(moneypunct::_M_initialize_moneypunct): Use
__narrow_multibyte_chars to convert multibyte thousands separators
to a single char.
* config/locale/gnu/numeric_members.cc
(numpunct::_M_initialize_numpunct): Likewise.
(__narrow_multibyte_chars): New function.

diff --git a/libstdc++-v3/config/locale/gnu/monetary_members.cc 
b/libstdc++-v3/config/locale/gnu/monetary_members.cc
index b3e7645385a..212c68dd501 100644
--- a/libstdc++-v3/config/locale/gnu/monetary_members.cc
+++ b/libstdc++-v3/config/locale/gnu/monetary_members.cc
@@ -207,6 +207,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 #endif
 
+  extern char __narrow_multibyte_chars(const char* s, __locale_t cloc);
+
   template<>
 void
 moneypunct::_M_initialize_moneypunct(__c_locale __cloc,
@@ -241,8 +243,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Named locale.
  _M_data->_M_decimal_point = *(__nl_langinfo_l(__MON_DECIMAL_POINT,
__cloc));
- _M_data->_M_thousands_sep = *(__nl_langinfo_l(__MON_THOUSANDS_SEP,
-   __cloc));
+ const char* thousands_sep = __nl_langinfo_l(__MON_THOUSANDS_SEP,
+ __cloc);
+ if (thousands_sep[0] != '\0' && thousands_sep[1] != '\0')
+   _M_data->_M_thousands_sep = __narrow_multibyte_chars(thousands_sep,
+__cloc);
+ else
+   _M_data->_M_thousands_sep = *thousands_sep;
 
  // Check for NULL, which implies no fractional digits.
  if (_M_data->_M_decimal_point == '\0')
diff --git a/libstdc++-v3/config/locale/gnu/numeric_members.cc 
b/libstdc++-v3/config/locale/gnu/numeric_members.cc
index 1ede8fadbd0..faa35777cf3 100644
--- a/libstdc++-v3/config/locale/gnu/numeric_members.cc
+++ b/libstdc++-v3/config/locale/gnu/numeric_members.cc
@@ -30,11 +30,62 @@
 
 #include 
 #include 
+#include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  extern char __narrow_multibyte_chars(const char* s, __locale_t cloc);
+
+// This file might be compiled twice, but we only want to define this once.
+#if ! _GLIBCXX_USE_CXX11_ABI
+  char
+  __narrow_multibyte_chars(const char* s, __locale_t cloc)
+  {
+const char* codeset = __nl_langinfo_l(CODESET, cloc);
+if (!strcmp(codeset, "UTF-8"))
+  {
+   // optimize for some known cases
+   if (!strcmp(s, "\u202F")) // NARROW NO-BREAK SPACE
+ return ' ';
+   if (!strcmp(s, "\u2019")) // RIGHT SINGLE QUOTATION MARK
+ return '\'';
+   if (!strcmp(s, "\u066C")) // ARABIC THOUSANDS SEPARATOR
+ return '\'';
+  }
+
+iconv_t cd = iconv_open("ASCII//TRANSLIT", codeset);
+if (cd != (iconv_t)-1)
+  {
+   char c1;
+   size_t inbytesleft = strlen(s);
+   size_t outbytesleft = 1;
+   char* inbuf = const_cast<char*>(s);
+   char* outbuf = &c1;
+   size_t n = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
+   iconv_close(cd);
+   if (n != (size_t)-1)
+ {
+   cd = iconv_open(codeset, "ASCII");
+   if (cd != (iconv_t)-1)
+ {
+   char c2;
+   inbuf = &c1;
+   inbytesleft = 1;
+   outbuf = &c2;
+   outbytesleft = 1;
+   n = iconv(cd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
+   iconv_close(cd);
+   if (n != (size_t)-1)
+ return c2;
+ }
+ }
+  }
+return '\0';
+  }
+#endif
+
   template<>
 void
 numpunct::_M_initialize_numpunct(__c_locale __cloc)
@@ -63,8 +114,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Named locale.
  _M_data->_M_decimal_poi

[PATCH] Fix tests that fail when built with different options

2018-10-18 Thread Jonathan Wakely

* testsuite/20_util/duration/cons/2.cc: Add -ffloat-store to fix
failure when compiled without optimisation.
* testsuite/ext/profile/mutex_extensions_neg.cc: Prune additional
errors caused by C++17 std::pmr alias templates.

Tested x86_64-linux, committed to trunk.

commit fb8ac7087f24a0e7e21639d75cf5227f7c15c9a2
Author: Jonathan Wakely 
Date:   Thu Oct 18 21:03:31 2018 +0100

Fix tests that fail when built with different options

* testsuite/20_util/duration/cons/2.cc: Add -ffloat-store to fix
failure when compiled without optimisation.
* testsuite/ext/profile/mutex_extensions_neg.cc: Prune additional
errors caused by C++17 std::pmr alias templates.

diff --git a/libstdc++-v3/testsuite/20_util/duration/cons/2.cc 
b/libstdc++-v3/testsuite/20_util/duration/cons/2.cc
index 3f48f25f101..65b151f8b20 100644
--- a/libstdc++-v3/testsuite/20_util/duration/cons/2.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/cons/2.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-additional-options "-ffloat-store" { target { m68*-*-* || ia32 } } }
 
 // Copyright (C) 2008-2018 Free Software Foundation, Inc.
 //
diff --git a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc 
b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
index 147d56740a1..69cc1115b80 100644
--- a/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
+++ b/libstdc++-v3/testsuite/ext/profile/mutex_extensions_neg.cc
@@ -29,3 +29,5 @@
 
 // "template argument 1 is invalid"
 // { dg-prune-output "tuple:993" }
+// PMR alias templates cause ambiguities between debug and profile containers:
+// { dg-prune-output "is ambiguous" }


Re: [PATCH v2 1/3] or1k: libgcc: initial support for openrisc

2018-10-18 Thread Stafford Horne
On Thu, Oct 18, 2018 at 03:22:56PM +0200, Sebastian Huber wrote:
> Hello,
> 
> is there a chance to get the or1k support integrated before the GCC 9 stage
> 3?

Hello,

I would definitely like that, and that is my goal.  It seems the limiting factor
is getting technical review and sign-off on this set of patches.

I will send out a PATCH v3 with a few minor enhancements gathered since v2 today
or tomorrow.  Then I will try to ping a few people if I don't get reviews by next
week.

-Stafford

> -- 
> Sebastian Huber, embedded brains GmbH
> 
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax : +49 89 189 47 41-09
> E-Mail  : sebastian.hu...@embedded-brains.de
> PGP : Public key available on request.
> 
> This message is not a business communication within the meaning of the EHUG.
> 


Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

2018-10-18 Thread Joseph Myers
On Thu, 18 Oct 2018, Richard Sandiford wrote:

> - Type introspection for things like parsing format strings
> 
>   It sounded like the type descriptors would be fixed-sized types,
>   a bit like a C version of std::type_info.

It wasn't clear if people might also want to e.g. extract a list of all 
members of a structure type from such an object (which of course could 
either involve variable-sized data, or fixed-size data pointing to arrays, 
or something else along those lines).

> So I didn't see anything there that was really related, or anything that
> relied on sizeof being variable (which as I say seems to be a very high
> hurdle for C++).

The references you gave regarding the removal of one version of VLAs from 
C++ didn't seem to make clear whether there were supposed to be general 
issues with variable-size types fitting in the overall C++ object model, 
or whether the concerns were more specific to things in the particular 
proposal - but in either case, the SVE proposals would need to be compared 
to the actual specific concerns.

Anyway, the correct model in C++ need not be the same as the correct model 
in C.  For example, for decimal floating point, C++ chose a class-based 
model whereas C chose _Decimal* keywords (and then there's some compiler 
magic to use appropriate ABIs for std::decimal types, I think).

If you were implementing the SVE API for C++ for non-SVE hardware, you 
might have a class-based implementation where the class internally 
contains a pointer to underlying storage and does allocation / 
deallocation, for example - sizeof would give some fixed small size to the 
objects with that class type, but e.g. copying them with memcpy would not 
work correctly (and would be diagnosed with -Wclass-memaccess).  Is there 
something wrong with a model in C++ where these types have some fixed 
small sizeof (which carries through to sizeof for containing types), but 
where different ABIs are used for them, and where much the same raw memory 
operations on them are disallowed as would be disallowed for a class-based 
implementation?  (Whether implemented entirely in the compiler or through 
some combination of the compiler and class implementations in a header - 
though with the latter you might still need some new language feature, 
albeit only for use within the header rather than more generally.)
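
As a rough illustration of that class-based model (a sketch only; the class
name EmulatedVector and its interface are invented here and are not part of
any SVE API): the object has a small fixed sizeof, the lanes live in
separately allocated storage, and because the class is not trivially
copyable, raw memcpy of it is exactly the kind of operation that would be
diagnosed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical emulation of a vector type whose length is chosen at run
// time.  sizeof(EmulatedVector) is a compile-time constant regardless of
// how many lanes an object holds.
class EmulatedVector
{
  std::size_t lanes_;
  std::unique_ptr<float[]> data_;  // separately allocated lane storage

public:
  explicit EmulatedVector(std::size_t lanes)
  : lanes_(lanes), data_(new float[lanes]()) { }

  // Deep copy: the new object gets its own storage.
  EmulatedVector(const EmulatedVector& other)
  : lanes_(other.lanes_), data_(new float[other.lanes_])
  { std::copy(other.data_.get(), other.data_.get() + lanes_, data_.get()); }

  EmulatedVector& operator=(const EmulatedVector& other)
  {
    if (this != &other)
      {
        EmulatedVector tmp(other);
        std::swap(lanes_, tmp.lanes_);
        data_.swap(tmp.data_);
      }
    return *this;
  }

  std::size_t lanes() const { return lanes_; }
  float& operator[](std::size_t i) { return data_[i]; }
  float operator[](std::size_t i) const { return data_[i]; }
};

// Copying the bytes of such an object would alias the storage pointer,
// so the type must not be treated as raw memory.
static_assert(!std::is_trivially_copyable<EmulatedVector>::value,
              "raw memory copies of EmulatedVector are not valid");
```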

Even if that model doesn't work for some reason, it doesn't mean the only 
alternatives for C++ are something like VLAs or a new concept of sizeless 
types for C++ - but I don't have the C++ expertise to judge what other 
options for interfacing to SVE might fit best into the C++ language.

> I think it would look something like this (referring back to
> 
> *Object types are further partitioned into sized and
> sizeless; all basic and derived types defined in this standard are
> sized, but an implementation may provide additional sizeless types.*
> 
> in the RFC), not really in standardese yet:
> 
> Each implementation-specific sizeless type may have a set of
> implementation-specific "configurations".  The configuration of
> such a type may change in implementation-defined ways at any given
> sequence point.
> 
> The configuration of a sizeless structure is a tuple containing the
> configuration of each member.  Thus the configuration of a sizeless
> structure changes if and only if the configuration of one of its
> members changes.
> 
> The configuration of an object of sizeless type T is the configuration
> of T at the point that the object is created.
> 
> And then borrowing slightly from your 6.7.6.2#6 reference:
> 
> If an object of sizeless type T is accessed when T has a different
> configuration from the object, the behavior is undefined.
> 
> Is that the kind of thing you mean?

Yes.  But I wonder if it would be better to disallow such changing of 
configurations, so that all code in a program always uses the same 
configuration as far as the standard is concerned, so that there is indeed 
a size for a given vector type that's constant throughout the execution of 
a program (which would be used by calls to sizeof on such types), and so 
that communicating with a thread using a different configuration is just 
as much outside the scope of the defined language as processes using 
different ABIs communicating is today.

-- 
Joseph S. Myers
jos...@codesourcery.com


Fix std::byte namespace declaration

2018-10-18 Thread François Dumont
Current build of libstdc++ with --enable-symvers=gnu-versioned-namespace 
fails (at least under Linux) because of:


In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:39,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/cstddef:71:59: 
error: reference to 'byte' is ambiguous

   71 |   template<> struct __byte_operand { using __type = byte; };
  | ^~~~
In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:61,
 from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory:62,
 from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:37,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/cpp_type_traits.h:395:30: 
note: candidates are: 'enum class std::__8::byte'

  395 |   enum class byte : unsigned char;
  |  ^~~~
In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:39,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/cstddef:68:14: 
note:  'enum class std::byte'

   68 |   enum class byte : unsigned char {};
  |  ^~~~

I think the issue is that the std::byte declaration in cpp_type_traits.h was
made inside the versioned namespace, hence the attached patch.
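
A reduced illustration of the mechanism, using the made-up names demo and v8
in place of std and the versioned inline namespace: a forward declaration
inside the inline namespace would name demo::v8::byte, a different type from
the demo::byte that is defined later, and unqualified lookup of byte would
then find both.  Declaring and defining it in the same namespace avoids
that.  This is only a sketch of the approach taken by the patch, not the
library code.

```cpp
#include <type_traits>

namespace demo
{
  // Forward declaration in the enclosing namespace, matching where the
  // definition below lives.
  enum class byte : unsigned char;

  inline namespace v8
  {
    // If "enum class byte : unsigned char;" were declared here instead,
    // it would introduce demo::v8::byte and make "byte" ambiguous once
    // demo::byte is also defined.
    template<typename T>
      struct is_byte : std::false_type { };

    template<>
      struct is_byte<byte> : std::true_type { };
  }

  // The one and only definition.
  enum class byte : unsigned char { };
}

static_assert(demo::is_byte<demo::byte>::value,
              "the specialization refers to the same type as the definition");
```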


    * include/bits/cpp_type_traits.h (std::byte): Move outside versioned
    namespace.

Tested under Linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h b/libstdc++-v3/include/bits/cpp_type_traits.h
index 960d469f412..fa7fc7564c2 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -68,6 +68,10 @@ extern "C++" {
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
+#if __cplusplus >= 201703L
+  enum class byte : unsigned char;
+#endif
+
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   struct __true_type { };
@@ -392,8 +396,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 };
 
 #if __cplusplus >= 201703L
-  enum class byte : unsigned char;
-
   template<>
 struct __is_byte
 {



Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

2018-10-18 Thread Joseph Myers
On Thu, 18 Oct 2018, Uecker, Martin wrote:

> Most of the restrictions for these types would be the same
> as proposed for your sizeless types. 
> 
> Because all these types fall into the same overall class
> of types which do not have a size known at compile
> time, I would suggest to add this concept to the standard
> and then define your vector types as a subclass which
> may have additional restrictions (no sizeof) instead
> of adding a very specific concept which only works for
> your proposal.

And an underlying point here is:

Various people are exploring various ideas for C language and library 
features that might involve extending the kinds of types present in C.  
Maybe some of the ideas will turn out to be fundamentally flawed; maybe 
some will work with existing kinds of types rather than needing new kinds 
of variable-sized types.  But since all those ideas are currently under 
discussion in WG14, the SVE issues should be brought into the exploration 
process taking place there, with a view to getting a better-defined set of 
concepts for such types out of that process than from considering just one 
proposal for concepts for one set of requirements in the context of one 
implementation.

Once that discussion has resulted in a more generally applicable set of 
concepts, experience in implementing that set of concepts - likely various 
different people implementing them, in different C implementations, with a 
view to the different use cases they are exploring - could help inform any 
standardization of such features.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] v2: gccint.texi: add user experience guidelines

2018-10-18 Thread David Malcolm
On Tue, 2018-10-16 at 21:39 -0600, Sandra Loosemore wrote:
> On 10/12/2018 09:43 AM, David Malcolm wrote:
> > Here's a proposed "User Experience Guidelines" section for our
> > internals manual
> > 
> > It's a mixture of proposed policy, together with notes on how to
> > implement the recommendations.
> > 
> > Thoughts?
> 
> I think this documentation will be very helpful.  I'll leave other 
> people who've worked on this aspect of the code to comment on the 
> content, but a few markup/copy-editing things I noticed while
> skimming 
> the patch:
> 
> - Can you please not use double-quote markup around so many words
> and 
> phrases?  If there's a technical term, use @dfn{} at the first use
> where 
> you define it (and probably also an @cindex entry), and no markup on 
> subsequent uses.  In most other cases it seemed like the quotes
> would 
> just be distracting from the flow of the text.
> 
> - I don't think "end-user" should be hyphenated when used as a noun, 
> although as an adjective phrase like "end-user experience" etc is
> fine.
> 
> - Remember to use @noindent when continuing a sentence or paragraph 
> broken up by a code example.
> 
> I'll take a deeper dive on the next iteration of the patch.
> 
> -Sandra

Thanks.

Here's an updated version of the patch, addressing your above comments,
and those from Martin and Richard (I hope).

I have a couple of texinfo questions:

(a) the guidelines frequently have contrasting pairs
of examples showing how to do something vs how not to do it.  Is there
a way of marking these up in texinfo beyond just @smallexample?
(and manually putting in "BAD" and "OK", as I've done)

(b) what's the best way of showing example output from gcc?  In
particular I wasn't able to properly express the single quotes emitted by
GCC's %qs, %<, and %> directives: everything I've tried so far has issues
in at least one of the pdf vs the html output.  I've settled for using
single quotes, which is easy to emit via LANG=C and looks OK in html,
but less good in pdf.  (also, I'd love to be able to express colorization
sanely; see e.g.:
  https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00186.html
which takes colorized GCC output and turns it into HTML for our
website; e.g. gcc-8/changes.html).

Changed in v2:
* removed "Tense of Messages" section for now
* simplified "Given a warning, an end-user will want to review the
  warning and think:" to "Given a warning, an end-user will think:"
* reworded "Is this a ``true'' result?" to "Is this a real problem?"
* split out the testing idea into a "Try the diagnostic on real-world
  code" subsection and added note about precision of wording
* added example of output for OPT_duplicated_cond example
* added note about guarding the "inform"
* added example of using %H and %I
* added @noindent in many places; merged some paragraphs
* "end-user" -> "end user" in the 3 places it was used
* replaced many uses of double-quotes with @dfn{}, adding @cindex
* converted @cindex to lower-case
* fixed string literal syntax in -Wformat example
* added "Precision of Wording" section, and used the same example
  to illustrate the "don't use input_location" guideline.
* added more "// BAD" and "// OK" lines to examples
* fixed "are and" typo
* give rationale for why gcc_rich_location is preferable to
  rich_location

gcc/ChangeLog:
* Makefile.in (TEXI_GCCINT_FILES): Add ux.texi.
* doc/gccint.texi: Include ux.texi and use it in top-level menu.
* doc/ux.texi: New file.
---
 gcc/Makefile.in |   2 +-
 gcc/doc/gccint.texi |   2 +
 gcc/doc/ux.texi | 595 
 3 files changed, 598 insertions(+), 1 deletion(-)
 create mode 100644 gcc/doc/ux.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8a85d7e..e649b9d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3177,7 +3177,7 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi 
gcc-vers.texi \
 gnu.texi gpl_v3.texi fdl.texi contrib.texi languages.texi  \
 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi   \
 loop.texi generic.texi gimple.texi plugins.texi optinfo.texi   \
-match-and-simplify.texi poly-int.texi
+match-and-simplify.texi ux.texi poly-int.texi
 
 TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi \
 gcc-common.texi gcc-vers.texi
diff --git a/gcc/doc/gccint.texi b/gcc/doc/gccint.texi
index 1a1af41..2554b31 100644
--- a/gcc/doc/gccint.texi
+++ b/gcc/doc/gccint.texi
@@ -125,6 +125,7 @@ Additional tutorial information is linked to from
 * LTO:: Using Link-Time Optimization.
 
 * Match and Simplify:: How to write expression simplification patterns for 
GIMPLE and GENERIC
+* User Experience Guidelines:: Guidelines for implementing diagnostics and 
options.
 * Funding:: How to help assure funding for free software.
 * GNU Project:: The GNU Project and GNU/Linux.
 
@@ -162,6 +163,7 @@ Additional tutorial informati

Re: [00/10][RFC] Splitting the C and C++ concept of "complete type"

2018-10-18 Thread Uecker, Martin
On Thursday, 18.10.2018 at 20:53 +0100, Richard Sandiford wrote:
> "Uecker, Martin"  writes:
> > Hi Richard,
> > 
> > responding here to a couple of points.
> > 
> > For bignums and for a type-describing type 'type'
> > there were proposals (including from me) to implement
> > these as variable-sized types which have some restrictions,
> > i.e. they cannot be stored in a struct/union.
> 
> But do you mean variable-sized types in the sense that they
> are completely self-contained and don't refer to separate storage?
> I.e. the moral equivalent of:
> 
>   1: struct { int size; int contents[size]; };
> 
> rather than either:
> 
>   2: struct { int size; int *contents; };

I was thinking about 1 not 2. But I would leave this to the
implementation. If it can unwind the stack and free
all allocated storage automatically whenever this is
necessary, it could also allocate it somewhere else.
Not that this would offer any real advantage...

In both cases the only real problem is when storing
these in structs. So this should simply be forbidden
as it is for VLAs.

> or:
> 
>   3: union {
>    // embedded storage up to N bits (N constant)
>    // description of separately-allocated storage (for >N bits)
>  };

This is essentially an optimized version of 2.

> ?  If so, how would that work with the example I gave in the earlier
> message:
> 
> bignum x = ...;
> for (int i = 0; i < var; ++i)
>   x += x;
> 
> Each time the addition result grows beyond the original size of x,
> I assume you'd need to allocate a new stack bignum for the new size,
> which would result in a series of ever-increasing allocas.  Won't that
> soon blow the stack?

That depends on the final size of x relative to the size of the stack.
But this is no different to:

for (int i = 0; i < var; ++i)
{
   int x[i];
}

or to a recursive function. There are many ways to exhaust the
stack. It is also possible to exhaust other kinds of resources.
I don't really see the problem.

> Option 3 (as for LLVM's APInt) seems far less surprising, and can
> be made efficient for a chosen N.

Far less surprising in what sense?

>  What makes it difficult for C
> isn't the lack of general variable-length types but the lack of
> user-defined constructor, destructor, copy and move operations.

C already has generic variable-length types (VLAs). So yes, 
this is not what makes it difficult.

Yes, destructors would be needed to make it possible to store
these types in structs without memory leakage. But the destructors
don't need to be user defined, it could be a special purpose
destructor which only frees the special type.

But it doesn't really fit in the way C works and I kind of like
that C doesn't do anything behind my back.

Best,
Martin


> Thanks,
> Richard
> 
> > Most of the restrictions for these types would be the same
> > as proposed for your sizeless types. 
> > 
> > Because all these types fall into the same overall class
> > of types which do not have a size known at compile
> > time, I would suggest to add this concept to the standard
> > and then define your vector types as a subclass which
> > may have additional restrictions (no sizeof) instead
> > of adding a very specific concept which only works for
> > your proposal.
> > 
> > Best,
> > Martin
> > 
> > 
> > 
> > 
> > 
> > On Thursday, 18.10.2018 at 13:47 +0100, Richard Sandiford wrote:
> > > Joseph Myers  writes:
> > > > On Wed, 17 Oct 2018, Richard Sandiford wrote:
> > > > > Yeah, can't deny that if you look at it as a general-purpose 
> > > > > extension.
> > > > > But that's not really what this is supposed to be.  It's fairly 
> > > > > special
> > > > > purpose: there has to be some underlying variable-length/sizeless
> > > > > built-in type that you want to provide via a library.
> > > > > 
> > > > > What the extension allows is enough to support the intended use case,
> > > > > and it does that with no enforced overhead.
> > > > 
> > > > Part of my point is that there are various *other* possible cases of 
> > > > non-VLA-variable-size-type people have suggested in WG14 reflector 
> > > > discussions - so any set of concepts for such types ought to take into 
> > > > account more than just the SVE use case (even if other use cases need 
> > > > further concepts added on top of the ones needed for SVE).
> > > 
> > > [Answered this in the other thread -- sorry, took me a while to go
> > > through the full discussion.]
> > > 
> > > > > > Surely, the processor knows the size when it computes using these
> > > > > > types, so one could make it available using 'sizeof'.
> > > > > 
> > > > > The argument's similar here: we don't really need sizeof to be 
> > > > > available
> > > > > for vector use because the library provides easy ways of getting
> > > > > vector-length-based constants.  Usually what you want to know is
> > > > > "how many elements of type X are there?", with bytes just being one
> > > > > of the available element sizes.
> > > > 
> > > > But if havi

Re: Fix std::byte namespace declaration

2018-10-18 Thread Jonathan Wakely

On 18/10/18 22:12 +0200, François Dumont wrote:
Current build of libstdc++ with 
--enable-symvers=gnu-versioned-namespace fails (at least under Linux) 
because of:


In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:39,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/cstddef:71:59: 
error: reference to 'byte' is ambiguous

   71 |   template<> struct __byte_operand { using __type = byte; };
  | ^~~~
In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:61,
 from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory:62,
 from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:37,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/cpp_type_traits.h:395:30: 
note: candidates are: 'enum class std::__8::byte'

  395 |   enum class byte : unsigned char;
  |  ^~~~
In file included from 
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/memory_resource:39,
 from 
../../../../../git/libstdc++-v3/src/c++17/memory_resource.cc:25:
/home/fdt/dev/gcc/build_versioned_ns/x86_64-pc-linux-gnu/libstdc++-v3/include/cstddef:68:14: 
note:  'enum class std::byte'

   68 |   enum class byte : unsigned char {};
  |  ^~~~

I think the issue is that the std::byte declaration in cpp_type_traits.h 
has been done in the versioned namespace, hence the attached patch.


I think the definitions in  should use the versioned
namespace macros. Then  would be correct.



    * include/bits/cpp_type_traits.h (std::byte): Move outside versioned
    namespace.

Tested under Linux x86_64.

Ok to commit ?

François




diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index 960d469f412..fa7fc7564c2 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -68,6 +68,10 @@ extern "C++" {

namespace std _GLIBCXX_VISIBILITY(default)
{
+#if __cplusplus >= 201703L
+  enum class byte : unsigned char;
+#endif
+
_GLIBCXX_BEGIN_NAMESPACE_VERSION

  struct __true_type { };
@@ -392,8 +396,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
};

#if __cplusplus >= 201703L
-  enum class byte : unsigned char;
-
  template<>
struct __is_byte
{





Add missing _Safe_local_iterator_base::_M_attach_symbol export

2018-10-18 Thread François Dumont

As reported in another mail a symbol is missing.

This patch updates the test framework so that the problem is obvious, and fixes it.

    * testsuite/util/testsuite_containers.h
    (forward_members_unordered<>::forward_members_unordered
    (const value_type&)): Add local_iterator pre and post increment checks.
    * config/abi/pre/gnu.ver: Add GLIBCXX_3.4.26 new symbol.

Tested under Linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index f90ead30dd1..e8cd286ef0c 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2054,6 +2054,7 @@ GLIBCXX_3.4.26 {
 # std::basic_filebuf::open(const wchar_t*, openmode)
 _ZNSt13basic_filebufI[cw]St11char_traitsI[cw]EE4openEPKwSt13_Ios_Openmode;
 
+_ZN11__gnu_debug25_Safe_local_iterator_base16_M_attach_singleEPNS_19_Safe_sequence_baseEb;
 } GLIBCXX_3.4.25;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/testsuite/util/testsuite_containers.h b/libstdc++-v3/testsuite/util/testsuite_containers.h
index eadd43768d2..250dfda668d 100644
--- a/libstdc++-v3/testsuite/util/testsuite_containers.h
+++ b/libstdc++-v3/testsuite/util/testsuite_containers.h
@@ -176,11 +176,12 @@ namespace __gnu_test
 	   typename = typename std::iterator_traits<_Iterator>::iterator_category>
 struct iterator_concept_checks;
 
+#if __cplusplus >= 201103L
   // DR 691.
   template
 struct forward_members_unordered
 {
-  forward_members_unordered(typename _Tp::value_type& v)
+  forward_members_unordered(const typename _Tp::value_type& v)
   {
 	// Make sure that even if rel_ops is injected there is no ambiguity
 	// when comparing iterators.
@@ -196,12 +197,20 @@ namespace __gnu_test
 
 	assert( container.cbegin(0) == container.begin(0) );
 	assert( container.cend(0) == container.end(0) );
-	const typename test_type::size_type bn = container.bucket(1);
-	assert( container.cbegin(bn) != container.cend(bn) );
-	assert( container.cbegin(bn) != container.end(bn) );
+	const auto bn = container.bucket(1);
+	auto clit = container.cbegin(bn);
+	assert( clit != container.cend(bn) );
+	assert( clit != container.end(bn) );
+	assert( clit++ == container.cbegin(bn) );
+	assert( clit == container.end(bn) );
+
+	clit = container.cbegin(bn);
+	assert( ++clit == container.cend(bn) );
+
 	assert( container.begin(bn) != container.cend(bn) );
   }
 };
+#endif
 
   template
 struct iterator_concept_checks<_Iterator, false,



Re: [PATCH] Add splay-tree "view" for bitmap

2018-10-18 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > PR63155 made me pick up this old work from Steven, it turns our
>> > linked-list implementation into a two-mode one with one being a
>> > splay tree featuring O(log N) complexity for find/remove.
>> >
>> > Over Steven's original patch I added a bitmap_tree_to_vec helper
>> > that I use from the debug/print methods to avoid changing view
>> > there.  In theory the bitmap iterator could get a "stack"
>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>> >
>> > This can be used to fix the two biggest bottlenecks in the PR's
>> > testcase, namely SSA propagator worklist handling and out-of-SSA
>> > coalesce list building.  perf shows the following data, first
>> > unpatched, second patched - also watch the third column (samples)
>> > when comparing percentages.
>> >
>> > -O0
>> > -   18.19%    17.35%   407  cc1  cc1   [.] bitmap_set_b
>> >    - bitmap_set_bit
>> >       + 8.77% create_coalesce_list_for_region
>> >       + 4.21% calculate_live_ranges
>> >       + 2.02% build_ssa_conflict_graph
>> >       + 1.66% insert_phi_nodes_for
>> >       + 0.86% coalesce_ssa_name
>> > patched:
>> > -   12.39%    10.48%   129  cc1  cc1   [.] bitmap_set_b
>> >    - bitmap_set_bit
>> >       + 5.27% calculate_live_ranges
>> >       + 2.76% insert_phi_nodes_for
>> >       + 1.90% create_coalesce_list_for_region
>> >       + 1.63% build_ssa_conflict_graph
>> >       + 0.35% coalesce_ssa_name
>> >
>> > -O1
>> > -   17.53%    17.53%   842  cc1  cc1   [.] bitmap_set_b
>> >    - bitmap_set_bit
>> >       + 12.39% add_ssa_edge
>> >       + 1.48% create_coalesce_list_for_region
>> >       + 0.82% solve_constraints
>> >       + 0.71% calculate_live_ranges
>> >       + 0.64% add_implicit_graph_edge
>> >       + 0.41% insert_phi_nodes_for
>> >       + 0.34% build_ssa_conflict_graph
>> > patched:
>> > -    5.79%     5.00%   167  cc1  cc1   [.] bitmap_set_b
>> >    - bitmap_set_bit
>> >       + 1.41% add_ssa_edge
>> >       + 0.88% calculate_live_ranges
>> >       + 0.75% add_implicit_graph_edge
>> >       + 0.68% solve_constraints
>> >       + 0.48% insert_phi_nodes_for
>> >       + 0.45% build_ssa_conflict_graph
>> >
>> > -O3
>> > -   12.37%    12.34%  1145  cc1  cc1   [.] bitmap_set_b
>> >    - bitmap_set_bit
>> >       + 9.14% add_ssa_edge
>> >       + 0.80% create_coalesce_list_for_region
>> >       + 0.69% add_implicit_graph_edge
>> >       + 0.54% solve_constraints
>> >       + 0.34% calculate_live_ranges
>> >       + 0.27% insert_phi_nodes_for
>> >       + 0.21% build_ssa_conflict_graph
>> > -    4.36%     3.86%   227  cc1  cc1   [.] bitmap_set_b
>> >    - bitmap_set_bit
>> >       + 0.98% add_ssa_edge
>> >       + 0.86% add_implicit_graph_edge
>> >       + 0.64% solve_constraints
>> >       + 0.57% calculate_live_ranges

Re: [PATCH 2/2] Simplify subreg of vec_merge of vec_duplicate

2018-10-18 Thread H.J. Lu
On 10/18/18, Richard Sandiford  wrote:
> "H.J. Lu"  writes:
>> On 10/18/18, Richard Sandiford  wrote:
>>> "H.J. Lu"  writes:
>>>> On 10/18/18, Richard Sandiford  wrote:
>>>>> "H.J. Lu"  writes:
>>>>>> On 10/18/18, Richard Sandiford  wrote:
>>>>>>> "H.J. Lu"  writes:
>>>>>>>> On 10/17/18, Marc Glisse  wrote:
>>>>>>>>> On Wed, 17 Oct 2018, H.J. Lu wrote:
>>>>>>>>>
>>>>>>>>>> We may simplify
>>>>>>>>>>
>>>>>>>>>>  (subreg (vec_merge (vec_duplicate X) (vector) (const_int 1)) 0)
>>>>>>>>>>
>>>>>>>>>> to X when mode of X is the same as of mode of subreg.
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> we already have code to simplify vec_select(vec_merge):
>>>>>>>>>
>>>>>>>>>  /* If we select elements in a vec_merge that all come from the same
>>>>>>>>> operand, select from that operand directly.  */
>>>>>>>>>
>>>>>>>>> It would make sense to me to make the subreg transform as similar to it as
>>>>>>>>> possible, in particular you don't need to special case vec_duplicate, the
>>>>>>>>> transformation would see that everything comes from the first vector,
>>>>>>>>> produce (subreg (vec_duplicate X) 0), and let another transformation
>>>>>>>>> optimize that.
>>>>>>>
>>>>>>> Sorry, didn't see this before the OK.
>>>>>>>
>>>>>>>> What do you mean by another transformation? If simplify_subreg doesn't
>>>>>>>> return X for
>>>>>>>>
>>>>>>>>   (subreg (vec_merge (vec_duplicate X)
>>>>>>>>  (vector)
>>>>>>>>  (const_int ((1 << N) | M)))
>>>>>>>>   (N * sizeof (X)))
>>>>>>>>
>>>>>>>>
>>>>>>>> no further transformation will be done.
>>>>>>>
>>>>>>> I think the point was that we should transform:
>>>>>>>
>>>>>>>   (subreg (vec_merge X
>>>>>>>  (vector)
>>>>>>>  (const_int ((1 << N) | M)))
>>>>>>>   (N * sizeof (X)))
>>>>>>>
>>>>>>> into:
>>>>>>>
>>>>>>>   simplify_gen_subreg (outermode, X, innermode, byte)
>>>>>>>
>>>>>>> which should further simplify when X is a vec_duplicate.
>>>>>>
>>>>>> But sizeof (X) is the size of scalar of vec_dup.  How do we
>>>>>> check the mask of vec_merge?
>>>>>
>>>>> Yeah, should be sizeof (outermode) (which was the same thing
>>>>> in the original pattern, but not here).
>>>>>
>>>>> Richard
>>>>>
>>>>
>>>> Like this
>>>>
>>>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>>>> index b0cf3bbb2a9..e12b5c0e165 100644
>>>> --- a/gcc/simplify-rtx.c
>>>> +++ b/gcc/simplify-rtx.c
>>>> @@ -6601,20 +6601,21 @@ simplify_subreg (machine_mode outermode, rtx op,
>>>> return NULL_RTX;
>>>>   }
>>>>
>>>> -  /* Return X for
>>>> -  (subreg (vec_merge (vec_duplicate X)
>>>> +  /* Simplify
>>>> +  (subreg (vec_merge (X)
>>>> (vector)
>>>> (const_int ((1 << N) | M)))
>>>> - (N * sizeof (X)))
>>>> + (N * sizeof (outermode)))
>>>> + to
>>>> +  (subreg ((X) (N * sizeof (outermode)))
>>>
>>> Stray "(": (subreg (X) (N * sizeof (outermode)))
>>>
>>> OK with that change if it passes testing.
>>
>> The self-test failed for 32-bit compiler:
>>
>> expected: (reg:QI 342)
>>   actual: (subreg:QI (vec_merge:V128QI (vec_duplicate:V128QI (reg:QI
>> 342))
>> (reg:V128QI 343)
>> (const_int 65 [0x41])) 64)
>>
>> since
>>
>> && (UINTVAL (XEXP (op, 2)) & (HOST_WIDE_INT_1U << idx)) != 0)
>>
>> works only up to vectors with 64 elements for 32-bit compilers.
>
> Since HOST_WIDE_INT should be 64 bits for both 32-bit and 64-bit
> compilers, I guess the test invoked UB and so only worked by chance
> for 64-bit compilers?
>
>> Should we limit the self-test to vectors with 64 elements?
>
> HOST_BITS_PER_WIDE_INT elements probably.  I wondered at first whether
> we should try to support CONST_WIDE_INT, but that just changes the limit
> to WIDE_INT_MAX_PRECISION.
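
To make the mask check under discussion concrete, here is a minimal Python sketch of the idea, not GCC code. It models a vec_merge as "element i comes from the first operand when bit i of the mask is set" and refuses the shortcut when the element index does not fit in the mask. The names `vec_merge`, `element_from_first`, `simplify_element` and `HOST_BITS` are hypothetical, and the 64-bit width is an assumption modelled explicitly because Python integers do not overflow.

```python
HOST_BITS = 64  # assumed width of the mask integer (stand-in for HOST_BITS_PER_WIDE_INT)


def vec_merge(a, b, mask):
    """Toy model: element i is taken from a if bit i of mask is set, else from b."""
    return [a[i] if (mask >> i) & 1 else b[i] for i in range(len(a))]


def element_from_first(mask, idx):
    """True only when element idx provably comes from the first operand.

    In C, shifting a 64-bit value by 64 or more is undefined, so the index
    is range-checked first and the shortcut is skipped when it is too large.
    """
    if idx >= HOST_BITS:
        return False
    return ((mask >> idx) & 1) == 1


def simplify_element(a, b, mask, idx):
    """Read element idx of vec_merge(a, b, mask), using the shortcut when safe."""
    if element_from_first(mask, idx):
        return a[idx]                   # shortcut: read the first operand directly
    return vec_merge(a, b, mask)[idx]   # general path, always correct in this model
```

With a 128-element vector and mask 0x41, indices 0 and 6 take the shortcut, while index 64 falls back to the general path instead of relying on an out-of-range shift.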

This is the patch I am going to check in.

-- 
H.J.
From 9c491755021816acb10fae03e897aa9932e84e9e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 18 Oct 2018 12:56:23 -0700
Subject: [PATCH] Limit mask of vec_merge to HOST_BITS_PER_WIDE_INT

Since mask of vec_merge is in HOST_WIDE_INT, HOST_BITS_PER_WIDE_INT is
the maximum number of vector elements.

	* simplify-rtx.c (simplify_subreg): Limit mask of vec_merge to
	HOST_BITS_PER_WIDE_INT.
	(test_vector_ops_duplicate): Likewise.
---
 gcc/simplify-rtx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ccf92166356..2ff68ceb4e3 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6611,6 +6611,7 @@ simplify_subreg (machine_mode outermode, rtx op,
*/
   unsigned int idx;
   if (constant_multiple_p (byte, GET_MODE_SIZE (outermode), &idx)
+  && idx < HOST_BITS_PER_WIDE_INT
   && GET_CODE (op) == VEC_MERGE
   && GET_MODE_INNER (innermode) == outermode
   && CONST_INT_P (XEXP (op, 2))
@@ -6861,6 +6862,8 @@ test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
   rtx vector_re

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-18 Thread Giuliano Belinassi
On 10/18, Jeff Law wrote:
> On 10/17/18 4:21 PM, Giuliano Augusto Faulin Belinassi wrote:
> > Oh, please note that the error that I'm talking about is the
> > comparison of the results obtained before and after the
> > simplification. It is possible that the result obtained after the
> > simplification is more precise when compared to an arbitrarily precise
> > value (for example, an approximation precise to 30 digits). Well, I will
> > try to check that.
> That would be helpful.  Obviously if we're getting more precise, then
> that's a good thing :-)
> 
> jeff

Well, I compared the results before and after the simplifications with a 512-bit
precise mpfr value. Unfortunately, I found that sometimes the error is very
noticeable :-( .

For example, using floats and comparing with a 512-bit precision MPFR calculation:

with input   :  = 9.9996697902679443359375e-01
cosh: before :  = 1.2305341339111328125000e+02
cosh: after  :  = 1.230523986816406250e+02
cosh: mpfr512:  = 1.23053409952258504358633865742873246642102963529577e+02
error before :  = 3.43885477689136613425712675335789703647042270993727e-06
error after  :  = 1.01127061787935863386574287324664210296352957729006e-03

There is also some significant loss of precision with long doubles:

with input   :  = 9.96799706237365912286918501195032149553e-01
cosh: before :  = 1.24994262843556815705596818588674068450927734375000e+07
cosh: after  :  = 1.24994262843556715697559411637485027313232421875000e+07
cosh: mpfr512:  = 1.24994262843556815704069193408098058772318248178348e+07
error before :  = 1.52762518057600967860948619665184971612393688891101e-13
error after  :  = 1.6509781770613031459085826303348150283876063111e-08

So yes, precision may be a problem here.
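
A rough way to repeat this kind of comparison without MPFR is sketched below in Python. It assumes the rule being tested rewrites cosh(atanh(x)) as 1/sqrt(1 - x*x), which is inferred from the magnitude of the numbers above rather than stated in this mail, and it works in double precision with a 60-digit `decimal` reference instead of float/long double against 512-bit MPFR. All function names are illustrative.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60  # reference precision, far beyond what a double carries


def cosh_atanh_direct(x):
    """Unsimplified form: call atanh, then cosh."""
    return math.cosh(math.atanh(x))


def cosh_atanh_simplified(x):
    """Assumed simplified form: 1 / sqrt(1 - x*x), evaluated in doubles."""
    return 1.0 / math.sqrt(1.0 - x * x)


def reference(x):
    """High-precision value of 1 / sqrt(1 - x^2) for the double x."""
    d = Decimal(x)  # converts the double without rounding
    return 1 / (1 - d * d).sqrt()


def relative_error(value, x):
    """Relative error of a double result against the high-precision reference."""
    ref = reference(x)
    return float(abs(Decimal(value) - ref) / ref)


x = 0.99996697902679443359375  # the float input quoted above
print("direct     :", relative_error(cosh_atanh_direct(x), x))
print("simplified :", relative_error(cosh_atanh_simplified(x), x))
```

The script only prints the two relative errors; which form wins can depend on the input, the floating-point type and the libm in use, so it is meant as a way to probe inputs near 1 rather than as a verdict.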


Re: Add missing _Safe_local_iterator_base::_M_attach_symbol export

2018-10-18 Thread Jonathan Wakely

On 18/10/18 22:39 +0200, François Dumont wrote:

As reported in another mail a symbol is missing.

This patch updates the test framework so that the problem is obvious, and fixes it.

    * testsuite/util/testsuite_containers.h
    (forward_members_unordered<>::forward_members_unordered
    (const value_type&)): Add local_iterator pre and post increment checks.
    * config/abi/pre/gnu.ver: Add GLIBCXX_3.4.26 new symbol.

Tested under Linux x86_64.

Ok to commit ?


OK. Thanks for finding a way to test it too.



[PATCH] i386: Enable AVX512 memory broadcast for FP add

2018-10-18 Thread H.J. Lu
Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FP add operations.

gcc/

PR target/72782
* config/i386/sse.md
(*3_bcst_1): New.
(*add3_bcst_2): Likewise.

gcc/testsuite/

PR target/72782
* gcc.target/i386/avx512-binop-1.h: New file.
* gcc.target/i386/avx512-binop-2.h: Likewise.
* gcc.target/i386/avx512-binop-3.h: Likewise.
* gcc.target/i386/avx512-binop-4.h: Likewise.
* gcc.target/i386/avx512-binop-5.h: Likewise.
* gcc.target/i386/avx512-binop-6.h: Likewise.
* gcc.target/i386/avx512f-add-df-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-add-sf-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-add-sf-zmm-2.c: Likewise.
* gcc.target/i386/avx512f-add-sf-zmm-3.c: Likewise.
* gcc.target/i386/avx512f-add-sf-zmm-4.c: Likewise.
* gcc.target/i386/avx512f-add-sf-zmm-5.c: Likewise.
* gcc.target/i386/avx512f-add-sf-zmm-6.c: Likewise.
* gcc.target/i386/avx512f-sub-df-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-sub-sf-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-sub-sf-zmm-2.c: Likewise.
* gcc.target/i386/avx512f-sub-sf-zmm-3.c: Likewise.
* gcc.target/i386/avx512f-sub-sf-zmm-4.c: Likewise.
* gcc.target/i386/avx512f-sub-sf-zmm-5.c: Likewise.
* gcc.target/i386/avx512vl-add-sf-xmm-1.c: Likewise.
* gcc.target/i386/avx512vl-add-sf-ymm-1.c: Likewise.
* gcc.target/i386/avx512vl-sub-sf-xmm-1.c: Likewise.
* gcc.target/i386/avx512vl-sub-sf-ymm-1.c: Likewise.
---
 gcc/config/i386/sse.md| 28 +++
 .../gcc.target/i386/avx512-binop-1.h  | 12 
 .../gcc.target/i386/avx512-binop-2.h  | 12 
 .../gcc.target/i386/avx512-binop-3.h  | 15 ++
 .../gcc.target/i386/avx512-binop-4.h  | 12 
 .../gcc.target/i386/avx512-binop-5.h  | 14 ++
 .../gcc.target/i386/avx512-binop-6.h  | 14 ++
 .../gcc.target/i386/avx512f-add-df-zmm-1.c| 12 
 .../gcc.target/i386/avx512f-add-sf-zmm-1.c| 12 
 .../gcc.target/i386/avx512f-add-sf-zmm-2.c| 12 
 .../gcc.target/i386/avx512f-add-sf-zmm-3.c| 12 
 .../gcc.target/i386/avx512f-add-sf-zmm-4.c| 12 
 .../gcc.target/i386/avx512f-add-sf-zmm-5.c| 12 
 .../gcc.target/i386/avx512f-add-sf-zmm-6.c| 12 
 .../gcc.target/i386/avx512f-sub-df-zmm-1.c| 12 
 .../gcc.target/i386/avx512f-sub-sf-zmm-1.c| 12 
 .../gcc.target/i386/avx512f-sub-sf-zmm-2.c| 12 
 .../gcc.target/i386/avx512f-sub-sf-zmm-3.c| 12 
 .../gcc.target/i386/avx512f-sub-sf-zmm-4.c| 12 
 .../gcc.target/i386/avx512f-sub-sf-zmm-5.c| 12 
 .../gcc.target/i386/avx512vl-add-sf-xmm-1.c   | 12 
 .../gcc.target/i386/avx512vl-add-sf-ymm-1.c   | 12 
 .../gcc.target/i386/avx512vl-sub-sf-xmm-1.c   | 12 
 .../gcc.target/i386/avx512vl-sub-sf-ymm-1.c   | 12 
 24 files changed, 311 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-1.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-2.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-3.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-4.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-5.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512-binop-6.h
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-df-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-add-sf-zmm-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-sub-df-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-sub-sf-zmm-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-xmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-add-sf-ymm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-sub-sf-xmm-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-sub-sf-ymm-1.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 71684d63423..3c7b0aabb24 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1684,6 +1684,34 @@
(set

Re: [PATCH 14/14] Add D Phobos config, makefiles, and testsuite.

2018-10-18 Thread Iain Buclaw
On Tue, 16 Oct 2018 at 19:01, Richard Sandiford
 wrote:
>
> Iain Buclaw  writes:
> > diff --git a/libphobos/d_rules.am b/libphobos/d_rules.am
> > new file mode 100644
> > index 000..b16cf5052d2
> > --- /dev/null
> > +++ b/libphobos/d_rules.am
> > @@ -0,0 +1,60 @@
> > +# This file contains some common rules for D source compilation
> > +# used for libdruntime and libphobos
> > +
> > +# If there are no sources with known extension (i.e. only D sources)
> > +# automake forgets to set this
>
> Needs a copyright notice and licence.
>

OK.

> > +# AC_LANG(D)
> > +# ---
> > +# (we have to use GDC as variable prefix as our GCC patches set GDC
> > +#  GDC_FOR_BUILD etc. If we ever want to support other D compilers all
> > +#  names need to be changed to DC)
>
> Seems like this is still talking about GDC as a separate project.
>

Will remove.

> > +  # This checks to see if the host supports the compiler-generated builtins
> > +  # for atomic operations for various integral sizes. Note, this is intended
> > +  # to be an all-or-nothing switch, so all the atomic operations that are
> > +  # used should be checked.
> > +  AC_MSG_CHECKING([for atomic builtins for byte])
> > +  AC_CACHE_VAL(druntime_cv_atomic_byte, [
> > +AC_TRY_LINK(
> > +  [import gcc.builtins;], [
> > +  shared(byte) c1;
> > +   byte c2, c3;
> > +   __atomic_compare_exchange_1(&c1, &c2, c3, false, 5, 5);
> > +   __atomic_load_1(&c1, 5);
> > +   __atomic_store_1(&c1, c2, 5);
> > +   return 0;
> > +  ],
> > +  [druntime_cv_atomic_byte=yes],
> > +  [druntime_cv_atomic_byte=no])
> > +  ])
> > +  AC_MSG_RESULT($druntime_cv_atomic_byte)
>
> Link tests generally don't work for newlib targets, since they often
> require a specific command-line option to specify the target system.
> But perhaps you don't support newlib targets anyway.  Either way,
> it shouldn't hold up acceptance.
>

Targets using newlib are definitely not supported.  First off, there
would need to be relevant C bindings added to the druntime library
where appropriate.

> > --- /dev/null
> > +++ b/libphobos/src/Makefile.am
> > @@ -0,0 +1,211 @@
> > +# Makefile for the Phobos standard library.
> > +# Copyright (C) 2012-2017 Free Software Foundation, Inc.
>
> 2012-2018.
>

I will follow this up with a patch to contrib/update_copyrights.py so
that this is never missed in future.

> > diff --git a/libphobos/testsuite/Makefile.am b/libphobos/testsuite/Makefile.am
> > new file mode 100644
> > index 000..dd99d9d871e
> > --- /dev/null
> > +++ b/libphobos/testsuite/Makefile.am
> > @@ -0,0 +1,15 @@
> > +## Process this file with automake to produce Makefile.in.
> > +
> > +AUTOMAKE_OPTIONS = foreign dejagnu
> > +
> > +# Setup the testing framework, if you have one
> > +EXPECT = $(shell if test -f $(top_builddir)/../expect/expect; then \
> > +echo $(top_builddir)/../expect/expect; else echo expect; fi)
> > +
> > +_RUNTEST = $(shell if test -f $(top_srcdir)/../dejagnu/runtest; then \
> > +  echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
> > +RUNTEST = "$(_RUNTEST) $(AM_RUNTESTFLAGS)"
> > +
> > +AM_MAKEFLAGS = "EXEEXT=$(EXEEXT)"
> > +
> > +CLEANFILES = *.log *.sum
>
> Should probably have a copyright & licence here too, even though
> it's small, since it could grow in future.
>

OK, done.

> > +// { dg-shouldfail "static_dtor_exception" }
> > +// { dg-output "object.Exception@.*: static_dtor_exception" }
> > +// Issue 16594
> > +import core.stdc.stdio;
>
> Which bug tracker is this referring to?  Maybe a URI would be better,
> to avoid confusion with GCC's bugzilla.  Same for other bugzilla
> references in later tests.  Or just remove if the tracker isn't public.
>

They refer to the upstream bug tracker, http://issues.dlang.org/

I'll fix them up accordingly here and will also send them upstream, as
clickable links are generally favoured.

>
> I think that's the last of the unreviewed patches.  Let me know
> if I missed one.
>

Addressing these comments has taken a little longer than I hoped,
partly due to time constraints on my side leaving me to go through
this during the dark hours of the night.

What else would be left to do here once all has been addressed?

Thanks.
-- 
Iain


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-18 Thread Wilco Dijkstra
Hi,

> Well, I compared the results before and after the simplifications with a
> 512-bit precise mpfr value. Unfortunately, I found that sometimes the error
> is very noticeable :-( .

Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the relative
error should be much better. If there is no FMA, 2*(1-fabs(x)) - (1-fabs(x))^2
should be more accurate when abs(x)>0.5 and still much faster.

Wilco
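
The non-FMA rewrite can be tried out with a small Python sketch in double precision. It relies on the identity 2*(1-|x|) - (1-|x|)^2 = 1 - x^2 and on 1 - |x| being computed without rounding when 0.5 <= |x| <= 1. The function names and the 60-digit `decimal` reference are assumptions made for illustration; FMA itself is not modelled here.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # reference precision for checking the double results


def one_minus_square_naive(x):
    """1 - x*x as written: x*x is rounded before the subtraction cancels."""
    return 1.0 - x * x


def one_minus_square_rewritten(x):
    """2*(1-|x|) - (1-|x|)^2, algebraically equal to 1 - x^2."""
    t = 1.0 - abs(x)  # no rounding error when 0.5 <= |x| <= 1
    return 2.0 * t - t * t


def reference_one_minus_square(x):
    """1 - x^2 for the double x, evaluated to 60 significant digits."""
    d = Decimal(x)
    return 1 - d * d


def relative_error(value, x):
    """Relative error of a double result against the high-precision reference."""
    ref = reference_one_minus_square(x)
    return float(abs(Decimal(value) - ref) / ref)


x = 0.999999  # sample input close to 1, chosen only for illustration
print("naive     :", relative_error(one_minus_square_naive(x), x))
print("rewritten :", relative_error(one_minus_square_rewritten(x), x))
```

The rewritten form keeps the rounding of the small term (1-|x|)^2 away from the leading digits, which is why it would be expected to hold up better as |x| approaches 1.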



Re: [PATCH] v2: Run selftests for C++ as well as C

2018-10-18 Thread Eric Botcazou
> I've added the missing ChangeLog entry as r265272 (using yesterday's
> date, since that's when I committed the change).

Thanks!

-- 
Eric Botcazou


Go patch committed: Rewrite Type::are_identical to use flags

2018-10-18 Thread Ian Lance Taylor
This patch to the Go frontend changes the Type::are_identical function
to use a single flags parameter instead of the Cmp_tags and
errors_are_identical bool parameters. The existing behavior is
unchanged.  This is a simplification step for future work that will
add a new flag.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.
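
As a rough illustration of why one flags parameter scales better than a growing list of bools, here is a toy Python sketch. It is not the gofrontend code: `Compare`, the tuple-based type model and this `are_identical` are hypothetical stand-ins that only mirror the mapping visible in the diff below, where a former errors_are_identical=true call passes COMPARE_TAGS alone and a former false call passes COMPARE_ERRORS | COMPARE_TAGS.

```python
from enum import IntFlag


class Compare(IntFlag):
    ERRORS = 1  # hypothetical analogue of Type::COMPARE_ERRORS
    TAGS = 2    # hypothetical analogue of Type::COMPARE_TAGS


def are_identical(t1, t2, flags):
    """Toy comparison of (name, tag) tuples; a name of None models an error type."""
    name1, tag1 = t1
    name2, tag2 = t2
    if name1 is None or name2 is None:
        # Without ERRORS, an error type is treated as identical to anything.
        return not (flags & Compare.ERRORS)
    if name1 != name2:
        return False
    if (flags & Compare.TAGS) and tag1 != tag2:
        return False
    return True
```

A later option then becomes one more enum member that callers opt into, rather than another positional bool threaded through every call site.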

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265287)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-0494dc5737f0c89ad6f45e04e8313e4161678861
+84531ef21230307773daa438a50bf095edcdbf93
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 265287)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -2077,7 +2077,8 @@ Escape_analysis_assign::call(Call_expres
   else
{
  if (!Type::are_identical(fntype->receiver()->type(),
-  (*p)->expr()->type(), true, NULL))
+  (*p)->expr()->type(), Type::COMPARE_TAGS,
+  NULL))
{
  // This will be converted later, preemptively track it instead
  // of its conversion expression which will show up in a later 
pass.
@@ -2096,7 +2097,7 @@ Escape_analysis_assign::call(Call_expres
   ++pn, ++p)
{
  if (!Type::are_identical(pn->type(), (*p)->expr()->type(),
-  true, NULL))
+  Type::COMPARE_TAGS, NULL))
{
  // This will be converted later, preemptively track it instead
  // of its conversion expression which will show up in a later 
pass.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 265287)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -178,7 +178,10 @@ Expression::convert_for_assignment(Gogo*
   || rhs->is_error_expression())
 return Expression::make_error(location);
 
-  bool are_identical = Type::are_identical(lhs_type, rhs_type, false, NULL);
+  bool are_identical = Type::are_identical(lhs_type, rhs_type,
+  (Type::COMPARE_ERRORS
+   | Type::COMPARE_TAGS),
+  NULL);
   if (!are_identical && lhs_type->interface_type() != NULL)
 {
   if (rhs_type->interface_type() == NULL)
@@ -341,7 +344,9 @@ Expression::convert_interface_to_interfa
bool for_type_guard,
Location location)
 {
-  if (Type::are_identical(lhs_type, rhs->type(), false, NULL))
+  if (Type::are_identical(lhs_type, rhs->type(),
+ Type::COMPARE_ERRORS | Type::COMPARE_TAGS,
+ NULL))
 return rhs;
 
   Interface_type* lhs_interface_type = lhs_type->interface_type();
@@ -3389,7 +3394,9 @@ Type_conversion_expression::do_is_static
   if (!this->expr_->is_static_initializer())
 return false;
 
-  if (Type::are_identical(type, expr_type, false, NULL))
+  if (Type::are_identical(type, expr_type,
+ Type::COMPARE_ERRORS | Type::COMPARE_TAGS,
+ NULL))
 return true;
 
   if (type->is_string_type() && expr_type->is_string_type())
@@ -3503,7 +3510,9 @@ Type_conversion_expression::do_get_backe
   Btype* btype = type->get_backend(gogo);
   Location loc = this->location();
 
-  if (Type::are_identical(type, expr_type, false, NULL))
+  if (Type::are_identical(type, expr_type,
+ Type::COMPARE_ERRORS | Type::COMPARE_TAGS,
+ NULL))
 {
   Bexpression* bexpr = this->expr_->get_backend(context);
   return gogo->backend()->convert_expression(btype, bexpr, loc);
@@ -5433,7 +5442,10 @@ Binary_expression::lower_struct_comparis
   Struct_type* st2 = this->right_->type()->struct_type();
   if (st2 == NULL)
 return this;
-  if (st != st2 && !Type::are_identical(st, st2, false, NULL))
+  if (st != st2
+  && !Type::are_identical(st, st2,
+ Type::COMPARE_ERRORS | Type::COMPARE_TAGS,
+ NULL))
 return this;
   if (!Type::are_compatible_for_comparison(true, this->left_->type(),
   this->right_->type(), NULL))
@@ -5512,7 +5524,10 @@ Binary_expression::lower_array_compariso
   Array_type* at2 = this->right_->type()->array_type();
   if (at2 == NULL)
 return this;
-  if (at != at2 && !Type::are_identical(at, at2, false, NULL))
+  if (at != at2
+  && !Type::are_identical(at, at2,
+

Re: [PATCH 2/4] Remove unused functions and fields.

2018-10-18 Thread Ian Lance Taylor
On Sat, Sep 22, 2018 at 12:08 PM, marxin  wrote:
>
> gcc/go/ChangeLog:
>
> 2018-09-24  Martin Liska  
>
> * gofrontend/escape.cc (Gogo::analyze_escape): Remove
> usage of a parameter.
> (Gogo::assign_connectivity): Likewise.
> (class Escape_analysis_tag): Likewise.
> (Gogo::tag_function): Likewise.
> * gofrontend/expressions.cc (Call_expression::do_type): Likewise.
> * gofrontend/gogo.h (class Gogo): Likewise.
> * gofrontend/types.cc (class Call_multiple_result_type): Likewise.
> (Type::make_call_multiple_result_type): Likewise.
> * gofrontend/types.h (class Type): Likewise.
> * gofrontend/wb.cc (class Check_escape): Likewise.
> (Gogo::add_write_barriers): Likewise.

Hi, unfortunately this is wrong.  As described in
gcc/go/gofrontend/README, the files in that directory are mirrored
from a separate repository (the same is true of the files in the libgo
directory).  You should not make changes to them directly in the GCC
repository.  I have reverted these changes, as follows.  Sorry.

Ian
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 265293)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -979,7 +979,7 @@ Gogo::analyze_escape()
   for (std::vector::iterator fn = stack.begin();
fn != stack.end();
++fn)
-   this->tag_function(*fn);
+this->tag_function(context, *fn);
 
   if (this->debug_escape_level() != 0)
{
@@ -1232,10 +1232,10 @@ Escape_analysis_loop::statement(Block*,
 class Escape_analysis_assign : public Traverse
 {
 public:
-  Escape_analysis_assign(Escape_context* context)
+  Escape_analysis_assign(Escape_context* context, Named_object* fn)
 : Traverse(traverse_statements
   | traverse_expressions),
-  context_(context)
+  context_(context), fn_(fn)
   { }
 
   // Model statements within a function as assignments and flows between nodes.
@@ -1272,6 +1272,8 @@ public:
 private:
   // The escape context for this set of functions.
   Escape_context* context_;
+  // The current function being analyzed.
+  Named_object* fn_;
 };
 
 // Helper function to detect self assignment like the following.
@@ -2702,7 +2704,7 @@ Gogo::assign_connectivity(Escape_context
   int save_depth = context->loop_depth();
   context->set_loop_depth(1);
 
-  Escape_analysis_assign ea(context);
+  Escape_analysis_assign ea(context, fn);
   Function::Results* res = fn->func_value()->result_variables();
   if (res != NULL)
 {
@@ -3265,13 +3267,17 @@ Gogo::propagate_escape(Escape_context* c
 class Escape_analysis_tag
 {
  public:
-  Escape_analysis_tag()
+  Escape_analysis_tag(Escape_context* context)
+: context_(context)
   { }
 
   // Add notes to the function's type about the escape information of its
   // input parameters.
   void
   tag(Named_object* fn);
+
+ private:
+  Escape_context* context_;
 };
 
 void
@@ -3379,9 +3385,9 @@ Escape_analysis_tag::tag(Named_object* f
 // retain analysis results across imports.
 
 void
-Gogo::tag_function(Named_object* fn)
+Gogo::tag_function(Escape_context* context, Named_object* fn)
 {
-  Escape_analysis_tag eat;
+  Escape_analysis_tag eat(context);
   eat.tag(fn);
 }
 
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 265293)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -10108,7 +10108,7 @@ Call_expression::do_type()
   else if (results->size() == 1)
 ret = results->begin()->type();
   else
-ret = Type::make_call_multiple_result_type();
+ret = Type::make_call_multiple_result_type(this);
 
   this->type_ = ret;
 
Index: gcc/go/gofrontend/gogo.h
===
--- gcc/go/gofrontend/gogo.h(revision 265287)
+++ gcc/go/gofrontend/gogo.h(working copy)
@@ -680,7 +680,7 @@ class Gogo
   // Add notes about the escape level of a function's input and output
   // parameters for exporting and importing top level functions. 
   void
-  tag_function(Named_object*);
+  tag_function(Escape_context*, Named_object*);
 
   // Reclaim memory of escape analysis Nodes.
   void
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 265293)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -5441,8 +5441,9 @@ Type::make_nil_type()
 class Call_multiple_result_type : public Type
 {
  public:
-  Call_multiple_result_type()
-: Type(TYPE_CALL_MULTIPLE_RESULT)
+  Call_multiple_result_type(Call_expression* call)
+: Type(TYPE_CALL_MULTIPLE_RESULT),
+  call_(call)
   { }
 
  protected:
@@ -5475,14 +5476,18 @@ class Call_multiple_result_type : public
   void
   do_mangled_name(Gogo*, std::string*) const
   { go_assert(saw_errors()); }
+
+ private:
+  // The ex

Re: [concepts] Update to match the working draft (and bit more)

2018-10-18 Thread Andrew Sutton
Well, that's unfortunate. Please forgive the alternative patch submission.

https://github.com/asutton/gcc/blob/master/concepts.patch


> Attached is a rework of the Concepts TS implementation to match the
> Working Draft. It's a big patch -- I'd have loved to make it smaller, but
> it didn't work out that way.
>
> Here's a brief summary of changes:
>
> - Make concepts work with -std=c++2a; warn if -fconcepts is also supplied.
> - Add a new flag -fconcepts-ts to enable TS syntax* when -std=c++2a is used.
> - No more bool for concepts. They are their own kind of declaration.
> - New grammar for requires clauses (unfortunately). This can be
> overridden with -fconcepts-ts.
> - Support "concept bool" with -fconcepts-ts. This includes both
> variable and function concepts.
> - Constraints are instantiated only at the point of use and properly
> interleave substitution and evaluation. This should fix any issues
> with "premature substitution" errors.
> - Implement semantic comparison of atomic constraints (P0717). This
> may be buggy. More testing with complex refinement hierarchies is
> needed.
> - Completely rewrite the subsumption algorithm in logic. The WD broke
> a number of assumptions the previous version relied on, so a simple
> fix wasn't possible. We haven't seen the performance issues related to
> subsumption that showed up in the past. They're still there, but other
> core changes minimize the likelihood of achieving worst case.
> - Declaration matching is syntactic (P0716).
> - Warnings are emitted for the use of TS syntax unless -fconcepts-ts
> is specified. And if you do use -fconcepts, the same-type rule for
> abbreviated function templates is dead.
> - And just because... make template introduction semantics actually
> conform to the TS. We weren't allowing the introduction of a fixed
> series of template parameters for an introduced pack. We do now.
>
> This is not a perfect patch.
>
> - It somehow breaks partial template specializations of variable
> templates (cpp2a/concepts pr80471.C). I have no idea how that
> happened. It almost looks like a GC bug.
> - There's a new regression in cpp2a/concepts-ts2.C.
> - This breaks a lot of concepts TS support (the g++.dg/concepts dir).
> - We've seen other errors in parts of GCC not even remotely related to
> concepts**.
>
> My goals over the next few weeks are to clean up the regressions and
> start working through the backlog of concepts issues. That includes
> fixing new issues as they arise.
>
> * This patch does not preserve the Concepts TS semantics. Anybody
> relying on e.g., subtleties of the partial ordering rules in the TS
> will find themselves with broken code. This was a conscious choice:
> there are serious design issues in the TS.
>
> ** Unfortunately, my testing effort before sending this patch is a bit
> hampered by the fact that a clean bootstrap build ICEs here (bootstrap
> build on Mac OS -- can give more details if needed).
>
> ../../isl/isl_tab_pip.c: In function ‘isl_tab_basic_set_non_trivial_lexmin’:
> ../../isl/isl_tab_pip.c:5087:21: internal compiler error: in check, at
> tree-vrp.c:155
>  5087 | __isl_give isl_vec *isl_tab_basic_set_non_trivial_lexmin(
>   | ^~~~
>
> This one isn't my fault :)
>
> Enjoy,
>

>> Hi. This is the qmail-send program at sourceware.org.
>> I'm afraid I wasn't able to deliver your message to the following addresses.
>> This is a permanent error; I've given up. Sorry it didn't work out.
>>
>> :
>> ezmlm-reject: fatal: Sorry, I don't accept messages larger than 40 bytes (#5.2.3)


> Andrew Sutton


Go patch committed: List indirect imports separately in export data

2018-10-18 Thread Ian Lance Taylor
Previously when the Go frontend generated export data that referred to
a type that was not defined in a directly imported package, it would
write the package name as additional information in the type's export
data.  That approach required all type information to be read in
order.  This patch changes the compiler to find all references to
indirectly imported packages, and write them out as an indirectimport
line in the import data.  This will permit the compiler to read
exported type data out of order.

The type traversal used to find indirect imports is a little more
complicated than necessary in preparation for later work.
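
As a rough illustration of the idea (not the gofrontend code itself), the
search for indirectly imported packages can be sketched as a walk over every
type reachable from the exported declarations, recording each package seen and
then dropping the ones that were imported directly.  ToyType,
collect_packages and indirect_imports are hypothetical names invented for
this sketch; the real patch uses a Traverse subclass and a set of
const Package*.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a type mentioned by an exported declaration.
struct ToyType
{
  std::string name;
  std::string package;                     // empty for the current package
  std::vector<const ToyType*> components;  // e.g. field or parameter types
};

// Walk TYPE and everything reachable from it, recording each package seen.
void
collect_packages(const ToyType* type, std::set<const ToyType*>* seen,
                 std::set<std::string>* packages)
{
  assert(type != nullptr);
  if (!seen->insert(type).second)
    return;  // already visited; this also guards against recursive types
  if (!type->package.empty())
    packages->insert(type->package);
  for (const ToyType* c : type->components)
    collect_packages(c, seen, packages);
}

// Return the packages mentioned by EXPORTED types but not in DIRECT.
std::set<std::string>
indirect_imports(const std::vector<const ToyType*>& exported,
                 const std::set<std::string>& direct)
{
  std::set<const ToyType*> seen;
  std::set<std::string> mentioned;
  for (const ToyType* t : exported)
    collect_packages(t, &seen, &mentioned);

  std::set<std::string> result;
  for (const std::string& p : mentioned)
    if (direct.count(p) == 0)
      result.insert(p);
  return result;
}
```

In this toy model, a local type whose field comes from a directly imported
package, which in turn refers to a type from a third package, yields just
that third package as the indirect import.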

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265293)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-84531ef21230307773daa438a50bf095edcdbf93
+9c985ce6f76dd65b8eb0e4b03c09ad0100712e04
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/export.cc
===
--- gcc/go/gofrontend/export.cc (revision 265287)
+++ gcc/go/gofrontend/export.cc (working copy)
@@ -143,6 +143,10 @@ Export::export_globals(const std::string
 
   std::sort(exports.begin(), exports.end(), Sort_bindings());
 
+  // Find all packages not explicitly imported but mentioned by types.
+  Unordered_set(const Package*) type_imports;
+  this->prepare_types(&exports, &type_imports);
+
   // Although the export data is readable, at least this version is,
   // it is conceptually a binary format.  Start with a four byte
   // version number.
@@ -169,7 +173,7 @@ Export::export_globals(const std::string
 
   this->write_packages(packages);
 
-  this->write_imports(imports);
+  this->write_imports(imports, type_imports);
 
   this->write_imported_init_fns(package_name, import_init_fn,
imported_init_fns);
@@ -199,6 +203,179 @@ Export::export_globals(const std::string
   this->stream_->write_checksum(s);
 }
 
+// Traversal class to find referenced types.
+
+class Find_types_to_prepare : public Traverse
+{
+ public:
+  Find_types_to_prepare(Unordered_set(const Package*)* imports)
+: Traverse(traverse_types),
+  imports_(imports)
+  { }
+
+  int
+  type(Type* type);
+
+  // Traverse the components of a function type.
+  void
+  traverse_function(Function_type*);
+
+  // Traverse the methods of a named type, and register its package.
+  void
+  traverse_named_type(Named_type*);
+
+ private:
+  // List of packages we are building.
+  Unordered_set(const Package*)* imports_;
+};
+
+// Traverse a type.
+
+int
+Find_types_to_prepare::type(Type* type)
+{
+  // Skip forwarders.
+  if (type->forward_declaration_type() != NULL)
+return TRAVERSE_CONTINUE;
+
+  // At this stage of compilation traversing interface types traverses
+  // the final list of methods, but we export the locally defined
+  // methods.  If there is an embedded interface type we need to make
+  // sure to export that.  Check classification, rather than calling
+  // the interface_type method, because we want to handle named types
+  // below.
+  if (type->classification() == Type::TYPE_INTERFACE)
+{
+  Interface_type* it = type->interface_type();
+  const Typed_identifier_list* methods = it->local_methods();
+  if (methods != NULL)
+   {
+ for (Typed_identifier_list::const_iterator p = methods->begin();
+  p != methods->end();
+  ++p)
+   {
+ if (p->name().empty())
+   Type::traverse(p->type(), this);
+ else
+   this->traverse_function(p->type()->function_type());
+   }
+   }
+  return TRAVERSE_SKIP_COMPONENTS;
+}
+
+  Named_type* nt = type->named_type();
+  if (nt != NULL)
+this->traverse_named_type(nt);
+
+  return TRAVERSE_CONTINUE;
+}
+
+// Traverse the types in a function type.  We don't need the function
+// type tself, just the receiver, parameter, and result types.
+
+void
+Find_types_to_prepare::traverse_function(Function_type* type)
+{
+  go_assert(type != NULL);
+  if (this->remember_type(type))
+return;
+  const Typed_identifier* receiver = type->receiver();
+  if (receiver != NULL)
+Type::traverse(receiver->type(), this);
+  const Typed_identifier_list* parameters = type->parameters();
+  if (parameters != NULL)
+parameters->traverse(this);
+  const Typed_identifier_list* results = type->results();
+  if (results != NULL)
+results->traverse(this);
+}
+
+// Traverse the methods of a named type, and record its package.
+
+void
+Find_types_to_prepare::traverse_named_type(Named_type* nt)
+{
+  const Package* package = nt->named_object()->package();
+  if (package != NULL)
+this->imports_->insert(package);
+
+  // We have to travers

[PATCH] [MSP430] Extend MSP430 data attribute handler

2018-10-18 Thread Jozef Lawrynowicz

The "persistent" and "noinit" data attributes for MSP430 assign the variable
they are applied to to the .persistent and .noinit sections, respectively.

The following patch extends the handler for these attributes to check for
misuse.
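
The order of the checks can be sketched roughly as below.  This is only a toy
model written for illustration: ToyDecl, Attr, Verdict and check_data_attr
are hypothetical names, the real handler works on GCC tree nodes and emits
diagnostics rather than returning a value, and the last check is inferred
from the ChangeLog entry rather than from the handler code.

```cpp
#include <cassert>

// Hypothetical summary of a declaration; GCC itself inspects tree nodes.
struct ToyDecl
{
  bool is_variable;
  bool has_static_storage;  // static, public or external
  bool has_noinit;          // already carries the noinit attribute
  bool has_persistent;      // already carries the persistent attribute
  bool has_initializer;
};

enum class Attr { Noinit, Persistent };

enum class Verdict
{
  Accepted,
  NotAVariable,
  AutomaticVariable,
  ConflictingAttributes,
  PersistentNotInitialized
};

// Decide whether ATTR may be applied to DECL, checking in the same order
// as the handler: kind of declaration, storage, then attribute conflicts.
Verdict
check_data_attr(const ToyDecl& decl, Attr attr)
{
  if (!decl.is_variable)
    return Verdict::NotAVariable;
  if (!decl.has_static_storage)
    return Verdict::AutomaticVariable;
  if ((attr == Attr::Persistent && decl.has_noinit)
      || (attr == Attr::Noinit && decl.has_persistent))
    return Verdict::ConflictingAttributes;
  // Assumption taken from the ChangeLog: persistent data should be
  // initialized.
  if (attr == Attr::Persistent && !decl.has_initializer)
    return Verdict::PersistentNotInitialized;
  return Verdict::Accepted;
}
```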

The testsuite updates add tests for these attributes, and some other MSP430
attributes, which previously did not have tests.

Ok for trunk?

>From 5c56509cfa0f285549eceefb457dc59134d30d0e Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Thu, 18 Oct 2018 22:16:22 +0100
Subject: [PATCH] Extend MSP430 data attribute handling

2018-10-19  Jozef Lawrynowicz  

	* gcc/config/msp430/msp430.c (msp430_data_attr): Warn if data marked
	with the persistent attribute is not initialized.
	(gen_prefix): Do not add prefixes to data to be placed in the 
	.noinit or .persistent section.
	(msp430_section_type_flags): Use global const string to reference
	.noinit and .persistent sections.

	gcc/testsuite
	* gcc.target/msp430/data-attributes.c: Extend test to check behaviour
	of static noinit/persistent data.
	* gcc.target/msp430/function-attributes-4.c: Extend test to check
	behaviour of noinit/persistent attributes when applied to functions.
	* gcc.target/msp430/attr-critical.c: New test.
	* gcc.target/msp430/attr-naked.c: Likewise.
	* gcc.target/msp430/attr-reentrant.c: Likewise.
	* gcc.target/msp430/attr-wakeup.c: Likewise.
	* gcc.target/msp430/data-attributes-2.c: Likewise.
	* gcc.target/msp430/data-attributes-3.c: Likewise.
---
 gcc/config/msp430/msp430.c | 93 --
 gcc/testsuite/gcc.target/msp430/attr-critical.c| 11 +++
 gcc/testsuite/gcc.target/msp430/attr-naked.c   | 11 +++
 gcc/testsuite/gcc.target/msp430/attr-reentrant.c   | 11 +++
 gcc/testsuite/gcc.target/msp430/attr-wakeup.c  | 19 +
 .../gcc.target/msp430/data-attributes-2.c  | 15 
 .../gcc.target/msp430/data-attributes-3.c  | 11 +++
 gcc/testsuite/gcc.target/msp430/data-attributes.c  | 28 ---
 .../gcc.target/msp430/function-attributes-4.c  | 10 +++
 9 files changed, 174 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/attr-critical.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/attr-naked.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/attr-reentrant.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/attr-wakeup.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/data-attributes-2.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/data-attributes-3.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 7d305b1..73a693e 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -1804,6 +1804,8 @@ const char * const  ATTR_UPPER  = "upper";
 const char * const  ATTR_EITHER = "either";
 const char * const  ATTR_NOINIT = "noinit";
 const char * const  ATTR_PERSIST = "persistent";
+const char * const  SEC_NOINIT = ".noinit";
+const char * const  SEC_PERSIST = ".persistent";
 
 static inline bool
 has_attr (const char * attr, tree decl)
@@ -2037,37 +2039,71 @@ msp430_data_attr (tree * node,
   gcc_assert (DECL_P (* node));
   gcc_assert (args == NULL);
 
-  if (TREE_CODE (* node) != VAR_DECL)
-message = G_("%qE attribute only applies to variables");
+  /* Variable name to use in warning string, defaults to attribute name.  */
+  tree var = name;
+  tree decl = *node;
 
-  /* Check that it's possible for the variable to have a section.  */
-  if ((TREE_STATIC (* node) || DECL_EXTERNAL (* node) || in_lto_p)
-  && DECL_SECTION_NAME (* node))
-message = G_("%qE attribute cannot be applied to variables with specific sections");
+  if (TREE_CODE (decl) != VAR_DECL)
+{
+  message = G_("%qE attribute only applies to variables");
+  goto fail;
+}
 
-  if (!message && TREE_NAME_EQ (name, ATTR_PERSIST) && !TREE_STATIC (* node)
-  && !TREE_PUBLIC (* node) && !DECL_EXTERNAL (* node))
-message = G_("%qE attribute has no effect on automatic variables");
+  if (!(TREE_STATIC (decl) || TREE_PUBLIC (decl) || DECL_EXTERNAL (decl)
+	|| in_lto_p))
+{
+  message = G_("%qE attribute has no effect on automatic variables");
+  goto fail;
+}
 
-  /* It's not clear if there is anything that can be set here to prevent the
- front end placing the variable before the back end can handle it, in a
- similar way to how DECL_COMMON is used below.
- So just place the variable in the .persistent section now.  */
-  if ((TREE_STATIC (* node) || DECL_EXTERNAL (* node) || in_lto_p)
-  && TREE_NAME_EQ (name, ATTR_PERSIST))
-set_decl_section_name (* node, ".persistent");
+  if ((TREE_NAME_EQ (name, ATTR_PERSIST)
+   && has_attr (ATTR_NOINIT, decl))
+  || (TREE_NAME_EQ (name, ATTR_NOINIT)
+	  && has_attr (ATTR_PERSIST, decl)))
+{
+  message
+	= G_("variable %qE cannot have both noinit and persistent attributes");
+  var = decl;
+  goto fail;
+}
 
-  /* If this var is thought to be common, then ch

Go patch committed: Add COMPARE_ALIASES flag for type compare and hash

2018-10-18 Thread Ian Lance Taylor
This patch to the Go frontend adds a COMPARE_ALIASES flag for type
compare and hash functions.  Normally aliases compare as identical to
the underlying type.  The new COMPARE_ALIASES flag lets them compare
(and hash) differently.  This will be used by later work.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.
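
For readers unfamiliar with the flag-based comparison, here is a minimal
sketch of the behavior described above.  It is a toy model, not the
gofrontend code: ToyType, unalias and are_identical are simplified stand-ins
invented for this sketch, and it compares by pointer identity only, whereas
the real Type::are_identical also compares types structurally.  The value 4
for COMPARE_ALIASES matches the constant added to types.h.

```cpp
#include <cassert>

// Hypothetical miniature of a type node; not the gofrontend Type class.
struct ToyType
{
  const ToyType* alias_of;  // non-null when this type is an alias
};

// Same value as the constant the patch adds to class Type.
const int COMPARE_ALIASES = 4;

// Follow alias links down to the underlying type.
const ToyType*
unalias(const ToyType* t)
{
  assert(t != nullptr);
  while (t->alias_of != nullptr)
    t = t->alias_of;
  return t;
}

// Aliases are looked through unless COMPARE_ALIASES is set in FLAGS.
bool
are_identical(const ToyType* t1, const ToyType* t2, int flags)
{
  if ((flags & COMPARE_ALIASES) == 0)
    {
      // Default behavior: ignore aliases.
      t1 = unalias(t1);
      t2 = unalias(t2);
    }
  return t1 == t2;
}
```

With no flags an alias and its target compare as identical; with
COMPARE_ALIASES set they compare as different, while an alias still compares
as identical to itself.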

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265296)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-9c985ce6f76dd65b8eb0e4b03c09ad0100712e04
+6f4bce815786ff3803741355f7f280e4e2c89668
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 265296)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -349,9 +349,16 @@ Type::are_identical(const Type* t1, cons
   return (flags & COMPARE_ERRORS) == 0 ? true : t1 == t2;
 }
 
-  // Skip defined forward declarations.  Ignore aliases.
-  t1 = t1->unalias();
-  t2 = t2->unalias();
+  // Skip defined forward declarations.
+  t1 = t1->forwarded();
+  t2 = t2->forwarded();
+
+  if ((flags & COMPARE_ALIASES) == 0)
+{
+  // Ignore aliases.
+  t1 = t1->unalias();
+  t2 = t2->unalias();
+}
 
   if (t1 == t2)
 return true;
@@ -923,12 +930,17 @@ Type::copy_expressions()
 unsigned int
 Type::hash_for_method(Gogo* gogo, int flags) const
 {
-  if (this->named_type() != NULL && this->named_type()->is_alias())
-return this->named_type()->real_type()->hash_for_method(gogo, flags);
-  unsigned int ret = 0;
-  if (this->classification_ != TYPE_FORWARD)
-ret += this->classification_;
-  return ret + this->do_hash_for_method(gogo, flags);
+  const Type* t = this->forwarded();
+  if (t->named_type() != NULL && t->named_type()->is_alias())
+{
+  unsigned int r =
+   t->named_type()->real_type()->hash_for_method(gogo, flags);
+  if ((flags & Type::COMPARE_ALIASES) != 0)
+   r += TYPE_FORWARD;
+  return r;
+}
+  unsigned int ret = t->classification_;
+  return ret + t->do_hash_for_method(gogo, flags);
 }
 
 // Default implementation of do_hash_for_method.  This is appropriate
Index: gcc/go/gofrontend/types.h
===
--- gcc/go/gofrontend/types.h   (revision 265296)
+++ gcc/go/gofrontend/types.h   (working copy)
@@ -574,6 +574,9 @@ class Type
   // struct field tags for purposes of type conversion.
   static const int COMPARE_TAGS = 2;
 
+  // Compare aliases: treat an alias to T as distinct from T.
+  static const int COMPARE_ALIASES = 4;
+
   // Return true if two types are identical.  If this returns false,
   // and REASON is not NULL, it may set *REASON.
   static bool


Re: [PATCH v2 1/3] or1k: libgcc: initial support for openrisc

2018-10-18 Thread Jeff Law
On 10/18/18 2:06 PM, Stafford Horne wrote:
> On Thu, Oct 18, 2018 at 03:22:56PM +0200, Sebastian Huber wrote:
>> Hello,
>>
>> is there a chance to get the or1k support integrated before the GCC 9 stage
>> 3?
> 
> Hello,
> 
> I would definitely like that and that is my goal.  It seems the limiting factor
> is getting technical review and signoff on this set of patches.
> 
> I will send out a PATCH v3 with a few minor enhancements gathered since v2
> today or tomorrow.  Then I will try to ping a few people if I don't get
> reviews by next week.
Also note that for a port with minimal bleed out (and I think the or1k
qualifies) we can still integrate it during stage3.  But obviously it'd be
better to get it in during stage1.

Jeff


Re: [PATCH] v2: gccint.texi: add user experience guidelines

2018-10-18 Thread Sandra Loosemore

On 10/18/2018 03:12 PM, David Malcolm wrote:


Here's an updated version of the patch, addressing your above comments,
and those from Martin and Richard (I hope).


Thanks, this one looks more readable.  Some more specific comments 
included inline below.



I have a couple of texinfo questions:

(a) the guidelines frequently have contrasting pairs
of examples showing how to do something vs how not to do it.  Is there
a way of marking these up in texinfo beyond just @smallexample?
(and manually putting in "BAD" and "OK", as I've done)


No, there's no markup for this.  I think the brief comments in the code 
example and longer discussion in the surrounding text is fine.



(b) what's the best way of showing example output from gcc?  In
particular I wasn't able to properly express the single quotes emitted by
GCC's %qs, %<, and %> directives: everything I've tried so far has issues
in at least one of the pdf vs the html output.  I've settled for using
single quotes, which is easy to emit via LANG=C and looks OK in html,
but less good in pdf.


I don't understand this question.  Isn't the best way to show single 
quotes in the output, single quotes?  :-S




+@cindex diagnostics, true positive
+@cindex false positive
+@cindex true positive
+
+Warnings should have a good @dfn{signal-to-noise ratio}: we should have few
+@dfn{false positives} (falsely issuing a warning when no warning is
+warranted) and few @dfn{false negatives} (failing to issue a warning when
+one @emph{is} justified).
+
+Note that a ``false positive'' can mean, in practice, a warning that the


No quote markup needed there.


+@noindent
+This will emit either one diagnostic with two locations:


s/will emit/emits/


+Avoid using the @code{input_location} global, and the diagnostic functions
+that implicitly use it - use @code{error_at} and @code{warning_at} rather


Long dashes in Texinfo are marked up as '---' (three hyphens) with no 
surrounding whitespace.



+@noindent
+@anchor{input_location_example}
+For example, in the example of imprecise wording
+above, the diagnostic was generated using @code{warning}:


How about rephrasing that as

For example, generating the diagnostic using @code{warning} results in 
the imprecise wording in the example above:


which puts it both in the present tense and active voice.



+would lead to:
+
+@smallexample
+// OK: use location of attribute, with a secondary location
+demo.c:1:24: warning: attribute 'noinline' on variable 'foo' was ignored 
[-Wattributes]


The above line seems long enough that it might overflow into the right 
margin.  You probably want to use -fmessage-length=70 or something like 
that for these examples.



+@subsection Coding Conventions
+
+See the @uref{https://gcc.gnu.org/codingconventions.html#Diagnostics,
+diagnostics section} of the GCC coding conventions.
+
+In the C++ frontend, when comparing two types in a message, use @code{%H}


s/frontend/front end/

I think you should be using @samp markup, rather than @code, on all 
instances of these %-format directives throughout the running text.



+and @code{%I} rather tha @code{%T}, as this allows the diagnostics


s/tha/than/


+subsystem to highlight differences between template-based types.
+For example, rather than using @code{%qT}:
+
+@smallexample
+  // BAD: a pair of %qT used in C++ frontend for type comparison


s/frontend/front end/ again


+  error_at (loc, "could not convert %qE from %qT to %qT", expr,
+TREE_TYPE (expr), type);
+@end smallexample
+
+@noindent
+which could lead to:
+
+@smallexample
+error: could not convert 'map()' from 'map' to 
'map'


That line looks too long too.


+@end smallexample
+
+@noindent
+using @code{%H} and @code{%I} (via @code{%qH} and @code{%qI}):
+
+@smallexample
+  // OK: compare types in C++ frontend via %qH and %qI


s/frontend/front end/ again


+  error_at (loc, "could not convert %qE from %qH to %qI", expr,
+TREE_TYPE (expr), type);
+@end smallexample
+
+@noindent
+allows the above output to be simplified to:
+
+@smallexample
+error: could not convert 'map()' from 'map<[...],double>' to 
'map<[...],int>'


Another too-long line.


+@end smallexample
+
+@noindent
+where the @code{double} and @code{int} are colorized to highlight them.
+
+@c %H and %I were added in r248698.
+
+Use @code{auto_diagnostic_group} when issuing multiple related
+diagnostics (seen in various examples on this page).  This informs the
+diagnostic subsystem that all diagnostics issued within the lifetime
+of the @code{auto_diagnostic_group} are related.  (Currently it doesn't
+do anything with this information, but we may implement that in the
+future).


I'm not real keen on documenting not-yet-implemented features.  I've 
deleted a bunch of "we may implement that in the future" things from the 
GCC user documentation that were written 20+ years ago and never 
implemented.  :-P



+@noindent
+which can lead to:
+
+@smallexample
+spellcheck-typenames.C:73:1: error: 'singed' does 

Re: [PATCH v5 01/10] Initial TI PRU GCC port

2018-10-18 Thread Dimitar Dimitrov
On Wednesday, 10/17/2018 22:26:58 EEST Richard Sandiford wrote:

> > +; Note: "JUMP_INSNs and CALL_INSNs are not allowed to have any output
> > +; reloads;".  Hence this insn must be prepared for a counter that is
> > +; not a register.
> > +(define_insn "doloop_end_internal"
> > +  [(set (pc)
> > +   (if_then_else (ne (match_operand:HISI 0 "nonimmediate_operand" "+r,*m")
> > + (const_int 1))
> > + (label_ref (match_operand 1 "" ""))
> > + (pc)))
> > +   (set (match_dup 0)
> > +   (plus:HISI (match_dup 0)
> > +(const_int -1)))
> > +   (unspec [(match_operand 2 "const_int_operand" "")] UNSPECV_LOOP_END)
> > +   (clobber (match_scratch:HISI 3 "=X,&r"))]
> > +  ""
> > +{
> > +  gcc_unreachable ();
> > +}
> > +  ;; Worst case length:
> > +  ;;
> > +  ;; lbbo op3_reg, op3_ptr   4'
> > +  ;; sub , 14
> > +  ;; qbeq .+8, , 0  4
> > +  ;; jmp4
> > +  ;; sbbo op3_reg, op3_ptr   4
> > +  [(set (attr "length")
> > +  (if_then_else
> > +   (and (ge (minus (pc) (match_dup 1)) (const_int 0))
> > +(le (minus (pc) (match_dup 1)) (const_int 1020)))
> > +   (cond [(eq_attr "alternative" "0") (const_int 4)
> > +  (eq_attr "alternative" "0") (const_int 12)]
> > +  (const_int 4))
> > +   (cond [(eq_attr "alternative" "0") (const_int 12)
> > +  (eq_attr "alternative" "0") (const_int 20)]
> > +  (const_int 4])
> 
> The second (eq_attr "alternative" "0") lines in each (cond ...)
> won't be used, since the first match wins.
Sorry about this. My intention was to have different weights for the two 
different constraints. I have fixed it in the attached patch.
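
To make the "first match wins" point concrete, here is a small toy model of
how a (cond ...) with a default value is evaluated.  eval_cond is a
hypothetical helper written only for this illustration and is unrelated to
GCC's actual attribute evaluation code; each arm is an (alternative, length)
pair.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy model of (cond [test value ...] default): arms are tried in order
// and the first one whose test matches supplies the result.
int
eval_cond(const std::vector<std::pair<int, int> >& arms, int alternative,
          int dflt)
{
  assert(alternative >= 0);
  for (const auto& arm : arms)
    if (arm.first == alternative)
      return arm.second;
  return dflt;
}
```

With the arms {0 -> 4, 0 -> 12} and default 4, the second arm can never be
selected, so alternative 1 also ends up with length 4.  With the single arm
{0 -> 4} and default 12, alternative 0 gets 4 and alternative 1 gets 12,
which is what I had intended.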

> OK with those changes once the port is accepted.  (No need to repost,
> just fix up locally and commit the fixed version when the time comes.)
See attached fixup patch. I'll need someone to apply it on my behalf, since I 
do not have write access.

> 
> Jeff, could you ask the SC about accepting the port, if that hasn't
> already been decided?  Dimitar, I assume you'd be OK with being the
> maintainer?
Yes, I'll be glad to support the PRU port.

In case the port is accepted, I have attached patches for maintainer listing 
and wwwdocs updates.

Thanks,
Dimitar

>From 4e07a710618e0dbbd5f97c83ebe3924a28d2ca20 Mon Sep 17 00:00:00 2001
From: Dimitar Dimitrov 
Date: Mon, 27 Aug 2018 16:40:28 +0300
Subject: [PATCH] Add myself as maintainer of PRU port

ChangeLog:

2018-10-18  Dimitar Dimitrov  

	* MAINTAINERS: Add self as PRU maintainer.

Signed-off-by: Dimitar Dimitrov 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0d6c81d4af6..1d82083512d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -92,6 +92,7 @@ nios2 port		Sandra Loosemore	
 nvptx port		Tom de Vries		
 pdp11 port		Paul Koning		
 powerpcspe port		Andrew Jenner		
+pru port		Dimitar Dimitrov	
 riscv port		Kito Cheng		
 riscv port		Palmer Dabbelt		
 riscv port		Andrew Waterman		
-- 
2.11.0

>From a05619f2ebae9afe890fd336437cf9b67ef825ac Mon Sep 17 00:00:00 2001
From: Dimitar Dimitrov 
Date: Thu, 18 Oct 2018 07:11:23 +0300
Subject: [PATCH] Fixups for v5 initial PRU backend patch

Signed-off-by: Dimitar Dimitrov 
---
 gcc/config/pru/predicates.md |  3 ---
 gcc/config/pru/pru.md| 10 --
 gcc/config/pru/pru.opt   |  2 +-
 gcc/doc/invoke.texi  |  8 
 4 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/gcc/config/pru/predicates.md b/gcc/config/pru/predicates.md
index 0e34d9c1a31..3e0a776ca54 100644
--- a/gcc/config/pru/predicates.md
+++ b/gcc/config/pru/predicates.md
@@ -75,7 +75,6 @@
   else
 	return 0;
 
-
   return REGNO_REG_CLASS (regno) == MULDST_REGS
 	 || regno >= FIRST_PSEUDO_REGISTER;
 }
@@ -96,7 +95,6 @@
   else
 	return 0;
 
-
   return REGNO_REG_CLASS (regno) == MULSRC0_REG
 	 || regno >= FIRST_PSEUDO_REGISTER;
 }
@@ -117,7 +115,6 @@
   else
 	return 0;
 
-
   return REGNO_REG_CLASS (regno) == MULSRC1_REG
 	 || regno >= FIRST_PSEUDO_REGISTER;
 }
diff --git a/gcc/config/pru/pru.md b/gcc/config/pru/pru.md
index 248ae2c953d..1fa5f9310b0 100644
--- a/gcc/config/pru/pru.md
+++ b/gcc/config/pru/pru.md
@@ -933,12 +933,10 @@
   (if_then_else
 	(and (ge (minus (pc) (match_dup 1)) (const_int 0))
 	 (le (minus (pc) (match_dup 1)) (const_int 1020)))
-	(cond [(eq_attr "alternative" "0") (const_int 4)
-	   (eq_attr "alternative" "0") (const_int 12)]
-	   (const_int 4))
-	(cond [(eq_attr "alternative" "0") (const_int 12)
-	   (eq_attr "alternative" "0") (const_int 20)]
-	   (const_int 4])
+	(cond [(eq_attr "alternative" "0") (const_int 4)]
+	   (const_int 12))
+	(cond [(eq_attr "alternative" "0") (const_int 12)]
+	   (const_int 20])
 
 (define_expand "doloop_end"
   [(use (match_operand 0 "nonimmediate_operand"))
diff --git a/gcc/config/pru/pru.opt b/gcc/config/pru/pru.opt
index fb

Ping Re: [PATCH v3 0/6] [MIPS] Reorganize the loongson march and extensions instructions set

2018-10-18 Thread Paul Hua
Ping?

I'd like to check in these patches before stage3.

Thanks,

On Tue, Oct 16, 2018 at 10:49 AM Paul Hua  wrote:
>
> Hi:
>
> The original version of the patches is here:
> https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00099.html
>
> This is an updated version; please review, thanks.
>
> This patch series reorganizes the Loongson -march=xxx options and the
> Loongson extension instruction sets.  For a long time, the Loongson
> extension instructions have all been enabled by the -march=loongson3a
> option, so we cannot disable any one of them when needed.
>
> Patch (1) splits the Loongson MultiMedia extensions Instructions (MMI)
> from loongson3a and adds the -mloongson-mmi/-mno-loongson-mmi options to
> enable/disable them.
>
> Patch (2) splits the Loongson EXTensions (EXT) instructions from
> loongson3a and adds the -mloongson-ext/-mno-loongson-ext options to
> enable/disable them.
>
> Patch (3) adds support for the Loongson EXTensions R2 (EXT2) instructions
> and adds the -mloongson-ext2/-mno-loongson-ext2 options to enable/disable
> them.
>
> Patch (4) adds Loongson 3A1000 processor support.  gs464 is the
> codename of the 3A1000 microarchitecture.  The patch renames
> -march=loongson3a to -march=gs464 and keeps -march=loongson3a as an alias
> of -march=gs464 for compatibility.
>
> Patch (5) adds Loongson 3A2000/3A3000 processor support, including the
> Loongson MMI, EXT and EXT2 instruction sets.
>
> Patch (6) adds Loongson 2K1000 processor support, including the Loongson
> MMI, EXT, EXT2 and MSA instruction sets.
>
> The binutils patch has been upstreamed.
>
> There are six patches in this set, as follows.
> 1) 0001-MIPS-Add-support-for-loongson-mmi-instructions.patch
> 2) 0002-MIPS-Add-support-for-Loongson-EXT-istructions.patch
> 3) 0003-MIPS-Add-support-for-Loongson-EXT2-istructions.patch
> 4) 0004-MIPS-Add-support-for-Loongson-3A1000-proccessor.patch
> 5) 0005-MIPS-Add-support-for-Loongson-3A2000-3A3000-proccess.patch
> 6) 0006-MIPS-Add-support-for-Loongson-2K1000-proccessor.patch
>
> All patches were tested on mips64el-linux-gnu with no new regressions.
>
> Ok to commit?
>
> Thanks,
> Paul Hua


Re: [PATCH] Add splay-tree "view" for bitmap

2018-10-18 Thread Richard Biener
On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford wrote:
>Richard Biener  writes:
>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>>
>>> Richard Biener  writes:
>>> > PR63155 made me pick up this old work from Steven; it turns our
>>> > linked-list implementation into a two-mode one, with one mode being a
>>> > splay tree featuring O(log N) complexity for find/remove.
>>> >
>>> > On top of Steven's original patch I added a bitmap_tree_to_vec helper
>>> > that I use from the debug/print methods to avoid changing view
>>> > there.  In theory the bitmap iterator could get a "stack"
>>> > as well and we could at least support EXECUTE_IF_SET_IN_BITMAP.
>>> >
>>> > This can be used to fix the two biggest bottlenecks in the PR's
>>> > testcase, namely SSA propagator worklist handling and out-of-SSA
>>> > coalesce list building.  perf shows the following data, first
>>> > unpatched, second patched - also watch the third column (samples)
>>> > when comparing percentages.
>>> >
>>> > -O0
>>> > -   18.19%  17.35%   407  cc1  cc1   [.] bitmap_set_b
>>> >- bitmap_set_bit
>>> >   + 8.77% create_coalesce_list_for_region
>>> >   + 4.21% calculate_live_ranges
>>> >   + 2.02% build_ssa_conflict_graph
>>> >   + 1.66% insert_phi_nodes_for
>>> >   + 0.86% coalesce_ssa_name
>>> > patched:
>>> > -   12.39%  10.48%   129  cc1  cc1   [.] bitmap_set_b
>>> >- bitmap_set_bit
>>> >   + 5.27% calculate_live_ranges
>>> >   + 2.76% insert_phi_nodes_for
>>> >   + 1.90% create_coalesce_list_for_region
>>> >   + 1.63% build_ssa_conflict_graph
>>> >   + 0.35% coalesce_ssa_name
>>> >
>>> > -O1
>>> > -   17.53%  17.53%   842  cc1  cc1   [.] bitmap_set_b
>>> >- bitmap_set_bit
>>> >   + 12.39% add_ssa_edge
>>> >   + 1.48% create_coalesce_list_for_region
>>> >   + 0.82% solve_constraints
>>> >   + 0.71% calculate_live_ranges
>>> >   + 0.64% add_implicit_graph_edge
>>> >   + 0.41% insert_phi_nodes_for
>>> >   + 0.34% build_ssa_conflict_graph
>>> > patched:
>>> > -    5.79%   5.00%   167  cc1  cc1   [.] bitmap_set_b
>>> >- bitmap_set_bit
>>> >   + 1.41% add_ssa_edge
>>> >   + 0.88% calculate_live_ranges
>>> >   + 0.75% add_implicit_graph_edge
>>> >   + 0.68% solve_constraints
>>> >   + 0.48% insert_phi_nodes_for
>>> >   + 0.45% build_ssa_conflict_graph
>>> >
>>> > -O3
>>> > -   12.37%  12.34%  1145  cc1  cc1   [.] bitmap_set_b
>>> >- bitmap_set_bit
>>> >   + 9.14% add_ssa_edge
>>> >   + 0.80% create_coalesce_list_for_region
>>> >   + 0.69% add_implicit_graph_edge
>>> >   + 0.54% solve_constraints
>>> >   + 0.34% calculate_live_ranges
>>> >   + 0.27% insert_phi_nodes_for
>>> >   + 0.21% build_ssa_conflict_graph
>>> > -    4.36%   3.86%   227  cc1  cc1   [.] bitmap_set_b
>>> >- bitmap_set_bit
>>> >   + 0.98% add_ssa_edge
>>> >   + 0.86% add_implicit_graph_edge
>>> >   + 0.64% solve_constraints
>>> >   + 0.57% calculate_live_ranges
>>> >   + 0.32