Re: [PATCH] Add new micro-benchmark for string operations.

2019-06-19 Thread Martin Liška
On 6/17/19 11:21 AM, Martin Liška wrote:
> I'm adding a micro-benchmark that Honza has been using for quite some time.

I'm going to install that as it's a script in contrib.

Martin


Re: [C++ PATCH] Avoid constexpr garbage for implicit conversion to void.

2019-06-19 Thread Jakub Jelinek
On Tue, Jun 18, 2019 at 12:07:38PM -0400, Jason Merrill wrote:
> commit 8f67898b9bd5924f3dd5218164df62ada10ea428
> Author: Jason Merrill 
> Date:   Sat Jun 15 23:59:55 2019 -0400
> 
> Consolidate constexpr array handling.
> 
> * constexpr.c (eval_and_check_array_index): Split out from...
> (cxx_eval_array_reference): ...here.
> (cxx_eval_store_expression): Use it here, too.
> (diag_array_subscript): Take location.  Strip location wrapper.

The r272430 change introduced:
+FAIL: g++.dg/ubsan/pr63956.C   -O0  (test for excess errors)
+FAIL: g++.dg/ubsan/pr63956.C   -O1  (test for excess errors)
+FAIL: g++.dg/ubsan/pr63956.C   -O2  (test for excess errors)
+FAIL: g++.dg/ubsan/pr63956.C   -O2 -flto  (test for excess errors)
+FAIL: g++.dg/ubsan/pr63956.C   -O2 -flto -flto-partition=none  (test for 
excess errors)
+FAIL: g++.dg/ubsan/pr63956.C   -O3 -g  (test for excess errors)
+FAIL: g++.dg/ubsan/pr63956.C   -Os  (test for excess errors)
>From what I can see, it just changed the location of the diagnostics, like:
@@ -29,17 +29,23 @@ pr63956.C:72:11: error: ‘(7.0e+0f / 0.
72 | a = a / b; // { dg-error "is not a constant expression" }
   | ~~^~~
 pr63956.C:89:24:   in ‘constexpr’ expansion of ‘fn5(((const int*)(& m1)), 4)’
-pr63956.C:89:30: error: array subscript value ‘4’ is outside the bounds of 
array type ‘const int [4]’
-   89 | constexpr int m3 = fn5 (m1, 4); // { dg-error "array subscript|in 
.constexpr. expansion of " }
-  |  ^
+pr63956.C:83:12: error: array subscript value ‘4’ is outside the bounds of 
array ‘m1’ of type ‘const int [4]’
+   83 | b = a[b];
+  | ~~~^
+pr63956.C:87:15: note: declared here
+   87 | constexpr int m1[4] = { 1, 2, 3, 4 };
+  |   ^~
 pr63956.C:109:24:   in ‘constexpr’ expansion of ‘fn7(0, 8)’
 pr63956.C:109:43: error: dereferencing a null pointer
   109 | constexpr int n3 = fn7 ((const int *) 0, 8);  // { dg-error "null 
pointer|in .constexpr. expansion of " }
   |   ^
 pr63956.C:119:24:   in ‘constexpr’ expansion of ‘fn8(10)’
-pr63956.C:119:27: error: array subscript value ‘10’ is outside the bounds of 
array type ‘const int [10]’
-  119 | constexpr int o2 = fn8 (10); // { dg-error "array subscript|in 
.constexpr. expansion of " }
-  |   ^
+pr63956.C:115:13: error: array subscript value ‘10’ is outside the bounds of 
array ‘g’ of type ‘const int [10]’
+  115 |   return g[i];
+  |  ~~~^
+pr63956.C:114:17: note: declared here
+  114 |   constexpr int g[10] = { };
+  | ^
 pr63956.C:130:24:   in ‘constexpr’ expansion of ‘fn9(2147483647, 1)’
 pr63956.C:130:39: error: overflow in constant expression [-fpermissive]
   130 | constexpr int p2 = fn9 (__INT_MAX__, 1); // { dg-error "overflow in 
constant expression|in .constexpr. expansion of " }

So, I've committed following patch as obvious after testing it on
x86_64-linux -m32/-m64.

2019-06-19  Jakub Jelinek  

* g++.dg/ubsan/pr63956.C: Adjust expected diagnostics.

--- gcc/testsuite/g++.dg/ubsan/pr63956.C.jj 2019-05-20 11:39:25.003963953 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr63956.C2019-06-19 10:20:12.390666122 
+0200
@@ -80,13 +80,13 @@ constexpr int
 fn5 (const int *a, int b)
 {
   if (b != 2)
-b = a[b];
+b = a[b]; // { dg-error "array subscript" }
   return b;
 }
 
 constexpr int m1[4] = { 1, 2, 3, 4 };
 constexpr int m2 = fn5 (m1, 3);
-constexpr int m3 = fn5 (m1, 4); // { dg-error "array subscript|in .constexpr. 
expansion of " }
+constexpr int m3 = fn5 (m1, 4); // { dg-message "in .constexpr. expansion of " 
}
 
 constexpr int
 fn6 (const int &a, int b)
@@ -112,11 +112,11 @@ constexpr int
 fn8 (int i)
 {
   constexpr int g[10] = { };
-  return g[i];
+  return g[i]; // { dg-error "array subscript" }
 }
 
 constexpr int o1 = fn8 (9);
-constexpr int o2 = fn8 (10); // { dg-error "array subscript|in .constexpr. 
expansion of " }
+constexpr int o2 = fn8 (10); // { dg-message "in .constexpr. expansion of " }
 
 constexpr int
 fn9 (int a, int b)


Jakub


[committed] Fix handling of references in reduction(inscan, ...) and inclusive clauses

2019-06-19 Thread Jakub Jelinek
Hi!

References are a nightmare to support :(.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2019-06-19  Jakub Jelinek  

* omp-low.c (lower_rec_input_clauses): Handle references properly
in inscan clauses.
(lower_omp_scan): Likewise.
cp/
* cp-gimplify.c (cp_genericize_r): Handle OMP_CLAUSE_{IN,EX}CLUSIVE
like OMP_CLAUSE_SHARED.
testsuite/
* g++.dg/vect/simd-3.cc: New test.
* g++.dg/vect/simd-4.cc: New test.
* g++.dg/vect/simd-5.cc: New test.

--- gcc/omp-low.c.jj2019-06-17 23:18:53.614850166 +0200
+++ gcc/omp-low.c   2019-06-18 12:21:25.911696625 +0200
@@ -5237,10 +5237,13 @@ lower_rec_input_clauses (tree clauses, g
  if (OMP_CLAUSE_REDUCTION_GIMPLE_INIT (c))
{
  tseq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (c);
- x = DECL_VALUE_EXPR (new_var);
- SET_DECL_VALUE_EXPR (new_var, nv);
+ x = DECL_VALUE_EXPR (new_vard);
+ tree vexpr = nv;
+ if (new_vard != new_var)
+   vexpr = build_fold_addr_expr (nv);
+ SET_DECL_VALUE_EXPR (new_vard, vexpr);
  lower_omp (&tseq, ctx);
- SET_DECL_VALUE_EXPR (new_var, x);
+ SET_DECL_VALUE_EXPR (new_vard, x);
  gimple_seq_add_seq (ilist, tseq);
  OMP_CLAUSE_REDUCTION_GIMPLE_INIT (c) = NULL;
}
@@ -5366,20 +5369,23 @@ lower_rec_input_clauses (tree clauses, g
{
  if (x)
{
- tree nv = create_tmp_var_raw (TREE_TYPE (new_vard));
+ tree nv = create_tmp_var_raw (TREE_TYPE (new_var));
  gimple_add_tmp_var (nv);
- ctx->cb.decl_map->put (new_var, nv);
+ ctx->cb.decl_map->put (new_vard, nv);
  x = lang_hooks.decls.omp_clause_default_ctor
(c, nv, build_outer_var_ref (var, ctx));
  gimplify_and_add (x, ilist);
  if (OMP_CLAUSE_REDUCTION_GIMPLE_INIT (c))
{
  tseq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (c);
- SET_DECL_VALUE_EXPR (new_var, nv);
- DECL_HAS_VALUE_EXPR_P (new_var) = 1;
+ tree vexpr = nv;
+ if (new_vard != new_var)
+   vexpr = build_fold_addr_expr (nv);
+ SET_DECL_VALUE_EXPR (new_vard, vexpr);
+ DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
  lower_omp (&tseq, ctx);
- SET_DECL_VALUE_EXPR (new_var, NULL_TREE);
- DECL_HAS_VALUE_EXPR_P (new_var) = 0;
+ SET_DECL_VALUE_EXPR (new_vard, NULL_TREE);
+ DECL_HAS_VALUE_EXPR_P (new_vard) = 0;
  gimple_seq_add_seq (ilist, tseq);
}
  OMP_CLAUSE_REDUCTION_GIMPLE_INIT (c) = NULL;
@@ -5523,10 +5529,12 @@ lower_rec_input_clauses (tree clauses, g
  gimplify_assign (ref, x, &llist[1]);
 
}
- else if (rvarp == NULL)
+ else
{
  if (omp_is_reference (var) && is_simd)
handle_simd_reference (clause_loc, new_vard, ilist);
+ if (rvarp)
+   break;
  gimplify_assign (new_var, x, ilist);
  if (is_simd)
{
@@ -8586,14 +8594,26 @@ lower_omp_scan (gimple_stmt_iterator *gs
if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
&& OMP_CLAUSE_REDUCTION_INSCAN (c))
  {
+   location_t clause_loc = OMP_CLAUSE_LOCATION (c);
tree var = OMP_CLAUSE_DECL (c);
tree new_var = lookup_decl (var, octx);
tree val = new_var;
tree var2 = NULL_TREE;
tree var3 = NULL_TREE;
-   if (DECL_HAS_VALUE_EXPR_P (new_var))
+   tree new_vard = new_var;
+   if (omp_is_reference (var))
+ {
+   new_var = build_simple_mem_ref_loc (clause_loc, new_var);
+   val = new_var;
+ }
+   if (DECL_HAS_VALUE_EXPR_P (new_vard))
  {
-   val = DECL_VALUE_EXPR (new_var);
+   val = DECL_VALUE_EXPR (new_vard);
+   if (omp_is_reference (var))
+

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-19 Thread luoxhu

Hi Martin,

On 2019/6/18 18:21, Martin Liška wrote:

On 6/18/19 3:45 AM, Xiong Hu Luo wrote:

 6.2.  SPEC2017 peakrate:
 523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r 
(+13.33%);
 525.x264_r (-5.29%).


Can you please elaborate what are the key indirect call promotions that are 
needed
to achieve such a significant speed up? Are we talking about calls to virtual 
functions
or C-style indirect calls?


For benchmark 511.povray_r, no speculations and indirect call promotion
happened from povray_r.wpa.069i.profile_estimate:

994 171 indirect calls trained.
995 0 (0.00%) have common target.
996 0 (0.00%) targets was not found.
997 0 (0.00%) targets had parameter count mismatch.
998 0 (0.00%) targets was not in polymorphic call target list.
999 0 (0.00%) speculations seems useless.
   1000 0 (0.00%) speculations produced.


After applying my patch:

   1259 171 indirect calls trained.
   1260 60 (35.09%) have common target.
   1261 41 (23.98%) targets was not found.
   1262 0 (0.00%) targets had parameter count mismatch.
   1263 0 (0.00%) targets was not in polymorphic call target list.
   1264 57 (33.33%) speculations seems useless.
   1265 5 (2.92%) speculations produced.

Below indirect calls conversion will take effect, as all of these calls
are hot functions, performance boosts a lot by the combination optimization
of later stage ipa/inline/clone.

ls *.*i.* | xargs grep "Expanding speculative call" 

povray_r.ltrans5.076i.inline:Expanding speculative call of 
create_ray.constprop/75445 -> Inside_CSG_Intersection/76219 count: 291083 
(adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of 
create_ray.constprop/75445 -> Inside_Plane/76221 count: 387811 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of 
initialize_ray_container_state_tree/54575 -> Inside_CSG_Intersection/75997 
count: 3784081 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of 
initialize_ray_container_state_tree/54575 -> Inside_Plane/76062 count: 
5041557 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> 
All_CSG_Intersect_Intersections/76183 count: 8983544 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> 
All_Sphere_Intersections/76184 count: 31488162 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> 
Inside_Plane/76197 count: 19044626 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of 
All_CSG_Intersect_Intersections/9843 -> All_Sphere_Intersections/76011 
count: 22068935 (adjusted)
povray_r.ltrans5.076i.inline:Expanding speculative call of 
All_CSG_Intersect_Intersections/9843 -> Inside_Plane/76031 count: 13347702 
(adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of 
block_light_source/26304 -> All_CSG_Intersect_Intersections/76130 count: 
5434215 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of 
block_light_source/26304 -> All_Sphere_Intersections/76139 count: 19047432 
(adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of 
block_light_source/26304 -> Inside_Plane/76134 count: 11520241 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of 
Inside_CSG_Union/9845 -> Inside_Plane/76081 count: 830538 (adjusted)
povray_r.ltrans6.076i.inline:Expanding speculative call of 
All_CSG_Union_Intersections/9842 -> All_Plane_Intersections/76049 count: 
1636158 (adjusted)




Thanks,
Martin





[PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

2019-06-19 Thread Jakub Jelinek
Hi!

When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR, vec_shl_optab
has been removed as unused, because we only used vec_shr_optab for the
reductions.
Without this patch the vect-simd-*.c tests can be vectorized just fine
for SSE4 and above, but can't be with SSE2.  As the comment in
tree-vect-stmts.c tries to explain, for the inclusive scan operation we
want (when using V8SImode vectors):
   _30 = MEM  [(int *)&D.2043];
   _31 = MEM  [(int *)&D.2042];
   _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
   _33 = _31 + _32;
   // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
   _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
   _35 = _33 + _34;
   // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
   // _31[1]+.._31[4], ... _31[4]+.._31[7] };
   _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
   _37 = _35 + _36;
   // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
   // _31[0]+.._31[4], ... _31[0]+.._31[7] };
   _38 = _30 + _37;
   _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
   MEM  [(int *)&D.2043] = _39;
   MEM  [(int *)&D.2042] = _38;  */
For V4SImode vectors that would be VEC_PERM_EXPR ,
VEC_PERM_EXPR  and
VEC_PERM_EXPR  etc.
Unfortunately, SSE2 can't do the VEC_PERM_EXPR 
permutation (the other two it can do).  Well, to be precise, it can do it
using the vector left shift which has been removed as unused, provided
that init is initializer_zerop (shifting all zeros from the left).
init usually is all zeros, that is the neutral element of additive
reductions and couple of others too, in the unlikely case that some other
reduction is used with scan (multiplication, minimum, maximum, bitwise and),
we can use a VEC_COND_EXPR with constant first argument, i.e. a blend or
and/or.

So, this patch reintroduces vec_shl_optab (most backends actually have those
patterns already) and handles its expansion and vector generic lowering
similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where the first
operand is initializer_zerop and third operand starts with a few numbers
smaller than number of elements (doesn't matter which one, as all elements
are same - zero) followed by nelts, nelts+1, nelts+2, ...
Unlike vec_shr_optab which has zero as the second operand, this one has it
as first operand, because VEC_PERM_EXPR canonicalization wants to have
first element selector smaller than number of elements.  And unlike
vec_shr_optab, where we also have a fallback in have_whole_vector_shift
using normal permutations, this one doesn't need it, that "fallback" is tried
first before vec_shl_optab.

For the vec_shl_optab checks, it tests only for constant number of elements
vectors, not really sure if our VECTOR_CST encoding can express the left
shifts in any way nor whether SVE supports those (I see aarch64 has
vec_shl_insert but that is just a fixed shift by element bits and shifts in
a scalar rather than zeros).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-06-19  Jakub Jelinek  

* doc/md.texi: Document vec_shl_ pattern.
* optabs.def (vec_shl_optab): New optab.
* optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab
argument, if == vec_shl_optab, check for left whole vector shift
pattern rather than right shift.
(expand_vec_perm_const): Add vec_shl_optab support.
* optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab
in the comment.
* tree-vect-generic.c (lower_vec_perm): Support permutations which
can be handled by vec_shl_optab.
* tree-vect-stmts.c (scan_store_can_perm_p): New function.
(check_scan_store): Use it.
(vectorizable_scan_store): If target can't do normal permutations,
try to use whole vector left shifts and if needed a VEC_COND_EXPR
after it.
* config/i386/sse.md (vec_shl_): New expander.

* gcc.dg/vect/vect-simd-8.c: If main is defined, don't include
tree-vect.h nor call check_vect.
* gcc.dg/vect/vect-simd-9.c: Likewise.
* gcc.dg/vect/vect-simd-10.c: New test.
* gcc.target/i386/sse2-vect-simd-8.c: New test.
* gcc.target/i386/sse2-vect-simd-9.c: New test.
* gcc.target/i386/sse2-vect-simd-10.c: New test.
* gcc.target/i386/avx2-vect-simd-8.c: New test.
* gcc.target/i386/avx2-vect-simd-9.c: New test.
* gcc.target/i386/avx2-vect-simd-10.c: New test.
* gcc.target/i386/avx512f-vect-simd-8.c: New test.
* gcc.target/i386/avx512f-vect-simd-9.c: New test.
* gcc.target/i386/avx512f-vect-simd-10.c: New test.

--- gcc/doc/md.texi.jj  2019-06-13 00:35:43.518942525 +0200
+++ gcc/doc/md.texi 2019-06-18 15:32:38.496629946 +0200
@@ -5454,6 +5454,14 @@ in operand 2.  Store the result in vecto
 0 and 1 have mode @var{m} and operand 2 has the mode ap

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-19 Thread Martin Liška
On 6/19/19 10:50 AM, luoxhu wrote:
> Hi Martin,
> 
> On 2019/6/18 18:21, Martin Liška wrote:
>> On 6/18/19 3:45 AM, Xiong Hu Luo wrote:
>>>  6.2.  SPEC2017 peakrate:
>>>  523.xalancbmk_r (+4.87%); 538.imagick_r (+4.59%); 511.povray_r 
>>> (+13.33%);
>>>  525.x264_r (-5.29%).
>>
>> Can you please elaborate what are the key indirect call promotions that are 
>> needed
>> to achieve such a significant speed up? Are we talking about calls to 
>> virtual functions
>> or C-style indirect calls?
> 
> For benchmark 511.povray_r, no speculations and indirect call promotion
> happened from povray_r.wpa.069i.profile_estimate:
> 
>     994 171 indirect calls trained.
>     995 0 (0.00%) have common target.
>     996 0 (0.00%) targets was not found.
>     997 0 (0.00%) targets had parameter count mismatch.
>     998 0 (0.00%) targets was not in polymorphic call target list.
>     999 0 (0.00%) speculations seems useless.
>    1000 0 (0.00%) speculations produced.
> 
> 
> After applying my patch:
> 
>    1259 171 indirect calls trained.
>    1260 60 (35.09%) have common target.
>    1261 41 (23.98%) targets was not found.
>    1262 0 (0.00%) targets had parameter count mismatch.
>    1263 0 (0.00%) targets was not in polymorphic call target list.
>    1264 57 (33.33%) speculations seems useless.
>    1265 5 (2.92%) speculations produced.
> 
> Below indirect calls conversion will take effect, as all of these calls
> are hot functions, performance boosts a lot by the combination optimization
> of later stage ipa/inline/clone.
> 
> ls *.*i.* | xargs grep "Expanding speculative call"
> povray_r.ltrans5.076i.inline:Expanding speculative call of 
> create_ray.constprop/75445 -> Inside_CSG_Intersection/76219 count: 291083 
> (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of 
> create_ray.constprop/75445 -> Inside_Plane/76221 count: 387811 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of 
> initialize_ray_container_state_tree/54575 -> Inside_CSG_Intersection/75997 
> count: 3784081 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of 
> initialize_ray_container_state_tree/54575 -> Inside_Plane/76062 count: 
> 5041557 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> 
> All_CSG_Intersect_Intersections/76183 count: 8983544 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> 
> All_Sphere_Intersections/76184 count: 31488162 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of Trace/54564 -> 
> Inside_Plane/76197 count: 19044626 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of 
> All_CSG_Intersect_Intersections/9843 -> All_Sphere_Intersections/76011 count: 
> 22068935 (adjusted)
> povray_r.ltrans5.076i.inline:Expanding speculative call of 
> All_CSG_Intersect_Intersections/9843 -> Inside_Plane/76031 count: 13347702 
> (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of 
> block_light_source/26304 -> All_CSG_Intersect_Intersections/76130 count: 
> 5434215 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of 
> block_light_source/26304 -> All_Sphere_Intersections/76139 count: 19047432 
> (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of 
> block_light_source/26304 -> Inside_Plane/76134 count: 11520241 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of 
> Inside_CSG_Union/9845 -> Inside_Plane/76081 count: 830538 (adjusted)
> povray_r.ltrans6.076i.inline:Expanding speculative call of 
> All_CSG_Union_Intersections/9842 -> All_Plane_Intersections/76049 count: 
> 1636158 (adjusted)
> 
>>
>> Thanks,
>> Martin
>>
> 

Thank you very much for the numbers. Today, I'm going to prepare the 
generalization of single-value counter to track N values.

Martin


Re: [PATCH 1/3] Create GCN-specific gthreads

2019-06-19 Thread Andrew Stubbs

Ping.

I can probably approve this myself, as it only affects GCN, but I'd 
appreciate a second opinion.


Thanks

Andrew

On 07/06/2019 15:39, Andrew Stubbs wrote:

This patch creates a new gthread model for AMD GCN devices.

For now, there's just enough support for libgfortran to use mutexes in 
its I/O routines. The rest can be added at a later time, if at all.


Notes:

  * GCN GPUs do not support dynamic creation and deletion of threads, so
    there can be no implementation for those functions. (There may be
    many threads, of course, but they are hardware managed and must be
    launched all at once.)

  * It would be possible to implement support for EMUTLS, but I have no
    wish to do so at this time, and it isn't likely to be needed by
    OpenMP or OpenACC offload kernels, so those functions are also stub
    implementations.

OK to commit?





Re: [PATCH 2/3] Stub implementation of unwinding for AMD GCN.

2019-06-19 Thread Andrew Stubbs

Ping.

I can probably approve this myself, as it only affects GCN, but I'd 
appreciate a second opinion.


Thanks

Andrew

On 07/06/2019 15:40, Andrew Stubbs wrote:
This patch provides the "_Unwind_Backtrace" and "_Unwind_GetIPInfo" 
symbols required to link programs using libgfortran.


I do not wish to implement proper backtracing at this time (I have other 
things to work on), and IIUC none of the existing implementations will 
Just Work.


OK to commit?





Re: Use ODR for canonical types construction in LTO

2019-06-19 Thread Jan Hubicka
Hi,
Jason, this is about C++ frontend producing two copies of same type with
different TYPE_CANONICAL but same get_alias_set.  Since the types are
structurally different, this does not go well with LTO which no longer
sees they are same.  They are created in
  if (CLASSTYPE_NON_LAYOUT_POD_P (t) || CLASSTYPE_EMPTY_P (t))
  {
   /* T needs a different layout as a base (eliding virtual bases
  or whatever).  Create that version.  */
   tree base_t = make_node (TREE_CODE (t));

and later cxx_get_alias_set arranges they have same set by testing
IS_FAKE_BASE_TYPE.


I am not able to come with a testcase that triggers wrong code with LTO
However compiling this testcase:

struct E {
  ~E();
  virtual void f() const;
};
struct B : E {};
struct G : virtual B {};
struct A {
  virtual ~A();
};
struct J : E {
  int val;
  ~J() {if (val) __builtin_abort ();}
  void f() const {
E *p = 0;
p->f();
  }
};
J h;
struct I : A, G, virtual B {};

Front-end produce two versions of J that do not pass
gimple_canonical_types_compatible_p thus at compile time they have same
alias set and at LTO time they are considered non-conflicting.

Both types are mixed in one function:

J::~J (struct J * const this)
{
  int _1;
  struct E * _2;

   [local count: 1073741824]:
  this_4(D)->D.2376._vptr.E = &MEM  [(void 
*)&_ZTV1J + 16B];
  _1 = this_4(D)->val;
  if (_1 != 0)
goto ; [0.00%]
  else
goto ; [100.00%]

   [count: 0]:
  abort ();

   [local count: 1073741824]:
  _2 = &this_4(D)->D.2376;
  E::~E (_2);
  MEM[(struct  &)this_4(D)] ={v} {CLOBBER};
  return;
}

Here "(struct &)" cast is turning struct J into the other, incompatible,
variant.  I can't make FE to use the type anywhere else and this is why
I do not get wrong code since we sort of ignore the clobber then
(consider that the memory it guards is not used/initialized).

Jason, is there a way to actually read/write into the copied structure?
Putting same alias set but keeping canonical types different comfuses
same_type_for_tbaa, is there a reason why we do not make canonical types
same?

It seems to me that we want
 1) use same TYPE_CANONICAL for both variants of the type
 2) teach LTO canonical type merging about fake bases (i.e. add a
middle-end flag to do so).
Or can we avoid these types from getting into instruction stream?
(I suppose not - I guess point of using them in clobber is to get the
size right)

Honza


Re: [PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

2019-06-19 Thread Richard Biener
On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR,
>vec_shl_optab
>has been removed as unused, because we only used vec_shr_optab for the
>reductions.
>Without this patch the vect-simd-*.c tests can be vectorized just fine
>for SSE4 and above, but can't be with SSE2.  As the comment in
>tree-vect-stmts.c tries to explain, for the inclusive scan operation we
>want (when using V8SImode vectors):
>   _30 = MEM  [(int *)&D.2043];
>   _31 = MEM  [(int *)&D.2042];
>   _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
>   _33 = _31 + _32;
> // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
>   _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
>   _35 = _33 + _34;
>// _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>   // _31[1]+.._31[4], ... _31[4]+.._31[7] };
>   _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
>   _37 = _35 + _36;
>// _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>   // _31[0]+.._31[4], ... _31[0]+.._31[7] };
>   _38 = _30 + _37;
>   _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>   MEM  [(int *)&D.2043] = _39;
>   MEM  [(int *)&D.2042] = _38;  */
>For V4SImode vectors that would be VEC_PERM_EXPR }>,
>VEC_PERM_EXPR  and
>VEC_PERM_EXPR  etc.
>Unfortunately, SSE2 can't do the VEC_PERM_EXPR }>
>permutation (the other two it can do).  Well, to be precise, it can do
>it
>using the vector left shift which has been removed as unused, provided
>that init is initializer_zerop (shifting all zeros from the left).
>init usually is all zeros, that is the neutral element of additive
>reductions and couple of others too, in the unlikely case that some
>other
>reduction is used with scan (multiplication, minimum, maximum, bitwise
>and),
>we can use a VEC_COND_EXPR with constant first argument, i.e. a blend
>or
>and/or.
>
>So, this patch reintroduces vec_shl_optab (most backends actually have
>those
>patterns already) and handles its expansion and vector generic lowering
>similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where the first
>operand is initializer_zerop and third operand starts with a few
>numbers
>smaller than number of elements (doesn't matter which one, as all
>elements
>are same - zero) followed by nelts, nelts+1, nelts+2, ...
>Unlike vec_shr_optab which has zero as the second operand, this one has
>it
>as first operand, because VEC_PERM_EXPR canonicalization wants to have
>first element selector smaller than number of elements.  And unlike
>vec_shr_optab, where we also have a fallback in have_whole_vector_shift
>using normal permutations, this one doesn't need it, that "fallback" is
>tried
>first before vec_shl_optab.
>
>For the vec_shl_optab checks, it tests only for constant number of
>elements
>vectors, not really sure if our VECTOR_CST encoding can express the
>left
>shifts in any way nor whether SVE supports those (I see aarch64 has
>vec_shl_insert but that is just a fixed shift by element bits and
>shifts in
>a scalar rather than zeros).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Richard. 

>2019-06-19  Jakub Jelinek  
>
>   * doc/md.texi: Document vec_shl_ pattern.
>   * optabs.def (vec_shl_optab): New optab.
>   * optabs.c (shift_amt_for_vec_perm_mask): Add shift_optab
>   argument, if == vec_shl_optab, check for left whole vector shift
>   pattern rather than right shift.
>   (expand_vec_perm_const): Add vec_shl_optab support.
>   * optabs-query.c (can_vec_perm_var_p): Mention also vec_shl optab
>   in the comment.
>   * tree-vect-generic.c (lower_vec_perm): Support permutations which
>   can be handled by vec_shl_optab.
>   * tree-vect-stmts.c (scan_store_can_perm_p): New function.
>   (check_scan_store): Use it.
>   (vectorizable_scan_store): If target can't do normal permutations,
>   try to use whole vector left shifts and if needed a VEC_COND_EXPR
>   after it.
>   * config/i386/sse.md (vec_shl_): New expander.
>
>   * gcc.dg/vect/vect-simd-8.c: If main is defined, don't include
>   tree-vect.h nor call check_vect.
>   * gcc.dg/vect/vect-simd-9.c: Likewise.
>   * gcc.dg/vect/vect-simd-10.c: New test.
>   * gcc.target/i386/sse2-vect-simd-8.c: New test.
>   * gcc.target/i386/sse2-vect-simd-9.c: New test.
>   * gcc.target/i386/sse2-vect-simd-10.c: New test.
>   * gcc.target/i386/avx2-vect-simd-8.c: New test.
>   * gcc.target/i386/avx2-vect-simd-9.c: New test.
>   * gcc.target/i386/avx2-vect-simd-10.c: New test.
>   * gcc.target/i386/avx512f-vect-simd-8.c: New test.
>   * gcc.target/i386/avx512f-vect-simd-9.c: New test.
>   * gcc.target/i386/avx512f-vect-simd-10.c: New test.
>
>--- gcc/doc/md.texi.jj 2019-06-13 00:35:43.518942525 +0200
>+++ gcc/doc/md.texi2019-06-1

Re: [PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

2019-06-19 Thread Richard Sandiford
Richard Biener  writes:
> On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek  
> wrote:
>>Hi!
>>
>>When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR,
>>vec_shl_optab
>>has been removed as unused, because we only used vec_shr_optab for the
>>reductions.
>>Without this patch the vect-simd-*.c tests can be vectorized just fine
>>for SSE4 and above, but can't be with SSE2.  As the comment in
>>tree-vect-stmts.c tries to explain, for the inclusive scan operation we
>>want (when using V8SImode vectors):
>>   _30 = MEM  [(int *)&D.2043];
>>   _31 = MEM  [(int *)&D.2042];
>>   _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
>>   _33 = _31 + _32;
>> // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
>>   _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
>>   _35 = _33 + _34;
>>// _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>>   // _31[1]+.._31[4], ... _31[4]+.._31[7] };
>>   _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
>>   _37 = _35 + _36;
>>// _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
>>   // _31[0]+.._31[4], ... _31[0]+.._31[7] };
>>   _38 = _30 + _37;
>>   _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>>   MEM  [(int *)&D.2043] = _39;
>>   MEM  [(int *)&D.2042] = _38;  */
>>For V4SImode vectors that would be VEC_PERM_EXPR >}>,
>>VEC_PERM_EXPR  and
>>VEC_PERM_EXPR  etc.
>>Unfortunately, SSE2 can't do the VEC_PERM_EXPR >}>
>>permutation (the other two it can do).  Well, to be precise, it can do
>>it
>>using the vector left shift which has been removed as unused, provided
>>that init is initializer_zerop (shifting all zeros from the left).
>>init usually is all zeros, that is the neutral element of additive
>>reductions and couple of others too, in the unlikely case that some
>>other
>>reduction is used with scan (multiplication, minimum, maximum, bitwise
>>and),
>>we can use a VEC_COND_EXPR with constant first argument, i.e. a blend
>>or
>>and/or.
>>
>>So, this patch reintroduces vec_shl_optab (most backends actually have
>>those
>>patterns already) and handles its expansion and vector generic lowering
>>similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where the first
>>operand is initializer_zerop and third operand starts with a few
>>numbers
>>smaller than number of elements (doesn't matter which one, as all
>>elements
>>are same - zero) followed by nelts, nelts+1, nelts+2, ...
>>Unlike vec_shr_optab which has zero as the second operand, this one has
>>it
>>as first operand, because VEC_PERM_EXPR canonicalization wants to have
>>first element selector smaller than number of elements.  And unlike
>>vec_shr_optab, where we also have a fallback in have_whole_vector_shift
>>using normal permutations, this one doesn't need it, that "fallback" is
>>tried
>>first before vec_shl_optab.
>>
>>For the vec_shl_optab checks, it tests only for constant number of
>>elements
>>vectors, not really sure if our VECTOR_CST encoding can express the
>>left
>>shifts in any way nor whether SVE supports those (I see aarch64 has
>>vec_shl_insert but that is just a fixed shift by element bits and
>>shifts in
>>a scalar rather than zeros).
>>
>>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Ok. 

I think it would be worth instead telling evpc that the second permute
vector is zero.  Permutes with a second vector of zero are somewhat
special for SVE too, and could be in other cases for other targets.

I thought the direction of travel was not to have optabs for specific
kinds of permute any more.  E.g. zip, unzip and blend are common
permutes too, but we decided not to commonise those.

Thanks,
Richard


RE: [PATCH] [ARC] Fix PR89838

2019-06-19 Thread Claudiu Zissulescu
Hi Jeff,

> OK.
> 
 Thank you for your review.

> THe BZ mentions that this was found building a glibc test for ARC.  Is
> there a glibc port for the ARC?  I don't see one in the glibc git repo.
>  Are you aware of any plans to produce an official glibc port.

Indeed, we are in the process of upstreaming the glibc port for arc. However,  
we hit a number of issues, one being this BZ, and other is the need to provide 
some ARC processor internals documentation. Once those items cleaned up, the 
arc glibc port will be public available upstream.

> 
> I believe building glibc is a hell of a better sniff test than building
> newlib.  So if it's in the plan, I'd love to re-wire my tester to test
> with glibc rather than newlib on the ARC port.
> 

I totally agree, we also have an uClibc-ng port which, as far as I know, is 
fully upstream. If I may ask, how do you test the ARC toolchain, do you use 
Synopsys free nSIM simulator?

Thank you,
Claudiu


Re: Use ODR for canonical types construction in LTO

2019-06-19 Thread Jan Hubicka
Jason,
I also wonder if something like this would make sense:

* decl.c (build_clobber_this): Use original basetype if they
match.

Index: decl.c
===
--- decl.c  (revision 272381)
+++ decl.c  (working copy)
@@ -15210,7 +15210,11 @@ build_clobber_this ()
 
   tree ctype = current_class_type;
   if (!vbases)
-ctype = CLASSTYPE_AS_BASE (ctype);
+{
+  if (!tree_int_cst_equal (TYPE_SIZE (ctype),
+  TYPE_SIZE (CLASSTYPE_AS_BASE (ctype
+ctype = CLASSTYPE_AS_BASE (ctype);
+}
 
   tree clobber = build_clobber (ctype);
 
Having to LTO stream many classtypes twice just for clobber seems
wasteful

call.c already seem to contain similar logic in build_over_call:
it checks both types for match and if they do it uses class type
for the copy and otherwise it builds array and copies it
(I wonder why I does not just copy CLASSTYPE_AS_BASE instead,
though I see that with the type merging bug this may break with LTO)

Honza


[RFC, PATCH] Display inlining context for uninitialized warnings

2019-06-19 Thread Vladislav Ivanishin
Hi, 

This patch (partially) adds displaying inlining context for
-W{maybe,}uninitialized warnings.  This is not as trivial to enable as
simply supplying the "%G" format specifier, so I have some questions
below.

I need this hunk 

 void
 percent_K_format (text_info *text, location_t loc, tree block)
 {
-  text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
+  if (loc != UNKNOWN_LOCATION)
+text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
  
for two reasons:

- The gimple return stmt has UNKNOWN_LOCATION due to this code in
  lower_gimple_return ():

  if (gimple_return_retval (stmt) == gimple_return_retval (tmp_rs.stmt))
{
  /* Remove the line number from the representative return statement.
 It now fills in for many such returns.  Failure to remove this
 will result in incorrect results for coverage analysis.  */
  gimple_set_location (tmp_rs.stmt, UNKNOWN_LOCATION);

  (In case you are wondering, the code and the comment were
  added in 2004.)

- When percent_K_format is called, TEXT might already hold precise
  location information in case its (indirect) caller is
  warning_at/error_at.  So it seems to me, this function lacks
  distinguishing the two cases: being called from plain warning/error
  functions vs. their *_at counterparts (the location shouldn't be
  updated in the latter case).

  Martin, you did the necessary work for percent_G_format to accept
  arbitrary gimple statements rather than just calls for PR86650, but
  left the PR open.  Do you remember running into that sort of problem,
  or was it something else?
  
Sometimes inlining context is still lost (with the current patch), as
can be seen e.g. with a version of the test with 's/unsigned yo/int yo/'
substitution applied.  (The chain of block = BLOCK_SUPERCONTEXT (block)
is broken - it ends with 0 rather than a FUNCTION_DECL.)  Is this known
and expected (e.g. because pass_late_warn_uninitialized runs very late),
or something to be investigated and fixed?

The patch was bootstrapped and regtested on x86_64-pc-linux-gnu.  Shall
it be applied?

* tree-ssa-uninit.c (warn_uninit): Pass context (stmt of the uninit use)
to warning_at.
(warn_uninitialized_vars): Add %G format specifier.
(warn_uninitialized_phi): Ditto.
* tree-pretty-print.c (percent_K_format): Don't re-set location of TEXT
to UNKNOWN_LOCATION.

testsuite/
* gcc.dg/uninit-inlined.c: New test.
---
 gcc/testsuite/gcc.dg/uninit-inlined.c | 25 +
 gcc/tree-pretty-print.c   |  3 ++-
 gcc/tree-ssa-uninit.c |  8 
 3 files changed, 31 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/uninit-inlined.c

diff --git a/gcc/testsuite/gcc.dg/uninit-inlined.c b/gcc/testsuite/gcc.dg/uninit-inlined.c
new file mode 100644
index 000..231a0b6b7c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-inlined.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O -Wmaybe-uninitialized" } */
+
+extern unsigned bar (int);
+
+extern int f;
+
+static int
+foo (int num)
+{
+  unsigned yo;
+  if (f)
+yo = bar (num);
+  return yo; /* { dg-warning "may be used uninitialized" }  */
+}
+int
+test (int num)
+{
+  if (num == 42 || !num)
+return foo (num);
+  return 0;
+}
+
+/* { dg-regexp "In function 'foo'," "In foo" } */
+/* { dg-regexp ".*inlined from 'test'.*" "Inilned from" } */
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 329cc6fceeb..db1a00a6e49 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -4196,7 +4196,8 @@ newline_and_indent (pretty_printer *pp, int spc)
 void
 percent_K_format (text_info *text, location_t loc, tree block)
 {
-  text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
+  if (loc != UNKNOWN_LOCATION)
+text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
   gcc_assert (pp_ti_abstract_origin (text) != NULL);
   *pp_ti_abstract_origin (text) = NULL;
 
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index fe8f8f0bc28..4d6f3773a87 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -179,7 +179,7 @@ warn_uninit (enum opt_code wc, tree t, tree expr, tree var,
   xloc = expand_location (location);
   floc = expand_location (cfun_loc);
   auto_diagnostic_group d;
-  if (warning_at (location, wc, gmsgid, expr))
+  if (warning_at (location, wc, gmsgid, context, expr))
 {
   TREE_NO_WARNING (expr) = 1;
 
@@ -257,12 +257,12 @@ warn_uninitialized_vars (bool warn_possibly_uninitialized)
 	  if (always_executed)
 		warn_uninit (OPT_Wuninitialized, use, SSA_NAME_VAR (use),
 			 SSA_NAME_VAR (use),
-			 "%qD is used uninitialized in this function", stmt,
+			 "%G%qD is used uninitialized in this function", stmt,
 			 UNKNOWN_LOCATION);
 	  else if (warn_possibly_uninitialized)
 		warn_uninit (OPT_Wmaybe_uninitialized, use, SSA_NAME_VAR (use),
 			 SSA_NAME_VAR (use),
-			 "%qD may be used uninitialized in 

Re: [PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

2019-06-19 Thread Richard Biener
On June 19, 2019 11:05:42 AM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek
> wrote:
>>>Hi!
>>>
>>>When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR,
>>>vec_shl_optab
>>>has been removed as unused, because we only used vec_shr_optab for
>the
>>>reductions.
>>>Without this patch the vect-simd-*.c tests can be vectorized just
>fine
>>>for SSE4 and above, but can't be with SSE2.  As the comment in
>>>tree-vect-stmts.c tries to explain, for the inclusive scan operation
>we
>>>want (when using V8SImode vectors):
>>>   _30 = MEM  [(int *)&D.2043];
>>>   _31 = MEM  [(int *)&D.2042];
>>>   _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
>>>   _33 = _31 + _32;
>>> // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7]
>};
>>>   _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
>>>   _35 = _33 + _34;
>>>// _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2],
>_31[0]+.._31[3],
>>>   // _31[1]+.._31[4], ... _31[4]+.._31[7] };
>>>   _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
>>>   _37 = _35 + _36;
>>>// _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2],
>_31[0]+.._31[3],
>>>   // _31[0]+.._31[4], ... _31[0]+.._31[7] };
>>>   _38 = _30 + _37;
>>>   _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>>>   MEM  [(int *)&D.2043] = _39;
>>>   MEM  [(int *)&D.2042] = _38;  */
>>>For V4SImode vectors that would be VEC_PERM_EXPR 2
>>>}>,
>>>VEC_PERM_EXPR  and
>>>VEC_PERM_EXPR  etc.
>>>Unfortunately, SSE2 can't do the VEC_PERM_EXPR >>}>
>>>permutation (the other two it can do).  Well, to be precise, it can
>do
>>>it
>>>using the vector left shift which has been removed as unused,
>provided
>>>that init is initializer_zerop (shifting all zeros from the left).
>>>init usually is all zeros, that is the neutral element of additive
>>>reductions and couple of others too, in the unlikely case that some
>>>other
>>>reduction is used with scan (multiplication, minimum, maximum,
>bitwise
>>>and),
>>>we can use a VEC_COND_EXPR with constant first argument, i.e. a blend
>>>or
>>>and/or.
>>>
>>>So, this patch reintroduces vec_shl_optab (most backends actually
>have
>>>those
>>>patterns already) and handles its expansion and vector generic
>lowering
>>>similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where the
>first
>>>operand is initializer_zerop and third operand starts with a few
>>>numbers
>>>smaller than number of elements (doesn't matter which one, as all
>>>elements
>>>are same - zero) followed by nelts, nelts+1, nelts+2, ...
>>>Unlike vec_shr_optab which has zero as the second operand, this one
>has
>>>it
>>>as first operand, because VEC_PERM_EXPR canonicalization wants to
>have
>>>first element selector smaller than number of elements.  And unlike
>>>vec_shr_optab, where we also have a fallback in
>have_whole_vector_shift
>>>using normal permutations, this one doesn't need it, that "fallback"
>is
>>>tried
>>>first before vec_shl_optab.
>>>
>>>For the vec_shl_optab checks, it tests only for constant number of
>>>elements
>>>vectors, not really sure if our VECTOR_CST encoding can express the
>>>left
>>>shifts in any way nor whether SVE supports those (I see aarch64 has
>>>vec_shl_insert but that is just a fixed shift by element bits and
>>>shifts in
>>>a scalar rather than zeros).
>>>
>>>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> Ok. 
>
>I think it would be worth instead telling evpc that the second permute
>vector is zero.  Permutes with a second vector of zero are somewhat
>special for SVE too, and could be in other cases for other targets.
>
>I thought the direction of travel was not to have optabs for specific
>kinds of permute any more.  E.g. zip, unzip and blend are common
>permutes too, but we decided not to commonise those.

The issue is that vec_perm_const_ok doesn't have enough info here (second 
operand is all zeros). So the optab is needed to query target capabilities, not 
so much for the actual expansion which could go via vec_perm_const. 

I thought of having special permute vector entries to denote zero or don't - 
care which would make it possible to 
Have this info and allow these permutes to be single vector permutes. But then 
encoding this might be awkward. 

Richard. 

>Thanks,
>Richard



[PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

2019-06-19 Thread Kewen.Lin
Hi all,

This is the following patch after 
https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00910.html

Main steps:
  1) Identify the doloop cmp type iv use and record its bind_cand (explain it 
later).
  2) Set zero cost for pairs between this use and any iv cand.
  3) IV cand set selecting algorithm runs as usual.
  4) Fix up the selected iv cand for doloop use if need.

It only focuses on the targets like Power which has specific count register.
target hook have_count_reg_decr_p is proposed for it.

Some notes:

*) Why we need zero cost?  How about just decrease the cost for the pair
   between doloop use and its original iv cand?  How about just decrease
   the cost for the pair between doloop use and one selected iv cand?

   Since some target supports hardware count register for decrement and
   branch, it doesn't need the general instruction sequence for decr, cmp and
   branch in general registers.  The cost of moving count register to GPR
   is generally high, so it's standalone and can't be shared with other iv 
   uses.  It means IVOPTs can take doloop use as invisible (zero cost).

   Let's take a look at PR80791 for example.

original biv (cand 4)  use derived iv (cand 6)
 generic use:   4  0
 comp use (doloop use): 0 infinite

For iv cost, original biv has cost 4 while use derived iv has cost 5.
When IVOPTs considers doloop use, the optimal cost is 8 (original biv 
iv cost 4 + use cost 4).  Unfortunately it's not actually optimal, since
later doloop transformation updates loop closing by count register,
original biv (and its update) won't be needed in loop closing any more.
The generic use become the only use for original biv.  That means, if we 
know the doloop will perform later, we shouldn't consider the doloop use
when determining IV set.  If we don't consider it, the algorithm will 
choose iv cand 6 with total cost 5 (iv cost 5 + use cost 0).

From the above, we can see that to decrease the cost for the pair between 
doloop use and original biv doesn't work.  Meanwhile it's hard to predict
one good iv cand in final optimal set here and pre-update the cost
between it and doloop use.  The analysis would be heavy and imperfect.
   
*) Why we need bind_cand?

As above, we assign zero cost for pairs between doloop use and each iv 
cand.  It's possible that doloop use gets assigned one iv cand which is
invalid to be used during later rewrite.  Then we have to fix it up with iv
cand originally used for it.  It's fine whatever this iv cand exists in
final iv cand set or not, even if it's not in the set, it will be 
eliminated in doloop transformation.

By the way, I was thinking whether we can replace the hook have_count_reg_decr_p
with flag_branch_on_count_reg.  As the description of the "no-" option, "Disable
the optimization pass that scans for opportunities to use 'decrement and 
branch' 
instructions on a count register instead of instruction sequences that 
decrement 
a register, compare it against zero, and then branch based upon the result.", it
implicitly says it has count register support.  But I noticed that the gate of 
doloop_optimize checks this flag, as what I got from the previous discussions, 
some
targets which can perform doloop_optimize don't have specific count register, 
so it
sounds we can't make use of the flag, is it correct?

Bootstrapped on powerpcle, also ran regression testing on powerpcle, got one 
failure
which is exposed by this patch and the root cause is duplicate of PR62147.
case is gcc.target/powerpc/20050830-1.c

Is it OK for trunk?  

--

gcc/ChangeLog

2019-06-19  Kewen Lin  

PR middle-end/80791
* target.def (have_count_reg_decr_p): New hook.
* doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): New hook.
* doc/tm.texi: Regenerate.
* config/rs6000/rs6000.c (rs6000_have_count_reg_decr_p): New function.
(TARGET_HAVE_COUNT_REG_DECR_P): New macro.
* tree-ssa-loop-ivopts.c (adjust_group_iv_cost_for_doloop): New 
function.
(fixup_doloop_groups): Likewise.
(find_doloop_use_and_its_bind): Likewise.
(record_group): Init bind_cand.
(determine_group_iv_cost): Call adjust_group_iv_cost_for_doloop.
(find_optimal_iv_set): Call fixup_doloop_groups.
(tree_ssa_iv_optimize_loop): Call function have_count_reg_decr_p, 
generic_predict_doloop_p and find_doloop_use_and_its_bind.
(generic_predict_doloop_p): Update attribute.

gcc/testsuite/ChangeLog

2019-06-19  Kewen Lin  

PR middle-end/80791
* gcc.dg/tree-ssa/ivopts-lt.c: Adjust.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6667cd0..12f1dfd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1912,6 +1912,9 @@ static const struct attribute_spec 
rs6000_attribute_t

[PING][PATCH 0/3] GNAT test suite fixes for build sysroot

2019-06-19 Thread Maciej Rozycki
On Tue, 14 May 2019, Maciej W. Rozycki wrote:

>  In the course of setting up GCC regression testing for the RISC-V target 
> I have discovered that the GNAT test suite does not correctly respond to 
> the test environment settings passed from the test harness in my setup and 
> consequently no test case works correctly.

 Ping for:





  Maciej

Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-19 Thread Martin Liška
On 6/19/19 10:56 AM, Martin Liška wrote:
> Thank you very much for the numbers. Today, I'm going to prepare the 
> generalization of single-value counter to track N values.

Ok, here's a patch candidate that does tracking of most common N values. For 
your test-case I can see:

pr69678.gcda:01a9:  18:COUNTERS indirect_call 9 counts
pr69678.gcda:   0: 35000 1868707024 17500 969338501 
17500 0 0 0 
pr69678.gcda:   8: 0 

So for now, you'll need to generalize get_most_common_single_value to return
N most common values.

Eventually we'll need to renamed the counter as it won't be tracking just a 
single value
any longer. I can take care of it.

Can you please verify that the patch candidate works for you?
Thanks,
Martin
>From 93175b20aa794baf1795ff1ccb3ac0391c326ada Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 19 Jun 2019 14:15:14 +0200
Subject: [PATCH] Support N values in libgcov for single value counter type.

---
 libgcc/libgcov-merge.c| 48 +--
 libgcc/libgcov-profiler.c | 38 +++
 2 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index f778cc4b6b7..84367005663 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -89,49 +89,53 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
 static void
 merge_single_value_set (gcov_type *counters)
 {
-  unsigned j;
-  gcov_type value, counter;
-
   /* First value is number of total executions of the profiler.  */
   gcov_type all = gcov_get_counter_ignore_scaling (-1);
   counters[0] += all;
   ++counters;
 
+  /* Read all part values.  */
+  gcov_type read_counters[2 * GCOV_DISK_SINGLE_VALUES];
+
   for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
 {
-  value = gcov_get_counter_target ();
-  counter = gcov_get_counter_ignore_scaling (-1);
+  read_counters[2 * i] = gcov_get_counter_target ();
+  read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
+}
 
-  if (counter == -1)
-	{
-	  counters[1] = -1;
-	  /* We can't return as we need to read all counters.  */
-	  continue;
-	}
-  else if (counter == 0 || counters[1] == -1)
-	{
-	  /* We can't return as we need to read all counters.  */
-	  continue;
-	}
+  if (read_counters[1] == -1)
+{
+  counters[1] = -1;
+  return;
+}
+
+  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+{
+  if (read_counters[2 * i + 1] == 0)
+	return;
 
+  unsigned j;
   for (j = 0; j < GCOV_DISK_SINGLE_VALUES; j++)
 	{
-	  if (counters[2 * j] == value)
+	  if (counters[2 * j] == read_counters[2 * i])
 	{
-	  counters[2 * j + 1] += counter;
+	  counters[2 * j + 1] += read_counters[2 * i + 1];
 	  break;
 	}
 	  else if (counters[2 * j + 1] == 0)
 	{
-	  counters[2 * j] = value;
-	  counters[2 * j + 1] = counter;
+	  counters[2 * j] += read_counters[2 * i];
+	  counters[2 * j + 1] += read_counters[2 * i + 1];
 	  break;
 	}
 	}
 
-  /* We haven't found a free slot for the value, mark overflow.  */
+  /* We haven't found a slot, bail out.  */
   if (j == GCOV_DISK_SINGLE_VALUES)
-	counters[1] = -1;
+	{
+	  counters[1] = -1;
+	  return;
+	}
 }
 }
 
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 9ba65b90df3..0ef400bdda7 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -118,20 +118,38 @@ static inline void
 __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
 int use_atomic)
 {
-  if (value == counters[1])
-counters[2]++;
-  else if (counters[2] == 0)
-{
-  counters[2] = 1;
-  counters[1] = value;
-}
-  else
-counters[2]--;
-
   if (use_atomic)
 __atomic_fetch_add (&counters[0], 1, __ATOMIC_RELAXED);
   else
 counters[0]++;
+
+  ++counters;
+
+  /* We have GCOV_DISK_SINGLE_VALUES as we can keep multiple values
+ next to each other.  */
+  unsigned sindex = 0;
+
+  for (unsigned i = 0; i < GCOV_DISK_SINGLE_VALUES; i++)
+{
+  if (value == counters[2 * i])
+	{
+	  counters[2 * i + 1]++;
+	  return;
+	}
+  else if (counters[2 * i + 1] == 0)
+	{
+	  /* We found an empty slot.  */
+	  counters[2 * i] = value;
+	  counters[2 * i + 1] = 1;
+	  return;
+	}
+
+  if (counters[2 * i + 1] < counters[2 * sindex + 1])
+	sindex = i;
+}
+
+  /* We haven't found an empty slot, then decrement the smallest.  */
+  counters[2 * sindex + 1]--;
 }
 
 #ifdef L_gcov_one_value_profiler_v2
-- 
2.21.0



Re: [PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

2019-06-19 Thread Richard Sandiford
Richard Biener  writes:
> On June 19, 2019 11:05:42 AM GMT+02:00, Richard Sandiford 
>  wrote:
>>Richard Biener  writes:
>>> On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek
>> wrote:
Hi!

When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR,
vec_shl_optab
has been removed as unused, because we only used vec_shr_optab for
>>the
reductions.
Without this patch the vect-simd-*.c tests can be vectorized just
>>fine
for SSE4 and above, but can't be with SSE2.  As the comment in
tree-vect-stmts.c tries to explain, for the inclusive scan operation
>>we
want (when using V8SImode vectors):
   _30 = MEM  [(int *)&D.2043];
   _31 = MEM  [(int *)&D.2042];
   _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
   _33 = _31 + _32;
 // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7]
>>};
   _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
   _35 = _33 + _34;
// _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2],
>>_31[0]+.._31[3],
   // _31[1]+.._31[4], ... _31[4]+.._31[7] };
   _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
   _37 = _35 + _36;
// _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2],
>>_31[0]+.._31[3],
   // _31[0]+.._31[4], ... _31[0]+.._31[7] };
   _38 = _30 + _37;
   _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
   MEM  [(int *)&D.2043] = _39;
   MEM  [(int *)&D.2042] = _38;  */
For V4SImode vectors that would be VEC_PERM_EXPR >2
}>,
VEC_PERM_EXPR  and
VEC_PERM_EXPR  etc.
Unfortunately, SSE2 can't do the VEC_PERM_EXPR >>>}>
permutation (the other two it can do).  Well, to be precise, it can
>>do
it
using the vector left shift which has been removed as unused,
>>provided
that init is initializer_zerop (shifting all zeros from the left).
init usually is all zeros, that is the neutral element of additive
reductions and couple of others too, in the unlikely case that some
other
reduction is used with scan (multiplication, minimum, maximum,
>>bitwise
and),
we can use a VEC_COND_EXPR with constant first argument, i.e. a blend
or
and/or.

So, this patch reintroduces vec_shl_optab (most backends actually
>>have
those
patterns already) and handles its expansion and vector generic
>>lowering
similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where the
>>first
operand is initializer_zerop and third operand starts with a few
numbers
smaller than number of elements (doesn't matter which one, as all
elements
are same - zero) followed by nelts, nelts+1, nelts+2, ...
Unlike vec_shr_optab which has zero as the second operand, this one
>>has
it
as first operand, because VEC_PERM_EXPR canonicalization wants to
>>have
first element selector smaller than number of elements.  And unlike
vec_shr_optab, where we also have a fallback in
>>have_whole_vector_shift
using normal permutations, this one doesn't need it, that "fallback"
>>is
tried
first before vec_shl_optab.

For the vec_shl_optab checks, it tests only for constant number of
elements
vectors, not really sure if our VECTOR_CST encoding can express the
left
shifts in any way nor whether SVE supports those (I see aarch64 has
vec_shl_insert but that is just a fixed shift by element bits and
shifts in
a scalar rather than zeros).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>>
>>> Ok. 
>>
>>I think it would be worth instead telling evpc that the second permute
>>vector is zero.  Permutes with a second vector of zero are somewhat
>>special for SVE too, and could be in other cases for other targets.
>>
>>I thought the direction of travel was not to have optabs for specific
>>kinds of permute any more.  E.g. zip, unzip and blend are common
>>permutes too, but we decided not to commonise those.
>
> The issue is that vec_perm_const_ok doesn't have enough info here
> (second operand is all zeros). So the optab is needed to query target
> capabilities, not so much for the actual expansion which could go via
> vec_perm_const.

Not at the moment, sure.  But my point was that we could add that
information instead of doing what the patch does, since the information
could be useful in other cases too.

These days all permute queries go through the same target hook as the
expansion: targetm.vec_perm_const,  So the target interface itself
should adapt naturally.

For can_vec_perm_const_p we could either add zeroness information
to vec_perm_indices or provide it separately (e.g. with tree inputs).
In the latter case the zeroness could be relayed to
targetm.vec_perm_const by passing zero rtxes even for queries.

> I thought of having special permute vector entries to denote zero or
> don't - care which would make it possible

Re: [PING][PATCH 0/3] GNAT test suite fixes for build sysroot

2019-06-19 Thread Arnaud Charlet
> >  In the course of setting up GCC regression testing for the RISC-V target 
> > I have discovered that the GNAT test suite does not correctly respond to 
> > the test environment settings passed from the test harness in my setup and 
> > consequently no test case works correctly.
> 
>  Ping for:
> 
> 

Have you resolved your copyright assignment issues since then?

The above patch needs to use "or else" instead of "or". OK with this change
on the above patch.

Arno


Re: [PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

2019-06-19 Thread Jakub Jelinek
On Wed, Jun 19, 2019 at 01:46:15PM +0100, Richard Sandiford wrote:
> For can_vec_perm_const_p we could either add zeroness information
> to vec_perm_indices or provide it separately (e.g. with tree inputs).
> In the latter case the zeroness could be relayed to
> targetm.vec_perm_const by passing zero rtxes even for queries.

Yeah, I'm not against doing this, it might clean stuff up.

But for start I'd still use the vec_sh[lr]_optab under the hood
in can_vec_perm_const_p, and then we can gradually decide if we want
to convert the targets to use that information too in their target hooks
and whether we'll provide some helper routines for them or not.
vec_sh[rl]_ is right now present in aarch64, alpha, i386, ia64, mips,
rs6000 and s390 backends.

> The patch is adding code to places that would not be patched in the same
> way if we already passed down information about zeroness.  So it feels

Note, I've already committed the patch, but it can be improved
incrementally.

> like we're adding back code that we already want to take out again at
> some point, unless we change policy and allow specific optabs for
> common permutes.  (I'd be fine with that FWIW.)

Well, that is not entirely true, the code that was added for it would
essentially need to be added either way, just instead of to
tree-vect-generic.c to can_vec_perm_const_p if we still kept the optabs,
or to each of the above 7 backends if we propagated that info down to the
target hooks.

Jakub


Re: [PATCH] Fix PR84521

2019-06-19 Thread Wilco Dijkstra
Hi Max,

> On Tue, Jun 18, 2019 at 4:53 PM Wilco Dijkstra  wrote:
> > > It would work if a frame pointer was initialized in the function test, but
> > > it wasn't:
> >
> > Right, because it unwinds, it needs a valid frame pointer since we no
> > longer store the stack pointer. So xtensa_frame_pointer_required
> > should do something like:
> >
> >   if (cfun->machine->accesses_prev_frame || cfun->has_nonlocal_label)
> > return true;
> 
>  You're right, with this change things are back to normal.
 
Great, I've added this to the commit. Thanks for the proactive testing!

Wilco
 

[PATCH, netbsd] Give a name to the number 0 in sysarch(0, ...)

2019-06-19 Thread coypu
The definition originates in
https://nxr.netbsd.org/xref/src/sys/arch/arm/include/sysarch.h#58

I've added the prefix SYSARCH to avoid any naming conflict concerns.

It looked a bit like an error on a first impression :-)
* config/arm/netbsd-elf.h (SYSARCH_ARM_SYNC_ICACHE): New definition.
(CLEAR_INSN_CACHE) Use it.
---
 gcc/config/arm/netbsd-elf.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/netbsd-elf.h b/gcc/config/arm/netbsd-elf.h
index ec68d3fd10f..e42a32f927c 100644
--- a/gcc/config/arm/netbsd-elf.h
+++ b/gcc/config/arm/netbsd-elf.h
@@ -138,6 +138,8 @@
 #undef DEFAULT_STRUCTURE_SIZE_BOUNDARY
 #define DEFAULT_STRUCTURE_SIZE_BOUNDARY 8
 
+#define SYSARCH_ARM_SYNC_ICACHE0
+
 /* Clear the instruction cache from `BEG' to `END'.  This makes a
call to the ARM_SYNC_ICACHE architecture specific syscall.  */
 #define CLEAR_INSN_CACHE(BEG, END) \
@@ -151,6 +153,6 @@ do  
\
   } s; \
 s.addr = (unsigned int)(BEG);  \
 s.len = (END) - (BEG); \
-(void) sysarch (0, &s);\
+(void) sysarch (SYSARCH_ARM_SYNC_ICACHE, &s);  \
   }\
 while (0)
-- 
2.11.0



[PATCH] Decrease hash-table-verification-limit from 100 to 10.

2019-06-19 Thread Martin Liška
Hi.

The patch is about decrease of verification limit. I collected following 
statistics:

godot WPA:

hash-table-verification-limit=0: 66s
hash-table-verification-limit=10: 69s
hash-table-verification-limit=100: 94s

make all runtime libraries in GCC for --disable-bootstrap 
--disable-libsanitizer --disable-multilib

=0: 127s (17m30s user time)
=10: 130s (18m17s user time)
=100: 150s (20m37s user time)

That's why I'm suggesting default equal to 10. Moreover as items are rehashed 
during a hash table
growth we always compare different items.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2019-06-19  Martin Liska  

* params.def (PARAM_HASH_TABLE_VERIFICATION_LIMIT): Decrease
to 10.
---
 gcc/params.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/params.def b/gcc/params.def
index 0db60951413..4567c17ba60 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1435,7 +1435,7 @@ DEFPARAM(PARAM_HASH_TABLE_VERIFICATION_LIMIT,
 	 "hash-table-verification-limit",
 	 "The number of elements for which hash table verification is done for "
 	 "each searched element.",
-	 100, 0, 0)
+	 10, 0, 0)
 
 /*
 



Re: Use ODR for canonical types construction in LTO

2019-06-19 Thread Jan Hubicka
> Jason,
> I also wonder if something like this would make sense:
> 
>   * decl.c (build_clobber_this): Use original basetype if they
>   match.
> 
> Index: decl.c
> ===
> --- decl.c(revision 272381)
> +++ decl.c(working copy)
> @@ -15210,7 +15210,11 @@ build_clobber_this ()
>  
>tree ctype = current_class_type;
>if (!vbases)
> -ctype = CLASSTYPE_AS_BASE (ctype);
> +{
> +  if (!tree_int_cst_equal (TYPE_SIZE (ctype),
> +TYPE_SIZE (CLASSTYPE_AS_BASE (ctype
> +ctype = CLASSTYPE_AS_BASE (ctype);
> +}
>  
>tree clobber = build_clobber (ctype);

I have lto-bootstrapped/regtested the aptch and it seems to work just
fine.  For tramp3d it reduces number of types:
[WPA] read 56122 SCCs of average size 1.127419
[WPA] 63273 tree bodies read in total
[WPA] tree SCC table: size 65521, 40469 elements, collision ratio:
1.141428
[WPA] tree SCC max chain length 12 (size 1)
[WPA] Compared 3603 SCCs, 657 collisions (0.182348)
[WPA] Merged 3488 SCCs
[WPA] Merged 3496 tree bodies
[WPA] Merged 1949 types
[WPA] 24104 types prevailed (31196 associated trees)
[WPA] GIMPLE canonical type table: size 16381, 412 elements, 2106
searches, 19 collisions (ratio: 0.009022)
[WPA] GIMPLE canonical type pointer-map: 412 elements, 3430 searches

to

[WPA] read 55182 SCCs of average size 1.098384
[WPA] 60611 tree bodies read in total
[WPA] tree SCC table: size 65521, 39444 elements, collision ratio:
1.127024
[WPA] tree SCC max chain length 12 (size 1)
[WPA] Compared 3603 SCCs, 657 collisions (0.182348)
[WPA] Merged 3488 SCCs
[WPA] Merged 3496 tree bodies
[WPA] Merged 1949 types
[WPA] 22908 types prevailed (28278 associated trees)
[WPA] GIMPLE canonical type table: size 16381, 405 elements, 1390
searches, 13 collisions (ratio: 0.009353)
[WPA] GIMPLE canonical type pointer-map: 405 elements, 1998 searches

since the number of GIMPLE canonical type hash searches roughly
corresponds to number of types in the LTO stream, it is down by 50%
that is nice.

Morever the patch improves TBAA oracle from:

Alias oracle query stats:
  refs_may_alias_p: 3021544 disambiguations, 3321141 queries
  ref_maybe_used_by_call_p: 7118 disambiguations, 3047138 queries
  call_may_clobber_ref_p: 817 disambiguations, 817 queries
  nonoverlapping_component_refs_p: 27 disambiguations, 63312 queries
  nonoverlapping_component_refs_of_decl_p: 19 disambiguations, 2192 queries
  aliasing_component_refs_p: 2050 disambiguations, 20312 queries
  TBAA oracle: 1419961 disambiguations 2917965 queries
   555158 are in alias set 0
   575113 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   253187 are dependent in the DAG
   114546 are aritificially in conflict with void *

PTA query stats:
  pt_solution_includes: 671982 disambiguations, 952515 queries
  pt_solutions_intersect: 97060 disambiguations, 437910 queries

to

Alias oracle query stats:
  refs_may_alias_p: 3243740 disambiguations, 3545965 queries
  ref_maybe_used_by_call_p: 6861 disambiguations, 3268658 queries
  call_may_clobber_ref_p: 817 disambiguations, 817 queries
  nonoverlapping_component_refs_p: 27 disambiguations, 64095 queries
  nonoverlapping_component_refs_of_decl_p: 19 disambiguations, 2192 queries
  aliasing_component_refs_p: 2058 disambiguations, 21010 queries
  TBAA oracle: 1488225 disambiguations 3022119 queries
   551266 are in alias set 0
   575109 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   290651 are dependent in the DAG
   116868 are aritificially in conflict with void *

PTA query stats:
  pt_solution_includes: 679953 disambiguations, 965286 queries
  pt_solutions_intersect: 97067 disambiguations, 437676 queries

This is bit inexpected, but I think it mostly comes from the fact that
same_type_for_tbaa currently returns -1 for the fake type and original
type. This disables good part of access path querries.

I also tested the patch setting TYPE_CANONICAL to be same between
original and the fake copy. It works but I have to disable type verifier
that wants match between TYPE_MODE and TYPE_SIZE.

Honza


Re: Use ODR for canonical types construction in LTO

2019-06-19 Thread Nathan Sidwell

On 6/19/19 9:28 AM, Jan Hubicka wrote:

Jason,
I also wonder if something like this would make sense:

* decl.c (build_clobber_this): Use original basetype if they
match.

Index: decl.c
===
--- decl.c  (revision 272381)
+++ decl.c  (working copy)
@@ -15210,7 +15210,11 @@ build_clobber_this ()
  
tree ctype = current_class_type;

if (!vbases)
-ctype = CLASSTYPE_AS_BASE (ctype);
+{
+  if (!tree_int_cst_equal (TYPE_SIZE (ctype),
+  TYPE_SIZE (CLASSTYPE_AS_BASE (ctype
+ctype = CLASSTYPE_AS_BASE (ctype);
+}
  
tree clobber = build_clobber (ctype);


I have noticed we build a distinct as-base type in rather more cases 
than strictly necessary.  For instance when there's a member of 
reference type or we have a non-trivial dtor. 
(CLASSTYPE_NON_LAYOUT_POD_P gets set by a bunch of things that don't 
affect ABI layout)


nathan

--
Nathan Sidwell


[PATCH, i386]: Remove dead code from cmpstrnsi expander

2019-06-19 Thread Uros Bizjak
Operand 0 is always a register due to register_operand predicate.

2019-06-19  Uroš Bizjak  

* config/i386/i386.md (cmpstrnsi): Remove dead code.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 272473)
+++ config/i386/i386.md (working copy)
@@ -17004,7 +17004,7 @@
(use (match_operand 4 "immediate_operand"))]
   ""
 {
-  rtx addr1, addr2, out, outlow, count, countreg, align;
+  rtx addr1, addr2, countreg, align, out;
 
   if (optimize_insn_for_size_p () && !TARGET_INLINE_ALL_STRINGOPS)
 FAIL;
@@ -17028,10 +17028,6 @@
   && TREE_CODE (TREE_OPERAND (TREE_OPERAND (t2, 0), 0)) == 
STRING_CST)))
 FAIL;
 
-  out = operands[0];
-  if (!REG_P (out))
-out = gen_reg_rtx (SImode);
-
   addr1 = copy_addr_to_reg (XEXP (operands[1], 0));
   addr2 = copy_addr_to_reg (XEXP (operands[2], 0));
   if (addr1 != XEXP (operands[1], 0))
@@ -17039,8 +17035,7 @@
   if (addr2 != XEXP (operands[2], 0))
 operands[2] = replace_equiv_address_nv (operands[2], addr2);
 
-  count = operands[3];
-  countreg = ix86_zero_extend_to_Pmode (count);
+  countreg = ix86_zero_extend_to_Pmode (operands[3]);
 
   /* %%% Iff we are testing strict equality, we can use known alignment
  to good advantage.  This may be possible with combine, particularly
@@ -17047,9 +17042,9 @@
  once cc0 is dead.  */
   align = operands[4];
 
-  if (CONST_INT_P (count))
+  if (CONST_INT_P (operands[3]))
 {
-  if (INTVAL (count) == 0)
+  if (operands[3] == const0_rtx)
{
  emit_move_insn (operands[0], const0_rtx);
  DONE;
@@ -17064,13 +17059,10 @@
  operands[1], operands[2]));
 }
 
-  outlow = gen_lowpart (QImode, out);
-  emit_insn (gen_cmpintqi (outlow));
-  emit_move_insn (out, gen_rtx_SIGN_EXTEND (SImode, outlow));
+  out = gen_lowpart (QImode, operands[0]);
+  emit_insn (gen_cmpintqi (out));
+  emit_move_insn (operands[0], gen_rtx_SIGN_EXTEND (SImode, out));
 
-  if (operands[0] != out)
-emit_move_insn (operands[0], out);
-
   DONE;
 })
 
@@ -19320,7 +19312,7 @@
 (match_operand:SI 2 "const_int_operand"))]
   "TARGET_3DNOW || TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
 {
-  bool write = INTVAL (operands[1]) != 0;
+  bool write = operands[1] != const0_rtx;
   int locality = INTVAL (operands[2]);
 
   gcc_assert (IN_RANGE (locality, 0, 3));
@@ -19385,7 +19377,7 @@
 (const_int 3))]
   "TARGET_3DNOW || TARGET_PRFCHW || TARGET_PREFETCHWT1"
 {
-  if (INTVAL (operands[1]) == 0)
+  if (operands[1] == const0_rtx)
 return "prefetch\t%a0";
   else
 return "prefetchw\t%a0";


[PATCH, v2] PowerPC: Add 'prefix' to the 'isa' attribute

2019-06-19 Thread Michael Meissner
On Tue, Jun 18, 2019 at 07:13:22AM -0500, Segher Boessenkool wrote:
> On Mon, Jun 17, 2019 at 08:04:42PM -0400, Michael Meissner wrote:
> > --- gcc/config/rs6000/rs6000.md (revision 272270)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> > @@ -267,7 +267,9 @@ (define_attr "cpu"
> >(const (symbol_ref "(enum attr_cpu) rs6000_tune")))
> >  
> >  ;; The ISA we implement.
> > -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf" (const_string 
> > "any"))
> > +(define_attr "isa"
> > +  "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,prefix"
> > +  (const_string "any"))
> 
> Don't break the line unnecessarily.
> 
> "prefix" is not a good name.  Maybe use "futp" now, so it is in line with
> the other isa values?  I.e. the "prefix" part of the "future" isa.  Or
> "futx"?  Short names are important, keep it to 4 chars?
> 
> (The isa values are named after the processor that first implements them,
> not the ISA versions, because 2.07 is much easier to get wrong than p8 is).

Here is version 2:

2019-06-19  Michael Meissner  

* config/rs6000/rs6000.md (isa attribute): Add support for
for a future processor.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 272439)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -267,7 +267,8 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf" (const_string "any"))
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9v,p9kf,p9tf,futp"
+  (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
 (define_attr "enabled" ""
@@ -306,6 +307,10 @@ (define_attr "enabled" ""
  (and (eq_attr "isa" "p9tf")
  (match_test "FLOAT128_VECTOR_P (TFmode)"))
  (const_int 1)
+
+ (and (eq_attr "isa" "futp")
+ (match_test "TARGET_FUTURE"))
+ (const_int 1)
 ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797



Re: [RFC, PATCH] Display inlining context for uninitialized warnings

2019-06-19 Thread Martin Sebor

On 6/19/19 5:11 AM, Vladislav Ivanishin wrote:

Hi,

This patch (partially) adds displaying inlining context for
-W{maybe,}uninitialized warnings.  This is not as trivial to enable as
simply supplying the "%G" format specifier, so I have some questions
below.

I need this hunk

  void
  percent_K_format (text_info *text, location_t loc, tree block)
  {
-  text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
+  if (loc != UNKNOWN_LOCATION)
+text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
   
for two reasons:


- The gimple return stmt has UNKNOWN_LOCATION due to this code in
   lower_gimple_return ():

   if (gimple_return_retval (stmt) == gimple_return_retval (tmp_rs.stmt))
 {
   /* Remove the line number from the representative return statement.
  It now fills in for many such returns.  Failure to remove this
  will result in incorrect results for coverage analysis.  */
   gimple_set_location (tmp_rs.stmt, UNKNOWN_LOCATION);


This code is also causing quite a few non-trivial instances of
-Wreturn-local-addr to have no location after two (or more) such
statements have been merged (I raised PR 90735 for it), independent
of inlining.  I wonder if using the location of the closing curly
brace of the function definition (if it's available somewhere)
would work without messing up the coverage analysis.



   (In case you are wondering, the code and the comment were
   added in 2004.)

- When percent_K_format is called, TEXT might already hold precise
   location information in case its (indirect) caller is
   warning_at/error_at.  So it seems to me, this function lacks
   distinguishing the two cases: being called from plain warning/error
   functions vs. their *_at counterparts (the location shouldn't be
   updated in the latter case).

   Martin, you did the necessary work for percent_G_format to accept
   arbitrary gimple statements rather than just calls for PR86650, but
   left the PR open.  Do you remember running into that sort of problem,
   or was it something else?


I'm not sure it was the same problem.  I described what I was seeing
when I posted the patch:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01286.html
I think it's worth revisiting this with your patch in place.  It would
be nice to get the inlining context in place for -Warray-bounds too.
David Malcolm (CC'd) is the expert on locations and the diagnostic
subsystem so he will have a much better insight than me into all of
this than me.

Martin

   
Sometimes inlining context is still lost (with the current patch), as

can be seen e.g. with a version of the test with 's/unsigned yo/int yo/'
substitution applied.  (The chain of block = BLOCK_SUPERCONTEXT (block)
is broken - it ends with 0 rather than a FUNCTION_DECL.)  Is this known
and expected (e.g. because pass_late_warn_uninitialized runs very late),
or something to be investigated and fixed?

The patch was bootstrapped and regtested on x86_64-pc-linux-gnu.  Shall
it be applied?







Go patch committed: Optimize string concatenations

2019-06-19 Thread Ian Lance Taylor
This patch to the Go frontend and libgo by Cherry Zhang optimizes
string concatenations.  runtime.concatstring{2,3,4,5} are just
wrappers of concatstrings.  These wrappers don't provide any benefit,
at least in the C calling convention we use, where passing arrays by
value isn't an efficient thing.  Change it to always use
concatstrings.

Also, the cap field of the slice passed to concatstrings is not
necessary.  So change it to pass a pointer and a length directly,
which is more efficient than passing a slice header by value.

Bootstrapped and tested on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 272468)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-0e4aa31b26a20b6a6a2ca102b85ba8c8b8cdf876
+7822080a6e226b1e5872e2fcefac30f666f4cc1e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 272468)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -7442,7 +7442,7 @@ String_concat_expression::do_check_types
 
 Expression*
 String_concat_expression::do_flatten(Gogo*, Named_object*,
-Statement_inserter*)
+Statement_inserter* inserter)
 {
   if (this->is_error_expression())
 return this;
@@ -7497,56 +7497,22 @@ String_concat_expression::do_flatten(Gog
 }
   if (buf == NULL)
 buf = Expression::make_nil(loc);
-  Expression* call;
-  switch (this->exprs_->size())
-{
-case 0: case 1:
-  go_unreachable();
-
-case 2: case 3: case 4: case 5:
-  {
-   Expression* len = Expression::make_integer_ul(this->exprs_->size(),
- NULL, loc);
-   Array_type* arg_type = Type::make_array_type(type, len);
-   arg_type->set_is_array_incomparable();
-   Expression* arg =
- Expression::make_array_composite_literal(arg_type, this->exprs_,
-  loc);
-   Runtime::Function code;
-   switch (this->exprs_->size())
- {
- default:
-   go_unreachable();
- case 2:
-   code = Runtime::CONCATSTRING2;
-   break;
- case 3:
-   code = Runtime::CONCATSTRING3;
-   break;
- case 4:
-   code = Runtime::CONCATSTRING4;
-   break;
- case 5:
-   code = Runtime::CONCATSTRING5;
-   break;
- }
-   call = Runtime::make_call(code, loc, 2, buf, arg);
-  }
-  break;
-
-default:
-  {
-   Type* arg_type = Type::make_array_type(type, NULL);
-   Slice_construction_expression* sce =
- Expression::make_slice_composite_literal(arg_type, this->exprs_,
-  loc);
-   sce->set_storage_does_not_escape();
-   call = Runtime::make_call(Runtime::CONCATSTRINGS, loc, 2, buf,
- sce);
-  }
-  break;
-}
-
+  go_assert(this->exprs_->size() > 1);
+  Expression* len =
+Expression::make_integer_ul(this->exprs_->size(), NULL, loc);
+  Array_type* array_type = Type::make_array_type(type, len);
+  array_type->set_is_array_incomparable();
+  Expression* array =
+Expression::make_array_composite_literal(array_type, this->exprs_,
+ loc);
+  Temporary_statement* ts =
+Statement::make_temporary(array_type, array, loc);
+  inserter->insert(ts);
+  Expression* ref = Expression::make_temporary_reference(ts, loc);
+  ref = Expression::make_unary(OPERATOR_AND, ref, loc);
+   Expression* call =
+Runtime::make_call(Runtime::CONCATSTRINGS, loc, 3, buf,
+   ref, len->copy());
   return Expression::make_cast(type, call, loc);
 }
 
Index: gcc/go/gofrontend/runtime.def
===
--- gcc/go/gofrontend/runtime.def   (revision 272127)
+++ gcc/go/gofrontend/runtime.def   (working copy)
@@ -36,16 +36,8 @@ DEF_GO_RUNTIME(DECODERUNE, "runtime.deco
   R2(RUNE, INT))
 
 // Concatenate strings.
-DEF_GO_RUNTIME(CONCATSTRINGS, "runtime.concatstrings", P2(POINTER, SLICE),
-  R1(STRING))
-DEF_GO_RUNTIME(CONCATSTRING2, "runtime.concatstring2",
-  P2(POINTER, ARRAY2STRING), R1(STRING))
-DEF_GO_RUNTIME(CONCATSTRING3, "runtime.concatstring3",
-  P2(POINTER, ARRAY3STRING), R1(STRING))
-DEF_GO_RUNTIME(CONCATSTRING4, "runtime.concatstring4",
-  P2(POINTER, ARRAY4STRING), R1(STRING))
-DEF_GO_RUNTIME(CONCATSTRING5, "runtime.concatstring5",
-  P2(POINTER, ARRAY5STRING), R1(STRING))
+DEF_GO_RUNTIME(CONCATSTRINGS, "runtime.concatstrings",
+  

Re: std::inclusive_scan

2019-06-19 Thread Jonathan Wakely

On 19/06/19 00:04 +0100, Jonathan Wakely wrote:

On 18/06/19 19:08 +0100, Dietmar Kuehl via libstdc++ wrote:

The first release shipping the parallel algorithms is gcc-9.1.0. Specifically 
std::inclusive_scan() *without* execution policy seems to be missing, though. I 
guess, that’s because it was an add/renamed algorithm. The versions
taking an execution policy are present.


Right.

But as of a few minutes ago (and r272459 in subversion) the versions
without an execution policy are also present, thanks to this patch.


This is a small fix for yesterday's patch. This new test enforces the
standard's current requirement that the function object is called with
lvalues (unless you use move iterators for the range). That's probably
a defect and will be discussed at the next WG21 meeting. If we fix the
standard I'll change this test too.

Tested x86_64-linux, committed to trunk.


commit 4e62418aeb33b851aa8fb454962999e84e82d4b9
Author: Jonathan Wakely 
Date:   Wed Jun 19 16:16:57 2019 +0100

Fix value category bugs in std::reduce

* include/std/numeric (reduce(Iter, Iter, T, BinOp)): Fix value
category used in invocable check.
(reduce(Iter, Iter, T)): Pass initial value as rvalue.
* testsuite/26_numerics/reduce/2.cc: New test.

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index 66792506d10..fc2242f3de6 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -246,7 +246,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   _BinaryOperation __binary_op)
 {
   using value_type = typename iterator_traits<_InputIterator>::value_type;
-  static_assert(is_invocable_r_v<_Tp, _BinaryOperation, _Tp&, _Tp&>);
+  static_assert(is_invocable_r_v<_Tp, _BinaryOperation&, _Tp&, _Tp&>);
   static_assert(is_convertible_v);
   if constexpr (__is_random_access_iter<_InputIterator>::value)
 	{
@@ -278,7 +278,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 inline _Tp
 reduce(_InputIterator __first, _InputIterator __last, _Tp __init)
-{ return std::reduce(__first, __last, __init, plus<>()); }
+{ return std::reduce(__first, __last, std::move(__init), plus<>()); }
 
   /**
*  @brief  Calculate reduction of values in a range.
diff --git a/libstdc++-v3/testsuite/26_numerics/reduce/2.cc b/libstdc++-v3/testsuite/26_numerics/reduce/2.cc
new file mode 100644
index 000..adbfaf877bd
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/reduce/2.cc
@@ -0,0 +1,70 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// C++17 29.8.3 [reduce]
+
+#include 
+#include 
+#include 
+#include 
+
+struct T
+{
+  T(int);
+  T(T&&); // MoveConstructible
+  T& operator=(T&&); // not required by the standard, but it needs to be
+  T operator+(const T&) const;
+};
+
+void
+test01()
+{
+  T t[1]{1};
+  std::reduce(t, t+1, T(0));
+
+  using __gnu_test::test_container;
+  using __gnu_test::input_iterator_wrapper;
+  test_container con(t);
+  std::reduce(con.begin(), con.end(), T(0));
+}
+
+struct Op
+{
+  T operator()(T&, T&) const&;
+
+  // The standard does *not* require invoking as an rvalue to be supported.
+  T operator()(T&, T&) && = delete;
+
+  // The standard does *not* require rvalue arguments to be supported
+  // (this is almost certainly a defect and should be allowed).
+  T operator()(T&&, T&&) const = delete;
+};
+
+void
+test02()
+{
+  T t[1]{1};
+  std::reduce(t, t+1, T(0), Op());
+
+  using __gnu_test::test_container;
+  using __gnu_test::input_iterator_wrapper;
+  test_container con(t);
+  std::reduce(con.begin(), con.end(), T(0), Op());
+}


Re: [PATCH] [ARC] Fix PR89838

2019-06-19 Thread Jeff Law
On 6/19/19 3:29 AM, Claudiu Zissulescu wrote:
> Hi Jeff,
> 
>> OK.
>> 
> Thank you for your review.
> 
>> THe BZ mentions that this was found building a glibc test for ARC.
>> Is there a glibc port for the ARC?  I don't see one in the glibc
>> git repo. Are you aware of any plans to produce an official glibc
>> port.
> 
> Indeed, we are in the process of upstreaming the glibc port for arc.
> However,  we hit a number of issues, one being this BZ, and other is
> the need to provide some ARC processor internals documentation. Once
> those items cleaned up, the arc glibc port will be public available
> upstream.
That'll be great.  It looks like there's an ARC kernel port (needed for
the headers and kernel build step).  So it should look like a standard
cross glibc/kernel as far as my tester is concerned.


> 
>> 
>> I believe building glibc is a hell of a better sniff test than
>> building newlib.  So if it's in the plan, I'd love to re-wire my
>> tester to test with glibc rather than newlib on the ARC port.
>> 
> 
> I totally agree, we also have an uClibc-ng port which, as far as I
> know, is fully upstream. If I may ask, how do you test the ARC
> toolchain, do you use Synopsys free nSIM simulator?
I've pondered wiring in support for uClibc, but haven't had the time.
Right now for ARC we build libgcc, and newlib/libgloss, then run the gcc
testsuite with a dummy simulator.

Essentially this verifies the compiler generates code that can be
assembled and linked and allows us to also use the tests which scan dumps.

The dummy simulator always returns success to avoid useless noise.

The state of simulated targets using newlib/libgloss is generally poor.
 A large part of the problem is the fixed, limited, memory size that's
baked into the libgloss linker scripts.  This results in numerous
clashes of the heap & stack and whether or not those cause an execution
failure is horribly unpredictable, so except for a couple ports I
audited, I'm not doing simulator testing for the newlib/libgloss targets.

jeff


Re: [PATCH] Decrease hash-table-verification-limit from 100 to 10.

2019-06-19 Thread Jeff Law
On 6/19/19 7:15 AM, Martin Liška wrote:
> Hi.
> 
> The patch is about decrease of verification limit. I collected following 
> statistics:
> 
> godot WPA:
> 
> hash-table-verification-limit=0: 66s
> hash-table-verification-limit=10: 69s
> hash-table-verification-limit=100: 94s
> 
> make all runtime libraries in GCC for --disable-bootstrap 
> --disable-libsanitizer --disable-multilib
> 
> =0: 127s (17m30s user time)
> =10: 130s (18m17s user time)
> =100: 150s (20m37s user time)
> 
> That's why I'm suggesting default equal to 10. Moreover as items are rehashed 
> during a hash table
> growth we always compare different items.
> 
> Ready for trunk?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-06-19  Martin Liska  
> 
>   * params.def (PARAM_HASH_TABLE_VERIFICATION_LIMIT): Decrease
>   to 10.
OK
jeff


Re: [PATCH, netbsd] Give a name to the number 0 in sysarch(0, ...)

2019-06-19 Thread Jeff Law
On 6/19/19 7:13 AM, co...@sdf.org wrote:
> The definition originates in
> https://nxr.netbsd.org/xref/src/sys/arch/arm/include/sysarch.h#58
> 
> I've added the prefix SYSARCH to avoid any naming conflict concerns.
> 
> It looked a bit like an error on a first impression :-)
> 
> 
> 0001-Give-a-name-to-the-number-0-in-sysarch-0.patch
> 
> * config/arm/netbsd-elf.h (SYSARCH_ARM_SYNC_ICACHE): New definition.
> (CLEAR_INSN_CACHE) Use it.
THanks.  Installed.
jeff


Re: [PATCH 1/3] Create GCN-specific gthreads

2019-06-19 Thread Jeff Law
On 6/19/19 2:57 AM, Andrew Stubbs wrote:
> Ping.
> 
> I can probably approve this myself, as it only affects GCN, but I'd
> appreciate a second opinion.
Yes, this would fall under things you could approve yourself.  Thanks
for double-checking.

jeff


Re: [PATCH 2/3] Stub implementation of unwinding for AMD GCN.

2019-06-19 Thread Jeff Law
On 6/19/19 2:58 AM, Andrew Stubbs wrote:
> Ping.
> 
> I can probably approve this myself, as it only affects GCN, but I'd
> appreciate a second opinion.
Similarly this is fine to self-approve.  Thanks.

Jeff


Re: [PATCH 3/3] Enable full libgfortran library for AMD GCN

2019-06-19 Thread Jeff Law
On 6/7/19 8:40 AM, Andrew Stubbs wrote:
> This patch basically reverts the previous patch to put AMD GCN in
> "minimal" mode.
> 
> OK to commit?
OK
jeff



[Darwin, committed] Fix two off-by-one errors in the driver.

2019-06-19 Thread Iain Sandoe
Spotted by Dominique with a sanitised build.

fixed thus on trunk - will backport as needed.
thanks
Iain

2019-06-19  Iain Sandoe  

* config/darwin-driver.c (darwin_driver_init): Fix off-by-one errors
in computing the number of options to be moved.



diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c
index 01238e2..3d85f29 100644
--- a/gcc/config/darwin-driver.c
+++ b/gcc/config/darwin-driver.c
@@ -261,7 +261,7 @@ darwin_driver_init (unsigned int *decoded_options_count,
  if (*decoded_options_count > i) {
memmove (*decoded_options + i,
 *decoded_options + i + 1,
-((*decoded_options_count - i)
+((*decoded_options_count - i - 1)
  * sizeof (struct cl_decoded_option)));
  }
  --i;
@@ -307,7 +307,7 @@ darwin_driver_init (unsigned int *decoded_options_count,
  if (*decoded_options_count > i) {
memmove (*decoded_options + i,
 *decoded_options + i + 1,
-((*decoded_options_count - i)
+((*decoded_options_count - i - 1)
  * sizeof (struct cl_decoded_option)));
  }
  --i;



Re: [RFC, PATCH] Display inlining context for uninitialized warnings

2019-06-19 Thread Jeff Law
On 6/19/19 8:57 AM, Martin Sebor wrote:
> On 6/19/19 5:11 AM, Vladislav Ivanishin wrote:
>> Hi,
>>
>> This patch (partially) adds displaying inlining context for
>> -W{maybe,}uninitialized warnings.  This is not as trivial to enable as
>> simply supplying the "%G" format specifier, so I have some questions
>> below.
>>
>> I need this hunk
>>
>>   void
>>   percent_K_format (text_info *text, location_t loc, tree block)
>>   {
>> -  text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
>> +  if (loc != UNKNOWN_LOCATION)
>> +    text->set_location (0, loc, SHOW_RANGE_WITH_CARET);
>>    for two reasons:
>>
>> - The gimple return stmt has UNKNOWN_LOCATION due to this code in
>>    lower_gimple_return ():
>>
>>    if (gimple_return_retval (stmt) == gimple_return_retval (tmp_rs.stmt))
>>  {
>>    /* Remove the line number from the representative return
>> statement.
>>   It now fills in for many such returns.  Failure to remove this
>>   will result in incorrect results for coverage analysis.  */
>>    gimple_set_location (tmp_rs.stmt, UNKNOWN_LOCATION);
> 
> This code is also causing quite a few non-trivial instances of
> -Wreturn-local-addr to have no location after two (or more) such
> statements have been merged (I raised PR 90735 for it), independent
> of inlining.  I wonder if using the location of the closing curly
> brace of the function definition (if it's available somewhere)
> would work without messing up the coverage analysis.
> 
>>
>>    (In case you are wondering, the code and the comment were
>>    added in 2004.)
>>
>> - When percent_K_format is called, TEXT might already hold precise
>>    location information in case its (indirect) caller is
>>    warning_at/error_at.  So it seems to me, this function lacks
>>    distinguishing the two cases: being called from plain warning/error
>>    functions vs. their *_at counterparts (the location shouldn't be
>>    updated in the latter case).
>>
>>    Martin, you did the necessary work for percent_G_format to accept
>>    arbitrary gimple statements rather than just calls for PR86650, but
>>    left the PR open.  Do you remember running into that sort of problem,
>>    or was it something else?
> 
> I'm not sure it was the same problem.  I described what I was seeing
> when I posted the patch:
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01286.html
> I think it's worth revisiting this with your patch in place.  It would
> be nice to get the inlining context in place for -Warray-bounds too.
> David Malcolm (CC'd) is the expert on locations and the diagnostic
> subsystem so he will have a much better insight than me into all of
> this than me.
You might also want to bring in Andreas K who added the original bits to
help with debug info generation and Eric B who extended that code in
2014 due to impacts on profiling.

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00769.html
https://gcc.gnu.org/ml/gcc-patches/2011-02/msg00421.html

Jeff


Re: [PATCH] PR fortran/87907 -- Don't dereference a NULL pointer

2019-06-19 Thread Paul Richard Thomas
Hi Steve,

I'm surprised that you didn't commit as obvious :-) OK for trunk.

Thanks

Paul

On Tue, 18 Jun 2019 at 20:30, Steve Kargl
 wrote:
>
> The attach patch has been regression tested on x86_64-*-freebsd.
> If the pointer is NULL, the function simply returns.  It seems
> that gfortran then does the Right Thing.  OK to commit?
>
> 2019-06-18  Steven G. Kargl  
>
>  PR fortran/87907
>  * resolve.c (resolve_contained_fntype): Do not dereference a NULL
>  pointer.
>
> 2019-06-18  Steven G. Kargl  
>
>  PR fortran/87907
> * gfortran.dg/pr87907.f90: New testcase.
>
>
> --
> Steve



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] PR fortran/69398 -- Duplicate dimensions and CLASS

2019-06-19 Thread Dominique d'Humières
Hi Steve,

Patch missing?

TIA

Dominique


Re: [PATCH] implement -Wformat-diag, v2

2019-06-19 Thread Jeff Law
On 6/18/19 1:21 PM, Martin Sebor wrote:
> On 6/18/19 12:59 PM, Jeff Law wrote:
>> On 5/22/19 10:42 AM, Martin Sebor wrote:
>>> Attached is a revised implementation of the -Wformat-diag checker
>>> incorporating the feedback I got on the first revision.
>>>
>>> Martin
>>>
>>> gcc-wformat-diag-checker.diff
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>> * c-format.c (function_format_info::format_type): Adjust type.
>>> (function_format_info::is_raw): New member.
>>> (decode_format_type): Adjust signature.  Handle "raw" diag
>>> attributes.
>>> (decode_format_attr): Adjust call to decode_format_type.
>>> Avoid a redundant call to convert_format_name_to_system_name.
>>> Avoid abbreviating the word "arguments" in a diagnostic.
>>> (format_warning_substr): New function.
>>> (avoid_dollar_number): Quote dollar sign in a diagnostic.
>>> (finish_dollar_format_checking): Same.
>>> (check_format_info): Same.
>>> (struct baltoks_t): New.
>>> (c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
>>> (maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
>>> functions.
>>> (check_format_info_main): Call check_plain.  Use baltoks_t.  Call
>>> maybe_diag_unbalanced_tokens.
>>> (handle_format_attribute): Spell out the word "arguments" in
>>> a diagnostic.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/format/gcc_diag-11.c: New test.
>> High level comment.  This is painful to read.  But that's probably an
>> artifact of trying to cobble together C code to parse strings and codify
>> the conventions.    ie, it's likely inherent for the problem you're
>> trying to solve.
> 
> It wasn't exactly a lot of fun to write either.  I suspect it
> would have been even worse if I had used regular expressions.
> It is more complicated than strictly necessary because it's
> trying to balance usability of the warning with efficiency.
> (Although I'm sure both could be improved.)
> 
>>> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
>>> index a7f76c1c01d..713fce442c9 100644
>>> --- a/gcc/c-family/c-format.c
>>> +++ b/gcc/c-family/c-format.c
>>> @@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree
>>> atname, tree args,
>>>   {
>>>     const char *p = IDENTIFIER_POINTER (format_type_id);
>>>   -  p = convert_format_name_to_system_name (p);
>>> +  info->format_type = decode_format_type (p, &info->is_raw);
>>>   -  info->format_type = decode_format_type (p);
>>> -
>>>     if (!c_dialect_objc ()
>>>  && info->format_type == gcc_objc_string_format_type)
>>>   {
>> Did you mean to drop the call to convert_format_name_to_system_name?
> 
> Yes, it's redundant (it's the first thing decode_format_type does).
> 
>>> @@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info
>>> *fci,
>>>     return true;
>>>   }
>>>   +
>>> +/* Helper for initializing global token_t arrays below.  */
>>> +#define NAME(name) { name, sizeof name - 1, NULL }
>>> +
>>> +/* C/C++ operators that are expected to be quoted within the format
>>> +   string.  */
>>> +
>>> +static const token_t c_opers[] =
>>> +  {
>>> +   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
>>> +   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
>>> +   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
>>> +   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
>>> +   NAME ("|="), NAME ("||")
>>> +  };
>> This clearly isn't a full list of operators.  Is there a particular
>> reason why we aren't diagnosing others.  I guess you're catching the
>> single character operators via the ISPUNCT checks?  That seems a little
>> loose (@ isn't an operator for example).  It  may be OK in practice
>> though.
> 
> Yes, it only handles two-character operators and its only purpose
> is to make diagnostics more readable and less repetitive (otherwise
> we'd get one for each occurrence of the punctuator). I think @ is
> an operator in Objective-C (I only know this because I fixed a few
> instances of it there).
> 
>>> +
>>> +  if (nchars)
>>> +    {
>>> +  /* This is the most common problem: go the extra mile to decribe
>> s/decribe/describe/
>>
>>
>>
>>> +
>>> +static void
>>> +maybe_diag_unbalanced_tokens (location_t format_string_loc,
>>> +  const char *orig_format_chars,
>>> +  tree format_string_cst,
>>> +  baltoks_t &baltoks)
>> Needs a function comment.
>>
>>
>>> @@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results
>>> *res,
>>>       init_dollar_format_checking (info->first_arg_num,
>>> first_fillin_param);
>>>   +  /* In GCC diagnostic functions check plain directives
>>> (substrings within
>>> + the format string that don't start with %) for quoting and
>>> punctuations
>>> + problems.  */
>>> +  bool ck_plain = (!info->is_raw
>>> +   && (info->format_type == gcc_diag_format_type
>>> + 

Re: [PATCH 1/3] Create GCN-specific gthreads

2019-06-19 Thread Andrew Stubbs

On 19/06/2019 17:04, Jeff Law wrote:

On 6/19/19 2:57 AM, Andrew Stubbs wrote:

Ping.

I can probably approve this myself, as it only affects GCN, but I'd
appreciate a second opinion.

Yes, this would fall under things you could approve yourself.  Thanks
for double-checking.


Sorry, I meant I'd like another opinion on the patch contents. I'm not 
confident that I didn't miss something.


Andrew


Re: [PATCH 2/3] Stub implementation of unwinding for AMD GCN.

2019-06-19 Thread Andrew Stubbs

On 19/06/2019 17:04, Jeff Law wrote:

On 6/19/19 2:58 AM, Andrew Stubbs wrote:

Ping.

I can probably approve this myself, as it only affects GCN, but I'd
appreciate a second opinion.

Similarly this is fine to self-approve.  Thanks.


Sorry, same again, I meant I'd like another opinion on the patch contents.

Andrew


Re: [PATCH 1/3] Create GCN-specific gthreads

2019-06-19 Thread Jeff Law
On 6/19/19 10:56 AM, Andrew Stubbs wrote:
> On 19/06/2019 17:04, Jeff Law wrote:
>> On 6/19/19 2:57 AM, Andrew Stubbs wrote:
>>> Ping.
>>>
>>> I can probably approve this myself, as it only affects GCN, but I'd
>>> appreciate a second opinion.
>> Yes, this would fall under things you could approve yourself.  Thanks
>> for double-checking.
> 
> Sorry, I meant I'd like another opinion on the patch contents. I'm not
> confident that I didn't miss something.
The contents looked reasonable as well.   The changes aren't going to
affect any other targets, so you can flesh them out over time as needed.

jeff


PR libstdc++/90945 Patch to have pretty printer for std::vector return bool intead of int for elements

2019-06-19 Thread Michael Weghorn
Hi everyone,

the Python pretty printer for a 'std::vector' currently returns
integers as values for the elements, which e.g. leads to the situation
that a 'gdb.Value' constructed from that doesn't have 'bool' type, but
an integer type ('long long' for my test with gdb 8.2.1 on Debian
testing, amd64). Returning bool values ('True'/'False') would make sure
that the type is clear and can thus help improve the displayed type.
More details in [1].

The attached patch changes the behaviour of the pretty printer accordingly.


I'd be glad to receive feedback on this and also notes in case anything
else is needed. (This is my first contribution.)

So far, I've tested this with GDB 8.2.1 on Debian testing.

ChangeLog:
* PR libstdc++/90945: Have pretty printer return bool instead of int for
  value of std::vector elements


Regards,
  Michael


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90945
>From c62cbd47ea8602e2351ff3b0c4c80fa8cb71e313 Mon Sep 17 00:00:00 2001
From: Michael Weghorn 
Date: Wed, 19 Jun 2019 17:22:16 +0200
Subject: [PATCH] Have std::vector printer's iterator return bool

Have the pretty-printer for 'std::vector' return a
value of type 'bool' rather than an 'int'.

This way, the type is clear and that can be used for better
display and a 'gdb.Value' constructed from the returned value
will have type 'bool' again, not e.g. 'long long' as happened
previously (at least with GDB 8.2.1 on amd64).

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 05143153bee..177db3c0fdb 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -363,15 +363,12 @@ class StdVectorPrinter:
 if self.item == self.finish and self.so >= self.fo:
 raise StopIteration
 elt = self.item.dereference()
-if elt & (1 << self.so):
-obit = 1
-else:
-obit = 0
+val = elt & (1 << self.so) != 0
 self.so = self.so + 1
 if self.so >= self.isize:
 self.item = self.item + 1
 self.so = 0
-return ('[%d]' % count, obit)
+return ('[%d]' % count, val)
 else:
 if self.item == self.finish:
 raise StopIteration
-- 
2.20.1



[wwwdocs] Announce PRU backend

2019-06-19 Thread Dimitar Dimitrov
Hi,

This WWW update announces the new PRU port in WWW docs, and fills in the 
backend characteristics.

Thanks,
Dimitar

Index: htdocs/backends.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/backends.html,v
retrieving revision 1.85
diff -u -r1.85 backends.html
--- htdocs/backends.html13 Jun 2019 18:15:29 -  1.85
+++ htdocs/backends.html19 Jun 2019 16:54:23 -
@@ -104,6 +104,7 @@
 pa | Q   CBD  qr  b   i  e
 pdp11  |L   ICqr  b  e
 powerpcspe | Q   Cqr pb   ia
+pru|L  F   a  s
 riscv  | Q   Cqr gia
 rl78   |L  F l   gs
 rs6000 | Q   Cqr pb   ia
Index: htdocs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.1129
diff -u -r1.1129 index.html
--- htdocs/index.html   3 May 2019 09:05:54 -   1.1129
+++ htdocs/index.html   19 Jun 2019 16:54:23 -
@@ -54,6 +54,10 @@
 News
 
 
+PRU support
+ [2019-06-12]
+ GCC support for TI PRU I/O processors has been added.
+
 GCC 9.1 released
 [2019-05-03]
 
Index: htdocs/readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.314
diff -u -r1.314 readings.html
--- htdocs/readings.html11 Jun 2019 06:06:18 -  1.314
+++ htdocs/readings.html19 Jun 2019 16:54:23 -
@@ -253,6 +253,12 @@
   http://simh.trailing-edge.com/";>Simulators
  
 
+ pru
+   Manufacturer: Texas Instruments
+   http://processors.wiki.ti.com/index.php/PRU-ICSS";>Official 
PRU Documentation
+   https://elinux.org/Category:PRU";>Community PRU 
Documentation
+ 
+
  riscv
   Manufacturer: Many (open ISA standard)
   https://riscv.org";>RISC-V Foundation
Index: htdocs/gcc-10/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-10/changes.html,v
retrieving revision 1.3
diff -u -r1.3 changes.html
--- htdocs/gcc-10/changes.html  13 May 2019 11:00:15 -  1.3
+++ htdocs/gcc-10/changes.html  19 Jun 2019 16:54:23 -
@@ -87,6 +87,13 @@
 
 
 
+PRU
+
+  
+A new back end targeting TI PRU I/O processors has been contributed to GCC.
+  
+
+
 
 
 



unordered_multimap/unordered_multiset optimizations

2019-06-19 Thread François Dumont

Hi

    Still influenced by PR 68303 this patch:

- Extend usage of find within other methods. It simplify code and will 
allow to implement the PR in less places if we decide to do so.


- Get rid of several bucket index comparison for non-unique key 
containers this way we have less hash code computations.


It also adds _M_update_bbegin cause I plan to propose another 
optimization regarding this special bucket maintenance.



    * include/bits/hashtable_policy.h (_Map_base<>::at): Use
    _Hashtable<>::find.
(_Hashtable_base<>::_Equal_hash_code<>::_S_node_equals):New.
    (_Hashtable_base<>::_M_node_equals): New, use latter.
    * include/bits/hashtable.h (_Hashtable<>::_M_update_bbegin): New.
    (_Hashtable<>::_M_assign): Use latter.
    (_Hashtable<>::_M_move_assign): Likewise.
    (_Hashtable<>(_Hashtable<>&&)): Likewise.
    (_Hashtable<>(_Hashtable<>&&, const allocator_type&)): Likewise.
    (_Hashtable<>::swap): Likewise.
    (_Hashtable<>::find): Build iterator directly from _M_find_node result.
    (_Hashtable<>::count): Use _Hashtable<>::find.
    (_Hashtable<>::equal_range): Likewise.
    (_Hashtable<>::_M_erase(false_type, const key_type&)): Use
    _M_node_equals.

Tested under Linux x86_64, ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 41edafaa2e3..95c409a4d10 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -359,6 +359,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // numerous checks in the code to avoid 0 modulus.
   __bucket_type		_M_single_bucket	= nullptr;
 
+  void
+  _M_update_bbegin()
+  {
+	if (_M_begin())
+	  _M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+  }
+
+  void
+  _M_update_bbegin(__node_type* __n)
+  {
+	_M_before_begin._M_nxt = __n;
+	_M_update_bbegin();
+  }
+
   bool
   _M_uses_single_bucket(__bucket_type* __bkts) const
   { return __builtin_expect(__bkts == &_M_single_bucket, false); }
@@ -1152,8 +1166,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__node_type* __ht_n = __ht._M_begin();
 	__node_type* __this_n = __node_gen(__ht_n);
 	this->_M_copy_code(__this_n, __ht_n);
-	_M_before_begin._M_nxt = __this_n;
-	_M_buckets[_M_bucket_index(__this_n)] = &_M_before_begin;
+	_M_update_bbegin(__this_n);
 
 	// Then deal with other nodes.
 	__node_base* __prev_n = __this_n;
@@ -1214,15 +1227,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _M_buckets = &_M_single_bucket;
 	  _M_single_bucket = __ht._M_single_bucket;
 	}
+
   _M_bucket_count = __ht._M_bucket_count;
   _M_before_begin._M_nxt = __ht._M_before_begin._M_nxt;
   _M_element_count = __ht._M_element_count;
   std::__alloc_on_move(this->_M_node_allocator(), __ht._M_node_allocator());
 
-  // Fix buckets containing the _M_before_begin pointers that can't be
-  // moved.
-  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+  // Fix bucket containing the _M_before_begin pointer that can't be moved.
+  _M_update_bbegin();
   __ht._M_reset();
 }
 
@@ -1293,10 +1305,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _M_single_bucket = __ht._M_single_bucket;
 	}
 
-  // Update, if necessary, bucket pointing to before begin that hasn't
-  // moved.
-  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+  // Fix bucket containing the _M_before_begin pointer that can't be moved.
+  _M_update_bbegin();
 
   __ht._M_reset();
 }
@@ -1348,11 +1358,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  else
 	_M_buckets = __ht._M_buckets;
 
-	  _M_before_begin._M_nxt = __ht._M_before_begin._M_nxt;
-	  // Update, if necessary, bucket pointing to before begin that hasn't
+	  // Fix bucket containing the _M_before_begin pointer that can't be
 	  // moved.
-	  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	  _M_update_bbegin(__ht._M_begin());
+
 	  __ht._M_reset();
 	}
   else
@@ -1420,14 +1429,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   std::swap(_M_element_count, __x._M_element_count);
   std::swap(_M_single_bucket, __x._M_single_bucket);
 
-  // Fix buckets containing the _M_before_begin pointers that can't be
-  // swapped.
-  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
-
-  if (__x._M_begin())
-	__x._M_buckets[__x._M_bucket_index(__x._M_begin())]
-	  = &__x._M_before_begin;
+  // Fix bucket containing the _M_before_begin pointer that can't be swap.
+  _M_update_bbegin();
+  __x._M_update_bbegin();
 }
 
   template_M_hash_code(__k);
   std::size_t __bkt = _M_bucket_index(__k, __code);
-  __node_type* __p = _M_find_node(__bkt, __k, __code);
-  return __p ? iterator(__p) : end();
+  return iterator(_M_find_node(__bkt, __k, __code));
 }
 
   template_M_hash_code(__k);
   std::size_t __bkt 

Re: [PATCH 1/2] PR c/65403 - Ignore -Wno-error=

2019-06-19 Thread Jeff Law
On 3/18/19 8:46 PM, Alex Henrie wrote:
> From: Manuel López-Ibáñez 
> 
> * opts.c: Ignore -Wno-error= except if there are
> other diagnostics.
That's not a complete ChangeLog entry.  Each file/function changed
should be mentioned.  Something like this:

* opts-common.c (ignored_wnoerror_options): New global variable.
* opts-global.c (print_ignored_options): Ignore
-Wno-error= except if there are other
diagnostics.
* opts.c (enable_warning_as_error): Record ignored -Wno-error
options.
opts.h (ignored_wnoerror_options): Declare.

> diff --git a/gcc/opts.c b/gcc/opts.c
> index 3161e0b6753..4fa79306e30 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -3079,7 +3079,10 @@ enable_warning_as_error (const char *arg, int value, 
> unsigned int lang_mask,
>strcpy (new_option + 1, arg);
>option_index = find_opt (new_option, lang_mask);
>if (option_index == OPT_SPECIAL_unknown)
> -error_at (loc, "%<-Werror=%s%>: no option -%s", arg, new_option);
> +if (value)
> +  error_at (loc, "%<-Werror=%s%>: no option -%s", arg, new_option);
> +else
> +  ignored_wnoerror_options.safe_push (arg);
>else if (!(cl_options[option_index].flags & CL_WARNING))
>  error_at (loc, "%<-Werror=%s%>: -%s is not an option that controls "
> "warnings", arg, new_option);
This hunk is my only concern with the patch.  In particular how is this
going to interact with the hints/suggestions we emit when we see an
unknown option?

The code on the trunk looks like this now:

>   if (option_index == OPT_SPECIAL_unknown)
> {
>   option_proposer op;
>   const char *hint = op.suggest_option (new_option);
>   if (hint)
> error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>;"
>   " did you mean %<-%s%>?", value ? "" : "no-",
>   arg, new_option, hint);
>   else
> error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>",
>   value ? "" : "no-", arg, new_option);
> }
If HINT is set, do we still want to potentially push the argument onto
the ignored_wnoerror_options list?  My inclination is yes since the hint
is just that, a fuzzily matched hint.  That hint may be appropriate
(user typo'd or somesuch) or it may be totally offbase (user was trying
to turn off some future warning.

I'm open to other opinions here.

jeff


Re: Use ODR for canonical types construction in LTO

2019-06-19 Thread Jan Hubicka
> > > -ctype = CLASSTYPE_AS_BASE (ctype);
> > > +{
> > > +  if (!tree_int_cst_equal (TYPE_SIZE (ctype),
> > > +TYPE_SIZE (CLASSTYPE_AS_BASE (ctype
> > > +ctype = CLASSTYPE_AS_BASE (ctype);
> > > +}
> > > tree clobber = build_clobber (ctype);
> 
> I have noticed we build a distinct as-base type in rather more cases than
> strictly necessary.  For instance when there's a member of reference type or
> we have a non-trivial dtor. (CLASSTYPE_NON_LAYOUT_POD_P gets set by a bunch
> of things that don't affect ABI layout)

Avoiding the extra copies at first place would be great. In my
understanding the types differ by virtual bases and also by their size
since the fake types are not padded to multiply of their alignment.
I guess this can be tested ahead of producing the copy and saving some
memory...

I am not sure if my C++ FE abilities are on par to implement this tough.
Honza
> 
> nathan
> 
> -- 
> Nathan Sidwell


Re: [PATCH 2/2] PR c/65403 - Add tests for -Wno-error=

2019-06-19 Thread Jeff Law
On 3/18/19 8:46 PM, Alex Henrie wrote:
> ---
>  gcc/testsuite/c-c++-common/pr65403-1.c | 10 ++
>  gcc/testsuite/c-c++-common/pr65403-2.c | 15 +++
>  2 files changed, 25 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/pr65403-1.c
>  create mode 100644 gcc/testsuite/c-c++-common/pr65403-2.c
This needs a ChangeLog entry.  It could be as simple as:

* c-c++-common/pr65403-1.c: New test.
* c-c++-common/pr65403-2.c: New test.

No inherent problems with the test, though I think we need to verify
they still do the right thing given the hints the compiler will try to
emit on unknown options now.  That's easy enough to do once we have
concluded on what the behvior should be now that the compiler emits
hints for unknown options.

Jeff


Re: [PATCH] Wrap 'expand_all_functions' and 'ipa_passes' around timevars

2019-06-19 Thread Jeff Law
On 1/24/19 12:51 PM, Giuliano Belinassi wrote:
> This patch adds two variables named 'TV_CGRAPH_FUNC_EXPANSION' and
> 'TV_CGRAPH_IPA_PASSES' that count the elapsed time of the functions
> 'expand_all_functions' and 'ipa_passes', respectivelly.
> 
> The main point of this is that these functions takes a very long time
> when compiling the 'gimple-match.c' file, and therefore may also take
> a long time when compiling other large files.
> 
> I also accept suggestions about how to improve this :-)
> 
> ChangeLog:
> 
> 2019-01-24Giuliano Belinassi 
> 
>   * cgraph_unit.c (compile): TV_CGRAPH_FUNC_EXPANSION and
>   TV_CGRAPH_IPA_PASSES start, stop.
>   * timevar.def (TV_CGRAPH_IPA_PASSES, TV_CGRAPH_FUNC_EXPANSION): New.
So I'm guessing you want the accumulated time for the ipa_passes and
expansion.  So independent counters using timevar_{start,stop} seem right.

The alternate approach would be to use the timevar_{push,pop}. This
would change what timevar is on the top of the stack when we call
ipa_passes and expand_all_functions -- it would, in effect, allow us to
determine how much time is spent in those functions which is _not_
attributable to existing timevars.

I just want to confirm your intent before acking/naking.

jeff


Re: Use ODR for canonical types construction in LTO

2019-06-19 Thread Nathan Sidwell

On 6/19/19 1:53 PM, Jan Hubicka wrote:

-ctype = CLASSTYPE_AS_BASE (ctype);
+{
+  if (!tree_int_cst_equal (TYPE_SIZE (ctype),
+  TYPE_SIZE (CLASSTYPE_AS_BASE (ctype
+ctype = CLASSTYPE_AS_BASE (ctype);
+}
 tree clobber = build_clobber (ctype);


I have noticed we build a distinct as-base type in rather more cases than
strictly necessary.  For instance when there's a member of reference type or
we have a non-trivial dtor. (CLASSTYPE_NON_LAYOUT_POD_P gets set by a bunch
of things that don't affect ABI layout)


Avoiding the extra copies at first place would be great. In my
understanding the types differ by virtual bases and also by their size
since the fake types are not padded to multiply of their alignment.
I guess this can be tested ahead of producing the copy and saving some
memory...

I am not sure if my C++ FE abilities are on par to implement this tough.


I don't think it's simple to fix there, just unfortunate.  your 
understanding is correct, and I think your workaround will work. 
However, remember it's possible for T == CLASSTYPE_AS_BASE (T), so might 
be worth checking that before doing the size comparison?


It'd be great to comment on why you're not just using classtype_as_base 
there.  I suppose I'm serializing this stuff too, with the same 
inefficiencies ...


nathan

--
Nathan Sidwell


Re: [PATCH] Reintroduce vec_shl_optab and use it for #pragma omp scan inclusive

2019-06-19 Thread Richard Biener
On June 19, 2019 2:46:15 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On June 19, 2019 11:05:42 AM GMT+02:00, Richard Sandiford
> wrote:
>>>Richard Biener  writes:
 On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek
>>> wrote:
>Hi!
>
>When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR,
>vec_shl_optab
>has been removed as unused, because we only used vec_shr_optab for
>>>the
>reductions.
>Without this patch the vect-simd-*.c tests can be vectorized just
>>>fine
>for SSE4 and above, but can't be with SSE2.  As the comment in
>tree-vect-stmts.c tries to explain, for the inclusive scan
>operation
>>>we
>want (when using V8SImode vectors):
>   _30 = MEM  [(int *)&D.2043];
>   _31 = MEM  [(int *)&D.2042];
>   _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
>   _33 = _31 + _32;
> // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ...,
>_31[6]+_31[7]
>>>};
>   _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
>   _35 = _33 + _34;
>// _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2],
>>>_31[0]+.._31[3],
>   // _31[1]+.._31[4], ... _31[4]+.._31[7] };
>   _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3
>}>;
>   _37 = _35 + _36;
>// _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2],
>>>_31[0]+.._31[3],
>   // _31[0]+.._31[4], ... _31[0]+.._31[7] };
>   _38 = _30 + _37;
>   _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>   MEM  [(int *)&D.2043] = _39;
>   MEM  [(int *)&D.2042] = _38;  */
>For V4SImode vectors that would be VEC_PERM_EXPR 1,
>>>2
>}>,
>VEC_PERM_EXPR  and
>VEC_PERM_EXPR  etc.
>Unfortunately, SSE2 can't do the VEC_PERM_EXPR 2
>}>
>permutation (the other two it can do).  Well, to be precise, it can
>>>do
>it
>using the vector left shift which has been removed as unused,
>>>provided
>that init is initializer_zerop (shifting all zeros from the left).
>init usually is all zeros, that is the neutral element of additive
>reductions and couple of others too, in the unlikely case that some
>other
>reduction is used with scan (multiplication, minimum, maximum,
>>>bitwise
>and),
>we can use a VEC_COND_EXPR with constant first argument, i.e. a
>blend
>or
>and/or.
>
>So, this patch reintroduces vec_shl_optab (most backends actually
>>>have
>those
>patterns already) and handles its expansion and vector generic
>>>lowering
>similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where the
>>>first
>operand is initializer_zerop and third operand starts with a few
>numbers
>smaller than number of elements (doesn't matter which one, as all
>elements
>are same - zero) followed by nelts, nelts+1, nelts+2, ...
>Unlike vec_shr_optab which has zero as the second operand, this one
>>>has
>it
>as first operand, because VEC_PERM_EXPR canonicalization wants to
>>>have
>first element selector smaller than number of elements.  And unlike
>vec_shr_optab, where we also have a fallback in
>>>have_whole_vector_shift
>using normal permutations, this one doesn't need it, that
>"fallback"
>>>is
>tried
>first before vec_shl_optab.
>
>For the vec_shl_optab checks, it tests only for constant number of
>elements
>vectors, not really sure if our VECTOR_CST encoding can express the
>left
>shifts in any way nor whether SVE supports those (I see aarch64 has
>vec_shl_insert but that is just a fixed shift by element bits and
>shifts in
>a scalar rather than zeros).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>trunk?

 Ok. 
>>>
>>>I think it would be worth instead telling evpc that the second
>permute
>>>vector is zero.  Permutes with a second vector of zero are somewhat
>>>special for SVE too, and could be in other cases for other targets.
>>>
>>>I thought the direction of travel was not to have optabs for specific
>>>kinds of permute any more.  E.g. zip, unzip and blend are common
>>>permutes too, but we decided not to commonise those.
>>
>> The issue is that vec_perm_const_ok doesn't have enough info here
>> (second operand is all zeros). So the optab is needed to query target
>> capabilities, not so much for the actual expansion which could go via
>> vec_perm_const.
>
>Not at the moment, sure.  But my point was that we could add that
>information instead of doing what the patch does, since the information
>could be useful in other cases too.
>
>These days all permute queries go through the same target hook as the
>expansion: targetm.vec_perm_const,  So the target interface itself
>should adapt naturally.
>
>For can_vec_perm_const_p we could either add zeroness information
>to vec_perm_indices or provide it separately (e.g. with tree inputs).
>In the latter case the zerone

[Committed] PR fortran/69499 -- SELECT TYPE is executable statement

2019-06-19 Thread Steve Kargl
I have committed the attach patch.  It checks that
a SELECT TYPE construct does not appear in MODULE
or SUBMODULE scope as it is an executable statement.


2019-06-19  Steven G. Kargl  

PR fortran/69499
* match.c (gfc_match_select_type):  SELECT TYPE is an executable 
statement, and cannot appear in MODULE or SUBMODULE scope.

2019-06-19  Steven G. Kargl  

PR fortran/69499
* gfortran.dg/pr69499.f90: New test.
* gfortran.dg/module_error_1.f90: Update dg-error string.

-- 
Steve
Index: gcc/fortran/match.c
===
--- gcc/fortran/match.c	(revision 272480)
+++ gcc/fortran/match.c	(working copy)
@@ -6219,6 +6219,13 @@ gfc_match_select_type (void)
   if (m != MATCH_YES)
 return m;
 
+  if (gfc_current_state() == COMP_MODULE
+  || gfc_current_state() == COMP_SUBMODULE)
+{
+  gfc_error ("SELECT TYPE at %C cannot appear in this scope");
+  return MATCH_ERROR;
+}
+
   gfc_current_ns = gfc_build_block_ns (ns);
   m = gfc_match (" %n => %e", name, &expr2);
   if (m == MATCH_YES)
Index: gcc/testsuite/gfortran.dg/pr69499.f90
===
--- gcc/testsuite/gfortran.dg/pr69499.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr69499.f90	(working copy)
@@ -0,0 +1,7 @@
+! { dg-do compile }
+! PR fortran/69499
+! Contributed by Gerhard Steinmetz.
+module m
+   class(*) :: z! { dg-error "must be dummy, allocatable or pointer" }
+   select type (x => z) ! { dg-error "cannot appear in this scope" }
+end
Index: gcc/testsuite/gfortran.dg/module_error_1.f90
===
--- gcc/testsuite/gfortran.dg/module_error_1.f90	(revision 272480)
+++ gcc/testsuite/gfortran.dg/module_error_1.f90	(working copy)
@@ -1,5 +1,5 @@
 ! { dg-do compile }
 ! PR fortran/50627
 module kernels
-  select type (args) ! { dg-error "Unexpected SELECT TYPE" }
+  select type (args) ! { dg-error "cannot appear in this scope" }
 end module kernels


Re: PING^1: [PATCH] i386: Generate standard floating point scalar operation patterns

2019-06-19 Thread Jeff Law
On 6/3/19 4:50 PM, H.J. Lu wrote:
> On Tue, May 21, 2019 at 8:54 AM H.J. Lu  wrote:
>>
>> On Wed, May 15, 2019 at 2:29 PM Richard Sandiford
>>  wrote:
>>>
>>> "H.J. Lu"  writes:
 On Thu, Feb 7, 2019 at 9:49 AM H.J. Lu  wrote:
>
> Standard scalar operation patterns which preserve the rest of the vector
> look like
>
>  (vec_merge:V2DF
>(vec_duplicate:V2DF
>  (op:DF (vec_select:DF (reg/v:V2DF 85 [ x ])
> (parallel [ (const_int 0 [0])]))
>  (reg:DF 87))
>(reg/v:V2DF 85 [ x ])
>(const_int 1 [0x1])]))
>
> Add such pattens to i386 backend and convert VEC_CONCAT patterns to
> standard standard scalar operation patterns.
>>>
>>> It looks like there's some variety in the patterns used, e.g.:
>>>
>>> (define_insn 
>>> "_vm3"
>>>   [(set (match_operand:VF_128 0 "register_operand" "=x,v")
>>> (vec_merge:VF_128
>>>   (smaxmin:VF_128
>>> (match_operand:VF_128 1 "register_operand" "0,v")
>>> (match_operand:VF_128 2 "vector_operand" 
>>> "xBm,"))
>>>  (match_dup 1)
>>>  (const_int 1)))]
>>>   "TARGET_SSE"
>>>   "@
>>>\t{%2, %0|%0, %2}
>>>
>>> v\t{%2, 
>>> %1, %0|%0, %1, 
>>> %2}"
>>>   [(set_attr "isa" "noavx,avx")
>>>(set_attr "type" "sse")
>>>(set_attr "btver2_sse_attr" "maxmin")
>>>(set_attr "prefix" "")
>>>(set_attr "mode" "")])
>>>
>>> makes the operand a full vector operation, which seems simpler.
>>
>> This pattern is used to implement scalar smaxmin intrinsics.
>>
>>> The above would then be:
>>>
>>>   (vec_merge:V2DF
>>> (op:V2DF
>>>   (reg:V2DF 85)
>>>   (vec_duplicate:V2DF (reg:DF 87)))
>>> (reg/v:V2DF 85 [ x ])
>>> (const_int 1 [0x1])]))
>>>
>>> I guess technically the two have different faulting behaviour though,
>>> since the smaxmin gets applied to all elements, not just element 0.
>>
>> This is the issue.   We don't use the correct mode for scalar instructions:
>>
>> ---
>> #include 
>>
>> __m128d
>> foo1 (__m128d x, double *p)
>> {
>>   __m128d y = _mm_load_sd (p);
>>   return _mm_max_pd (x, y);
>> }
>> ---
>>
>> movq (%rdi), %xmm1
>> maxpd %xmm1, %xmm0
>> ret
>>
>>
>> Here is the updated patch to add standard floating point scalar
>> operation patterns to i386 backend.Then we can do
>>
>> ---
>> #include 
>>
>> extern __inline __m128d __attribute__((__gnu_inline__,
>> __always_inline__, __artificial__))
>> _new_mm_max_pd (__m128d __A, __m128d __B)
>> {
>>   __A[0] = __A[0] > __B[0] ? __A[0] : __B[0];
>>   return __A;
>> }
>>
>> __m128d
>> foo2 (__m128d x, double *p)
>> {
>>   __m128d y = _mm_load_sd (p);
>>   return _new_mm_max_pd (x, y);
>> }
>>
>> maxsd (%rdi), %xmm0
>> ret
>>
>> We should use generic vector operations to implement i386 intrinsics
>> as much as we can.
>>
>>> The patch seems very specific.  E.g. why just PLUS, MINUS, MULT and DIV?
>>
>> This patch only adds  +, -, *, /, > and <.We can add more if there
>> are testcases
>> for them.
>>
>>> Thanks,
>>> Richard
>>>
>>>
>
> gcc/
>
> PR target/54855
> * simplify-rtx.c (simplify_binary_operation_1): Convert
> VEC_CONCAT patterns to standard standard scalar operation
> patterns.
> * config/i386/sse.md (*_vm3): New.
> (*_vm3): Likewise.
>
> gcc/testsuite/
>
> PR target/54855
> * gcc.target/i386/pr54855-1.c: New test.
> * gcc.target/i386/pr54855-2.c: Likewise.
> * gcc.target/i386/pr54855-3.c: Likewise.
> * gcc.target/i386/pr54855-4.c: Likewise.
> * gcc.target/i386/pr54855-5.c: Likewise.
> * gcc.target/i386/pr54855-6.c: Likewise.
> * gcc.target/i386/pr54855-7.c: Likewise.

 PING:

 https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00398.html
>>
>> Thanks.
>>
> 
> PING:
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01416.html
The simplify-rtx changes are OK as are the x86 backend changes (either
the original version that just handled basic arithmetic operators or the
subsequent one that added support for minmax and setv2df_0.

Jeff



Re: [wwwdocs] Announce PRU backend

2019-06-19 Thread Jeff Law
On 6/19/19 11:06 AM, Dimitar Dimitrov wrote:
> Hi,
> 
> This WWW update announces the new PRU port in WWW docs, and fills in the 
> backend characteristics.
OK
jeff


[Darwin, specs, committed] Tidy some more linker options.

2019-06-19 Thread Iain Sandoe
pie, no-pie and rdynamic are driver options, we can process them in the
relevant place and drop them once dealt with.

Support for the -pie, -no_pie and -no_compact_unwind options should ideally
be checked at configure time, however the status quo is to assert that linkers
capable of targeting the relevant systems support these options (i.e. we trust
that the user doesn't attempt to configure inappropriately).

TODO: check the availability of the linker opts in configure rather than
trusting to the user.  We now have machinery in configure that can do this and
I plan to add tests for these, and a bunch of newer ones we are not yet
handling.

This fixes the fail of gcc.dg/pie-7.c, which is a result of failing to handle 
the
no-pie driver option.

tested on darwin9,10,16,18,
applied to mainline,
thanks
Iain

2019-06-19  Iain Sandoe  

* config/darwin.h (DRIVER_SELF_SPECS): Add RDYNAMIC, DARWIN_PIE_SPEC
and DARWIN_NOPIE_SPEC.
(RDYNAMIC): New, modified from DARWIN_EXPORT_DYNAMIC.
(DARWIN_PIE_SPEC): Collate from darwin.h and darwin9.h.
(DARWIN_NOPIE_SPEC): Collate from darwin10.h.
(DARWIN_NOCOMPACT_UNWIND): New from darwin10.h
(DARWIN_EXPORT_DYNAMIC): Delete.
* config/darwin10.h (LINK_GCC_C_SEQUENCE_SPEC): Move no_compact_unwind
and pie options processing to  darwin.h.
* config/darwin9.h (DARWIN_PIE_SPEC): Move pie processing to darwin.h

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index bf44b10..ae324f1 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -124,7 +124,30 @@ extern GTY(()) int darwin_ms_struct;
   "%{fapple-kext|mkernel:-static}",\
   "%{shared:-Zdynamiclib} %= 10.5 mmacosx-version-min= -Xlinker) \
+   %:version-compare(>= 10.5 mmacosx-version-min= -pie) }} %= 10.7 mmacosx-version-min= -Xlinker ) \
+   %:version-compare(>= 10.7 mmacosx-version-min= -no_pie) } %= 10.6 mmacosx-version-min= -no_compact_unwind) "
+
 /* This is mostly a clone of the standard LINK_COMMAND_SPEC, plus
precomp, libtool, and fat build additions.
 
@@ -164,12 +197,6 @@ extern GTY(()) int darwin_ms_struct;
specifying the handling of options understood by generic Unix
linkers, and for positional arguments like libraries.  */
 
-#if LD64_HAS_EXPORT_DYNAMIC
-#define DARWIN_EXPORT_DYNAMIC " %{rdynamic:-export_dynamic}"
-#else
-#define DARWIN_EXPORT_DYNAMIC " %{rdynamic: %nrdynamic is not supported}"
-#endif
-
 #define LINK_COMMAND_SPEC_A \
"%{!fdump=*:%{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:\
 %(linker)" \
@@ -190,10 +217,11 @@ extern GTY(()) int darwin_ms_struct;
   %{%:sanitize(address): -lasan } \
   %{%:sanitize(undefined): -lubsan } \
   %(link_ssp) \
-  " DARWIN_EXPORT_DYNAMIC " %http://www.gnu.org/licenses/>.  */
 
-/* Fix PR41260 by passing -no_compact_unwind on darwin10 and later until
-   unwinder in libSystem is fixed to digest new epilog unwinding notes.
+/* Fix PR47558 by linking against libSystem ahead of libgcc_ext. */
 
-   Fix PR47558 by linking against libSystem ahead of libgcc_ext. */
 #undef  LINK_GCC_C_SEQUENCE_SPEC
 #define LINK_GCC_C_SEQUENCE_SPEC \
-"%:version-compare(>= 10.6 mmacosx-version-min= -no_compact_unwind) \
- %{!static:%{!static-libgcc: \
+"%{!static:%{!static-libgcc: \
 %:version-compare(>= 10.6 mmacosx-version-min= -lSystem) } } \
- %{fno-pic|fno-PIC|fno-pie|fno-PIE|fapple-kext|mkernel|static|mdynamic-no-pic: 
\
-   %:version-compare(>= 10.7 mmacosx-version-min= -no_pie) } \
  %{!nostdlib:%:version-compare(>< 10.6 10.7 mmacosx-version-min= 
-ld10-uwfef.o)} \
   %G %{!nolibc:%L}"
 
diff --git a/gcc/config/darwin9.h b/gcc/config/darwin9.h
index ca5c517..1fd1604 100644
--- a/gcc/config/darwin9.h
+++ b/gcc/config/darwin9.h
@@ -35,12 +35,6 @@ along with GCC; see the file COPYING3.  If not see
 /* Tell collect2 to run dsymutil for us as necessary.  */
 #define COLLECT_RUN_DSYMUTIL 1
 
-#undef DARWIN_PIE_SPEC
-#define DARWIN_PIE_SPEC \
-  "%{fpie|pie|fPIE: \
- %{mdynamic-no-pic: %n'-mdynamic-no-pic' overrides '-pie', '-fpie' or 
'-fPIE'; \
-  :-pie}}"
-
 /* Only ask as for debug data if the debug style is stabs (since as doesn't
yet generate dwarf.)  */
 



Re: C++ PATCH for c++/60364 - noreturn after first decl not diagnosed (v2)

2019-06-19 Thread Jeff Law
On 6/17/19 12:10 PM, Marek Polacek wrote:
> On Mon, Jun 17, 2019 at 09:02:17AM -0600, Martin Sebor wrote:
>>> diff --git gcc/cp/tree.c gcc/cp/tree.c
>>> index cd021b7f594..bb695e14e73 100644
>>> --- gcc/cp/tree.c
>>> +++ gcc/cp/tree.c
>>> @@ -4453,6 +4453,8 @@ const struct attribute_spec std_attribute_table[] =
>>>   handle_likeliness_attribute, attr_cold_hot_exclusions },
>>> { "unlikely", 0, 0, false, false, false, false,
>>>   handle_likeliness_attribute, attr_cold_hot_exclusions },
>>> +  { "noreturn", 0, 0, true, false, false, false,
>>> +handle_noreturn_attribute, NULL },
>>   
>>
>> The GNU attribute is made mutually exclusive with a bunch of other
>> attributes (e.g., malloc or warn_unused_result) by setting the last
>> member to the array of exclusive attribute.  Does the change preserve
>> this relationship some other way?
> Oop, no, that is a bug.  I meant to go back to that, but I'd forgotten to add
> an XXX comment as I'm wont to, and the testsuite doesn't test that, so it
> slipped.  Fixed & new test added.  Thanks for catching it.
> 
> Also added a test for the scenario Jakub pointed out in the other mail.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2019-06-17  Marek Polacek  
> 
>   PR c++/60364 - noreturn after first decl not diagnosed.
>   * attribs.c (get_attribute_namespace): No longer static.
>   (decl_attributes): Avoid shadowing.  Preserve the C++11 form for C++11
>   attributes.
>   (attr_noreturn_exclusions): Make it extern.
>   * attribs.h (get_attribute_namespace): Declare.
>   * tree-inline.c (function_attribute_inlinable_p): Use
>   get_attribute_name.
> 
>   * c-attribs.c (handle_noreturn_attribute): No longer static.
>   * c-common.h (handle_noreturn_attribute, attr_noreturn_exclusions):
>   Declare.
>   * c-format.c (check_function_format): Use get_attribute_name.
> 
>   * decl.c (duplicate_decls): Give an error when a function is
>   declared [[noreturn]] after its first declaration.
>   * parser.c (cp_parser_std_attribute): Don't treat C++11 noreturn
>   attribute as equivalent to GNU's.
>   * tree.c (std_attribute_table): Add noreturn.
> 
>   * g++.dg/warn/noreturn-8.C: New test.
>   * g++.dg/warn/noreturn-9.C: New test.
>   * g++.dg/warn/noreturn-10.C: New test.
>   * g++.dg/warn/noreturn-11.C: New test.
> 
Turns out there was more generic stuff than C++ specific stuff here. So
I went ahead and walked through it.

OK for the trunk.

jeff


Re: PR libstdc++/90945 Patch to have pretty printer for std::vector return bool intead of int for elements

2019-06-19 Thread Jonathan Wakely

On 19/06/19 19:04 +0200, Michael Weghorn wrote:

Hi everyone,

the Python pretty printer for a 'std::vector' currently returns
integers as values for the elements, which e.g. leads to the situation
that a 'gdb.Value' constructed from that doesn't have 'bool' type, but
an integer type ('long long' for my test with gdb 8.2.1 on Debian
testing, amd64). Returning bool values ('True'/'False') would make sure
that the type is clear and can thus help improve the displayed type.
More details in [1].

The attached patch changes the behaviour of the pretty printer accordingly.


I'd be glad to receive feedback on this and also notes in case anything
else is needed. (This is my first contribution.)


Thanks, the patch looks fine and is small enough that we can accept it
without a copyright assignment, but if you plan to contribute again
you should look into https://gcc.gnu.org/contribute.html#legal

I think I'd prefer to have the 'elt' variable be the actual element
(not the unsigned long that contains the element) so I'll adjust the
patch to do this instead:

   elt = bool(self.item.dereference() & (1 << self.so))


So far, I've tested this with GDB 8.2.1 on Debian testing.


It looks like we don't have any tests in the testsuite for printing
vector, so I'll add one to verify this behaviour and commit your
patch - thanks!

I've attached what I'm testing and plan to commit.


commit 57396b03636d261fbce3f57cb901ed90405b0c8b
Author: Jonathan Wakely 
Date:   Wed Jun 19 20:32:45 2019 +0100

Have std::vector printer's iterator return bool for vector

Have the pretty-printer for 'std::vector' return a
value of type 'bool' rather than an 'int'.

This way, the type is clear and that can be used for better
display and a 'gdb.Value' constructed from the returned value
will have type 'bool' again, not e.g. 'long long' as happened
previously (at least with GDB 8.2.1 on amd64).

2019-06-19  Michael Weghorn  
Jonathan Wakely  

PR libstdc++/90945
* python/libstdcxx/v6/printers.py (StdVectorPrinter._iterator): Use
values of type bool for vector elements.
* testsuite/libstdc++-prettyprinters/simple.cc: Test vector.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 05143153bee..cd79a1fa6e6 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -362,16 +362,12 @@ class StdVectorPrinter:
 if self.bitvec:
 if self.item == self.finish and self.so >= self.fo:
 raise StopIteration
-elt = self.item.dereference()
-if elt & (1 << self.so):
-obit = 1
-else:
-obit = 0
+elt = bool(self.item.dereference() & (1 << self.so))
 self.so = self.so + 1
 if self.so >= self.isize:
 self.item = self.item + 1
 self.so = 0
-return ('[%d]' % count, obit)
+return ('[%d]' % count, elt)
 else:
 if self.item == self.finish:
 raise StopIteration
@@ -382,7 +378,7 @@ class StdVectorPrinter:
 def __init__(self, typename, val):
 self.typename = strip_versioned_namespace(typename)
 self.val = val
-self.is_bool = val.type.template_argument(0).code  == gdb.TYPE_CODE_BOOL
+self.is_bool = val.type.template_argument(0).code == gdb.TYPE_CODE_BOOL
 
 def children(self):
 return self._iterator(self.val['_M_impl']['_M_start'],
@@ -422,6 +418,8 @@ class StdVectorIteratorPrinter:
 return 'non-dereferenceable iterator for std::vector'
 return str(self.val['_M_current'].dereference())
 
+# TODO add printer for vector's _Bit_iterator and _Bit_const_iterator
+
 class StdTuplePrinter:
 "Print a std::tuple"
 
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
index 60dfdc597f3..b7dbcb67d5a 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
@@ -116,6 +116,16 @@ main()
   std::vector::iterator viter0;
 // { dg-final { note-test viter0 {non-dereferenceable iterator for std::vector} } }
 
+  std::vector vb;
+  vb.reserve(100);
+  vb.push_back(true);
+  vb.push_back(true);
+  vb.push_back(false);
+  vb.push_back(false);
+  vb.push_back(true);
+  vb.erase(vb.begin());
+// { dg-final { regexp-test vb {std::(__debug::)?vector of length 4, capacity 100 = \\{true, false, false, true\\}} } }
+
   __gnu_cxx::slist sll;
   sll.push_front(23);
   sll.push_front(47);


Re: [PATCH] implement -Wformat-diag, v2

2019-06-19 Thread Martin Sebor

On 6/19/19 10:46 AM, Jeff Law wrote:

On 6/18/19 1:21 PM, Martin Sebor wrote:

On 6/18/19 12:59 PM, Jeff Law wrote:

On 5/22/19 10:42 AM, Martin Sebor wrote:

Attached is a revised implementation of the -Wformat-diag checker
incorporating the feedback I got on the first revision.

Martin

gcc-wformat-diag-checker.diff

gcc/c-family/ChangeLog:

 * c-format.c (function_format_info::format_type): Adjust type.
 (function_format_info::is_raw): New member.
 (decode_format_type): Adjust signature.  Handle "raw" diag
attributes.
 (decode_format_attr): Adjust call to decode_format_type.
 Avoid a redundant call to convert_format_name_to_system_name.
 Avoid abbreviating the word "arguments" in a diagnostic.
 (format_warning_substr): New function.
 (avoid_dollar_number): Quote dollar sign in a diagnostic.
 (finish_dollar_format_checking): Same.
 (check_format_info): Same.
 (struct baltoks_t): New.
 (c_opers, c_keywords, cxx_keywords, badwords, contrs): New arrays.
 (maybe_diag_unbalanced_tokens, check_tokens, check_plain): New
 functions.
 (check_format_info_main): Call check_plain.  Use baltoks_t.  Call
 maybe_diag_unbalanced_tokens.
 (handle_format_attribute): Spell out the word "arguments" in
 a diagnostic.

gcc/testsuite/ChangeLog:
 * gcc.dg/format/gcc_diag-11.c: New test.

High level comment.  This is painful to read.  But that's probably an
artifact of trying to cobble together C code to parse strings and codify
the conventions.    ie, it's likely inherent for the problem you're
trying to solve.


It wasn't exactly a lot of fun to write either.  I suspect it
would have been even worse if I had used regular expressions.
It is more complicated than strictly necessary because it's
trying to balance usability of the warning with efficiency.
(Although I'm sure both could be improved.)


diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index a7f76c1c01d..713fce442c9 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -320,10 +352,8 @@ decode_format_attr (const_tree fntype, tree
atname, tree args,
   {
     const char *p = IDENTIFIER_POINTER (format_type_id);
   -  p = convert_format_name_to_system_name (p);
+  info->format_type = decode_format_type (p, &info->is_raw);
   -  info->format_type = decode_format_type (p);
-
     if (!c_dialect_objc ()
  && info->format_type == gcc_objc_string_format_type)
   {

Did you mean to drop the call to convert_format_name_to_system_name?


Yes, it's redundant (it's the first thing decode_format_type does).


@@ -2789,6 +2850,904 @@ check_argument_type (const format_char_info
*fci,
     return true;
   }
   +
+/* Helper for initializing global token_t arrays below.  */
+#define NAME(name) { name, sizeof name - 1, NULL }
+
+/* C/C++ operators that are expected to be quoted within the format
+   string.  */
+
+static const token_t c_opers[] =
+  {
+   NAME ("!="), NAME ("%="),  NAME ("&&"),  NAME ("&="), NAME ("*="),
+   NAME ("++"), NAME ("+="),  NAME ("--"),  NAME ("-="), NAME ("->"),
+   NAME ("/="), NAME ("<<"),  NAME ("<<="), NAME ("<="), NAME ("=="),
+   NAME (">="), NAME (">>="), NAME (">>"),  NAME ("?:"),  NAME ("^="),
+   NAME ("|="), NAME ("||")
+  };

This clearly isn't a full list of operators.  Is there a particular
reason why we aren't diagnosing others.  I guess you're catching the
single character operators via the ISPUNCT checks?  That seems a little
loose (@ isn't an operator for example).  It  may be OK in practice
though.


Yes, it only handles two-character operators and its only purpose
is to make diagnostics more readable and less repetitive (otherwise
we'd get one for each occurrence of the punctuator). I think @ is
an operator in Objective-C (I only know this because I fixed a few
instances of it there).


+
+  if (nchars)
+    {
+  /* This is the most common problem: go the extra mile to decribe

s/decribe/describe/




+
+static void
+maybe_diag_unbalanced_tokens (location_t format_string_loc,
+  const char *orig_format_chars,
+  tree format_string_cst,
+  baltoks_t &baltoks)

Needs a function comment.



@@ -2828,10 +3789,26 @@ check_format_info_main (format_check_results
*res,
       init_dollar_format_checking (info->first_arg_num,
first_fillin_param);
   +  /* In GCC diagnostic functions check plain directives
(substrings within
+ the format string that don't start with %) for quoting and
punctuations
+ problems.  */
+  bool ck_plain = (!info->is_raw
+   && (info->format_type == gcc_diag_format_type
+   || info->format_type == gcc_tdiag_format_type
+   || info->format_type == gcc_cdiag_format_type
+   || info->format_type == gcc_cxxdiag_format_type));
+
     while (*format_chars != 0)
   {
-  if (*format_chars++ != '%')
+  if (ck_plain)
+    format_chars = check_plain (format_string_loc,
+ 

Re: PR libstdc++/90945 Patch to have pretty printer for std::vector return bool intead of int for elements

2019-06-19 Thread Michael Weghorn
Thank you for the quick reply!

On 19/06/2019 21.37, Jonathan Wakely wrote:
> Thanks, the patch looks fine and is small enough that we can accept it
> without a copyright assignment, but if you plan to contribute again
> you should look into https://gcc.gnu.org/contribute.html#legal

I'll do as soon as I plan to submit another patch.

> I think I'd prefer to have the 'elt' variable be the actual element
> (not the unsigned long that contains the element) so I'll adjust the
> patch to do this instead:
> 
>    elt = bool(self.item.dereference() & (1 << self.so))
> 
>> So far, I've tested this with GDB 8.2.1 on Debian testing.
> 
> It looks like we don't have any tests in the testsuite for printing
> vector, so I'll add one to verify this behaviour and commit your
> patch - thanks!
> 
> I've attached what I'm testing and plan to commit.

This sounds all reasonable, just one comment on the test:

> +  std::vector vb;
> +  vb.reserve(100);
> +  vb.push_back(true);
> +  vb.push_back(true);
> +  vb.push_back(false);
> +  vb.push_back(false);
> +  vb.push_back(true);
> +  vb.erase(vb.begin());
> +// { dg-final { regexp-test vb {std::(__debug::)?vector of length 4, 
> capacity 100 = \\{true, false, false, true\\}} } }
> +

This inserts 5 elements, so I'd expect that either "vector of length 5"
and an additional "true" element at the beginning need to be added for
the expected result or one of the two first 'vb.push_back(true)' needs
to be removed.

Thanks again.



signature.asc
Description: OpenPGP digital signature


Re: PR libstdc++/90945 Patch to have pretty printer for std::vector return bool intead of int for elements

2019-06-19 Thread Jonathan Wakely

On 19/06/19 21:49 +0200, Michael Weghorn wrote:

Thank you for the quick reply!

On 19/06/2019 21.37, Jonathan Wakely wrote:

Thanks, the patch looks fine and is small enough that we can accept it
without a copyright assignment, but if you plan to contribute again
you should look into https://gcc.gnu.org/contribute.html#legal


I'll do as soon as I plan to submit another patch.


I think I'd prefer to have the 'elt' variable be the actual element
(not the unsigned long that contains the element) so I'll adjust the
patch to do this instead:

   elt = bool(self.item.dereference() & (1 << self.so))


So far, I've tested this with GDB 8.2.1 on Debian testing.


It looks like we don't have any tests in the testsuite for printing
vector, so I'll add one to verify this behaviour and commit your
patch - thanks!

I've attached what I'm testing and plan to commit.


This sounds all reasonable, just one comment on the test:


+  std::vector vb;
+  vb.reserve(100);
+  vb.push_back(true);
+  vb.push_back(true);
+  vb.push_back(false);
+  vb.push_back(false);
+  vb.push_back(true);
+  vb.erase(vb.begin());
+// { dg-final { regexp-test vb {std::(__debug::)?vector of length 4, capacity 
100 = \\{true, false, false, true\\}} } }
+


This inserts 5 elements, so I'd expect that either "vector of length 5"
and an additional "true" element at the beginning need to be added for
the expected result or one of the two first 'vb.push_back(true)' needs
to be removed.


It inserts five then erases one, the test is right.

Except that it should be capacity=128, because the capacity increases
in units of 64 bits. of course.





Re: PR libstdc++/90945 Patch to have pretty printer for std::vector return bool intead of int for elements

2019-06-19 Thread Michael Weghorn
On 19/06/2019 21.54, Jonathan Wakely wrote:
>>> +  std::vector vb;
>>> +  vb.reserve(100);
>>> +  vb.push_back(true);
>>> +  vb.push_back(true);
>>> +  vb.push_back(false);
>>> +  vb.push_back(false);
>>> +  vb.push_back(true);
>>> +  vb.erase(vb.begin());
>>> +// { dg-final { regexp-test vb {std::(__debug::)?vector of length 4,
>>> capacity 100 = \\{true, false, false, true\\}} } }
>>> +
>>
>> This inserts 5 elements, so I'd expect that either "vector of length 5"
>> and an additional "true" element at the beginning need to be added for
>> the expected result or one of the two first 'vb.push_back(true)' needs
>> to be removed.
> 
> It inserts five then erases one, the test is right.
> 
> Except that it should be capacity=128, because the capacity increases
> in units of 64 bits. of course.

Yes, of course. Sorry, I was missing that.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Wrap 'expand_all_functions' and 'ipa_passes' around timevars

2019-06-19 Thread Giuliano Belinassi
Hi,

Oh, I have completely forgotten about this patch

On 06/19, Jeff Law wrote:
> On 1/24/19 12:51 PM, Giuliano Belinassi wrote:
> > This patch adds two variables named 'TV_CGRAPH_FUNC_EXPANSION' and
> > 'TV_CGRAPH_IPA_PASSES' that count the elapsed time of the functions
> > 'expand_all_functions' and 'ipa_passes', respectivelly.
> > 
> > The main point of this is that these functions takes a very long time
> > when compiling the 'gimple-match.c' file, and therefore may also take
> > a long time when compiling other large files.
> > 
> > I also accept suggestions about how to improve this :-)
> > 
> > ChangeLog:
> > 
> > 2019-01-24  Giuliano Belinassi 
> > 
> > * cgraph_unit.c (compile): TV_CGRAPH_FUNC_EXPANSION and
> > TV_CGRAPH_IPA_PASSES start, stop.
> > * timevar.def (TV_CGRAPH_IPA_PASSES, TV_CGRAPH_FUNC_EXPANSION): New.
> So I'm guessing you want the accumulated time for the ipa_passes and
> expansion.  So independent counters using timevar_{start,stop} seem right.

Yes, my point was to accumulate the total time spent in those function,
including everything these functions calls.

With regard to breaking the timevar with IPA, GIMPLE and RTL expansion
passes, I can do this but it will require splitting `all_passes` into
`all_passes` and `all_rtl_passes`, as suggested by richi in the
parallelization thread. I can do this fairly easily since I have
already done it done in my branch. Is it OK?

> 
> The alternate approach would be to use the timevar_{push,pop}. This
> would change what timevar is on the top of the stack when we call
> ipa_passes and expand_all_functions -- it would, in effect, allow us to
> determine how much time is spent in those functions which is _not_
> attributable to existing timevars.
> 
> I just want to confirm your intent before acking/naking.
> 
> jeff

Giuliano


Re: [wwwdocs] Announce PRU backend

2019-06-19 Thread Dimitar Dimitrov
On сряда, 19 юни 2019 г. 13:23:01 EEST Jeff Law wrote:
> On 6/19/19 11:06 AM, Dimitar Dimitrov wrote:
> > Hi,
> > 
> > This WWW update announces the new PRU port in WWW docs, and fills in the
> > backend characteristics.
> OK
> jeff
Thank you. Pushed to CVS.

Dimitar



[PATCH] PR fortran/86587 -- PRIVATE and BIND(C) are allowed for derived type

2019-06-19 Thread Steve Kargl
Revision 126185 introduced ISO C Binding to gfortran.
In that revision, a check for a conflict between a
derived type with the PRIVATE attribute and BIND(C) was
introduced.  After checking the F2003, F2008, and F2018
standards, I cannot find this restriction.  Thus, the
check is removed by the attached patch.  Regression 
checked on x86_64-*-freebsd.  OK to commit?

2019-06-19  Steven G. Kargl  

PR fortran/86587
* symbol.c (verify_bind_c_derived_type): Remove erroneous error
checking for BIND(C) and PRIVATE attributes.

2019-06-19  Steven G. Kargl  

PR fortran/86587
* gfortran.dg/pr86587.f90: New test.

-- 
Steve
Index: gcc/fortran/symbol.c
===
--- gcc/fortran/symbol.c	(revision 272481)
+++ gcc/fortran/symbol.c	(working copy)
@@ -4529,16 +4529,6 @@ verify_bind_c_derived_type (gfc_symbol *derived_sym)
   curr_comp = curr_comp->next;
 } while (curr_comp != NULL);
 
-
-  /* Make sure we don't have conflicts with the attributes.  */
-  if (derived_sym->attr.access == ACCESS_PRIVATE)
-{
-  gfc_error ("Derived type %qs at %L cannot be declared with both "
- "PRIVATE and BIND(C) attributes", derived_sym->name,
- &(derived_sym->declared_at));
-  retval = false;
-}
-
   if (derived_sym->attr.sequence != 0)
 {
   gfc_error ("Derived type %qs at %L cannot have the SEQUENCE "
Index: gcc/testsuite/gfortran.dg/pr86587.f90
===
--- gcc/testsuite/gfortran.dg/pr86587.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr86587.f90	(working copy)
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! PR fortran/86587
+! Code contirubted by Valentin Clement 
+!
+module mod1
+   use iso_c_binding
+   type, bind(c), private :: mytype
+  integer(c_int) :: i1, i2
+   end type
+end module mod1
+
+module mod2
+  use iso_c_binding
+  private
+  type, bind(c) :: mytype
+integer(c_int) :: i1, i2
+  end type
+end module mod2


Re: [PATCH] improve strcmp folding of unequal strings (PR 90626)

2019-06-19 Thread Martin Sebor

On 6/14/19 2:03 PM, Jeff Law wrote:

On 6/13/19 5:50 PM, Martin Sebor wrote:

While integrating the strlen and sprintf passes and investigating
optimization opportunities that it opens up I noticed a few related
to a strcmp optimization implemented in GCC 9.  One is to take
advantage of the fact that a nul-terminated string of a known
length cannot be equal to a string whose length is greater, even
if it isn't known exactly.  For example, the equality below must
be false:

   int f (char * restrict a, char * restrict b)
   {
 memcpy (a, "1234", 4);   // length >= 4
 strcpy (b, "123");   // length == 3

 return strcmp (a, b) == 0;   // must be false
   }

The attached patch enhances the existing optimization to exploit
this opportunity.

Tested on x86_64-linux.

Martin

PS There are several more improvements to be made here.  I've been
raising bugs for them and I plan to submit patches for them in
the near future.  (Incidentally, they are all in the spirit of
the suggestion made back in 2012 in pr52171.)

gcc-90626.diff

PR tree-optimization/90626 - fold strcmp(a, b) == 0 to zero when one string 
length is exact and the other is unequal

gcc/ChangeLog:

PR tree-optimization/90626
* tree-ssa-strlen.c (strxcmp_unequal): New function.
(handle_builtin_string_cmp): Call it.

gcc/testsuite/ChangeLog:

PR tree-optimization/90626
* gcc.dg/strlenopt-65.c: New test.
* gcc.dg/strlenopt-66.c: New test.
* gcc.dg/strlenopt.h (strcmp, strncmp): Declare.

OK.  Good to see we're able to extend Qing's work to handle more cases
without major surgery.


I had a couple of typos in there that I had overlooked along
with a test failure they caused.  I fixed those in the followup
r272487.  Sorry about that.

Martin


Re: PR libstdc++/90945 Patch to have pretty printer for std::vector return bool intead of int for elements

2019-06-19 Thread Jonathan Wakely

On 19/06/19 21:58 +0200, Michael Weghorn wrote:

On 19/06/2019 21.54, Jonathan Wakely wrote:

+  std::vector vb;
+  vb.reserve(100);
+  vb.push_back(true);
+  vb.push_back(true);
+  vb.push_back(false);
+  vb.push_back(false);
+  vb.push_back(true);
+  vb.erase(vb.begin());
+// { dg-final { regexp-test vb {std::(__debug::)?vector of length 4,
capacity 100 = \\{true, false, false, true\\}} } }
+


This inserts 5 elements, so I'd expect that either "vector of length 5"
and an additional "true" element at the beginning need to be added for
the expected result or one of the two first 'vb.push_back(true)' needs
to be removed.


It inserts five then erases one, the test is right.

Except that it should be capacity=128, because the capacity increases
in units of 64 bits. of course.


Yes, of course. Sorry, I was missing that.


Here's the patch that I've committed to trunk, after testing.

This is probably OK to backport to the release branches as well, so
I'll add that to my TODO list.

Thanks again for the patch!


commit 6c7d761a3f5fd7d19795d1d4b9b027a04b3fe88b
Author: redi 
Date:   Wed Jun 19 22:57:06 2019 +

Have std::vector printer's iterator return bool for vector

Have the pretty-printer for 'std::vector' return a
value of type 'bool' rather than an 'int'.

This way, the type is clear and that can be used for better
display and a 'gdb.Value' constructed from the returned value
will have type 'bool' again, not e.g. 'long long' as happened
previously (at least with GDB 8.2.1 on amd64).

2019-06-19  Michael Weghorn  
Jonathan Wakely  

PR libstdc++/90945
* python/libstdcxx/v6/printers.py (StdVectorPrinter._iterator): Use
values of type bool for vector elements.
* testsuite/libstdc++-prettyprinters/simple.cc: Test vector.
* testsuite/libstdc++-prettyprinters/simple11.cc: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272490 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 05143153bee..cd79a1fa6e6 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -362,16 +362,12 @@ class StdVectorPrinter:
 if self.bitvec:
 if self.item == self.finish and self.so >= self.fo:
 raise StopIteration
-elt = self.item.dereference()
-if elt & (1 << self.so):
-obit = 1
-else:
-obit = 0
+elt = bool(self.item.dereference() & (1 << self.so))
 self.so = self.so + 1
 if self.so >= self.isize:
 self.item = self.item + 1
 self.so = 0
-return ('[%d]' % count, obit)
+return ('[%d]' % count, elt)
 else:
 if self.item == self.finish:
 raise StopIteration
@@ -382,7 +378,7 @@ class StdVectorPrinter:
 def __init__(self, typename, val):
 self.typename = strip_versioned_namespace(typename)
 self.val = val
-self.is_bool = val.type.template_argument(0).code  == gdb.TYPE_CODE_BOOL
+self.is_bool = val.type.template_argument(0).code == gdb.TYPE_CODE_BOOL
 
 def children(self):
 return self._iterator(self.val['_M_impl']['_M_start'],
@@ -422,6 +418,8 @@ class StdVectorIteratorPrinter:
 return 'non-dereferenceable iterator for std::vector'
 return str(self.val['_M_current'].dereference())
 
+# TODO add printer for vector's _Bit_iterator and _Bit_const_iterator
+
 class StdTuplePrinter:
 "Print a std::tuple"
 
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
index 60dfdc597f3..04c1ef683a6 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple.cc
@@ -116,6 +116,16 @@ main()
   std::vector::iterator viter0;
 // { dg-final { note-test viter0 {non-dereferenceable iterator for std::vector} } }
 
+  std::vector vb;
+  vb.reserve(100);
+  vb.push_back(true);
+  vb.push_back(true);
+  vb.push_back(false);
+  vb.push_back(false);
+  vb.push_back(true);
+  vb.erase(vb.begin());
+// { dg-final { regexp-test vb {std::(__debug::)?vector of length 4, capacity 128 = \\{true, false, false, true\\}} } }
+
   __gnu_cxx::slist sll;
   sll.push_front(23);
   sll.push_front(47);
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
index b18f5cc22a9..ace217cc9e8 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/simple11.cc
@@ -109,6 +109,16 @@ main()
   std::vector::iterator viter0;
 // { dg-fina

[PATCH] PR libstdc++/90920 restore previous checks for empty ranges

2019-06-19 Thread Jonathan Wakely

The change in r263433 broke the contract of the __rotate functions, by no
longer accepting empty ranges. That means that callers which inlined the
old version of std::rotate (without checks) that end up linking to a new
definition of std::__rotate (also without checks) could perform a divide
by zero and crash.

This restores the old contract of the __rotate overloads.

PR libstdc++/90920 partially revert r263433
* include/bits/stl_algo.h (__rotate): Restore checks for empty ranges.
(rotate): Remove checks.
* testsuite/25_algorithms/rotate/90920.cc: New test.


Tested x86_64-linux, committed to trunk.

This needs to be backported to gcc-9-branch too.

commit 100ba82c20a03d6ff316eb34fb46fe29e5c920fc
Author: redi 
Date:   Wed Jun 19 22:57:02 2019 +

PR libstdc++/90920 restore previous checks for empty ranges

The change in r263433 broke the contract of the __rotate functions, by no
longer accepting empty ranges. That means that callers which inlined the
old version of std::rotate (without checks) that end up linking to a new
definition of std::__rotate (also without checks) could perform a divide
by zero and crash.

This restores the old contract of the __rotate overloads.

PR libstdc++/90920 partially revert r263433
* include/bits/stl_algo.h (__rotate): Restore checks for empty 
ranges.
(rotate): Remove checks.
* testsuite/25_algorithms/rotate/90920.cc: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272489 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index ca957e0b9a7..ac21c55ca1b 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -1251,6 +1251,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _ForwardIterator __last,
 forward_iterator_tag)
 {
+  if (__first == __middle)
+   return __last;
+  else if (__last == __middle)
+   return __first;
+
   _ForwardIterator __first2 = __middle;
   do
{
@@ -1291,6 +1296,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_Mutable_BidirectionalIteratorConcept<
  _BidirectionalIterator>)
 
+  if (__first == __middle)
+   return __last;
+  else if (__last == __middle)
+   return __first;
+
   std::__reverse(__first,  __middle, bidirectional_iterator_tag());
   std::__reverse(__middle, __last,   bidirectional_iterator_tag());
 
@@ -1324,6 +1334,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_Mutable_RandomAccessIteratorConcept<
  _RandomAccessIterator>)
 
+  if (__first == __middle)
+   return __last;
+  else if (__last == __middle)
+   return __first;
+
   typedef typename iterator_traits<_RandomAccessIterator>::difference_type
_Distance;
   typedef typename iterator_traits<_RandomAccessIterator>::value_type
@@ -1425,11 +1440,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_requires_valid_range(__first, __middle);
   __glibcxx_requires_valid_range(__middle, __last);
 
-  if (__first == __middle)
-   return __last;
-  else if (__last  == __middle)
-   return __first;
-
   return std::__rotate(__first, __middle, __last,
   std::__iterator_category(__first));
 }
diff --git a/libstdc++-v3/testsuite/25_algorithms/rotate/90920.cc 
b/libstdc++-v3/testsuite/25_algorithms/rotate/90920.cc
new file mode 100644
index 000..cae3ae748d8
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/rotate/90920.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run }
+
+#include 
+#include 
+
+namespace gnu_test
+{
+  // This is the definition from GCC 8.x, with no checks for first==middle
+  // or middle==last.
+  template
+inline _ForwardIterator
+rotate(_ForwardIterator __first, _ForwardIterator __middle,
+  _ForwardIterator __last)
+{
+  return std::__rotate(__first, __middle, __last,
+  std::__iterator_category(__first));
+}
+}
+
+void
+test01()
+{

[PATCH] Fix non-standard behaviour of std::istream_iterator

2019-06-19 Thread Jonathan Wakely

The current implementation of istream_iterator allows the iterator to be
reused after reaching end-of-stream, so that subsequent reads from the
stream can succeed (e.g. if the stream state has been cleared and stream
position changed from EOF). The P0738R2 paper clarified that the
expected behaviour is to set the stream pointer to null after reaching
end-of-stream, preventing further reads.

This implements that requirement, and adds the new default constructor
to std::ostream_iterator.

* include/bits/stream_iterator.h (istream_iterator::_M_equal()): Make
private.
(istream_iterator::_M_read()): Do not check stream state before
attempting extraction. Set stream pointer to null when extraction
fails (P0738R2).
(operator==(const istream_iterator&, const istream_iterator&)): Change
to be a hidden friend of istream_iterator.
(operator!=(const istream_iterator&, const istream_iterator&)):
Likewise.
(ostream_iterator::ostream_iterator()): Add default constructor.
(ostream_iterator::ostream_iterator(ostream_type*, const C*)): Use
addressof.
* testsuite/24_iterators/istream_iterator/1.cc: New test.
* testsuite/24_iterators/ostream_iterator/1.cc: New test.
* testsuite/24_iterators/ostream_iterator/70766.cc: Also check
constructor taking a string.
* testsuite/24_iterators/ostream_iterator/requirements/constexpr.cc:
New test.

Tested x86_64-linux, committed to trunk.

commit 4af63f8369114e24f1ddab531f06fb3902a56bf4
Author: redi 
Date:   Wed Jun 19 22:57:10 2019 +

Fix non-standard behaviour of std::istream_iterator

The current implementation of istream_iterator allows the iterator to be
reused after reaching end-of-stream, so that subsequent reads from the
stream can succeed (e.g. if the stream state has been cleared and stream
position changed from EOF). The P0738R2 paper clarified that the
expected behaviour is to set the stream pointer to null after reaching
end-of-stream, preventing further reads.

This implements that requirement, and adds the new default constructor
to std::ostream_iterator.

* include/bits/stream_iterator.h (istream_iterator::_M_equal()): 
Make
private.
(istream_iterator::_M_read()): Do not check stream state before
attempting extraction. Set stream pointer to null when extraction
fails (P0738R2).
(operator==(const istream_iterator&, const istream_iterator&)): 
Change
to be a hidden friend of istream_iterator.
(operator!=(const istream_iterator&, const istream_iterator&)):
Likewise.
(ostream_iterator::ostream_iterator()): Add default constructor.
(ostream_iterator::ostream_iterator(ostream_type*, const C*)): Use
addressof.
* testsuite/24_iterators/istream_iterator/1.cc: New test.
* testsuite/24_iterators/ostream_iterator/1.cc: New test.
* testsuite/24_iterators/ostream_iterator/70766.cc: Also check
constructor taking a string.
* testsuite/24_iterators/ostream_iterator/requirements/constexpr.cc:
New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272491 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/stream_iterator.h 
b/libstdc++-v3/include/bits/stream_iterator.h
index bc3353ab10f..247ad5376dd 100644
--- a/libstdc++-v3/include/bits/stream_iterator.h
+++ b/libstdc++-v3/include/bits/stream_iterator.h
@@ -57,6 +57,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   istream_type*_M_stream;
   _Tp  _M_value;
+  // This bool becomes false at end-of-stream. It should be sufficient to
+  // check _M_stream != nullptr instead, but historically we did not set
+  // _M_stream to null when reaching the end, so we need to keep this flag.
   bool _M_ok;
 
 public:
@@ -66,7 +69,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   ///  Construct start of input stream iterator.
   istream_iterator(istream_type& __s)
-  : _M_stream(std::__addressof(__s))
+  : _M_stream(std::__addressof(__s)), _M_ok(true)
   { _M_read(); }
 
   istream_iterator(const istream_iterator& __obj)
@@ -76,6 +79,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus >= 201103L
   istream_iterator& operator=(const istream_iterator&) = default;
+  ~istream_iterator() = default;
 #endif
 
   const _Tp&
@@ -111,37 +115,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return __tmp;
   }
 
+private:
   bool
   _M_equal(const istream_iterator& __x) const
-  { return (_M_ok == __x._M_ok) && (!_M_ok || _M_stream == __x._M_stream); 
}
+  {
+   // Ideally this would just return _M_stream == __x._M_stream,
+   // but code compiled with old versions never sets _M_stream to null.
+

Re: [PATCH,RFC,V2 2/3] Add CTF command line options : -gtLEVEL

2019-06-19 Thread Indu Bhagat



On 06/18/2019 12:24 PM, Bernhard Reutner-Fischer wrote:

On 12 June 2019 20:00:09 CEST, Indu Bhagat  wrote:

-gtLEVEL is used to request CTF debug information and also to specify
how much
CTF debug information.

The option name is way too generic IMO.
-gctfLEVEL or some such would at least  indicate its intended purpose, fwiw.

thanks


I thought about this too. The prime reason for -gtLEVEL is consistency with the
DWARF debug info command line options.

For DWARF, we have -gdwarf-VERSION and -gLEVEL. Although GCC, at this time, will
generate only the most recent CTF version (CTF_VERSION_3 in ); I don't
know for sure if the compiler will need to support the CTF_VERSION_3 and higher
simultaneously in the future. If we do, then -gctf-VERSION and -gtLEVEL will be
the most palatable choice for generating Type debug information (Type, hence
the "t" in gt) of various versions.

Thanks
Indu



C++ PATCH for c++/64235 - missing syntax error with invalid alignas

2019-06-19 Thread Marek Polacek
We are wrongly accepting invalid code like:

  struct alignas(16 S2 { }; // missing )

The reason is that cp_parser_type_specifier uses tentative parsing to see if
we're dealing with a class-specifier, and if that doesn't work, it looks for
an elaborated-type-specifier.  When trying to parse it as a class-specifier,
we use cp_parser_class_head which parses any attributes after the class-key
via cp_parser_attributes_opt, and that includes an alignment-specifier.

So we're parsing the alignment-specifier, find the missing ')', but since we
are parsing tentatively, no error is issued.  Then we get back to
cp_parser_class_head, and that commits:

23862   /* At this point, we're going ahead with the class-specifier, even
23863  if some other problem occurs.  */
23864   cp_parser_commit_to_tentative_parse (parser);

so we lose the error.  Fixed as below.  Other approach would be to use
a tentative_firewall, but that would be a much bigger hammer.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-06-19  Marek Polacek  

PR c++/64235 - missing syntax error with invalid alignas.
* parser.c (cp_parser_std_attribute_spec): Commit to tentative parse
if there's a missing close paren.

* g++.dg/parse/alignas1.C: New test.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index 6b4aab8a12f..f88652f1f82 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -26371,6 +26371,12 @@ cp_parser_std_attribute_spec (cp_parser *parser)
   if (alignas_expr == error_mark_node)
return error_mark_node;
 
+  /* Missing ')' means the code cannot possibly be valid; go ahead
+and commit to make sure we issue a hard error.  */
+  if (cp_parser_uncommitted_to_tentative_parse_p (parser)
+ && cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN))
+   cp_parser_commit_to_tentative_parse (parser);
+
   if (!parens.require_close (parser))
return error_mark_node;
 
diff --git gcc/testsuite/g++.dg/parse/alignas1.C 
gcc/testsuite/g++.dg/parse/alignas1.C
new file mode 100644
index 000..322c0f434f9
--- /dev/null
+++ gcc/testsuite/g++.dg/parse/alignas1.C
@@ -0,0 +1,10 @@
+// PR c++/64235
+// { dg-do compile { target c++11 } }
+
+struct S {
+  double d;
+};
+
+struct alignas(sizeof(S) S1 { }; // { dg-error "expected '\\)' before 'S1'" }
+struct alignas(16 S2 { }; // { dg-error "expected '\\)' before 'S2'" }
+struct alignas(int S3 { }; // { dg-error "expected '\\)' before 'S3'" }


Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-19 Thread luoxhu




On 2019/6/19 20:18, Martin Liška wrote:

On 6/19/19 10:56 AM, Martin Liška wrote:

Thank you very much for the numbers. Today, I'm going to prepare the 
generalization of single-value counter to track N values.


Ok, here's a patch candidate that does tracking of most common N values. For 
your test-case I can see:

pr69678.gcda:01a9:  18:COUNTERS indirect_call 9 counts
pr69678.gcda:   0: 35000 1868707024 17500 969338501 
17500 0 0 0
pr69678.gcda:   8: 0

So for now, you'll need to generalize get_most_common_single_value to return
N most common values.

Eventually we'll need to renamed the counter as it won't be tracking just a 
single value
any longer. I can take care of it.

Can you please verify that the patch candidate works for you?

Thanks, the profile data seems good, I will try it.  I need rebase my patch
to trunk first, as there are many conflicts with your previous patch.



Thanks,
Martin





[PATCH] Adding RBIT gcc builtin for ARM

2019-06-19 Thread Ayan Shafqat
The attached patch contains __builtin_arm_rbit which generates RBIT 
instruction for ARM targets.


Please let me know if you any questions or comments, or commit this 
patch for me as I do not have write access to SVN.


Thanks
Ayan

commit a692b5b4965840babbdaf5e2b9b1feb1995d351d
Author: Ayan Shafqat 
Date:   Mon Jun 17 21:46:54 2019 -0400

Implementing RBIT builtin as described in ACLE doc

ARM's RBIT instruction is used to reverse the bit order
of a word. This is present in ARMv6 and above in both
ARM and Thumb modes. This is also specified as an intrinsic
function in ACLE documentation.

This commit implements the GCC builtin for ARM target for
RBIT instruction, __builtin_arm_rbit. Also, this implements
the intrinsic functions as stated in ARM ACLE documentation,
which are listed below:

uint32_t __rbit(uint32_t x);
unsigned long __rbitl(unsigned long x);
uint64_t __rbitll(uint64_t x);

Note: __rbitll is implemented as two calls to __rbit. I know
this is not how it's done in AArch64, but this is what I can
do for now.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ae582172ab9..83dcb7b411c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11568,6 +11568,13 @@
   [(set_attr "predicable" "yes")
(set_attr "type" "clz")])

+(define_insn "rbit"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+   (unspec:SI [(match_operand:SI 1 "s_register_operand" "r")] 
UNSPEC_RBIT))]
+  "TARGET_32BIT && arm_arch_thumb2"
+  "rbit%?\\t%0, %1"
+  [(set_attr "predicable" "yes")])
+
 (define_insn "rbitsi2"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
(unspec:SI [(match_operand:SI 1 "s_register_operand" "r")] 
UNSPEC_RBIT))]
diff --git a/gcc/config/arm/arm_acle.h b/gcc/config/arm/arm_acle.h
index 2c7acc698ea..ce1b102444b 100644
--- a/gcc/config/arm/arm_acle.h
+++ b/gcc/config/arm/arm_acle.h
@@ -168,6 +168,29 @@ __arm_mrrc2 (const unsigned int __coproc, const 
unsigned int __opc1,

 {
   return __builtin_arm_mrrc2 (__coproc, __opc1,  __CRm);
 }
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+__rbit(uint32_t __op1)
+{
+  return __builtin_arm_rbit(__op1);
+}
+
+__extension__ static __inline uint64_t __attribute__ ((__always_inline__))
+__rbitll(uint64_t __op1)
+{
+  return (((uint64_t)__rbit(__op1)) << 32U) | __rbit(__op1 >> 32U);
+}
+
+__extension__ static __inline unsigned long __attribute__ 
((__always_inline__))

+__rbitl(unsigned long __op1)
+{
+#if __SIZEOF_LONG__ == 4
+  return __rbit(__op1);
+#else
+  return __rbitll(__op1);
+#endif
+}
+
 #endif /* __ARM_ARCH >= 6.  */
 #endif /* __ARM_ARCH >= 6 ||  defined (__ARM_ARCH_5TE__).  */
 #endif /*  __ARM_ARCH >= 5.  */
diff --git a/gcc/config/arm/arm_acle_builtins.def 
b/gcc/config/arm/arm_acle_builtins.def

index b2438d66da2..ecb3be491fc 100644
--- a/gcc/config/arm/arm_acle_builtins.def
+++ b/gcc/config/arm/arm_acle_builtins.def
@@ -24,6 +24,7 @@ VAR1 (UBINOP, crc32w, si)
 VAR1 (UBINOP, crc32cb, si)
 VAR1 (UBINOP, crc32ch, si)
 VAR1 (UBINOP, crc32cw, si)
+VAR1 (UBINOP, rbit, si)
 VAR1 (CDP, cdp, void)
 VAR1 (CDP, cdp2, void)
 VAR1 (LDC, ldc, void)
diff --git a/gcc/testsuite/gcc.target/arm/acle/rbit.c 
b/gcc/testsuite/gcc.target/arm/acle/rbit.c

new file mode 100644
index 000..7803dd33615
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/rbit.c
@@ -0,0 +1,18 @@
+/* Test the crc32d ACLE intrinsic.  */
+
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_crc_ok } */
+/* { dg-options "-save-temps -O0" } */
+/* { dg-add-options arm_crc } */
+
+#include "arm_acle.h"
+
+void test_rbit (void)
+{
+  uint32_t out_uint32_t;
+  uint32_t arg0_uint32_t;
+
+  out_uint32_t = __rbit (arg0_uint32_t);
+}
+
+/* { dg-final { scan-assembler-times "rbit\t...?, ...?\n" 2 } } */


[C++ Patch] PR 90909 ("[10 Regression] call devirtualized to pure virtual")

2019-06-19 Thread Paolo Carlini

Hi,

this bug notices that the more aggressive de-virtualization check that 
we have now in place (fixed c++/67184) doesn't work correctly for the 
below reproducer, which involves a pure virtual: we de-virtualize and 
the build fails at link-time. To cure this I believe we simply want an 
additional DECL_PURE_VIRTUAL_P in the condition. I also checked that the 
other compilers I have at hand appear to do the same, that is, they 
compile the reproducer both as-is and without the final specifier to the 
same assembly.


Note, in principle we have the option of not doing the additional 
DECL_PURE_VIRTUAL_P check when the final overrider comes from the class 
itself, not from a base, that is in the cases that we were already 
de-virtualizing pre-67184. That is, for something like:


struct A final
{
  virtual void foo () = 0;
};

void fun(A* a, B* b)
{
  a->foo();
}

devirtualize anyway (which then doesn't link). We could add back an '|| 
CLASSTYPE_FINAL (TYPE_METHOD_BASETYPE (TREE_TYPE (fn)))' for that. ICC 
appears to behave this way.


Tested x86_64-linux.

Thanks, Paolo.





/cp
2019-06-20  Paolo Carlini  

PR c++/90909
* call.c (build_over_call): Do not try to devirtualize when
then function is pure virtual.

/testsuite
2019-06-20  Paolo Carlini  

PR c++/90909
* g++.dg/other/final6.C: New.
Index: testsuite/g++.dg/other/final6.C
===
--- testsuite/g++.dg/other/final6.C (nonexistent)
+++ testsuite/g++.dg/other/final6.C (working copy)
@@ -0,0 +1,9 @@
+// PR c++/90909
+// { dg-do link { target c++11 } }
+
+struct S1 { virtual void f() = 0; };
+struct S2: S1 { virtual void f() {} };
+struct S3: S2 { using S1::f; };
+struct S4 final: S3 { void g(); };
+void S4::g() { f(); }
+int main() { S4().g(); }
Index: cp/call.c
===
--- cp/call.c   (revision 272410)
+++ cp/call.c   (working copy)
@@ -8244,7 +8244,8 @@ build_over_call (struct z_candidate *cand, int fla
   /* See if the function member or the whole class type is declared
 final and the call can be devirtualized.  */
   if (DECL_FINAL_P (fn)
- || CLASSTYPE_FINAL (TREE_TYPE (argtype)))
+ || (CLASSTYPE_FINAL (TREE_TYPE (argtype))
+ && !DECL_PURE_VIRTUAL_P (fn)))
flags |= LOOKUP_NONVIRTUAL;
 
   /* [class.mfct.nonstatic]: If a nonstatic member function of a class


Re: Use ODR for canonical types construction in LTO

2019-06-19 Thread Jason Merrill
On Wed, Jun 19, 2019 at 2:47 PM Nathan Sidwell  wrote:
>
> On 6/19/19 1:53 PM, Jan Hubicka wrote:
>  -ctype = CLASSTYPE_AS_BASE (ctype);
>  +{
>  +  if (!tree_int_cst_equal (TYPE_SIZE (ctype),
>  + TYPE_SIZE (CLASSTYPE_AS_BASE (ctype
>  +ctype = CLASSTYPE_AS_BASE (ctype);
>  +}
>   tree clobber = build_clobber (ctype);
> >>
> >> I have noticed we build a distinct as-base type in rather more cases than
> >> strictly necessary.  For instance when there's a member of reference type 
> >> or
> >> we have a non-trivial dtor. (CLASSTYPE_NON_LAYOUT_POD_P gets set by a bunch
> >> of things that don't affect ABI layout)
> >
> > Avoiding the extra copies at first place would be great. In my
> > understanding the types differ by virtual bases and also by their size
> > since the fake types are not padded to multiply of their alignment.
> > I guess this can be tested ahead of producing the copy and saving some
> > memory...
> >
> > I am not sure if my C++ FE abilities are on par to implement this tough.
>
> I don't think it's simple to fix there, just unfortunate.  your
> understanding is correct, and I think your workaround will work.
> However, remember it's possible for T == CLASSTYPE_AS_BASE (T), so might
> be worth checking that before doing the size comparison?
>
> It'd be great to comment on why you're not just using classtype_as_base
> there.  I suppose I'm serializing this stuff too, with the same
> inefficiencies ...

This simple (untested) patch doesn't avoid creating the unnecessary
as-base types, but it should avoid using them in a way that causes
them to be streamed, and should let them be discarded by GC.
Thoughts?
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index de37e43d04c..e0df9ef2b20 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -6453,6 +6453,12 @@ layout_class_type (tree t, tree *virtuals_p)
   /* Let the back end lay out the type.  */
   finish_record_layout (rli, /*free_p=*/true);
 
+  /* If we didn't end up needing an as-base type, don't use it.  */
+  if (CLASSTYPE_AS_BASE (t) != t
+  && tree_int_cst_equal (TYPE_SIZE (t),
+TYPE_SIZE (CLASSTYPE_AS_BASE (t
+CLASSTYPE_AS_BASE (t) = t;
+
   if (TYPE_SIZE_UNIT (t)
   && TREE_CODE (TYPE_SIZE_UNIT (t)) == INTEGER_CST
   && !TREE_OVERFLOW (TYPE_SIZE_UNIT (t))


Re: [PATCH 0/2][RFC][PR88836][AARCH64] Fix redundant ptest instruction

2019-06-19 Thread Kugan Vivekanandarajah
Hi Richard,

Thanks for your comments.

On Thu, 16 May 2019 at 18:13, Richard Sandiford
 wrote:
>
> kugan.vivekanandara...@linaro.org writes:
> > From: Kugan Vivekanandarajah 
> >
> > Inorder to fix this PR.
> >  * We need to change the whilelo pattern in backend
> >  * Change RTL CSE such that:
> >- Add support for VEC_DUPLICATE
> >- When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the
> >  parallel rtx at the end.
> >
> > For example, with patch1, we now have rtl insn as follows:
> >
> > (insn 19 18 20 3 (parallel [
> > (set (reg:VNx4BI 93 [ next_mask_18 ])
> > (unspec:VNx4BI [
> > (const_int 0 [0])
> > (reg:DI 95 [ _33 ])
> > ] UNSPEC_WHILE_LO))
> > (set (reg:CC 66 cc)
> > (compare:CC (unspec:SI [
> > (vec_duplicate:VNx4BI (const_int 1 [0x1]))
> > (reg:VNx4BI 93 [ next_mask_18 ])
> > ] UNSPEC_PTEST_PTRUE)
> > (const_int 0 [0])))
> > ]) 4244 {while_ultdivnx4bi}
> >
> > When cse_insn process the first, it records the CSE set in reg 93.  Then 
> > after
> > processing both the instruction in the parallel rtx, we invalidate all
> > expression with reg 93 which means expression in the second instruction is
> > invalidated for CSE. Attached patch relaxes this by invalidating before 
> > processing the
> > second.
>
> As far as patch 1 goes: the traditional reason for using clobbers
> to start with is that:
>
> - setting CC places a requirement on what CC must be after that instruction.
>   We then have to rely on REG_UNUSED notes to tell whether that value
>   actually matters or not.
>
>   This was a bigger deal before df though.  It might not matter as much now.
>
> - many passes find it harder to deal with multiple sets rather than
>   single sets, so it's better to keep a single_set unless we know
>   that both results are needed.
>
> It's currently combine's job to create a multiple set in cases
> where both results are useful.  The pattern for that already exists
> (*while_ult_cc), so if we do go for something
> like patch 1, we should simply expand to that insn rather than adding a
> new one.  Note that:
>
>   (vec_duplicate:PRED_ALL (const_int 1))
>
> shouldn't appear in the insn stream.  It should always be a const_vector
> instead.
>
> From a quick check, I haven't yet found a case where setting CC up-front
> hurts though, so maybe the above is out-of-date best practice and we
> should set the register up-front after all, if only to reduce the number
> of patterns.
>
> However, if possible, I think we should fix the PR in a way that works
> for instructions that only optionally set the flags (which for AArch64
> is the common case).  So it would be good if we could fix the PR without
> needing patch 1.

Do you think that combine should be able to set this. Sorry, I don't
understand how we can let other passes know that this instruction will
set the flags needed.

Thanks,
Kugan

>
> Thanks,
> Richard
>
> >
> > Bootstrap and regression testing for the current version is ongoing.
> >
> > Thanks,
> > Kugan
> >
> > Kugan Vivekanandarajah (2):
> >   [PR88836][aarch64] Set CC_REGNUM instead of clobber
> >   [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one
> >
> >  gcc/config/aarch64/aarch64-sve.md  |  9 +++-
> >  gcc/cse.c  | 67 
> > ++
> >  gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++
> >  3 files changed, 80 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c


Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-19 Thread Hongtao Liu
On Sat, Jun 8, 2019 at 4:12 AM Uros Bizjak  wrote:
>
> On 6/7/19, H.J. Lu  wrote:
>
> >> > > +/* Register pair.  */
> >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 2); /* P2QI */
> >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 4); /* P2HI P4QI */
> >> > >
> >> > > I think
> >> > >
> >> > > INT_MODE (P2QI, 16);
> >> > > INT_MODE (P2HI, 32);
> >> > >
> >> > > with the above subreg approach should work.
> >> > >
> >> >
> >> > I don't think subreg works on pseudo registers with non-zero
> >> > offset.  validate_subreg has
> >> >
> >> >  if (maybe_lt (osize, regsize)
> >> >   && ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P
> >> > (omode
> >> > {
> >> >   /* It is invalid for the target to pick a register size for a
> >> > mode
> >> >  that isn't ordered wrt to the size of that mode.  */
> >> >   poly_uint64 block_size = ordered_min (isize, regsize);
> >> >   unsigned int start_reg;
> >> >   poly_uint64 offset_within_reg;
> >> >   if (!can_div_trunc_p (offset, block_size, &start_reg,
> >> > &offset_within_reg)
> >> >   || (BYTES_BIG_ENDIAN
> >> >   ? maybe_ne (offset_within_reg, block_size - osize)
> >> >   : maybe_ne (offset_within_reg, 0U)))
> >> > return false;
> >>
> >> It works with SImode subregs of DImode values on 32bit targets. Please
> >> look for calls to gen_highpart, one concrete example is in
> >> atomic_compare_and_swap.
> >>
> >
> > It works because of
> >
> > #define REGMODE_NATURAL_SIZE(MODE) UNITS_PER_WORD
> >
> > and only works for the high part of SImode of DImode.
> >
> > P2QI and P2HI are 2 special modes of mask register pair for
> > 2 instructions.   Do we want to make them more generic?
>
> If enhancing the referred define means that we don't need two
> artificial instructions and leave all heavy lifting to the existing
Do you mean that we take P2HI and P2QI as normal vector modes,
and reuse ix86_expand_vector_* things?
But still two artificial instructions can't be avoided.
> generic functionality, then this is the way to go.
>
> Uros.



-- 
BR,
Hongtao


Re: [RFC] ARM -mfpu=auto woes

2019-06-19 Thread Alexandre Oliva
Richard,

Thanks for your feedback.

I conclude from it that it's not worth it to introduce code to support
configuring --with-fpu=auto, so I'm going ahead and installing just the
obvious configury bits I'd posted before.

On Jun 12, 2019, Alexandre Oliva  wrote:

> So, any objections to my installing this then, or even now, if
> --with-fpu=auto shouldn't be accepted?

fix ARM --with-fpu option checking and error message

Fix the test for failure in parsecpu's checking of the --with-fpu
argument, and the error message that gets printed when the check
fails.

for  gcc/ChangeLog

* config.gcc: Fix ARM --with-fpu checking and error message.

---
 gcc/config.gcc |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fda048dc12b1..7119698779fd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4002,12 +4002,13 @@ case "${target}" in
 
# see if --with-fpu matches any of the supported FPUs
if [ x"$with_fpu" != x ] ; then
+ val=$with_fpu
  fpu=`awk -f ${srcdir}/config/arm/parsecpu.awk \
-   -v cmd="chkfpu $with_fpu" \
+   -v cmd="chkfpu $val" \
${srcdir}/config/arm/arm-cpus.in`
- if [ "$fpu" = "error"]
+ if [ "$fpu" = "error" ]
  then
-   echo "Unknown target in --with-$which=$val" 1>&2
+   echo "Unknown target in --with-fpu=$val" 1>&2
exit 1
  fi
fi


-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: PR libstdc++/90945 Patch to have pretty printer for std::vector return bool intead of int for elements

2019-06-19 Thread Stephan Bergmann

On 19/06/2019 21:54, Jonathan Wakely wrote:

On 19/06/19 21:49 +0200, Michael Weghorn wrote:

On 19/06/2019 21.37, Jonathan Wakely wrote:

+  std::vector vb;
+  vb.reserve(100);
+  vb.push_back(true);
+  vb.push_back(true);
+  vb.push_back(false);
+  vb.push_back(false);
+  vb.push_back(true);
+  vb.erase(vb.begin());
+// { dg-final { regexp-test vb {std::(__debug::)?vector of length 4, 
capacity 100 = \\{true, false, false, true\\}} } }

+


This inserts 5 elements, so I'd expect that either "vector of length 5"
and an additional "true" element at the beginning need to be added for
the expected result or one of the two first 'vb.push_back(true)' needs
to be removed.


It inserts five then erases one, the test is right.


Just one thought that occurred to me while idly browsing this thread: 
Wouldn't it be better in general to have non-symmetric content to test 
against, to check that the printer doesn't print it in reverse?


Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-19 Thread Uros Bizjak
On Thu, Jun 20, 2019 at 7:36 AM Hongtao Liu  wrote:
>
> On Sat, Jun 8, 2019 at 4:12 AM Uros Bizjak  wrote:
> >
> > On 6/7/19, H.J. Lu  wrote:
> >
> > >> > > +/* Register pair.  */
> > >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 2); /* P2QI */
> > >> > > +VECTOR_MODES_WITH_PREFIX (P, INT, 4); /* P2HI P4QI */
> > >> > >
> > >> > > I think
> > >> > >
> > >> > > INT_MODE (P2QI, 16);
> > >> > > INT_MODE (P2HI, 32);
> > >> > >
> > >> > > with the above subreg approach should work.
> > >> > >
> > >> >
> > >> > I don't think subreg works on pseudo registers with non-zero
> > >> > offset.  validate_subreg has
> > >> >
> > >> >  if (maybe_lt (osize, regsize)
> > >> >   && ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P
> > >> > (omode
> > >> > {
> > >> >   /* It is invalid for the target to pick a register size for a
> > >> > mode
> > >> >  that isn't ordered wrt to the size of that mode.  */
> > >> >   poly_uint64 block_size = ordered_min (isize, regsize);
> > >> >   unsigned int start_reg;
> > >> >   poly_uint64 offset_within_reg;
> > >> >   if (!can_div_trunc_p (offset, block_size, &start_reg,
> > >> > &offset_within_reg)
> > >> >   || (BYTES_BIG_ENDIAN
> > >> >   ? maybe_ne (offset_within_reg, block_size - osize)
> > >> >   : maybe_ne (offset_within_reg, 0U)))
> > >> > return false;
> > >>
> > >> It works with SImode subregs of DImode values on 32bit targets. Please
> > >> look for calls to gen_highpart, one concrete example is in
> > >> atomic_compare_and_swap.
> > >>
> > >
> > > It works because of
> > >
> > > #define REGMODE_NATURAL_SIZE(MODE) UNITS_PER_WORD
> > >
> > > and only works for the high part of SImode of DImode.
> > >
> > > P2QI and P2HI are 2 special modes of mask register pair for
> > > 2 instructions.   Do we want to make them more generic?
> >
> > If enhancing the referred define means that we don't need two
> > artificial instructions and leave all heavy lifting to the existing
> Do you mean that we take P2HI and P2QI as normal vector modes,
> and reuse ix86_expand_vector_* things?
> But still two artificial instructions can't be avoided.
> > generic functionality, then this is the way to go.

No, declare them as integer modes and use subregs to access high and
low register. This should work in the same way as SImode hard
registers are accessed in DImode pair for 32bit targets.

Uros.


Re: [PATCH] [RFC, PGO+LTO] Missed function specialization + partial devirtualization

2019-06-19 Thread luoxhu

Hi Martin,

On 2019/6/20 09:59, luoxhu wrote:



On 2019/6/19 20:18, Martin Liška wrote:

On 6/19/19 10:56 AM, Martin Liška wrote:
Thank you very much for the numbers. Today, I'm going to prepare the 
generalization of single-value counter to track N values.


Ok, here's a patch candidate that does tracking of most common N values. 
For your test-case I can see:


pr69678.gcda:    01a9:  18:COUNTERS indirect_call 9 counts
pr69678.gcda:   0: 35000 1868707024 17500 
969338501 17500 0 0 0

pr69678.gcda:   8: 0

So for now, you'll need to generalize get_most_common_single_value to return
N most common values.

Eventually we'll need to renamed the counter as it won't be tracking just 
a single value

any longer. I can take care of it.

Can you please verify that the patch candidate works for you?

Thanks, the profile data seems good, I will try it.  I need rebase my patch
to trunk first, as there are many conflicts with your previous patch.


The patch works perfect for me, lots of duplicate code can be removed base
on that.  Hope you can upstream it soon.  :)
BTW, I don't need call the get_most_common_single_value function to access
the histogram values & counters, I will loop access it directly one by one.

Thanks
Xionghu





Thanks,
Martin