Re: [patch, fortran] Make ABI ready for BACK argument of MINLOC and MAXLOC

2018-01-08 Thread Janne Blomqvist
On Mon, Jan 8, 2018 at 1:23 AM, Thomas Koenig  wrote:
> Hello world,
>
> the attached patch is a step towards the implementation of the BACK
> argument for the MINLOC and MAXLOC intrinsics, a part of F2008.
> This readies the ABI for a later date.

Makes sense.

> In order to avoid combinatorial explosion in the library, I have
> opted to always add the BACK argument to the library version.
> The additional overhead should be small, since this is only a scalar
> LOGICAL. We currently have 216 versions of minloc in the library,
> I don't want this to be 432 :-)

Yes, I agree.

> Of course, the actual implementation of BACK is still missing, as
> are the standard-dependent checks. This will be done at a later
> date. In this version, the library functions always get a .false.
> value, which is equivalent to the current behavior.
>
> Regression-tested. OK for trunk?

If I understand it correctly, in the library the back argument is
always a LOGICAL(kind=4). But in the frontend, the back argument is
coerced to gfc_default_logical_kind. So this doesn't work if one uses
the -fdefault-integer-8 option, because then gfc_default_logical_kind
will be 8.
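
To illustrate the mismatch (a hypothetical sketch, not the actual
libgfortran declarations):

/* Library side: BACK is always a 4-byte logical.  */
typedef int GFC_LOGICAL_4;
extern void minloc_with_back (/* ... */ GFC_LOGICAL_4 back);

/* With -fdefault-integer-8 the frontend would pass an 8-byte logical
   for a gfc_default_logical_kind argument, so caller and callee would
   disagree on the size of the trailing argument.  */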

I suggest you create a constant "gfc_logical4_kind" and use that in
the frontend.

-- 
Janne Blomqvist


[PATCH] Fix PR83517

2018-01-08 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-01-08  Richard Biener  

PR middle-end/83517
* match.pd ((t * 2) / 2) -> t): Add missing :c.

* gcc.dg/pr83517.c: New testcase.

Index: gcc/match.pd
===
--- gcc/match.pd	(revision 256275)
+++ gcc/match.pd	(working copy)
@@ -510,7 +510,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Simplify (t * 2) / 2) -> t.  */
 (for div (trunc_div ceil_div floor_div round_div exact_div)
  (simplify
-  (div (mult @0 @1) @1)
+  (div (mult:c @0 @1) @1)
   (if (ANY_INTEGRAL_TYPE_P (type)
&& TYPE_OVERFLOW_UNDEFINED (type))
@0)))
Index: gcc/testsuite/gcc.dg/pr83517.c
===
--- gcc/testsuite/gcc.dg/pr83517.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr83517.c  (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-original" } */
+
+int test(int x)
+{
+  return (x+x)/x;
+}
+
+/* { dg-final { scan-tree-dump "return 2;" "original" } } */
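
For the record (my note, not from the commit): x + x is canonicalized to
x * 2, so the division in the testcase is (x * 2) / x.  The uncommutated
pattern (div (mult @0 @1) @1) requires the divisor to match the second
operand of the multiplication, which fails here; mult:c also tries the
operands swapped, binding @0 = 2 and @1 = x, so the expression folds to
@0, i.e. the "return 2;" the scan expects.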


Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

2018-01-08 Thread Florian Weimer
* H. J. Lu:

> Add -mindirect-branch-loop= option to control loop filler in call and
> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> as loop filler.  The default is 'lfence'.

Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
execution?


[PATCH] Fix PR83580

2018-01-08 Thread Richard Biener

The following fixes PR83580.  split_constant_offset is a somewhat odd
beast, replicating SCEV code and tree-affine a bit.  It also grew
tricks similar to those used for looking through conversions, but
while it is pedantic about overflow there, it simply strips
sign-conversions.  That is of course wrong, so the following patch
removes that.

I do expect eventual fallout (in missed optimizations), so it needs
some baking on trunk before backporting.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-01-08  Richard Biener  

PR middle-end/83580
* tree-data-ref.c (split_constant_offset): Remove STRIP_NOPS.

* gcc.dg/torture/pr83580.c: New testcase.

Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c (revision 256275)
+++ gcc/tree-data-ref.c (working copy)
@@ -723,23 +723,21 @@ split_constant_offset_1 (tree type, tree
 void
 split_constant_offset (tree exp, tree *var, tree *off)
 {
-  tree type = TREE_TYPE (exp), otype, op0, op1, e, o;
+  tree type = TREE_TYPE (exp), op0, op1, e, o;
   enum tree_code code;
 
   *var = exp;
   *off = ssize_int (0);
-  STRIP_NOPS (exp);
 
   if (tree_is_chrec (exp)
   || get_gimple_rhs_class (TREE_CODE (exp)) == GIMPLE_TERNARY_RHS)
 return;
 
-  otype = TREE_TYPE (exp);
   code = TREE_CODE (exp);
   extract_ops_from_tree (exp, &code, &op0, &op1);
-  if (split_constant_offset_1 (otype, op0, code, op1, &e, &o))
+  if (split_constant_offset_1 (type, op0, code, op1, &e, &o))
 {
-  *var = fold_convert (type, e);
+  *var = e;
   *off = o;
 }
 }
Index: gcc/testsuite/gcc.dg/torture/pr83580.c
===
--- gcc/testsuite/gcc.dg/torture/pr83580.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr83580.c  (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+
+int a[2] = { 0, 1 };
+int x = 129;
+
+int
+main ()
+{
+  volatile int v = 0;
+  int t = x, i;
+  for (i = 0; i < 1 + v + v + v + v + v + v + v + v + a[a[0]]; i++)
+t = a[(signed char) (130 - x)];
+  if (t != 1)
+__builtin_abort ();
+  return 0;
+}


Re: [wwwdocs] readings.html - libre.adacore.com is gone

2018-01-08 Thread Pierre-Marie de Rodat

On 01/08/2018 01:39 AM, Gerald Pfeifer wrote:

...so adjust to where it redirects.

Applied.


Ah, good catch. Thank you Gerald!

--
Pierre-Marie de Rodat


Re: [PING 2][PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2018-01-08 Thread Tom de Vries

On 12/17/2017 01:01 AM, Martin Sebor wrote:


* c-c++-common/Wrestrict.c: New test.




   681  /* The following doesn't overlap but it should trigger
   682     -Wstringop-overflow for writing past the end.  */
   683  T ("012", a + sizeof a, a);


For nvptx, the warning actually shows up and is classified as an excess error:
...
gcc/testsuite/c-c++-common/Wrestrict.c:683:3: warning: 
'__builtin_memcpy' writing 4 bytes into a region of size 0 overflows the 
destination [-Wstringop-overflow=]

...


   760  r = SR (DIFF_MAX - 2, DIFF_MAX - 1);
   761  T (8, "012", a + r, a);  /* { dg-warning "accessing 4 bytes at offsets
          \\\[\[0-9\]+, \[0-9\]+] and 0 overlaps" "strcpy" } */
   762


Likewise, the warning triggers here:
...
gcc/testsuite/c-c++-common/Wrestrict.c:761:3: warning: 
'__builtin_memcpy' writing 4 bytes into a region of size 0 overflows the 
destination [-Wstringop-overflow=]

...


>>> * c-c++-common/Warray-bounds-4.c: New test.


66TM ("0123", "",  ma.a5 + i, ma.a5); /* { dg-warning "offset 6 from the object at 
.ma. is out of the bounds of referenced subobject .a5. with type .char\\\[5]. at offset 0" "strcpy" { 
xfail *-*-* } } */
67TM ("", "012345", ma.a7 + i, ma.a7);/* { dg-warning "offset 13 from 
the object at .ma. is out of the bounds of referenced subobject .\(MA::\)?a7. with type .char ?\\\[7]. at 
offset 5" } */


And this warning fails to trigger:
...
FAIL: c-c++-common/Warray-bounds-4.c  -Wc++-compat   (test for warnings, 
line 67)

...

Thanks,
- Tom


Re: [PATCH 3/5] x86: Add -mfunction-return=

2018-01-08 Thread Martin Liška
On 01/07/2018 11:59 PM, H.J. Lu wrote:
> Function return thunk is the same as memory thunk for -mindirect-branch=
> where the return address is at the top of the stack:
> 
> __x86_return_thunk:
>   call L2
> L1:
>   lfence
>   jmp L1
> L2:
>   lea 8(%rsp), %rsp|lea 4(%esp), %esp
>   ret
> 
> and function return becomes
> 
>   jmp __x86_return_thunk

Hello.

Can you please explain the usage of the option a bit more? Is it to prevent
speculative execution of the 'ret' instruction (which is an indirect branch),
as described in [1]? The paper mentions that return stack predictors are
commonly implemented in some form.
It looks like the current version of the Linux patches does not use the option.

Thanks,
Martin

[1] https://support.google.com/faqs/answer/7625886



[testsuite] Require alloca for some test-cases

2018-01-08 Thread Tom de Vries

Hi,

This patch adds an alloca effective-target requirement to some test-cases.

Tested on x86_64 and committed.

Thanks,
- Tom
Require alloca for some test-cases

2018-01-08  Tom de Vries  

	* c-c++-common/builtins.c: Require effective target alloca.
	* gcc.dg/Wrestrict.c: Same.
	* gcc.dg/tree-ssa/loop-interchange-15.c: Same.

---
 gcc/testsuite/c-c++-common/builtins.c   | 3 ++-
 gcc/testsuite/gcc.dg/Wrestrict.c| 3 ++-
 gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/builtins.c b/gcc/testsuite/c-c++-common/builtins.c
index 673fcad..3f1ef11 100644
--- a/gcc/testsuite/c-c++-common/builtins.c
+++ b/gcc/testsuite/c-c++-common/builtins.c
@@ -2,7 +2,8 @@
with no prototype do not cause an ICE.
   { dg-do compile }
   { dg-options "-O2 -Wall -Wextra" }
-  { dg-prune-output "warning" }  */
+  { dg-prune-output "warning" }
+  { dg-require-effective-target alloca }  */
 
 typedef __SIZE_TYPE__ size_t;
 
diff --git a/gcc/testsuite/gcc.dg/Wrestrict.c b/gcc/testsuite/gcc.dg/Wrestrict.c
index 076f878..266443f 100644
--- a/gcc/testsuite/gcc.dg/Wrestrict.c
+++ b/gcc/testsuite/gcc.dg/Wrestrict.c
@@ -1,6 +1,7 @@
 /* Test to verify that VLAs are handled gracefully by -Wrestrict
{ dg-do compile }
-   { dg-options "-O2 -Wrestrict" }  */
+   { dg-options "-O2 -Wrestrict" }
+   { dg-require-effective-target alloca }  */
 
 typedef __SIZE_TYPE__ size_t;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c
index 420880e..63e5bb7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-15.c
@@ -1,6 +1,7 @@
 /* PR tree-optimization/83337 */
 /* { dg-do run { target int32plus } } */
 /* { dg-options "-O2 -floop-interchange" } */
+/* { dg-require-effective-target alloca }  */
 
 /* Copied from graphite/interchange-5.c */
 


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread Martin Liška
On 01/08/2018 01:29 AM, H.J. Lu wrote:
> 1. They need to be backportable to GCC 7/6/5/4.x.

I must admit this is a very important constraint. To be honest, we're planning
to backport the patchset to GCC 4.3.

Martin


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread Martin Liška
On 01/07/2018 11:59 PM, H.J. Lu wrote:
> +static void
> +output_indirect_thunk_function (bool need_bnd_p, int regno)
> +{
> +  char name[32];
> +  tree decl;
> +
> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
> +  indirect_thunk_name (name, regno, need_bnd_p);
> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
> +  get_identifier (name),
> +  build_function_type_list (void_type_node, NULL_TREE));
> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
> +NULL_TREE, void_type_node);
> +  TREE_PUBLIC (decl) = 1;
> +  TREE_STATIC (decl) = 1;
> +  DECL_IGNORED_P (decl) = 1;
> +
> +#if TARGET_MACHO
> +  if (TARGET_MACHO)
> +{
> +  switch_to_section (darwin_sections[picbase_thunk_section]);
> +  fputs ("\t.weak_definition\t", asm_out_file);
> +  assemble_name (asm_out_file, name);
> +  fputs ("\n\t.private_extern\t", asm_out_file);
> +  assemble_name (asm_out_file, name);
> +  putc ('\n', asm_out_file);
> +  ASM_OUTPUT_LABEL (asm_out_file, name);
> +  DECL_WEAK (decl) = 1;
> +}
> +  else
> +#endif
> +if (USE_HIDDEN_LINKONCE)
> +  {
> + cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME 
> (decl));
> +
> + targetm.asm_out.unique_section (decl, 0);
> + switch_to_section (get_named_section (decl, NULL, 0));
> +
> + targetm.asm_out.globalize_label (asm_out_file, name);
> + fputs ("\t.hidden\t", asm_out_file);
> + assemble_name (asm_out_file, name);
> + putc ('\n', asm_out_file);
> + ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
> +  }
> +else
> +  {
> + switch_to_section (text_section);
> + ASM_OUTPUT_LABEL (asm_out_file, name);
> +  }
> +
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  current_function_decl = decl;
> +  allocate_struct_function (decl, false);
> +  init_function_start (decl);
> +  /* We're about to hide the function body from callees of final_* by
> + emitting it directly; tell them we're a thunk, if they care.  */
> +  cfun->is_thunk = true;
> +  first_function_block_is_cold = false;
> +  /* Make sure unwind info is emitted for the thunk if needed.  */
> +  final_start_function (emit_barrier (), asm_out_file, 1);
> +
> +  output_indirect_thunk (need_bnd_p, regno);
> +
> +  final_end_function ();
> +  init_insn_lengths ();
> +  free_after_compilation (cfun);
> +  set_cfun (NULL);
> +  current_function_decl = NULL;
> +}
> +

I'm wondering whether thunk creation could be a good candidate for
target-independent generalization? I guess we could emit the function
declaration without direct writes to asm_out_file? And the emission of the
function body could potentially be a target hook?

What about emitting the body of the function with RTL instructions instead
of direct assembly writes? My knowledge of RTL is quite small, but maybe it
could bring some generalization and reusability for other targets?

Thank you,
Martin


[PATCH] -mjsr option bug fix

2018-01-08 Thread Sebastian Perta
Hi,

The -mjsr option on RX should ensure that the BSR instruction is never
generated; only the JSR instruction should be used.
However, this does not work as expected: BSR still gets generated even
when -mjsr is passed on the command line.
This is reproducible even with test cases from the gcc testsuite, for example:
gcc.c-torture\compile\920625-1.c
gcc.c-torture\compile\20051216-1.c
gcc.dg\torture\builtin-explog-1.c
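
For illustration (my sketch of the intended code generation, not taken
from the patch):

/* With -mjsr, a direct call such as:  */
extern void foo (void);
void bar (void) { foo (); }
/* must be emitted via a register-indirect JSR, e.g.
       mov.l   #_foo, r14
       jsr     r14
   rather than the PC-relative form:
       bsr     _foo  */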

The following patch fixes this issue by adding a new constraint to
call_internal and call_value_internal.
The patch also contains a test case which I created as follows:
1. I copied gcc.c-torture\compile\20051216-1.c  to gcc.target\rx and renamed
to mjsr.c
2. added the following lines to scan the assembly files for BSR. If BSR is
present the test fails.
/* { dg-do compile } */
/* { dg-options "-O2 -mjsr" } */
/* { dg-final { scan-assembler-not "bsr" } } */

Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim

Please let me know if this is OK. Thank you!

Best Regards,
Sebastian

Index: ChangeLog
===
--- ChangeLog   (revision 256278)
+++ ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2018-01-05  Sebastian Perta  
+
+   * config/rx/constraints.md (CALL_OP_SYMBOL_REF): New constraint
+   to allow or block "symbol_ref" depending on the value of TARGET_JSR.
+   * config/rx/rx.md (call_internal, call_value_internal): Use
+   CALL_OP_SYMBOL_REF.
+
 2018-01-05  Richard Sandiford  
 
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
Index: config/rx/constraints.md
===
--- config/rx/constraints.md	(revision 256278)
+++ config/rx/constraints.md	(working copy)
@@ -106,3 +106,9 @@
)
   )
 )
+
+(define_constraint "CALL_OP_SYMBOL_REF"
+"constraint for call instructions using symbol ref"
+(and (match_test "!TARGET_JSR")
+ (match_code "symbol_ref"))
+)
Index: config/rx/rx.md
===
--- config/rx/rx.md (revision 256278)
+++ config/rx/rx.md (working copy)
@@ -438,7 +438,7 @@
 )
 
 (define_insn "call_internal"
-  [(call (mem:QI (match_operand:SI 0 "rx_call_operand" "r,Symbol"))
+  [(call (mem:QI (match_operand:SI 0 "rx_call_operand"
+			       "r,CALL_OP_SYMBOL_REF"))
 (const_int 0))
(clobber (reg:CC CC_REG))]
   ""
@@ -466,7 +466,7 @@
 
 (define_insn "call_value_internal"
   [(set (match_operand  0 "register_operand" "=r,r")
-   (call (mem:QI (match_operand:SI 1 "rx_call_operand"   "r,Symbol"))
+   (call (mem:QI (match_operand:SI 1 "rx_call_operand"
+				"r,CALL_OP_SYMBOL_REF"))
  (const_int 0)))
(clobber (reg:CC CC_REG))]
   ""
Index: testsuite/gcc.target/rx/mjsr.c
===
--- testsuite/gcc.target/rx/mjsr.c  (nonexistent)
+++ testsuite/gcc.target/rx/mjsr.c  (working copy)
@@ -0,0 +1,134 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mjsr" } */
+
+void *malloc (__SIZE_TYPE__);
+void *realloc (void *, __SIZE_TYPE__);
+
+struct A { double x, y; };
+struct B { double x0, y0, x1, y1; };
+struct C { int n_points; int dir; struct B bbox; struct A *points; };
+struct D { int n_segs; struct C segs[1]; };
+
+void foo (int, int, int *, int, int *, struct A **, int *, int *,
+ struct D *, int *, struct D **, int *, int **);
+int baz (struct A, struct A, struct A, struct A);
+
+static void
+bar (struct D *svp, int *n_points_max,
+ struct A p, int *seg_map, int *active_segs, int i)
+{
+  int asi, n_points;
+  struct C *seg;
+
+  asi = seg_map[active_segs[i]];
+  seg = &svp->segs[asi];
+  n_points = seg->n_points;
+  seg->points = ((struct A *)
+   realloc (seg->points, (n_points_max[asi] <<= 1) * sizeof
(struct A)));
+  seg->points[n_points] = p;
+  seg->bbox.y1 = p.y;
+  seg->n_points++;
+}
+
+struct D *
+test (struct D *vp)
+{
+  int *active_segs, n_active_segs, *cursor, seg_idx;
+  double y, share_x;
+  int tmp1, tmp2, asi, i, j, *n_ips, *n_ips_max, n_segs_max;
+  struct A **ips, p_curs, *pts;
+  struct D *new_vp;
+  int *n_points_max, *seg_map, first_share;
+
+  n_segs_max = 16;
+  new_vp = (struct D *) malloc (sizeof (struct D) +
+   (n_segs_max - 1) * sizeof (struct C));
+  new_vp->n_segs = 0;
+
+  if (vp->n_segs == 0)
+return new_vp;
+
+  active_segs = ((int *) malloc ((vp->n_segs) * sizeof (int)));
+  cursor = ((int *) malloc ((vp->n_segs) * sizeof (int)));
+
+  seg_map = ((int *) malloc ((vp->n_segs) * sizeof (int)));
+  n_ips = ((int *) malloc ((vp->n_segs) * sizeof (int)));
+  n_ips_max = ((int *) malloc ((vp->n_segs) * sizeof (int)));
+  ips = ((struct A * *) malloc ((vp->n_segs) * sizeof (struct A *)));
+
+  n_points_max = ((int *) malloc ((n_segs_max) * sizeof (int)));
+
+  n_active_segs = 0;
+  seg_idx = 0;
+  y = vp->segs[0].

Re: [PATCH, libgcc] Fix PowerPC libgcc issues with -mabi=ieeelongdouble

2018-01-08 Thread Segher Boessenkool
On Thu, Dec 14, 2017 at 11:10:13PM -0500, Michael Meissner wrote:
> I am working on some patches to optionally enable multilibs for the PowerPC
> long double support to be switchable between IBM extended double and IEEE
> 128-bit floating point.  While the patches to actually enable the multilibs
> need
> some more tweaking, it did point up an issue in the libgcc _Float128 and IBM
> extended double functions.
> 
> These patches use the correct types for IBM extended double and __float128 if
> the IEEE default is used.  I have built the compiler with bootstrap builds and
> there were no regressions in running the tests on a little endian power8
> system.
> 
> In addition, I had fixed the previous changes to _divkc3.c and _mulkc3.c so
> that these functions now include soft-fp.h and quad-float128.h, which provides
> the appropriate prototypes.
> 
> I have also done a bootstrap build with my preliminary multilib patches, and 
> it
> built fine with both -mabi=ieeelongdouble and -mabi=ibmlongdouble
> configurations.
> 
> Can I apply these patches to libgcc?

As far as I can follow, it's okay for trunk.  Please apply.  Thanks,


Segher


> 2017-12-14  Michael Meissner  
> 
>   * config/rs6000/_divkc3.c: Switch to using soft-fp.h and
>   quad-float128.h include files and use the types that they define,
>   instead of hand-rolling the types.
>   * config/rs6000/_mulkc3.c: Likewise.
>   * config/rs6000/ibm-ldouble.c: If we have __float128/_Float128,
>   use __ieee128 for the IBM extended double type instead of long double.
>   Change all functions.
>   * config/rs6000/quad-float128.h (IBM128_TYPE): Always use
>   __ieee128.
>   (CVT_FLOAT128_TO_IBM128): Use long double instead of __float128 on
>   systems where the default long double is IEEE 128-bit floating
>   point.
>   * config/rs6000/extendkftf2-sw.c (extendkftf2_sw): Likewise.
>   * config/rs6000/trunctfkf2-sw.c (__trunctfkf2_sw): Likewise.


Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer  wrote:
> * H. J. Lu:
>
>> Add -mindirect-branch-loop= option to control loop filler in call and
>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> as loop filler.  The default is 'lfence'.
>
> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
> execution?

My understanding is that a loop works better.

-- 
H.J.


[Patch, fortran] PDT bugs PRs 83611 and 83731

2018-01-08 Thread Paul Richard Thomas
This patch adds:
(i) Default initializers for parameterized arrays;
(ii) Fixes ordinary assignment of PDTs by implementing the same deep
copy mechanism as for derived types with allocatable components; and
(iii) Fixes the len parameter checking, which failed where the dummy
type had an assumed parameter.

I have in fact committed this patch as revision 256335 since it is
safe for anything other than PDTs.

2018-01-08  Paul Thomas  

PR fortran/83611
* decl.c (gfc_get_pdt_instance): If parameterized arrays have
an initializer, convert the kind parameters and add to the
component of the instance.
* trans-array.c (structure_alloc_comps): Add 'is_pdt_type' and
use it with case COPY_ALLOC_COMP. Call 'duplicate_allocatable'
for parameterized arrays. Clean up typos in comments. Convert
parameterized array initializers and copy into the array.
* trans-expr.c (gfc_trans_scalar_assign): Do a deep copy for
parameterized types.
* trans-stmt.c (trans_associate_var): Deallocate associate vars
as necessary, when they are PDT function results for example.

PR fortran/83731
* trans-array.c (structure_alloc_comps): Only compare len parms
when they are declared explicitly.

2018-01-08  Paul Thomas  

PR fortran/83611
* gfortran.dg/pdt_15.f03 : Bump count of 'n.data = 0B' to 8.
* gfortran.dg/pdt_26.f03 : Bump count of '_malloc' to 9.
* gfortran.dg/pdt_27.f03 : New test.

PR fortran/83731
* gfortran.dg/pdt_28.f03 : New test.

Cheers

Paul


Re: [PATCH] improve early strlen range folding (PR 83671)

2018-01-08 Thread Richard Biener
On Sat, Jan 6, 2018 at 11:04 PM, Martin Sebor  wrote:
> Bug 83671 - Fix for false positive reported by -Wstringop-overflow
> does not work at -O1, points out that the string length range
> optimization implemented as a solution for bug 83373 doesn't help
> at -O1.  The root cause is that the fix was added to the strlen
> pass that doesn't run at -O1.
>
> The string length range computation doesn't depend on the strlen
> pass, and so the range can be set earlier, in gimple-fold, and
> its results made available even at -O1.  The attached patch
> changes the gimple_fold_builtin_strlen() function to do that.
>
> While testing the change I came across a number of other simple
> strlen cases that currently aren't handled, some at -O1, others
> at all.  I added code to handle some of the simplest of them
> and opened bugs to remind us/myself to get back to the rest in
> the future (pr83693 and pr83702).  The significant enhancement
> is handling arrays of arrays with non-constant indices and
> pointers to such things, such as in:
>
>   char a[2][7];
>
>   void f (int i)
>   {
> if (strlen (a[i]) > 6)   // eliminated with the patch
>   abort ();
>   }
>
> Attached is a near-minimal patch to handle PR 83671.

Please don't use set_range_info from inside fold_stmt (); this is
IMHO a layering violation.

Why not restrict -Wstringop-overflow to -O2+?

Richard.

> Martin


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread H.J. Lu
On Sun, Jan 7, 2018 at 8:07 PM, Sandra Loosemore
 wrote:
> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>
>> This set of patches for GCC 8 mitigates variant #2 of the speculative
>> execution
>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka
>> Spectre.  They
>> convert indirect branches to call and return thunks to avoid speculative
>> execution
>> via indirect call and jmp.
>
>
> I have a general documentation issue with all the new command-line options
> and attributes added by this patch set:  the documentation is very
> implementor-speaky and doesn't explain what user-level problem they're
> trying to solve.

Do you have any suggestions?

> E.g. to take just one example
>
>> +@item function_return("@var{choice}")
>> +@cindex @code{function_return} function attribute, x86
>> +On x86 targets, the @code{function_return} attribute causes the compiler
>> +to convert function return with @var{choice}.  @samp{keep} keeps function
>> +return unmodified.  @samp{thunk} converts function return to call and
>> +return thunk.  @samp{thunk-inline} converts function return to inlined
>> +call and return thunk.  @samp{thunk-extern} converts function return to
>> +external call and return thunk provided in a separate object file.
>
>
> Why would you want to mess with call and return code generation in this way?
> The documentation doesn't say.
>
> For thunk-extern, is the programmer supposed to provide the thunk?  How
> would you go about implementing the missing bit of code?  What should it do?
> I'm a compiler implementor and I wouldn't even know how to use this feature by
> reading the manual; how would an ordinary application programmer who isn't
> familiar with GCC internals know how to use it?

This option was requested by the Linux kernel people.  The Linux kernel may
choose different thunks at load time.  If a programmer doesn't know
how to write an external thunk, he/she shouldn't use it.
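
For illustration, the intended division of labor is roughly this (my
sketch, not from the patch set):

/* Compiled with -mfunction-return=thunk-extern, every 'ret' in this
   translation unit is emitted as 'jmp __x86_return_thunk', and the
   kernel supplies its own __x86_return_thunk, picking the mitigation
   sequence at load time.  */
void
kernel_func (void)
{
  /* ...  */
}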

> If the goal here is to tell GCC to produce code that is protected against
> the Spectre vulnerability, perhaps simplify this to adding just one option
> that controls all the things you've given separate options and attributes
> for?

-mindirect-branch=thunk does the job.  Other options/choices are for
fine tuning.
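
For reference, my understanding of what -mindirect-branch=thunk does to an
indirect call (illustrative only; the actual register depends on register
allocation):

void (*fptr) (void);

void
call_it (void)
{
  /* Instead of 'call *%rax' this is emitted as
     'call __x86_indirect_thunk_rax', so the speculative target of the
     indirect branch is the thunk's capture loop rather than an
     attacker-trained predictor entry.  */
  fptr ();
}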

Thanks.

-- 
H.J.


Re: [PATCH] fold strlen of constant aggregates (PR 83693)

2018-01-08 Thread Richard Biener
On Mon, Jan 8, 2018 at 3:11 AM, Martin Sebor  wrote:
> GCC is able to fold references to members of global aggregate
> constants in many expressions, but it doesn't know how to do it
> for strlen() arguments.  As a result, strlen calls with
> arguments such as the one below are not optimized:
>
>   const struct { char a[4]; } s = { "123" };
>
>   void f (void)
>   {
> if (s.a[3]) abort ();   // folded/eliminated
>   }
>
>   void g (void)
>   {
> if (strlen (s.a) != 3) abort ();   // not folded
>   }
>
> The attached patch enables gimple_fold_builtin_strlen() to extract
> constant strings from aggregate initializers, analogously to how
> it extracts data of other types.

Hmm.  You do

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index fe3e0b4..f277324 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -3517,8 +3517,14 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
   wide_int minlen;
   wide_int maxlen;

+  /* Try to extract a constant from an object's CONSTRUCTOR first.  */
+  tree arg = gimple_call_arg (stmt, 0);
+  if (TREE_CODE (arg) == ADDR_EXPR)
+if (tree str = fold_const_aggregate_ref (TREE_OPERAND (arg, 0)))
+  arg = str;
+

(patch is not against trunk?) but then fold_const_aggregate_ref of, say, &s.a[2]
will simply return '3', which then yields a bogus result?

So I'm not sure this flies as-is, or at least it needs a comment on how such
simplification will end up _not_ disturbing the following code (maybe by
noticing that '3', aka an INTEGER_CST, isn't a valid string and thus not
folding).
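
To make that concrete (my illustration):

const struct { char a[4]; } s = { "123" };
/* fold_const_aggregate_ref on the reference s.a[2] folds it to the
   element value, the INTEGER_CST '3' (character code 51), not to a
   string object, so the strlen folding would have to reject such
   results instead of treating them as the argument.  */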

Richard.

> Martin


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread H.J. Lu
On Sun, Jan 7, 2018 at 10:55 PM, Markus Trippelsdorf
 wrote:
> On 2018.01.07 at 21:07 -0700, Sandra Loosemore wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>> > This set of patches for GCC 8 mitigates variant #2 of the speculative 
>> > execution
>> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka 
>> > Spectre.  They
>> > convert indirect branches to call and return thunks to avoid speculative 
>> > execution
>> > via indirect call and jmp.
>>
>> I have a general documentation issue with all the new command-line
>> options and attributes added by this patch set:  the documentation is
>> very implementor-speaky and doesn't explain what user-level problem
>> they're trying to solve.
>>
>> E.g. to take just one example
>>
>> > +@item function_return("@var{choice}")
>> > +@cindex @code{function_return} function attribute, x86
>> > +On x86 targets, the @code{function_return} attribute causes the compiler
>> > +to convert function return with @var{choice}.  @samp{keep} keeps function
>> > +return unmodified.  @samp{thunk} converts function return to call and
>> > +return thunk.  @samp{thunk-inline} converts function return to inlined
>> > +call and return thunk.  @samp{thunk-extern} converts function return to
>> > +external call and return thunk provided in a separate object file.
>>
>> Why would you want to mess with call and return code generation in this
>> way?  The documentation doesn't say.
>>
>> For thunk-extern, is the programmer supposed to provide the thunk?  How
>> would you go about implementing the missing bit of code?  What should it
>> do?  I'm a compiler implementor and I wouldn't even know how to use this
>> feature by reading the manual; how would an ordinary application
>> programmer who isn't familiar with GCC internals know how to use it?
>>
>> If the goal here is to tell GCC to produce code that is protected
>> against the Spectre vulnerability, perhaps simplify this to adding just
>> one option that controls all the things you've given separate options
>> and attributes for?
>
> Also it would be good to coordinate with the LLVM guys: They currently
> use -mretpoline and -mretpoline_external_thunk.
> I agree with Sandra that a single master option like -mretpoline would
> be better.

Our current goal is to compile the Linux kernel.  We won't change the
generated code.  We will change the command-line options only if we add a
late generic RTL pass.

-- 
H.J.


Re: [PATCH][PR rtl-optimization/81308] Conditionally cleanup the CFG after insn splitting

2018-01-08 Thread Richard Biener
On Mon, Jan 8, 2018 at 5:22 AM, Jeff Law  wrote:
>
> This patch fixes the original problem reported in 81308.  Namely that
> g++.dg/pr62079.C will trigger a checking failure on 32bit x86.
>
> As Jakub noted in the BZ the problem is we had an insn with an EH region
> note.  That insn gets split and the split insns do not have an EH region
> note (nor do they need one AFAICT).
>
> With the EH region note gone, the actual EH region becomes unreachable
> and we get a verification error in the dominance code.
>
> My solution is relatively simple.  During splitting we track if the
> current insn has an EH region note.  If it does and we end splitting the
> insn or deleting it as a nop-move, then we note that we're going to need
> a cfg cleanup.
>
> After splitting all insns, we conditionally cleanup the CFG.
>
> This should keep the overhead relatively low -- primarily it's the cost
> to look for the EH region note on each insn as I don't expect the cfg
> cleanup is often needed.
>
> If we could prove to ourselves that the situation only occurs with
> -fnon-call-exceptions, then we could further reduce the overhead by
> exploiting that invariant as well.  I haven't really thought too much
> about this.
>
> No new testcase as the existing pr62079.C will trigger on x86.
>
> Bootstrapped and regression tested on x86_64.  Verified pr62079.C now
> passes by hand in 32 bit mode.
>
> OK for the trunk?

Ok.

Richard.

> Jeff
>
> PR rtl-optimization/81308
> * recog.c (split_all_insns): Conditionally cleanup the CFG after
> splitting insns.
>
> diff --git a/gcc/recog.c b/gcc/recog.c
> index d6aa903..cc28b71 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -2931,6 +2931,7 @@ void
>  split_all_insns (void)
>  {
>bool changed;
> +  bool need_cfg_cleanup = false;
>basic_block bb;
>
>auto_sbitmap blocks (last_basic_block_for_fn (cfun));
> @@ -2949,6 +2950,18 @@ split_all_insns (void)
>  CODE_LABELS and short-out basic blocks.  */
>   next = NEXT_INSN (insn);
>   finish = (insn == BB_END (bb));
> +
> + /* If INSN has a REG_EH_REGION note and we split INSN, the
> +resulting split may not have/need REG_EH_REGION notes.
> +
> +If that happens and INSN was the last reference to the
> +given EH region, then the EH region will become unreachable.
> +We can not leave the unreachable blocks in the CFG as that
> +will trigger a checking failure.
> +
> +So track if INSN has a REG_EH_REGION note.  If so and we
> +split INSN, then trigger a CFG cleanup.  */
> + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
>   if (INSN_P (insn))
> {
>   rtx set = single_set (insn);
> @@ -2965,6 +2978,8 @@ split_all_insns (void)
>  nops then anyways.  */
>   if (reload_completed)
>   delete_insn_and_edges (insn);
> + if (note)
> +   need_cfg_cleanup = true;
> }
>   else
> {
> @@ -2972,6 +2987,8 @@ split_all_insns (void)
> {
>   bitmap_set_bit (blocks, bb->index);
>   changed = true;
> + if (note)
> +   need_cfg_cleanup = true;
> }
> }
> }
> @@ -2980,7 +2997,16 @@ split_all_insns (void)
>
>default_rtl_profile ();
>if (changed)
> -find_many_sub_basic_blocks (blocks);
> +{
> +  find_many_sub_basic_blocks (blocks);
> +
> +  /* Splitting could drop an REG_EH_REGION if it potentially
> +trapped in its original form, but does not in its split
> +form.  Consider a FLOAT_TRUNCATE which splits into a memory
> +store/load pair and -fnon-call-exceptions.  */
> +  if (need_cfg_cleanup)
> +   cleanup_cfg (0);
> +}
>
>checking_verify_flow_info ();
>  }
>


Re: [PATCH][PR rtl-optimization/81308] Conditionally cleanup the CFG after switch conversion

2018-01-08 Thread Richard Biener
On Mon, Jan 8, 2018 at 5:45 AM, Jeff Law  wrote:
> This patch fixes the second testcase in 81308 and the duplicate in 83724.
>
> For those cases we have a switch statement where one or more case labels
> are marked as __builtin_unreachable.
>
> Switch conversion calls group_case_labels which can drop the edges from
> the switch to the case labels that are marked as __builtin_unreachable.
>
> That leaves those blocks unreachable and thus we again trigger the
> assertion failure in the dominance code.
>
> My solution here is again to note if a change was made that ought to
> trigger a CFG cleanup (group_case_labels returns a suitable boolean).
> If such a change was made then the pass returns TODO_cleanup_cfg.
>
> Bootstrapped and regression tested on x86_64.  OK for the trunk?

Ok.

RIchard.

> Jeff
>
>
> PR rtl-optimization/81308
> * tree-switch-conversion.c (cfg_altered): New file scoped static.
> (process_switch): If group_case_labels makes a change, then set
> cfg_altered.
> (pass_convert_switch::execute): If a switch is converted, then
> set cfg_altered.  Return TODO_cfg_cleanup if cfg_altered is true.
>
>
> PR rtl-optimization/81308
> * g++.dg/pr81308-1.C: New test.
> * g++.dg/pr81308-2.C: New test.
>
> diff --git a/gcc/testsuite/g++.dg/pr81308-1.C 
> b/gcc/testsuite/g++.dg/pr81308-1.C
> new file mode 100644
> index 000..508372b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr81308-1.C
> @@ -0,0 +1,67 @@
> +/* { dg-do compile } */
> +/* { dg-options "-w -O2 -fno-exceptions -std=c++11 -fpermissive" } */
> +
> +namespace a {
> +template  struct d { static constexpr b e = c; };
> +template  struct f : d {};
> +}
> +typedef long g;
> +template  struct h { static const bool e = a::f::e; };
> +namespace a {
> +template  struct ah;
> +template  class ai;
> +}
> +class i {
> +public:
> +  operator[](long) const {}
> +};
> +template  class am : public i {};
> +class an;
> +class k : public am, h>>::e> {};
> +class l {
> +public:
> +  aq();
> +};
> +class ar extern as;
> +typedef k at;
> +class m {
> +  virtual bool av(int, unsigned &, at &, int &, g &, bool);
> +};
> +class ar {
> +public:
> +  typedef m *aw(const &, int &, const &, const &);
> +};
> +struct ax {
> +  static ay(ar::aw);
> +};
> +template  struct n {
> +  n(ar) { ax::ay(ba); }
> +  static m *ba(const &bb, int &bc, const &bd, const &be) { az(bb, bc, bd, 
> be); }
> +};
> +namespace {
> +class G : m {
> +  unsigned bi(const at &, l &);
> +  bool av(int, unsigned &, at &, int &, g &, bool);
> +
> +public:
> +  G(const, int, const, const) {}
> +};
> +}
> +bool G::av(int, unsigned &, at &bl, int &, g &, bool) {
> +  l bo;
> +  bi(bl, bo);
> +}
> +o() { n bp(as); }
> +namespace {
> +enum { bq, br };
> +}
> +unsigned G::bi(const at &bl, l &bo) {
> +  unsigned bs;
> +  for (char *j;; j += 2)
> +switch (*j) {
> +case bq:
> +  bl[bs];
> +case br:
> +  bo.aq();
> +}
> +}
> diff --git a/gcc/testsuite/g++.dg/pr81308-2.C 
> b/gcc/testsuite/g++.dg/pr81308-2.C
> new file mode 100644
> index 000..97e3409
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr81308-2.C
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-w -O2" } */
> +
> +struct A {
> +  int operator[](int) const {}
> +};
> +struct B {
> +  void m_fn1();
> +};
> +struct C {
> +  virtual bool m_fn2(int, unsigned &, A &, int &, unsigned long &, bool);
> +};
> +template  struct D {
> +  D(int) { MCAsmParserImpl(0, 0, 0, 0); }
> +};
> +int a;
> +namespace {
> +struct F : C {
> +  bool m_fn2(int, unsigned &, A &, int &, unsigned long &, bool);
> +  unsigned m_fn3(const A &, B &);
> +  F(int, int, int, int) {}
> +};
> +}
> +bool F::m_fn2(int, unsigned &, A &p3, int &, unsigned long &, bool) {
> +  B b;
> +  m_fn3(p3, b);
> +}
> +void fn1() { D(0); }
> +unsigned F::m_fn3(const A &p1, B &p2) {
> +  for (int *p;; p++)
> +switch (*p) {
> +case 0:
> +  p1[a];
> +case 1:
> +  p2.m_fn1();
> +}
> +}
> +
> diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
> index fdec59e..b384e4d 100644
> --- a/gcc/tree-switch-conversion.c
> +++ b/gcc/tree-switch-conversion.c
> @@ -60,6 +60,10 @@ Software Foundation, 51 Franklin Street, Fifth Floor, 
> Boston, MA
>   targetm.case_values_threshold(), or be its own param.  */
>  #define MAX_CASE_BIT_TESTS  3
>
> +/* Track whether or not we have altered the CFG and thus may need to
> +   cleanup the CFG when complete.  */
> +bool cfg_altered;
> +
>  /* Split the basic block at the statement pointed to by GSIP, and insert
> a branch to the target basic block of E_TRUE conditional on tree
> expression COND.
> @@ -1492,7 +1496,7 @@ process_switch (gswitch *swtch)
>
>/* Group case labels so that we get the right results from the heuristics
>   that decide on the code generation approach for this switch.  */
> -  group_case_labels_stmt (swtch);
> +  cfg_altered |= group_case_labels_stmt (swtch);
>
>/* If 

Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Richard Biener
On Mon, Jan 8, 2018 at 5:47 AM, Jeff Law  wrote:
> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>> Hi Richard,
>>
>> Unfortunately, I don't see any way that this will be useful for the ppc 
>> targets.  We don't
>> have a way to force resolution of a condition prior to continuing 
>> speculation, so this
>> will just introduce another comparison that we would speculate past.  For 
>> our mitigation
>> we will have to introduce an instruction that halts all speculation at that 
>> point, and place
>> it in front of all dangerous loads.  I wish it were otherwise.
> So could you have an expander for __builtin_load_no_speculate that just
> emits the magic insn that halts all speculation and essentially ignores
> the additional stuff that __builtin_load_no_speculate might be able to
> do on other platforms?

I think you at least need to expand the builtin semantically, given that as
designed it might consume the condition entirely in the source code.

I also think the user documentation in extend.texi should contain examples on
how to actually use the builtin to mitigate the Spectre attack, that
is, code before
and after using it.
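
Something like the following, perhaps (my sketch, assuming the
five-argument form proposed in this series -- pointer, lower bound, upper
bound, failure value, comparison pointer -- the exact signature was still
under discussion):

int array[64];

int
f (unsigned idx)
{
  if (idx < 64)
    /* Before the change this would be 'return array[idx];', which can
       be executed speculatively even when the bounds check mispredicts.
       The builtin guarantees the loaded value is not forwarded to
       dependent instructions unless &array[idx] lies in
       [array, array + 64), and yields the failure value 0 otherwise.  */
    return __builtin_load_no_speculate (&array[idx], array, array + 64,
					0, &array[idx]);
  return 0;
}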

And somebody might want to set up a spectre.html page and some NEWS item
at some point.

Richard.

>
> jeff


[testsuite] Require stack size for some test-cases

2018-01-08 Thread Tom de Vries

Hi,

this patch adds a stack-size requirement to some test-cases that are
currently failing for nvptx with the error message:

...
nvptx-run: error launching kernel: invalid argument 
(CUDA_ERROR_INVALID_VALUE, 1)

...

Tested on nvptx.

Committed.

Thanks,
- Tom
Require stack size for some test-cases

2018-01-08  Tom de Vries  

	* gcc.dg/graphite/interchange-7.c: Add dg-require-stack-size.
	* gcc.dg/graphite/run-id-1.c: Same.
	* gcc.dg/tree-ssa/loop-interchange-4.c: Same.

---
 gcc/testsuite/gcc.dg/graphite/interchange-7.c  | 1 +
 gcc/testsuite/gcc.dg/graphite/run-id-1.c   | 1 +
 gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-7.c b/gcc/testsuite/gcc.dg/graphite/interchange-7.c
index 81a6d83..50f7dd7 100644
--- a/gcc/testsuite/gcc.dg/graphite/interchange-7.c
+++ b/gcc/testsuite/gcc.dg/graphite/interchange-7.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target size32plus } */
+/* { dg-require-stack-size "8*111*" } */
 
 /* Formerly known as ltrans-8.c */
 
diff --git a/gcc/testsuite/gcc.dg/graphite/run-id-1.c b/gcc/testsuite/gcc.dg/graphite/run-id-1.c
index a58c090..d2fc3c5 100644
--- a/gcc/testsuite/gcc.dg/graphite/run-id-1.c
+++ b/gcc/testsuite/gcc.dg/graphite/run-id-1.c
@@ -1,5 +1,6 @@
 /* { dg-options "-Wl,--stack,12582912" { target *-*-mingw* *-*-cygwin* } } */
 /* { dg-require-effective-target size32plus } */
+/* { dg-require-stack-size "4*1000*1000" } */
 
 void abort (void);
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c
index a919a6c..4e64275 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */
+/* { dg-require-stack-size "8*111*" } */
 
 /* Copied from graphite/interchange-7.c */
 


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 2:10 AM, Martin Liška  wrote:
> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>> +static void
>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>> +{
>> +  char name[32];
>> +  tree decl;
>> +
>> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
>> +  indirect_thunk_name (name, regno, need_bnd_p);
>> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> +  get_identifier (name),
>> +  build_function_type_list (void_type_node, NULL_TREE));
>> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>> +NULL_TREE, void_type_node);
>> +  TREE_PUBLIC (decl) = 1;
>> +  TREE_STATIC (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 1;
>> +
>> +#if TARGET_MACHO
>> +  if (TARGET_MACHO)
>> +{
>> +  switch_to_section (darwin_sections[picbase_thunk_section]);
>> +  fputs ("\t.weak_definition\t", asm_out_file);
>> +  assemble_name (asm_out_file, name);
>> +  fputs ("\n\t.private_extern\t", asm_out_file);
>> +  assemble_name (asm_out_file, name);
>> +  putc ('\n', asm_out_file);
>> +  ASM_OUTPUT_LABEL (asm_out_file, name);
>> +  DECL_WEAK (decl) = 1;
>> +}
>> +  else
>> +#endif
>> +if (USE_HIDDEN_LINKONCE)
>> +  {
>> + cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME 
>> (decl));
>> +
>> + targetm.asm_out.unique_section (decl, 0);
>> + switch_to_section (get_named_section (decl, NULL, 0));
>> +
>> + targetm.asm_out.globalize_label (asm_out_file, name);
>> + fputs ("\t.hidden\t", asm_out_file);
>> + assemble_name (asm_out_file, name);
>> + putc ('\n', asm_out_file);
>> + ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>> +  }
>> +else
>> +  {
>> + switch_to_section (text_section);
>> + ASM_OUTPUT_LABEL (asm_out_file, name);
>> +  }
>> +
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  current_function_decl = decl;
>> +  allocate_struct_function (decl, false);
>> +  init_function_start (decl);
>> +  /* We're about to hide the function body from callees of final_* by
>> + emitting it directly; tell them we're a thunk, if they care.  */
>> +  cfun->is_thunk = true;
>> +  first_function_block_is_cold = false;
>> +  /* Make sure unwind info is emitted for the thunk if needed.  */
>> +  final_start_function (emit_barrier (), asm_out_file, 1);
>> +
>> +  output_indirect_thunk (need_bnd_p, regno);
>> +
>> +  final_end_function ();
>> +  init_insn_lengths ();
>> +  free_after_compilation (cfun);
>> +  set_cfun (NULL);
>> +  current_function_decl = NULL;
>> +}
>> +
>
> I'm wondering whether thunk creation could be a good candidate for
> target-independent generalization? I guess we could emit the function
> declaration without direct writes to asm_out_file? And the emission of the
> function body could potentially be a target hook?
>
> What about emitting the body of the function with RTL instructions instead
> of direct assembly writes? My knowledge of RTL is quite small, but maybe it
> could bring some generalization and reusability for other targets?

Thunks are x86-specific and are created the same way as 32-bit PIC thunks.
I don't see how a target hook would be used.

-- 
H.J.


Re: [PATCH 3/5] x86: Add -mfunction-return=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 1:56 AM, Martin Liška  wrote:
> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>> Function return thunk is the same as memory thunk for -mindirect-branch=
>> where the return address is at the top of the stack:
>>
>> __x86_return_thunk:
>>   call L2
>> L1:
>>   lfence
>>   jmp L1
>> L2:
>>   lea 8(%rsp), %rsp|lea 4(%esp), %esp
>>   ret
>>
>> and function return becomes
>>
>>   jmp __x86_return_thunk
>
> Hello.
>
> Can you please explain the usage of the option a bit more? Is it to prevent
> speculative execution of the 'ret' instruction (which is an indirect
> branch), as described in [1]? The paper mentions that return stack
> predictors are commonly implemented in some form.
> It looks like the current version of the Linux patches does not use the
> option.
>

This option is requested by Linux kernel people.  It may be used in
the future.

-- 
H.J.


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread Jakub Jelinek
On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
> > I'm wondering whether thunk creation could be a good candidate for
> > target-independent generalization? I guess we could emit the function
> > declaration without direct writes to asm_out_file? And the emission of the
> > function body could potentially be a target hook?
> >
> > What about emitting the body of the function with RTL instructions instead
> > of direct assembly writes? My knowledge of RTL is quite small, but maybe it
> > could bring some generalization and reusability for other targets?
> 
> Thunks are x86-specific and are created the same way as 32-bit PIC
> thunks.
> I don't see how a target hook would be used.

Talking about PIC thunks, those have, I believe, a '.' character in their
symbols, so that they can't be confused with user functions.  Any reason
these retpoline thunks don't?

Jakub


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek  wrote:
> On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
>> > I'm wondering whether thunk creation could be a good candidate for
>> > target-independent generalization? I guess we could emit the function
>> > declaration without direct writes to asm_out_file? And the emission of
>> > the function body could potentially be a target hook?
>> >
>> > What about emitting the body of the function with RTL instructions
>> > instead of direct assembly writes? My knowledge of RTL is quite small,
>> > but maybe it could bring some generalization and reusability for other
>> > targets?
>>
>> Thunks are x86-specific and are created the same way as 32-bit PIC
>> thunks.
>> I don't see how a target hook would be used.
>
> Talking about PIC thunks, those have, I believe, a '.' character in their
> symbols, so that they can't be confused with user functions.  Any reason
> these retpoline thunks don't?
>

They used to have '.'.  It was changed at the last minute since the kernel
needs to export them as regular symbols.

-- 
H.J.


[PATCH] PR 78534 Regression on 32-bit targets

2018-01-08 Thread Janne Blomqvist
By switching from int to size_t in order to handle larger values,
r256322 introduced a bug that manifested itself on 32-bit
targets. Fixed by using the correct type to store the result of a
next_array_record call.
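
A minimal sketch of the pitfall (my illustration):

/* next_array_record can return a negative value.  On a 32-bit target,
   storing that into an unsigned size_t wraps it around:  */
size_t bad = (size_t) -1;      /* 0xffffffff: a huge bogus seek offset */
/* gfc_offset is signed, so the fix below keeps the sign intact.  */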

Regtested on x86_64-pc-linux-gnu and i686-pc-linux-gnu, committed to
trunk as obvious.

libgfortran/ChangeLog:

2018-01-08  Janne Blomqvist  

PR 78534, bugfix for r256322
* io/transfer.c (next_record_w): Use correct type for return value
of next_array_record.
---
 libgfortran/io/transfer.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index f9c8696..7e076de 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -3691,7 +3691,7 @@ next_record_w (st_parameter_dt *dtp, int done)
{
  char *p;
  /* Internal unit, so must fit in memory.  */
- size_t length, m, record;
+ size_t length, m;
  size_t max_pos = max_pos_off;
  if (is_array_io (dtp))
{
@@ -3730,14 +3730,16 @@ next_record_w (st_parameter_dt *dtp, int done)
memset (p, ' ', length);
 
  /* Now that the current record has been padded out,
-determine where the next record in the array is. */
- record = next_array_record (dtp, dtp->u.p.current_unit->ls,
- &finished);
+determine where the next record in the array is.
+Note that this can return a negative value, so it
+needs to be assigned to a signed value.  */
+ gfc_offset record = next_array_record
+   (dtp, dtp->u.p.current_unit->ls, &finished);
  if (finished)
dtp->u.p.current_unit->endfile = AT_ENDFILE;
 
  /* Now seek to this record */
- record = record * ((size_t) dtp->u.p.current_unit->recl);
+ record = record * dtp->u.p.current_unit->recl;
 
  if (sseek (dtp->u.p.current_unit->s, record, SEEK_SET) < 0)
{
-- 
2.7.4



Re: [PATCH, rs6000] Add vec_mergee, vec_mergeo, vec_float2 builtin support

2018-01-08 Thread Segher Boessenkool
Hi Carl,

On Mon, Dec 18, 2017 at 04:10:06PM -0800, Carl Love wrote:
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1,3 +1,4 @@
> +
>  ;; AltiVec patterns.
>  ;; Copyright (C) 2002-2017 Free Software Foundation, Inc.
>  ;; Contributed by Aldy Hernandez (a...@quesejoda.com)

Please lose this change.

> +;; Power8 vector merge two V2DF/V2DI even words to V2DF
> +(define_expand "p8_vmrgew_"
> +  [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
> +   (use (match_operand:VSX_D 1 "vsx_register_operand" ""))
> +   (use (match_operand:VSX_D 2 "vsx_register_operand" ""))]

Drop the empty default field please (here and elsewhere, certainly in all
expanders).

It looks good to me otherwise.  Okay for trunk.  Thanks!


Segher


[PATCH] Fix PR83719

2018-01-08 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-01-08  Richard Biener  

PR lto/83719
* dwarf2out.c (output_indirect_strings): Handle empty
skeleton_debug_str_hash.
(dwarf2out_early_finish): Index strings for -gsplit-dwarf.

* gcc.dg/lto/pr83719_0.c: New testcase.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 256329)
+++ gcc/dwarf2out.c (working copy)
@@ -27795,8 +27795,9 @@ output_indirect_strings (void)
   unsigned int offset = 0;
   unsigned int cur_idx = 0;
 
-  skeleton_debug_str_hash->traverse<enum dwarf_form,
-				    output_indirect_string> (DW_FORM_strp);
+  if (skeleton_debug_str_hash)
+    skeleton_debug_str_hash->traverse<enum dwarf_form,
+				      output_indirect_string> (DW_FORM_strp);
 
   switch_to_section (debug_str_offsets_section);
   debug_str_hash->traverse_noresize
     <unsigned int *, output_index_string_offset> (&offset);
@@ -30819,6 +30820,12 @@ dwarf2out_early_finish (const char *file
 
   save_macinfo_strings ();
 
+  if (dwarf_split_debug_info)
+{
+  unsigned int index = 0;
+      debug_str_hash->traverse_noresize<unsigned int *, index_string> (&index);
+}
+
   /* Output all of the compilation units.  We put the main one last so that
  the offsets are available to output_pubnames.  */
   for (limbo_die_node *node = limbo_die_list; node; node = node->next)
Index: gcc/testsuite/gcc.dg/lto/pr83719_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr83719_0.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/lto/pr83719_0.c	(working copy)
@@ -0,0 +1,4 @@
+/* { dg-lto-do assemble } */
+/* { dg-lto-options { { -flto -g -gsplit-dwarf } } } */
+
+/* Empty.  */


[PATCH] Fix PR83685

2018-01-08 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-01-08  Richard Biener  

PR tree-optimization/83685
* tree-ssa-pre.c (create_expression_by_pieces): Do not insert
references to abnormals.

* gcc.dg/torture/pr83685.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 256329)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -2697,6 +2697,8 @@ create_expression_by_pieces (basic_block
that value numbering saw through.  */
 case NAME:
   folded = PRE_EXPR_NAME (expr);
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (folded))
+   return NULL_TREE;
   if (useless_type_conversion_p (exprtype, TREE_TYPE (folded)))
return folded;
   break;
Index: gcc/testsuite/gcc.dg/torture/pr83685.c
===
--- gcc/testsuite/gcc.dg/torture/pr83685.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr83685.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile }  */
+
+int _setjmp (void *);
+void foo (int);
+
+void
+bar (int e, int b, char c, void *d)
+{
+  while (b)
+{
+  if (_setjmp (d))
+   foo (e);
+  if (c)
+   {
+ e--;
+ foo (0);
+   }
+  e++;
+}
+}


[PATCH] Fix PR83713

2018-01-08 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk 
sofar.

Richard.

2018-01-08  Richard Biener  

PR middle-end/83713
* convert.c (do_narrow): Properly guard TYPE_OVERFLOW_WRAPS checks.

* g++.dg/torture/pr83713.C: New testcase.

Index: gcc/convert.c
===
--- gcc/convert.c   (revision 256329)
+++ gcc/convert.c   (working copy)
@@ -471,8 +471,10 @@ do_narrow (location_t loc,
 type in case the operation in outprec precision
 could overflow.  Otherwise, we would introduce
 signed-overflow undefinedness.  */
- || ((!TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0))
-  || !TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1)))
+ || ((!(INTEGRAL_TYPE_P (TREE_TYPE (arg0))
+&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0)))
+  || !(INTEGRAL_TYPE_P (TREE_TYPE (arg1))
+			&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1))))
  && ((TYPE_PRECISION (TREE_TYPE (arg0)) * 2u
   > outprec)
  || (TYPE_PRECISION (TREE_TYPE (arg1)) * 2u
Index: gcc/testsuite/g++.dg/torture/pr83713.C
===
--- gcc/testsuite/g++.dg/torture/pr83713.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr83713.C  (working copy)
@@ -0,0 +1,12 @@
+// { dg-do compile }
+
+class a
+{
+  char b;
+  void c ();
+};
+void
+a::c ()
+{
+  &b + ((long long) &b & 0);
+}


Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-08 Thread Wilco Dijkstra
Segher Boessenkool wrote:
> On Fri, Jan 05, 2018 at 12:22:44PM +, Wilco Dijkstra wrote:
>> An example epilog in a shrinkwrapped function before:
>> 
>> ldp    x21, x22, [sp,#16]
>> ldr    x23, [sp,#32]
>> ldr    x24, [sp,#40]
>> ldp    x25, x26, [sp,#48]
>> ldr    x27, [sp,#64]
>> ldr    x28, [sp,#72]
>> ldr    x30, [sp,#80]
>> ldr    d8, [sp,#88]
>> ldp    x19, x20, [sp],#96
>> ret
>
> In this example, the compiler already can make a ldp for both x23/x24 and
> x27/x28 just fine (if not in emit_epilogue_components, then simply in a
> peephole); why did that not work?  Or is this not the actual generated
> machine code (and there are labels between the insns, for example)?

This block originally had a label in it: 2 blocks emitted identical restores
and then branched to the final epilogue. The final epilogue was then
duplicated, so we ended up with 2 almost identical epilogues of 10
instructions ("almost" since there were 1-2 unrelated instructions in both
blocks).

Peepholing is very conservative about instructions using SP and won't touch
anything frame-related. If it worked better, the backend could just
emit single loads/stores and let peepholing generate LDP/STP.
However this is not the real issue. In the worst case the current code may
only emit LDR and STR. If there are multiple callee-saves in a block, we
want to use LDP/STP, and if there is an odd number of registers, we want
to add a callee-save from an inner block.

Another issue is that after pro_and_epilogue pass I see multiple restores
of the same registers and then a branch to the same block. We should try
to avoid the unnecessary duplication.

Wilco

[testsuite] Xfail ssa-dom-cse-2.c for nvptx

2018-01-08 Thread Tom de Vries

Hi,

For nvptx we have:
...
FAIL: gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;"
...

The test-case is compiled with -O3, which implies -ftree-loop-vectorize 
and -ftree-slp-vectorize.


I've investigated the test-case on x86_64, and there the test-case fails 
when specifying -fno-tree-loop-vectorize, but passes again when adding 
-fno-tree-slp-vectorize.


For nvptx, loop vectorization does nothing, but slp vectorization manages
to do a transformation, which matches the failing scenario on x86_64,
with similar gimple code.


So, I think we expect this scan test to fail for nvptx.

Tested on nvptx and committed.

Thanks,
- Tom
Xfail ssa-dom-cse-2.c for nvptx

2018-01-08  Tom de Vries  

	* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Xfail scan for nvptx.

---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index a660e82..7e88516 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -25,4 +25,4 @@ foo ()
but the loop reads only one element at a time, and DOM cannot resolve these.
The same happens on powerpc depending on the SIMD support available.  */
 
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* } || { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } } } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* } || { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } } } } } */


Re: [v3 PATCH] Make optional conditionally trivially_{copy,move}_{constructible,assignable}

2018-01-08 Thread Jonathan Wakely

On 25/12/17 23:59 +0200, Ville Voutilainen wrote:

In the midst of the holiday season, the king and ruler of all elves, otherwise
known as The Elf, was told by little elves that users are complaining how
stlstl and libc++ make optional's copy and move operations conditionally
trivial, but libstdc++ doesn't. This made The Elf fairly angry, and he spoke
"this will not stand".

Tested on Linux-PPC64. The change is an ABI break due to changing
optional to a trivially copyable type. It's perhaps
better to get that ABI break in now rather than later.


Agreed, but a few comments and questions below.



@@ -203,6 +200,39 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  this->_M_construct(std::move(__other._M_payload));
  }

+  _Optional_payload&
+  operator=(const _Optional_payload& __other)
+  {
+if (this->_M_engaged && __other._M_engaged)
+  this->_M_get() = __other._M_get();
+else
+ {
+   if (__other._M_engaged)
+ this->_M_construct(__other._M_get());
+   else
+ this->_M_reset();
+ }
+
+return *this;
+  }
+
+  _Optional_payload&
+  operator=(_Optional_payload&& __other)
+  noexcept(__and_<is_nothrow_move_constructible<_Tp>,
+		   is_nothrow_move_assignable<_Tp>>())
+  {
+   if (this->_M_engaged && __other._M_engaged)
+ this->_M_get() = std::move(__other._M_get());
+   else
+ {
+   if (__other._M_engaged)
+ this->_M_construct(std::move(__other._M_get()));
+   else
+ this->_M_reset();
+ }
+   return *this;
+  }


Please make the whitespace before the return statement consistent in
these two assignment operators (one has a blank line and uses spaces,
one uses a tab).


@@ -226,95 +256,86 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_Stored_type(std::forward<_Args>(__args)...);
  this->_M_engaged = true;
}
-};
-
-  // Payload for non-constexpr optionals with trivial destructor.
-  template <typename _Tp>
-struct _Optional_payload<_Tp, false, true>
-{
-  constexpr _Optional_payload()
-   : _M_empty() {}
-
-  template <typename... _Args>
-  constexpr _Optional_payload(in_place_t, _Args&&... __args)
-   : _M_payload(std::forward<_Args>(__args)...),
- _M_engaged(true) {}
-
-  template<typename _Up, typename... _Args>
-  constexpr _Optional_payload(std::initializer_list<_Up> __il,
- _Args&&... __args)
-   : _M_payload(__il, std::forward<_Args>(__args)...),
- _M_engaged(true) {}
-  constexpr
-  _Optional_payload(bool __engaged, const _Optional_payload& __other)
-   : _Optional_payload(__other)
-  {}

-  constexpr
-  _Optional_payload(bool __engaged, _Optional_payload&& __other)
-   : _Optional_payload(std::move(__other))
-  {}
+  // The _M_get operations have _M_engaged as a precondition.
+  constexpr _Tp&
+   _M_get() noexcept
+  {
+   return this->_M_payload;
+  }

-  constexpr _Optional_payload(const _Optional_payload& __other)
+  constexpr const _Tp&
+   _M_get() const noexcept
  {
-   if (__other._M_engaged)
- this->_M_construct(__other._M_payload);
+   return this->_M_payload;
  }

-  constexpr _Optional_payload(_Optional_payload&& __other)
+  // _M_reset is a 'safe' operation with no precondition.
+  void
+  _M_reset()


Should this be noexcept?


  {
-   if (__other._M_engaged)
- this->_M_construct(std::move(__other._M_payload));
+   if (this->_M_engaged)
+ {
+   this->_M_engaged = false;
+   this->_M_payload.~_Stored_type();
+ }
  }
+  };


This closing brace seems to be indented incorrectly.


-  using _Stored_type = remove_const_t<_Tp>;
-  struct _Empty_byte { };
-  union {
-  _Empty_byte _M_empty;
-  _Stored_type _M_payload;
-  };
-  bool _M_engaged = false;
+  template<typename _Tp, typename _Dp>
+class _Optional_base_impl
+  {
+  protected:


And this whole class body should be indented to line up with the
"class" keyword.


+using _Stored_type = remove_const_t<_Tp>;
+
+// The _M_construct operation has !_M_engaged as a precondition
+// while _M_destruct has _M_engaged as a precondition.
+template<typename... _Args>
+void
+_M_construct(_Args&&... __args)
+  noexcept(is_nothrow_constructible<_Stored_type, _Args...>())
+{
+  ::new
+   (std::__addressof(static_cast<_Dp*>(this)->_M_payload._M_payload))
+   _Stored_type(std::forward<_Args>(__args)...);
+  static_cast<_Dp*>(this)->_M_payload._M_engaged = true;
+}

-  template<typename... _Args>
-void
-_M_construct(_Args&&... __args)
-noexcept(is_nothrow_constructible<_Stored_type, _Args...>())
-{
-  ::new ((void *) std::__addressof(this->_M_payload))
-_Stored_type(std::forward<_Args>(__args)...);
-  this->_M_engaged = true;
-}
-};
+void
+_M_destruct()


noexcept?


+   

[wwwdocs] Add GCC 7.3 section

2018-01-08 Thread Sebastian Huber

Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.96
diff -u -r1.96 changes.html
--- htdocs/gcc-7/changes.html   4 Aug 2017 12:44:54 - 1.96
+++ htdocs/gcc-7/changes.html   8 Jan 2018 13:42:20 -
@@ -1283,5 +1283,22 @@
  double has been added.
    

+<h2>GCC 7.3</h2>
+
+<p>This is the <a href="https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&amp;resolution=FIXED&amp;target_milestone=7.3">list
+of problem reports (PRs)</a> from GCC's bug tracking system that are
+known to be fixed in the 7.3 release. This list might not be
+complete (that is, it is possible that some PRs that have been fixed
+are not listed here).</p>
+
+<h3>Operating Systems</h3>
+
+<h4>RTEMS</h4>
+   <ul>
+     <li>Support for EPIPHANY has been added.</li>
+   </ul>
+
 
 

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: std::forward_list optim for always equal allocator

2018-01-08 Thread Jonathan Wakely

On 23/11/17 22:22 +0100, François Dumont wrote:

Gentle reminder for this patch.

I looked at when the constructor became unused and I think it was back in 
June 2015, in git commit:


commit debb6aabb771ed02cb7256a7719555e5fbd7d3f7
Author: redi 
Date:   Wed Jun 17 17:45:45 2015 +

    * include/bits/forward_list.h
    (_Fwd_list_base(const _Node_alloc_type&)): Change parameter to
    rvalue-reference.


Hmm, I should have put that same change on the gcc-5-branch too.

If you fear an ABI-breaking change, I can restore it in a 
!_GLIBCXX_INLINE_VERSION section.


I think if there was a problem here my June 2015 change would already
have caused it (when I changed the _Fwd_list_base constructor
signatures).

So let's assume it's OK to remove the constructor.



@@ -533,15 +560,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER

  /**
   *  @brief  The %forward_list move constructor.
-   *  @param  __list  A %forward_list of identical element and allocator
-   *  types.
+   *  @param  A %forward_list of identical element and allocator types.


This change is wrong, you can't just remove the parameter name,
because now Doxygen will document a parameter called "A" (and complain
that there is no such parameter).

It would be better to leave the name __list there and just get the
warning.

Otherwise the patch is OK for trunk (please ensure to update the
Copyright dates in the test files to 2018).

Thanks.




Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Richard Earnshaw (lists)
On 08/01/18 02:20, Bill Schmidt wrote:
> Hi Richard,
> 
> Unfortunately, I don't see any way that this will be useful for the ppc 
> targets.  We don't
> have a way to force resolution of a condition prior to continuing 
> speculation, so this
> will just introduce another comparison that we would speculate past.  For our 
> mitigation
> we will have to introduce an instruction that halts all speculation at that 
> point, and place
> it in front of all dangerous loads.  I wish it were otherwise.

So can't you make the builtin expand to (in pseudo code):

if (bounds_check)
  {
__asm ("barrier");
result = *ptr;
  }
else
  result = failval;
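
In plain C that expansion might look like this (an illustrative sketch
only, not the actual expander; the empty asm stands in for whatever
speculation-inhibiting instruction the target provides):

static inline int
load_no_speculate_i4 (const int *ptr, int in_range, int failval)
{
  int result = failval;
  if (in_range)
    {
      __asm__ __volatile__ ("" ::: "memory");  /* target barrier here */
      result = *ptr;
    }
  return result;
}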

R.

> 
> Thanks,
> Bill
> 
>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw  
>> wrote:
>>
>>
>> This patch adds generic support for the new builtin
>> __builtin_load_no_speculate.  It provides the overloading of the
>> different access sizes and a default fall-back expansion for targets
>> that do not support a mechanism for inhibiting speculation.
>>
>>  * builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>  New builtin type signature.
>>  (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>  (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>  (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>  (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>  * builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
>>  (BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
>>  (BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
>>  (BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
>>  (BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
>>  (BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
>>  * target.def (inhibit_load_speculation): New hook.
>>  * doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
>>  documentation.
>>  * doc/tm.texi: Regenerated.
>>  * doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
>>  * doc/extend.texi: Document __builtin_load_no_speculate.
>>  * c-family/c-common.c (load_no_speculate_resolve_size): New function.
>>  (load_no_speculate_resolve_params): New function.
>>  (load_no_speculate_resolve_return): New function.
>>  (resolve_overloaded_builtin): Handle overloading
>>  __builtin_load_no_speculate.
>>  * builtins.c (expand_load_no_speculate): New function.
>>  (expand_builtin): Handle new no-speculation builtins.
>>  * targhooks.h (default_inhibit_load_speculation): Declare.
>>  * targhooks.c (default_inhibit_load_speculation): New function.
>> ---
>> gcc/builtin-types.def   |  16 +
>> gcc/builtins.c  |  99 ++
>> gcc/builtins.def|  22 ++
>> gcc/c-family/c-common.c | 164 
>> 
>> gcc/c-family/c-cppbuiltin.c |   5 +-
>> gcc/doc/cpp.texi|   4 ++
>> gcc/doc/extend.texi |  53 ++
>> gcc/doc/tm.texi |   6 ++
>> gcc/doc/tm.texi.in  |   2 +
>> gcc/target.def  |  20 ++
>> gcc/targhooks.c |  69 +++
>> gcc/targhooks.h |   3 +
>> 12 files changed, 462 insertions(+), 1 deletion(-)
>>
>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
> 



Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Bill Schmidt

> On Jan 7, 2018, at 10:47 PM, Jeff Law  wrote:
> 
> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>> Hi Richard,
>> 
>> Unfortunately, I don't see any way that this will be useful for the ppc 
>> targets.  We don't
>> have a way to force resolution of a condition prior to continuing 
>> speculation, so this
>> will just introduce another comparison that we would speculate past.  For 
>> our mitigation
>> we will have to introduce an instruction that halts all speculation at that 
>> point, and place
>> it in front of all dangerous loads.  I wish it were otherwise.
> So could you have an expander for __builtin_load_no_speculate that just
> emits the magic insn that halts all speculation and essentially ignores
> the additional stuff that __builtin_load_no_speculate might be able to
> do on other platforms?

This is possible, but the builtin documentation is completely misleading for
powerpc.  We will not provide the semantics that this builtin claims to provide.
So at a minimum we would need the documentation to indicate that the additional
range-checking is target-specific behavior as well, not just the speculation 
code.
At that point it isn't really a very target-neutral solution.

What about other targets?  This builtin seems predicated on specific behavior
of ARM architecture; I don't know whether other targets have a guaranteed
speculation-rectifying conditional test.

For POWER, all we would need, or be able to exploit, is 

void __builtin_speculation_barrier ()

or some such.  If there are two dangerous loads in one block, a single call
to this suffices, but a generic solution involving range checks for specific
loads would require one per load.
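
For example (a sketch that assumes the barrier builtin proposed above):

/* One barrier covers both loads in the guarded block.  */
if (x < array1_size && y < array1_size)
  {
    __builtin_speculation_barrier ();  /* halt speculation once */
    a = array1[x];
    b = array1[y];
  }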

Thanks,
Bill  

> 
> jeff
> 



Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread Alan Modra
On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote:
> On 01/07/2018 03:58 PM, H.J. Lu wrote:
> > This set of patches for GCC 8 mitigates variant #2 of the speculative 
> > execution
> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.
[snip]
> My fundamental problem with this patchkit is that it is 100% x86/x86_64
> specific.

It's possible that x86 needs spectre variant 2 mitigation that isn't
necessary on other modern processors like ARM and PowerPC, so let's
not rush into general solutions designed around x86..

Here's a quick overview of Spectre.  For more, see
https://spectreattack.com/spectre.pdf
https://googleprojectzero.blogspot.com.au/2018/01/reading-privileged-memory-with-side.html
https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf

The simplest example of ideal "gadget" code that can be exploited by
an attacker who can control the value of x, perhaps as a parameter to
some service provided by the victim is:

char *array1, *array2;
y = array2[array1[x] * cache_line_size];

The idea being that with the appropriate x, array1[x] can load any
byte of interest in the victim, with the array2 load evicting a cache
line detectable by the attacker.  The value of the byte of interest
can then be inferred by which cache line was affected.

Typical code of course checks user input.

if (x < array1_size)
  y = array2[array1[x] * cache_line_size];

Spectre variant 1 preloads the branch predictor to make the condition
predict as true.  Then when the out-of-range value of x is given,
speculative execution runs the gadget code affecting the cache.  Even
though this speculative execution is rolled back, the cache remains
affected..

Spectre variant 2 preloads the branch target predictor for indirect
branches so that some indirect branch in the victim, eg. a PLT call,
speculatively executes gadget code found somewhere in the victim.


So, to mitigate Spectre variant 1, ensure that speculative execution
doesn't get as far as the array2 load.  You could do that by modifying
the above code to

if (x < array1_size)
  {
/* speculation barrier goes here */
y = array2[array1[x] * cache_line_size];
  }

But you could also do

if (x < array1_size)
  {
tmp = array1[x] * cache_line_size;
/* speculation barrier goes here */
y = array2[tmp];
  }

This has the advantage of killing variant 2 attacks for this gadget
too.  If you ensure there are no gadgets anywhere, then variant 2
attacks are not possible.  Besides compiler changes to prevent gadgets
being emitted you also need compiler and linker changes to not emit
read-only data in executable segments, because data might just happen
to be a gadget when executed.
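
Concretely, on x86 the barrier in the second form above can be an
lfence, which Intel's guidance recommends as a speculation barrier (a
sketch; other targets would use their own serializing instruction):

if (x < array1_size)
  {
    tmp = array1[x] * cache_line_size;
    __asm__ __volatile__ ("lfence" ::: "memory");
    y = array2[tmp];
  }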

However, x86 has the additional problem of variable length
instructions.  Gadgets might be hiding in code when executed at an
offset from the start of the "real" instructions.  Which is why x86 is
more at risk from this attack than other processors, and why x86 needs
something like the posted variant 2 mitigation, slowing down all
indirect branches.

-- 
Alan Modra
Australia Development Lab, IBM


[Patch, fortran] PR52162 - Bogus -fcheck=bounds with realloc on assignment to unallocated LHS

2018-01-08 Thread Paul Richard Thomas
I posted this patch early last year and did not submit it because I was up
to my eyeballs in PR34640. I then forgot about it until it came up
on clf (comp.lang.fortran) a few days ago.

Bootstraps and regtests on FC23/x86_64 - OK for trunk?

Paul

2018-01-08  Paul Thomas  

PR fortran/52162
* trans-expr.c (gfc_trans_scalar_assign): Flag is_alloc_lhs if
the rhs expression is neither an elemental nor a conversion
function.

2018-01-08  Paul Thomas  

PR fortran/52162
* gfortran.dg/bounds_check_19.f90 : New test.
Index: gcc/fortran/trans-expr.c
===
*** gcc/fortran/trans-expr.c(revision 256335)
--- gcc/fortran/trans-expr.c(working copy)
*** gfc_trans_assignment_1 (gfc_expr * expr1
*** 9924,9932 
/* Walk the lhs.  */
lss = gfc_walk_expr (expr1);
if (gfc_is_reallocatable_lhs (expr1)
!   && !(expr2->expr_type == EXPR_FUNCTION
!&& expr2->value.function.isym != NULL))
  lss->is_alloc_lhs = 1;
rss = NULL;
  
if ((expr1->ts.type == BT_DERIVED)
--- 9924,9935 
/* Walk the lhs.  */
lss = gfc_walk_expr (expr1);
if (gfc_is_reallocatable_lhs (expr1)
!   && !(expr2->expr_type == EXPR_FUNCTION
!  && expr2->value.function.isym != NULL
!  && !(expr2->value.function.isym->elemental
!   || expr2->value.function.isym->conversion)))
  lss->is_alloc_lhs = 1;
+ 
rss = NULL;
  
if ((expr1->ts.type == BT_DERIVED)
Index: gcc/testsuite/gfortran.dg/bounds_check_19.f90
===
*** gcc/testsuite/gfortran.dg/bounds_check_19.f90   (nonexistent)
--- gcc/testsuite/gfortran.dg/bounds_check_19.f90   (working copy)
***
*** 0 
--- 1,24 
+ ! { dg-do run }
+ ! { dg-options "-fbounds-check" }
+ !
+ ! Test the fix for PR52162 in which the elemental and conversion
+ ! intrinsics in lines 14 and 19 would cause the bounds check to fail.
+ !
+ ! Contributed by Dominique d'Humieres  
+ !
+ integer(4), allocatable :: a(:)
+ integer(8), allocatable :: b(:)
+ real, allocatable :: c(:)
+ allocate (b(7:11), source = [7_8,8_8,9_8,10_8,11_8])
+ 
+ a = b ! Implicit conversion
+ 
+ if (lbound (a, 1) .ne. lbound(b, 1)) call abort
+ if (ubound (a, 1) .ne. ubound(b, 1)) call abort
+ 
+ c = sin(real(b(9:11))/100_8) ! Elemental intrinsic
+ 
+ if ((ubound(c, 1) - lbound(c, 1)) .ne. 2) call abort
+ if (any (nint(asin(c)*100.0) .ne. b(9:11))) call abort
+ deallocate (a, b, c)
+   end


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 6:23 AM, Alan Modra  wrote:
> On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>> > This set of patches for GCC 8 mitigates variant #2 of the speculative 
>> > execution
>> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.
> [snip]
>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>> specific.
>
> It's possible that x86 needs spectre variant 2 mitigation that isn't
> necessary on other modern processors like ARM and PowerPC, so let's
> not rush into general solutions designed around x86..
>
> Here's a quick overview of Spectre.  For more, see
> https://spectreattack.com/spectre.pdf
> https://googleprojectzero.blogspot.com.au/2018/01/reading-privileged-memory-with-side.html
> https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf
>
> The simplest example of ideal "gadget" code that can be exploited by
> an attacker who can control the value of x, perhaps as a parameter to
> some service provided by the victim is:
>
> char *array1, *array2;
> y = array2[array1[x] * cache_line_size];
>
> The idea being that with the appropriate x, array1[x] can load any
> byte of interest in the victim, with the array2 load evicting a cache
> line detectable by the attacker.  The value of the byte of interest
> can then be inferred by which cache line was affected.
>
> Typical code of course checks user input.
>
> if (x < array1_size)
>   y = array2[array1[x] * cache_line_size];
>
> Spectre variant 1 preloads the branch predictor to make the condition
> predict as true.  Then when the out-of-range value of x is given,
> speculative execution runs the gadget code affecting the cache.  Even
> though this speculative execution is rolled back, the cache remains
> affected..
>
> Spectre variant 2 preloads the branch target predictor for indirect
> branches so that some indirect branch in the victim, eg. a PLT call,
> speculatively executes gadget code found somewhere in the victim.
>
>
> So, to mitigate Spectre variant 1, ensure that speculative execution
> doesn't get as far as the array2 load.  You could do that by modifying
> the above code to
>
> if (x < array1_size)
>   {
> /* speculation barrier goes here */
> y = array2[array1[x] * cache_line_size];
>   }
>
> But you could also do
>
> if (x < array1_size)
>   {
> tmp = array1[x] * cache_line_size;
> /* speculation barrier goes here */
> y = array2[tmp];
>   }
>
> This has the advantage of killing variant 2 attacks for this gadget
> too.  If you ensure there are no gadgets anywhere, then variant 2
> attacks are not possible.  Besides compiler changes to prevent gadgets
> being emitted you also need compiler and linker changes to not emit
> read-only data in executable segments, because data might just happen
> to be a gadget when executed.

See:

https://sourceware.org/ml/binutils/2017-11/msg00369.html

-- 
H.J.


[PATCH] Fix PR83563

2018-01-08 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-01-08  Richard Biener  

PR tree-optimization/83563
* graphite.c (canonicalize_loop_closed_ssa_form): Reset the SCEV
cache.

* gcc.dg/graphite/pr83563.c: New testcase.

Index: gcc/graphite.c
===
--- gcc/graphite.c  (revision 256329)
+++ gcc/graphite.c  (working copy)
@@ -322,6 +323,10 @@ canonicalize_loop_closed_ssa_form (void)
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
 canonicalize_loop_closed_ssa (loop);
 
+  /* We can end up releasing duplicate exit PHIs and also introduce
+ additional copies so the cached information isn't correct anymore.  */
+  scev_reset ();
+
   checking_verify_loop_closed_ssa (true);
 }
 
Index: gcc/testsuite/gcc.dg/graphite/pr83563.c
===
--- gcc/testsuite/gcc.dg/graphite/pr83563.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr83563.c (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fgraphite -ftree-loop-distribution -fno-tree-dominator-opts -fno-tree-sink -fno-tree-dce" } */
+
+void
+sy (void)
+{
+  int hb;
+
+  for (hb = 1; hb != 0; hb += hb)
+{
+}
+
+  while (hb < 1)
+++hb;
+}


[PR83663] Revert r255946

2018-01-08 Thread Vidya Praveen
Hello,

This patch reverts the changes introduced by r255946, and the further
changes made to them by r256195, as the former causes a large number of
regressions on aarch64_be* targets. It should be respun once the mismatch
between AArch64's lane numbering and GCC's numbering is fixed, as
explained in PR83663.

OK for trunk?

VP.


ChangeLog:

gcc/

PR target/83663 - Revert r255946

* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.

gcc/testsuite/

PR target/83663 - Revert r255946

* gcc.target/aarch64/vect-slp-dup.c: New.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a189605..03a92b6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12129,51 +12129,9 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	maxv = matches[i][1];
 	  }
 
-  /* Create a duplicate of the most common element, unless all elements
-	 are equally useless to us, in which case just immediately set the
-	 vector register using the first element.  */
-
-  if (maxv == 1)
-	{
-	  /* For vectors of two 64-bit elements, we can do even better.  */
-	  if (n_elts == 2
-	  && (inner_mode == E_DImode
-		  || inner_mode == E_DFmode))
-
-	{
-	  rtx x0 = XVECEXP (vals, 0, 0);
-	  rtx x1 = XVECEXP (vals, 0, 1);
-	  /* Combine can pick up this case, but handling it directly
-		 here leaves clearer RTL.
-
-		 This is load_pair_lanes, and also gives us a clean-up
-		 for store_pair_lanes.  */
-	  if (memory_operand (x0, inner_mode)
-		  && memory_operand (x1, inner_mode)
-		  && !STRICT_ALIGNMENT
-		  && rtx_equal_p (XEXP (x1, 0),
-  plus_constant (Pmode,
-		 XEXP (x0, 0),
-		 GET_MODE_SIZE (inner_mode
-		{
-		  rtx t;
-		  if (inner_mode == DFmode)
-		t = gen_load_pair_lanesdf (target, x0, x1);
-		  else
-		t = gen_load_pair_lanesdi (target, x0, x1);
-		  emit_insn (t);
-		  return;
-		}
-	}
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
-	  aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
-	  maxelement = 0;
-	}
-  else
-	{
-	  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-	  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
-	}
+  /* Create a duplicate of the most common element.  */
+  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+  aarch64_emit_move (target, gen_vec_duplicate (mode, x));
 
   /* Insert the rest.  */
   for (int i = 0; i < n_elts; i++)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 6cb5a6e..b052fbb 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5888,57 +5888,6 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 		return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
 	}
 
-	  /* Replace:
-
-	  (vec_merge:outer (vec_duplicate:outer x:inner)
-			   (subreg:outer y:inner 0)
-			   (const_int N))
-
-	 with (vec_concat:outer x:inner y:inner) if N == 1,
-	 or (vec_concat:outer y:inner x:inner) if N == 2.
-	 We assume that degenrate cases (N == 0 or N == 3), which
-	 represent taking all elements from either input, are handled
-	 elsewhere.
-
-	 Implicitly, this means we have a paradoxical subreg, but such
-	 a check is cheap, so make it anyway.
-
-	 Only applies for vectors of two elements.  */
-
-	  if ((GET_CODE (op0) == VEC_DUPLICATE
-	   || GET_CODE (op1) == VEC_DUPLICATE)
-	  && GET_MODE (op0) == GET_MODE (op1)
-	  && known_eq (GET_MODE_NUNITS (GET_MODE (op0)), 2)
-	  && known_eq (GET_MODE_NUNITS (GET_MODE (op1)), 2)
-	  && IN_RANGE (sel, 1, 2))
-	{
-	  rtx newop0 = op0, newop1 = op1;
-
-	  /* Canonicalize locally such that the VEC_DUPLICATE is always
-		 the first operand.  */
-	  if (GET_CODE (newop1) == VEC_DUPLICATE)
-		{
-		  std::swap (newop0, newop1);
-		  /* If we swap the operand order, we also need to swap
-		 the selector mask.  */
-		  sel = sel == 1 ? 2 : 1;
-		}
-
-	  if (GET_CODE (newop1) == SUBREG
-		  && paradoxical_subreg_p (newop1)
-		  && subreg_lowpart_p (newop1)
-		  && GET_MODE (SUBREG_REG (newop1))
-		  == GET_MODE (XEXP (newop0, 0)))
-		{
-		  newop0 = XEXP (newop0, 0);
-		  newop1 = SUBREG_REG (newop1);
-		  if (sel == 2)
-		std::swap (newop0, newop1);
-		  return simplify_gen_binary (VEC_CONCAT, mode,
-	  newop0, newop1);
-		}
-	}
-
 	  /* Replace (vec_merge (vec_duplicate x) (vec_duplicate y)
  (const_int n))
 	 with (vec_concat x y) or (vec_concat y x) depending on value
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c b/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c
deleted file mode 100644
index 0541

Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread Jakub Jelinek
On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
> See:
> 
> https://sourceware.org/ml/binutils/2017-11/msg00369.html

Program Headers:
  Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD   0x00 0x 0x 0x00200 0x00200 R   0x20
  LOAD   0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20
  LOAD   0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x20
  LOAD   0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x20
  DYNAMIC0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
  GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0x10
  GNU_RELRO  0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1

Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
non-executable sections be emitted together, so that you have a R, then R E,
then RW PT_LOADs?

Jakub


Re: [PATCH] rs6000: Cleanup bdz/bdnz insn/splitter, add new insn/splitter for bdzt/bdzf/bdnzt/bdnzf

2018-01-08 Thread Aaron Sawdey
On Fri, 2017-12-01 at 16:45 -0600, Segher Boessenkool wrote:
> Looks good otherwise.  I'll ok it when there is a user (or a
> testcase).
> It shouldn't go in before the canonicalize_condition patch, of
> course.

The canonicalize_condition patch is in, so I have checked in this
cleanup and addition to the patterns and splitters for the branch
decrement instructions as 256344.

2018-01-08  Aaron Sawdey  

* config/rs6000/rs6000.md (cceq_ior_compare): Remove * so I can use it
to generate rtl.
(cceq_ior_compare_complement): Give it a name so I can use it, and
change boolean_or_operator predicate to boolean_operator so it can
be used to generate a crand.
(eqne): New code iterator.
(bd/bd_neg): New code_attrs.
(<bd>_<mode>): New name for ctr<mode>_internal[12] now combined into
a single define_insn.
(<bd>tf_<mode>): A new insn pattern for the conditional form branch
decrement (bdnzt/bdnzf/bdzt/bdzf).
* config/rs6000/rs6000.c (rs6000_legitimate_combined_insn): Updated
with the new names of the branch decrement patterns, and added the
names of the branch decrement conditional patterns.


-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md	(revision 256216)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -12797,7 +12797,7 @@
 ; which are generated by the branch logic.
 ; Prefer destructive operations where BT = BB (for crXX BT,BA,BB)
 
-(define_insn "*cceq_ior_compare"
+(define_insn "cceq_ior_compare"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y,?y")
 (compare:CCEQ (match_operator:SI 1 "boolean_operator"
 	[(match_operator:SI 2
@@ -12817,9 +12817,9 @@
 
 ; Why is the constant -1 here, but 1 in the previous pattern?
 ; Because ~1 has all but the low bit set.
-(define_insn ""
+(define_insn "cceq_ior_compare_complement"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y,?y")
-(compare:CCEQ (match_operator:SI 1 "boolean_or_operator"
+(compare:CCEQ (match_operator:SI 1 "boolean_operator"
 	[(not:SI (match_operator:SI 2
   "branch_positive_comparison_operator"
   [(match_operand 3
@@ -13036,34 +13036,13 @@
 ;; rs6000_legitimate_combined_insn prevents combine creating any of
;; the ctr<mode> insns.
 
-(define_insn "ctr_internal1"
-  [(set (pc)
-	(if_then_else (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
-			  (const_int 1))
-		  (label_ref (match_operand 0))
-		  (pc)))
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
-	(plus:P (match_dup 1)
-		(const_int -1)))
-   (clobber (match_scratch:CC 3 "=X,&x,&x,&x"))
-   (clobber (match_scratch:P 4 "=X,X,&r,r"))]
-  ""
-{
-  if (which_alternative != 0)
-return "#";
-  else if (get_attr_length (insn) == 4)
-return "bdnz %l0";
-  else
-return "bdz $+8\;b %l0";
-}
-  [(set_attr "type" "branch")
-   (set_attr "length" "*,16,20,20")])
+(define_code_iterator eqne [eq ne])
+(define_code_attr bd [(eq "bdz") (ne "bdnz")])
+(define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
 
-;; Similar but use EQ
-
-(define_insn "ctr_internal2"
+(define_insn "_"
   [(set (pc)
-	(if_then_else (eq (match_operand:P 1 "register_operand" "c,*b,*b,*b")
+	(if_then_else (eqne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
 			  (const_int 1))
 		  (label_ref (match_operand 0))
 		  (pc)))
@@ -13077,15 +13056,14 @@
   if (which_alternative != 0)
 return "#";
   else if (get_attr_length (insn) == 4)
-return "bdz %l0";
+return " %l0";
   else
-return "bdnz $+8\;b %l0";
+return " $+8\;b %l0";
 }
   [(set_attr "type" "branch")
(set_attr "length" "*,16,20,20")])
 
-;; Now the splitters if we could not allocate the CTR register
-
+;; Now the splitter if we could not allocate the CTR register
 (define_split
   [(set (pc)
 	(if_then_else (match_operator 2 "comparison_operator"
@@ -13093,19 +13071,13 @@
    (const_int 1)])
 		  (match_operand 5)
 		  (match_operand 6)))
-   (set (match_operand:P 0 "int_reg_operand")
+   (set (match_operand:P 0 "nonimmediate_operand")
 	(plus:P (match_dup 1)
 		(const_int -1)))
(clobber (match_scratch:CC 3))
(clobber (match_scratch:P 4))]
   "reload_completed"
-  [(set (match_dup 3)
-	(compare:CC (match_dup 1)
-		(const_int 1)))
-   (set (match_dup 0)
-	(plus:P (match_dup 1)
-		(const_int -1)))
-   (set (pc)
+  [(set (pc)
 	(if_then_else (match_dup 7)
 		  (match_dup 5)
 		  (match_dup 6)))]
@@ -13112,37 +13084,124 @@
 {
   operands[7] = gen_rtx_fmt_ee (GET_CODE (operands[2]), VOIDmode, operands[3],
 const0_rtx);
+  emit_insn (gen_rtx_SET (operands[3],
+			  gen_rtx_COMPARE (CCmode, operands[1], const1_rtx)));
+  if (gpc_reg_operand (operands[0], <MODE>mode))
+emit_insn (gen_add<mode>3 (

[PATCH][arm] Add -march=armv8.3-a and dotprod multilib selection rules

2018-01-08 Thread Kyrill Tkachov

Hi all,

We don't have the t-aprofile, t-multilib and t-arm-elf mapping
rules for multilibs when using the variants of -march=armv8.3-a
and the dotproduct extension.
This patch adds them. -march=armv8.3-a behaves in the same
way as -march=armv8.2-a in this regard.

Bootstrapped and tested with the aprofile multilib list.
Checked that --print-multi-directory gives sensible results
with armv8.3-a options and extensions.
I've also added some armv8.3-a, fp16 and dotprod
combination tests to multilib.exp

Committing to trunk.

Thanks,
Kyrill

2018-01-08  Kyrylo Tkachov  

* config/arm/t-aprofile (MULTILIB_MATCHES): Add mapping rules for
-march=armv8.3-a variants.
* config/arm/t-multilib: Likewise.
* config/arm/t-arm-elf: Likewise.  Handle dotprod extension.

2018-01-08  Kyrylo Tkachov  

* gcc.target/arm/multilib.exp: Add fp16, dotprod and armv8.3-a
combination tests.
commit 1ddc344d07ed643926e0f91c8467b3f0973483c0
Author: Kyrylo Tkachov 
Date:   Mon Dec 18 19:57:22 2017 +

[arm] Add -march=armv8.3-a and dotprod multilib selection rules

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 3e92c62..6c34c09 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -88,9 +88,13 @@ MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_1_a_simd_variants), \
 # Baseline v8.2-a: map down to baseline v8-a
 MULTILIB_MATCHES	+= march?armv8-a=march?armv8.2-a
 
-# Map all v8.2-a SIMD variants to v8-a+simd
+# Baseline v8.3-a: map down to baseline v8-a
+MULTILIB_MATCHES	+= march?armv8-a=march?armv8.3-a
+
+# Map all v8.2-a and v8.3-a SIMD variants to v8-a+simd
 MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_2_a_simd_variants), \
-			 march?armv8-a+simd=march?armv8.2-a$(ARCH))
+			 march?armv8-a+simd=march?armv8.2-a$(ARCH) \
+			 march?armv8-a+simd=march?armv8.3-a$(ARCH))
 
 # Use Thumb libraries for everything.
 
diff --git a/gcc/config/arm/t-arm-elf b/gcc/config/arm/t-arm-elf
index ace4736..189ab9b 100644
--- a/gcc/config/arm/t-arm-elf
+++ b/gcc/config/arm/t-arm-elf
@@ -36,7 +36,7 @@ v7ve_fps	:= vfpv3-d16 vfpv3 vfpv3-d16-fp16 vfpv3-fp16 vfpv4 neon \
 
 # Not all these permutations exist for all architecture variants, but
 # it seems to work ok.
-v8_fps		:= simd fp16 crypto fp16+crypto
+v8_fps		:= simd fp16 crypto fp16+crypto dotprod
 
 # We don't do anything special with these.  Pre-v4t probably doesn't work.
 all_early_nofp	:= armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5t
@@ -46,7 +46,7 @@ all_early_arch	:= armv5e armv5tej armv6 armv6j armv6k armv6z armv6kz \
 
 all_v7_a_r	:= armv7-a armv7ve armv7-r
 
-all_v8_archs	:= armv8-a armv8-a+crc armv8.1-a armv8.2-a
+all_v8_archs	:= armv8-a armv8-a+crc armv8.1-a armv8.2-a armv8.3-a
 
 # No floating point variants, require thumb1 softfp
 all_nofp_t	:= armv6-m armv6s-m armv8-m.base
diff --git a/gcc/config/arm/t-multilib b/gcc/config/arm/t-multilib
index 58448a9..c0ca255 100644
--- a/gcc/config/arm/t-multilib
+++ b/gcc/config/arm/t-multilib
@@ -139,9 +139,13 @@ MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_1_a_simd_variants), \
 # Baseline v8.2-a: map down to baseline v8-a
 MULTILIB_MATCHES	+= march?armv7=march?armv8.2-a
 
-# Map all v8.2-a SIMD variants
+# Baseline v8.3-a: map down to baseline v8-a
+MULTILIB_MATCHES	+= march?armv7=march?armv8.3-a
+
+# Map all v8.2-a SIMD variants.  v8.3-a SIMD variants have the same mappings
 MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_2_a_simd_variants), \
-			 march?armv7+fp=march?armv8.2-a$(ARCH))
+			 march?armv7+fp=march?armv8.2-a$(ARCH) \
+			 march?armv7+fp=march?armv8.3-a$(ARCH))
 
 # Use Thumb libraries for everything.
 
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp b/gcc/testsuite/gcc.target/arm/multilib.exp
index 5de87c4..c3c8e1f 100644
--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -72,6 +72,26 @@ if {[multilib_config "aprofile"] } {
 	{-march=armv8.2-a+simd+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
 	{-march=armv8.2-a+simd+crypto+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
 	{-march=armv8.2-a+simd+nofp+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.2-a+fp16 -mfloat-abi=soft} "thumb/v8-a/nofp"
+	{-march=armv8.2-a+simd+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.2-a+simd+fp16+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
+	{-march=armv8.2-a+simd+nofp+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.2-a+dotprod -mfloat-abi=soft} "thumb/v8-a/nofp"
+	{-march=armv8.2-a+simd+dotprod -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.2-a+simd+dotprod+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
+	{-march=armv8.2-a+simd+nofp+dotprod -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.3-a+crypto -mfloat-abi=soft} "thumb/v8-a/nofp"
+	{-march=armv8.3-a+simd+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.3-a+simd+crypto+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
+	{-march=armv8.3-a+simd+nofp+crypto -mfloat-abi=softfp}

RE: [PATCH 00/10][ARC] Critical fixes

2018-01-08 Thread Claudiu Zissulescu
>   [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV.
>   [ARC] Don't allow the last ZOL insn to be in a delay slot.
>   [ARC] Add trap instruction.
>   [ARC] Update legitimate constant hook.
>   [ARC] Enable unaligned access.
>   [ARC] Revamp trampoline implementation.
>   [ARC][ZOL] Update uses for hw-loop labels.
>   [ARC] Add ARCv2 core3 tune option.
>   [ARC][FIX] Consider command line ffixed- option.
>   [ARC] Update (u)maddsidi patterns.

Hi Andrew,

Thank you for reviewing this batch of fixes. Any chance you could also
check these ones? They have been waiting for a long time now:

https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00078.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00081.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00080.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00079.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00084.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00083.html
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00082.html

Thank you,
Claudiu


Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Richard Earnshaw (lists)
On 08/01/18 14:19, Bill Schmidt wrote:
> 
>> On Jan 7, 2018, at 10:47 PM, Jeff Law  wrote:
>>
>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc 
>>> targets.  We don't
>>> have a way to force resolution of a condition prior to continuing 
>>> speculation, so this
>>> will just introduce another comparison that we would speculate past.  For 
>>> our mitigation
>>> we will have to introduce an instruction that halts all speculation at that 
>>> point, and place
>>> it in front of all dangerous loads.  I wish it were otherwise.
>> So could you have an expander for __builtin_load_no_speculate that just
>> emits the magic insn that halts all speculation and essentially ignores
>> the additional stuff that __builtin_load_no_speculate might be able to
>> do on other platforms?
> 
> This is possible, but the builtin documentation is completely misleading for
> powerpc.  We will not provide the semantics that this builtin claims to 
> provide.
> So at a minimum we would need the documentation to indicate that the 
> additional
> range-checking is target-specific behavior as well, not just the speculation 
> code.
> At that point it isn't really a very target-neutral solution.
> 
> What about other targets?  This builtin seems predicated on specific behavior
> of ARM architecture; I don't know whether other targets have a guaranteed
> speculation-rectifying conditional test.
> 
> For POWER, all we would need, or be able to exploit, is 
> 
>   void __builtin_speculation_barrier ()
> 
> or some such.  If there are two dangerous loads in one block, a single call
> to this suffices, but a generic solution involving range checks for specific
> loads would require one per load.
> 

Do you have any data to suggest that multiple /independent/ vulnerable
accesses occur under a single guarding condition more often than 'once
in a blue moon'?  It seems to me that would be highly unlikely.


R.



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-08 Thread Nick Clifton
Hi Guys,

  It seems to me that it might be worth taking a step back here,
  and considering adding a security framework to gcc.  Mitigations
  for CVEs in the past have resulted in individual patches being
  added to gcc, often in a target-specific manner, and with no
  real framework to support them, document them, or indicate to
  an external tool that they have been applied.

  In addition security fixes often result in the generation of
  less optimal code, and so it might be useful to have a way to
  tell other parts of gcc that a given particular sequence should
  not be altered.

  Not that I am an expert in this area, but I do think that it is
  something that should be discussed...

Cheers
  Nick





Re: [PATCH, rs6000] Fix PR83677 (incorrect generation of xxpermr)

2018-01-08 Thread Segher Boessenkool
Hi!

On Thu, Jan 04, 2018 at 08:16:06AM -0600, Bill Schmidt wrote:
> https://gcc.gnu.org/PR83677 reports that generation of xxpermr is always
> wrong.  It effectively inverts the order of the two input registers from
> what they should be.  This patch addresses that and provides a test case
> modified from the original report.

How confusing.  Maybe it would be easier to read if the operands number
1 and 2 in the pattern had swapped numbers.

> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this okay for trunk and shortly for backport to GCC 7?  I will check
> on 6, but I'm pretty certain this was introduced in 7, as 6 has only
> minimal POWER9 support.

Okay for trunk and all branches where it is needed.  Thanks!

One minor testcase thingie:

> --- gcc/testsuite/gcc.target/powerpc/pr83677.c(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr83677.c(working copy)
> @@ -0,0 +1,166 @@
> +/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } } */

powerpc*-*-* please; or why would that not work?


Segher


Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Bill Schmidt
On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists)  
wrote:
> 
> On 08/01/18 02:20, Bill Schmidt wrote:
>> Hi Richard,
>> 
>> Unfortunately, I don't see any way that this will be useful for the ppc 
>> targets.  We don't
>> have a way to force resolution of a condition prior to continuing 
>> speculation, so this
>> will just introduce another comparison that we would speculate past.  For 
>> our mitigation
>> we will have to introduce an instruction that halts all speculation at that 
>> point, and place
>> it in front of all dangerous loads.  I wish it were otherwise.
> 
> So can't you make the builtin expand to (in pseudo code):
> 
>   if (bounds_check)
> {
>   __asm ("barrier");
>   result = *ptr;
>  }
>else
> result = failval;

Could, but this just generates unnecessary code for Power.  We would instead 
generate

__asm ("barrier");
result = *ptr;

without any checks.  We would ignore everything but the first argument.

Thanks,
Bill

> 
> R.
> 
>> 
>> Thanks,
>> Bill
>> 
>>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw  
>>> wrote:
>>> 
>>> 
>>> This patch adds generic support for the new builtin
>>> __builtin_load_no_speculate.  It provides the overloading of the
>>> different access sizes and a default fall-back expansion for targets
>>> that do not support a mechanism for inhibiting speculation.
>>> 
>>> * builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>> New builtin type signature.
>>> (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> * builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
>>> (BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
>>> (BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
>>> (BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
>>> (BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
>>> (BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
>>> * target.def (inhibit_load_speculation): New hook.
>>> * doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
>>> documentation.
>>> * doc/tm.texi: Regenerated.
>>> * doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
>>> * doc/extend.texi: Document __builtin_load_no_speculate.
>>> * c-family/c-common.c (load_no_speculate_resolve_size): New function.
>>> (load_no_speculate_resolve_params): New function.
>>> (load_no_speculate_resolve_return): New function.
>>> (resolve_overloaded_builtin): Handle overloading
>>> __builtin_load_no_speculate.
>>> * builtins.c (expand_load_no_speculate): New function.
>>> (expand_builtin): Handle new no-speculation builtins.
>>> * targhooks.h (default_inhibit_load_speculation): Declare.
>>> * targhooks.c (default_inhibit_load_speculation): New function.
>>> ---
>>> gcc/builtin-types.def   |  16 +
>>> gcc/builtins.c  |  99 ++
>>> gcc/builtins.def|  22 ++
>>> gcc/c-family/c-common.c | 164 
>>> 
>>> gcc/c-family/c-cppbuiltin.c |   5 +-
>>> gcc/doc/cpp.texi|   4 ++
>>> gcc/doc/extend.texi |  53 ++
>>> gcc/doc/tm.texi |   6 ++
>>> gcc/doc/tm.texi.in  |   2 +
>>> gcc/target.def  |  20 ++
>>> gcc/targhooks.c |  69 +++
>>> gcc/targhooks.h |   3 +
>>> 12 files changed, 462 insertions(+), 1 deletion(-)
>>> 
>>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
>> 
> 



Re: C++ PATCH to add a test for c++/81860

2018-01-08 Thread Rainer Orth
Hi Nathan,

> On 01/02/2018 09:36 AM, Marek Polacek wrote:
>> This test exercising inheriting a template constructor in this PR got
>> fixed with
>> r251426.  As I don't see any lambdas here, I thought it worth to add it.
>>
>> Tested on x86_64-linux, ok for trunk?
>>
>> 2018-01-02  Marek Polacek  
>>
>>  PR c++/81860
>>  * g++.dg/cpp0x/inh-ctor30.C: New test.
>>
>
> yes thanks

this test FAILs on a couple of targets: i386-pc-solaris2.1[01],
sparc-sun-solaris2.11, powerpc-ibm-aix7.2.0.0,
x86_64-apple-darwin15.6.0.

The former two have _ZN1AIjEC1Ev instead of _ZN1AIjEC2Ev which demangle
the same.  Should it accept both?
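
If so, a pattern along the lines of

  { dg-final { scan-assembler "_ZN1AIjEC\[12\]Ev" } }

(assuming the test uses scan-assembler; adjust to whatever directive it
actually uses) would accept both spellings.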

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-08 Thread Bernd Edlinger
I thought about your new builtin again, and I wonder if
something like the following might work as well:


cat despec.s
.arch armv7-a
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 4
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.section .text.startup,"ax",%progbits
.align  2
.global despec_i
.syntax unified
.arm
.fpu vfpv3-d16

despec_i:
cmp r0,#0
beq L0
ldr r0,[r1]
moveq r0,r2
nop {0x14} @ CSDB
str r0,[r1]
mov r0,#1
bx lr
L0: mov r0,#0
bx lr

cat test.c
extern int despec_i(int predicate, int *untrusted, int fallback);
#define N 8
int a[N] = {1,2,3,4,5,7,8,9};
int a2[0x200];
int test(int untrust)
{
   int x = 0;
   if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
   {
  int v = a[untrust] & 0x1 ? 0x100 : 0x0;
  x = a2[v];
   }
   return x;
}


So this should feed the predicate through the builtin, and
clear the untrusted value when the condition has been
mis-predicted.

Wouldn't that be more flexible to use?
Or am I missing something?


As a side note: I noticed that "nop {0x14}" seems to produce the correct
assembler opcode without the need for using a .insn code.


Bernd.


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 7:06 AM, Jakub Jelinek  wrote:
> On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
>> See:
>>
>> https://sourceware.org/ml/binutils/2017-11/msg00369.html
>
> Program Headers:
>   Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD   0x00 0x 0x 0x00200 0x00200 R   0x20
>   LOAD   0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20
>   LOAD   0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x20
>   LOAD   0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x20
>   DYNAMIC0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
>   GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0x10
>   GNU_RELRO  0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1
>
> Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
> non-executable sections be emitted together, so that you have a R, then R E,
> then RW PT_LOADs?

It is done on purpose since the second RO segment will be merged with the RELRO
segment at load time:

Elf file type is EXEC (Executable file)
Entry point 0x401ea0
There are 11 program headers, starting at offset 52

Program Headers:
  Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR   0x34 0x00400034 0x00400034 0x00160 0x00160 R   0x4
  INTERP 0x000194 0x00400194 0x00400194 0x0001a 0x0001a R   0x1
  [Requesting program interpreter: /libx32/ld-linux-x32.so.2]
  LOAD   0x00 0x0040 0x0040 0x0037c 0x0037c R   0x1000
  LOAD   0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000
  LOAD   0x001000 0x00402000 0x00402000 0x00124 0x00124 R   0x1000
  LOAD   0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW  0x1000
  DYNAMIC0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW  0x4
  NOTE   0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x001008 0x00402008 0x00402008 0x00034 0x00034 R   0x4
  GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0x10
  GNU_RELRO  0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01 .interp
   02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym
.dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03 .init .plt .text .fini
   04 .rodata .eh_frame_hdr .eh_frame
   05 .init_array .fini_array .dynamic .got .got.plt .data .bss
   06 .dynamic
   07 .note.ABI-tag .note.gnu.build-id
   08 .eh_frame_hdr
   09
   10 .init_array .fini_array .dynamic .got



-- 
H.J.


Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-08 Thread Segher Boessenkool
On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote:
> This patch is the beginning step to switching the PowerPC long double support
> from IBM extended double to IEEE 128-bit floating point on PowerPC servers.  
> It
> will be necessary to have this patch or a similar patch to allow the GLIBC 
> team
> to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes
> out, we can decide to switch the default.  It is likely, the default will only
> be switched on the 64-bit little endian PowerPC systems, when a distribution
> goes through a major level, such that they can contemplate major changes.

I would hope the default changes for BE systems at the same time (at
least those with VSX, but ideally *all*).

> If you do not use the configuration option --with-long-double-format=ieee or
> --with-long-double-format=ibm, the system will not build multilibs, and just
> build normal libraries with the default set to IBM extended double.  If you do
> use either of the switches, and allow multilibs, it will build two sets of
> multilibs, one for -mabi=ieeelongdouble and one for -mabi=ibmlongdouble.

Huh.  Why not always, then?  There already is an option to turn off
multilibs, for people who really really want that.


Segher


[patch,avr] Implement PR83737

2018-01-08 Thread Georg-Johann Lay

This patch, for PR83737, skips saving of any registers in main().

Attribute OS_main can achieve this as well; however, we can simply drop
these saves / restores in all optimized compilations -- not even
the test suite needs them.

The feature can still be switched off by the new -mno-OS_main option.
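
As an illustration of the effect (hypothetical code, not from the
testsuite): with the option in effect a main() such as

extern int get (void);

int
main (void)
{
  /* Four values live across the last call would normally occupy
     call-saved registers and force pushes/pops in the prologue and
     epilogue; with the option those saves are omitted.  */
  int a = get (), b = get (), c = get (), d = get ();
  return a * b + c * d + get ();
}

is compiled without any push/pop of the call-saved registers, because
the startup code that calls main() does not rely on them.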

Ok for trunk?


gcc/
Don't save registers in main().

PR target/83737
* doc/invoke.texi (AVR Options) [-mOS_main]: Document it.
* config/avr/avr.opt (-mOS_main): New target option.
* config/avr/avr.c (avr_in_main_p): New static function.
(avr_regs_to_save) [avr_in_main_p]: Return 0.
(avr_prologue_setup_frame): Don't save any regs if avr_in_main_p.
(avr_expand_epilogue): Same.
* common/config/avr/avr-common.c (avr_option_optimization_table):
Switch on -mOS_main for optimizing compilations.
Index: common/config/avr/avr-common.c
===
--- common/config/avr/avr-common.c	(revision 256338)
+++ common/config/avr/avr-common.c	(working copy)
@@ -31,6 +31,7 @@ static const struct default_options avr_
 // a frame without need when it tries to be smart around calls.
 { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 },
+{ OPT_LEVELS_1_PLUS, OPT_mOS_main, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 256338)
+++ config/avr/avr.c	(working copy)
@@ -1167,6 +1167,23 @@ avr_starting_frame_offset (void)
 }
 
 
+/* Return true if we are supposed to be in main().  This is only used
+   to determine if the callee-saved registers don't need to be saved
+   because the caller of main (crt*.o) doesn't use any of them.  */
+
+static bool
+avr_in_main_p ()
+{
+  return (TARGET_OS_MAIN
+  && MAIN_NAME_P (DECL_NAME (current_function_decl))
+  // FIXME:  We'd like to also test `flag_hosted' which is only
+  // available in the C-ish fronts, so no such test for now.
+  // Instead, we test the return type of "main" which is not exactly
+  // the same but good enough.
+  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (current_function_decl;
+}
+
+
 /* Return the number of hard registers to push/pop in the prologue/epilogue
of the current function, and optionally store these registers in SET.  */
 
@@ -1180,10 +1197,11 @@ avr_regs_to_save (HARD_REG_SET *set)
 CLEAR_HARD_REG_SET (*set);
   count = 0;
 
-  /* No need to save any registers if the function never returns or
- has the "OS_task" or "OS_main" attribute.  */
+  /* No need to save any registers if the function never returns or has the
+ "OS_task" or "OS_main" attribute.  Dito if we are in "main".  */
 
   if (TREE_THIS_VOLATILE (current_function_decl)
+  || avr_in_main_p()
   || cfun->machine->is_OS_task
   || cfun->machine->is_OS_main)
 return 0;
@@ -1651,6 +1669,7 @@ avr_prologue_setup_frame (HOST_WIDE_INT
&& size < size_max
&& live_seq
&& !isr_p
+   && !avr_in_main_p()
&& !cfun->machine->is_OS_task
&& !cfun->machine->is_OS_main
&& !AVR_TINY);
@@ -1713,7 +1732,9 @@ avr_prologue_setup_frame (HOST_WIDE_INT
   emit_push_byte (reg, true);
 
   if (frame_pointer_needed
-  && (!(cfun->machine->is_OS_task || cfun->machine->is_OS_main)))
+  && (!(avr_in_main_p()
+|| cfun->machine->is_OS_task
+|| cfun->machine->is_OS_main)))
 {
   /* Push frame pointer.  Always be consistent about the
  ordering of pushes -- epilogue_restores expects the
@@ -1834,6 +1855,9 @@ avr_prologue_setup_frame (HOST_WIDE_INT
   if (cfun->machine->is_interrupt)
 irq_state = 1;
 
+  /* IRQs might be on when entering "main", hence avr_in_main_p
+ is *not* included in the following test.  */
+
   if (TARGET_NO_INTERRUPTS
   || cfun->machine->is_signal
   || cfun->machine->is_OS_main)
@@ -2122,6 +2146,7 @@ avr_expand_epilogue (bool sibcall_p)
   && !isr_p
   && !cfun->machine->is_OS_task
   && !cfun->machine->is_OS_main
+  && !avr_in_main_p()
   && !AVR_TINY);
 
   if (minimize
@@ -2227,7 +2252,9 @@ avr_expand_epilogue (bool sibcall_p)
 } /* size != 0 */
 
   if (frame_pointer_needed
-  && !(cfun->machine->is_OS_task || cfun->machine->is_OS_main))
+  && !(avr_in_main_p()
+   || cfun->machine->is_OS_task
+   || cfun->machine->is_OS_main))
 {
   /* Restore previous frame_pointer.  See avr_expand_prologue for
  rationale for not using pophi.  */
Index: config/avr/avr.opt
===
--- co

Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread Jakub Jelinek
On Mon, Jan 08, 2018 at 08:17:27AM -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 7:06 AM, Jakub Jelinek  wrote:
> > On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
> >> See:
> >>
> >> https://sourceware.org/ml/binutils/2017-11/msg00369.html
> >
> > Program Headers:
> >   Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >   LOAD   0x00 0x 0x 0x00200 0x00200 R   0x20
> >   LOAD   0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20
> >   LOAD   0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x20
> >   LOAD   0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x20
>   DYNAMIC    0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
> >   GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0x10
> >   GNU_RELRO  0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1
> >
> > Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
> > non-executable sections be emitted together, so that you have a R, then R E,
> > then RW PT_LOADs?
> 
> It is done on purpose since the second RO segment will be merged with the 
> RELRO
> segment at load time:

That doesn't look like an advantage over not introducing it.

> Elf file type is EXEC (Executable file)
> Entry point 0x401ea0
> There are 11 program headers, starting at offset 52
> 
> Program Headers:
>   Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   PHDR   0x34 0x00400034 0x00400034 0x00160 0x00160 R   0x4
>   INTERP 0x000194 0x00400194 0x00400194 0x0001a 0x0001a R   0x1
>   [Requesting program interpreter: /libx32/ld-linux-x32.so.2]
>   LOAD   0x00 0x0040 0x0040 0x0037c 0x0037c R   0x1000
>   LOAD   0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000
>   LOAD   0x001000 0x00402000 0x00402000 0x00124 0x00124 R   0x1000
>   LOAD   0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW  0x1000
>   DYNAMIC    0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW  0x4
>   NOTE   0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R   0x4
>   GNU_EH_FRAME   0x001008 0x00402008 0x00402008 0x00034 0x00034 R   0x4
>   GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0x10
>   GNU_RELRO  0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R   0x1

   PHDR   0x34 0x00400034 0x00400034 0x00160 0x00160 R   0x4
   INTERP 0x000194 0x00400194 0x00400194 0x0001a 0x0001a R   0x1
   [Requesting program interpreter: /libx32/ld-linux-x32.so.2]
   LOAD   0x00 0x0040 0x0040 0x004a0 0x004a0 R   0x1000
   LOAD   0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000
   LOAD   0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW  0x1000
   DYNAMIC    0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW  0x4
   NOTE   0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R   0x4
   GNU_EH_FRAME   0x001008 0x00402008 0x00402008 0x00034 0x00034 R   0x4
   GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0x10
   GNU_RELRO  0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R   0x1

you could even move the second PT_LOAD earlier to make the gaps on disk
smaller.

Jakub


Re: [PATCH, PR83327] Fix liveness analysis in lra for spilled-into hard regs

2018-01-08 Thread Tom de Vries

On 12/18/2017 05:57 PM, Vladimir Makarov wrote:



On 12/15/2017 06:25 AM, Tom de Vries wrote:


Proposed Solution:

The patch addresses the problem, by:
- marking the hard regs that have been used in lra_spill in
  hard_regs_spilled_into
- using hard_regs_spilled_into in lra_create_live_ranges to
  make sure those registers are marked in the conflict_hard_regs
  of pseudos that overlap with the spill register usage

[ I've also tried an approach where I didn't use 
hard_regs_spilled_into, but tried to propagate all hard regs. I 
figured out that I needed to mask out eliminable_regset.  Also I 
needed to masked out lra_no_alloc_regs, but that could be due to 
gcn-specific problems (pointers take 2 hard regs), I'm not yet sure. 
Anyway, in the submitted patch I tried to avoid these problems and 
went for the more minimal approach. ]


Tom, thank you for the detailed explanation of the problem and the
solutions you considered.  It helped me a lot.  Your simple solution is
adequate, as most transformations and allocations are done on the 1st
iteration of the LRA subpasses.

In order to get the patch accepted for trunk, I think we need:
- bootstrap and reg-test on x86_64
- build and reg-test on mips (the only primary platform that has the
  spill_class hook enabled)

Any comments?


The patch looks ok to me.  You can commit it after successful testing on 
x86-64 and mips but I am sure there will be no problems with x86-64 as 
it does not use spill_class currently (actually your patch might help to 
switch it on again for x86-64.  spill_class was quite useful for x86-64 
performance on Intel processors).




Hi Matthew,

there's an lra optimization that is currently enabled for MIPS, and not 
for any other primary or secondary target.


This (already approved) patch fixes a bug in that optimization, and 
needs to be tested on MIPS.


Unfortunately, the optimization is only enabled for MIPS16, and we don't 
have a current setup to test this.


Could you help us out here and test this patch for MIPS16 on trunk?

Thanks,
- Tom


Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Bill Schmidt
On Jan 8, 2018, at 9:23 AM, Richard Earnshaw (lists)  
wrote:
> 
> On 08/01/18 14:19, Bill Schmidt wrote:
>> 
>>> On Jan 7, 2018, at 10:47 PM, Jeff Law  wrote:
>>> 
>>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
 Hi Richard,
 
 Unfortunately, I don't see any way that this will be useful for the ppc 
 targets.  We don't
 have a way to force resolution of a condition prior to continuing 
 speculation, so this
 will just introduce another comparison that we would speculate past.  For 
 our mitigation
 we will have to introduce an instruction that halts all speculation at 
 that point, and place
 it in front of all dangerous loads.  I wish it were otherwise.
>>> So could you have an expander for __builtin_load_no_speculate that just
>>> emits the magic insn that halts all speculation and essentially ignores
>>> the additional stuff that __builtin_load_no_speculate might be able to
>>> do on other platforms?
>> 
>> This is possible, but the builtin documentation is completely misleading for
>> powerpc.  We will not provide the semantics that this builtin claims to 
>> provide.
>> So at a minimum we would need the documentation to indicate that the 
>> additional
>> range-checking is target-specific behavior as well, not just the speculation 
>> code.
>> At that point it isn't really a very target-neutral solution.
>> 
>> What about other targets?  This builtin seems predicated on specific behavior
>> of ARM architecture; I don't know whether other targets have a guaranteed
>> speculation-rectifying conditional test.
>> 
>> For POWER, all we would need, or be able to exploit, is 
>> 
>>  void __builtin_speculation_barrier ()
>> 
>> or some such.  If there are two dangerous loads in one block, a single call
>> to this suffices, but a generic solution involving range checks for specific
>> loads would require one per load.
>> 
> 
> Do you have any data to suggest that multiple /independent/ vulnerable
> accesses occur under a single guarding condition more often than 'once
> in a blue moon'?  It seems to me that would be highly unlikely.

No, I agree with that.

This is more thinking ahead about the problem of trying to identify these
cases automatically.  Anything we do there is going to be necessarily 
conservative, so more loads may look dangerous than a human user can 
identify.  But for something like that then you probably aren't looking at
using __builtin_load_no_speculate anyhow.

Thanks,
Bill

> 
> 
> R.



Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))

2018-01-08 Thread David Malcolm
On Fri, 2018-01-05 at 17:20 -0500, David Malcolm wrote:
> On Fri, 2018-01-05 at 15:29 -0500, Jason Merrill wrote:
> > On 12/29/2017 12:06 PM, David Malcolm wrote:
> > > One issue I ran into was that fold_for_warn doesn't eliminate
> > > location wrappers when processing_template_decl, leading to
> > > failures of the template-based cases in
> > > g++.dg/warn/Wmemset-transposed-args-1.C.
> > > 
> > > This is due to the early bailout when processing_template_decl
> > > within cp_fold:
> > > 
> > > 2078if (processing_template_decl
> > > 2079|| (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE
> > > (x) == error_mark_node)))
> > > 2080  return x;
> > > 
> > > which dates back to the merger of the C++ delayed folding branch.
> > > 
> > > I've fixed that in this version of the patch by removing that
> > > "processing_template_decl ||" condition from that cp_fold early
> > > bailout.
> > 
> > Hmm, that makes me nervous.  We might want to fold in templates
> > when 
> > called from fold_for_warn, but not for other occurrences.  But I
> > see 
> > that we check processing_template_decl in cp_fully_fold and in the
> > call 
> > to cp_fold_function, so I guess this is fine.
> 
> (I wondered if it would be possible to add a new flag to the various
> fold* calls to request folding in templates, but given that the API
> is
> partially shared with C doing so doesn't seem to make sense)
> 
> > > +case VIEW_CONVERT_EXPR:
> > > +case NON_LVALUE_EXPR:
> > >  case CAST_EXPR:
> > >  case REINTERPRET_CAST_EXPR:
> > >  case CONST_CAST_EXPR:
> > > @@ -14937,6 +14940,15 @@ tsubst_copy (tree t, tree args,
> > > tsubst_flags_t complain, tree in_decl)
> > >  case CONVERT_EXPR:
> > >  case NOP_EXPR:
> > >{
> > > + if (location_wrapper_p (t))
> > > +   {
> > > + /* Handle location wrappers by substituting the
> > > wrapped node
> > > +first, *then* reusing the resulting type.  Doing
> > > the type
> > > +first ensures that we handle template parameters
> > > and
> > > +parameter pack expansions.  */
> > > + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args,
> > > complain, in_decl);
> > > + return build1 (code, TREE_TYPE (op0), op0);
> > > +   }
> > 
> > I'd rather handle location wrappers separately, and abort if 
> > VIEW_CONVERT_EXPR or NON_LVALUE_EXPR appear other than as wrappers.
> 
> OK.  I'm testing an updated version which does so.

Doing so uncovered an issue which I'm not sure how to resolve: it's
possible for a decl to change type during parsing, after location
wrappers may have been created, which changes location_wrapper_p on
those wrappers from true to false.

Here's the most minimal reproducer I've generated so far:

 1  template<typename _CharT>
 2  struct basic_string {
 3    static const _CharT _S_terminal;
 4    static void assign(const _CharT& __c2);
 5    void _M_set_length_and_sharable() {
 6      assign(_S_terminal);
 7    }
 8  };
 9
10  template<typename _CharT>
11  const _CharT basic_string<_CharT>::_S_terminal = _CharT();
12
13  void getline(basic_string<char>& __str) {
14    __str._M_set_length_and_sharable();
15  }

Asserting that the only VIEW_CONVERT_EXPR or NON_LVALUE_EXPR seen in
tsubst_copy and tsubst_copy_and_build are location_wrapper_p leads to
an ICE on the above code.

What's happening is as follows.  First, in the call:

 6      assign(_S_terminal);
               ^~~~~~~~~~~

the VAR_DECL "_S_terminal" gains a VIEW_CONVERT_EXPR location wrapper
node to express the underline shown above.

Later, during parsing of this init-declarator:

10  template<typename _CharT>
11  const _CharT basic_string<_CharT>::_S_terminal = _CharT();
                                       ^~~~~~~~~~~


...cp_parser_init_declarator calls start_decl, which calls
duplicate_decls, merging the "_S_terminal" seen here:

 1  template<typename _CharT>
 2  struct basic_string {
 3    static const _CharT _S_terminal;
                          ^~~~~~~~~~~

with that seen here:

10  template<typename _CharT>
11  const _CharT basic_string<_CharT>::_S_terminal = _CharT();
                                       ^~~~~~~~~~~

Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but
these types are different tree nodes.

Hence the type of the first VAR_DECL changes in duplicate_decls here:

2152  TREE_TYPE (newdecl) = TREE_TYPE (olddecl) = newtype;

...changing type to the TEMPLATE_TYPE_PARM of the second VAR_DECL.

At this point, the location wrapper node at the callsite still has the
*old* type, and hence location_wrapper_p (wrapper) changes from true to
false, as its type no longer matches that of the decl it's wrapping.
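
(For reference, location_wrapper_p currently infers wrapper-ness
structurally; a rough sketch of the check, illustrative only:

static bool
location_wrapper_p_sketch (const_tree t)
{
  /* A node counts as a wrapper iff it has one of the two wrapper codes
     and its type is pointer-identical to its operand's type.  */
  return ((TREE_CODE (t) == VIEW_CONVERT_EXPR
           || TREE_CODE (t) == NON_LVALUE_EXPR)
          && TREE_TYPE (t) == TREE_TYPE (TREE_OPERAND (t, 0)));
}

hence it is that pointer comparison which breaks here.)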

Hence my rewritten code in tsubst_copy_and_build fails the assertion
here:

18306   case VIEW_CONVERT_EXPR:
18307   case NON_LVALUE_EXPR:
18308 gcc_assert (location_wrapper_p (t));
18309 RETURN (RECUR (TREE_OPERAND (t, 0)));

Assuming I'm correctly understanding 

Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))

2018-01-08 Thread Nathan Sidwell

On 01/08/2018 12:02 PM, David Malcolm wrote:

On Fri, 2018-01-05 at 17:20 -0500, David Malcolm wrote:



Doing so uncovered an issue which I'm not sure how to resolve: it's
possible for a decl to change type during parsing, after location
wrappers may have been created, which changes location_wrapper_p on
those wrappers from true to false.



Asserting that the only VIEW_CONVERT_EXPR or NON_LVALUE_EXPR seen in
tsubst_copy and tsubst_copy_and_build are location_wrapper_p leads to
an ICE on the above code.

What's happening is as follows.  First, in the call:

  6      assign(_S_terminal);
                ^~~~~~~~~~~

the VAR_DECL "_S_terminal" gains a VIEW_CONVERT_EXPR location wrapper
node to express the underline shown above.

Later, during parsing of this init-declarator:

 10  template<typename _CharT>
 11  const _CharT basic_string<_CharT>::_S_terminal = _CharT();
                                        ^~~~~~~~~~~


...cp_parser_init_declarator calls start_decl, which calls
duplicate_decls, merging the "_S_terminal" seen here:

...

Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but
these types are different tree nodes.


correct. they are not EQ but are EQUAL (same_type_p will be true).


Hence the type of the first VAR_DECL changes in duplicate_decls here:

2152  TREE_TYPE (newdecl) = TREE_TYPE (olddecl) = newtype;

...changing type to the TEMPLATE_TYPE_PARM of the second VAR_DECL.



18306   case VIEW_CONVERT_EXPR:
18307   case NON_LVALUE_EXPR:
18308 gcc_assert (location_wrapper_p (t));
18309 RETURN (RECUR (TREE_OPERAND (t, 0)));

Assuming I'm correctly understanding the above, I'm not sure what the
best solution is.

Some ideas:



* don't add location wrappers if processing a template

* introduce a new tree node for location wrappers (gah)

* something I haven't thought of


Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting its 
wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE 
(TREE_OPERAND)).  TREE_LANG_FLAG_0 looks available?


nathan


--
Nathan Sidwell


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread Michael Matz
Hi,

On Mon, 8 Jan 2018, Jakub Jelinek wrote:

> On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
> > See:
> > 
> > https://sourceware.org/ml/binutils/2017-11/msg00369.html
> 
> Program Headers:
>   Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD   0x00 0x 0x 0x00200 0x00200 R   0x20
>   LOAD   0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x20
>   LOAD   0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x20
>   LOAD   0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x20
>   DYNAMIC    0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
>   GNU_STACK  0x00 0x 0x 0x0 0x0 RW  0x10
>   GNU_RELRO  0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1
> 
> Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
> non-executable sections be emitted together, so that you have a R, then R E,
> then RW PT_LOADs?

See also my subthread starting at H.J. first version of the set:
  https://sourceware.org/ml/binutils/2017-11/msg00218.html
where some of the issues are hashed through.


Ciao,
Michael.


Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))

2018-01-08 Thread Jakub Jelinek
On Mon, Jan 08, 2018 at 12:10:50PM -0500, Nathan Sidwell wrote:
> > Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but
> > these types are different tree nodes.
> 
> correct. they are not EQ but are EQUAL (same_type_p will be true).

So perhaps location_wrapper_p could use that instead of pointer comparison.
Though it would be expensive.

> > Some ideas:
> 
> > * don't add location wrappers if processing a template
> > 
> > * introduce a new tree node for location wrappers (gah)
> > 
> > * something I haven't thought of
> 
> Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting its
> wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE
> (TREE_OPERAND)).  TREE_LANG_FLAG_0 looks available?

Yeah, I think most if not all lang flags are still available for those two
tree codes and checking that should be quite cheap.

Jakub


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread Michael Matz
Hi,

On Mon, 8 Jan 2018, H.J. Lu wrote:

> On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek  wrote:
> > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
> >> > I'm wondering whether thunk creation can be a good target-independent 
> >> > generalization? I guess
> >> > we can emit the function declaration without direct writes to 
> >> > asm_out_file? And the emission
> >> > of function body can be potentially a target hook?
> >> >
> >> > What about emitting body of the function with RTL instructions instead 
> >> > of direct assembly write?
> >> > My knowledge of RTL is quite small, but maybe it can bring some 
> >> > generalization and reusability
> >> > for other targets?
> >>
> >> Thunks are x86 specific and they are created the same way as 32-bit PIC 
> >> thunks.
> >> I don't see how a target hook is used.
> >
> > Talking about PIC thunks, those have I believe . character in their symbols,
> > so that they can't be confused with user functions.  Any reason these
> > retpoline thunks aren't?
> >
> 
> They used to have '.'.  It was changed at the last minute since the kernel 
> needs to export them as regular symbols.

That can be done via asm aliases or direct assembler use; the kernel 
doesn't absolutely have to access them via C compatible symbol names.


Ciao,
Michael.


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 9:18 AM, Michael Matz  wrote:
> Hi,
>
> On Mon, 8 Jan 2018, H.J. Lu wrote:
>
>> On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek  wrote:
>> > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
>> >> > I'm wondering whether thunk creation can be a good target-independent 
>> >> > generalization? I guess
>> >> > we can emit the function declaration without direct writes to 
>> >> > asm_out_file? And the emission
>> >> > of function body can be potentially a target hook?
>> >> >
>> >> > What about emitting body of the function with RTL instructions instead 
>> >> > of direct assembly write?
>> >> > My knowledge of RTL is quite small, but maybe it can bring some 
>> >> > generalization and reusability
>> >> > for other targets?
>> >>
>> >> Thunks are x86 specific and they are created the same way as 32-bit PIC 
>> >> thunks.
>> >> I don't see how a target hook is used.
>> >
>> > Talking about PIC thunks, those have I believe . character in their 
>> > symbols,
>> > so that they can't be confused with user functions.  Any reason these
>> > retpoline thunks aren't?
>> >
>>
>> They used to have '.'.  It was changed at the last minute since the kernel
>> needs to export them as regular symbols.
>
> That can be done via asm aliases or direct assembler use; the kernel
> doesn't absolutely have to access them via C compatible symbol names.
>

Hi David,

Can you comment on this?


-- 
H.J.


Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))

2018-01-08 Thread Nathan Sidwell

On 01/08/2018 12:14 PM, Jakub Jelinek wrote:

On Mon, Jan 08, 2018 at 12:10:50PM -0500, Nathan Sidwell wrote:

Both "_S_terminal" VAR_DECLs have a "_CharT" TEMPLATE_TYPE_PARM, but
these types are different tree nodes.


correct. they are not EQ but are EQUAL (same_type_p will be true).


So perhaps location_wrapper_p could use that instead of pointer comparison.
Though it would be expensive.


If TYPE_STRUCTURAL_COMPARISON_P (or however it's spelt) is true, it'll 
be expensive.  Otherwise it's a function call, a couple of indirections 
and a pointer compare.  But still more expensive than ...



Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting its
wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE
(TREE_OPERAND)).  TREE_LANG_FLAG_0 looks available?


Yeah, I think most if not all lang flags are still available for those two
tree codes and checking that should be quite cheap.


... a bit test on the node itself.

location_wrapper_p could contain something like
  bool result = TREE_LANG_FLAG_$FOO (t);
  gcc_checking_assert (result == same_type_p (TREE_TYPE (t),
                                              TREE_TYPE (TREE_OPERAND (t, 0))));
  return result;

for the paranoid.

nathan
--
Nathan Sidwell


Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-08 Thread Michael Meissner
On Mon, Jan 08, 2018 at 10:17:06AM -0600, Segher Boessenkool wrote:
> On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote:
> > This patch is the beginning step to switching the PowerPC long double 
> > support
> > from IBM extended double to IEEE 128-bit floating point on PowerPC servers. 
> >  It
> > will be necessary to have this patch or a similar patch to allow the GLIBC 
> > team
> > to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes
> > out, we can decide to switch the default.  It is likely, the default will 
> > only
> > be switched on the 64-bit little endian PowerPC systems, when a distribution
> > goes through a major level, such that they can contemplate major changes.
> 
> I would hope the default changes for BE systems at the same time (at
> least those with VSX, but ideally *all*).

Note, the change has to be on a system by system basis.  We will need to
support distributions that use the IBM extended double for the long double
format, and we will need to support distributions for the IEEE 128-bit format.
It all depends on what the host system uses.  While the work can be done, I
don't know of any BE distribution that will be using GCC 8 as their main
compiler.

> > If you do not use the configuration option --with-long-double-format=ieee or
> > --with-long-double-format=ibm, the system will not build multilibs, and just
> > build normal libraries with the default set to IBM extended double.  If you 
> > do
> > use either of the switches, and allow multilibs, it will build two sets of
> > multilibs, one for -mabi=ieeelongdouble and one for -mabi=ibmlongdouble.
> 
> Huh.  Why not always, then?  There already is an option to turn off
> multilibs, for people who really really want that.

I'm trying not to surprise people building compilers for a setup that does not
work.  In the GCC 9 timeframe, when there is GLIBC support for it, we can make
it default (assuming we keep the multilibs).

It is a chicken and egg problem.  Real users (as opposed to GCC and GLIBC
developers) would need GLIBC 2.28 in order to use the IEEE multilib.  But if we
don't provide the switch or multilib as an option, it makes the GLIBC work
harder.  I suspect that libstdc++-v3 may be more of an issue than GLIBC, since
we have people starting to look at the GLIBC work, but we can't really do
anything about libstdc++-v3 until we have a GLIBC.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread Florian Weimer
* H. J. Lu:

> This set of patches for GCC 8 mitigates variant #2 of the
> speculative execution vulnerabilities on x86 processors identified
> by CVE-2017-5715, aka Spectre.  They convert indirect branches to
> call and return thunks to avoid speculative execution via indirect
> call and jmp.

Would it make sense to add a mode which relies on an empty return
stack cache?  Or will CPUs use the regular branch predictor if the
return stack is empty?

With an empty return stack cache and no branch predictor, a simple
PUSH/RET sequence cannot be predicted, so the complex CALL sequence
with a speculation barrier is not needed.
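
That is, an indirect jump through %rax could then be lowered to just
(a sketch, valid only under the empty-return-stack assumption, and
assuming the CPU does not fall back to the indirect predictor):

	pushq	%rax		# push the real branch target
	ret			# consume it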


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread Florian Weimer
* Sandra Loosemore:

> I have a general documentation issue with all the new command-line 
> options and attributes added by this patch set:  the documentation is 
> very implementor-speaky and doesn't explain what user-level problem 
> they're trying to solve.

Agreed.  Ideally, the documentation would also list the CPU
models/model groups where it is known to have the desired effect, and
if firmware updates are needed.

For some users, it may be useful to be able to advertise that they
have built their binaries with hardening, but another group of users
is interested in hardening which actually works to stop all potential
exploits.


Re: [patch,avr] Implement PR83737

2018-01-08 Thread Denis Chertykov
2018-01-08 20:19 GMT+04:00 Georg-Johann Lay :
> This PR skips saving of any registers in main.
>
> Attribute OS_main can do this as well, however we can just drop
> any saves / restores in all optimized compilation -- not even
> the test suite needs these saves.
>
> The feature can still be switched off by new -mno-OS_main
>
> Ok for trunk?

I like it.

Please commit.

>
>
> gcc/
> Don't save registers in main().
>
> PR target/83737
> * doc/invoke.texi (AVR Options) [-mOS_main]: Document it.
> * config/avr/avr.opt (-mOS_main): New target option.
> * config/avr/avr.c (avr_in_main_p): New static function.
> (avr_regs_to_save) [avr_in_main_p]: Return 0.
> (avr_prologue_setup_frame): Don't save any regs if avr_in_main_p.
> (avr_expand_epilogue): Same.
> * common/config/avr/avr-common.c (avr_option_optimization_table):
> Switch on -mOS_main for optimizing compilations.


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread Woodhouse, David
On Mon, 2018-01-08 at 09:25 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 9:18 AM, Michael Matz  wrote:
> > 
> > Hi,
> > 
> > On Mon, 8 Jan 2018, H.J. Lu wrote:
> > 
> > > 
> > > On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek  wrote:
> > > > 
> > > > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
> > > > > 
> > > > > > 
> > > > > > I'm wondering whether thunk creation can be a good 
> > > > > > target-independent generalization? I guess
> > > > > > we can emit the function declaration without direct writes to 
> > > > > > asm_out_file? And the emission
> > > > > > of function body can be potentially a target hook?
> > > > > > 
> > > > > > What about emitting body of the function with RTL instructions 
> > > > > > instead of direct assembly write?
> > > > > > My knowledge of RTL is quite small, but maybe it can bring some 
> > > > > > generalization and reusability
> > > > > > for other targets?
> > > > > Thunks are x86 specific and they are created the same way as 32-bit 
> > > > > PIC thunks.
> > > > > I don't see how a target hook is used.
> > > > Talking about PIC thunks, those have I believe . character in their 
> > > > symbols,
> > > > so that they can't be confused with user functions.  Any reason these
> > > > retpoline thunks aren't?
> > > > 
> > > They used to have '.'.  It was changed at the last minute since the kernel
> > > needs to export them as regular symbols.
> > That can be done via asm aliases or direct assembler use; the kernel
> > doesn't absolutely have to access them via C compatible symbol names.
> > 
> Hi David,
> 
> Can you comment on this?

It ends up being a real pain for the CONFIG_TRIM_UNUSED_SYMBOLS
mechanism in the kernel, which really doesn't cope well with the dots.
It *does* assume that exported symbols have C-compatible names.
MODVERSIONS too, although we had a simpler "just shut up the warnings"
solution for that. It was CONFIG_TRIM_UNUSED_SYMBOLS which was the
really horrid one.

I went a little way down the rabbit-hole of trying to make it cope, but
it was far from pretty:

https://patchwork.kernel.org/patch/10148081/

If there's a way to make it work sanely, I'm up for that. But if the
counter-argument is "But someone might genuinely want to make their own
C function called __x86_indirect_thunk_rax"... I'm not so receptive to
that argument :)


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-08 Thread Richard Earnshaw (lists)
On 08/01/18 16:10, Bernd Edlinger wrote:
> I thought about your new builtin again, and I wonder if
> something like that might work as well?
> 
> 
> cat despec.s
>   .arch armv7-a
>   .eabi_attribute 28, 1
>   .eabi_attribute 20, 1
>   .eabi_attribute 21, 1
>   .eabi_attribute 23, 3
>   .eabi_attribute 24, 1
>   .eabi_attribute 25, 1
>   .eabi_attribute 26, 2
>   .eabi_attribute 30, 4
>   .eabi_attribute 34, 1
>   .eabi_attribute 18, 4
>   .section.text.startup,"ax",%progbits
>   .align  2
>   .global despec_i
>   .syntax unified
>   .arm
>   .fpu vfpv3-d16
> 
> despec_i:
>   cmp r0,#0
>   beq L0
>   ldr r0,[r1]
>   moveq r0,r2
>   nop {0x14} @ CSDB
>   str r0,[r1]
>   mov r0,#1
>   bx lr
> L0:   mov r0,#0
>   bx lr
> 
> cat test.c
> extern int despec_i(int predicate, int *untrusted, int fallback);
> #define N 8
> int a[N] = {1,2,3,4,5,7,8,9};
> int a2[0x200];
> int test(int untrust)
> {
>int x = 0;
>if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
>{
>   int v = a[untrust] & 0x1 ? 0x100 : 0x0;
>   x = a2[v];
>}
>return x;
> }
> 
> 
> So this should feed the predicate through the builtin, and
> clear the untrusted value when the condition has been been
> mis-predicted.
> 
> Wouldn't that be more flexible to use?
> Or am I missing something?

Yes, if you modified your test case to be something like:


int test(int untrust)
{
   int x = 0;

   if (untrust < 0)
 return x;

   if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
   {
  int v = a[untrust] & 0x1 ? 0x100 : 0x0;
  x = a2[v];
   }
   return x;
}

then the compiler can (and will) optimize the condition to

  if (despec (true && untrust < N, &untrust, 0))

and suddenly you don't have the protection against speculative execution
that you thought you had.  That makes the API exceedingly dangerous as
we can't easily detect and inhibit all the sources of information that
the compiler might use to make such deductions.  What happens, for
example if your original case is inlined into a wider function that has
additional tests?

R.


> 
> 
> As a side note: I noticed that "nop {0x14}" seems to produce the correct
> assembler opcode without the need for using a .insn code.
> 
> 
> Bernd.
> 



[PATCH][arm][2/3] Implement fp16fml extension for ARMv8.4-A

2018-01-08 Thread Kyrill Tkachov

Hi all,

This patch adds the +fp16fml extension that enables some
half-precision floating-point Advanced SIMD instructions,
available through arm_neon.h intrinsics.

This extension is on by default for armv8.4-a
if fp16 is available, so it can be enabled by -march=armv8.4-a+fp16.

fp16fml is also available for armv8.2-a and armv8.3-a through the
+fp16fml option that is added for these architectures.

The new instructions that this patch adds support for are:
vfmal.f16 Dr, Sm, Sn
vfmal.f16 Qr, Dm, Dn
vfmsl.f16 Dr, Sm, Sn
vfmsl.f16 Qr, Dm, Dn

They interpret their input registers as a vector of half-precision
floating-point values, extend them to single-precision vectors
and perform a fused multiply-add or subtract of them with the
destination vector.
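
In rough C terms, the low variant on the 64-bit forms does something
like this (my sketch of the semantics described above, not a reference
implementation):

#include <math.h>

/* r[0..1]: f32 accumulators; a[0..3], b[0..3]: f16 lanes, shown as
   plain float here for clarity.  The "high" variants use lanes 2-3.  */
static void
vfmlal_low_sketch (float r[2], const float a[4], const float b[4])
{
  for (int i = 0; i < 2; i++)
    r[i] = fmaf (a[i], b[i], r[i]);  /* widen, then fused multiply-add */
}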

This patch exposes these instructions through arm_neon.h intrinsics.
The set of intrinsics allows us to do stuff such as perform
the multiply-add/subtract operation on the low or top half of
float16x4_t and float16x8_t values.  This maps naturally in aarch64
to the FMLAL and FMLAL2 instructions but on arm we have to use the
fact that consecutive NEON registers overlap the wider register
(i.e. d0 is s0 plus s1, q0 is d0 plus d1 etc). This just means
we have to be careful to use the right subreg operand print code.

New arm-specific builtins are defined to expand to the new patterns.
I've managed to compress the define_expands using code, mode and int
iterators but the define_insns don't compress very well without two-tiered
iterators (iterator attributes expanding to iterators) which we
don't support.

Bootstrapped and tested on arm-none-linux-gnueabihf and also on
armeb-none-eabi.

Thanks,
Kyrill

2018-01-08  Kyrylo Tkachov  

* config/arm/arm-cpus.in (fp16fml): New feature.
(ALL_SIMD): Add fp16fml.
(armv8.2-a): Add fp16fml as an option.
(armv8.3-a): Likewise.
(armv8.4-a): Add fp16fml as part of fp16.
* config/arm/arm.h (TARGET_FP16FML): Define.
* config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_FP16_FML
when appropriate.
* config/arm/arm-modes.def (V2HF): Define.
* config/arm/arm_neon.h (vfmlal_low_u32, vfmlsl_low_u32,
vfmlal_high_u32, vfmlsl_high_u32, vfmlalq_low_u32,
vfmlslq_low_u32, vfmlalq_high_u32, vfmlslq_high_u32): Define.
* config/arm/arm_neon_builtins.def (vfmal_low, vfmal_high,
vfmsl_low, vfmsl_high): New set of builtins.
* config/arm/iterators.md (PLUSMINUS): New code iterator.
(vfml_op): New code attribute.
(VFMLHALVES): New int iterator.
(VFML, VFMLSEL): New mode attributes.
(V_reg): Define mapping for V2HF.
(V_hi, V_lo): New mode attributes.
(VF_constraint): Likewise.
(vfml_half, vfml_half_selector): New int attributes.
* config/arm/neon.md (neon_vfml_): New
define_expand.
(vfmal_low_intrinsic, vfmsl_high_intrinsic,
vfmal_high_intrinsic, vfmsl_low_intrinsic):
New define_insn.
* config/arm/t-arm-elf (v8_fps): Add fp16fml.
* config/arm/t-multilib (v8_2_a_simd_variants): Add fp16fml.
* config/arm/unspecs.md (UNSPEC_VFML_LO, UNSPEC_VFML_HI): New unspecs.
* doc/invoke.texi (ARM Options): Document fp16fml.  Update armv8.4-a
documentation.
* doc/sourcebuild.texi (arm_fp16fml_neon_ok, arm_fp16fml_neon):
Document new effective target and option set.

2018-01-08  Kyrylo Tkachov  

* gcc.target/arm/multilib.exp: Add combination tests for fp16fml.
* gcc.target/arm/simd/fp16fml_high.c: New test.
* gcc.target/arm/simd/fp16fml_low.c: Likewise.
* lib/target-supports.exp
(check_effective_target_arm_fp16fml_neon_ok_nocache,
check_effective_target_arm_fp16fml_neon_ok,
add_options_for_arm_fp16fml_neon): New procedures.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 635bc3c1c38de79802041fc50229b90defd2e467..46dc8d51ffcd80983a70f1bd283caa3688648c9b 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -160,6 +160,7 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 		  TARGET_VFP_FP16INST);
   def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC",
 		  TARGET_NEON_FP16INST);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_FP16_FML", TARGET_FP16FML);
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_FMA", TARGET_FMA);
   def_or_undef_macro (pfile, "__ARM_NEON__", TARGET_NEON);
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 0967b9d2277a0d211452b7cd4d579db1774f29b3..7b9224b6b0791a9a7a315e1807b439604a3c0929 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -165,6 +165,9 @@ define feature fp16
 # Dot Product instructions extension to ARMv8.2-a.
 define feature dotprod
 
+# Half-precision floating-point instructions in ARMv8.4-A.
+define feature fp16fml
+
 # ISA Quirks (errata?).  Don't forget to add this to the fgroup
 # ALL_QUIRKS below.
 
@@ -202,7 +205,7 @@ define fgroup ALL_CRYPTO	crypto
 # strip off 32 D-registers, but does not remove support for
 # double-precision FP.
 define fgroup ALL_SIMD_INT

[PATCH][arm][3/3] Implement fp16fml lane intrinsics

2018-01-08 Thread Kyrill Tkachov

Hi all,

This patch implements the lane-wise fp16fml intrinsics.
There's quite a few of them so I've split them up from
the other simpler fp16fml intrinsics.

These ones expose instructions such as

vfmal.f16 Dd, Sn, Sm[]  0 <= index <= 1
vfmal.f16 Qd, Dn, Dm[]  0 <= index <= 3
vfmsl.f16 Dd, Sn, Sm[]  0 <= index <= 1
vfmsl.f16 Qd, Dn, Dm[]  0 <= index <= 3

These instructions extract a single half-precision
floating-point value from one of the source regs
and perform a vfmal/vfmsl operation as per the
normal variant with that value.

The nuance here is that some of the intrinsics want
to do things like:

float32x2_t vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b, const int __index)



where the float16x8_t value of '__b' is held in a Q
register, so we need to be a bit smart about finding
the right D or S sub-register and translating the
lane number to a lane in that sub-register, instead
of just passing the language-level const-int down to
the assembly instruction.

That's where most of the complexity of this patch comes from
but hopefully it's orthogonal enough to make sense.
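
The lane arithmetic itself is simple; roughly, for 16-bit elements it
amounts to the sketch below (illustrative only; the actual backend code
is expressed in terms of subregs and operand print codes):

  int lanes_per_subreg = 2;                  /* 2 for an S, 4 for a D subreg */
  int subreg  = __index / lanes_per_subreg;  /* which sub-register */
  int sublane = __index % lanes_per_subreg;  /* lane within it */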

Bootstrapped and tested on arm-none-linux-gnueabihf as well as
armeb-none-eabi.

Thanks,
Kyrill

2018-01-08  Kyrylo Tkachov  

* config/arm/arm_neon.h (vfmlal_lane_low_u32, vfmlal_lane_high_u32,
vfmlalq_laneq_low_u32, vfmlalq_lane_low_u32, vfmlal_laneq_low_u32,
vfmlalq_laneq_high_u32, vfmlalq_lane_high_u32, vfmlal_laneq_high_u32,
vfmlsl_lane_low_u32, vfmlsl_lane_high_u32, vfmlslq_laneq_low_u32,
vfmlslq_lane_low_u32, vfmlsl_laneq_low_u32, vfmlslq_laneq_high_u32,
vfmlslq_lane_high_u32, vfmlsl_laneq_high_u32): Define.
* config/arm/arm_neon_builtins.def (vfmal_lane_low,
vfmal_lane_lowv4hf, vfmal_lane_lowv8hf, vfmal_lane_high,
vfmal_lane_highv4hf, vfmal_lane_highv8hf, vfmsl_lane_low,
vfmsl_lane_lowv4hf, vfmsl_lane_lowv8hf, vfmsl_lane_high,
vfmsl_lane_highv4hf, vfmsl_lane_highv8hf): New sets of builtins.
* config/arm/iterators.md (VFMLSEL2, vfmlsel2): New mode attributes.
(V_lane_reg): Likewise.
* config/arm/neon.md (neon_vfml_lane_):
New define_expand.
 (neon_vfml_lane_): Likewise.
(vfmal_lane_low_intrinsic,
vfmal_lane_low_intrinsic,
vfmal_lane_high_intrinsic,
vfmal_lane_high_intrinsic, vfmsl_lane_low_intrinsic,
vfmsl_lane_low_intrinsic,
vfmsl_lane_high_intrinsic,
vfmsl_lane_high_intrinsic): New define_insns.

2018-01-08  Kyrylo Tkachov  

* gcc.target/arm/simd/fp16fml_lane_high.c: New test.
* gcc.target/arm/simd/fp16fml_lane_low.c: New test.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 01324096d673187b504b89a6d68785275b445b1b..a8aae4464aa02b4286751c116fae493517056e99 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18164,6 +18164,150 @@ vfmlslq_high_u32 (float32x4_t __r, float16x8_t __a, float16x8_t __b)
   return __builtin_neon_vfmsl_highv4sf (__r, __a, __b);
 }
 
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vfmlal_lane_low_u32 (float32x2_t __r, float16x4_t __a, float16x4_t __b,
+		 const int __index)
+{
+  __builtin_arm_lane_check (4, __index);
+  return __builtin_neon_vfmal_lane_lowv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vfmlal_lane_high_u32 (float32x2_t __r, float16x4_t __a, float16x4_t __b,
+		  const int __index)
+{
+  __builtin_arm_lane_check (4, __index);
+  return __builtin_neon_vfmal_lane_highv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vfmlalq_laneq_low_u32 (float32x4_t __r, float16x8_t __a, float16x8_t __b,
+		   const int __index)
+{
+  __builtin_arm_lane_check (8, __index);
+  return __builtin_neon_vfmal_lane_lowv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vfmlalq_lane_low_u32 (float32x4_t __r, float16x8_t __a, float16x4_t __b,
+		   const int __index)
+{
+  __builtin_arm_lane_check (4, __index);
+  return __builtin_neon_vfmal_lane_lowv4hfv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vfmlal_laneq_low_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b,
+		   const int __index)
+{
+  __builtin_arm_lane_check (8, __index);
+  return __builtin_neon_vfmal_lane_lowv8hfv2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vfmlalq_laneq_high_u32 (float32x4_t __r, float16x8_t __a, float16x8_t __b,
+			const int __index)
+{
+  __builtin_arm_lane_check (8, __index);
+  return __builtin_neon_vfmal_lane_highv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __

[PATCH][arm][1/3] Add -march=armv8.4-a option

2018-01-08 Thread Kyrill Tkachov

[resending due to mailer problems...]

Hi all,

This patch adds support for the Armv8.4-A architecture [1]
in the arm backend. This is done through the new
-march=armv8.4-a option.

With this patch armv8.4-a is recognised as an argument
and supports the extensions: simd, fp16, crypto, nocrypto,
nofp with the familiar meaning of these options.
Worth noting that there is no dotprod option like in
armv8.2-a and armv8.3-a because Dot Product support is
mandatory in Armv8.4-A when simd is available, so when using
+simd (or fp16, which enables +simd), the +dotprod is implied.

The various multilib selection makefile fragments are updated
too and the multilib.exp test gets a few armv8.4-a combination
tests.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Christophe: Can I ask you for a huge favour to give these 3
patches a run through your testing infrastructure if you get
the chance?
The changes should be fairly self-contained
(i.e. touching only -march=armv8.4-a support) but I've gotten
various edge cases with testsuite setup wrong in the past...

Thanks,
Kyrill

[1] 
https://community.arm.com/processors/b/blog/posts/introducing-2017s-extensions-to-the-arm-architecture


2018-01-08  Kyrylo Tkachov  

* config/arm/arm-cpus.in (armv8_4): New feature.
(ARMv8_4a): New fgroup.
(armv8.4-a): New arch.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-aprofile: Add matching rules for -march=armv8.4-a.
* config/arm/t-arm-elf (all_v8_archs): Add armv8.4-a.
* config/arm/t-multilib (v8_4_a_simd_variants): New variable.
Add matching rules for -march=armv8.4-a and extensions.
* doc/invoke.texi (ARM Options): Document -march=armv8.4-a.

2018-01-08  Kyrylo Tkachov  

* gcc.target/arm/multilib.exp: Add some -march=armv8.4-a
combination tests.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 281ec162db8c982128462d8efac2be1d21959cf7..0967b9d2277a0d211452b7cd4d579db1774f29b3 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -120,6 +120,9 @@ define feature armv8_2
 # Architecture rel 8.3.
 define feature armv8_3
 
+# Architecture rel 8.4.
+define feature armv8_4
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -242,6 +245,7 @@ define fgroup ARMv8a  ARMv7ve armv8
 define fgroup ARMv8_1aARMv8a crc32 armv8_1
 define fgroup ARMv8_2aARMv8_1a armv8_2
 define fgroup ARMv8_3aARMv8_2a armv8_3
+define fgroup ARMv8_4aARMv8_3a armv8_4
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
@@ -597,6 +601,19 @@ begin arch armv8.3-a
  option dotprod add FP_ARMv8 DOTPROD
 end arch armv8.3-a
 
+begin arch armv8.4-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 8A
+ profile A
+ isa ARMv8_4a
+ option simd add FP_ARMv8 DOTPROD
+ option fp16 add fp16 FP_ARMv8 DOTPROD
+ option crypto add FP_ARMv8 CRYPTO DOTPROD
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+end arch armv8.4-a
+
 begin arch armv8-m.base
  tune for cortex-m23
  base 8M_BASE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index f7937256cd79296ba33d109232bcf0d6f7b03917..b8ebec668b1404fd3f9a71dd1f0d48d1261bcf53 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -455,19 +455,22 @@ EnumValue
 Enum(arm_arch) String(armv8.3-a) Value(29)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.base) Value(30)
+Enum(arm_arch) String(armv8.4-a) Value(30)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.main) Value(31)
+Enum(arm_arch) String(armv8-m.base) Value(31)
 
 EnumValue
-Enum(arm_arch) String(armv8-r) Value(32)
+Enum(arm_arch) String(armv8-m.main) Value(32)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(33)
+Enum(arm_arch) String(armv8-r) Value(33)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(34)
+Enum(arm_arch) String(iwmmxt) Value(34)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(35)
 
 Enum
 Name(arm_fpu) Type(enum fpu_type)
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index a4bf04794e71381256e1489cdad71e966306477f..167a49d16e468be3c222a50abec57b6a68bc561e 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -96,6 +96,13 @@ MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_2_a_simd_variants), \
 			 march?armv8-a+simd=march?armv8.2-a$(ARCH) \
 			 march?armv8-a+simd=march?armv8.3-a$(ARCH))
 
+# Baseline v8.4-a: map down to baseline v8-a
+MULTILIB_MATCHES	+= march?armv8-a=march?armv8.4-a
+
+# Map all v8.4-a SIMD variants to v8-a+simd
+MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_4_a_simd_variants), \
+			 march?armv8-a+simd=march?armv8.4-a$(ARCH))
+
 # Use Thumb libraries for everything.
 
 MULTILIB_REUSE		+= mthumb/march.armv7-a/mfloat-abi.soft=marm/march.armv7-a/mfloat-abi.soft
diff --git a/gcc/config/arm/t-arm-elf b/gcc/config/arm/t-arm-elf
index a15fb2df12f7b0d637976f3912432740ecd104bd..3e721ec789806335c6097d4088642150abf1003a 100644
--- a/gcc/config/arm/t-a

Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))

2018-01-08 Thread David Malcolm
On Mon, 2018-01-08 at 12:25 -0500, Nathan Sidwell wrote:
> On 01/08/2018 12:14 PM, Jakub Jelinek wrote:
> > On Mon, Jan 08, 2018 at 12:10:50PM -0500, Nathan Sidwell wrote:
> > > > Both "_S_terminal" VAR_DECLs have a "_CharT"
> > > > TEMPLATE_TYPE_PARM, but
> > > > these types are different tree nodes.
> > > 
> > > correct. they are not EQ but are EQUAL (same_type_p will be
> > > true).
> > 
> > So perhaps location_wrapper_p could use that instead of pointer
> > comparison.
> > Though it would be expensive.
> 
> If TYPE_STRUCTURAL_COMPARISON_P (or however it's spelt) is true,
> it'll 
> be expensive.  Otherwise it's a function call, a couple of
> indirections 
> and a pointer compare.  But still more expensive than ...
> 
> > > Add a flag on the VIEW_CONVERT/NON_LVALUE expr explicitly noting
> > > its
> > > wrapperness (rather than infer it from TREE_TYPE == TREE_TYPE
> > > (TREE_OPERAND)).  TREE_LANG_FLAG_0 looks available?
> > 
> > Yeah, I think most if not all lang flags are still available for
> > those two
> > tree codes and checking that should be quite cheap.
> 
> ... a bit test on the node itself.
> 
> location_wrapper_p could contain something like
>bool result = TREE_LANG_FLAG_$FOO (t);
>gcc_checking_assert (result == same_type_p (TREE_TYPE (t),
>TREE_TYPE (TREE_OPERAND (t, 0))));
>return result;
> 
> for the paranoid.
> 
> nathan

Thanks Nathan and Jakub: a quick smoketest using TREE_LANG_FLAG_0
worked, and fixes this issue.

However, should I be using a TREE_LANG_FLAG for something that's in
tree.h/c, rather than just in the "cp" subdir?  (the wrapper nodes are
only added to C++ in this patch kit, but given that e.g. STRIP_NOPS
needs to remove them, the lang-independent code needs to handle them,
and ultimately we may want wrapper nodes in other FEs).

Dave


Re: Location wrappers vs decls that change type (was Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl))

2018-01-08 Thread Jakub Jelinek
On Mon, Jan 08, 2018 at 01:02:37PM -0500, David Malcolm wrote:
> Thanks Nathan and Jakub: a quick smoketest using TREE_LANG_FLAG_0
> worked, and fixes this issue.
> 
> However, should I be using a TREE_LANG_FLAG for something that's in
> tree.h/c, rather than just in the "cp" subdir?  (the wrapper nodes are
> only added to C++ in this patch kit, but given that e.g. STRIP_NOPS
> needs to remove them, the lang-independent code needs to handle them,
> and ultimately we may want wrapper nodes in other FEs).

If location_wrapper_p etc. are in generic code rather than only in the FE,
sure, you'd need to use some generic flag rather than FE specific.
I bet e.g. private_flag and protected_flag aren't used yet for
VIEW_CONVERT_EXPR/NON_LVALUE_EXPR, but they are used on some other
expressions, e.g. CALL_EXPR, which suggests that it could be used for that.
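
E.g. something like this sketch, assuming private_flag really is free
on these two codes:

#define EXPR_LOCATION_WRAPPER_P(NODE) \
  (TREE_CHECK2 (NODE, VIEW_CONVERT_EXPR, NON_LVALUE_EXPR)->base.private_flag)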

Jakub


Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-08 Thread Segher Boessenkool
On Mon, Jan 08, 2018 at 01:27:24PM +, Wilco Dijkstra wrote:
> Segher Boessenkool wrote:
> > On Fri, Jan 05, 2018 at 12:22:44PM +, Wilco Dijkstra wrote:
> >> An example epilog in a shrinkwrapped function before:
> >> 
> >> ldp    x21, x22, [sp,#16]
> >> ldr    x23, [sp,#32]
> >> ldr    x24, [sp,#40]
> >> ldp    x25, x26, [sp,#48]
> >> ldr    x27, [sp,#64]
> >> ldr    x28, [sp,#72]
> >> ldr    x30, [sp,#80]
> >> ldr    d8, [sp,#88]
> >> ldp    x19, x20, [sp],#96
> >> ret
> >
> > In this example, the compiler already can make a ldp for both x23/x24 and
> > x27/x28 just fine (if not in emit_epilogue_components, then simply in a
> > peephole); why did that not work?  Or is this not the actual generated
> > machine code (and there are labels between the insns, for example)?
> 
> This block originally had a label in it, 2 blocks emitted identical restores 
> and
> then branched to the final epilog. The final epilogue was then duplicated so
> we end up with 2 almost identical epilogs of 10 instructions (almost since
> there were 1-2 unrelated instructions in both blocks).
> 
> Peepholing is very conservative about instructions using SP and won't touch
> anything frame related. If this was working better then the backend could just
> emit single loads/stores and let peepholing generate LDP/STP.

How unfortunate; that should definitely be improved then.

Always pairing two registers together *also* degrades code quality.

> Another issue is that after pro_and_epilogue pass I see multiple restores
> of the same registers and then a branch to the same block. We should try
> to avoid the unnecessary duplication.

It already does that if *all* predecessors of that block do that.  If you
want to do it in other cases, you end up with more jumps.  That may be
beneficial in some cases, of course, but it is not an obvious win (and in
the general case it is, hrm let's use nice words, "terrible").


Segher


Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

2018-01-08 Thread Florian Weimer
* H. J. Lu:

> On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer  wrote:
>> * H. J. Lu:
>>
>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>> as loop filler.  The default is 'lfence'.
>>
>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>> execution?
>
> My understanding is that a loop works better.

Better how?

What about far jumps?  I think they prevent some forms of prefetch on
i386, so perhaps long mode is similar in that regard?


Re: [PATCH] PR 78534 Change character length from int to size_t

2018-01-08 Thread Bob Deen

On 1/3/18 11:43 AM, Janne Blomqvist wrote:

On Wed, Jan 3, 2018 at 8:34 PM, Bob Deen  wrote:

On 12/29/17 5:31 AM, Janne Blomqvist wrote:


In order to handle large character lengths on (L)LP64 targets, switch
the GFortran character length from an int to a size_t.

This is an ABI change, as procedures with character arguments take
hidden arguments with the character length.



Did this change not make it into gcc 7 then?


No, it caused regressions on big endian targets, and it was so late in
the gcc 7 cycle that there was no time to fix them (at the time I
didn't have access to a big endian target to test on, and getting said
access took time), so I had to revert it. Those regressions have now
been fixed, and the ABI has been broken anyway due to other changes,
so I'm trying again for gcc 8.


Okay.  If for some reason it doesn't make 8, it would be nice to know. 
I lurk here and try to pay attention but obviously I missed that it was 
pulled from 7.  (which is why I'd prefer something specific to test 
rather than a version number, but it is what it is).


Thanks...

-Bob


  I am one of those who still
use these hidden arguments for Fortran <-> C interfaces.  Based on
discussions a year ago, I added this to my code:

#if defined(__GNUC__) && (__GNUC__ > 6)
#include 
#define FORSTR_STDARG_TYPE size_t
#else
#define FORSTR_STDARG_TYPE int
#endif

I infer from this thread that I should change this to __GNUC__ > 7 now. Is
this still the correct/best way to determine the hidden argument size?


Yes, I would say so.
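
I.e., as a sketch, the future-proofed check would become (assuming
size_t is picked up from stddef.h):

#if defined(__GNUC__) && (__GNUC__ > 7)
#include <stddef.h>
#define FORSTR_STDARG_TYPE size_t
#else
#define FORSTR_STDARG_TYPE int
#endif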


(note that we're still using 4.x... ugh, don't ask... so the >6 check hasn't
actually been used yet, I just want to future-proof things as much as
possible without having to rewrite the entire Fortran <-> C interface.)

Thanks...

-Bob

Bob Deen  @  NASA-JPL Multmission Image Processing Lab
bob.d...@jpl.nasa.gov


Re: Fix Bug 83566 - cyl_bessel_j returns wrong result for x>1000 for high orders

2018-01-08 Thread Michele Pezzutti

Formatting fixed.

diff --git a/libstdc++-v3/include/tr1/bessel_function.tcc b/libstdc++-v3/include/tr1/bessel_function.tcc
index 7ac733d..5f8fc9f 100644
--- a/libstdc++-v3/include/tr1/bessel_function.tcc
+++ b/libstdc++-v3/include/tr1/bessel_function.tcc
@@ -27,6 +27,10 @@
  *  Do not attempt to use it directly. @headername{tr1/cmath}
  */

+/* __cyl_bessel_jn_asymp adapted from GNU GSL version 2.4 specfunc/bessel_j.c
+ * Copyright (C) 1996-2003 Gerard Jungman
+ */
+
 //
 // ISO C++ 14882 TR1: 5.2  Special functions
 //
@@ -358,16 +362,42 @@ namespace tr1
 void
 __cyl_bessel_jn_asymp(_Tp __nu, _Tp __x, _Tp & __Jnu, _Tp & __Nnu)
 {
-  const _Tp __mu   = _Tp(4) * __nu * __nu;
-  const _Tp __mum1 = __mu - _Tp(1);
-  const _Tp __mum9 = __mu - _Tp(9);
-  const _Tp __mum25 = __mu - _Tp(25);
-  const _Tp __mum49 = __mu - _Tp(49);
-  const _Tp __xx = _Tp(64) * __x * __x;
-  const _Tp __P = _Tp(1) - __mum1 * __mum9 / (_Tp(2) * __xx)
-    * (_Tp(1) - __mum25 * __mum49 / (_Tp(12) * __xx));
-  const _Tp __Q = __mum1 / (_Tp(8) * __x)
-    * (_Tp(1) - __mum9 * __mum25 / (_Tp(6) * __xx));
+  const _Tp __mu = _Tp(4) * __nu * __nu;
+  const _Tp __8x = _Tp(8) * __x;
+
+  _Tp __P = _Tp(0);
+  _Tp __Q = _Tp(0);
+
+  _Tp __k = _Tp(0);
+  _Tp __term = _Tp(1);
+
+  int __epsP = 0;
+  int __epsQ = 0;
+
+  _Tp __eps = std::numeric_limits<_Tp>::epsilon();
+
+  do
+    {
+  __term *= (__k == 0
+ ? _Tp(1)
+ : -(__mu - (2 * __k - 1) * (2 * __k - 1)) / (__k * __8x));
+
+  __epsP = std::abs(__term) < std::abs(__eps * __P);
+  __P += __term;
+
+  __k++;
+
+  __term *= (__mu - (2 * __k - 1) * (2 * __k - 1)) / (__k * __8x);
+  __epsQ = std::abs(__term) < std::abs(__eps * __Q);
+  __Q += __term;
+
+  if (__epsP && __epsQ && __k > __nu / 2.)
+    break;
+
+  __k++;
+    }
+  while (__k < 1000);
+

   const _Tp __chi = __x - (__nu + _Tp(0.5L))
 * __numeric_constants<_Tp>::__pi_2();
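
For reference, the sum computed above is the standard Hankel asymptotic
expansion (DLMF 10.17 / Abramowitz & Stegun 9.2), in LaTeX notation:

  J_\nu(x) \sim \sqrt{2/(\pi x)}\,[P(\nu,x)\cos\chi - Q(\nu,x)\sin\chi],
  \qquad \chi = x - (\nu + \tfrac12)\tfrac{\pi}{2},

  P \sim \sum_{k\ge 0} (-1)^k \frac{a_{2k}(\nu)}{x^{2k}}, \quad
  Q \sim \sum_{k\ge 0} (-1)^k \frac{a_{2k+1}(\nu)}{x^{2k+1}}, \quad
  a_k(\nu) = \frac{\prod_{j=1}^{k} (\mu - (2j-1)^2)}{k!\, 8^k},
  \ \mu = 4\nu^2.

The do-while accumulates the terms a_k(\nu)/x^k into P (even k) and
Q (odd k) with alternating signs, stopping once both increments drop
below epsilon; the extra k > \nu/2 guard protects against stopping
while the numerator factors are still growing.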

On 01/06/2018 10:23 AM, Paolo Carlini wrote:

Hi,

On 05/01/2018 23:46, Michele Pezzutti wrote:
+ __term *= (__k == 0) ? _Tp(1) : -(__mu - (2 * __k - 1) * (2 * __k - 1))
+    / (__k * __8x);
In such cases, per the Coding Standards, you want an outer level of 
parentheses wrapping the whole right side expression. See toward the 
end of 
https://www.gnu.org/prep/standards/html_node/Formatting.html#Formatting. 
I see that many other places will need fixing, at some point - this
rather straightforward rule is often overlooked, leading to brittle
formatting of many expressions, probably because its benefit is really
obvious in practice only when editing with Emacs, I'm not sure.
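
Concretely, with the outer level of parentheses the assignment can be
laid out as, e.g.:

  __term *= (__k == 0
             ? _Tp(1)
             : -(__mu - (2 * __k - 1) * (2 * __k - 1)) / (__k * __8x));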


Also - this kind of stylistic nitpicking is partially personal taste - 
the parentheses around (__k == 0) seem definitely redundant to me, and 
I don't think you would find many examples of that in our code.


About the Assignment, please be patient. For example, it used to be the
case that Assignments couldn't be processed while RMS was traveling.
It can be slow for many reasons; it's 100% normal.


Paolo..


Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 10:32 AM, Florian Weimer  wrote:
> * H. J. Lu:
>
>> On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer  wrote:
>>> * H. J. Lu:
>>>
 Add -mindirect-branch-loop= option to control loop filler in call and
 return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
 as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
 as loop filler.  The default is 'lfence'.
>>>
>>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>>> execution?
>>
>> My understanding is that a loop works better.
>
> Better how?
>
> What about far jumps?  I think they prevent some forms of prefetch on
> i386, so perhaps long mode is similar in that regard?

These are more expensive and we can't guarantee that they are
effective, hence the short loop.

-- 
H.J.


Re: [PATCH] lto, testsuite: Fix ICE in -Wodr (PR lto/83121)

2018-01-08 Thread David Malcolm
On Sat, 2018-01-06 at 08:44 +0100, Richard Biener wrote:
> On January 5, 2018 11:55:11 PM GMT+01:00, David Malcolm  hat.com> wrote:
> > On Fri, 2018-01-05 at 10:36 +0100, Richard Biener wrote:
> > > On Thu, Jan 4, 2018 at 10:52 PM, David Malcolm  > > om>
> > > wrote:
> > > > PR lto/83121 reports an ICE deep inside the linemap code when
> > > > -Wodr
> > > > reports on a type mismatch.
> > > > 
> > > > The root cause is that the warning can access the
> > > > DECL_SOURCE_LOCATION
> > > > of a streamed-in decl before the lto_location_cache has been
> > > > applied.
> > > > 
> > > > lto_location_cache::input_location stores
> > > > RESERVED_LOCATION_COUNT
> > > > (==2)
> > > > as a poison value until the cache is applied:
> > > > 250   /* Keep value RESERVED_LOCATION_COUNT in *loc as
> > > > linemap
> > > > lookups will
> > > > 251  ICE on it.  */
> > > > 
> > > > The fix is relatively simple: apply the cache before reading
> > > > the
> > > > DECL_SOURCE_LOCATION.
> > > > 
> > > > (I wonder if we should instead have a INVALID_LOCATION value to
> > > > handle
> > > > this case more explicitly?  e.g. 0x?  or reserve 2 in
> > > > libcpp for
> > > > that purpose, and have the non-reserved locations start at
> > > > 3?  Either
> > > > would be more invasive, though)
> > > > 
> > > > Triggering the ICE was fiddly: it seems to be affected by many
> > > > things,
> > > > including the order of files, and (I think) by filenames.  My
> > > > theory is
> > > > that it's affected by the ordering of the tree nodes in the LTO
> > > > stream:
> > > > for the ICE to occur, the types in question need to be compared
> > > > before
> > > > some other operation flushes the lto_location_cache.  This
> > > > ordering
> > > > is affected by the hash-based ordering in DFS in lto-streamer-
> > > > out.c, which
> > > > might explain why r255066 seemed to trigger the bug; the only
> > > > relevant
> > > > change to LTO there seemed to be:
> > > >   * lto-streamer-out.c (hash_tree): Hash TYPE_EMPTY_P and
> > > > DECL_PADDING_P.
> > > > If so, then the bug was presumably already present, but hidden.
> > > > 
> > > > The patch also adds regression test coverage for the ICE, which
> > > > is
> > > > more
> > > > involved - as far as I can tell, we don't have an existing way
> > > > to
> > > > verify
> > > > diagnostics emitted during link-time optimization.
> > > > 
> > > > Hence the patch adds some machinery to lib/lto.exp to support
> > > > two
> > > > new
> > > > directives: dg-lto-warning and dg-lto-message, corresponding to
> > > > dg-warning and dg-message respectively, where the diagnostics
> > > > are
> > > > expected to be emitted at link-time.
> > > > 
> > > > The test case includes examples of LTO warnings and notes in
> > > > both
> > > > the
> > > > primary and secondary source files
> > > > 
> > > > Doing so required reusing the logic from DejaGnu for handling
> > > > diagnostics.
> > > > Unfortunately the pertinent code is a 50 line loop within a
> > > > ~200
> > > > line Tcl
> > > > function in dg.exp (dg-test), so I had to copy it from DejaGnu,
> > > > making
> > > > various changes as necessary (see
> > > > lto_handle_diagnostics_for_file
> > > > in the
> > > > patch; for example the LTO version supports multiple source
> > > > files,
> > > > identifying which source file emitted a diagnostic).
> > > > 
> > > > For non-LTO diagnostics we currently ignore surplus "note"
> > > > diagnostics.
> > > > This patch updates lto_prune_warns to follow this behavior
> > > > (since
> > > > otherwise we'd need numerous dg-lto-message directives for the
> > > > motivating
> > > > test case).
> > > > 
> > > > The patch adds these PASS results to g++.sum:
> > > > 
> > > > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o assemble, -O0 -flto
> > > > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_1.o assemble, -O0 -flto
> > > > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C
> > > > line
> > > > 6)
> > > > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C
> > > > line
> > > > 8)
> > > > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C
> > > > line
> > > > 2)
> > > > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C
> > > > line
> > > > 3)
> > > > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o-cp_lto_pr83121_1.o
> > > > link, -O0 -flto
> > > > 
> > > > The output for dg-lto-message above refers to "warnings",
> > > > rather
> > > > than
> > > > "messages" but that's the same as for the non-LTO case, where
> > > > dg-
> > > > message
> > > > also refers to "warnings".
> > > > 
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > 
> > > > OK for trunk?
> > > 
> > > Hmm, but we do this in warn_odr already?  How's that not enough?
> > > 
> > > At least it seems the place you add this isn't ideal (not at the
> > > "root cause").
> > > 
> > > Richard.
> > 
> > [CCing Honza]
> > 
> > Yes, warn_odr does apply the cache, but this warning is coming from
> > warn_types_mismatch

Re: [PATCH, rs6000] generate loop code for memcmp inline expansion

2018-01-08 Thread Aaron Sawdey
On Tue, 2017-12-12 at 10:13 -0600, Segher Boessenkool wrote:
> Please fix those trivialities, and it's okay for trunk (after the
> rtlanal patch is approved too).  Thanks!

Here's the final version of this, which is committed as 256351.


2018-01-08  Aaron Sawdey  

* config/rs6000/rs6000-string.c (do_load_for_compare_from_addr): New
function.
(do_ifelse): New function.
(do_isel): New function.
(do_sub3): New function.
(do_add3): New function.
(do_load_mask_compare): New function.
(do_overlap_load_compare): New function.
(expand_compare_loop): New function.
(expand_block_compare): Call expand_compare_loop() when appropriate.
* config/rs6000/rs6000.opt (-mblock-compare-inline-limit): Change
option description.
(-mblock-compare-inline-loop-limit): New option.


-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Index: gcc/config/rs6000/rs6000-string.c
===
--- gcc/config/rs6000/rs6000-string.c	(revision 256350)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -303,6 +303,959 @@
   return MIN (base_align, offset & -offset);
 }
 
+/* Prepare address and then do a load.
+
+   MODE is the mode to use for the load.
+   DEST is the destination register for the data.
+   ADDR is the address to be loaded.
+   ORIG_ADDR is the original address expression.  */
+static void
+do_load_for_compare_from_addr (machine_mode mode, rtx dest, rtx addr,
+			   rtx orig_addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr);
+  MEM_COPY_ATTRIBUTES (mem, orig_addr);
+  set_mem_size (mem, GET_MODE_SIZE (mode));
+  do_load_for_compare (dest, mem, mode);
+  return;
+}
+
+/* Do a branch for an if/else decision.
+
+   CMPMODE is the mode to use for the comparison.
+   COMPARISON is the rtx code for the compare needed.
+   A is the first thing to be compared.
+   B is the second thing to be compared.
+   CR is the condition code reg input, or NULL_RTX.
+   TRUE_LABEL is the label to branch to if the condition is true.
+
+   If CR is null_rtx, then a new register of CMPMODE is generated and
+   used for the comparison.
+   If A and B are both null_rtx, then CR must not be null, and the
+   compare is not generated so you can use this with a dot form insn.  */
+
+static void
+do_ifelse (machine_mode cmpmode, rtx_code comparison,
+	   rtx a, rtx b, rtx cr, rtx true_label)
+{
+  gcc_assert ((a == NULL_RTX && b == NULL_RTX && cr != NULL_RTX)
+	  || (a != NULL_RTX && b != NULL_RTX));
+
+  if (cr != NULL_RTX)
+gcc_assert (GET_MODE (cr) == cmpmode);
+  else
+cr = gen_reg_rtx (cmpmode);
+
+  rtx label_ref = gen_rtx_LABEL_REF (VOIDmode, true_label);
+
+  if (a != NULL_RTX)
+emit_move_insn (cr, gen_rtx_COMPARE (cmpmode, a, b));
+
+  rtx cmp_rtx = gen_rtx_fmt_ee (comparison, VOIDmode, cr, const0_rtx);
+
+  rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx, label_ref, pc_rtx);
+  rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  JUMP_LABEL (j) = true_label;
+  LABEL_NUSES (true_label) += 1;
+}
+
+/* Emit an isel of the proper mode for DEST.
+
+   DEST is the isel destination register.
+   CMP is the comparison controlling the isel.
+   SRC_T is the isel source if CR is true.
+   SRC_F is the isel source if CR is false.
+   CR is the condition for the isel.  */
+static void
+do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr)
+{
+  if (GET_MODE (dest) == DImode)
+emit_insn (gen_isel_signed_di (dest, cmp, src_t, src_f, cr));
+  else
+emit_insn (gen_isel_signed_si (dest, cmp, src_t, src_f, cr));
+}
+
+/* Emit a subtract of the proper mode for DEST.
+
+   DEST is the destination register for the subtract.
+   SRC1 is the first subtract input.
+   SRC2 is the second subtract input.
+
+   Computes DEST = SRC1-SRC2.  */
+static void
+do_sub3 (rtx dest, rtx src1, rtx src2)
+{
+  if (GET_MODE (dest) == DImode)
+emit_insn (gen_subdi3 (dest, src1, src2));
+  else
+emit_insn (gen_subsi3 (dest, src1, src2));
+}
+
+/* Emit an add of the proper mode for DEST.
+
+   DEST is the destination register for the add.
+   SRC1 is the first add input.
+   SRC2 is the second add input.
+
+   Computes DEST = SRC1+SRC2.  */
+static void
+do_add3 (rtx dest, rtx src1, rtx src2)
+{
+  if (GET_MODE (dest) == DImode)
+emit_insn (gen_adddi3 (dest, src1, src2));
+  else
+emit_insn (gen_addsi3 (dest, src1, src2));
+}
+
+/* Generate rtl for a load, shift, and compare of less than a full word.
+
+   LOAD_MODE is the machine mode for the loads.
+   DIFF is the reg for the difference.
+   CMP_REM is the reg containing the remaining bytes to compare.
+   DCOND is the CCUNS reg for the compare if we are doing P9 code with setb.
+   SRC1_ADDR is the first source address.
+   SRC2_ADDR is the second source address.
+   ORIG_SRC1 is the original first source block's address rtx.
+   ORIG_SRC2 is the original second source block's address rtx.  */

Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Jeff Law
On 01/08/2018 07:19 AM, Bill Schmidt wrote:
> 
>> On Jan 7, 2018, at 10:47 PM, Jeff Law  wrote:
>>
>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc 
>>> targets.  We don't
>>> have a way to force resolution of a condition prior to continuing 
>>> speculation, so this
>>> will just introduce another comparison that we would speculate past.  For 
>>> our mitigation
>>> we will have to introduce an instruction that halts all speculation at that 
>>> point, and place
>>> it in front of all dangerous loads.  I wish it were otherwise.
>> So could you have an expander for __builtin_load_no_speculate that just
>> emits the magic insn that halts all speculation and essentially ignores
>> the additional stuff that __builtin_load_no_speculate might be able to
>> do on other platforms?
> 
> This is possible, but the builtin documentation is completely misleading for
> powerpc.  We will not provide the semantics that this builtin claims to 
> provide.
> So at a minimum we would need the documentation to indicate that the 
> additional
> range-checking is target-specific behavior as well, not just the speculation 
> code.
> At that point it isn't really a very target-neutral solution.
> 
> What about other targets?  This builtin seems predicated on specific behavior
> of ARM architecture; I don't know whether other targets have a guaranteed
> speculation-rectifying conditional test.
> 
> For POWER, all we would need, or be able to exploit, is 
> 
>   void __builtin_speculation_barrier ()
> 
> or some such.  If there are two dangerous loads in one block, a single call
> to this suffices, but a generic solution involving range checks for specific
> loads would require one per load.
So my concern is that if we have multiple builtins to deal with this
problem, then we're forcing the pain of figuring out which one to use
onto the users.

I'd really like there to be a single builtin to address this problem.
Otherwise the kernel (and anyone else that wants to use this stuff) is
stuck with using both, conditional compilation or something along those
lines which seems like a huge lose.

We'd really like them to be able to add one appropriate
__builtin_whatever call at the key site(s) that does the right thing
regardless of the architecture.

I think that implies we're likely to have arguments that are unused on
some architectures.  I can live with that.  But it also implies we need
better language around the semantics.

As you mention -- there's some belief that we're going to want to do
something for automatic detection in the future.  I think a builtin for that
could well be different than what we provide to the kernel folks in the
immediate term.  I want to focus first on getting a builtin the kernel
guys can use today as needed though.

Jeff




[PATCH, combine]: Use correct mode for ASHIFT in force_int_to_mode

2018-01-08 Thread Uros Bizjak
Hello!

The attached patch corrects a wrong mode argument in the force_to_mode
call for the ASHIFT operator. The patch uses "mode", the same as for
all binop and unop operators in the force_int_to_mode function.

Also, the unpatched function would force the operand to op_mode and then
truncate it to op_mode again, so it all looks like a typo to me.

2018-01-08  Uros Bizjak  

PR target/83628
* combine.c (force_int_to_mode) <case ASHIFT>: Use mode instead of
op_mode in the force_to_mode call.

Together with a follow-up target patch, the patch fixes
gcc.target/alpha/pr83628-2.c scan-asm failures on alpha.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for mainline and branches?

Uros.

diff --git a/gcc/combine.c b/gcc/combine.c
index 3a42de53455c..6adc0a7d6f85 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8908,7 +8908,7 @@ force_int_to_mode (rtx x, scalar_int_mode mode, scalar_int_mode xmode,
 mask = fuller_mask;

   op0 = gen_lowpart_or_truncate (op_mode,
- force_to_mode (XEXP (x, 0), op_mode,
+ force_to_mode (XEXP (x, 0), mode,
 mask, next_select));

   if (op_mode != xmode || op0 != XEXP (x, 0))


Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-08 Thread Wilco Dijkstra
Segher Boessenkool wrote:
> On Mon, Jan 08, 2018 at 01:27:24PM +, Wilco Dijkstra wrote:
>
>> Peepholing is very conservative about instructions using SP and won't touch
>> anything frame related. If this was working better then the backend could 
>> just
>> emit single loads/stores and let peepholing generate LDP/STP.
>
> How unfortunate; that should definitely be improved then.

Improving that helps indeed but won't fix the issue. The epilog may not
always be duplicated and merged like in my example. Any subset of saves
and restores may not be pairable, so the worst case still has twice as many
memory accesses.

> Always pairing two registers together *also* degrades code quality.

No, while it's not optimal, it means smaller code and fewer memory accesses.

>> Another issue is that after pro_and_epilogue pass I see multiple restores
>> of the same registers and then a branch to the same block. We should try
>> to avoid the unnecessary duplication.
>
> It already does that if *all* predecessors of that block do that.  If you
> want to do it in other cases, you end up with more jumps.  That may be
> beneficial in some cases, of course, but it is not an obvious win (and in
> the general case it is, hrm let's use nice words, "terrible").

That may well be the problem. So if there are N predecessors, of which N-1
need to restore the same set of callee saves, but one was shrinkwrapped,
N-1 copies of the same restores might be emitted. N could be the number
of blocks in a function - I really hope it doesn't work out like that...

Wilco

Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread David Woodhouse
On Mon, 2018-01-08 at 09:27 +0100, Florian Weimer wrote:
> * H. J. Lu:
> 
> > 
> > This set of patches for GCC 8 mitigates variant #2 of the
> > speculative execution vulnerabilities on x86 processors identified
> > by CVE-2017-5715, aka Spectre.  They convert indirect branches to
> > call and return thunks to avoid speculative execution via indirect
> > call and jmp.
> Would it make sense to add a mode which relies on an empty return
> stack cache?  Or will CPUs use the regular branch predictor if the
> return stack is empty?
> 
> With an empty return stack cache and no branch predictor, a simple
> PUSH/RET sequence cannot be predicted, so the complex CALL sequence
> with a speculation barrier is not needed.

Some CPUs will use the regular branch predictor if the RSB is empty.
Others just round-robin the RSB and will use the *oldest* entry if they
underflow.





[Committed] Fix typo in comment.

2018-01-08 Thread Steve Kargl
Committed as obvious.

Index: ChangeLog
===
--- ChangeLog   (revision 256351)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2018-01-08  Steven G. Kargl  
+
+   * expr.c (gfc_check_pointer_assign): Fix typo in comment.
+
 2018-01-08  Paul Thomas  
 
PR fortran/83611
Index: expr.c
===
--- expr.c  (revision 256351)
+++ expr.c  (working copy)
@@ -3911,7 +3911,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *
 
   /* Error for assignments of contiguous pointers to targets which is not
  contiguous.  Be lenient in the definition of what counts as
- congiguous.  */
+ contiguous.  */
 
   if (lhs_attr.contiguous && !gfc_is_simply_contiguous (rvalue, false, true))
 gfc_error ("Assignment to contiguous pointer from non-contiguous "

-- 
Steve


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-08 Thread David Woodhouse
On Sun, 2018-01-07 at 16:36 -0700, Jeff Law wrote:
> 
> My fundamental problem with this patchkit is that it is 100% x86/x86_64
> specific.
> 
> ISTM we want a target independent mechanism (ie, new standard patterns,
> options, etc) then an x86/x86_64 implementation using that target
> independent framework (ie, the actual implementation of those new
> standard patterns).

From the kernel point of view, I'm not too worried about GCC internal
implementation details. What would be really useful to agree in short
order is the command-line options that invoke this behaviour, and the
ABI for the thunks. 

Once that's done, we can push the patches to Linus and people can build
safe kernels, and we can build with HJ's existing patch set for the
time being. And you can bikeshed the rest to your collective hearts'
content... :)



Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

2018-01-08 Thread David Woodhouse
On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote:
> * H. J. Lu:
> 
> > Add -mindirect-branch-loop= option to control loop filler in call and
> > return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> > as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> > as loop filler.  The default is 'lfence'.
> 
> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
> execution?

The idea is not to stop it per se, but to capture it. We trick the
speculative execution into *thinking* it's going to return back to that
endless loop, which prevents it from doing the branch prediction which
would otherwise have got into trouble.

There has been a fair amount of bikeshedding of precisely what goes in
there already, and '1: pause; jmp 1b' is the best option that hasn't
been shot down in flames by the CPU architects.
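
For concreteness, here is roughly what the register-indirect call thunk
looks like with that filler. This is a sketch based on the thunk shape
in HJ's series (expressed as file-scope asm so it stands alone), not a
quote from the patch itself:

// Sketch of __x86_indirect_thunk_rax with the 'pause' loop filler.
// The call pushes the address of the loop, so any speculative return
// prediction lands in the loop and spins harmlessly; the architectural
// path overwrites the return address with the real branch target.
asm (".global __x86_indirect_thunk_rax\n"
     "__x86_indirect_thunk_rax:\n"
     "\tcall 2f\n"
     "1:\n"
     "\tpause\n"		/* captured speculation spins here */
     "\tjmp 1b\n"
     "2:\n"
     "\tmov %rax, (%rsp)\n"	/* replace return address with target */
     "\tret\n");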

HJ, do we still actually need the options for lfence and nop? I thought
those were originally just for testing and could possibly be dropped
now?

Not that I care for Linux since I'm providing my own external thunk
anyway...



Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-08 Thread David Malcolm
On Fri, 2018-01-05 at 15:29 -0500, Jason Merrill wrote:
> On 12/29/2017 12:06 PM, David Malcolm wrote:
> > One issue I ran into was that fold_for_warn doesn't eliminate
> > location wrappers when processing_template_decl, leading to
> > failures of the template-based cases in
> > g++.dg/warn/Wmemset-transposed-args-1.C.
> > 
> > This is due to the early bailout when processing_template_decl
> > within cp_fold:
> > 
> > 2078  if (processing_template_decl
> > 2079  || (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE
> > (x) == error_mark_node)))
> > 2080return x;
> > 
> > which dates back to the merger of the C++ delayed folding branch.
> > 
> > I've fixed that in this version of the patch by removing that
> > "processing_template_decl ||" condition from that cp_fold early
> > bailout.
> 
> Hmm, that makes me nervous.  We might want to fold in templates when 
> called from fold_for_warn, but not for other occurrences.  But I see 
> that we check processing_template_decl in cp_fully_fold and in the
> call 
> to cp_fold_function, so I guess this is fine.
> 
> > +case VIEW_CONVERT_EXPR:
> > +case NON_LVALUE_EXPR:
> >  case CAST_EXPR:
> >  case REINTERPRET_CAST_EXPR:
> >  case CONST_CAST_EXPR:
> > @@ -14937,6 +14940,15 @@ tsubst_copy (tree t, tree args,
> > tsubst_flags_t complain, tree in_decl)
> >  case CONVERT_EXPR:
> >  case NOP_EXPR:
> >{
> > +   if (location_wrapper_p (t))
> > + {
> > +   /* Handle location wrappers by substituting the
> > wrapped node
> > +  first, *then* reusing the resulting type.  Doing
> > the type
> > +  first ensures that we handle template parameters
> > and
> > +  parameter pack expansions.  */
> > +   tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args,
> > complain, in_decl);
> > +   return build1 (code, TREE_TYPE (op0), op0);
> > + }
> 
> I'd rather handle location wrappers separately, and abort if 
> VIEW_CONVERT_EXPR or NON_LVALUE_EXPR appear other than as wrappers.

Once I fixed the issue with location_wrapper_p with decls changing
type, it turns out that trunk is already passing VIEW_CONVERT_EXPR to
tsubst_copy_and_build for non-wrapper nodes (and from there to
tsubst_copy), where the default case currently handles them.  Adding an
assert turns this into an ICE.

g++.dg/delayedfold/builtin1.C is the only instance of it I found in our
test suite, where it's used here:

class RegionLock {
  template  void m_fn1();
  int spinlock;
} acquire_zero;
int acquire_one;
template  void RegionLock::m_fn1() {
  __atomic_compare_exchange(&spinlock, &acquire_zero, &acquire_one, false, 2, 2);
   ^
}

(gdb) call debug_tree (t)
  [debug_tree dump elided: a VIEW_CONVERT_EXPR of an unsigned DImode
   integer type, whose innermost operand carries the location
   ../../src/gcc/testsuite/g++.dg/delayedfold/builtin1.C:10:40-52]

(This one is just for VIEW_CONVERT_EXPR; I don't yet know of any
existing places where NON_LVALUE_EXPR can be passed to tsubst_*).

Given that, is it OK to go with the approach in this (v2) patch? 
(presumably requiring the fix to location_wrapper_p to use a flag
rather than a matching type).


> > @@ -24998,6 +25018,8 @@ build_non_dependent_expr (tree expr)
> >&& !expanding_concept ())
> >  fold_non_dependent_expr (expr);
> >  
> > +  STRIP_ANY_LOCATION_WRAPPER (expr);
> 
> Why is this needed?

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00349.html

> Jason

Thanks
Dave


Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()

2018-01-08 Thread Bill Schmidt
On Jan 8, 2018, at 1:40 PM, Jeff Law  wrote:
> 
> On 01/08/2018 07:19 AM, Bill Schmidt wrote:
>> 
>>> On Jan 7, 2018, at 10:47 PM, Jeff Law  wrote:
>>> 
>>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
 Hi Richard,
 
 Unfortunately, I don't see any way that this will be useful for the ppc 
 targets.  We don't
 have a way to force resolution of a condition prior to continuing 
 speculation, so this
 will just introduce another comparison that we would speculate past.  For 
 our mitigation
 we will have to introduce an instruction that halts all speculation at 
 that point, and place
 it in front of all dangerous loads.  I wish it were otherwise.
>>> So could you have an expander for __builtin_load_no_speculate that just
>>> emits the magic insn that halts all speculation and essentially ignores
>>> the additional stuff that __builtin_load_no_speculate might be able to
>>> do on other platforms?
>> 
>> This is possible, but the builtin documentation is completely misleading for
>> powerpc.  We will not provide the semantics that this builtin claims to 
>> provide.
>> So at a minimum we would need the documentation to indicate that the 
>> additional
>> range-checking is target-specific behavior as well, not just the speculation 
>> code.
>> At that point it isn't really a very target-neutral solution.
>> 
>> What about other targets?  This builtin seems predicated on specific behavior
>> of ARM architecture; I don't know whether other targets have a guaranteed
>> speculation-rectifying conditional test.
>> 
>> For POWER, all we would need, or be able to exploit, is 
>> 
>>  void __builtin_speculation_barrier ()
>> 
>> or some such.  If there are two dangerous loads in one block, a single call
>> to this suffices, but a generic solution involving range checks for specific
>> loads would require one per load.
> So my concern is that if we have multiple builtins to deal with this
> problem, then we're forcing the pain of figuring out which one to use
> onto the users.
> 
> I'd really like there to be a single builtin to address this problem.
> Otherwise the kernel (and anyone else that wants to use this stuff) is
> stuck with using both, conditional compilation or something along those
> lines which seems like a huge lose.
> 
> We'd really like them to be able to add one appropriate
> __builtin_whatever call at the key site(s) that does the right thing
> regardless of the architecture.
> 
> I think that implies we're likely to have arguments that are unused on
> some architectures.  I can live with that.  But it also implies we need
> better language around the semantics.
> 
> As you mention -- there's some belief that we're going to want to do
> something for automatic detection in the future.  I think a builtin for that
> could well be different than what we provide to the kernel folks in the
> immediate term.  I want to focus first on getting a builtin the kernel
> guys can use today as needed though.

Hi Jeff,

I agree 100% with this approach.  I just wanted to raise the point in case
other architectures have different needs.  Power can work around this
by just ignoring 4 of the 5 arguments.  As long as nobody else needs
*additional* arguments, this should work out just fine.  But I want to be clear
that the only guarantee of the semantics for everybody is that "speculation 
stops here," while on some processors it may be "speculation stops here
if out of range."  If we can write this into the documentation, then I'm fine
writing a target expander for Power as discussed.
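
To illustrate, here is how a consumer might use such a barrier. The
inline stand-in below is hypothetical (the proposed builtin does not
exist yet); a real implementation would expand to the
speculation-halting instruction:

// Hypothetical stand-in for the proposed __builtin_speculation_barrier.
static inline void
speculation_barrier ()
{
  __asm__ volatile ("" ::: "memory");
}

int
sum_two_loads (const int *arr1, const int *arr2, unsigned idx, unsigned len)
{
  int sum = 0;
  if (idx < len)
    {
      speculation_barrier ();	/* one barrier covers both loads below */
      sum = arr1[idx] + arr2[idx];
    }
  return sum;
}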

I had a brief interchange with Richi last week, and he suggested that for
the automatic detection we might look into flagging MEM_REFs rather
than inserting a built-in; a target hook can still handle such a flag.  That
has some advantages and some disadvantages that I can think of, so
we'll have to talk that out on the list over time after we get through the
crisis mode reactions.

Thanks!

Bill

> 
> Jeff



Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 1:00 PM, David Woodhouse  wrote:
> On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote:
>> * H. J. Lu:
>>
>> > Add -mindirect-branch-loop= option to control loop filler in call and
>> > return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> > as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> > as loop filler.  The default is 'lfence'.
>>
>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>> execution?
>
> The idea is not to stop it per se, but to capture it. We trick the
> speculative execution into *thinking* it's going to return back to that
> endless loop, which prevents it from doing the branch prediction which
> would otherwise have got into trouble.
>
> There has been a fair amount of bikeshedding of precisely what goes in
> there already, and '1: pause; jmp 1b' is the best option that hasn't
> been shot down in flames by the CPU architects.
>
> HJ, do we still actually need the options for lfence and nop? I thought
> those were originally just for testing and could possibly be dropped
> now?

This is a trial change.  It may be useful later.  But I can drop it and
hardcode it to "pause".

-- 
H.J.


Re: [PATCH 3/5] x86: Add -mfunction-return=

2018-01-08 Thread David Woodhouse
On Mon, 2018-01-08 at 03:59 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 1:56 AM, Martin Liška  wrote:
> > 
> > On 01/07/2018 11:59 PM, H.J. Lu wrote:
> > > 
> > > Function return thunk is the same as memory thunk for -mindirect-branch=
> > > where the return address is at the top of the stack:
> > > 
> > > __x86_return_thunk:
> > >   call L2
> > > L1:
> > >   lfence
> > >   jmp L1
> > > L2:
> > >   lea 8(%rsp), %rsp|lea 4(%esp), %esp
> > >   ret
> > > 
> > > and function return becomes
> > > 
> > >   jmp __x86_return_thunk
> > Hello.
> > 
> > Can you please explain more usage of the option? Is to prevent a speculative
> > execution of 'ret' instruction (which is an indirect call), as described in 
> > [1]?
> > The paper mentions that return stack predictors are commonly implemented in 
> > some form.
> > Looks that current version of Linux patches does not use the option.
> > 
> This option is requested by Linux kernel people.  It may be used in
> the future.

Right. Essentially the new set of vulnerability are all about
speculative execution. Instructions which *don't* get committed, and
it's supposed to be like they never happen, actually *do* have side-
effects and can leak information.

This is *particularly* problematic for Intel CPUs where the CPU
architects said "ah, screw it, let's not do memory permission checks in
advance; as long as we make sure it's done before we *commit* an
instruction it'll be fine". With the result that you can now basically
read *all* of kernel memory, and hence all of physical memory, directly
from userspace on Intel CPUs. Oops :)

The fix for *that* one is to actually remove the kernel pages from the
page tables while running userspace, instead of just setting the
permissions to prevent access. Hence the whole Kernel Page Table
Isolation thing.

The next interesting attack is the so-called "variant 2" where the
attacker pollutes the branch predictors so that in *kernel* mode the
CPU *speculatively* runs... well, whatever the attacker wants. This is
one that affects lots of vendors, not just Intel. We mitigate this by
eliminating *all* the indirect branches in the kernel, to make it
immune to such an attack.

This is all very well, but *some* CPUs also pull in predictions from
the generic branch target predictor when the return stack buffer has
underflowed (e.g. if there was a call stack of more than X depth).
Hence, in some cases we may yet end up needing this -mfunction-return=
thunk too. As you (Martin) note, we don't use it *yet*. The full set of
mitigations for the various attacks are still being put together, and
the optimal choice for each CPU family does end up being different.



Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread Andi Kleen
"H.J. Lu"  writes:
>>
>> Talking about PIC thunks, those have I believe . character in their symbols,
>> so that they can't be confused with user functions.  Any reason these
>> retpoline thunks aren't?
>>
>
> They used to have '.'.  It was changed at the last minute since kernel needs 
> to
> export them as regular symbols.

The kernel doesn't actually need that to export the symbols.

While symbol CRCs cannot be generated for symbols with '.', CRCs are not
needed and there were already patches to hide the resulting warnings.

-Andi


Re: [PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-08 Thread Segher Boessenkool
On Mon, Jan 08, 2018 at 08:25:47PM +, Wilco Dijkstra wrote:
> > Always pairing two registers together *also* degrades code quality.
> 
> No, while it's not optimal, it means smaller code and fewer memory accesses.

It means you execute *more* memory accesses.  Always.  This may be
sometimes hidden, sure.  I'm not saying you do not want more ldp's;
I'm saying this particular strategy is very far from ideal.

> >> Another issue is that after pro_and_epilogue pass I see multiple restores
> >> of the same registers and then a branch to the same block. We should try
> >> to avoid the unnecessary duplication.
> >
> > It already does that if *all* predecessors of that block do that.  If you
> > want to do it in other cases, you end up with more jumps.  That may be
> > beneficial in some cases, of course, but it is not an obvious win (and in
> > the general case it is, hrm let's use nice words, "terrible").
> 
> That may well be the problem. So if there are N predecessors, of which N-1
> need to restore the same set of callee saves, but one was shrinkwrapped,
> N-1 copies of the same restores might be emitted. N could be the number
> of blocks in a function - I really hope it doesn't work out like that...

In the worst case it would.  OTOH, joining every combo into blocks costs
O(2**C) (where C is the # components) bb's worst case.

It isn't a simple problem.  The current tuning works pretty well for us,
but no doubt it can be improved!


Segher


[PATCH] PR libstdc++/83709 don't rehash if no insertion

2018-01-08 Thread François Dumont

Hi

    Bug confirmed, limited to range insertion on unordered_set and 
unordered_map.


    I had to specialize _M_insert_range for those containers. This 
method now maintains the theoretical number of elements still to 
insert, which is used only if an insertion actually takes place.


    I also took this opportunity to introduce a small change in 
__distance_fw: it reports 0 only if __first == __last, and does nothing 
in this case.
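
A sketch of the user-visible scenario, as I read the PR (illustrative
only, not the committed testcase):

#include <cassert>
#include <unordered_set>
#include <vector>

int
main ()
{
  std::unordered_set<int> s{ 1, 2, 3 };
  const auto buckets = s.bucket_count ();

  std::vector<int> v{ 1, 2, 3, 1, 2, 3 };
  s.insert (v.begin (), v.end ());	// duplicates: nothing is inserted

  // Before the fix the range insert could rehash anyway, invalidating
  // iterators; afterwards the bucket count is expected to be unchanged.
  assert (s.bucket_count () == buckets);
  return 0;
}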


    * include/bits/hashtable_policy.h
    (__distance_fw(_Iterator, _Iterator, input_iterator_tag)): Return 1 if
    __first != __last.
    (_Insert_base::_M_insert_range(_Ite, _Ite, _NodeGetter, true_type)): New.

    (_Insert_base::_M_insert_range(_Ite, _Ite, _NodeGetter, false_type)):
    Add false_type parameter.
    (_Insert_base::insert): Adapt.
    * include/bits/hashtable.h (_Hashtable::operator=(initializer_list<>)):
    Adapt.
    (_Hashtable::_M_insert_unique_node): Add __n_elt parameter, defaulted
    to 1.
    (_Hashtable::_M_insert(_Arg&&, const _NodeGen&, true_type, size_t)):
    Likewise.
    (_Hashtable::_M_merge_unique): Pass target number of elements to add to
    produce only 1 rehash if necessary.
    * testsuite/23_containers/unordered_map/insert/83709.cc: New.
    * testsuite/23_containers/unordered_set/insert/83709.cc: New.

Tested under Linux x86_64 normal and debug modes.

Ok to commit?

François


diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index e6108a6..1245d79 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -490,7 +490,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__reuse_or_alloc_node_type __roan(_M_begin(), *this);
 	_M_before_begin._M_nxt = nullptr;
 	clear();
-	this->_M_insert_range(__l.begin(), __l.end(), __roan);
+	this->_M_insert_range(__l.begin(), __l.end(), __roan, __unique_keys());
 	return *this;
   }
 
@@ -678,7 +678,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // deallocate it on exception.
   iterator
   _M_insert_unique_node(size_type __bkt, __hash_code __code,
-			__node_type* __n);
+			__node_type* __n, size_type __n_elt = 1);
 
   // Insert node with hash code __code. Take ownership of the node,
   // deallocate it on exception.
@@ -707,12 +707,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 	std::pair
-	_M_insert(_Arg&&, const _NodeGenerator&, std::true_type);
+	_M_insert(_Arg&&, const _NodeGenerator&, true_type, size_type = 1);
 
   template
 	iterator
 	_M_insert(_Arg&& __arg, const _NodeGenerator& __node_gen,
-		  std::false_type __uk)
+		  false_type __uk)
 	{
 	  return _M_insert(cend(), std::forward<_Arg>(__arg), __node_gen,
 			   __uk);
@@ -722,7 +722,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 	iterator
 	_M_insert(const_iterator, _Arg&& __arg,
-		  const _NodeGenerator& __node_gen, std::true_type __uk)
+		  const _NodeGenerator& __node_gen, true_type __uk)
 	{
 	  return
 	_M_insert(std::forward<_Arg>(__arg), __node_gen, __uk).first;
@@ -732,7 +732,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 	iterator
 	_M_insert(const_iterator, _Arg&&,
-		  const _NodeGenerator&, std::false_type);
+		  const _NodeGenerator&, false_type);
 
   size_type
   _M_erase(std::true_type, const key_type&);
@@ -884,6 +884,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  node_type>, "Node types are compatible");
 	  __glibcxx_assert(get_allocator() == __src.get_allocator());
 
+	  auto __n_elt = __src.size();
 	  for (auto __i = __src.begin(), __end = __src.end(); __i != __end;)
 	{
 	  auto __pos = __i++;
@@ -893,9 +894,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  if (_M_find_node(__bkt, __k, __code) == nullptr)
 		{
 		  auto __nh = __src.extract(__pos);
-		  _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
+		  _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
 		  __nh._M_ptr = nullptr;
+		  __n_elt = 1;
 		}
+	  else if (__n_elt != 1)
+		--__n_elt;
 	}
 	}
 
@@ -1721,12 +1725,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 	   _H1, _H2, _Hash, _RehashPolicy, _Traits>::
 _M_insert_unique_node(size_type __bkt, __hash_code __code,
-			  __node_type* __node)
+			  __node_type* __node, size_type __n_elt)
 -> iterator
 {
   const __rehash_state& __saved_state = _M_rehash_policy._M_state();
   std::pair __do_rehash
-	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count, 1);
+	= _M_rehash_policy._M_need_rehash(_M_bucket_count, _M_element_count,
+	  __n_elt);
 
   __try
 	{
@@ -1824,7 +1829,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   auto
   _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-  _M_insert(_Arg&& __v, const _NodeGenerator& __node_gen, std::true_type)
+  _M_insert(_Arg&& __v, const _NodeGenerator& __node_gen, true_type,
+		size_type __n_elt)
   -> pair
   {
 	const key_type& __k = this->_M_e

Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread H.J. Lu
On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen  wrote:
> "H.J. Lu"  writes:
>>>
>>> Talking about PIC thunks, those have I believe . character in their symbols,
>>> so that they can't be confused with user functions.  Any reason these
>>> retpoline thunks aren't?
>>>
>>
>> They used to have '.'.  It was changed at the last minute since kernel needs 
>> to
>> export them as regular symbols.
>
> The kernel doesn't actually need that to export the symbols.
>
> While symbol CRCs cannot be generated for symbols with '.', CRCs are not
> needed and there were already patches to hide the resulting warnings.
>

Andi, can you work it out with David?

-- 
H.J.


Re: [PATCH 1/5] x86: Add -mindirect-branch=

2018-01-08 Thread David Woodhouse
On Mon, 2018-01-08 at 13:32 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen  wrote:
> > 
> > "H.J. Lu"  writes:
> > > 
> > > > 
> > > > 
> > > > Talking about PIC thunks, those have I believe . character in their 
> > > > symbols,
> > > > so that they can't be confused with user functions.  Any reason these
> > > > retpoline thunks aren't?
> > > > 
> > > They used to have '.'.  It was changed at the last minute since kernel 
> > > needs to
> > > export them as regular symbols.
> > The kernel doesn't actually need that to export the symbols.
> > 
> > While symbol CRCs cannot be generated for symbols with '.', CRCs are not
> > needed and there were already patches to hide the resulting warnings.
> > 
> Andi, can you work it out with David?

It wasn't CONFIG_MODVERSIONS but CONFIG_TRIM_UNUSED_SYMBOLS which was
the straw that broke the camel's back on that one. I'm open to a
solution for that one, but I couldn't see one that didn't make my eyes
bleed. Except for making the symbols not have dots in.

https://patchwork.kernel.org/patch/10148081/


