date:20161102

On Tue, Nov 01, 2016 at 08:55:03PM -0600, Martin Sebor wrote:
> struct S {
>   int a, b, c, d;
> };
> 
> #define bos(p, t) __builtin_object_size (p, t)
> #define memset0(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 0))
> #define memset1(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 1))
> 
> void f0 (struct S *s)
> {
>   memset0 (&s->d, 0, 1024);   // no warning here (bos 0)
> }

But we do not want the warning here, there is nothing wrong on it.
The caller may be

void
bar (void)
{
  struct T { struct S header; char payload[1024 - offsetof (struct S, d)]; } t;
  initialize (&t);
  f0 (&t.header);
}

and the callee might rely on that.  Using some header structure at the
beginning and then conditionally on fields in that structure various
payloads occurs in many projects, starting with glibc, gcc, Linux kernel,
... The warning really must not be detached from reality.

If you want a warning for suspicious calls, sure, but
1) it has to be clearly worded significantly differently from how do you
   word it, so that users really understand you are warning about
   suspicious code (though, I really believe it is very common and there
   is really nothing the users can do about it)
2) it really shouldn't be enabled in -Wall, and I think not even in -Wextra

Jakub

Re: [PATCH] Fix PR77407

On Tue, 1 Nov 2016, Marc Glisse wrote:

> On Mon, 31 Oct 2016, Richard Biener wrote:
> 
> > On Fri, 28 Oct 2016, Marc Glisse wrote:
> > 
> > > On Wed, 28 Sep 2016, Richard Biener wrote:
> > > 
> > > > The following patch implements patterns to catch x / abs (x)
> > > > and x / -x, taking advantage of undefinedness at x == 0 as
> > > > opposed to the PR having testcases with explicit != 0 checks.
> > > > 
> > > > Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
> > > > 
> > > > Richard.
> > > > 
> > > > 2016-09-28  Richard Biener  
> > > > 
> > > > PR middle-end/77407
> > > > * match.pd: Add X / abs (X) -> X < 0 ? -1 : 1 and
> > > > X / -X -> -1 simplifications.
> > > 
> > > I notice that we still have the following comment a few lines above:
> > > 
> > > /* Make sure to preserve divisions by zero.  This is the reason why
> > >we don't simplify x / x to 1 or 0 / x to 0.  */
> > > 
> > > Did we give up on preserving divisions by 0? Can we now do the 2
> > > simplifications listed by the comment?
> > 
> > At some point there was at least diagnostics fallout when doing them.
> > There may be also undefined sanitizer fallout depending on when we
> > instrument for that.
> > 
> > But in general yes, we do want to do the two simplifications.  Maybe
> > we can compromise (in case of early fallout) to do them on GIMPLE
> > only.
> > 
> > We could at least add them with a proper nonzero_p predicate.
> 
>  (for div (trunc_div ceil_div floor_div round_div exact_div)
> + (simplify (div @0 @0) { build_one_cst (type); })
> + (simplify (div integer_zerop@0 @1) @0)
> 
> causes no regression on powerpc64le-unknown-linux-gnu with
> --enable-languages=all,obj-c++,go.

Good.  I probably tried last before the C++ early folding changes.

If you'd formally post a patch adding the above (and adjusting the
comment) I'll approve that.

This eventually means we can remove the if (integer_zerop ()) early out
in fold_binary_loc as well.

Richard.

[PATCH] Infer value ranges from stmt ops in EVRP


The following makes EVRP use infer_value_range.  It also adds a bit of
dump verbosity to make EVRP traceable with -details.

Bootstrapped and tested on x86_64-unkown-linux-gnu, applied to trunk.

Richard.

2016-11-02  Richard Biener  

* tree-vrp.c (evrp_dom_walker::before_dom_children): Call
infer_value_range on stmt ops and update value-ranges.
Dump visited stmts and blocks.
(evrp_dom_walker::push_value_range): Dump changes.
(evrp_dom_walker::pop_value_range): Likewise.
(evrp_dom_walker::try_find_new_range): Avoid noop changes.

* gcc.dg/tree-ssa/vrp111.c: New testcase.
* gcc.dg/tree-ssa/pr20702.c: Disable EVRP.
* gcc.dg/tree-ssa/pr21086.c: Likewise.
* gcc.dg/tree-ssa/pr58480.c: Likewise.
* gcc.dg/tree-ssa/vrp08.c: Likewise.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241697)
+++ gcc/tree-vrp.c  (working copy)
@@ -10650,18 +10650,17 @@ public:
 }
   virtual edge before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
-  void push_value_range (const_tree var, value_range *vr);
-  value_range *pop_value_range (const_tree var);
+  void push_value_range (tree var, value_range *vr);
+  value_range *pop_value_range (tree var);
   value_range *try_find_new_range (tree op, tree_code code, tree limit);
 
   /* Cond_stack holds the old VR.  */
-  auto_vec > stack;
+  auto_vec > stack;
   bitmap need_eh_cleanup;
   auto_vec stmts_to_fixup;
   auto_vec stmts_to_remove;
 };
 
-
 /*  Find new range for OP such that (OP CODE LIMIT) is true.  */
 
 value_range *
@@ -10679,6 +10678,10 @@ evrp_dom_walker::try_find_new_range (tre
  PUSH old value in the stack with the old VR.  */
   if (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE)
 {
+  if (old_vr->type == vr.type
+ && vrp_operand_equal_p (old_vr->min, vr.min)
+ && vrp_operand_equal_p (old_vr->max, vr.max))
+   return NULL;
   value_range *new_vr = vrp_value_range_pool.allocate ();
   *new_vr = vr;
   return new_vr;
@@ -10696,7 +10699,10 @@ evrp_dom_walker::before_dom_children (ba
   edge_iterator ei;
   edge e;
 
-  push_value_range (NULL_TREE, NULL);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Visiting BB%d\n", bb->index);
+
+  stack.safe_push (std::make_pair (NULL_TREE, (value_range *)NULL));
 
   edge pred_e = NULL;
   FOR_EACH_EDGE (e, ei, bb->preds)
@@ -10723,6 +10729,11 @@ evrp_dom_walker::before_dom_children (ba
  && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
  || POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)
{
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Visiting controlling predicate ");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+   }
  /* Entering a new scope.  Try to see if we can find a VR
 here.  */
  tree op1 = gimple_cond_rhs (stmt);
@@ -10778,6 +10789,11 @@ evrp_dom_walker::before_dom_children (ba
continue;
   value_range vr_result = VR_INITIALIZER;
   bool interesting = stmt_interesting_for_vrp (phi);
+  if (interesting && dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Visiting PHI node ");
+ print_gimple_stmt (dump_file, phi, 0, 0);
+   }
   if (!has_unvisited_preds
  && interesting)
extract_range_from_phi_node (phi, &vr_result);
@@ -10814,6 +10830,12 @@ evrp_dom_walker::before_dom_children (ba
   bool was_noreturn = (is_gimple_call (stmt)
   && gimple_call_noreturn_p (stmt));
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Visiting stmt ");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+   }
+
   if (gcond *cond = dyn_cast  (stmt))
{
  vrp_visit_cond_stmt (cond, &taken_edge);
@@ -10825,6 +10847,7 @@ evrp_dom_walker::before_dom_children (ba
gimple_cond_make_false (cond);
  else
gcc_unreachable ();
+ update_stmt (stmt);
}
}
   else if (stmt_interesting_for_vrp (stmt))
@@ -10873,6 +10896,55 @@ evrp_dom_walker::before_dom_children (ba
   else
set_defs_to_varying (stmt);
 
+  /* See if we can derive a range for any of STMT's operands.  */
+  tree op;
+  ssa_op_iter i;
+  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_USE)
+   {
+ tree value;
+ enum tree_code comp_code;
+
+ /* If OP is used in such a way that we can infer a value
+range for it, and we don't find a previous assertion for
+it, create a new assertion location node for OP.  */
+ if (infer_value_range (stmt, op, &comp_code, &value))
+   {
+ /* If we are able to infer a nonzero value range f

Re: [PATCH, testsuite]: Cleanup lib/target-supports.exp, ...

On Wed, Nov 02, 2016 at 12:39:08PM +0530, Prathamesh Kulkarni wrote:
> On 1 November 2016 at 23:41, Uros Bizjak  wrote:
> > On Tue, Nov 1, 2016 at 5:05 PM, Jakub Jelinek  wrote:
> >> On Tue, Nov 01, 2016 at 10:05:22AM +0100, Uros Bizjak wrote:
> >>> ... simplify some conditions and add i?86-*-* target where missing.
> >>>
> >>> 2016-11-01  Uros Bizjak  
> >>>
> >>> * lib/target-supports.exp: Normalize order of i?86 and x86_64 targets.
> >>> Whitespace fixes.
> >> ...
> >>> (check_effective_target_divmod): Add i?86-*-* target.
> >>
> >> This part likely broke
> >> +FAIL: gcc.dg/divmod-1.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-2.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-3.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-4.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-6.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> on i686-linux (i.e. 32-bit).
> >
> > No, this is expected (these tests already fail with x86_64 -m32
> > multilib). These will be fixed by [1].
> Oops, sorry for the breakage.
> The tests are meant to check if the divmod transform triggered, which
> is done by scanning
> DIVMOD in the widening_mul dump.
> 
> Apparently I only checked for the triplet "x86_64-*-*" in
> check_effective_target_divmod()
> and it returned 1, which probably caused the divmod DImode tests to
> fail with -m32.
> 
> In general, could I check in check_effective_target_*(), what options
> are passed ?

On some targets, yes.  E.g. on i?86-*-*/x86_64-*-*, one can additionally
test lp64 or ia32 or negation thereof - there are -m32, -mx32 and -m64
modes, -m64 satisfies lp64, -m32 ia32.

Though the predicate seems to be misnamed and not properly documented
if it is about properties of DImode divmod rather than other modes
(SImode, HImode, QImode, TImode, ...).

Jakub

Re: [PATCH] Fix host_size_t_cst_p predicate

On Mon, Oct 31, 2016 at 3:56 PM, Martin Liška  wrote:
> On 10/31/2016 12:11 PM, Richard Biener wrote:
>> On Mon, Oct 31, 2016 at 11:18 AM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Mon, Oct 31, 2016 at 10:58 AM, Richard Sandiford
  wrote:
> Richard Biener  writes:
>> On Mon, Oct 31, 2016 at 10:10 AM, Martin Liška  wrote:
>>> On 10/31/2016 01:12 AM, Richard Sandiford wrote:
 Richard Biener  writes:
> On Thu, Oct 27, 2016 at 5:06 PM, Martin Liška  wrote:
>> On 10/27/2016 03:35 PM, Richard Biener wrote:
>>> On Thu, Oct 27, 2016 at 9:41 AM, Martin Liška  
>>> wrote:
 Running simple test-case w/o the proper header file causes ICE:
 strncmp ("a", "b", -1);

 0xe74462 tree_to_uhwi(tree_node const*)
 ../../gcc/tree.c:7324
 0x90a23f host_size_t_cst_p
 ../../gcc/fold-const-call.c:63
 0x90a23f fold_const_call(combined_fn, tree_node*, tree_node*,
 tree_node*, tree_node*)
 ../../gcc/fold-const-call.c:1512
 0x787b01 fold_builtin_3
 ../../gcc/builtins.c:8385
 0x787b01 fold_builtin_n(unsigned int, tree_node*, tree_node**,
 int, bool)
 ../../gcc/builtins.c:8465
 0x9052b1 fold(tree_node*)
 ../../gcc/fold-const.c:11919
 0x6de2bb c_fully_fold_internal
 ../../gcc/c/c-fold.c:185
 0x6e1f6b c_fully_fold(tree_node*, bool, bool*)
 ../../gcc/c/c-fold.c:90
 0x67cbbf c_process_expr_stmt(unsigned int, tree_node*)
 ../../gcc/c/c-typeck.c:10369
 0x67cfbd c_finish_expr_stmt(unsigned int, tree_node*)
 ../../gcc/c/c-typeck.c:10414
 0x6cb578 c_parser_statement_after_labels
 ../../gcc/c/c-parser.c:5430
 0x6cd333 c_parser_compound_statement_nostart
 ../../gcc/c/c-parser.c:4944
 0x6cdbde c_parser_compound_statement
 ../../gcc/c/c-parser.c:4777
 0x6c93ac c_parser_declaration_or_fndef
 ../../gcc/c/c-parser.c:2176
 0x6d51ab c_parser_external_declaration
 ../../gcc/c/c-parser.c:1574
 0x6d5c09 c_parser_translation_unit
 ../../gcc/c/c-parser.c:1454
 0x6d5c09 c_parse_file()
 ../../gcc/c/c-parser.c:18173
 0x72ffd2 c_common_parse_file()
 ../../gcc/c-family/c-opts.c:1087

 Following patch improves the host_size_t_cst_p predicate.

 Patch can bootstrap on ppc64le-redhat-linux and survives
 regression tests.

 Ready to be installed?
>>>
>>> I believe the wi::min_precision (t, UNSIGNED) <= sizeof (size_t) *
>>> CHAR_BIT test is now redundant.
>>>
>>> OTOH it was probably desired to allow -1 here?  A little looking 
>>> back
>>> in time should tell.
>>
>> Ok, it started with r229922, where it was changed from:
>>
>>   if (tree_fits_uhwi_p (len) && p1 && p2)
>> {
>>   const int i = strncmp (p1, p2, tree_to_uhwi (len));
>> ...
>>
>> to current version:
>>
>> case CFN_BUILT_IN_STRNCMP:
>>   {
>> bool const_size_p = host_size_t_cst_p (arg2, &s2);
>>
>> Thus I'm suggesting to change to back to it.
>>
>> Ready to be installed?
>
> Let's ask Richard.

 The idea with the:

   wi::min_precision (t, UNSIGNED) <= sizeof (size_t) * CHAR_BIT

 test was to stop us attempting 64-bit size_t operations on ILP32 hosts.
 I think we still want that.
>>>
>>> OK, so is the consensus to add tree_fits_uhwi_p predicate to the current
>>> wi::min_precision check, right?
>>
>> Not sure.  If we have host_size_t_cst_p then we should have a 
>> corresponding
>> size_t host_size_t (const_tree) and should use those in pairs.  Not sure
>> why we have sth satisfying host_size_t_cst_p but not tree_fits_uhwi_p.
>
> It's the other way around: something can satisfy tree_fits_uhwi_p
> (i.e. fit within a uint64_t) but not fit within the host's size_t.
> The kind of case I'm thinking of is:
>
>   strncmp ("fi", "fo", (1L << 32) + 1)
>
> for an ILP32 host and LP64 target.  There's a danger that by passing
> the uint64_t value (1L << 32) + 1 to the host's strncmp that we'd
> truncate it to 1, giving the wrong result.

 Yes, but if it passes host_size_t_cst_p why does tree_to_uhwi ICE then?
 (unless we have a > 64bit host size_t).
>>>
>>> Because in Mar

Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool
 wrote:
> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>> > This patch solves this problem by simply running the 
>> > duplicate_computed_gotos
>> > pass again, as long as it does any work.  The patch looks much bigger than
>> > it is, because I factored out two routines to simplify the control flow.
>>
>> It's made the patch a bit difficult to read. Condensing it a bit...
>
> Well, it would have a goto crossing a loop, or two gotos crossing each
> other, otherwise.  I should have done it as two patches I guess (first
> factor, then change).
>
>> > +  for (;;)
>> >  {
>> > +  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
>> > +   return 0;
>>
>> This test should not be needed in the loop. This pass can never
>> collapse the function to a single basic block.
>
> Yeah maybe, but that relies on quite a few assumptions.  This is strictly
> an optimisation anyway, will move it outside the loop.
>
>> > +  basic_block bb;
>> > +  FOR_EACH_BB_FN (bb, fun)
>> > +   {
>> > + /* Build the reorder chain for the original order of blocks.  */
>> > + if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
>> > +   bb->aux = bb->next_bb;
>> > +   }
>> >
>> > +  duplicate_computed_gotos_find_candidates (fun, candidates, 
>> > max_size);
>> >
>> > +  bool changed = false;
>> > +  if (!bitmap_empty_p (candidates))
>> > +   changed = duplicate_computed_gotos_do_duplicate (fun, candidates);
>> >
>> > +  if (changed)
>> > +   fixup_partitions ();
>> > +
>> > +  cfg_layout_finalize ();
>>
>> I don't think you have to go into/out-of cfglayout mode for each iteration.
>
> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
> iteration though, I was afraid that interacts.

Ick.  Why would it need a cfg-cleanup every iteration?  I fear this is quadratic
complexity in the number of edges to the compgoto block (and the size of the
function).  Can't the unfactoring perform the "cleanup" we rely on here?

>> >/* Merge the duplicated blocks into predecessors, when possible.  */
>> > +  if (changed)
>> > +   cleanup_cfg (0);
>> > +  else
>> > +   break;
>> >  }
>>
>> Maybe a gcc_assert that the loop doesn't iterate more often than num_edges?
>
> Good plan (num blocks even).
>
> Thanks,
>
>
> Segher

Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

On Mon, 31 Oct 2016, Jakub Jelinek wrote:

> Hi!
> 
> Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs
> expand to multiple rtls, then there is not a single one that can be used.
> Using DECL_RTL on such VAR_DECLs ICEs.
> 
> I've tried to just return 0 in nonoverlapping_memrefs_p if either
> DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often
> during bootstrap/regtest (3800+ times).  So the following patch narrows it
> down more and triggers only on the testcase below.

What kind of cases did this trigger on?  I would have expected we
have DECL_RTL_SET_P on (almost?) all decls that can have it.  And for
those that don't it should be uninteresting to have?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hmm.  Can you try splitting out a decl_can_have_rtl () predicate
from make_decl_rtl and use that?

Thanks,
Richard.

> 2016-10-31  Jakub Jelinek  
> 
>   PR target/77834
>   * alias.c (nonoverlapping_memrefs_p): Return 0 if exprx or expry
>   doesn't have rtl set and it can't be safely created.
> 
>   * gcc.dg/pr77834.c: New test.
> 
> --- gcc/alias.c.jj2016-10-21 17:06:27.0 +0200
> +++ gcc/alias.c   2016-10-31 11:38:29.448031590 +0100
> @@ -2755,6 +2755,27 @@ nonoverlapping_memrefs_p (const_rtx x, c
>|| TREE_CODE (expry) == CONST_DECL)
>  return 1;
>  
> +  /* Don't try to create RTL for decls that intentionally don't have
> + DECL_RTL set (e.g. marked as living in multiple places).  */
> +  if (!DECL_RTL_SET_P (exprx))
> +{
> +  if (TREE_CODE (exprx) == PARM_DECL
> +   || TREE_CODE (exprx) == RESULT_DECL
> +   || (VAR_P (exprx)
> +   && !TREE_STATIC (exprx)
> +   && !DECL_EXTERNAL (exprx)))
> + return 0;
> +}
> +  if (!DECL_RTL_SET_P (expry))
> +{
> +  if (TREE_CODE (expry) == PARM_DECL
> +   || TREE_CODE (expry) == RESULT_DECL
> +   || (VAR_P (expry)
> +   && !TREE_STATIC (expry)
> +   && !DECL_EXTERNAL (expry)))
> + return 0;
> +}
> +
>rtlx = DECL_RTL (exprx);
>rtly = DECL_RTL (expry);
>  
> --- gcc/testsuite/gcc.dg/pr77834.c.jj 2016-10-31 11:41:46.290521464 +0100
> +++ gcc/testsuite/gcc.dg/pr77834.c2016-10-31 11:41:24.0 +0100
> @@ -0,0 +1,18 @@
> +/* PR target/77834 */
> +/* { dg-do compile } */
> +/* { dg-options "-O -ftree-pre -Wno-psabi" } */
> +/* { dg-additional-options "-mstringop-strategy=libcall" { target i?86-*-* 
> x86_64-*-* } } */
> +
> +typedef int V __attribute__ ((vector_size (64)));
> +
> +V
> +foo (V u, V v, int w)
> +{
> +  do
> +{
> +  if (u[0]) v ^= u[w];
> +}
> +  while ((V) { 0, u[w] }[1]);
> +  u = (V) { v[v[0]], u[u[0]] };
> +  return v + u;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH] Five patches for std::experimental::filesystem

2016-11-02 Thread Christophe Lyon

On 27 October 2016 at 15:34, Jonathan Wakely  wrote:
> On 26/10/16 09:24 +0200, Christophe Lyon wrote:
>>
>> Hi Jonathan,
>>
>> On 25 October 2016 at 17:32, Jonathan Wakely  wrote:
>>>
>>> Two more fixes for the filesystem TS, and improved tests.
>>>
>>>   Handle negative times in filesystem::last_write_time
>>>   * src/filesystem/ops.cc
>>>(last_write_time(const path&, file_time_type, error_code&)):
>>> Handle
>>>negative times correctly.
>>>* testsuite/experimental/filesystem/operations/last_write_time.cc:
>>>Test writing file times.
>>>
>>>Fix error handling in copy_file and equivalent
>>>   * src/filesystem/ops.cc (do_copy_file): Report an error if
>>> source
>>> or
>>>destination is not a regular file (LWG 2712).
>>>(equivalent): Fix error handling and result when only one file
>>> exists.
>>>* testsuite/experimental/filesystem/operations/copy.cc: Remove
>>> files
>>>created by tests. Test copying directories.
>>>* testsuite/experimental/filesystem/operations/copy_file.cc:
>>> Remove
>>>files created by tests.
>>>* testsuite/experimental/filesystem/operations/equivalent.cc: New.
>>>* testsuite/experimental/filesystem/operations/is_empty.cc: New.
>>>* testsuite/experimental/filesystem/operations/read_symlink.cc:
>>> Remove
>>>file created by test.
>>>* testsuite/experimental/filesystem/operations/remove_all.cc: New.
>>>* testsuite/util/testsuite_fs.h (~scoped_file): Only try to remove
>>>file if path is non-empty, to support removal by other means.
>>>
>>> Tested x86_64-linux, committed to trunk.
>>>
>>>
>> I can see failures in
>> experimental/filesystem/operations/last_write_time.cc after your
>> committed this patch:
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/testsuite/experimental/filesystem/operations/last_write_time.cc:127:
>> void test02(): Assertion 'last_write_time(f.path) == time' failed.
>> on arm*linux* and aarch64*linux* targets.
>
>
> That test will fail for targets where _GLIBCXX_USE_UTIMENSAT is not
> defined, as they use utime() instead which only supports second
> granularity.
>
> This should solve it, by only checking that the file times are within
> one second of the expected value.
>

Hi Jonathan,
Indeed your patch fixes the problem I reported.
Sorry for the delay, I was on holidays.

Thanks,

Christophe


>
> Tested x86_64-linux, committed to trunk.

Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add

On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek  wrote:
> On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
>> The PowerPC back end loses performance on vector intrinsics, because 
>> currently
>> all of them are treated as calls throughout the middle-end phases and only
>> expanded when they reach RTL.  Our version of altivec.h currently defines the
>> public names of overloaded functions (like vec_add) to be #defines for hidden
>> functions (like __builtin_vec_add), which are recognized in the parser as
>> requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
>> the overloaded functions to specific function calls appropriate to the 
>> argument
>> types.
>
> This doesn't look very nice.  If all you care is that the builtins like
> __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
> into generic vector operations under certain conditions, just fold those
> into whatever you want in targetm.gimple_fold_builtin (gsi).

Note that traditionally "overloading" with GCC "builtins" is done by
using varargs
and the "type generic" attribute.  That doesn't scale to return type overloading
though for which we usually added direct support to the parser (for example
for __builtin_shuffle).

The folding trick of course should work just fine.

Richard.

> Jakub

Re: [PATCH] PR tree-optimization/78170: Truncate sign-extended padding when encoding bitfields

On Tue, Nov 1, 2016 at 12:54 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> In this PR the code writes a -1 to a bitfield of size 17 bits and ends up
> overwriting another bitfields.
> The problem is that the intermediate buffer in encode_tree_to_bitpos holding
> the value to merge holds
> a 24-bit temporary with -1 written to it i.e. sign-extended to all ones.
> That is how native_encode_expr works.This gets then written to
> the final buffer (well, a shifted version of it).
>
> We should instead be truncating the intermediate value to contain zeros in
> all the bits that we don't want.
> This is already performed for big-endian, this patch just wires it up for
> little-endian.
>
> Bootstrapped and tested on x86_64.
> Ok for trunk?

Ok.

Richard.

> Thanks,
> Kyrill
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78170
> * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Truncate padding
> introduced by native_encode_expr on little-endian as well.
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78170
> * gcc.c-torture/execute/pr78170.c: New test.

Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

On Wed, Nov 02, 2016 at 10:08:25AM +0100, Richard Biener wrote:
> On Mon, 31 Oct 2016, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs
> > expand to multiple rtls, then there is not a single one that can be used.
> > Using DECL_RTL on such VAR_DECLs ICEs.
> > 
> > I've tried to just return 0 in nonoverlapping_memrefs_p if either
> > DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often
> > during bootstrap/regtest (3800+ times).  So the following patch narrows it
> > down more and triggers only on the testcase below.
> 
> What kind of cases did this trigger on?  I would have expected we
> have DECL_RTL_SET_P on (almost?) all decls that can have it.  And for
> those that don't it should be uninteresting to have?

I admit I havne't studied it in detail yet.  Attaching list of
BITS_PER_WORD, main_input_filename, current_function_name ()
where nonoverlapping_memrefs_p returned 0 early because
either exprx or expry didn't have DECL_RTL_SET_P.  As except for the new
testcase that didn't result into ICE, all of those must have been something
where make_decl_rtl created RTL in that case.

> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Hmm.  Can you try splitting out a decl_can_have_rtl () predicate
> from make_decl_rtl and use that?

Will do.

Jakub


alias.bz2
Description: BZip2 compressed data

Re: [PATCH] PR tree-optimization/78162: Reject negative offsets in store merging early

On Tue, Nov 1, 2016 at 12:54 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> Store merging ICEs on this invalid testcase because it trips up on the
> negative bitposition to store to.
> It doesn't really expect to handle negative offsets and I believe they won't
> occur very often in valid code anyway.
> Filling out structs/bitfields/class members involves positive offsets.
> I can look into removing all the assumptions about positive offsets if folks
> want me to (should involve removing
> some 'unsigned' modifiers from HOST_WIDE_INTs and double-checking some range
> checks), but for now
> this patch just fixes the ICE by rejecting negative offsets early on.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64.
>
> Ok for trunk?

Ok (an improvement would be to only reject it after eventually
processing a MEM_REF base_addr).

Richard.

> Thanks,
> Kyrill
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78162
> * gimple-ssa-store-merging.c (execute): Mark stores with bitpos < 0
> as invalid.
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78162
> * gcc.c-torture/compile/pr78162.c: New test.

Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add

On Wed, Nov 02, 2016 at 10:19:26AM +0100, Richard Biener wrote:
> On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek  wrote:
> > On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
> >> The PowerPC back end loses performance on vector intrinsics, because 
> >> currently
> >> all of them are treated as calls throughout the middle-end phases and only
> >> expanded when they reach RTL.  Our version of altivec.h currently defines 
> >> the
> >> public names of overloaded functions (like vec_add) to be #defines for 
> >> hidden
> >> functions (like __builtin_vec_add), which are recognized in the parser as
> >> requiring special back-end support.  Tables in rs6000-c.c handle dispatch 
> >> of
> >> the overloaded functions to specific function calls appropriate to the 
> >> argument
> >> types.
> >
> > This doesn't look very nice.  If all you care is that the builtins like
> > __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
> > into generic vector operations under certain conditions, just fold those
> > into whatever you want in targetm.gimple_fold_builtin (gsi).
> 
> Note that traditionally "overloading" with GCC "builtins" is done by
> using varargs
> and the "type generic" attribute.  That doesn't scale to return type 
> overloading
> though for which we usually added direct support to the parser (for example
> for __builtin_shuffle).

My understanding is that rs6000 already does that, it hooks into
resolve_overloaded_builtin which already handles the fully type generic
builtins where not just the arguments, but also the return type can be
picked up.  But it resolves the overloaded builtins into calls to other
builtins that are not type-generic.

So, either that function instead of returning the specific md builtin calls
in some cases already returns trees with the generic behavior of the
builtin, or it returns what it does now and then in the gimple fold builtin
target hook (note, the normal fold builtin target hook is not right for
that, because it is mostly used for folding builtins into constant - callers
will usually throw away other results) fold those specific md builtins
into whatever GIMPLE you want.  If we want to decrease amount of folding in
the FEs, the gimple fold builtin hook is probably better.

Jakub

Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

On Wed, 2 Nov 2016, Jakub Jelinek wrote:

> On Wed, Nov 02, 2016 at 10:08:25AM +0100, Richard Biener wrote:
> > On Mon, 31 Oct 2016, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs
> > > expand to multiple rtls, then there is not a single one that can be used.
> > > Using DECL_RTL on such VAR_DECLs ICEs.
> > > 
> > > I've tried to just return 0 in nonoverlapping_memrefs_p if either
> > > DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often
> > > during bootstrap/regtest (3800+ times).  So the following patch narrows it
> > > down more and triggers only on the testcase below.
> > 
> > What kind of cases did this trigger on?  I would have expected we
> > have DECL_RTL_SET_P on (almost?) all decls that can have it.  And for
> > those that don't it should be uninteresting to have?
> 
> I admit I havne't studied it in detail yet.  Attaching list of
> BITS_PER_WORD, main_input_filename, current_function_name ()
> where nonoverlapping_memrefs_p returned 0 early because
> either exprx or expry didn't have DECL_RTL_SET_P.  As except for the new
> testcase that didn't result into ICE, all of those must have been something
> where make_decl_rtl created RTL in that case.

Ok, just looking at regex.c (I happen to have regex.i around) shows
the first hit as

#2  0x008d9914 in true_dependence_1 (mem=0x761ef360, 
mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, 
x_addr=0x7662ffd8, mem_canonicalized=true)
at /space/rguenther/src/svn/trunk/gcc/alias.c:2928
2928  if (nonoverlapping_memrefs_p (mem, x, false))
(gdb) p debug_tree (expry)
 ) [0 __builtin_malloc S1 A8])
$5 = void

called from DSE check_mem_read_rtx.  I believe that's a bogus query.

(I see only a single hit on regex.c, your log seems to have multiple
ones...)

I wouldn't worry about the lost cases, and if I'd be ambitious I'd
try to investigate the above and see what remains...

> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Hmm.  Can you try splitting out a decl_can_have_rtl () predicate
> > from make_decl_rtl and use that?
> 
> Will do.

Thanks,
Richard.

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
> What kind of false positives it is for each case?  Is it with normal
> asan-bootstrap (without your -fsanitize-use-after-scope changes), or
> only with those changes, or only with those changes and
> -fsanitize-use-after-scope used during bootstrap?

Ok, the situation is simpler than I thought:

#include 

int main(int argc, char **argv)
{
  int *ptr;

  switch (argc)
{
  int a;

case 1:
  break;

default:
  ptr = &a;
  break;
}

  fprintf (stderr, "v: %d\n", *ptr);
  return 0;
}

Which is gimplified as:

int * ptr;

switch (argc) , case 1: >
{
  int a;

  try
{
  ASAN_MARK (2, &a, 4);
  :
  goto ;
  :
  ptr = &a;
  goto ;
}
  finally
{
  ASAN_MARK (1, &a, 4);
}
}
:
_1 = *ptr;
stderr.0_2 = stderr;
fprintf (stderr.0_2, "v: %d\n", _1);
D.2577 = 0;
return D.2577;
  }
  D.2577 = 0;
  return D.2577;

and thus we get:
/tmp/switch-case.c:9:11: warning: statement will never be executed 
[-Wswitch-unreachable]
   int a;

I'm wondering where properly fix that, we can either find all these ASAN_MARKs 
in gimplify_switch_expr
and distribute it to all labels (which are gimplified). Or we can put such 
variables to asan_poisoned_variables
if we have information that we're gimplifing statements before a first label. 
Do we know that from gimple context?
If so, these variables will be unpoisoned at the very beginning of each label 
and the ASAN_MARK call in between
switch statement and a first label can be removed.

Thoughts?
Thanks,
Martin

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek  wrote:
> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>
>>  static void
>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>> - bitmap suitable_for_renaming)
>> + bitmap suitable_for_renaming, bitmap marked_nonaddressable)
>>  {
>>/* Global Variables, result decls cannot be changed.  */
>>if (is_global_var (var)
>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>> bitmap not_reg_needs,
>> || !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>  {
>>TREE_ADDRESSABLE (var) = 0;
>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
>
> Why do you need the marked_nonaddressable bitmap?
>
>>if (is_gimple_reg (var))
>>   bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>if (dump_file)
>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>> addresses_taken, bitmap not_reg_needs,
>>  }
>>  }
>>
>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>> +/* Return true when STMT is ASAN mark where second argument is an address
>> +   of a local variable.  */
>>
>> -void
>> -execute_update_addresses_taken (void)
>> +static bool
>> +is_asan_mark_p (gimple *stmt)
>> +{
>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>> +return false;
>> +
>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>> +  if (TREE_CODE (addr) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
>
> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
> would turn it into is_gimple_reg), and don't return true if not.
>
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. 
>>  */
>> +
>> +
>> +static void
>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
>
> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> set during the asan pass and kept on until end of compilation of that
> function.  That means even if a var only addressable because of ASAN_MARK is
> discovered after this pass we'd still be able to rewrite it into SSA.

Note that being TREE_ADDRESSABLE also has effects on alias analysis
(didn't follow the patches to see whether you handle ASAN_MARK specially
in points-to analysis and/or alias analysis).

Generally in update-address-taken you can handle ASAN_MARK similar to
MEM_REF (and drop it in the rewrite phase?).

As said, I didnt look at the patches and just came by here seeing
tree-ssa.c pieces...

Richard.

> Jakub

Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

On Wed, Nov 02, 2016 at 10:34:15AM +0100, Richard Biener wrote:
> Ok, just looking at regex.c (I happen to have regex.i around) shows
> the first hit as
> 
> #2  0x008d9914 in true_dependence_1 (mem=0x761ef360, 
> mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, 
> x_addr=0x7662ffd8, mem_canonicalized=true)
> at /space/rguenther/src/svn/trunk/gcc/alias.c:2928
> 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> (gdb) p debug_tree (expry)
>   type  
> 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> (gdb) p debug_rtx (mem)
> (mem/f/c:DI (plus:DI (reg/f:DI 7 sp)
> (const_int 200 [0xc8])) [5 p+0 S8 A64])
> $4 = void
> (gdb) p debug_rtx (x)
> (mem:QI (symbol_ref:DI ("malloc") [flags 0x41]  0x76943000 malloc>) [0 __builtin_malloc S1 A8])
> $5 = void
> 
> called from DSE check_mem_read_rtx.  I believe that's a bogus query.

I guess I should also log debug_rtx (DECL_RTL (expr{x,y})) in those cases
then and perhaps also if we ever return non-zero in those cases.

> (I see only a single hit on regex.c, your log seems to have multiple
> ones...)

That was statistics gathered across x86_64-linux and i686-linux simultaneous
bootstraps + regtests (and most likely I've mistyped the first value
from (int) BITS_PER_WORD to (int) BITS_PER_UNIT :( ).  So regex.c is built
there many times (3 times each bootstrap at least).

Jakub

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On 11/02/2016 10:40 AM, Richard Biener wrote:
> On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek  wrote:
>> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>>
>>>  static void
>>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>>> - bitmap suitable_for_renaming)
>>> + bitmap suitable_for_renaming, bitmap 
>>> marked_nonaddressable)
>>>  {
>>>/* Global Variables, result decls cannot be changed.  */
>>>if (is_global_var (var)
>>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>>> bitmap not_reg_needs,
>>> || !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>>  {
>>>TREE_ADDRESSABLE (var) = 0;
>>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
>>
>> Why do you need the marked_nonaddressable bitmap?
>>
>>>if (is_gimple_reg (var))
>>>   bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>>if (dump_file)
>>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>>> addresses_taken, bitmap not_reg_needs,
>>>  }
>>>  }
>>>
>>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>>> +/* Return true when STMT is ASAN mark where second argument is an address
>>> +   of a local variable.  */
>>>
>>> -void
>>> -execute_update_addresses_taken (void)
>>> +static bool
>>> +is_asan_mark_p (gimple *stmt)
>>> +{
>>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>>> +return false;
>>> +
>>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>>> +  if (TREE_CODE (addr) == ADDR_EXPR
>>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
>>
>> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
>> would turn it into is_gimple_reg), and don't return true if not.
>>
>>> +return true;
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK 
>>> built-ins.  */
>>> +
>>> +
>>> +static void
>>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
>>
>> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
>> set during the asan pass and kept on until end of compilation of that
>> function.  That means even if a var only addressable because of ASAN_MARK is
>> discovered after this pass we'd still be able to rewrite it into SSA.
> 
> Note that being TREE_ADDRESSABLE also has effects on alias analysis
> (didn't follow the patches to see whether you handle ASAN_MARK specially
> in points-to analysis and/or alias analysis).

Currently all manipulation with shadow memory is done via a pointer type
which has created a separate aliasing set:

static void
asan_init_shadow_ptr_types (void)
{
  asan_shadow_set = new_alias_set ();
  tree types[3] = { signed_char_type_node, short_integer_type_node,
integer_type_node };

  for (unsigned i = 0; i < 3; i++)
{
  shadow_ptr_types[i] = build_distinct_type_copy (types[i]);
  TYPE_ALIAS_SET (shadow_ptr_types[i]) = asan_shadow_set;
  shadow_ptr_types[i] = build_pointer_type (shadow_ptr_types[i]);
}
...

Martin

> 
> Generally in update-address-taken you can handle ASAN_MARK similar to
> MEM_REF (and drop it in the rewrite phase?).
> 
> As said, I didnt look at the patches and just came by here seeing
> tree-ssa.c pieces...
> 
> Richard.
> 
>> Jakub

Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

On Wed, 2 Nov 2016, Jakub Jelinek wrote:

> On Wed, Nov 02, 2016 at 10:34:15AM +0100, Richard Biener wrote:
> > Ok, just looking at regex.c (I happen to have regex.i around) shows
> > the first hit as
> > 
> > #2  0x008d9914 in true_dependence_1 (mem=0x761ef360, 
> > mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, 
> > x_addr=0x7662ffd8, mem_canonicalized=true)
> > at /space/rguenther/src/svn/trunk/gcc/alias.c:2928
> > 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> > (gdb) p debug_tree (expry)
> >   > type  > 
> > 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> > (gdb) p debug_rtx (mem)
> > (mem/f/c:DI (plus:DI (reg/f:DI 7 sp)
> > (const_int 200 [0xc8])) [5 p+0 S8 A64])
> > $4 = void
> > (gdb) p debug_rtx (x)
> > (mem:QI (symbol_ref:DI ("malloc") [flags 0x41]  > 0x76943000 malloc>) [0 __builtin_malloc S1 A8])
> > $5 = void
> > 
> > called from DSE check_mem_read_rtx.  I believe that's a bogus query.
> 
> I guess I should also log debug_rtx (DECL_RTL (expr{x,y})) in those cases
> then and perhaps also if we ever return non-zero in those cases.

Yeah, plus if a followup test would have disambiguated things (the
dispatch to the tree oracle for example).

> > (I see only a single hit on regex.c, your log seems to have multiple
> > ones...)
> 
> That was statistics gathered across x86_64-linux and i686-linux simultaneous
> bootstraps + regtests (and most likely I've mistyped the first value
> from (int) BITS_PER_WORD to (int) BITS_PER_UNIT :( ).  So regex.c is built
> there many times (3 times each bootstrap at least).

Ah, ok.  At least 12 times for x86_64 with multilibs (we build PIC and 
non-PIC variants IIRC).

Richard.

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On 11/01/2016 04:12 PM, Jakub Jelinek wrote:
> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>  
>>  static void
>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>> -bitmap suitable_for_renaming)
>> +bitmap suitable_for_renaming, bitmap marked_nonaddressable)
>>  {
>>/* Global Variables, result decls cannot be changed.  */
>>if (is_global_var (var)
>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>> bitmap not_reg_needs,
>>|| !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>  {
>>TREE_ADDRESSABLE (var) = 0;
>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
> 
> Why do you need the marked_nonaddressable bitmap?

Because the later loop (which visits every gimple statement) iterates only
if there's an entry in suitable_for_renaming.

> 
>>if (is_gimple_reg (var))
>>  bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>if (dump_file)
>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>> addresses_taken, bitmap not_reg_needs,
>>  }
>>  }
>>  
>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>> +/* Return true when STMT is ASAN mark where second argument is an address
>> +   of a local variable.  */
>>  
>> -void
>> -execute_update_addresses_taken (void)
>> +static bool
>> +is_asan_mark_p (gimple *stmt)
>> +{
>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>> +return false;
>> +
>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>> +  if (TREE_CODE (addr) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
> 
> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
> would turn it into is_gimple_reg), and don't return true if not.

Well, the predicate is called once before maybe_optimize_var, thus I need to 
have
it conservative and not consider TREE_ADDRESSABLE flag. Having argument would 
work
for that?

> 
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. 
>>  */
>> +
>> +
>> +static void
>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
> 
> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> set during the asan pass and kept on until end of compilation of that
> function.  That means even if a var only addressable because of ASAN_MARK is
> discovered after this pass we'd still be able to rewrite it into SSA.

It's doable (please see attached patch) and also nicer. However, one would need 
to
extend curr_properties to long type as we already use 16 existing values.

Martin

> 
>   Jakub
> 

>From ad5f68a010674118fac7ca8b6953f7b99fd3c2a8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 1 Nov 2016 11:21:20 +0100
Subject: [PATCH] Use-after-scope: do not mark variables that are no longer
 addressable

gcc/ChangeLog:

2016-11-02  Martin Liska  

	* asan.c: Update properties_provided and todo_flags_finish.
	* function.h (struct GTY): Change int to long as there's not
	enough space for a new value.
	* tree-pass.h: Define PROP_asan_check_done.
	* tree-ssa.c (maybe_optimize_var): Add new argument.
	(is_asan_mark_p): New function.
	(execute_update_addresses_taken): Handle ASAN_MARK internal fns.
---
 gcc/asan.c  |  5 +++--
 gcc/function.h  |  2 +-
 gcc/tree-pass.h |  1 +
 gcc/tree-ssa.c  | 69 +
 4 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 95495d2..94ee877 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "fnmatch.h"
+#include "tree-ssa.h"
 
 /* AddressSanitizer finds out-of-bounds and use-after-free bugs
with <2x slowdown on average.
@@ -2993,10 +2994,10 @@ const pass_data pass_data_asan =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */
-  0, /* properties_provided */
+  PROP_asan_check_done, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_update_ssa, /* todo_flags_finish */
+  TODO_update_ssa | TODO_update_address_taken, /* todo_flags_finish */
 };
 
 class pass_asan : public gimple_opt_pass
diff --git a/gcc/function.h b/gcc/function.h
index e854c7f..5600488 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -289,7 +289,7 @@ struct GTY(()) function {
   location_t function_end_locus;
 
   /* Properties used by the pass manager.  */
-  unsigned int curr_properties;
+  unsigned long curr_properties;
   unsigned int last_verified;
 
   /* Non-null if the function does something that would prevent it from
d

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 02, 2016 at 10:40:35AM +0100, Richard Biener wrote:
> > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> > set during the asan pass and kept on until end of compilation of that
> > function.  That means even if a var only addressable because of ASAN_MARK is
> > discovered after this pass we'd still be able to rewrite it into SSA.
> 
> Note that being TREE_ADDRESSABLE also has effects on alias analysis
> (didn't follow the patches to see whether you handle ASAN_MARK specially
> in points-to analysis and/or alias analysis).
> 
> Generally in update-address-taken you can handle ASAN_MARK similar to
> MEM_REF (and drop it in the rewrite phase?).

That is the intent, but we can't do that before the asan pass, because
otherwise as Martin explained we don't diagnose at runtime bugs where
a variable is used outside of its scope only through a MEM_REF.
So we need to wait for asan pass to actually add a real builtin call that
takes the address in that case.  Except now I really don't see how that
can work for the case where the var is used only properly when it is in the
scope, because the asan pass will still see those being addressable.

Unless I'm missing something we either need to perform further analysis
during the addressable subpass - this variable could be made
non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
addressable, otherwise rewrite into SSA.
Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
uses of those, rewrite it back into addressable immediately or later or
something.

Jakub

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote:
> On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
> > What kind of false positives it is for each case?  Is it with normal
> > asan-bootstrap (without your -fsanitize-use-after-scope changes), or
> > only with those changes, or only with those changes and
> > -fsanitize-use-after-scope used during bootstrap?
> 
> Ok, the situation is simpler than I thought:

CCing also Marek.
> 
> #include 
> 
> int main(int argc, char **argv)
> {
>   int *ptr;
> 
>   switch (argc)
> {
>   int a;
> 
> case 1:
>   break;
> 
> default:
>   ptr = &a;
>   break;
> }
> 
>   fprintf (stderr, "v: %d\n", *ptr);
>   return 0;
> }
> 
> Which is gimplified as:
> 
> int * ptr;
> 
> switch (argc) , case 1: >
> {
>   int a;
> 
>   try
> {
>   ASAN_MARK (2, &a, 4);
>   :
>   goto ;
>   :
>   ptr = &a;
>   goto ;
> }
>   finally
> {
>   ASAN_MARK (1, &a, 4);
> }
> }
> :
> _1 = *ptr;
> stderr.0_2 = stderr;
> fprintf (stderr.0_2, "v: %d\n", _1);
> D.2577 = 0;
> return D.2577;
>   }
>   D.2577 = 0;
>   return D.2577;
> 
> and thus we get:
> /tmp/switch-case.c:9:11: warning: statement will never be executed 
> [-Wswitch-unreachable]
>int a;
> 
> I'm wondering where properly fix that, we can either find all these 
> ASAN_MARKs in gimplify_switch_expr
> and distribute it to all labels (which are gimplified). Or we can put such 
> variables to asan_poisoned_variables
> if we have information that we're gimplifing statements before a first label. 
> Do we know that from gimple context?
> If so, these variables will be unpoisoned at the very beginning of each label 
> and the ASAN_MARK call in between
> switch statement and a first label can be removed.

Wouldn't it be easiest if -Wswitch-unreachable warning just ignored
the ASAN_MARK internal calls altogether?
Do you emit there any other statements, or just ASAN_MARK and nothing else?

Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
statement?  Otherwise, consider this being done in a loop, after the first
iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with
args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Jakub

Re: [PATCH, ARM/testsuite 6/7, ping] Force soft float in ARMv6-M and ARMv8-M Baseline options


Ping?

Best regards,

Thomas

On 28/10/16 10:49, Thomas Preudhomme wrote:

On 22/09/16 16:47, Richard Earnshaw (lists) wrote:

On 22/09/16 15:51, Thomas Preudhomme wrote:

Sorry, noticed an error in the patch. It was not caught during testing
because GCC was built with --with-mode=thumb. Correct patch attached.

Best regards,

Thomas

On 22/09/16 14:49, Thomas Preudhomme wrote:

Hi,

ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the
arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like
-mthumb is
passed for architectures that only support Thumb instruction set. This
patch
adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add
options.
Patch is in attachment.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-07-15  Thomas Preud'homme  

* lib/target-supports.exp (add_options_for_arm_arch_v6m): Add
-mfloat-abi=soft option.
(add_options_for_arm_arch_v8m_base): Likewise.


Is this ok for trunk?

Best regards,

Thomas


6_softfloat_testing_v6m_v8m_baseline.patch


diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index
0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398
100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } {
 # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
 #/* { dg-add-options arm_arch_v5 } */
 # /* { dg-require-effective-target arm_arch_v5_multilib } */
-foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
- v4t "-march=armv4t" __ARM_ARCH_4T__
- v5 "-march=armv5 -marm" __ARM_ARCH_5__
- v5t "-march=armv5t" __ARM_ARCH_5T__
- v5te "-march=armv5te" __ARM_ARCH_5TE__
- v6 "-march=armv6" __ARM_ARCH_6__
- v6k "-march=armv6k" __ARM_ARCH_6K__
- v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
- v6z "-march=armv6z" __ARM_ARCH_6Z__
- v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__
- v7a "-march=armv7-a" __ARM_ARCH_7A__
- v7r "-march=armv7-r" __ARM_ARCH_7R__
- v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
- v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__
- v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
- v8m_base "-march=armv8-m.base -mthumb"
__ARM_ARCH_8M_BASE__
- v8m_main "-march=armv8-m.main -mthumb"
__ARM_ARCH_8M_MAIN__ } {
+foreach { armfunc armflag armdef } {
+v4 "-march=armv4 -marm" __ARM_ARCH_4__
+v4t "-march=armv4t" __ARM_ARCH_4T__
+v5 "-march=armv5 -marm" __ARM_ARCH_5__
+v5t "-march=armv5t" __ARM_ARCH_5T__
+v5te "-march=armv5te" __ARM_ARCH_5TE__
+v6 "-march=armv6" __ARM_ARCH_6__
+v6k "-march=armv6k" __ARM_ARCH_6K__
+v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
+v6z "-march=armv6z" __ARM_ARCH_6Z__
+v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
+v7a "-march=armv7-a" __ARM_ARCH_7A__
+v7r "-march=armv7-r" __ARM_ARCH_7R__
+v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
+v8a "-march=armv8-a" __ARM_ARCH_8A__
+v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft"
__ARM_ARCH_8M_BASE__
+v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 proc check_effective_target_arm_arch_FUNC_ok { } {
 if { [ string match "*-marm*" "FLAG" ] &&



I think if you're going to do this you need to also check that changing
the ABI in this way isn't incompatible with other aspects of how the
user has invoked dejagnu.


The reason this patch was made is that without it dg-require-effective-target
arm_arch_v8m_base_ok evaluates to true for an arm-none-linux-gnueabihf toolchain
but then any testcase containing a function for such a target (such as the
atomic-op-* in gcc.target/arm) will error out because ARMv8-M Baseline does not
support hard float ABI.

I see 2 ways to fix this:

1) the approach taken in this patch, ie saying that to select ARMv8-M baseline
architecture you need the right -march, -mthumb but also the right float ABI.

Note that the comment at the top of that procedure says:
# Creates a series of routines that return 1 if the given architecture
# can be selected and a routine to give the flags to select that architecture

2) Add a function to the assembly that is used to test support for the
architecture.

The reason I favor the first one is that it enables more test while the second
test would just skip ARMv6-M and ARMv8-M Baseline tests for
arm-none-linux-gnueabihf toolchai

Re: [PATCH, GCC/ARM 1/2, ping] Add multilib support for embedded bare-metal targets


Ping?

Best regards,

Thomas

On 27/10/16 15:26, Thomas Preudhomme wrote:

Hi Kyrill,

On 27/10/16 10:45, Kyrill Tkachov wrote:

Hi Thomas,

On 24/10/16 09:06, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch adds the t-rmprofile multilib Makefile
fragment for the former objective. Multilib are built for all M profile
architecture involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as ARMv7. ARMv7
multilib is used for R profile architectures but also A profile architectures.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme 

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.


Testing:

== aprofile ==
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* default spec (gcc -dumpspecs) is the same before and after the patchset for
aprofile
* No difference in --print-multi-directory between before and after the
patchset
for aprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and
float ABI

== rmprofile ==
* aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float
ABI) and directory naming
* Difference in --print-multi-directory between before [1] and after the
patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI modulo the name and directory structure changes

[1] as per patch applied in ARM embedded branches
https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup




== aprofile + rmprofile ==
* aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is
not supported


Is this ok for master branch?

Best regards,

Thomas


+# Arch Matches
+MULTILIB_MATCHES   += march?armv6s-m=march?armv6-m
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8-m.main+dsp
+MULTILIB_MATCHES   += march?armv7=march?armv7-r
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES   += march?armv7=march?armv7-a
+MULTILIB_MATCHES   += march?armv7=march?armv7ve
+MULTILIB_MATCHES   += march?armv7=march?armv8-a
+MULTILIB_MATCHES   += march?armv7=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a+crc
+endif

I think you want to update the patch to handle -march=armv8.2-a and
armv8.2-a+fp16
Thanks,
Kyrill


Indeed. Please find updated ChangeLog and patch (attached):

*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.

Ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3739,6 +3739,16 @@ case "${target}" in
 # pragmatic.
 tmake_profile_file="arm/t-aprofile"
 ;;
+			rmprofile)
+# Note that arm/t-rmprofile is a
+# stand-alone make file fragment to be
+# used only with itself.  We do not
+# specifically use the
+# TM_MULTILIB_OPTION framework because
+# this shorthand is more
+# pragmatic.
+tmake_profile_file="arm/t-rmprofile"
+;;
 			default)
 ;;
 			*)
@@ -3748,9 +3758,10 @@ case "${target}" in
 			esac
 
 			if test "x${tmake_profile_file}" != x ; then
-# arm/t-aprofile is only designed to work
-# without any with-cpu, with-arch, with-mode,
-# with-fpu or with-float options.
+# arm/t-aprofile and arm/t-rmprofile are only
+# designed to work without any with-cpu,
+# with-arch, with-mode, with-fpu or with-float
+# options.
 if test "x$with_arch" != x \
 || test "x$with_cpu" != x \
 || test "x$with_float" != x \
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
new file mode 100644
index ..c8b5c9cbd03694eea69855e20372afa3e97d6b4c
--- /dev/null
+++ b/gcc/config/arm/t-rmprofile
@@ -0,0 +1,174 @@
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without e

Re: [PATCH, GCC/ARM 2/2, ping2] Allow combination of aprofile and rmprofile multilibs


Ping?

Best regards,

Thomas

On 24/10/16 09:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch is concerned with the latter. The patch works
by moving the bits shared by both aprofile and rmprofile multilib build
(variable initilization as well as ISA and float ABI to build multilib for) to a
new t-multilib file. Then, based on which profile was requested in
--with-multilib-list option, that files includes t-aprofile and/or t-rmprofile
where the architecture and FPU to build the multilib for are specified.

Unfortunately the duplication of CPU to A profile architectures could not be
avoided because substitution due to MULTILIB_MATCHES are not transitive.
Therefore, mapping armv7-a to armv7 for rmprofile multilib build does not have
the expected effect. Two patches were written to allow this using 2 different
approaches but I decided against it because this is not the right solution IMO.
See caveats below for what I believe is the correct approach.


*** combined build caveats ***

As the documentation in this patch warns, there is a few caveats to using a
combined multilib build due to the way the multilib framework works.

1) For instance, when using only rmprofile the combination of options -mthumb
-march=armv7 -mfpu=neon the thumb/-march=armv7 multilib but in a combined
multilib build the default multilib would be used. This is because in the
rmprofile build -mfpu=neon is not specified in MULTILIB_OPTION and thus the
option is ignored when considering MULTILIB_REQUIRED entries.

2) Another issue is the fact that aprofile and rmprofile multilib build have
some conflicting requirements in terms of how to map options for which no
multilib is built to another option. (i) A first example of this is the
difference of CPU to architecture mapping mentionned above: rmprofile multilib
build needs A profile CPUs and architectures to be mapped down to ARMv7 so that
one of the v7-ar multilib gets chosen in such a case but aprofile needs A
profile architectures to stand on their own because multilibs are built for
several architectures.

(ii) Another example of this is that in aprofile multilib build no multilib is
built with -mfpu=fpv5-d16 but some multilibs are built with -mfpu=fpv4-d16.
Therefore, aprofile defines a match rule to map fpv5-d16 onto fpv4-d16. However,
rmprofile multilib profile *does* build some multilibs with -mfpu=fpv5-d16. This
has the consequence that when building for -mthumb -march=armv7e-m
-mfpu=fpv5-d16 -mfloat-abi=hard the default multilib is chosen because this is
rewritten into -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard and there
is no multilib for that.

Both of these issues could be handled by using MULTILIB_REUSE instead of
MULTILIB_MATCHES but this would require a large set of rules. I believe instead
the right approach is to create a new mechanism to inform GCC on how options can
be down mapped _when no multilib can be found_ which would require a smaller set
of rules and would make it explicit that the options are not equivalent. A patch
will be posted to this effect at a later time.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow combinations of aprofile and rmprofile values for
--with-multilib-list.
* config/arm/t-multilib: New file.
* config/arm/t-aprofile: Remove initialization of MULTILIB_*
variables.  Remove setting of ISA and floating-point ABI in
MULTILIB_OPTIONS and MULTILIB_DIRNAMES.  Set architecture and FPU in
MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS
and MULTILIB_DIRNAMES respectively.  Add comment to introduce all
matches.  Add architecture matches for marvel-pj4 and generic-armv7-a
CPU options.
* config/arm/t-rmprofile: Likewise except for the matches changes.
* doc/install.texi (--with-multilib-list): Document the combination of
aprofile and rmprofile values and warn about pitfalls in doing that.


Testing:

* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same for aprofile,rmprofile
and rmprofile,aprofile
* default spec (gcc -dumpspecs) is the same for aprofile,rmprofile and
rmprofile,aprofile

* Difference in --print-multi-directory between aprofile or rmprofile and
aprofile,rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU,
FPU and float ABI is as expected (best multilib for the combination is chosen),
modulo the caveat mentionned above and the new marvel-pj4 and generic-armv7-a
CPU to architecture mapping.


Is this ok for master?

Best regards,

Thomas

Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M33


Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M33 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP
extensions architecture and arm_v7m_tune tuning parameters for the time being.
It also updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme  

* config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m33): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M33 processor.


Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world
compiled for it.

Is this ok for master?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 9293429b3f9a026bcdacc1651c534bdf14d4df1e..cd79bc505853d4dda6cf2e58bdc2d129032befef 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -73,7 +73,7 @@ ARM_ARCH("armv8-m.base", cortexm23, 8M_BASE,
 	 ARM_FSET_MAKE_CPU1 (			  FL_FOR_ARCH8M_BASE))
 ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
 	 ARM_FSET_MAKE_CPU1(FL_CO_PROC |	  FL_FOR_ARCH8M_MAIN))
-ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN,
+ARM_ARCH("armv8-m.main+dsp", cortexm33, 8M_MAIN,
 	 ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 940b5de82f0340fc0c26be80d47729bc1f193db0..ec63ee4abe54af06cd5531486f294f9a8dae71a1 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -168,6 +168,7 @@ ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_
 /* V8 Architecture Processors */
 ARM_CORE("cortex-m23",	cortexm23, cortexm23,	8M_BASE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8M_BASE), v6m)
 ARM_CORE("cortex-a32",	cortexa32, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
+ARM_CORE("cortex-m33",	cortexm33, cortexm33,	8M_MAIN, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN), v7m)
 ARM_CORE("cortex-a35",	cortexa35, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
 ARM_CORE("cortex-a57",	cortexa57, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index de712924afd33ba1e6e65cb56a5b260858d0cc4f..f7886b94be779fcba91506e77574662fe7188876 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -313,6 +313,9 @@ EnumValue
 Enum(processor_type) String(cortex-a32) Value(cortexa32)
 
 EnumValue
+Enum(processor_type) String(cortex-m33) Value(cortexm33)
+
+EnumValue
 Enum(processor_type) String(cortex-a35) Value(cortexa35)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 46c2c9258bcad43618a50f6201414fa084cb5b56..e782baccf424e51ac19ef5f02d25ed4f4eb0541d 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -33,8 +33,9 @@
 	cortexr8,cortexm7,cortexm4,
 	cortexm3,marvell_pj4,cortexa15cortexa7,
 	cortexa17cortexa7,cortexm23,cortexa32,
-	cortexa35,cortexa53,cortexa57,
-	cortexa72,cortexa73,exynosm1,
-	qdf24xx,xgene1,cortexa57cortexa53,
-	cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
+	cortexm33,cortexa35,cortexa53,
+	cortexa57,cortexa72,cortexa73,
+	exynosm1,qdf24xx,xgene1,
+	cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,
+	cortexa73cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 302302f0d2d522fe282bb1d12687b53de72cae25..d45a1ca421901da25e16d965a9474438ea10f349 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -97,7 +97,7 @@
|march=armv8.2-a+fp16\
|march=armv8-m.base|mcpu=cortex-m23			\
|march=armv8-m.main	\
-   |march=armv8-m.main+dsp\
+   |march=armv8-m.main+dsp|mcpu=cortex-m33		\
:%{!r:--be8}}}"
 #else
 #define BE8_LINK_SPEC \
@@ -136,7 +136,7 @@
|march=armv8.2-a+fp16\
|march=armv8-m.base|mcpu=cortex-m23			\
|march=armv8-m.main	\
-

Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M23


Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M23 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Baseline
architecture and arm_v6m_tune tuning parameters for the time being. It also
updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme  

* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.


Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world
compiled for it.

Is this ok for master?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 4b196a7d1188de5eca028e5c2597bbc20835201f..9293429b3f9a026bcdacc1651c534bdf14d4df1e 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -69,7 +69,7 @@ ARM_ARCH ("armv8.2-a", cortexa53,  8A,
 ARM_ARCH ("armv8.2-a+fp16", cortexa53,  8A,
 	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 			 FL2_FOR_ARCH8_2A | FL2_FP16INST))
-ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,
+ARM_ARCH("armv8-m.base", cortexm23, 8M_BASE,
 	 ARM_FSET_MAKE_CPU1 (			  FL_FOR_ARCH8M_BASE))
 ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
 	 ARM_FSET_MAKE_CPU1(FL_CO_PROC |	  FL_FOR_ARCH8M_MAIN))
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 2072e1e6f8d84533deead24e6fb0b6aff7603f24..940b5de82f0340fc0c26be80d47729bc1f193db0 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -166,6 +166,7 @@ ARM_CORE("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_
 ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV | FL_FOR_ARCH7A), cortex_a12)
 
 /* V8 Architecture Processors */
+ARM_CORE("cortex-m23",	cortexm23, cortexm23,	8M_BASE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8M_BASE), v6m)
 ARM_CORE("cortex-a32",	cortexa32, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
 ARM_CORE("cortex-a35",	cortexa35, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index ee9e3bb7ec57e0e8f2f15b83442711b9faf82d20..de712924afd33ba1e6e65cb56a5b260858d0cc4f 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -307,6 +307,9 @@ EnumValue
 Enum(processor_type) String(cortex-a17.cortex-a7) Value(cortexa17cortexa7)
 
 EnumValue
+Enum(processor_type) String(cortex-m23) Value(cortexm23)
+
+EnumValue
 Enum(processor_type) String(cortex-a32) Value(cortexa32)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 594ce9d1734451f89812200191cb35f1f579289e..46c2c9258bcad43618a50f6201414fa084cb5b56 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -32,9 +32,9 @@
 	cortexr4f,cortexr5,cortexr7,
 	cortexr8,cortexm7,cortexm4,
 	cortexm3,marvell_pj4,cortexa15cortexa7,
-	cortexa17cortexa7,cortexa32,cortexa35,
-	cortexa53,cortexa57,cortexa72,
-	cortexa73,exynosm1,qdf24xx,
-	xgene1,cortexa57cortexa53,cortexa72cortexa53,
-	cortexa73cortexa35,cortexa73cortexa53"
+	cortexa17cortexa7,cortexm23,cortexa32,
+	cortexa35,cortexa53,cortexa57,
+	cortexa72,cortexa73,exynosm1,
+	qdf24xx,xgene1,cortexa57cortexa53,
+	cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 022c1d72a1272e56397dc7e2018483e77f18b90d..39b2da05d2135c68032231bb7780104061355786 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2243,7 +2243,8 @@ const struct tune_params arm_cortex_m7_tune =
 };
 
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
-   arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus.  */
+   arm_v6t2_tune.  It is used for cortex-m0, cortex-m1, cortex-m0plus and
+   cortex-m23.  */
 const struct tune_params arm_v6m_tune =
 {
   arm_9e_rtx_costs,
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 0da98fb711bdcaf5add6e392060f4edaddf6cf05..302302f0d2d522f

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On 11/02/2016 10:59 AM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote:
>> On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
>>> What kind of false positives it is for each case?  Is it with normal
>>> asan-bootstrap (without your -fsanitize-use-after-scope changes), or
>>> only with those changes, or only with those changes and
>>> -fsanitize-use-after-scope used during bootstrap?
>>
>> Ok, the situation is simpler than I thought:
> 
> CCing also Marek.
>>
>> #include 
>>
>> int main(int argc, char **argv)
>> {
>>   int *ptr;
>>
>>   switch (argc)
>> {
>>   int a;
>>
>> case 1:
>>   break;
>>
>> default:
>>   ptr = &a;
>>   break;
>> }
>>
>>   fprintf (stderr, "v: %d\n", *ptr);
>>   return 0;
>> }
>>
>> Which is gimplified as:
>>
>> int * ptr;
>>
>> switch (argc) , case 1: >
>> {
>>   int a;
>>
>>   try
>> {
>>   ASAN_MARK (2, &a, 4);
>>   :
>>   goto ;
>>   :
>>   ptr = &a;
>>   goto ;
>> }
>>   finally
>> {
>>   ASAN_MARK (1, &a, 4);
>> }
>> }
>> :
>> _1 = *ptr;
>> stderr.0_2 = stderr;
>> fprintf (stderr.0_2, "v: %d\n", _1);
>> D.2577 = 0;
>> return D.2577;
>>   }
>>   D.2577 = 0;
>>   return D.2577;
>>
>> and thus we get:
>> /tmp/switch-case.c:9:11: warning: statement will never be executed 
>> [-Wswitch-unreachable]
>>int a;
>>
>> I'm wondering where properly fix that, we can either find all these 
>> ASAN_MARKs in gimplify_switch_expr
>> and distribute it to all labels (which are gimplified). Or we can put such 
>> variables to asan_poisoned_variables
>> if we have information that we're gimplifing statements before a first 
>> label. Do we know that from gimple context?
>> If so, these variables will be unpoisoned at the very beginning of each 
>> label and the ASAN_MARK call in between
>> switch statement and a first label can be removed.
> 
> Wouldn't it be easiest if -Wswitch-unreachable warning just ignored
> the ASAN_MARK internal calls altogether?
> Do you emit there any other statements, or just ASAN_MARK and nothing else?

Yep, skipping warning can be done easily, however gimplified code is wrong as
un-poisoning is not done for variable (even for a valid program).

> 
> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> statement?  Otherwise, consider this being done in a loop, after the first
> iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with
> args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Hopefully having the un-poisoning call be encapsulated in finally block would 
properly
clean up the variable. Or am I wrong?

Martin

> 
>   Jakub
>

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote:
> > Which is gimplified as:
> > 
> > int * ptr;
> > 
> > switch (argc) , case 1: >
> > {
> >   int a;
> > 
> >   try
> > {
> >   ASAN_MARK (2, &a, 4);
> >   :
> >   goto ;
> >   :
> >   ptr = &a;
> >   goto ;
> > }
> >   finally
> > {
> >   ASAN_MARK (1, &a, 4);
> > }

> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> statement?  Otherwise, consider this being done in a loop, after the first
> iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with
> args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Wonder if for the variables declared inside of switch body, because we don't
care about uses before scope, it wouldn't be more efficient to arrange for
int *p, *q, *r;
switch (x)
  {
int a;
  case 1:
p = &a;
a = 5;
break;
int b;
  case 2:
int c;
q = &b;
r = &c;
b = 3;
c = 4;
break;
  }
to effectively ASAN_MARK (2, &a, 4); ASAN_MARK (2, &b, 4); ASAN_MARK (2, &c, 4);
before the GIMPLE_SWITCH, instead of unpoisoning them on every case label
where they might be in scope.  Though, of course, at least until lower pass
that is quite ugly, because it would refer to "not yet declared" variables.
Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not
the expression evaluation of the switch control expression) inside of the
switches' GIMPLE_BIND.

Jakub

Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M23



On 02/11/16 10:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M23 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Baseline
architecture and arm_v6m_tune tuning parameters for the time being. It also
updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme 

* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.


Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world
compiled for it.

Is this ok for master?



Ok.
Thanks,
Kyrill


Best regards,

Thomas

Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M33



On 02/11/16 10:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M33 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP
extensions architecture and arm_v7m_tune tuning parameters for the time being.
It also updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme 

* config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m33): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M33 processor.


Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world
compiled for it.

Is this ok for master?



Ok.
Thanks,
Kyrill


Best regards,

Thomas

Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-02 Thread Steven Bosscher

 On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener
 wrote:
> On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote:
>> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>>> > +  cfg_layout_finalize ();
>>>
>>> I don't think you have to go into/out-of cfglayout mode for each iteration.
>>
>> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
>> iteration though, I was afraid that interacts.
>
> Ick.  Why would it need a cfg-cleanup every iteration?

I don't believe it needs a cleanup on every iteration. One cleanup at
the end should work fine.

Ciao!
Steven

Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-02 Thread Eric Botcazou

> This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a
> runtime one in rtlanal.c.
> 
> Since this one was in combination with an "#if defined" and used to guard an
> if-statement I'd appreciate it if someone gave it a double-check that I
> dind't screw up the intended behaviour.

Unfortunately I think you did, as the old version was:

#if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP)
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
   ? val_signbit_known_set_p (inner_mode, nonzero)
   : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
  || !MEM_P (SUBREG_REG (x)))
#endif

and the new version is:

#ifdef LOAD_EXTEND_OP
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if (WORD_REGISTER_OPERATIONS
  && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif

So if WORD_REGISTER_OPERATIONS is zero and LOAD_EXTEND_OP is defined, for 
example on PowerPC, the block guarded by the condition is always executed in 
the former case but never in the latter case.

-- 
Eric Botcazou

[PATCH] Fix store-merging alias check, apply some TLC


This fixes the alias check in terminate_all_aliasing_chains -- the
base we use for the hash table indexing does not constitute an object
that aliases all stores in the chain (consider for example the MEM_REF
handling which strips the offset completely).

I've refactored data structures a bit in the process of making
(as a followup) 'base_addr' a true address (and thus avoid building
that new MEM_REF for example).  A followup will then try to support
(some) bases with offset != NULL_TREE.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-11-02  Richard Biener  

* gimple-ssa-store-merging.c (struct store_immediate_info): Remove
redundant val and dest members.
(store_immediate_info::store_immediate_info): Adjust.
(merged_store_group::merged_store_group): Adjust.
(merged_store_group::apply_stores): Likewise.
(struct imm_store_chain_info): Add base_addr field.
(imm_store_chain_info::imm_store_chain_info): New constructor.
(imm_store_chain_info::terminate_and_process_chain): Do not pass base.
(imm_store_chain_info::output_merged_store): Likewise.
(imm_store_chain_info::output_merged_stores): Likewise.
(pass_tree_store_merging::terminate_all_aliasing_chains): Take
imm_store_chain_info instead of base.  Fix alias check.
(pass_tree_store_merging::terminate_and_release_chain): Likewise.
(imm_store_chain_info::coalesce_immediate_stores): Adjust.

Index: gcc/gimple-ssa-store-merging.c
===
--- gcc/gimple-ssa-store-merging.c  (revision 241779)
+++ gcc/gimple-ssa-store-merging.c  (working copy)
@@ -140,19 +140,17 @@ struct store_immediate_info
 {
   unsigned HOST_WIDE_INT bitsize;
   unsigned HOST_WIDE_INT bitpos;
-  tree val;
-  tree dest;
   gimple *stmt;
   unsigned int order;
-  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree,
-   tree, gimple *, unsigned int);
+  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+   gimple *, unsigned int);
 };
 
 store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
-   unsigned HOST_WIDE_INT bp, tree v,
-   tree d, gimple *st,
+   unsigned HOST_WIDE_INT bp,
+   gimple *st,
unsigned int ord)
-  : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord)
+  : bitsize (bs), bitpos (bp), stmt (st), order (ord)
 {
 }
 
@@ -557,7 +555,7 @@ merged_store_group::merged_store_group (
   /* VAL has memory allocated for it in apply_stores once the group
  width has been finalized.  */
   val = NULL;
-  align = get_object_alignment (info->dest);
+  align = get_object_alignment (gimple_assign_lhs (info->stmt));
   stores.create (1);
   stores.safe_push (info);
   last_stmt = info->stmt;
@@ -654,14 +652,16 @@ merged_store_group::apply_stores ()
   FOR_EACH_VEC_ELT (stores, i, info)
 {
   unsigned int pos_in_buffer = info->bitpos - start;
-  bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize,
-pos_in_buffer, buf_size);
+  bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
+   val, info->bitsize,
+   pos_in_buffer, buf_size);
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  if (ret)
{
  fprintf (dump_file, "After writing ");
- print_generic_expr (dump_file, info->val, 0);
+ print_generic_expr (dump_file,
+ gimple_assign_rhs1 (info->stmt), 0);
  fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
" at position %d the merged region contains:\n",
info->bitsize, pos_in_buffer);
@@ -680,13 +680,15 @@ merged_store_group::apply_stores ()
 
 struct imm_store_chain_info
 {
+  tree base_addr;
   auto_vec m_store_info;
   auto_vec m_merged_store_groups;
 
-  bool terminate_and_process_chain (tree);
+  imm_store_chain_info (tree b_a) : base_addr (b_a) {}
+  bool terminate_and_process_chain ();
   bool coalesce_immediate_stores ();
-  bool output_merged_store (tree, merged_store_group *);
-  bool output_merged_stores (tree);
+  bool output_merged_store (merged_store_group *);
+  bool output_merged_stores ();
 };
 
 const pass_data pass_data_tree_store_merging = {
@@ -722,8 +724,9 @@ private:
   hash_map m_stores;
 
   bool terminate_and_process_all_chains ();
-  bool terminate_all_aliasing_chains (tree, tree, bool, gimple *);
-  bool terminate_and_release_chain (tree);
+  bool terminate_all_aliasing_chains (tree, imm_store_chain_info **,
+

Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.

2016-11-02 Thread James Greenhalgh

On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski  wrote:
> > Since ThunderX T88 pass 1 (variant 0) is a ARMv8 part while pass 2 (variant 
> > 1)
> > is an ARMv8.1 part, I needed to add detecting of the variant also for this
> > difference. Also I simplify a little bit and combined the single core and
> > arch detecting cases so it would be easier to add variant.
> 
> Actually it is a bit more complex than what I said here, see below for
> the full table of options and what are enabled/disabled now.
> 
> > OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> > Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
> > deecting the two seperately.
> 
> 
> Here is the final patch in this series updated; I changed the cpu name
> slightly and made sure I updated invoke.texi too.
> 
> The names are going to match the names in LLVM (worked with our LLVM
> engineer here at Cavium about the names).
> Here are the names recorded and
> -mpcu=thunderx:
> *Matches part num 0xA0 (reserved for ThunderX 8x series)
> *T88 Pass 2 scheduling
> *Hardware prefetching (software prefetching disabled)
> *LSE enabled
> *no v8.1

This doesn't match the current LLVM proposal
( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support
for -mcpu=thunderx.

> -mcpu=thunderxt88:
> *Matches part num 0xA1
> *T88 Pass 2 scheduling
> *software prefetching enabled
> *LSE enabled
> *no v8.1
> 
> -mcpu=thunderxt88p1 (only for GCC):
> *Matches part num 0xA1, variant 0
> *T88 Pass 1 scheduling
> *software prefetching enabled
> *no LSE enabled
> *no v8.1
> 
> -mcpu=thunderxt81 and -mcpu=thunderxt83:
> *Matches part num 0xA2/0xA3
> *T88 Pass 2 scheduling
> *Hardware prefetching (software prefetching disabled)
> *LSE enabled
> *v8.1

This looks like what has been added to LLVM as -mcpu=thunderx.

> I have not hooked up software vs hardware prefetching and the
> scheduler parts (the next patch will do part of that); both ARMv8.1-a
> and LSE parts are hooked up as those parts are only in
> aarch64-cores.def.
> 
> OK?  Bootstrapped and tested on ThunderX T88 and ThunderX T81
> (aarch64-linux-gnu).
> 
> Index: common/config/aarch64/aarch64-common.c
> ===
> --- common/config/aarch64/aarch64-common.c(revision 241727)
> +++ common/config/aarch64/aarch64-common.c(working copy)
> @@ -145,7 +145,7 @@ struct arch_to_arch_name
> the default set of architectural feature flags they support.  */
>  static const struct processor_name_to_arch all_cores[] =
>  {
> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, 
> VARIANT) \
>{NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>  #include "config/aarch64/aarch64-cores.def"
>{"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
> Index: config/aarch64/aarch64-cores.def
> ===
> --- config/aarch64/aarch64-cores.def  (revision 241727)
> +++ config/aarch64/aarch64-cores.def  (working copy)
> @@ -21,7 +21,7 @@
>  
> Before using #include to read this file, define a macro:
>  
> -  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
> FLAGS, COSTS, IMP, PART)
> +  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
> FLAGS, COSTS, IMP, PART, VARIANT)
>  
> The CORE_NAME is the name of the core, represented as a string constant.
> The CORE_IDENT is the name of the core, represented as an identifier.
> @@ -39,39 +39,45 @@
> PART is the part number of the CPU.  On a GNU/Linux system it can be
> found in /proc/cpuinfo.  For big.LITTLE systems this should use the
> macro AARCH64_BIG_LITTLE where the big part number comes as the first
> -   argument to the macro and little is the second.  */
> +   argument to the macro and little is the second.
> +   VARIANT is the variant of the CPU.  In a GNU/Linux system it can found
> +   in /proc/cpuinfo.  If this is -1, this means it can match any variant.  */
>  
>  /* V8 Architecture Processors.  */
>  
>  /* ARM ('A') cores. */
> -AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
> -AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
> -AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
> -AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
> -AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa73

Re: [PATCH testsuite]Require vect_cond_mixed for test case gcc.dg/vect/pr56541.c

On Tue, Nov 1, 2016 at 4:24 PM, Bin.Cheng  wrote:
> On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener
>  wrote:
>> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng  wrote:
>>> On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener
>>>  wrote:
 On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng  wrote:
> Hi,
> Due to some reasons, tree-if-conv.c now factors floating point comparison 
> out of cond_expr,
> resulting in mixed types in it.  This does help CSE on common comparison 
> operations.
> Only problem is that test gcc.dg/vect/pr56541.c now requires 
> vect_cond_mixed to be
> vectorized.  This patch changes the test in that way.
> Test result checked.  Is it OK?

 Hmm, I think the fix is to fix if-conversion not doing that.  Can you
 track down why this happens?
>>> Hmm, but there are several common floating comparison operations in
>>> the case, by doing this, we could do CSE on GIMPLE, otherwise we
>>> depends on RTL optimizers.
>>
>> I see.
>>
>>> I thought we prefer GIMPLE level
>>> transforms?
>>
>> Yes, but the vectorizer is happier with the conditions present in the 
>> COND_EXPR
>> and thus we concluded we always want to have them there.  forwprop will
>> also aggressively put them back.  Note that we cannot put back
>> tree_could_throw_p
>> conditions (FP compares with signalling nans for example) to properly model 
>> EH
>> (though for VEC_COND_EXPRs we don't really care here).
>>
>> Note that nothing between if-conversion and vectorization will perform
>> the desired
>> CSE anyway.
> Hi Richard,
> I looked into this one and found it was not if-conv factors cond_expr
> out.  For test case:
>
>   for (i=0; i!=1024; ++i)
> {
>   float rR = a*z[i];
>   float rL = b*z[i];
>   float rMin = (rR   float rMax = (rR   rMin = (rMax>0) ? rMin : rBig;
>   rMin = (rMin>0) ? rMin : rMax;
>   ok[i] = rMin-c }
> Dump before jump threading is like:
>
>   :
>   # iftmp.3_12 = PHI 
>   if (iftmp.3_12 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # iftmp.4_13 = PHI 
>   if (iftmp.4_13 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # iftmp.5_14 = PHI 
>
> Jump thread in dom pass threads edges (bb7 -> bb8 -> ... bb11) to (bb6
> -> bb12 -> bb9) as below:
>
>   :
>   # iftmp.3_12 = PHI 
>   # iftmp.2_23 = PHI 
>   if (iftmp.3_12 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>   # iftmp.4_13 = PHI 
>   if (iftmp.4_13 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # iftmp.5_14 = PHI 
>
>   //...
>   :
>   # iftmp.4_22 = PHI <1.5e+2(6)>
>   goto ;
>
> This transform saves one comparison on the path, but creates multi-arg
> phi, resulting in cond_expr being factored out.
>
> Looks like threading corrupts vectorization opportunity for target
> doesn't support vect_cond_mixed, but I guess it's hard to tell in
> threading itself.  Any ideas?

First of all I wonder why MIN/MAX are not pattern matched earlier...
(I'd generally avoid threading through them).

But yes, I see that if-conversion can cause factored out conditons.
Also for the simple case of ! condition:

_49 = iftmp.3_12 > 0.0;
  _50 = ~_49;
  iftmp.5_14 = _50 ? 1.5e+2 : _ifc__48;

but those should have been eliminated by swapping operands (looks like
that doesn't reliably work).

In the end we have the bool patterns in the vectorizer to recover from this,
so I wonder why that doesn't work then.  It should re-write the above to

 _111 = iftmp.3_12 > 0.0 ? 1 : 0;  // of proper type
 _112 = ~ _111;
 iftmp.5_14 = _112 != 0 ? ;

Richard.

> Thanks,
> bin

Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS


Hi Eric,

On 02/11/16 10:47, Eric Botcazou wrote:

This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a
runtime one in rtlanal.c.

Since this one was in combination with an "#if defined" and used to guard an
if-statement I'd appreciate it if someone gave it a double-check that I
dind't screw up the intended behaviour.

Unfortunately I think you did, as the old version was:

#if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP)
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
   ? val_signbit_known_set_p (inner_mode, nonzero)
   : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
  || !MEM_P (SUBREG_REG (x)))
#endif

and the new version is:

#ifdef LOAD_EXTEND_OP
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if (WORD_REGISTER_OPERATIONS
  && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif

So if WORD_REGISTER_OPERATIONS is zero and LOAD_EXTEND_OP is defined, for
example on PowerPC, the block guarded by the condition is always executed in
the former case but never in the latter case.


I think you're right. I suppose the new condition should be:

#ifdef LOAD_EXTEND_OP
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if (!WORD_REGISTER_OPERATIONS
  || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif

Would you prefer me to make this change or just revert the patch?

Thanks,
Kyrill

Re: [PATCH testsuite]Require vect_cond_mixed for test case gcc.dg/vect/pr56541.c

2016-11-02 Thread Bin.Cheng

On Wed, Nov 2, 2016 at 11:08 AM, Richard Biener
 wrote:
> On Tue, Nov 1, 2016 at 4:24 PM, Bin.Cheng  wrote:
>> On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener
>>  wrote:
>>> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng  wrote:
 On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener
  wrote:
> On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng  wrote:
>> Hi,
>> Due to some reasons, tree-if-conv.c now factors floating point 
>> comparison out of cond_expr,
>> resulting in mixed types in it.  This does help CSE on common comparison 
>> operations.
>> Only problem is that test gcc.dg/vect/pr56541.c now requires 
>> vect_cond_mixed to be
>> vectorized.  This patch changes the test in that way.
>> Test result checked.  Is it OK?
>
> Hmm, I think the fix is to fix if-conversion not doing that.  Can you
> track down why this happens?
 Hmm, but there are several common floating comparison operations in
 the case, by doing this, we could do CSE on GIMPLE, otherwise we
 depends on RTL optimizers.
>>>
>>> I see.
>>>
 I thought we prefer GIMPLE level
 transforms?
>>>
>>> Yes, but the vectorizer is happier with the conditions present in the 
>>> COND_EXPR
>>> and thus we concluded we always want to have them there.  forwprop will
>>> also aggressively put them back.  Note that we cannot put back
>>> tree_could_throw_p
>>> conditions (FP compares with signalling nans for example) to properly model 
>>> EH
>>> (though for VEC_COND_EXPRs we don't really care here).
>>>
>>> Note that nothing between if-conversion and vectorization will perform
>>> the desired
>>> CSE anyway.
>> Hi Richard,
>> I looked into this one and found it was not if-conv factors cond_expr
>> out.  For test case:
>>
>>   for (i=0; i!=1024; ++i)
>> {
>>   float rR = a*z[i];
>>   float rL = b*z[i];
>>   float rMin = (rR>   float rMax = (rR>   rMin = (rMax>0) ? rMin : rBig;
>>   rMin = (rMin>0) ? rMin : rMax;
>>   ok[i] = rMin-c> }
>> Dump before jump threading is like:
>>
>>   :
>>   # iftmp.3_12 = PHI 
>>   if (iftmp.3_12 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # iftmp.4_13 = PHI 
>>   if (iftmp.4_13 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # iftmp.5_14 = PHI 
>>
>> Jump thread in dom pass threads edges (bb7 -> bb8 -> ... bb11) to (bb6
>> -> bb12 -> bb9) as below:
>>
>>   :
>>   # iftmp.3_12 = PHI 
>>   # iftmp.2_23 = PHI 
>>   if (iftmp.3_12 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   # iftmp.4_13 = PHI 
>>   if (iftmp.4_13 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # iftmp.5_14 = PHI 
>>
>>   //...
>>   :
>>   # iftmp.4_22 = PHI <1.5e+2(6)>
>>   goto ;
>>
>> This transform saves one comparison on the path, but creates multi-arg
>> phi, resulting in cond_expr being factored out.
>>
>> Looks like threading corrupts vectorization opportunity for target
>> doesn't support vect_cond_mixed, but I guess it's hard to tell in
>> threading itself.  Any ideas?
>
> First of all I wonder why MIN/MAX are not pattern matched earlier...
> (I'd generally avoid threading through them).
Hmm, this case is not MIN/MAX cases, though it uses rMin/rMax as
variable names.  So it cannot be pattern matched.

As for below cases you mentioned.  Since I am introducing
more/advanced pattern in match.pd and adding call to fold_stmt in
fold_stmt, I think they can be addressed in this way too, I will keep
this in mind in my cond_expr pattern work.  So we don't need to go
that far to vector pattern matches.

Thanks,
bin
>
> But yes, I see that if-conversion can cause factored out conditons.
> Also for the simple case of ! condition:
>
> _49 = iftmp.3_12 > 0.0;
>   _50 = ~_49;
>   iftmp.5_14 = _50 ? 1.5e+2 : _ifc__48;
>
> but those should have been eliminated by swapping operands (looks like
> that doesn't reliably work).
>
> In the end we have the bool patterns in the vectorizer to recover from this,
> so I wonder why that doesn't work then.  It should re-write the above to
>
>  _111 = iftmp.3_12 > 0.0 ? 1 : 0;  // of proper type
>  _112 = ~ _111;
>  iftmp.5_14 = _112 != 0 ? ;
>
> Richard.
>
>> Thanks,
>> bin

Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsincs.

2016-11-02 Thread Bin.Cheng

On Tue, Nov 1, 2016 at 12:21 PM, Kyrill Tkachov
 wrote:
> Hi Tamar,
>
>
> On 26/10/16 16:01, Tamar Christina wrote:
>>
>> Hi Christophe,
>>
>> Here's the updated patch.
>>
>> Cheers,
>> Tamar
>> 
>> From: Christophe Lyon 
>> Sent: Wednesday, October 19, 2016 11:23:56 AM
>> To: Tamar Christina
>> Cc: GCC Patches; Kyrylo Tkachov; nd
>> Subject: Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and
>> vminnmQ_ST intrinsincs.
>>
>> On 19 October 2016 at 11:36, Tamar Christina 
>> wrote:
>>>
>>> Hi All,
>>>
>>> This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. The
>>> current builtin registration code is deficient since it can't access
>>> standard pattern names, to which vmaxnmQ_ST and vminnmQ_ST map
>>> directly. Thus, to enable the vectoriser to have access to these
>>> intrinsics, we implement them using builtin functions, which we
>>> expand to the proper standard pattern using a define_expand.
>>>
>>> This patch also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro,
>>> which is defined when __ARM_ARCH >= 8, and which enables the
>>> intrinsics.
>>>
>>> Regression tested on arm-none-eabi and no regressions.
>>>
>>> This patch is a rework of a previous patch:
>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01971.html
>>>
>>> OK for trunk?
These cases failed on arm-none-linux-gnueabihf as below:
FAIL: gcc.target/arm/simd/vmaxnm_f32_1.c execution test
FAIL: gcc.target/arm/simd/vmaxnmq_f32_1.c execution test
FAIL: gcc.target/arm/simd/vminnm_f32_1.c execution test
FAIL: gcc.target/arm/simd/vminnmq_f32_1.c execution test

For such changes, I would suggest reg test for both bare-metal and
linux toolchains, plus a bootstrap for linux toolchain.

Thanks,
bin

>
>
> Ok.
> Thanks,
> Kyrill
>
>
>>> Thanks,
>>> Tamar
>>>
>>> ---
>>>
>>> gcc/
>>>
>>> 2016-10-19  Bilyan Borisov  
>>>  Tamar Christina 
>>>
>>>  * config/arm/arm-c.c (arm_cpu_builtins): New macro definition.
>>>  * config/arm/arm_neon.h (vmaxnm_f32): New intrinsinc.
>>>  (vmaxnmq_f32): Likewise.
>>>  (vminnm_f32): Likewise.
>>>  (vminnmq_f32): Likewise.
>>>  * config/arm/arm_neon_builtins.def (vmaxnm): New builtin.
>>>  (vminnm): Likewise.
>>>  * config/arm/neon.md (neon_, VCVTF): New
>>>  expander.
>>>
>>> gcc/testsuite/
>>>
>>> 2016-10-19  Bilyan Borisov  
>>>
>>>  * gcc.target/arm/simd/vmaxnm_f32_1.c: New.
>>>  * gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise.
>>>  * gcc.target/arm/simd/vminnm_f32_1.c: Likewise.
>>>  * gcc.target/arm/simd/vminnmq_f32_1.c: Likewise.
>>>
>> I think you forgot to attach the new tests.
>>
>> Christophe
>>
>

Re: [PATCH] Fix store-merging alias check, apply some TLC

On Wed, 2 Nov 2016, Richard Biener wrote:

> 
> This fixes the alias check in terminate_all_aliasing_chains -- the
> base we use for the hash table indexing does not constitute an object
> that aliases all stores in the chain (consider for example the MEM_REF
> handling which strips the offset completely).

I fat-fingered this thinking one step ahead (forgot to take the address).

> I've refactored data structures a bit in the process of making
> (as a followup) 'base_addr' a true address (and thus avoid building
> that new MEM_REF for example).  A followup will then try to support
> (some) bases with offset != NULL_TREE.
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.

So I had to re-start the testing when I already finished the very minor
change to make addr_base actually the address.  Thus the 2nd part is now
included.

This also has the advantage that the hash table comparison will
now apply the lax rules valid for comparing addresses (thus the
TREE_THIS_VOLATILE check should be no longer needed for example
and also different TBAA stores will now be merged).  I took the
liberty to explicitely rule out TARGET_MEM_REFs from the
merging (they'll have failed to form groups anyway as they are
not subsetted and we don't handle the const offset stripping yet
to eventually merge byte to word stores).

Richard.

2016-11-02  Richard Biener  

* gimple-ssa-store-merging.c (struct store_immediate_info): Remove
redundant val and dest members.
(store_immediate_info::store_immediate_info): Adjust.
(merged_store_group::merged_store_group): Adjust.
(merged_store_group::apply_stores): Likewise.
(struct imm_store_chain_info): Add base_addr field.
(imm_store_chain_info::imm_store_chain_info): New constructor.
(imm_store_chain_info::terminate_and_process_chain): Do not pass base.
(imm_store_chain_info::output_merged_store): Likewise.  Use
addr_base which is already the address.
(imm_store_chain_info::output_merged_stores): Likewise.
(pass_tree_store_merging::terminate_all_aliasing_chains): Take
imm_store_chain_info instead of base.  Fix alias check.
(pass_tree_store_merging::terminate_and_release_chain): Likewise.
(imm_store_chain_info::coalesce_immediate_stores): Adjust.
(pass_store_merging::execute): Refuse to operate on TARGET_MEM_REF.
use the address of the base and adjust for other changes.

Index: gcc/gimple-ssa-store-merging.c
===
--- gcc/gimple-ssa-store-merging.c  (revision 241779)
+++ gcc/gimple-ssa-store-merging.c  (working copy)
@@ -140,19 +140,17 @@ struct store_immediate_info
 {
   unsigned HOST_WIDE_INT bitsize;
   unsigned HOST_WIDE_INT bitpos;
-  tree val;
-  tree dest;
   gimple *stmt;
   unsigned int order;
-  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree,
-   tree, gimple *, unsigned int);
+  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+   gimple *, unsigned int);
 };

 store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
-   unsigned HOST_WIDE_INT bp, tree v,
-   tree d, gimple *st,
+   unsigned HOST_WIDE_INT bp,
+   gimple *st,
unsigned int ord)
-  : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord)
+  : bitsize (bs), bitpos (bp), stmt (st), order (ord)
 {
 }

@@ -557,7 +555,7 @@ merged_store_group::merged_store_group (
   /* VAL has memory allocated for it in apply_stores once the group
  width has been finalized.  */
   val = NULL;
-  align = get_object_alignment (info->dest);
+  align = get_object_alignment (gimple_assign_lhs (info->stmt));
   stores.create (1);
   stores.safe_push (info);
   last_stmt = info->stmt;
@@ -654,14 +652,16 @@ merged_store_group::apply_stores ()
   FOR_EACH_VEC_ELT (stores, i, info)
 {
   unsigned int pos_in_buffer = info->bitpos - start;
-  bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize,
-pos_in_buffer, buf_size);
+  bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
+   val, info->bitsize,
+   pos_in_buffer, buf_size);
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  if (ret)
{
  fprintf (dump_file, "After writing ");
- print_generic_expr (dump_file, info->val, 0);
+ print_generic_expr (dump_file,
+ gimple_assign_rhs1 (info->stmt), 0);
  fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
" at position %d the merged r

Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-02 Thread Eric Botcazou

> I think you're right. I suppose the new condition should be:
> 
> #ifdef LOAD_EXTEND_OP
> /* If this is a typical RISC machine, we only have to worry
>about the way loads are extended.  */
> if (!WORD_REGISTER_OPERATIONS
> 
> || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
> 
>? val_signbit_known_set_p (inner_mode, nonzero)
> 
>: LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
>  || 
>  || !MEM_P (SUBREG_REG (x
> 
> #endif

Agreed.

> Would you prefer me to make this change or just revert the patch?

Go ahead and make the change, but please do a bit of comment massaging in the 
process, for example:

#ifdef LOAD_EXTEND_OP
  /* On many CISC machines, accessing an object in a wider mode
 causes the high-order bits to become undefined.  So they are
 not known to be zero.  */
  if (!WORD_REGISTER_OPERATIONS
 /* If this is a typical RISC machine, we only have to worry
about the way loads are extended.  */
  || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif
{
  if (GET_MODE_PRECISION (GET_MODE (x))
  > GET_MODE_PRECISION (inner_mode))
nonzero |= (GET_MODE_MASK (GET_MODE (x))
& ~GET_MODE_MASK (inner_mode));
}

-- 
Eric Botcazou

Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.

2016-11-02 Thread Jones, Joel

What is currently submitted for LLVM review was submitted before we determined 
this naming scheme. I will mark the current submittal as abandoned, as the 
scheduling model needs to be split out and revised.

Joel Jones 

Sent from my AArch64 powered iPhone

> On Nov 2, 2016, at 3:55 AM, James Greenhalgh  wrote:
> 
>> On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
>>> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski  wrote:
>>> Since ThunderX T88 pass 1 (variant 0) is a ARMv8 part while pass 2 (variant 
>>> 1)
>>> is an ARMv8.1 part, I needed to add detecting of the variant also for this
>>> difference. Also I simplify a little bit and combined the single core and
>>> arch detecting cases so it would be easier to add variant.
>> 
>> Actually it is a bit more complex than what I said here, see below for
>> the full table of options and what are enabled/disabled now.
>> 
>>> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>> Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
>>> deecting the two seperately.
>> 
>> 
>> Here is the final patch in this series updated; I changed the cpu name
>> slightly and made sure I updated invoke.texi too.
>> 
>> The names are going to match the names in LLVM (worked with our LLVM
>> engineer here at Cavium about the names).
>> Here are the names recorded and
>> -mpcu=thunderx:
>> *Matches part num 0xA0 (reserved for ThunderX 8x series)
>> *T88 Pass 2 scheduling
>> *Hardware prefetching (software prefetching disabled)
>> *LSE enabled
>> *no v8.1
> 
> This doesn't match the current LLVM proposal
> ( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support
> for -mcpu=thunderx.
> 
>> -mcpu=thunderxt88:
>> *Matches part num 0xA1
>> *T88 Pass 2 scheduling
>> *software prefetching enabled
>> *LSE enabled
>> *no v8.1
>> 
>> -mcpu=thunderxt88p1 (only for GCC):
>> *Matches part num 0xA1, variant 0
>> *T88 Pass 1 scheduling
>> *software prefetching enabled
>> *no LSE enabled
>> *no v8.1
>> 
>> -mcpu=thunderxt81 and -mcpu=thunderxt83:
>> *Matches part num 0xA2/0xA3
>> *T88 Pass 2 scheduling
>> *Hardware prefetching (software prefetching disabled)
>> *LSE enabled
>> *v8.1
> 
> This looks like what has been added to LLVM as -mcpu=thunderx.
> 
>> I have not hooked up software vs hardware prefetching and the
>> scheduler parts (the next patch will do part of that); both ARMv8.1-a
>> and LSE parts are hooked up as those parts are only in
>> aarch64-cores.def.
>> 
>> OK?  Bootstrapped and tested on ThunderX T88 and ThunderX T81
>> (aarch64-linux-gnu).
>> 
>> Index: common/config/aarch64/aarch64-common.c
>> ===
>> --- common/config/aarch64/aarch64-common.c(revision 241727)
>> +++ common/config/aarch64/aarch64-common.c(working copy)
>> @@ -145,7 +145,7 @@ struct arch_to_arch_name
>>the default set of architectural feature flags they support.  */
>> static const struct processor_name_to_arch all_cores[] =
>> {
>> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
>> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, 
>> VARIANT) \
>>   {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>> #include "config/aarch64/aarch64-cores.def"
>>   {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
>> Index: config/aarch64/aarch64-cores.def
>> ===
>> --- config/aarch64/aarch64-cores.def(revision 241727)
>> +++ config/aarch64/aarch64-cores.def(working copy)
>> @@ -21,7 +21,7 @@
>> 
>>Before using #include to read this file, define a macro:
>> 
>> -  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
>> FLAGS, COSTS, IMP, PART)
>> +  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
>> FLAGS, COSTS, IMP, PART, VARIANT)
>> 
>>The CORE_NAME is the name of the core, represented as a string constant.
>>The CORE_IDENT is the name of the core, represented as an identifier.
>> @@ -39,39 +39,45 @@
>>PART is the part number of the CPU.  On a GNU/Linux system it can be
>>found in /proc/cpuinfo.  For big.LITTLE systems this should use the
>>macro AARCH64_BIG_LITTLE where the big part number comes as the first
>> -   argument to the macro and little is the second.  */
>> +   argument to the macro and little is the second.
>> +   VARIANT is the variant of the CPU.  In a GNU/Linux system it can found
>> +   in /proc/cpuinfo.  If this is -1, this means it can match any variant.  
>> */
>> 
>> /* V8 Architecture Processors.  */
>> 
>> /* ARM ('A') cores. */
>> -AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 
>> | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
>> -AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH6

Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add

2016-11-02 Thread Bill Schmidt

On Nov 2, 2016, at 4:28 AM, Jakub Jelinek  wrote:
> 
> On Wed, Nov 02, 2016 at 10:19:26AM +0100, Richard Biener wrote:
>> On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek  wrote:
>>> On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
 The PowerPC back end loses performance on vector intrinsics, because 
 currently
 all of them are treated as calls throughout the middle-end phases and only
 expanded when they reach RTL.  Our version of altivec.h currently defines 
 the
 public names of overloaded functions (like vec_add) to be #defines for 
 hidden
 functions (like __builtin_vec_add), which are recognized in the parser as
 requiring special back-end support.  Tables in rs6000-c.c handle dispatch 
 of
 the overloaded functions to specific function calls appropriate to the 
 argument
 types.
>>> 
>>> This doesn't look very nice.  If all you care is that the builtins like
>>> __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
>>> into generic vector operations under certain conditions, just fold those
>>> into whatever you want in targetm.gimple_fold_builtin (gsi).
>> 
>> Note that traditionally "overloading" with GCC "builtins" is done by
>> using varargs
>> and the "type generic" attribute.  That doesn't scale to return type 
>> overloading
>> though for which we usually added direct support to the parser (for example
>> for __builtin_shuffle).
> 
> My understanding is that rs6000 already does that, it hooks into
> resolve_overloaded_builtin which already handles the fully type generic
> builtins where not just the arguments, but also the return type can be
> picked up.  But it resolves the overloaded builtins into calls to other
> builtins that are not type-generic.
> 
> So, either that function instead of returning the specific md builtin calls
> in some cases already returns trees with the generic behavior of the
> builtin, or it returns what it does now and then in the gimple fold builtin
> target hook (note, the normal fold builtin target hook is not right for
> that, because it is mostly used for folding builtins into constant - callers
> will usually throw away other results) fold those specific md builtins
> into whatever GIMPLE you want.  If we want to decrease amount of folding in
> the FEs, the gimple fold builtin hook is probably better.
> 
>   Jakub

Thanks, all.  Using the gimple_fold_builtin target hook works very well and
is exactly what I'm looking for.  I've reworked the patch to the much simpler
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00104.html.

Much obliged for the help!

Bill

Re: [PATCH, vec-tails] Support loop epilogue vectorization

On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
> 
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
> 
> Is it OK for trunk?

I would have prefered that the series up to -03-nomask-tails would
_only_ contain epilogue loop vectorization changes but unfortunately
the patchset is oddly separated.

I have a comment on that part nevertheless:

@@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
   || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
-  || loop->inner)
+  || loop->inner
+  /* Required peeling was performed in prologue and
+is not required for epilogue.  */
+  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
 do_peeling = false;

   if (do_peeling
@@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)

   do_versioning =
optimize_loop_nest_for_speed_p (loop)
-   && (!loop->inner); /* FORNOW */
+   && (!loop->inner) /* FORNOW */
+/* Required versioning was performed for the
+  original loop and is not required for epilogue.  */
+   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);

   if (do_versioning)
 {

please do that check in the single caller of this function.

Otherwise I still dislike the new ->aux use and I believe that simply
passing down info from the processed parent would be _much_ cleaner.
That is, here (and avoid the FOR_EACH_LOOP change):

@@ -580,12 +586,21 @@ vectorize_loops (void)
&& dump_enabled_p ())
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
"loop vectorized\n");
-   vect_transform_loop (loop_vinfo);
+   new_loop = vect_transform_loop (loop_vinfo);
num_vectorized_loops++;
/* Now that the loop has been vectorized, allow it to be unrolled
   etc.  */
loop->force_vectorize = false;

+   /* Add new loop to a processing queue.  To make it easier
+  to match loop and its epilogue vectorization in dumps
+  put new loop as the next loop to process.  */
+   if (new_loop)
+ {
+   loops.safe_insert (i + 1, new_loop->num);
+   vect_loops_num = number_of_loops (cfun);
+ }

simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
function which will set up stuff properly (and also perform
the if-conversion of the epilogue there).

That said, if we can get in non-masked epilogue vectorization
separately that would be great.

I'm still torn about all the rest of the stuff and question its
usability (esp. merging the epilogue with the main vector loop).
But it has already been approved ... oh well.

Thanks,
Richard.

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 2, 2016 at 10:52 AM, Jakub Jelinek  wrote:
> On Wed, Nov 02, 2016 at 10:40:35AM +0100, Richard Biener wrote:
>> > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
>> > set during the asan pass and kept on until end of compilation of that
>> > function.  That means even if a var only addressable because of ASAN_MARK 
>> > is
>> > discovered after this pass we'd still be able to rewrite it into SSA.
>>
>> Note that being TREE_ADDRESSABLE also has effects on alias analysis
>> (didn't follow the patches to see whether you handle ASAN_MARK specially
>> in points-to analysis and/or alias analysis).
>>
>> Generally in update-address-taken you can handle ASAN_MARK similar to
>> MEM_REF (and drop it in the rewrite phase?).
>
> That is the intent, but we can't do that before the asan pass, because
> otherwise as Martin explained we don't diagnose at runtime bugs where
> a variable is used outside of its scope only through a MEM_REF.
> So we need to wait for asan pass to actually add a real builtin call that
> takes the address in that case.  Except now I really don't see how that
> can work for the case where the var is used only properly when it is in the
> scope, because the asan pass will still see those being addressable.
>
> Unless I'm missing something we either need to perform further analysis
> during the addressable subpass - this variable could be made
> non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
> form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
> addressable, otherwise rewrite into SSA.
> Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
> some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
> uses of those, rewrite it back into addressable immediately or later or
> something.

Or just give up optimizing (asan has a penalty anyway)?  Or
re-structure ASAN_POISON ()
similar to clobbers:

  var = ASAN_POISION ();

that avoids the address-taking and thus should be less heavy-weight on
optimizations.

Richard.

>
> Jakub

Re: [RFC PATCH] expand_strn_compare should attempt expansion even if neither string is constant

2016-11-02 Thread Bernd Schmidt


On 10/27/2016 03:14 AM, Aaron Sawdey wrote:

I'm currently working on a builtin expansion of strncmp for powerpc
similar to the one for memcmp I checked recently. One thing I
encountered is that the code in expand_strn_compare will not attempt to
expand the cmpstrnsi pattern at all if neither string parameter is a
constant string. This doesn't make a lot of sense in light of the fact
that expand_str_compare starts with an attempt to expand cmpstrsi and
then if that does not work, looks at the string args to see if one is
constant so it can use cmpstrnsi with the known length.

The attached patch is my attempt to add this fallback path to
expand_strn_compare, i.e. if neither length is known, just attempt
expansion of cmpstrnsi using the given 3 arguments.

It looks like in addition to rs6000, there are 3 other targets that
have cmpstrnsi patterns: i386, sh, and rx.

Is this a reasonable approach to take with this? If so I'll
bootstrap/regtest on i386 as rs6000 does not as yet have an expansion
for cmpstrsi or cmpstrnsi.


Just to be sure, this is superseded by your later patch series, correct?

(After I saw this one I had been trying to remember what exactly the 
i386 issue was but it looks like you found it yourself.)



Bernd

Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS

On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra  wrote:
>  Jeff Law  wrote:
>
>> I think you'll need to look at bz61320 before this could go in.
>
> I had a look, but there is nothing there that is related - eventually
> a latent alignment bug was fixed in IVOpt. Note that the bswap phase
> currently inserts unaligned accesses irrespectively of STRICT_ALIGNMENT
> or SLOW_UNALIGNED_ACCESS:
>
> -  if (bswap
>  - && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
>  - && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
>  -   return false;
>
> If bswap is false no byte swap is needed, so we found a native endian load
> and it will always perform the optimization by inserting an unaligned load.

Yes, the general agreement is that the expander can do best and thus we
should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS.
The expander will generate the canonical best code (hopefully...).

> This apparently works on all targets, and doesn't cause alignment traps or
> huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
> So I'm at a loss what these macros are supposed to mean and how I can query
> whether a backend supports fast unaligned access for a particular mode.
>
> What I actually want to write is something like:
>
>  if (!FAST_UNALIGNED_LOAD (mode, align)) return false;
>
> And know that it only accepts unaligned accesses that are efficient on the 
> target.
> Maybe we need a new hook like this and get rid of the old one?

No, we don't need to other hook.

Note there is another similar user in gimple-fold.c when folding small
memcpy/memmove
to single load/store pairs (patch posted but not applied by me -- I've
asked for strict-align
target maintainer feedback but got none).

Now - for bswap I'm only 99% sure that unaligned load + bswap is
better than piecewise
loads plus manual swap.

But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS /
STRICT_ALIGNMENT
checks from the GIMPLE side of the compiler.

Richard.

> Wilco
>

Re: [ping * 4] PR35503 - warn for restrict

2016-11-02 Thread Joseph Myers

The format-checking parts of the patch are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 02, 2016 at 01:36:31PM +0100, Richard Biener wrote:
> > Unless I'm missing something we either need to perform further analysis
> > during the addressable subpass - this variable could be made
> > non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
> > form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
> > addressable, otherwise rewrite into SSA.
> > Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
> > some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
> > uses of those, rewrite it back into addressable immediately or later or
> > something.
> 
> Or just give up optimizing (asan has a penalty anyway)?  Or

Well, asan has a penalty and -fsanitize-use-after-scope even bigger penalty,
but the point is to make that penalty bearable.

> re-structure ASAN_POISON ()
> similar to clobbers:
> 
>   var = ASAN_POISION ();
> 
> that avoids the address-taking and thus should be less heavy-weight on
> optimizations.

Yeah, that is what I meant.  The issue is how to report uses of such
SSA_NAME when there is no memory.  So, either we'd need a special runtime
library entrypoint that would report uses after scope even when there is no
underlying memory, or we'd need to force it at asan pass time into memory again.

Jakub

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 2, 2016 at 1:56 PM, Jakub Jelinek  wrote:
> On Wed, Nov 02, 2016 at 01:36:31PM +0100, Richard Biener wrote:
>> > Unless I'm missing something we either need to perform further analysis
>> > during the addressable subpass - this variable could be made
>> > non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
>> > form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
>> > addressable, otherwise rewrite into SSA.
>> > Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
>> > some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
>> > uses of those, rewrite it back into addressable immediately or later or
>> > something.
>>
>> Or just give up optimizing (asan has a penalty anyway)?  Or
>
> Well, asan has a penalty and -fsanitize-use-after-scope even bigger penalty,
> but the point is to make that penalty bearable.
>
>> re-structure ASAN_POISON ()
>> similar to clobbers:
>>
>>   var = ASAN_POISION ();
>>
>> that avoids the address-taking and thus should be less heavy-weight on
>> optimizations.
>
> Yeah, that is what I meant.  The issue is how to report uses of such
> SSA_NAME when there is no memory.  So, either we'd need a special runtime
> library entrypoint that would report uses after scope even when there is no
> underlying memory, or we'd need to force it at asan pass time into memory 
> again.

Well, there can't be any uses outside the scope -- there are no (memory) uses
left if we rewrite the thing into SSA.  That is, the address can no
longer "escape".

Of course there could have been invalid uses before the rewrite into SSA.  But
those can be diagnosed either immediately before or after re-writing into SSA
at compile-time (may be in dead code regions of course).

Richard.

> Jakub

Re: [ping * 4] PR35503 - warn for restrict

Then I'll approve the whole patch.

On Wed, Nov 2, 2016 at 8:42 AM, Joseph Myers  wrote:
> The format-checking parts of the patch are OK.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote:
> > Yeah, that is what I meant.  The issue is how to report uses of such
> > SSA_NAME when there is no memory.  So, either we'd need a special runtime
> > library entrypoint that would report uses after scope even when there is no
> > underlying memory, or we'd need to force it at asan pass time into memory 
> > again.
> 
> Well, there can't be any uses outside the scope -- there are no (memory) uses
> left if we rewrite the thing into SSA.  That is, the address can no
> longer "escape".
> 
> Of course there could have been invalid uses before the rewrite into SSA.  But
> those can be diagnosed either immediately before or after re-writing into SSA
> at compile-time (may be in dead code regions of course).

Sure, we can warn on those at compile time, but we really should arrange to
error on those at runtime if they are ever executed, the UB happens only at
runtime, so in dead code isn't fatal.

Jakub

[PATCH] Allow non-NULL offset for store-merging bases


The following teaches store-merging to handle non-NULL offset if the
base is already addressable (otherwise introducing new pointers to
a non-addressable base invalidates points-to information, see a comment
in the patch how we could avoid this in theory).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-11-02  Richard Biener  

* gimple-ssa-store-merging.c: Include gimplify-me.h.
(imm_store_chain_info::output_merged_stores): Force base_addr
to be proper GIMPLE for a MEM_REF address.
(pass_store_merging::execute): Restrict negative bitpos
handling to non-MEM_REF bases.  Remove TREE_THIS_VOLATILE
check.  Take into account non-NULL_TREE offset if the base
is already addressable.

* gcc.dg/store_merging_8.c: New testcase.

Index: gcc/gimple-ssa-store-merging.c
===
--- gcc/gimple-ssa-store-merging.c  (revision 241789)
+++ gcc/gimple-ssa-store-merging.c  (working copy)
@@ -125,6 +125,7 @@
 #include "tree-cfg.h"
 #include "tree-eh.h"
 #include "target.h"
+#include "gimplify-me.h"
 
 /* The maximum size (in bits) of the stores this pass should generate.  */
 #define MAX_STORE_BITSIZE (BITS_PER_WORD)
@@ -1127,6 +1128,8 @@ imm_store_chain_info::output_merged_stor
   unsigned int i;
   bool fail = false;
 
+  tree addr = force_gimple_operand_1 (unshare_expr (base_addr), &seq,
+ is_gimple_mem_ref_addr, NULL_TREE);
   FOR_EACH_VEC_ELT (split_stores, i, split_store)
 {
   unsigned HOST_WIDE_INT try_size = split_store->size;
@@ -1137,7 +1140,7 @@ imm_store_chain_info::output_merged_stor
 
   tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
   int_type = build_aligned_type (int_type, align);
-  tree dest = fold_build2 (MEM_REF, int_type, base_addr,
+  tree dest = fold_build2 (MEM_REF, int_type, addr,
   build_int_cst (offset_type, try_pos));
 
   tree src = native_interpret_expr (int_type,
@@ -1366,15 +1369,10 @@ pass_store_merging::execute (function *f
   &unsignedp, &reversep, &volatilep);
  /* As a future enhancement we could handle stores with the same
 base and offset.  */
- bool invalid = offset || reversep || bitpos < 0
+ bool invalid = reversep
 || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
  && (TREE_CODE (rhs) != INTEGER_CST))
-|| !rhs_valid_for_store_merging_p (rhs)
-   /* An access may not be volatile itself but base_addr may be
-  a volatile decl i.e. MEM[&volatile-decl].  The hashing for
-  tree_operand_hash won't consider such stores equal to each
-  other so we can't track chains on them.  */
-|| TREE_THIS_VOLATILE (base_addr);
+|| !rhs_valid_for_store_merging_p (rhs);
 
  /* We do not want to rewrite TARGET_MEM_REFs.  */
  if (TREE_CODE (base_addr) == TARGET_MEM_REF)
@@ -1398,7 +1396,32 @@ pass_store_merging::execute (function *f
  /* get_inner_reference returns the base object, get at its
 address now.  */
  else
-   base_addr = build_fold_addr_expr (base_addr);
+   {
+ if (bitpos < 0)
+   invalid = true;
+ base_addr = build_fold_addr_expr (base_addr);
+   }
+
+ if (! invalid
+ && offset != NULL_TREE)
+   {
+ /* If the access is variable offset then a base
+decl has to be address-taken to be able to
+emit pointer-based stores to it.
+???  We might be able to get away with
+re-using the original base up to the first
+variable part and then wrapping that inside
+a BIT_FIELD_REF.  */
+ tree base = get_base_address (base_addr);
+ if (! base
+ || (DECL_P (base)
+ && ! TREE_ADDRESSABLE (base)))
+   invalid = true;
+ else
+   base_addr = build2 (POINTER_PLUS_EXPR,
+   TREE_TYPE (base_addr),
+   base_addr, offset);
+   }
 
  struct imm_store_chain_info **chain_info
= m_stores.get (base_addr);
Index: gcc/testsuite/gcc.dg/store_merging_8.c
===
--- gcc/testsuite/gcc.dg/store_merging_8.c  (revision 0)
+++ gcc/testsuite/gcc.dg/store_merging_8.c  (working copy)
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_stric

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 2, 2016 at 2:06 PM, Jakub Jelinek  wrote:
> On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote:
>> > Yeah, that is what I meant.  The issue is how to report uses of such
>> > SSA_NAME when there is no memory.  So, either we'd need a special runtime
>> > library entrypoint that would report uses after scope even when there is no
>> > underlying memory, or we'd need to force it at asan pass time into memory 
>> > again.
>>
>> Well, there can't be any uses outside the scope -- there are no (memory) uses
>> left if we rewrite the thing into SSA.  That is, the address can no
>> longer "escape".
>>
>> Of course there could have been invalid uses before the rewrite into SSA.  
>> But
>> those can be diagnosed either immediately before or after re-writing into SSA
>> at compile-time (may be in dead code regions of course).
>
> Sure, we can warn on those at compile time, but we really should arrange to
> error on those at runtime if they are ever executed, the UB happens only at
> runtime, so in dead code isn't fatal.

Then we can replace those uses with a call into the asan runtime diagnosing the
issue instead?

Richard.

> Jakub

[PATCH] libiberty: Fix -Wimplicit-fallthrough warnings.

2016-11-02 Thread Mark Wielaard

Adjust some comments, add some explicit fall through comments or explicit
returns where necessary to not get implicit-fallthrough warnings.

All fall throughs were deliberate. In one case I added an explicit return
false for clarity instead of falling through a default case (that also
would return false).

libiberty/ChangeLog:

   * cplus-dem.c (demangle_signature): Move fall through comment.
   (demangle_fund_type): Add fall through comment between 'G' and 'I'.
   * hashtab.c (iterative_hash): Add fall through comments.
   * regex.c (regex_compile): Add Fall through comment after '+'/'?'.
   (byte_re_match_2_internal): Add Fall through comment after jump_n.
   Change "Note fall through" to "Fall through".
   (common_op_match_null_string_p): Return false after set_number_at
   instead of fall through.
---
diff --git a/libiberty/cplus-dem.c b/libiberty/cplus-dem.c
index 7f63397..810e971 100644
--- a/libiberty/cplus-dem.c
+++ b/libiberty/cplus-dem.c
@@ -1658,8 +1658,8 @@ demangle_signature (struct work_stuff *work,
  break;
}
  else
-   /* fall through */
{;}
+   /* fall through */
 
default:
  if (AUTO_DEMANGLING || GNU_DEMANGLING)
@@ -4024,6 +4024,7 @@ demangle_fund_type (struct work_stuff *work,
  success = 0;
  break;
}
+  /* fall through */
 case 'I':
   (*mangled)++;
   if (**mangled == '_')
diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c
index 04607ea..99381b1 100644
--- a/libiberty/hashtab.c
+++ b/libiberty/hashtab.c
@@ -962,17 +962,17 @@ iterative_hash (const PTR k_in /* the key */,
   c += length;
   switch(len)  /* all the case statements fall through */
 {
-case 11: c+=((hashval_t)k[10]<<24);
-case 10: c+=((hashval_t)k[9]<<16);
-case 9 : c+=((hashval_t)k[8]<<8);
+case 11: c+=((hashval_t)k[10]<<24);/* fall through */
+case 10: c+=((hashval_t)k[9]<<16); /* fall through */
+case 9 : c+=((hashval_t)k[8]<<8);  /* fall through */
   /* the first byte of c is reserved for the length */
-case 8 : b+=((hashval_t)k[7]<<24);
-case 7 : b+=((hashval_t)k[6]<<16);
-case 6 : b+=((hashval_t)k[5]<<8);
-case 5 : b+=k[4];
-case 4 : a+=((hashval_t)k[3]<<24);
-case 3 : a+=((hashval_t)k[2]<<16);
-case 2 : a+=((hashval_t)k[1]<<8);
+case 8 : b+=((hashval_t)k[7]<<24); /* fall through */
+case 7 : b+=((hashval_t)k[6]<<16); /* fall through */
+case 6 : b+=((hashval_t)k[5]<<8);  /* fall through */
+case 5 : b+=k[4];  /* fall through */
+case 4 : a+=((hashval_t)k[3]<<24); /* fall through */
+case 3 : a+=((hashval_t)k[2]<<16); /* fall through */
+case 2 : a+=((hashval_t)k[1]<<8);  /* fall through */
 case 1 : a+=k[0];
   /* case 0: nothing left to add */
 }
diff --git a/libiberty/regex.c b/libiberty/regex.c
index 9ffc3f4..6854e3b 100644
--- a/libiberty/regex.c
+++ b/libiberty/regex.c
@@ -2493,6 +2493,7 @@ PREFIX(regex_compile) (const char *ARG_PREFIX(pattern),
   if ((syntax & RE_BK_PLUS_QM)
   || (syntax & RE_LIMITED_OPS))
 goto normal_char;
+ /* Fall through.  */
 handle_plus:
 case '*':
   /* If there is no previous pattern... */
@@ -6697,6 +6698,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp,
 {
   case jump_n:
is_a_jump_n = true;
+   /* Fall through.  */
   case pop_failure_jump:
  case maybe_pop_jump:
  case jump:
@@ -7125,7 +7127,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp,
   DEBUG_PRINT1 ("  Match => jump.\n");
  goto unconditional_jump;
}
-/* Note fall through.  */
+/* Fall through.  */
 
 
/* The end of a simple repeat has a pop_failure_jump back to
@@ -7150,7 +7152,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp,
dummy_low_reg, dummy_high_reg,
reg_dummy, reg_dummy, reg_info_dummy);
   }
- /* Note fall through.  */
+ /* Fall through.  */
 
unconditional_jump:
 #ifdef _LIBC
@@ -7453,6 +7455,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp,
 {
 case jump_n:
   is_a_jump_n = true;
+ /* Fall through.  */
 case maybe_pop_jump:
 case pop_failure_jump:
 case jump:
@@ -7718,6 +7721,7 @@ PREFIX(common_op_match_null_string_p) (UCHAR_T **p, 
UCHAR_T *end,
 
 case set_number_at:
   p1 += 2 * OFFSET_ADDRESS_SIZE;
+  return false;
 
 default:
   /* All other opcodes mean we cannot match the empty string.  */
-- 
1.8.3.1

Re: [PATCH] libiberty: Fix -Wimplicit-fallthrough warnings.

On Wed, Nov 02, 2016 at 02:19:33PM +0100, Mark Wielaard wrote:
> Adjust some comments, add some explicit fall through comments or explicit
> returns where necessary to not get implicit-fallthrough warnings.
> 
> All fall throughs were deliberate. In one case I added an explicit return
> false for clarity instead of falling through a default case (that also
> would return false).
> 
> libiberty/ChangeLog:
> 
>* cplus-dem.c (demangle_signature): Move fall through comment.
>(demangle_fund_type): Add fall through comment between 'G' and 'I'.
>* hashtab.c (iterative_hash): Add fall through comments.
>* regex.c (regex_compile): Add Fall through comment after '+'/'?'.
>(byte_re_match_2_internal): Add Fall through comment after jump_n.
>Change "Note fall through" to "Fall through".
>(common_op_match_null_string_p): Return false after set_number_at
>instead of fall through.

LGTM, except for:

> --- a/libiberty/cplus-dem.c
> +++ b/libiberty/cplus-dem.c
> @@ -1658,8 +1658,8 @@ demangle_signature (struct work_stuff *work,
> break;
>   }
> else
> - /* fall through */
>   {;}
> + /* fall through */

I think you should just remove the else and {;} and just have fallthrough
comment indented where else used to be.

Jakub

Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-11-02 Thread Jiong Wang


On 01/11/16 16:48, Jason Merrill wrote:
> It seems to me that a CFA_*expression note would never use target
> UNSPEC codes, and a DWARF UNSPEC would never appear outside of such a
> note, so we don't need to worry about conflicts.

Indeed.

DWARF UNSPEC is deeper inside DW_CFA_*expression note.  My worry of conflict
makes no sense.

I updated the patch to put DWARF operation in to UNSPEC number field.

x86-64 bootstrap OK,  no regression on gcc/g++.

Please review.

Thanks.

gcc/
2016-11-02  Jiong Wang  

* reg-notes.def (CFA_VAL_EXPRESSION): New entry.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New 
function.

(dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION.
(output_cfa_loc): Support DW_CFA_val_expression.
(output_cfa_loc_raw): Likewise.
(output_cfi): Likewise.
(output_cfi_directive): Likewise.
* dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression.
(dw_cfi_oprnd2_desc): Likewise.
(mem_loc_descriptor): Recognize new pattern generated for value
expression.
diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 6491d5aaf4c4a21241cc718bfff1016f6d149951..b8c88fbae1df80a2664a414d8ae016a5343bf435 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set)
   reg_save (sregno, dregno, 0);
 }
 
-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note.  */
 
 static void
 dwarf2out_frame_debug_cfa_expression (rtx set)
@@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
   update_row_reg_save (cur_row, regno, cfi);
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_val_expression (rtx set)
+{
+  rtx dest = SET_DEST (set);
+  gcc_assert (REG_P (dest));
+
+  rtx span = targetm.dwarf_register_span (dest);
+  gcc_assert (!span);
+
+  rtx src = SET_SRC (set);
+  dw_cfi_ref cfi = new_cfi ();
+  cfi->dw_cfi_opc = DW_CFA_val_expression;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest);
+  cfi->dw_cfi_oprnd2.dw_cfi_loc
+= mem_loc_descriptor (src, GET_MODE (src),
+			  GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED);
+  add_cfi (cfi);
+  update_row_reg_save (cur_row, dwf_regno (dest), cfi);
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
 
 static void
@@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn)
 	break;
 
   case REG_CFA_EXPRESSION:
+  case REG_CFA_VAL_EXPRESSION:
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_expression (n);
+
+	if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION)
+	  dwarf2out_frame_debug_cfa_expression (n);
+	else
+	  dwarf2out_frame_debug_cfa_val_expression (n);
+
 	handled_one = true;
 	break;
 
@@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh);
@@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
@@ -3188,6 +3219,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref fde, int for_eh)
 
 	case DW_CFA_def_cfa_expression:
 	case DW_CFA_expression:
+	case DW_CFA_val_expression:
 	  output_cfa_loc (cfi, for_eh);
 	  break;
 
@@ -3302,16 +3334,13 @@ output_cfi_directive (FILE *f, dw_cfi_ref cfi)
   break;
 
 case DW_CFA_def_cfa_expression:
-  if (f != asm_out_file)
-	{
-	  fprintf (f, "\t.cfi_def_cfa_expression ...\n");
-	  break;
-	}
-  /* FALLTHRU */
 case DW_CFA_expression:
+case DW_CFA_val_expression:
   if (f != asm_out_file)
 	{
-	  fprintf (f, "\t.cfi_cfa_expression ...\n");
+	  fprintf (f, "\t.cfi_%scfa_%sexpression ...\n",
+		   cfi->dw_cfi_opc == DW_CFA_def_cfa_expression ? "def_" : "",
+		   cfi->dw_cfi_opc == DW_CFA_val_expression ? "val_" : "");
 	  break;
 	}
   fprintf (f, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4a3df339df2c6a6816ac8b8dbdb2466a7492c592..7dac70d7392f2c457ffd3f677e07decb1ba488a1 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -518,6 +518,7 @@ dw_cfi_oprnd1_desc (enum dwarf_call_frame_info cfi)
 case DW_CFA_def_cfa_register:
 case DW_CFA_register:
 case DW_CFA_expression:
+case DW_CFA_val_expression:
   return dw_cfi_oprnd_reg_num;
 
 case DW_CFA_def_cfa_offset:
@@ -551,6 +552,7 @@ dw_cfi_oprnd2_desc (enum dwarf_

Re: [PATCH 1/2, libgcc]: Implement _divmoddi4

2016-11-02 Thread Ian Lance Taylor

On Mon, Oct 31, 2016 at 7:46 AM, Uros Bizjak  wrote:
> This function will be used in a follow-up patch to implement
> TARGET_EXPAND_DIVMOD_LIBFUNC for x86 targets. Other targets can call
> this function, so IMO it should be part of a generic library.
>
> 2016-10-31  Uros Bizjak  
>
> * Makefile.in (LIB2_DIVMOD_FUNCS): Add _divmoddi4.
> * libgcc2.c (__divmoddi4): New function.
> * libgcc2.h (__divmoddi4): Declare.
> * libgcc-std.ver.in (GCC_7.0.0): New. Add __PFX_divmoddi4
> and __PFX_divmodti4.

You aren't defining divmodti4 anywhere, so it seems premature to add
it to libgcc-std.ver.in.

Other than that the patch is OK.

Thanks.

Ian

Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsincs.

2016-11-02 Thread Christophe Lyon

On 2 November 2016 at 12:22, Bin.Cheng  wrote:
> On Tue, Nov 1, 2016 at 12:21 PM, Kyrill Tkachov
>  wrote:
>> Hi Tamar,
>>
>>
>> On 26/10/16 16:01, Tamar Christina wrote:
>>>
>>> Hi Christophe,
>>>
>>> Here's the updated patch.
>>>
>>> Cheers,
>>> Tamar
>>> 
>>> From: Christophe Lyon 
>>> Sent: Wednesday, October 19, 2016 11:23:56 AM
>>> To: Tamar Christina
>>> Cc: GCC Patches; Kyrylo Tkachov; nd
>>> Subject: Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and
>>> vminnmQ_ST intrinsincs.
>>>
>>> On 19 October 2016 at 11:36, Tamar Christina 
>>> wrote:

 Hi All,

 This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. The
 current builtin registration code is deficient since it can't access
 standard pattern names, to which vmaxnmQ_ST and vminnmQ_ST map
 directly. Thus, to enable the vectoriser to have access to these
 intrinsics, we implement them using builtin functions, which we
 expand to the proper standard pattern using a define_expand.

 This patch also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro,
 which is defined when __ARM_ARCH >= 8, and which enables the
 intrinsics.

 Regression tested on arm-none-eabi and no regressions.

 This patch is a rework of a previous patch:
 https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01971.html

 OK for trunk?
> These cases failed on arm-none-linux-gnueabihf as below:
> FAIL: gcc.target/arm/simd/vmaxnm_f32_1.c execution test
> FAIL: gcc.target/arm/simd/vmaxnmq_f32_1.c execution test
> FAIL: gcc.target/arm/simd/vminnm_f32_1.c execution test
> FAIL: gcc.target/arm/simd/vminnmq_f32_1.c execution test
>
> For such changes, I would suggest reg test for both bare-metal and
> linux toolchains, plus a bootstrap for linux toolchain.
>

Hi,

I confirm some tests are failing:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241736/report-build-info.html

Sorry I couldn't answer/test before you committed, I was on holidays.

Christophe

> Thanks,
> bin
>
>>
>>
>> Ok.
>> Thanks,
>> Kyrill
>>
>>
 Thanks,
 Tamar

 ---

 gcc/

 2016-10-19  Bilyan Borisov  
  Tamar Christina 

  * config/arm/arm-c.c (arm_cpu_builtins): New macro definition.
  * config/arm/arm_neon.h (vmaxnm_f32): New intrinsinc.
  (vmaxnmq_f32): Likewise.
  (vminnm_f32): Likewise.
  (vminnmq_f32): Likewise.
  * config/arm/arm_neon_builtins.def (vmaxnm): New builtin.
  (vminnm): Likewise.
  * config/arm/neon.md (neon_, VCVTF): New
  expander.

 gcc/testsuite/

 2016-10-19  Bilyan Borisov  

  * gcc.target/arm/simd/vmaxnm_f32_1.c: New.
  * gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise.
  * gcc.target/arm/simd/vminnm_f32_1.c: Likewise.
  * gcc.target/arm/simd/vminnmq_f32_1.c: Likewise.

>>> I think you forgot to attach the new tests.
>>>
>>> Christophe
>>>
>>

Re: [RFC PATCH] expand_strn_compare should attempt expansion even if neither string is constant

2016-11-02 Thread Aaron Sawdey

On Wed, 2016-11-02 at 13:41 +0100, Bernd Schmidt wrote:
> On 10/27/2016 03:14 AM, Aaron Sawdey wrote:
> > 
> > I'm currently working on a builtin expansion of strncmp for powerpc
> > similar to the one for memcmp I checked recently. One thing I
> > encountered is that the code in expand_strn_compare will not
> > attempt to
> > expand the cmpstrnsi pattern at all if neither string parameter is
> > a
> > constant string. This doesn't make a lot of sense in light of the
> > fact
> > that expand_str_compare starts with an attempt to expand cmpstrsi
> > and
> > then if that does not work, looks at the string args to see if one
> > is
> > constant so it can use cmpstrnsi with the known length.
> > 
> > The attached patch is my attempt to add this fallback path to
> > expand_strn_compare, i.e. if neither length is known, just attempt
> > expansion of cmpstrnsi using the given 3 arguments.
> > 
> > It looks like in addition to rs6000, there are 3 other targets that
> > have cmpstrnsi patterns: i386, sh, and rx.
> > 
> > Is this a reasonable approach to take with this? If so I'll
> > bootstrap/regtest on i386 as rs6000 does not as yet have an
> > expansion
> > for cmpstrsi or cmpstrnsi.
> 
> Just to be sure, this is superseded by your later patch series,
> correct?
> 
> (After I saw this one I had been trying to remember what exactly the 
> i386 issue was but it looks like you found it yourself.)
> 
> 
> Bernd

Yes, the later patch series replaces this preliminary patch. And yes,
the i386 issue took some headscratching and careful reading of the arch
manual on what repz cmpsb actually does.

Thanks,
   Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-02 Thread Segher Boessenkool

On Wed, Nov 02, 2016 at 11:39:20AM +0100, Steven Bosscher wrote:
>  On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener
>  wrote:
> > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote:
> >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
> >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
> >>> > +  cfg_layout_finalize ();
> >>>
> >>> I don't think you have to go into/out-of cfglayout mode for each 
> >>> iteration.
> >>
> >> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
> >> iteration though, I was afraid that interacts.
> >
> > Ick.  Why would it need a cfg-cleanup every iteration?
> 
> I don't believe it needs a cleanup on every iteration. One cleanup at
> the end should work fine.

But as the comment there says:

  /* Merge the duplicated blocks into predecessors, when possible.  */
  cleanup_cfg (0);

(this is not a new comment), and without merging blocks this whole
patch does zilch.

There is no need to create new basic blocks at all (can replace the
final branch in a block with a copy of the whole block it jumps to,
instead); and then it is painfully obvious that switching to layout
mode here is pointless, too.

Do you want me to do that?

Btw, this isn't quadratic anyway; it is a constant number (the maximal
length allowed of the block to copy) linear.  Unless there are unboundedly
many forwarder blocks, which there shouldn't be, but I cannot prove that.

And on a testcase with 2000 cases (instead of the 4 in the testcase in
the PR) this pass takes less than 1% of the compiler runtime; and in
the normal cases (no computed gotos to unfactor) it does the same as
before.

Segher

Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote:
> -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. 
> */
> +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. 
>  */

Too long line.

> +/* RTL sequences inside PARALLEL are raw expression representation.
> +
> +   mem_loc_descriptor can be used to build generic DWARF expressions for
> +   DW_CFA_expression and DW_CFA_val_expression where the expression may 
> can
> +   not be represented using normal RTL sequences.  In this case, group 
> all
> +   expression operations (DW_OP_*) inside a PARALLEL.  For those DW_OP 
> which
> +   doesn't have RTL mapping, wrap it using UNSPEC.  The logic for parsing
> +   PARALLEL sequences is:
> +
> + foreach elem inside PARALLEL
> +   if (elem is UNSPEC)
> + dw_op =  XINT (elem, 1) (DWARF operation is kept as UNSPEC number)
> + oprnd1 = XVECEXP (elem, 0, 0)
> + oprnd2 = XVECEXP (elem, 0, 1)
> +   else
> + call mem_loc_descriptor  */

Not sure if it is a good idea to document in weirdly formatted
pseudo-language what the code actually does a few lines below.  IMHO either
express it in words, or don't express it at all.

> + exp_result =
> +   new_loc_descr ((enum dwarf_location_atom) dw_op, oprnd1,
> +  oprnd2);

Wrong formatting, = should be on the next line.

> +   }
> + else
> +   exp_result =
> + mem_loc_descriptor (elem, mode, mem_mode,
> + VAR_INIT_STATUS_INITIALIZED);

Likewise.

Jakub

Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS

2016-11-02 Thread Wilco Dijkstra

Richard Biener wrote:
On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra  wrote:

> > If bswap is false no byte swap is needed, so we found a native endian load
> > and it will always perform the optimization by inserting an unaligned load.
>
> Yes, the general agreement is that the expander can do best and thus we
> should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS.
> The expander will generate the canonical best code (hopefully...).

Right, but there are cases where you have to choose between unaligned or aligned
accesses and you need to know whether the unaligned access is fast.

A good example is memcpy expansion, if you have fast unaligned accesses then you
should use them to deal with the last few bytes, but if they get expanded, 
using several
aligned accesses is much faster than a single unaligned access.

> > This apparently works on all targets, and doesn't cause alignment traps or
> > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
> > So I'm at a loss what these macros are supposed to mean and how I can query
> > whether a backend supports fast unaligned access for a particular mode.
> >
> > What I actually want to write is something like:
> >
> >  if (!FAST_UNALIGNED_LOAD (mode, align)) return false;
> >
> > And know that it only accepts unaligned accesses that are efficient on the 
> > target.
> > Maybe we need a new hook like this and get rid of the old one?
>
> No, we don't need to other hook.
> 
> Note there is another similar user in gimple-fold.c when folding small
> memcpy/memmove
> to single load/store pairs (patch posted but not applied by me -- I've
> asked for strict-align
> target maintainer feedback but got none).

I didn't find it, do you have a link?

> Now - for bswap I'm only 99% sure that unaligned load + bswap is
> better than piecewise loads plus manual swap.

It depends on whether unaligned loads and bswap are expanded or not. Even if we 
assume the expansion is at least as efficient as doing it explicitly 
(definitely true
for modes larger than the native integer size - as we found out in PR77308!),
if both the unaligned load and bswap are expanded it seems better not to make 
the
transformation for modes up to the word size. But there is no way to find out as
SLOW_UNALIGNED_ACCESS must be true whenever STRICT_ALIGN is true.

> But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS /
> STRICT_ALIGNMENT checks from the GIMPLE side of the compiler.

I sort of agree because the purpose of these macros is unclear - the 
documentation
is insufficient and out of date. I do believe however we need an accurate way 
to find out
whether a target supports fast unaligned accesses as that is required to 
generate good
target code.

Wilco

Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

On Wed, Nov 2, 2016 at 2:40 PM, Segher Boessenkool
 wrote:
> On Wed, Nov 02, 2016 at 11:39:20AM +0100, Steven Bosscher wrote:
>>  On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener
>>  wrote:
>> > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote:
>> >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>> >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>> >>> > +  cfg_layout_finalize ();
>> >>>
>> >>> I don't think you have to go into/out-of cfglayout mode for each 
>> >>> iteration.
>> >>
>> >> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
>> >> iteration though, I was afraid that interacts.
>> >
>> > Ick.  Why would it need a cfg-cleanup every iteration?
>>
>> I don't believe it needs a cleanup on every iteration. One cleanup at
>> the end should work fine.
>
> But as the comment there says:
>
>   /* Merge the duplicated blocks into predecessors, when possible.  */
>   cleanup_cfg (0);
>
> (this is not a new comment), and without merging blocks this whole
> patch does zilch.
>
> There is no need to create new basic blocks at all (can replace the
> final branch in a block with a copy of the whole block it jumps to,
> instead); and then it is painfully obvious that switching to layout
> mode here is pointless, too.
>
> Do you want me to do that?
>
> Btw, this isn't quadratic anyway; it is a constant number (the maximal
> length allowed of the block to copy) linear.  Unless there are unboundedly
> many forwarder blocks, which there shouldn't be, but I cannot prove that.

Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that
walk the whole function.  So unless the number of iterations is bound
with a constant I call this quadratic (in function size).

> And on a testcase with 2000 cases (instead of the 4 in the testcase in
> the PR) this pass takes less than 1% of the compiler runtime; and in
> the normal cases (no computed gotos to unfactor) it does the same as
> before.
>
>
> Segher

Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS

On Wed, Nov 2, 2016 at 2:43 PM, Wilco Dijkstra  wrote:
> Richard Biener wrote:
> On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra  
> wrote:
>
>> > If bswap is false no byte swap is needed, so we found a native endian load
>> > and it will always perform the optimization by inserting an unaligned load.
>>
>> Yes, the general agreement is that the expander can do best and thus we
>> should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS.
>> The expander will generate the canonical best code (hopefully...).
>
> Right, but there are cases where you have to choose between unaligned or 
> aligned
> accesses and you need to know whether the unaligned access is fast.
>
> A good example is memcpy expansion, if you have fast unaligned accesses then 
> you
> should use them to deal with the last few bytes, but if they get expanded, 
> using several
> aligned accesses is much faster than a single unaligned access.

Yes.  That's RTL expansion at which point you of course have to look
at SLOW_UNALIGNED_ACCESS.

>> > This apparently works on all targets, and doesn't cause alignment traps or
>> > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
>> > So I'm at a loss what these macros are supposed to mean and how I can query
>> > whether a backend supports fast unaligned access for a particular mode.
>> >
>> > What I actually want to write is something like:
>> >
>> >  if (!FAST_UNALIGNED_LOAD (mode, align)) return false;
>> >
>> > And know that it only accepts unaligned accesses that are efficient on the 
>> > target.
>> > Maybe we need a new hook like this and get rid of the old one?
>>
>> No, we don't need to other hook.
>>
>> Note there is another similar user in gimple-fold.c when folding small
>> memcpy/memmove
>> to single load/store pairs (patch posted but not applied by me -- I've
>> asked for strict-align
>> target maintainer feedback but got none).
>
> I didn't find it, do you have a link?

https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00598.html

>> Now - for bswap I'm only 99% sure that unaligned load + bswap is
>> better than piecewise loads plus manual swap.
>
> It depends on whether unaligned loads and bswap are expanded or not. Even if 
> we
> assume the expansion is at least as efficient as doing it explicitly 
> (definitely true
> for modes larger than the native integer size - as we found out in PR77308!),
> if both the unaligned load and bswap are expanded it seems better not to make 
> the
> transformation for modes up to the word size. But there is no way to find out 
> as
> SLOW_UNALIGNED_ACCESS must be true whenever STRICT_ALIGN is true.

The case I was thinking about is availability of a bswap load operating only on
aligned memory and "regular" register bswap being "fake" provided by first
spilling to an aligned stack slot and then loading from that.

Maybe a bit far-fetched.

>> But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS /
>> STRICT_ALIGNMENT checks from the GIMPLE side of the compiler.
>
> I sort of agree because the purpose of these macros is unclear - the 
> documentation
> is insufficient and out of date. I do believe however we need an accurate way 
> to find out
> whether a target supports fast unaligned accesses as that is required to 
> generate good
> target code.

I believe the target macros are solely for RTL expansion and say that
it has to avoid
unaligned ops as those would trap.

Richard.

> Wilco

Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

2016-11-02 Thread Jiong Wang


On 02/11/16 13:42, Jakub Jelinek wrote:

On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote:

-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note.  
*/


Too long line.


Hmm, it shows 80 columns under my editor.  I guess '+' is counted in?




+/* RTL sequences inside PARALLEL are raw expression representation.
+
+   mem_loc_descriptor can be used to build generic DWARF expressions for
+   DW_CFA_expression and DW_CFA_val_expression where the expression may can
+   not be represented using normal RTL sequences.  In this case, group all
+   expression operations (DW_OP_*) inside a PARALLEL.  For those DW_OP 
which
+   doesn't have RTL mapping, wrap it using UNSPEC.  The logic for parsing
+   PARALLEL sequences is:
+
+   foreach elem inside PARALLEL
+ if (elem is UNSPEC)
+   dw_op =  XINT (elem, 1) (DWARF operation is kept as UNSPEC number)
+   oprnd1 = XVECEXP (elem, 0, 0)
+   oprnd2 = XVECEXP (elem, 0, 1)
+ else
+   call mem_loc_descriptor  */


Not sure if it is a good idea to document in weirdly formatted
pseudo-language what the code actually does a few lines below.  IMHO either
express it in words, or don't express it at all.


OK, fixed. I replaced these comments as some brief words.




+   exp_result =
+ new_loc_descr ((enum dwarf_location_atom) dw_op, oprnd1,
+oprnd2);


Wrong formatting, = should be on the next line.


+ }
+   else
+ exp_result =
+   mem_loc_descriptor (elem, mode, mem_mode,
+   VAR_INIT_STATUS_INITIALIZED);


Likewise.


Both fixed. Patch updated, please review.

Thanks.

gcc/
2016-11-02  Jiong Wang  

* reg-notes.def (CFA_VAL_EXPRESSION): New entry.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function.
(dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION.
(output_cfa_loc): Support DW_CFA_val_expression.
(output_cfa_loc_raw): Likewise.
(output_cfi): Likewise.
(output_cfi_directive): Likewise.
* dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression.
(dw_cfi_oprnd2_desc): Likewise.
(mem_loc_descriptor): Recognize new pattern generated for value
expression.

commit 36de0173c17efcc30c56ef10304377e71313e8bc
Author: Jiong Wang 
Date:   Wed Oct 19 15:42:04 2016 +0100

dwarf val expression

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 6491d5a..b8c88fb 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set)
   reg_save (sregno, dregno, 0);
 }
 
-/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note.  */
 
 static void
 dwarf2out_frame_debug_cfa_expression (rtx set)
@@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
   update_row_reg_save (cur_row, regno, cfi);
 }
 
+/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION
+   note.  */
+
+static void
+dwarf2out_frame_debug_cfa_val_expression (rtx set)
+{
+  rtx dest = SET_DEST (set);
+  gcc_assert (REG_P (dest));
+
+  rtx span = targetm.dwarf_register_span (dest);
+  gcc_assert (!span);
+
+  rtx src = SET_SRC (set);
+  dw_cfi_ref cfi = new_cfi ();
+  cfi->dw_cfi_opc = DW_CFA_val_expression;
+  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest);
+  cfi->dw_cfi_oprnd2.dw_cfi_loc
+= mem_loc_descriptor (src, GET_MODE (src),
+			  GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED);
+  add_cfi (cfi);
+  update_row_reg_save (cur_row, dwf_regno (dest), cfi);
+}
+
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
 
 static void
@@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn)
 	break;
 
   case REG_CFA_EXPRESSION:
+  case REG_CFA_VAL_EXPRESSION:
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_expression (n);
+
+	if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION)
+	  dwarf2out_frame_debug_cfa_expression (n);
+	else
+	  dwarf2out_frame_debug_cfa_val_expression (n);
+
 	handled_one = true;
 	break;
 
@@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   unsigned r =
 	DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh);
@@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi)
   dw_loc_descr_ref loc;
   unsigned long size;
 
-  if (cfi->dw_cfi_opc == DW_CFA_expression)
+  if (cfi->dw_cfi_opc == DW_CFA_expression
+  || cfi->dw_cfi_opc == DW_CFA_val_expression)
 {
   u

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Marek Polacek

On Wed, Nov 02, 2016 at 11:10:53AM +0100, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote:
> > > Which is gimplified as:
> > > 
> > > int * ptr;
> > > 
> > > switch (argc) , case 1: >
> > > {
> > >   int a;
> > > 
> > >   try
> > > {
> > >   ASAN_MARK (2, &a, 4);
> > >   :
> > >   goto ;
> > >   :
> > >   ptr = &a;
> > >   goto ;
> > > }
> > >   finally
> > > {
> > >   ASAN_MARK (1, &a, 4);
> > > }
> 
> > Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> > statement?  Otherwise, consider this being done in a loop, after the first
> > iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with
> > args 1 and have in case 1: a = 0;, won't that trigger runtime error?
> 
> Wonder if for the variables declared inside of switch body, because we don't
> care about uses before scope, it wouldn't be more efficient to arrange for
> int *p, *q, *r;
> switch (x)
>   {
> int a;
>   case 1:
> p = &a;
> a = 5;
> break;
> int b;
>   case 2:
> int c;
> q = &b;
> r = &c;
> b = 3;
> c = 4;
> break;
>   }
> to effectively ASAN_MARK (2, &a, 4); ASAN_MARK (2, &b, 4); ASAN_MARK (2, &c, 
> 4);
> before the GIMPLE_SWITCH, instead of unpoisoning them on every case label
> where they might be in scope.  Though, of course, at least until lower pass
> that is quite ugly, because it would refer to "not yet declared" variables.
> Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not
> the expression evaluation of the switch control expression) inside of the
> switches' GIMPLE_BIND.

So is there anything I should do wrt -Wswitch-unreachable?

Marek

[PATCH 2/4] Split prefix-map values on the last '=' character, not the first

We are planning to ask other tools to support SOURCE_PREFIX_MAP, in the same
way that we have already done this for SOURCE_DATE_EPOCH. So, this will last
for some time and have quite wide reach. Consequently, we believe it is better
to split on the final '=', since it is much less likely to result in problems.

For example, with the previous behaviour (splitting on the first '=') it would
not be possible to map paths like the following:

C:\Documents and Settings\Betrand Russell\Proofs of 1+1=2\Automated 
Proofs\Source Code\
/srv/net/distributed-hash-table/address/VaIWP8YIWDChR2O76yiufXBsbw5g2skB_kp9VP-qVLvydovdGw==/projects/gcc-6/

These are contrived examples, but hopefully you can agree that they're not *so*
unrealistic - future software or users might plausibly wish to run reproducible
build processes underneath paths similar to these.

On the other hand, I can think of much fewer cases where the new-prefix *must*
include a '=' character. I can't think of any software project that includes
it, and I'd imagine that any such (or future) projects that might exist would
already have standardised "ASCII-only" versions of its name.

ChangeLogs
--

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* final.c: (add_debug_prefix_map): Split on the last and not first '='.
* doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to
describe new parsing.

Index: gcc-7-20161030/gcc/final.c
===
--- gcc-7-20161030.orig/gcc/final.c
+++ gcc-7-20161030/gcc/final.c
@@ -1528,7 +1528,7 @@ add_debug_prefix_map (const char *arg)
   debug_prefix_map *map;
   const char *p;
 
-  p = strchr (arg, '=');
+  p = strrchr (arg, '=');
   if (!p)
 {
   error ("invalid value %qs for debug-prefix-map", arg);
Index: gcc-7-20161030/gcc/doc/invoke.texi
===
--- gcc-7-20161030.orig/gcc/doc/invoke.texi
+++ gcc-7-20161030/gcc/doc/invoke.texi
@@ -26233,8 +26233,8 @@ output by higher-level build processes.
 
 The form and behaviour is similar to @option{-fdebug-prefix-map}. That
 is, the value of @env{SOURCE_PREFIX_MAP} must be of the form
-@samp{@var{old}=@var{new}}. The split occurs on the first @code{=}
-character, so that @var{old} cannot itself contain a @code{=}.
+@samp{@var{old}=@var{new}}. The split occurs on the last @code{=}
+character, so that @var{new} cannot itself contain a @code{=}.
 
 Whenever an absolute source- or build-related filepath is to be emitted
 in a final end-result output, GCC will replace @var{old} with @var{new}

[PATCH 4/4] Use SOURCE_DATE_EPOCH in place of TIMESTAMP if the latter is newer.

This brings the behaviour in line with the __DATE__ and __TIME__ macros. Note
though the minor difference: __TIMESTAMP__ is defined as the file-modification
date and not the "current" date or time like the other two macros. Therefore,
we do a clamping behaviour (similar to tar --clamp-mtime).

Acknowledgements


Reiner Herrmann suggested the clamping behaviour.

ChangeLogs
--

libcpp/ChangeLog:

2016-11-01  Ximin Luo  

* macro.c (_cpp_builtin_macro_text): Use SOURCE_DATE_EPOCH in place of
__TIMESTAMP__ if the latter is newer than the former.

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* doc/cppenv.texi (Environment Variables): Update SOURCE_DATE_EPOCH to
describe the new effect on __TIMESTAMP__.

gcc/testsuite/ChangeLog:

2016-11-01  Ximin Luo  

* gcc.dg/cpp/source_date_epoch-4.c: New test.
* gcc.dg/cpp/source_date_epoch-5.c: New test.

Index: gcc-7-20161030/libcpp/macro.c
===
--- gcc-7-20161030.orig/libcpp/macro.c
+++ gcc-7-20161030/libcpp/macro.c
@@ -264,7 +264,30 @@ _cpp_builtin_macro_text (cpp_reader *pfi
struct tm *tb = NULL;
struct stat *st = _cpp_get_file_stat (file);
if (st)
- tb = localtime (&st->st_mtime);
+ {
+   /* Set a reproducible timestamp for __DATE__ and __TIME__ 
macro
+  if SOURCE_DATE_EPOCH is defined.  */
+   if (pfile->source_date_epoch == (time_t) -2
+   && pfile->cb.get_source_date_epoch != NULL)
+ pfile->source_date_epoch = 
pfile->cb.get_source_date_epoch (pfile);
+
+   if (pfile->source_date_epoch >= (time_t) 0)
+ {
+   /* If SOURCE_DATE_EPOCH is set, we want reproducible
+  timestamps so use gmtime not localtime.  */
+   if (st->st_mtime >= pfile->source_date_epoch)
+ tb = gmtime (&pfile->source_date_epoch);
+   else
+ /* Don't use SOURCE_DATE_EPOCH if the timestamp is
+older, since that means it was probably already
+set in a reproducible way (e.g. via source tarball
+extraction or some other method). */
+ tb = gmtime (&st->st_mtime);
+ }
+   else
+ tb = localtime (&st->st_mtime);
+ }
+
if (tb)
  {
char *str = asctime (tb);
Index: gcc-7-20161030/gcc/doc/cppenv.texi
===
--- gcc-7-20161030.orig/gcc/doc/cppenv.texi
+++ gcc-7-20161030/gcc/doc/cppenv.texi
@@ -83,8 +83,9 @@ main input file is omitted.
 @item SOURCE_DATE_EPOCH
 If this variable is set, its value specifies a UNIX timestamp to be
 used in replacement of the current date and time in the @code{__DATE__}
-and @code{__TIME__} macros, so that the embedded timestamps become
-reproducible.
+and @code{__TIME__} macros, and in replacement of the file modification
+date in the @code{__TIMESTAMP__} macro if the latter is newer than the
+former, so that the embedded timestamps become reproducible.
 
 The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp,
 defined as the number of seconds (excluding leap seconds) since
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c
@@ -0,0 +1,13 @@
+/* __TIMESTAMP__ should use SOURCE_DATE_EPOCH if the latter is older. */
+/* { dg-do run } */
+/* { dg-set-compiler-env-var TZ "UTC" } */
+/* { dg-set-compiler-env-var LC_ALL "C" } */
+/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "0" } */
+
+int
+main ()
+{
+  if (__builtin_strcmp (__TIMESTAMP__, "Thu Jan  1 00:00:00 1970") != 0)
+__builtin_abort ();
+  return 0;
+}
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c
@@ -0,0 +1,13 @@
+/* __TIMESTAMP__ should not use SOURCE_DATE_EPOCH if the latter is newer. */
+/* { dg-do run } */
+/* { dg-set-compiler-env-var TZ "UTC" } */
+/* { dg-set-compiler-env-var LC_ALL "C" } */
+/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "253402300799" } */
+
+int
+main ()
+{
+  if (__builtin_strcmp (__TIMESTAMP__, "Fri Dec 31 23:59:59 UTC ") == 0)
+__builtin_abort ();
+  return 0;
+}

[PATCH 1/4] Use SOURCE_PREFIX_MAP envvar as an implicit debug-prefix-map

Define the SOURCE_PREFIX_MAP environment variable, and treat it as an implicit
CLI -fdebug-prefix-map option specified before any explicit such options.

Acknowledgements


Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of
time figuring out what to edit.

HW42 for discussion on the details of the proposal, and for suggesting that we
retain the ability to map the prefix to something other than ".".

ChangeLogs
--

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* opts-global.c (add_debug_prefix_map_from_envvar): Add the value of
SOURCE_PREFIX_MAP as a debug_prefix_map if the former is set.
(handle_common_deferred_options): Call add_debug_prefix_map_from_envvar
before processing options.
* final.c: (add_debug_prefix_map): Be less specific in the error 
message,
since it can also be triggered by the SOURCE_PREFIX_MAP variable.
* doc/invoke.texi (Environment Variables): Document form and behaviour 
of
SOURCE_PREFIX_MAP.

gcc/testsuite/ChangeLog:

2016-11-01  Ximin Luo  

* gcc.dg/debug/dwarf2/source_prefix_map-1.c: New test.
* gcc.dg/debug/dwarf2/source_prefix_map-2.c: New test.

Index: gcc-7-20161030/gcc/opts-global.c
===
--- gcc-7-20161030.orig/gcc/opts-global.c
+++ gcc-7-20161030/gcc/opts-global.c
@@ -310,6 +310,21 @@ decode_options (struct gcc_options *opts
   finish_options (opts, opts_set, loc);
 }
 
+/* Add a debug-prefix-map using the SOURCE_PREFIX_MAP environment variable if
+   it is set.  */
+
+static void
+add_debug_prefix_map_from_envvar ()
+{
+  char *prefix_map;
+
+  prefix_map = getenv ("SOURCE_PREFIX_MAP");
+  if (!prefix_map)
+return;
+
+  add_debug_prefix_map (prefix_map);
+}
+
 /* Hold command-line options associated with stack limitation.  */
 const char *opt_fstack_limit_symbol_arg = NULL;
 int opt_fstack_limit_register_no = -1;
@@ -335,6 +350,8 @@ handle_common_deferred_options (void)
   if (flag_opt_info)
 opt_info_switch_p (NULL);
 
+  add_debug_prefix_map_from_envvar ();
+
   FOR_EACH_VEC_ELT (v, i, opt)
 {
   switch (opt->opt_index)
Index: gcc-7-20161030/gcc/final.c
===
--- gcc-7-20161030.orig/gcc/final.c
+++ gcc-7-20161030/gcc/final.c
@@ -1531,7 +1531,7 @@ add_debug_prefix_map (const char *arg)
   p = strchr (arg, '=');
   if (!p)
 {
-  error ("invalid argument %qs to -fdebug-prefix-map", arg);
+  error ("invalid value %qs for debug-prefix-map", arg);
   return;
 }
   map = XNEW (debug_prefix_map);
Index: gcc-7-20161030/gcc/doc/invoke.texi
===
--- gcc-7-20161030.orig/gcc/doc/invoke.texi
+++ gcc-7-20161030/gcc/doc/invoke.texi
@@ -26222,6 +26222,23 @@ Recognize EUCJP characters.
 If @env{LANG} is not defined, or if it has some other value, then the
 compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale 
to
 recognize and translate multibyte characters.
+
+@item SOURCE_PREFIX_MAP If this variable is set, it specifies a mapping
+that is used to transform filepaths that are output in debug symbols.
+This helps the embedded paths become reproducible, without having the
+unreproducible value be visible in other input sources - such as GCC
+command-line flags or standardised build-time environment variables like
+@code{CFLAGS} - that are commonly legitimately-embedded in the build
+output by higher-level build processes.
+
+The form and behaviour is similar to @option{-fdebug-prefix-map}. That
+is, the value of @env{SOURCE_PREFIX_MAP} must be of the form
+@samp{@var{old}=@var{new}}. The split occurs on the first @code{=}
+character, so that @var{old} cannot itself contain a @code{=}.
+
+Whenever an absolute source- or build-related filepath is to be emitted
+in a final end-result output, GCC will replace @var{old} with @var{new}
+if that filepath starts with @var{old}.
 @end table
 
 @noindent
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c
@@ -0,0 +1,9 @@
+/* DW_AT_comp_dir should be relative if SOURCE_PREFIX_MAP is a prefix of it. */
+/* { dg-do compile } */
+/* { dg-options "-gdwarf -dA" } */
+/* { dg-set-compiler-env-var SOURCE_PREFIX_MAP "[file dirname 
[pwd]]=DWARF2TEST" } */
+/* { dg-final { scan-assembler-dem "DW_AT_comp_dir: \"DWARF2TEST/gcc" } } */
+
+void func (void)
+{
+}
Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c
===
--- /dev/null
+++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c
@@ -0,0 +1,8 @@
+/* DW_AT_comp_dir should be absolute if SOURCE_PREFIX_MAP is not set. */
+/* { dg-do compile } */
+/* { dg-op

[PATCH 3/4] Use SOURCE_PREFIX_MAP envvar to transform FILE

Honour the SOURCE_PREFIX_MAP environment variable when expanding the __FILE__
macro, in the same way that debug-prefix-map works for debugging symbol paths.

This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH.
Specifically, we read the environment variable not in libcpp but via a hook
which has an implementation defined in gcc/c-family. However, to achieve this
is more complex than the earlier patch: we need to share the prefix_map data
structure and associated functions between libcpp and c-family. Therefore, we
need to move these to libiberty. (For comparison, the SOURCE_DATE_EPOCH patch
did not need this because time_t et. al. are in the standard C library.)

Acknowledgements


Dhole  who wrote the earlier patch for SOURCE_DATE_EPOCH
which saved me a lot of time on figuring out what to edit.

ChangeLogs
--

include/ChangeLog:

2016-11-01  Ximin Luo  

* prefix-map.h: New file, mostly derived from /gcc/final.c.

libiberty/ChangeLog:

2016-11-01  Ximin Luo  

* prefix-map.c: New file, mostly derived from /gcc/final.c.
* Makefile.in: Update for new files.

gcc/ChangeLog:

2016-11-01  Ximin Luo  

* final.c: Generalise and refactor code related to debug_prefix_map.
Move some of it to /libiberty/prefix-map.c, /include/prefix-map.h
and refactor the remaining code to use the moved-out things.
* doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to
describe how it affects __FILE__ expansion.

gcc/c-family/ChangeLog:

2016-11-01  Ximin Luo  

* c-common.c (cb_get_source_prefix_map): Define new call target.
* c-common.h (cb_get_source_prefix_map): Declare call target.
* c-lex.c (init_c_lex): Set the get_source_prefix_map callback.

libcpp/ChangeLog:

2016-11-01  Ximin Luo  

* include/cpplib.h (cpp_callbacks): Add get_source_prefix_map
callback.
* init.c (cpp_create_reader): Initialise source_prefix_map field.
* internal.h (cpp_reader): Add new field source_prefix_map.
* macro.c (_cpp_builtin_macro_text): Set the source_prefix_map field
if unset and apply it to the __FILE__ macro.

gcc/testsuite/ChangeLog:

2016-11-01  Ximin Luo  

* gcc.dg/cpp/source_prefix_map-1.c: New test.
* gcc.dg/cpp/source_prefix_map-2.c: New test.

Index: gcc-7-20161030/include/prefix-map.h
===
--- /dev/null
+++ gcc-7-20161030/include/prefix-map.h
@@ -0,0 +1,71 @@
+/* Declarations for manipulating filename prefixes.
+
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+#ifndef _PREFIX_MAP_H
+#define _PREFIX_MAP_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef HAVE_STDLIB_H
+#include 
+#endif
+
+/* Linked-list of mappings from old prefixes to new prefixes.  */
+
+struct prefix_map
+{
+  const char *old_prefix;
+  const char *new_prefix;
+  size_t old_len;
+  size_t new_len;
+  struct prefix_map *next;
+};
+
+/* Parse a single prefix-map.
+
+   The string `arg' is split at the final '=' character. The part before
+   it is used to set `map->old_prefix' and `map->old_len', and the part
+   after it is used to set `map->new_prefix' and `map->new_len'.
+
+   If `arg' does not contain a '=' then 0 is returned. Otherwise, a
+   non-zero value is returned.
+   */
+
+extern int parse_prefix_map (const char *arg, struct prefix_map *map);
+
+/* Perform mapping of filename prefixes.
+
+   Return the filename corresponding to `old_name'. The return value is
+   equal to `old_name' if no transformation occurred, else it is equal
+   to `new_name' where the new filename is stored.
+
+   On entry into this function, `new_name' must be able to hold at least
+   `(old_name - map->old_len + map->old_len + 1)' characters, where
+   `map' is the mapping that will be selected and performed.
+   */
+
+extern const char *apply_prefix_map (const char *old_name, char *new_name,
+struct prefix_map *map_head);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _PREFIX_MAP_H */
Index: gcc-7-20161030/libiberty/prefix-map.c
===
--- /dev/null
+++ gcc-7-20161030/libiberty/prefix-map.c
@

[PATCH] Generate reproducible output independently of the build-path

(Please keep me on CC, I am not subscribed)

Background
==

We are on a long journey to make build processes be able to reproduce the build
outputs independently of which filesystem path the build is being executed from
- e.g. if the executing user doesn't have root access to be able to execute it
under a standard path such as /build. This currently is making about 2k-3k [1]
packages in Debian unreproducible when build-paths are varied across builds.

[1]
https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_path_issue.html

Previous attempts have involved using -fdebug-prefix-map to strip away the
prefix of an absolute path, leaving behind the part relative to the top-level
directory of the source code, which is reproducible. But this flag was itself
stored in DW_AT_producer, which makes the final output unreproducible. This was
pointed out in bug 68848 and fixed in r231835.

However, through more testing, we have found out that the fix just mentioned is
not enough to achieve reproducibility in practice. The main issue is that many
different packages like to store CFLAGS et. al. in arbitrary ways. So if we add
an explicit -fdebug-prefix-map flag to the system-level CFLAGS etc, this will
often propagate into the build result, making it again dependent on the
build-path, and not reproducible. For example:

Some packages embed compiler flags into their pkg-config files (or equivalent):
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/curl.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/perl.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/qt4-x11.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/fflas-ffpack.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/sip4.html

Other packages embed compiler flags directly into the binary:
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/singular.html
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/mutt.html

etc etc.

We think it's not appropriate to patch all (3k+) of these packages to strip out
-fdebug-prefix-map flags. This would involve adding quite complex logic to
everyone's build scripts, and we have to adapt this logic every single time to
that particular package. Also, in general CFLAGS is *supposed* to affect the
compiler output, and saving it unconditionally is quite a reasonable thing for
packages to do. If we tried to patch all of these packages, we would be turning
"reproducible builds" in to a costly opt-in feature, rather than on-by-default
that everyone can easily benefit from.

So, we believe it is better to patch GCC centrally. Our proposed solution is
similar to (a) the SOURCE_DATE_EPOCH environment variable which was previously
accepted into GCC and was used to successfully make 400+ packages reproducible,
and (b) the -fdebug-prefix-map mechanism that already exists in GCC and which
nearly but not quite, achieves at-scale build-path-independent reproducibility.

Proposal

This patch series adds a new environment variable SOURCE_PREFIX_MAP. When this
is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value"
command-line argument. This makes the final binary output reproducible, and
also hides the unreproducible value (the build path prefix) from CFLAGS et. al.
which everyone is (understandably) embedding as-is into their build output.

This environment variable also acts on the __FILE__ macro, mapping it in the
same way that debug-prefix-map works for debug symbols. We have seen that
__FILE__ is also a very large source of unreproducibility, and is represented
quite heavily in the 3k+ figure given above.

Finally, we tweak the __TIMESTAMP__ macro so it honours SOURCE_DATE_EPOCH, in a
similar way to how __DATE__ and __TIME__ do so already.

More details are given in the headers of the patch files themselves.

Testing
===

I've tested these patches on a Debian testing/unstable x86_64-linux-gnu system.
So far I've only run the new tests that this patch adds, on a disable-bootstrap
build. I will do a full bootstrap and run the full testsuite over the next few
days, both with and without this patch, and report back.

I dedicate these patches to the public domain by waiving all of my rights to
the work worldwide under copyright law, including all related and neighboring
rights, to the extent allowed by law.

See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full text.

Please let me know if the above is insufficient and I will be happy to sign any
relevant forms.

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On 11/02/2016 03:20 PM, Marek Polacek wrote:
> On Wed, Nov 02, 2016 at 11:10:53AM +0100, Jakub Jelinek wrote:
>> On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote:
 Which is gimplified as:

 int * ptr;

 switch (argc) , case 1: >
 {
   int a;

   try
 {
   ASAN_MARK (2, &a, 4);
   :
   goto ;
   :
   ptr = &a;
   goto ;
 }
   finally
 {
   ASAN_MARK (1, &a, 4);
 }
>>
>>> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
>>> statement?  Otherwise, consider this being done in a loop, after the first
>>> iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with
>>> args 1 and have in case 1: a = 0;, won't that trigger runtime error?
>>
>> Wonder if for the variables declared inside of switch body, because we don't
>> care about uses before scope, it wouldn't be more efficient to arrange for
>> int *p, *q, *r;
>> switch (x)
>>   {
>> int a;
>>   case 1:
>> p = &a;
>> a = 5;
>> break;
>> int b;
>>   case 2:
>> int c;
>> q = &b;
>> r = &c;
>> b = 3;
>> c = 4;
>> break;
>>   }
>> to effectively ASAN_MARK (2, &a, 4); ASAN_MARK (2, &b, 4); ASAN_MARK (2, &c, 
>> 4);
>> before the GIMPLE_SWITCH, instead of unpoisoning them on every case label
>> where they might be in scope.  Though, of course, at least until lower pass
>> that is quite ugly, because it would refer to "not yet declared" variables.
>> Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not
>> the expression evaluation of the switch control expression) inside of the
>> switches' GIMPLE_BIND.
> 
> So is there anything I should do wrt -Wswitch-unreachable?
> 
>   Marek
> 

Probably not. I'm having a patch puts GIMPLE_SWITCH statement to a proper place
in GIMPLE_BIND. Let's see whether such patch can bootstrap and survive 
regression
tests.

Thanks,
Martin

[PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those

Hi!

This is a merge of my patch from yesterday, Jason's incremental patch to
that and parts of Alex' patch from Oct 19.

It uses Alex' LANG_HOOKS_GET_PTRMEMFN_TYPE langhook.  I've tried
to think about https://gcc.gnu.org/ml/gcc-patches/2011-05/msg00227.html
and we even have such a langhook now, modified_type_die
uses lang_hooks.types.get_debug_type, but
1) it is just called in modified_type_die and not in
   gen_type_die_with_usage, that looks weird
2) it is used for something Ada-ish I really don't know how to test etc.
   to be able to find out if it is safe to call it in
   gen_type_die_with_usage too
3) most importantly, if the C++ version of this langhook would create
   OFFSET_TYPE on the fly, I don't know how to ensure effective sharing
   of DW_TAG_ptr_to_member_type nodes with the same DW_AT_type
   and DW_AT_containing_type; unless the C++ langhook adds some extra
   hash table that caches already created OFFSET_TYPEs or something similar,
   it would create a new OFFSET_TYPE each time it is called

Also, I really don't know how well does GDB (especially older releases)
handle DW_TAG_ptr_to_member_type for PMF, so the patch wraps that currently
with if (dwarf_version >= 5).  Quick grep revealed that GDB has code to
handle the __pfn/__delta fields.  So, can I ask somebody from the GDB
team to test this patch with that if (dwarf_version >= 5) replaced
with if (1) and see if it works properly with current GDB as well as say
4-5 years old one (e.g. with -gdwarf-2 or -gdwarf-3)?  If yes, we
should emit it unconditionally.

Bootstrapped/regtested on x86_64-linux and i686-linux, on the new ref-3.C
testcase emits the number of DIEs I really expect to.  Ok for trunk?

2016-11-02  Jakub Jelinek  
Alexandre Oliva  
Jason Merrill  

PR debug/28767
PR debug/56974
* langhooks.h (struct lang_hooks_for_types): Document type_hash_eq
being also called on METHOD_TYPEs.  Add type_dwarf_attribute and
get_ptrmemfn_type langhooks.
* langhooks.c (lhd_type_dwarf_attribute): New function.
* langhooks-def.h (lhd_type_dwarf_attribute): Declare.
(LANG_HOOKS_TYPE_DWARF_ATTRIBUTE, LANG_HOOKS_GET_PTRMEMFN_TYPE):
Define.
(LANG_HOOKS_FOR_TYPES_INITIALIZER): Add
LANG_HOOKS_TYPE_DWARF_ATTRIBUTE and LANG_HOOKS_GET_PTRMEMFN_TYPE.
* hooks.h (hook_tree_const_tree_int_null): Declare.
* hooks.c (hook_tree_const_tree_int_null): New function.
* tree.h (check_lang_type): Declare.
* tree.c (check_lang_type): New function.
(check_qualified_type, check_aligned_type): Call it.
* dwarf2out.c (modified_type_die): Don't use type_main_variant
for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with
check_base_type and check_lang_type.
(gen_ptr_to_mbr_type_die): Add class_type and member_type arguments,
allow type to be something other than OFFSET_TYPE, if lookup_type_die
is already non-NULL, return early.  For OFFSET_TYPE add
DW_AT_use_location attribute.
(gen_subroutine_type_die): Add DW_AT_{,rvalue_}reference attribute
if needed.
(gen_type_die_with_usage): Don't use type_main_variant
for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with
check_base_type and check_lang_type.  Formatting fixes.
Adjust gen_ptr_to_mbr_type_die caller.  Handle PMF.
cp/
* tree.c (cp_check_qualified_type): Use check_base_type and
TYPE_QUALS comparison instead of check_qualified_type.
(cxx_type_hash_eq): Return false if type_memfn_rqual don't match.
* cp-objcp-common.c (cp_get_ptrmemfn_type): New function.
(cp_decl_dwarf_attribute): Don't handle types here.
(cp_type_dwarf_attribute): New function.
* cp-objcp-common.h (cp_get_ptrmemfn_type, cp_type_dwarf_attribute):
Declare.
(LANG_HOOKS_GET_PTRMEMFN_TYPE, LANG_HOOKS_TYPE_DWARF_ATTRIBUTE):
Define.
testsuite/
* g++.dg/debug/dwarf2/ptrdmem-1.C: New test.
* g++.dg/debug/dwarf2/ref-3.C: New test.
* g++.dg/debug/dwarf2/refqual-1.C: New test.
* g++.dg/debug/dwarf2/refqual-2.C: New test.

--- gcc/langhooks.h.jj  2016-11-01 15:18:44.994506161 +0100
+++ gcc/langhooks.h 2016-11-02 11:43:51.905046755 +0100
@@ -120,7 +120,7 @@ struct lang_hooks_for_types
   /* Return TRUE if TYPE1 and TYPE2 are identical for type hashing purposes.
  Called only after doing all language independent checks.
  At present, this function is only called when both TYPE1 and TYPE2 are
- FUNCTION_TYPEs.  */
+ FUNCTION_TYPE or METHOD_TYPE.  */
   bool (*type_hash_eq) (const_tree, const_tree);
 
   /* Return TRUE if TYPE uses a hidden descriptor and fills in information
@@ -162,6 +162,15 @@ struct lang_hooks_for_types
  for the debugger about scale factor, etc.  */
   bool (*get_fixed_point_type_info) (const_tree,
 struct

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 02, 2016 at 03:27:42PM +0100, Martin Liška wrote:
> > So is there anything I should do wrt -Wswitch-unreachable?
> > 
> > Marek
> > 
> 
> Probably not. I'm having a patch puts GIMPLE_SWITCH statement to a proper 
> place
> in GIMPLE_BIND. Let's see whether such patch can bootstrap and survive 
> regression
> tests.

Please do that only for -fsanitize-use-after-scope, it will likely affect at
least for -O0 the debugging experience.  For -O0 -fsanitize=address 
-fsanitize-use-after-scope
perhaps we could arrange for some extra stmt to have the locus of the
switch (where we still don't want the vars to appear in scope) and then
have no locus on the ASAN_MARK and actual GIMPLE_SWITCH or something
similar.

Jakub

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On 11/02/2016 02:16 PM, Richard Biener wrote:
> On Wed, Nov 2, 2016 at 2:06 PM, Jakub Jelinek  wrote:
>> On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote:
 Yeah, that is what I meant.  The issue is how to report uses of such
 SSA_NAME when there is no memory.  So, either we'd need a special runtime
 library entrypoint that would report uses after scope even when there is no
 underlying memory, or we'd need to force it at asan pass time into memory 
 again.
>>>
>>> Well, there can't be any uses outside the scope -- there are no (memory) 
>>> uses
>>> left if we rewrite the thing into SSA.  That is, the address can no
>>> longer "escape".
>>>
>>> Of course there could have been invalid uses before the rewrite into SSA.  
>>> But
>>> those can be diagnosed either immediately before or after re-writing into 
>>> SSA
>>> at compile-time (may be in dead code regions of course).
>>
>> Sure, we can warn on those at compile time, but we really should arrange to
>> error on those at runtime if they are ever executed, the UB happens only at
>> runtime, so in dead code isn't fatal.
> 
> Then we can replace those uses with a call into the asan runtime diagnosing 
> the
> issue instead?
> 
> Richard.
> 
>> Jakub

OK, thanks for the clarification, it's more clear to me. So we want to consider 
for
SSA transformation of ASAN_MARK only is_gimple_reg_types. I'm having a 
test-case where
it converts:
foo ()
{
  char a;
  char * p;
  char _1;
  int _2;
  int _8;
  int _9;

  :
  ASAN_MARK (2, &a, 1);
  a = 0;
  p_6 = &a;
  ASAN_MARK (1, &a, 1);
  _1 = *p_6;
  if (_1 != 0)
goto ;
  else
goto ;

  :
  _9 = 1;
  goto ;

  :
  _8 = 0;

  :
  # _2 = PHI <_9(3), _8(4)>
  return _2;

}

to:

foo ()
{
  char a;
  char * p;
  char _1;
  int _2;

  :
  a_10 = 0;
  a_12 = ASAN_POISON ();
  _1 = a_12;
  if (_1 != 0)
goto ;
  else
goto ;

  :

  :
  # _2 = PHI <1(2), 0(3)>
  return _2;

}

and probably the last goal is to convert the newly added internal fn to a 
runtime call.
Hope sanopt pass is the right place where to it?

Thanks,
Martin

Re: [PATCH 0/3] use rtx_insn * more

2016-11-02 Thread David Malcolm

On Wed, 2016-11-02 at 00:05 -0400, Trevor Saunders wrote:
> On Mon, Oct 31, 2016 at 07:37:54AM -0600, Jeff Law wrote:
> > On 10/28/2016 01:13 PM, tbsaunde+...@tbsaunde.org wrote:
> > > From: Trevor Saunders 
> > > 
> > > HI,
> > > 
> > > This series changes various variables type from rtx to rtx_insn *
> > > so that the
> > > remaining patches in this series
> > > http://gcc.gnu.org/ml/gcc-patches/2016-10/msg01353.html can be
> > > applied.
> > > 
> > > patches bootstrapped and regtested on x86_64-linux-gnu, and run
> > > through config-list.mk, ok?
> > > 
> > > Thanks!
> > > 
> > > Trev
> > > 
> > > Trevor Saunders (3):
> > >   use rtx_insn * in various places where it is obvious
> > >   split up the trial variable in reorg.c:relax_delay_slots to use
> > > rtx_insn * more
> > >   split up some variables to use rtx_insn * more
> > All 3 patches in this series are fine.
> 
> Thanks for the reviews.
> 
> Can I trouble you to look at 
> http://gcc.gnu.org/ml/gcc-patches/2016-10/
> now that it is tested on aarch64?

I think you're missing the full URL here; the above URL is the index
for all of October.

Did you mean this patch:
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01358.html
?

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote:
> it converts:
> foo ()
> {
>   char a;
>   char * p;
>   char _1;
>   int _2;
>   int _8;
>   int _9;
> 
>   :
>   ASAN_MARK (2, &a, 1);
>   a = 0;
>   p_6 = &a;
>   ASAN_MARK (1, &a, 1);
>   _1 = *p_6;

You shouldn't convert if a is addressable (when ignoring &a in ASAN_MARK
calls).  Only if there is &a just in ASAN_MARK and MEM_REF, you can convert.

> to:
> 
> foo ()
> {
>   char a;
>   char * p;
>   char _1;
>   int _2;
> 
>   :
>   a_10 = 0;
>   a_12 = ASAN_POISON ();
>   _1 = a_12;
>   if (_1 != 0)
> goto ;
>   else
> goto ;
> 
>   :
> 
>   :
>   # _2 = PHI <1(2), 0(3)>
>   return _2;
> 
> }
> 
> and probably the last goal is to convert the newly added internal fn to a 
> runtime call.
> Hope sanopt pass is the right place where to it?

If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best
would be to add an artificial variable you give the same name as the
underlying var of the SSA_NAME (and alignment, locus etc.) and poison it
right away (keep unpoisoning only to the function epilogue) and then
ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of
(D) SSA_NAME.

Jakub

[PATCH] Fix for big-endian gcc.c-torture/execute/pr55750.c


Hi all,

I noticed that my patch for PR tree-optimization/78170 broke 
execute.exp=pr55750.c on big-endian.
The problem with that patch is that we should not forget to clear the padding 
bits in the temporary
buffer when merging values even when they are less than a byte.

Tested on aarch64_be-none-elf (the test failure goes away)
Bootstrap and test on x86_64 ongoing.
Ok for trunk if successful?

David, does this patch allow AIX bootstrap to proceed by any chance?

Thanks,
Kyrill

2016-11-02  Kyrylo Tkachov  

* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Don't forget to
clear padding bits even when they're less than a byte.
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 081620e50f603e2de8ed962aec6c619890ce1e33..db3c3c14a5b8938024db8bed80145c77d29396ac 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -430,7 +430,8 @@ encode_tree_to_bitpos (tree expr, unsigned char *ptr, int bitlen, int bitpos,
  contain a sign bit due to sign-extension).  */
   unsigned int padding
 = byte_size - ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT - 1;
-  if (padding != 0)
+  if (padding != 0
+  || bitlen % BITS_PER_UNIT != 0)
 {
   /* On big-endian the padding is at the 'front' so just skip the initial
 	 bytes.  */

[PATCH] Fix PR78185


The following fixes PR78185 by properly honoring possibly infinite child
loops when computing what blocks are always executed during loop invariant
motion.  Such loops behave as if the loop would exit at this point.

Both GIMPLE and RTL level passes have that very same issue and the 
following fixes the GIMPLE one and simply disables hoisting of possibly
trapping or faulting instructions in the RTL pass ... Eric seems to
remember this might regress gzip so I'm going to put this on one of
our SPEC testers for tonights run as well.  Maybe somebody else
wants to check the performance impact (I'm interested in both,
GIMPLE and RTL change fallout for performance).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

If all goes well I'll followup with the obvious removal of no longer
needed stuff in loop-invariant.c.

Another variant for RTL would be to simply treat all edges entering
a child loop (not only those entering possibly infinitely looping ones)
as an exit.  I think finite_loop_p has no RTL level variant (yet).

Richard.

2016-11-02  Richard Biener  

PR middle-end/78185
* loop-invariant.c (find_invariant_insn): Never hoist trapping or
faulting instructions.
* tree-ssa-loop-im.c: Include tree-ssa-loop-niter.h.
(fill_always_executed_in_1): Honor infinite child loops.

* gcc.dg/pr78185.c: New testcase.

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 551103f..deb5be6 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1076,7 +1076,7 @@ pre_check_invariant_p (bool simple, rtx dest)
unless the program ends due to a function call.  */
 
 static void
-find_invariant_insn (rtx_insn *insn, bool always_reached, bool always_executed)
+find_invariant_insn (rtx_insn *insn, bool, bool always_executed)
 {
   df_ref ref;
   struct def *def;
@@ -1108,8 +1108,8 @@ find_invariant_insn (rtx_insn *insn, bool always_reached, 
bool always_executed)
   if (can_throw_internal (insn))
 return;
 
-  /* We cannot make trapping insn executed, unless it was executed before.  */
-  if (may_trap_or_fault_p (PATTERN (insn)) && !always_reached)
+  /* We cannot make trapping insn executed.  */
+  if (may_trap_or_fault_p (PATTERN (insn)))
 return;
 
   depends_on = BITMAP_ALLOC (NULL);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 463db04..0524e57 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-mem.h"
 #include "gimple-fold.h"
 #include "tree-scalar-evolution.h"
+#include "tree-ssa-loop-niter.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -2369,8 +2370,16 @@ fill_always_executed_in_1 (struct loop *loop, sbitmap 
contains_call)
break;
 
  FOR_EACH_EDGE (e, ei, bb->succs)
-   if (!flow_bb_inside_loop_p (loop, e->dest))
- break;
+   {
+ /* If there is an exit from this BB.  */
+ if (!flow_bb_inside_loop_p (loop, e->dest))
+   break;
+ /* Or we enter a possibly non-finite loop.  */
+ if (flow_loop_nested_p (bb->loop_father,
+ e->dest->loop_father)
+ && ! finite_loop_p (e->dest->loop_father))
+   break;
+   }
  if (e)
break;
 
Index: trunk/gcc/testsuite/gcc.dg/pr78185.c
===
--- trunk/gcc/testsuite/gcc.dg/pr78185.c(revision 0)
+++ trunk/gcc/testsuite/gcc.dg/pr78185.c(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do run { target *-*-linux* *-*-gnu* } } */
+/* { dg-options "-O" } */
+
+#include 
+#include 
+#include 
+
+static char var1 = 0L;
+static char *var2 = &var1;
+
+void do_exit (int i)
+{
+  exit (0);
+}
+
+int main(void)
+{
+  struct sigaction s;
+  sigemptyset (&s.sa_mask);
+  s.sa_handler = do_exit;
+  s.sa_flags = 0;
+  sigaction (SIGALRM, &s, NULL);
+  alarm (1);
+  /* The following loop is infinite, the division by zero should not
+ be hoisted out of it.  */
+  for (; (var1 == 0 ? 0 : (100 / var1)) == *var2; );
+  return 0;
+}

Re: [PATCH] Fix for big-endian gcc.c-torture/execute/pr55750.c

On Wed, 2 Nov 2016, Kyrill Tkachov wrote:

> Hi all,
> 
> I noticed that my patch for PR tree-optimization/78170 broke
> execute.exp=pr55750.c on big-endian.
> The problem with that patch is that we should not forget to clear the padding
> bits in the temporary
> buffer when merging values even when they are less than a byte.
> 
> Tested on aarch64_be-none-elf (the test failure goes away)
> Bootstrap and test on x86_64 ongoing.
> Ok for trunk if successful?

Ok.

Thanks,
Richard.

> David, does this patch allow AIX bootstrap to proceed by any chance?

Crossing fingers ;)

> Thanks,
> Kyrill
> 
> 2016-11-02  Kyrylo Tkachov  
> 
> * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Don't forget to
> clear padding bits even when they're less than a byte.
>

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

On 11/02/2016 03:51 PM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote:
>> it converts:
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>   int _8;
>>   int _9;
>>
>>   :
>>   ASAN_MARK (2, &a, 1);
>>   a = 0;
>>   p_6 = &a;
>>   ASAN_MARK (1, &a, 1);
>>   _1 = *p_6;
> 
> You shouldn't convert if a is addressable (when ignoring &a in ASAN_MARK
> calls).  Only if there is &a just in ASAN_MARK and MEM_REF, you can convert.

Sure, which should be done in execute_update_addresses_taken via 
gimple_ior_addresses_taken.

> 
>> to:
>>
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>
>>   :
>>   a_10 = 0;
>>   a_12 = ASAN_POISON ();
>>   _1 = a_12;
>>   if (_1 != 0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # _2 = PHI <1(2), 0(3)>
>>   return _2;
>>
>> }
>>
>> and probably the last goal is to convert the newly added internal fn to a 
>> runtime call.
>> Hope sanopt pass is the right place where to it?
> 
> If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best
> would be to add an artificial variable you give the same name as the
> underlying var of the SSA_NAME (and alignment, locus etc.) and poison it
> right away (keep unpoisoning only to the function epilogue) and then
> ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of
> (D) SSA_NAME.

When I create an ASAN_POISON call in execute_update_addresses_taken, there 
would not
be any ASAN_CHECK generated as it's going to be rewritten to SSA form (like the 
previous
sample I sent).

I like the idea of having a parallel variable, which can be poisoned at the 
very beginning of
a function. Whenever we have a use of the SSA_NAME (like a_12 = ASAN_POISON 
()), we can simply
insert BUILT_IN_ASAN_REPORT_LOADx(¶llel_variable) statement. No change 
would be necessary
for ASAN runtime in such case.

Will it work?
Thanks,
Martin

> 
>   Jakub
>

Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-02 Thread Segher Boessenkool

On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
> >> I don't believe it needs a cleanup on every iteration. One cleanup at
> >> the end should work fine.
> >
> > But as the comment there says:
> >
> >   /* Merge the duplicated blocks into predecessors, when possible.  */
> >   cleanup_cfg (0);
> >
> > (this is not a new comment), and without merging blocks this whole
> > patch does zilch.
> >
> > There is no need to create new basic blocks at all (can replace the
> > final branch in a block with a copy of the whole block it jumps to,
> > instead); and then it is painfully obvious that switching to layout
> > mode here is pointless, too.
> >
> > Do you want me to do that?
> >
> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
> > length allowed of the block to copy) linear.  Unless there are unboundedly
> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
> 
> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that
> walk the whole function.  So unless the number of iterations is bound
> with a constant I call this quadratic (in function size).

It is bounded (with that caveat above): new candidates will be bigger than
the block merged into it, so there won't be more than
PARAM_MAX_GOTO_DUPLICATION_INSNS passes.

But I can make it all work simpler, in non-layout mode, if you prefer.


Segher

Re: [PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those

On Wed, Nov 2, 2016 at 10:31 AM, Jakub Jelinek  wrote:
> It uses Alex' LANG_HOOKS_GET_PTRMEMFN_TYPE langhook.  I've tried
> to think about https://gcc.gnu.org/ml/gcc-patches/2011-05/msg00227.html
> and we even have such a langhook now, modified_type_die
> uses lang_hooks.types.get_debug_type, but
> 1) it is just called in modified_type_die and not in
>gen_type_die_with_usage, that looks weird

How much of a problem is that?  modified_type_die calls gen_type_die,
does that not cover the cases needed here?

> 2) it is used for something Ada-ish I really don't know how to test etc.
>to be able to find out if it is safe to call it in
>gen_type_die_with_usage too

You could find an Ada test that uses the code and verify that the
output stays the same?

> 3) most importantly, if the C++ version of this langhook would create
>OFFSET_TYPE on the fly, I don't know how to ensure effective sharing
>of DW_TAG_ptr_to_member_type nodes with the same DW_AT_type
>and DW_AT_containing_type; unless the C++ langhook adds some extra
>hash table that caches already created OFFSET_TYPEs or something similar,
>it would create a new OFFSET_TYPE each time it is called

build_offset_type already uses a hash table.

> Also, I really don't know how well does GDB (especially older releases)
> handle DW_TAG_ptr_to_member_type for PMF, so the patch wraps that currently
> with if (dwarf_version >= 5).  Quick grep revealed that GDB has code to
> handle the __pfn/__delta fields.  So, can I ask somebody from the GDB
> team to test this patch with that if (dwarf_version >= 5) replaced
> with if (1) and see if it works properly with current GDB as well as say
> 4-5 years old one (e.g. with -gdwarf-2 or -gdwarf-3)?  If yes, we
> should emit it unconditionally.

This all makes sense to me.

> +  if (dwarf_version >= 5)
> +   {
> + tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0);
> + if (class_type != NULL_TREE)

This can be

if (dwarf_version >= 5)
  if (tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0))

Jason

Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module

OK.

On Wed, Nov 2, 2016 at 10:18 AM, Jiong Wang  wrote:
> On 02/11/16 13:42, Jakub Jelinek wrote:
>>
>> On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote:
>>>
>>> -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION
>>> note. */
>>> +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION
>>> note.  */
>>
>>
>> Too long line.
>
>
> Hmm, it shows 80 columns under my editor.  I guess '+' is counted in?
>
>>
>>> +/* RTL sequences inside PARALLEL are raw expression representation.
>>> +
>>> +   mem_loc_descriptor can be used to build generic DWARF expressions
>>> for
>>> +   DW_CFA_expression and DW_CFA_val_expression where the expression
>>> may can
>>> +   not be represented using normal RTL sequences.  In this case,
>>> group all
>>> +   expression operations (DW_OP_*) inside a PARALLEL.  For those
>>> DW_OP which
>>> +   doesn't have RTL mapping, wrap it using UNSPEC.  The logic for
>>> parsing
>>> +   PARALLEL sequences is:
>>> +
>>> +   foreach elem inside PARALLEL
>>> + if (elem is UNSPEC)
>>> +   dw_op =  XINT (elem, 1) (DWARF operation is kept as UNSPEC
>>> number)
>>> +   oprnd1 = XVECEXP (elem, 0, 0)
>>> +   oprnd2 = XVECEXP (elem, 0, 1)
>>> + else
>>> +   call mem_loc_descriptor  */
>>
>>
>> Not sure if it is a good idea to document in weirdly formatted
>> pseudo-language what the code actually does a few lines below.  IMHO
>> either
>> express it in words, or don't express it at all.
>
>
> OK, fixed. I replaced these comments as some brief words.
>
>>
>>> +   exp_result =
>>> + new_loc_descr ((enum dwarf_location_atom) dw_op,
>>> oprnd1,
>>> +oprnd2);
>>
>>
>> Wrong formatting, = should be on the next line.
>>
>>> + }
>>> +   else
>>> + exp_result =
>>> +   mem_loc_descriptor (elem, mode, mem_mode,
>>> +   VAR_INIT_STATUS_INITIALIZED);
>>
>>
>> Likewise.
>
>
> Both fixed. Patch updated, please review.
>
>
> Thanks.
>
> gcc/
> 2016-11-02  Jiong Wang  
>
> * reg-notes.def (CFA_VAL_EXPRESSION): New entry.
> * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New
> function.
> (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION.
> (output_cfa_loc): Support DW_CFA_val_expression.
> (output_cfa_loc_raw): Likewise.
> (output_cfi): Likewise.
> (output_cfi_directive): Likewise.
> * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression.
> (dw_cfi_oprnd2_desc): Likewise.
> (mem_loc_descriptor): Recognize new pattern generated for value
> expression.
>

[PATCH][GCC][testsuite] Fix failing vminnm/vmaxnm tests for ARM

2016-11-02 Thread Tamar Christina

Hi all,

This fixes the ARM failures in the testsuite.
Previously these tests were gated on if ARMv8-a
support was available in the compiler, not if the hardware
was an ARMv8-a hardware.

This thus resulted in correct code, but wouldn't run on
any other hardware.

Ran regression tests on aarch64-none-linux-gnu, arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Tamar

gcc/testsuite/

2016-11-01  Tamar Christina  

* gcc.target/arm/simd/vmaxnm_f32_1.c (dg-require-effective-target):
Check for arm_v8_neon_hw.
* gcc.target/arm/simd/vmaxnmq_f32_1.c (dg-require-effective-target):
Likewise.
* gcc.target/arm/simd/vminnm_f32_1.c (dg-require-effective-target):
Likewise.
* gcc.target/arm/simd/vminnmq_f32_1.c(dg-require-effective-target):
Likewise.

gcc.arm-fix-vminnm-vmaxnm-tests
Description: gcc.arm-fix-vminnm-vmaxnm-tests

Re: [PATCH] combine: Improve change_zero_ext (fixes PR71847)

2016-11-02 Thread Bin.Cheng

On Tue, Oct 25, 2016 at 11:40 AM, Segher Boessenkool
 wrote:
> This improves a few things in change_zero_ext.  Firstly, it should use
> the passed in pattern in recog_for_combine, not the pattern of the insn
> (they are not the same if the whole pattern was replaced).  Secondly,
> it handled zero_ext of a subreg, but with hard registers we do not get
> a subreg, instead the mode of the reg is changed.  So this handles that.
> Thirdly, after changing a zero_ext to an AND, the resulting RTL may become
> non-canonical, like (ior (ashift ..) (and ..)); the AND should be first,
> it is commutative.  And lastly, zero_extract as a set_dest wasn't handled
> at all, but now it is.
>
> This fixes the testcase in PR71847, and improves code generation in some
> other edge cases too.
>
> Tested on powerpc64-linux {-m32,-m64}; I'll also test it on x86 and
> will build lots of crosses before committing.

Hi Segher,
This causes failure @ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78186
It includes how to report it, could you please have a look?  Thanks.

Thanks,
Bin

[PATCH v2] AIX visibility

2016-11-02 Thread David Edelsohn

This revised patch makes two changes:

1) Fix typo in configure.ac
2) Add AIX visibility support for ASM_WEAKEN_DECL, which does touch
the same code as Linux.

The AIX "weak" support fixes a large number of C++ visibility testcases.

Bootstrapped on powerpc-ibm-aix7.2.0.0.

* configure.ac (.hidden): Change to conftest_s string. Provide string
for AIX assembler.
(gcc_cv_ld_hidden): Yes for AIX.
* configure: Regenerate.

* dwarf2asm.c (USE_LINKONCE_INDIRECT): Don't set for AIX (XCOFF).

* config/rs6000/rs6000-protos.h (rs6000_asm_weaken_decl): Declare
(rs6000_xcoff_asm_output_aligned_decl_common): Declare.
* config/rs6000/xcoff.h (TARGET_ASM_GLOBALIZE_DECL_NAME): Define.
(ASM_OUTPUT_ALIGNED_DECL_COMMON): Define.
(ASM_OUTPUT_ALIGNED_COMMON): Delete.
* config/rs6000/rs6000.c (rs6000_init_builtins): Change clog rename
from #if to if.
(rs6000_xcoff_visibility): New.
(rs6000_xcoff_declare_function_name): Add visibility support.
(rs6000_xcoff_asm_globalize_decl_name): New.
(rs6000_xcoff_asm_output_aligned_decl_common): New.
(rs6000_asm_weaken_decl): New.
(rs6000_code_end): Disable HIDDEN_LINKONCE on XCOFF.
config/rs6000/rs6000.h (ASM_WEAKEN_DECL): Change definition to
reference function.

dwarf2asm.c okay?

Any comments on ASM_WEAKEN_DECL change?

Thanks, David


ZZ
Description: Binary data

Re: [PATCH][GCC][testsuite] Fix failing vminnm/vmaxnm tests for ARM



On 02/11/16 15:38, Tamar Christina wrote:

Hi all,

This fixes the ARM failures in the testsuite.
Previously these tests were gated on if ARMv8-a
support was available in the compiler, not if the hardware
was an ARMv8-a hardware.

This thus resulted in correct code, but wouldn't run on
any other hardware.

Ran regression tests on aarch64-none-linux-gnu, arm-none-linux-gnueabihf.

Ok for trunk?


Looks ok to me too.
Kyrill


Thanks,
Tamar

gcc/testsuite/

2016-11-01  Tamar Christina  

* gcc.target/arm/simd/vmaxnm_f32_1.c (dg-require-effective-target):
Check for arm_v8_neon_hw.
* gcc.target/arm/simd/vmaxnmq_f32_1.c (dg-require-effective-target):
Likewise.
* gcc.target/arm/simd/vminnm_f32_1.c (dg-require-effective-target):
Likewise.
* gcc.target/arm/simd/vminnmq_f32_1.c(dg-require-effective-target):
Likewise.

Re: [PATCH] enhance buffer overflow warnings (and c/53562)

2016-11-02 Thread Martin Sebor


On 11/02/2016 01:37 AM, Jakub Jelinek wrote:

On Tue, Nov 01, 2016 at 08:55:03PM -0600, Martin Sebor wrote:

struct S {
  int a, b, c, d;
};

#define bos(p, t) __builtin_object_size (p, t)
#define memset0(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 0))
#define memset1(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 1))

void f0 (struct S *s)
{
  memset0 (&s->d, 0, 1024);   // no warning here (bos 0)
}


But we do not want the warning here, there is nothing wrong on it.
The caller may be

void
bar (void)
{
  struct T { struct S header; char payload[1024 - offsetof (struct S, d)]; } t;
  initialize (&t);
  f0 (&t.header);
}

and the callee might rely on that.


Sure, they might and in that case the warning would be a false
positive.  It wouldn't be the first such warning that wasn't 100%
free of them.  But my testing with Binutils, GCC, and the Linux
kernel has exposed only 10 instances of new warnings and I don't
think I saw this idiom among them.  But even if some were, or if
all of them were false positives I think that would be well within
the acceptable rates.  Here are the numbers of warnings for
Binutils and the kernel:

113   -Wimplicit-fallthrough
 38   -Wformat-length=
 12   -Wunused-const-variable=
 10   -Wstringop-overflow
  2   -Wdangling-else
  2   -Wframe-address
  2   -Wint-in-bool-context
  1   -Wbool-operation


Using some header structure at the
beginning and then conditionally on fields in that structure various
payloads occurs in many projects, starting with glibc, gcc, Linux kernel,
... The warning really must not be detached from reality.


That's an unfair assertion in light of the numbers above.


If you want a warning for suspicious calls, sure, but
1) it has to be clearly worded significantly differently from how do you
   word it, so that users really understand you are warning about
   suspicious code (though, I really believe it is very common and there
   is really nothing the users can do about it)
2) it really shouldn't be enabled in -Wall, and I think not even in -Wextra


As you have argued yourself recently in our discussion of
-Wimplicit-fallthrough, warnings that aren't enabled by either
of these options don't generally benefit users because very few
turn them on explicitly.  I agree with that argument although I
would be in favor of rolling out a new warning gradually over
two or more releases if it were known to be prone to high rates
of false positive. The -Wstringop-overflow warning clearly isn't
in that category so there's no need for it.  My offer to do
additional testing is still good if you'd like to see more data.

As for the wording, I welcome suggestions for improvements.

Martin

[PATCH, Fortran] Extension: Support legacy PARAMETER statements with -std=legacy (or -fdec?)

2016-11-02 Thread Fritz Reese

All,

Another quirk of legacy compilers is their syntax for PARAMETER
statements. Such statements are similar to standard PARAMETER
statements but lack parentheses following the PARAMETER keyword. There
is a good reason the standard doesn't support this - because the
statement becomes ambiguous with assignment statements in fixed form.
Consider the following:

parameter pi = 3.14d0

In fixed form, spaces are ignored so in standard Fortran the above
looks like an assignment to the variable "parameterpi". In legacy
compilers, the above is instead interpreted as

parameter (pi = 3.14d0)

which of course declares the variable 'pi' to be a parameter with the
value 3.14.

Attached is a patch for the GNU Fortran front-end which allows the
compiler to interpret these legacy PARAMETER statements. The patch in
its current form does this without warning through -std=legacy
(warning for -std=gnu, error for -std=f*). Bootstraps and regtests on
x86_64-redhat-linux.

However, note that this would change by default the compiler's
interpretation of fixed-form variables starting with the string
"parameter", if any such cases existed in real code. IMO fixed form
code is isomorphic to legacy code, so I imagine most users writing
fixed-form/legacy code would intend for a legacy PARAMETER statement,
rather than assignment to variable PARAMETERPI, when writing such a
statement.

Beyond compatibility, one motivation for acceptance/recognizance of
these statements in GNU Fortran is the counterintuitive compile errors
currently seen when compiling legacy code which uses this type of
statement. Because the legacy PARAMETER statement is currently
interpreted as an assignment in fixed-form, and parameter statements
often precede other specification statements in real code, GNU Fortran
complains not about the parameter statement itself but often the
following line with "Unexpected data declaration" or similar. This
serves to confuse the user. An example of this can be seen by
compiling the attached dec_parameter_1.f in fixed form with the
current build of GNU Fortran.

I am amenable to only enabling support for legacy PARAMETER statements
through -fdec, so that the default (-std=gnu) behavior is unchanged
for such cases in fixed form. But this would leave the esoteric
"Unexpected data declaration statement" errors for legacy code without
-fdec. The case would be difficult to detect and work around
specifically since the entire parameter "assignment" statement is
eaten by the time the error comes about.

What do you think?

---
Fritz Reese

Author: Fritz O. Reese 
Date:   Tue Nov 1 12:26:57 2016 -0400

Support legacy PARAMETER statements with -std=legacy.

gcc/fortran/
* decl.c (gfc_match_parameter): Allow omitted '()' with -std=legacy.
* parse.c (decode_statement): Match "parmeter" before assignments.
* gfortran.texi: Document.

gcc/testsuite/gfortran.dg/
* dec_parameter_1.f: New test.
* dec_parameter_2.f90: Likewise.
* dec_parameter_3.f90: Likewise.
* dec_parameter_4.f90: Likewise.

 gcc/fortran/decl.c|   10 +++-
 gcc/fortran/gfortran.texi |   16 ++
 gcc/fortran/parse.c   |4 +-
 gcc/testsuite/gfortran.dg/dec_parameter_1.f   |   64 +
 gcc/testsuite/gfortran.dg/dec_parameter_2.f90 |   63 
 gcc/testsuite/gfortran.dg/dec_parameter_3.f90 |   13 +
 gcc/testsuite/gfortran.dg/dec_parameter_4.f90 |   13 +
 7 files changed, 180 insertions(+), 3 deletions(-)
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index f18eb41..0120ceb 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -7821,10 +7821,16 @@ cleanup:
 match
 gfc_match_parameter (void)
 {
+  const char *term = " )%t";
   match m;
 
   if (gfc_match_char ('(') == MATCH_NO)
-return MATCH_NO;
+{
+  /* With legacy PARAMETER statements, don't expect a terminating ')'.  */
+  if (!gfc_notify_std (GFC_STD_LEGACY, "PARAMETER without '()' at %C"))
+   return MATCH_NO;
+  term = " %t";
+}
 
   for (;;)
 {
@@ -7832,7 +7838,7 @@ gfc_match_parameter (void)
   if (m != MATCH_YES)
break;
 
-  if (gfc_match (" )%t") == MATCH_YES)
+  if (gfc_match (term) == MATCH_YES)
break;
 
   if (gfc_match_char (',') != MATCH_YES)
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 85ab31b..1d60551 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -1472,6 +1472,7 @@ compatibility extensions along with those enabled by 
@option{-std=legacy}.
 * Bitwise logical operators::
 * Extended I/O specifiers::
 * Default exponents::
+* Legacy PARAMETER statements::
 @end menu
 
 @node Old-style kind specifications
@@ -2705,6 +2706,21 @@ For compatibility, GNU Fortran supports a default 
exponent of zero in real
 constants with @option{-fdec}.  For example, @code{9e} would be
 interpret

Re: [PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those

On Wed, Nov 02, 2016 at 11:31:25AM -0400, Jason Merrill wrote:
> On Wed, Nov 2, 2016 at 10:31 AM, Jakub Jelinek  wrote:
> > It uses Alex' LANG_HOOKS_GET_PTRMEMFN_TYPE langhook.  I've tried
> > to think about https://gcc.gnu.org/ml/gcc-patches/2011-05/msg00227.html
> > and we even have such a langhook now, modified_type_die
> > uses lang_hooks.types.get_debug_type, but
> > 1) it is just called in modified_type_die and not in
> >gen_type_die_with_usage, that looks weird
> 
> How much of a problem is that?  modified_type_die calls gen_type_die,
> does that not cover the cases needed here?

If e.g. on the ref-3.C testcase from the patch I put breakpoint on
both gen_ptr_to_mbr_type_die and modified_type_die (the latter only for
type->base.code == RECORD_TYPE), then I see first:
#0  gen_ptr_to_mbr_type_die (type=, 
context_die=>, 
class_type=, member_type=) at ../../gcc/dwarf2out.c:23128
#1  0x00c3a7e5 in gen_type_die_with_usage (type=, 
context_die=>, usage=DINFO_USAGE_DIR_USE)
at ../../gcc/dwarf2out.c:24428
#2  0x00c3ab92 in gen_type_die (type=, 
context_die=>) at ../../gcc/dwarf2out.c:24491
#3  0x00c3caed in gen_decl_die (decl=, 
origin=, ctx=0x0, 
context_die=>) at ../../gcc/dwarf2out.c:25117
#4  0x00c3b13e in process_scope_var (stmt=, 
decl=, origin=, 
context_die=>) at ../../gcc/dwarf2out.c:24620
#5  0x00c3b1bb in decls_for_scope (stmt=, 
context_die=>) at ../../gcc/dwarf2out.c:24645
and only afterwards:
#0  modified_type_die (type=, cv_quals=0, 
reverse=false, 
context_die=>) at ../../gcc/dwarf2out.c:12328
#1  0x00c2f03b in add_type_attribute (object_die=>, 
type=, cv_quals=0, reverse=false, 
context_die=>) at ../../gcc/dwarf2out.c:20346
#2  0x00c354f4 in gen_variable_die (decl=, 
origin=, 
context_die=>) at ../../gcc/dwarf2out.c:22688
#3  0x00c3cb8e in gen_decl_die (decl=, 
origin=, ctx=0x0, 
context_die=>) at ../../gcc/dwarf2out.c:25138
#4  0x00c3b13e in process_scope_var (stmt=, 
decl=, origin=, 
context_die=>) at ../../gcc/dwarf2out.c:24620
#5  0x00c3b1bb in decls_for_scope (stmt=, 
context_die=>) at ../../gcc/dwarf2out.c:24645

which means if gen_type_die or gen_type_die_with_usage doesn't
use the langhook, then we'd emit a completely useless { __pfn; __delta }
struct into debug info first, and then in modified_type_die used
the langhook, get OFFSET_TYPE and probably create the
DW_TAG_ptr_to_member_type.  So I think we really need that.

> > 2) it is used for something Ada-ish I really don't know how to test etc.
> >to be able to find out if it is safe to call it in
> >gen_type_die_with_usage too
> 
> You could find an Ada test that uses the code and verify that the
> output stays the same?

I can try to find the patch that introduced it and if it contained any
testcases.

> > 3) most importantly, if the C++ version of this langhook would create
> >OFFSET_TYPE on the fly, I don't know how to ensure effective sharing
> >of DW_TAG_ptr_to_member_type nodes with the same DW_AT_type
> >and DW_AT_containing_type; unless the C++ langhook adds some extra
> >hash table that caches already created OFFSET_TYPEs or something similar,
> >it would create a new OFFSET_TYPE each time it is called
> 
> build_offset_type already uses a hash table.

Ah, ok.

> > Also, I really don't know how well does GDB (especially older releases)
> > handle DW_TAG_ptr_to_member_type for PMF, so the patch wraps that currently
> > with if (dwarf_version >= 5).  Quick grep revealed that GDB has code to
> > handle the __pfn/__delta fields.  So, can I ask somebody from the GDB
> > team to test this patch with that if (dwarf_version >= 5) replaced
> > with if (1) and see if it works properly with current GDB as well as say
> > 4-5 years old one (e.g. with -gdwarf-2 or -gdwarf-3)?  If yes, we
> > should emit it unconditionally.
> 
> This all makes sense to me.
> 
> > +  if (dwarf_version >= 5)
> > +   {
> > + tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0);
> > + if (class_type != NULL_TREE)
> 
> This can be
> 
> if (dwarf_version >= 5)
>   if (tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0))

Ok.

Jakub

RFC: Warnings silenced when macros from system headers are used (PR c/78000, c/71613)

2016-11-02 Thread Marek Polacek

One of the pending issues that we should address before we release GCC 7 is
that sometimes we don't issue a warning if the location points to a macro
defined in a system header, unless -Wsystem-headers.  Consider e.g.
  
  enum { e1 = LLONG_MIN };

or

  float_var = LLONG_MIN;

Here, LLONG_MIN comes from a system header and so the compiler doesn't print
any warnings even though it should--the problem is not in the macro itself,
but in how it's used.  This has happened before, e.g. with NULL, and the fix
was to call expansion_point_location_if_in_system_header, but this strategy
is not tenable; there are too many such issues.  After having spent days on
this it seems that we should always use the expansion location except for
certain pedwarns/warnings--the challenge is of course how to distinguish
which ones.  This patch introduces expand_macros_sentinel that can be used
to suppress expanding macros, removes various
expansion_point_location_if_in_system_header calls and fixes testsuite
fallout.  I have *not* gone over all the warnings/pedwarns yet, because this
is touch-and-go whether this'll be considered a reasonable approach.

The general principle should be: is it the macro or its user that is
responsible for the warning?, but in practice it was often less clear to
me what to do.  So e.g. imagine

  #define C :
  #define F i ?: 3

in a system header and then

  i = i ? C 5;  // 1
  return F; // 2

in user code -- I believe we should NOT warn for 2 (so don't expand the 
location),
but that also means we won't warn for 1.

Thoughts?

Bootstrapped/regtested on x86_64-linux and ppc64-linux.

2016-11-02  Marek Polacek  

PR c/78000
PR c/71613
gcc/c-family/
* c-common.c (unsafe_conversion_p): Don't call
expansion_point_location_if_in_system_header.
(c_cpp_error): Add a paramater and expand locations depending on that.
* c-common.h (c_cpp_error): Update declaration.
* c-warn.c (warnings_for_convert_and_check): Don't call
expansion_point_location_if_in_system_header.
gcc/c/
* c-decl.c (declspecs_add_type): Use expand_macros_sentinel to suppress
expanding macro locations.
* c-parser.c (c_parser_postfix_expression) [RID_FUNCTION_NAME,
RID_PRETTY_FUNCTION_NAME, RID_C99_FUNCTION_NAME]: Likewise.
* c-typeck.c (convert_arguments): Don't call
expansion_point_location_if_in_system_header.
(pedwarn_init): Likewise.
(warning_init): Likewise.
(convert_for_assignment): Don't call
expansion_point_location_if_in_system_header.
(c_finish_return): Likewise.
gcc/cp/
* call.c (conversion_null_warnings): Don't call
expansion_point_location_if_in_system_header.
* cvt.c (build_expr_type_conversion): Likewise.
* parser.c (set_and_check_decl_spec_loc): Use expand_macros_sentinel to
suppress expanding macro locations.
* typeck.c (cp_build_binary_op): Don't call
expansion_point_location_if_in_system_header.
gcc/
* diagnostic.c (diagnostic_initialize): Initialize dc_expand_locations.
(diagnostic_report_warnings_p): New function.
* diagnostic.h (struct diagnostic_context): Add dc_expand_locations.
(diagnostic_report_warnings_p): Remove.
(struct expand_macros_sentinel): New sentinel.
(diagnostic_report_warnings_p): Declare.
* genmatch.c (error_cb): Add bool parameter.
(fatal_at): Adjust error_cb call.
(warning_at): Likewise.
* input.c (on_error): Add bool parameter.
gcc/fortran/
* cpp.c (cb_cpp_error): Add a paramater and expand locations depending
on that.
gcc/testsuite/
* gcc.dg/pr71613.c: New.
* gcc.dg/pr78000.c: New.
* gcc.dg/pr78000.h: New.
libcpp/
* charset.c (noop_error_cb): Add bool parameter.
(saved_error_handler): Likewise.
(cpp_interpret_string_ranges):
* errors.c (cpp_diagnostic_at_richloc): Adjust cb.error call.
(cpp_diagnostic_at): Likewise.
(cpp_diagnostic_with_line): Add bool parameter and pass it down to
cb.error.
(cpp_error_with_line_noexpand): New.
(cpp_warning_with_line_noexpand): New.
(cpp_pedwarning_with_line_noexpand): New.
(cpp_warning_with_line_syshdr): Pass true to cpp_diagnostic_with_line.
* expr.c (cpp_classify_number): Call *_noexpand variants.
* include/cpplib.h (error): Adjust declaration.
(cpp_error_with_line_noexpand): New.
(cpp_warning_with_line_noexpand): New.
(cpp_pedwarning_with_line_noexpand): New.

diff --git gcc/gcc/c-family/c-common.c gcc/gcc/c-family/c-common.c
index 307862b..b0a1b5c 100644
--- gcc/gcc/c-family/c-common.c
+++ gcc/gcc/c-family/c-common.c
@@ -1230,7 +1230,6 @@ unsafe_conversion_p (location_t loc, tree type, tree 
expr, bool produce_warns)
 {
   enum conversion_safety give_warning = SAFE_CONVERSION; /* is 0 or false */
   tree expr_

Re: [PATCH][GCC][testsuite] Fix failing vminnm/vmaxnm tests for ARM

2016-11-02 Thread Christophe Lyon

On 2 November 2016 at 16:38, Tamar Christina  wrote:
> Hi all,
>
> This fixes the ARM failures in the testsuite.
> Previously these tests were gated on if ARMv8-a
> support was available in the compiler, not if the hardware
> was an ARMv8-a hardware.
>
> This thus resulted in correct code, but wouldn't run on
> any other hardware.
>
> Ran regression tests on aarch64-none-linux-gnu, arm-none-linux-gnueabihf.
>
> Ok for trunk?
>

It seems OK to me (without actually testing, though)

Thanks

Christophe

> Thanks,
> Tamar
>
> gcc/testsuite/
>
> 2016-11-01  Tamar Christina  
>
> * gcc.target/arm/simd/vmaxnm_f32_1.c (dg-require-effective-target):
> Check for arm_v8_neon_hw.
> * gcc.target/arm/simd/vmaxnmq_f32_1.c (dg-require-effective-target):
> Likewise.
> * gcc.target/arm/simd/vminnm_f32_1.c (dg-require-effective-target):
> Likewise.
> * gcc.target/arm/simd/vminnmq_f32_1.c(dg-require-effective-target):
> Likewise.

Re: RFC: Warnings silenced when macros from system headers are used (PR c/78000, c/71613)