Re: [PATCH] Destroy arguments for _Cilk_spawn calling in the child (PR 80038)

2017-04-13 Thread Richard Biener
On Wed, Apr 12, 2017 at 11:16 PM, Jeff Law  wrote:
> On 04/07/2017 08:02 AM, Xi Ruoyao wrote:
>>
>> On 2017-04-06 11:12 -0600, Jeff Law wrote:
>>
>>> With the likely deprecation in mind, I've only done a cursory review of
>>> the changes -- mostly to verify that they hit Cilk+ paths only.
>>
>>
>>> What's the purpose behind changing when we set the in_lto_p flag?
>>
>>
>> Without that change, GCC with my patch ICEed with _Cilk_spawn and
>> -flto -O3 -fcilkplus since __cilkrts_stack_frame.ctx's type (array of void
>> *)
>> was not TYPE_STRUCTURAL_EQUALITY_P in lto stage.
>> If this change is not proper, I'll work on modifying my patch to work
>> without touching in_lto_p.
>
> It's certainly be preferable to not change in_lto_p-- unless Richi wants to
> chime in on the safety of setting in_lto_p earlier.
>
> I'm not familiar enough with the LTO interactions to know if movement of
> in_lto_p is safe.

It should be safe and even technically more correct given we now have
various conditionals in the type building routines in tree.c that check
in_lto_p and avoid setting TYPE_CANONICAL from them -- TYPE_CANONICAL
is re-computed later, and for the builtin types we first zero TYPE_CANONICAL
(see lto.c:read_cgraph_and_symbols).

Now... I think those checks are somewhat wrong given that the middle-end
_does_ create types later, hopefully not ones we use for alias purposes,
but I'm not 100% sure.  So it somewhat feels like a hack ;)

So in theory the change is a good one.  I'm still nervous at this stage.

Did you verify LTO bootstrap still works with the patch?

CCing Honza who fiddled with this last (and introduced all those
in_lto_p checks).

Thanks,
Richard.

> Jeff


Re: [Patch, GCC/ARM, gcc-5-branch] Fix PR68390 Incorrect code due to indirect tail call of varargs function with hard float ABI

2017-04-13 Thread Ramana Radhakrishnan
On Wed, Apr 12, 2017 at 6:55 PM, Christophe Lyon
 wrote:
> Hi,
>
> It looks like we forgot to backport the fix for PR68390 to gcc-5-branch.
> The patch applies cleanly, and fwiw we've had it in the linaro-5
> branch for a while.
>
> OK to apply to gcc-5-branch?
>

OK if there are no regressions (the fact that its been in the linaro-5
branch gives me some confidence about the backport).

Ramana

> Thanks,
>
> Christophe


Do not accidentally localize symbols when there is a hidden symbol in the same comdat group

2017-04-13 Thread Jan Hubicka
Hi,
In the testcase for PR (which I do not know how to turn in to a testuiste
version becuase it requires LTOed dynamic library) there is comdat group
with one hiddena and one global symbol.  At LTO we realize that hidden
symbol can be brought static, but at that time we also accidentally localize
the exported symbol.

Bootstrapped/regtested x86_64-linux, comitted. Martin also kindly tested that
it makes no difference for Firefox build.

Honza

PR lto/69953 
* ipa-visibility.c (non_local_p): Fix typos.
(localize_node): When localizing symbol in same comdat group,
dissolve the group only when we know external symbols are going
to be privatized.
(function_and_variable_visibility): Do not localize DECL_EXTERNAL.
Index: ipa-visibility.c
===
--- ipa-visibility.c(revision 246895)
+++ ipa-visibility.c(working copy)
@@ -90,8 +90,8 @@ static bool
 non_local_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
 {
   return !(node->only_called_directly_or_aliased_p ()
-  /* i386 would need update to output thunk with locak calling
- ocnvetions.  */
+  /* i386 would need update to output thunk with local calling
+ convetions.  */
   && !node->thunk.thunk_p
   && node->definition
   && !DECL_EXTERNAL (node->decl)
@@ -153,7 +153,7 @@ comdat_can_be_unshared_p_1 (symtab_node
 /* COMDAT functions must be shared only if they have address taken,
otherwise we can produce our own private implementation with
-fwhole-program.  
-   Return true when turning COMDAT functoin static can not lead to wrong
+   Return true when turning COMDAT function static can not lead to wrong
code when the resulting object links with a library defining same COMDAT.
 
Virtual functions do have their addresses taken from the vtables,
@@ -537,6 +537,35 @@ localize_node (bool whole_program, symta
 {
   gcc_assert (whole_program || in_lto_p || !TREE_PUBLIC (node->decl));
 
+  /* It is possible that one comdat group contains both hidden and non-hidden
+ symbols.  In this case we can privatize all hidden symbol but we need
+ to keep non-hidden exported.  */
+  if (node->same_comdat_group
+  && node->resolution == LDPR_PREVAILING_DEF_IRONLY)
+{
+  symtab_node *next;
+  for (next = node->same_comdat_group;
+  next != node; next = next->same_comdat_group)
+   if (next->resolution == LDPR_PREVAILING_DEF_IRONLY_EXP
+   || next->resolution == LDPR_PREVAILING_DEF)
+ break;
+  if (node != next)
+   {
+ if (!node->transparent_alias)
+   {
+ node->resolution = LDPR_PREVAILING_DEF_IRONLY;
+ node->make_decl_local ();
+ if (!flag_incremental_link)
+   node->unique_name |= true;
+ return;
+   }
+   }
+}
+  /* For similar reason do not privatize whole comdat when seeing comdat
+ local.  Wait for non-comdat symbol to be privatized first.  */
+  if (node->comdat_local_p ())
+return;
+
   if (node->same_comdat_group && TREE_PUBLIC (node->decl))
 {
   for (symtab_node *next = node->same_comdat_group;
@@ -765,7 +794,8 @@ function_and_variable_visibility (bool w
vnode->no_reorder = 1;
 
   if (!vnode->externally_visible
- && !vnode->transparent_alias)
+ && !vnode->transparent_alias
+ && !DECL_EXTERNAL (vnode->decl))
localize_node (whole_program, vnode);
 
   update_visibility_by_resolution_info (vnode);


Re: [PATCH][libgcc, fuchsia]

2017-04-13 Thread Kyrill Tkachov


On 12/04/17 19:02, Josh Conner via gcc-patches wrote:

Ping^3?

I think this should be very straightforward - it just adds fuchsia target 
support to libgcc. Please do let me know if there are any concerns...



The arm parts look ok to me. You'll still need approval for the other parts 
though.

Thanks,
Kyrill


Thanks!

- Josh


2017-04-12  Joshua Conner  

* config/arm/unwind-arm.h (_Unwind_decode_typeinfo_ptr): Use
pc-relative indirect handling for fuchsia.
* config/t-slibgcc-fuchsia: New file.
* config.host (*-*-fuchsia*, aarch64*-*-fuchsia*, arm*-*-fuchsia*,
x86_64-*-fuchsia*): Add definitions.


On 2/21/17 9:41 AM, Josh Conner wrote:

Ping^2?


On 2/2/17 11:22 AM, Josh Conner wrote:

Ping?


On 1/17/17 10:40 AM, Josh Conner wrote:

The attached patch adds fuchsia support to libgcc.

OK for trunk?

Thanks -

Josh

2017-01-17  Joshua Conner  

* config/arm/unwind-arm.h (_Unwind_decode_typeinfo_ptr): Use
pc-relative indirect handling for fuchsia.
* config/t-slibgcc-fuchsia: New file.
* config.host (*-*-fuchsia*, aarch64*-*-fuchsia*, arm*-*-fuchsia*,
x86_64-*-fuchsia*): Add definitions.











Re: [PATCH] Fix PR51513, switch statement with default case containing __builtin_unreachable leads to wild branch

2017-04-13 Thread Richard Biener
On Wed, 12 Apr 2017, Peter Bergner wrote:

> This patch is the second attempt to fix PR51513, namely the generation of
> wild branches due to switch case statements that only contain calls to
> __builtin_unreachable().  With the first attempt:
> 
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01915.html
> 
> richi said he preferred if we just eliminated the range check for
> default case statements that contained __builtin_unreachable().
> This patch implements that idea.  It also removes normal case
> statement blocks that are marked as unreachable, but in those cases,
> we just use a dummy label in the jump table for them.
> 
> This passed bootstrap and regtesting with no regressions on powerpc64-linux
> and x86_64-linux.  Ok for trunk now or trunk during stage1?

To recap the situation (from what I can deciper out of the ppc asm
and the expand RTL) we seem to expand to

  if (cond > 2)
__builtin_unreachable (); // jumps to the jump table data(?)
  goto *tbl[cond];

now I do not remember the reason why we keep __builtin_unreachable ()
at the RTL level -- on GIMPLE we keep it to be able to extract
range information from the controlling condition.

Your patch comments suggest that handling of gaps is similarly
improved but the testcase doesn't contain any gaps.

So I wonder why we don't simply apply the "transform" at the GIMPLE
level, for example in the kitchen-sink pass_fold_builtins.  That is
remove all __builtin_unreachable () labels and if the default label
is amongst them make the first one the new default (or maybe the
last one dependent on which would shrink jump table size most).

I'd apply the same to if conditions but as said I don't remember
why we keep __builtin_unreachable () -- we of course do have to
keep it if stmts preceeding it may have side-effects.

Maybe somebody can chime in on the uselfulness of keeping RTL
for this case?  For trivial cases like

int foo (int i)
{
  if (i < -__INT_MAX__)
__builtin_unreachable ();
  return i - 2;
}

VRP2 removes the if because we put the range computed by VRP1
on the SSA name used in the test (we have special code handling
unreachable paths there).  For your testcase with the switch
as cond_2 is not used after the switch we do not compute any
range for it, otherwise we'd likely eliminate the default
case as well.  For the small testcase above fold-all-builtins
handles the removal as well if VRP is not run, so extending
that to handle switch stmts sounds reasonable (see
optimize_unreachable).  There's even a TODO in this function.

Richard.

> Peter
> 
> 
> gcc/
>   * tree-cfg.c (gimple_unreachable_bb_p): New function.
>   (assert_unreachable_fallthru_edge_p): Use it.
>   * tree-cfg.h: Prototype it.
>   * stmt.c: Include cfghooks.h and tree-cfg.h.
>   (emit_case_dispatch_table) : New local variable.
>   Use it to fill dispatch table gaps and unreachable cases.
>   Remove edges to unreachable blocks.
>   (expand_case): Test for unreachable default case statement and
>   remove its edge.  Set default_label accordingly.
>   (emit_case_nodes): Only emit branch to default_label if it
>   exists.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/pr51513.c: New test.
> 
> Index: gcc/stmt.c
> ===
> --- gcc/stmt.c(revision 246661)
> +++ gcc/stmt.c(working copy)
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.
>  #include "rtl.h"
>  #include "tree.h"
>  #include "gimple.h"
> +#include "cfghooks.h"
>  #include "predict.h"
>  #include "alloc-pool.h"
>  #include "memmodel.h"
> @@ -49,6 +50,7 @@ along with GCC; see the file COPYING3.
>  #include "expr.h"
>  #include "langhooks.h"
>  #include "cfganal.h"
> +#include "tree-cfg.h"
>  #include "params.h"
>  #include "dumpfile.h"
>  #include "builtins.h"
> @@ -989,6 +991,14 @@ emit_case_dispatch_table (tree index_exp
>labelvec = XALLOCAVEC (rtx, ncases);
>memset (labelvec, 0, ncases * sizeof (rtx));
>  
> +  /* The dispatch table may contain gaps and labels of unreachable case
> + statements.  Gaps can exist at the beginning of the table if we tried
> + to avoid the minval subtraction.  We fill the dispatch table slots
> + associated with the gaps and unreachable case blocks with the default
> + case label.  However, in the event the default case itself is 
> unreachable,
> + we then use any label from one of the reachable case statements.  */
> +  rtx gap_label = (default_label) ? default_label : fallback_label;
> +
>for (n = case_list; n; n = n->right)
>  {
>/* Compute the low and high bounds relative to the minimum
> @@ -1000,42 +1010,51 @@ emit_case_dispatch_table (tree index_exp
>HOST_WIDE_INT i_high
>   = tree_to_uhwi (fold_build2 (MINUS_EXPR, index_type,
>n->high, minval));
> -  HOST_WIDE_INT i;
>  
> +  basic_block case_bb = label_to_block (n->code_label);
> +  rtx c

Re: [wwwdocs,fortran] Update link to CHKSYS

2017-04-13 Thread Arjen Markus
Hi Gerald, Jerry,

as the author of this program (and the maintainer of that site) I am
honoured that you refer to this. But I would like to tell you a bit
about the history:
I started CHKSYS in the time that FORTRAN 77 was still the main
standard, so much of the tests that are contained in this program are
specific to all the variations I had encountered with various FORTRAN
compilers. I did start a version for Fortran 90, but I was rather a
novice wrt that standard back then and I never really achieved a
satisfactory set of tests.

Some time last year I started anew with a set of programs - see the
chkfeatures subdirectory in the source repository. These programs
check whether a compiler supports features found in Fortran 2003 and
2008 and various common extensions that I am aware of. Since the
checks frequently cause the compilers to complain - and are even
designed for this - I have set this up as a collection of programs,
rather than a single one.

Checking the web pages, I see I should really update the original page
and integrate the new stuff with it. If you have any suggestions to
make the reference easier (access to the source code from that page?),
then let me know. I am no hero when it comes to web design, but a
better set-up of links and such is well within my capacity.

Regards,

Arjen


[PATCH][PR sanitizer/80414] Fix segfault with -fsanitize=undefined on 32 bit host

2017-04-13 Thread Denis Khalikov

Hello everyone.

I have patch to fix segfault with -fsanitize=undefined on 32 bit host.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80414

Can someone please review it.

Thanks.






commit 3bb53510ae11a9fa1f79ae83469c2650abe81ab4
Author: Denis Khalikov 
Date:   Thu Apr 13 12:03:19 2017 +0300

PR sanitizer/80414
* ubsan.c (ubsan_expand_bounds_ifn): Fix wrong tree val generation
for 32 bit host.
* c-c++-common/ubsan/bounds-15.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3154103..283dbd6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-04-13  Denis Khalikov 
+
+	PR sanitizer/80414
+	* ubsan.c (ubsan_expand_bounds_ifn): Fix wrong tree val generation
+	for 32 bit host.
+
 2017-04-12  Jan Hubicka  
 
 	PR lto/69953 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index b1594f2..fe55233 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-04-13  Denis Khalikov  
+
+	PR sanitizer/80414
+	* c-c++-common/ubsan/bounds-15.c: New test.
+
 2017-04-12  Jakub Jelinek  
 
 	PR tree-optimization/79390
diff --git a/gcc/testsuite/c-c++-common/ubsan/bounds-15.c b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
new file mode 100644
index 000..2af709a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
+
+int main()
+{
+  long long offset = 10;
+  char array[10];
+  char c = array[offset];
+  return 0;
+}
diff --git a/gcc/ubsan.c b/gcc/ubsan.c
index c01d633..936 100644
--- a/gcc/ubsan.c
+++ b/gcc/ubsan.c
@@ -672,7 +672,8 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
 
   /* Pick up the arguments of the UBSAN_BOUNDS call.  */
   tree type = TREE_TYPE (TREE_TYPE (gimple_call_arg (stmt, 0)));
-  tree index = gimple_call_arg (stmt, 1);
+  tree index, orig_index;
+  index = orig_index = gimple_call_arg (stmt, 1);
   tree orig_index_type = TREE_TYPE (index);
   tree bound = gimple_call_arg (stmt, 2);
 
@@ -708,9 +709,9 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
 	  ? BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS
 	  : BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT;
   tree fn = builtin_decl_explicit (bcode);
-  tree val = force_gimple_operand_gsi (gsi, ubsan_encode_value (index),
-	   true, NULL_TREE, true,
-	   GSI_SAME_STMT);
+  tree val
+	= force_gimple_operand_gsi (gsi, ubsan_encode_value (orig_index), true,
+NULL_TREE, true, GSI_SAME_STMT);
   g = gimple_build_call (fn, 2, data, val);
 }
   gimple_set_location (g, loc);


Re: [PATCH] Fix dwarf2out ICE with self-inlining (PR debug/80321)

2017-04-13 Thread Richard Biener
On Fri, Apr 7, 2017 at 11:42 PM, Jakub Jelinek  wrote:
> Hi!
>
> The following C and Ada testcases show ICE due to endless recursion in
> dwarf2out.c.  The problem is that when processing BLOCK_NONLOCALIZED_VARS,
> we want to treat all the FUNCTION_DECLs in there as mere declarations,
> but gen_subprogram_die does:
>   int declaration = (current_function_decl != decl
>  || class_or_namespace_scope_p (context_die));
> and thus if there is some self-inlining and we are unlucky enough
> not to reach some early-outs that just ignore the FUNCTION_DECL,
> like:
>   /* Detect and ignore this case, where we are trying to output
>  something we have already output.  */
>   if (get_AT (old_die, DW_AT_low_pc)
>   || get_AT (old_die, DW_AT_ranges))
> return;
> we will recurse infinitely.  The following patch fixes it by
> just ignoring current_function_decl seen from BLOCK_NONLOCALIZED_VARS,
> that implies it is already inlined into somewhere and in the abstract
> origin we emit properly a DW_AT_declaration decl if needed, that is pretty
> much what gen_subprogram_die would do anyway in such cases, because there
> is already old_die, we really don't want to make some child of it its parent
> and otherwise no further action is performed.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> Other possibilities include adding some global bool flag that
> gen_subprogram_die's int declaration = ... above should ignore
> decl == current_function_decl that we'd set in decls_for_scope when
> seeing current_function_decl in BLOCK_NONLOCALIZED_VARS (and probably
> save/clear + restore in dwarf2out_abstract_function where we change
> current_function_decl).  Or we could pass through from decls_for_scope
> down through process_scope_var, gen_decl_die to gen_subprogram_die
> a bool flag force_declaration.
>
> 2017-04-07  Jakub Jelinek  
>
> PR debug/80321
> * dwarf2out.c (decls_for_scope): Ignore declarations of
> current_function_decl in BLOCK_NONLOCALIZED_VARS.
>
> * gcc.dg/debug/pr80321.c: New test.
>
> 2017-04-07  Eric Botcazou  
>
> * gnat.dg/debug10.adb: New test.
> * gnat.dg/debug10_pkg.ads: New helper.
>
> --- gcc/dwarf2out.c.jj  2017-04-07 11:46:48.0 +0200
> +++ gcc/dwarf2out.c 2017-04-07 20:00:43.503772542 +0200
> @@ -24889,7 +24889,12 @@ decls_for_scope (tree stmt, dw_die_ref c
> for (i = 0; i < BLOCK_NUM_NONLOCALIZED_VARS (stmt); i++)
>   {
> decl = BLOCK_NONLOCALIZED_VAR (stmt, i);
> -   if (TREE_CODE (decl) == FUNCTION_DECL)
> +   if (decl == current_function_decl)
> + /* Ignore declarations of the current function, while they
> +are declarations, gen_subprogram_die would treat them
> +as definitions again, because they are equal to
> +current_function_decl and endlessly recurse.  */;
> +   else if (TREE_CODE (decl) == FUNCTION_DECL)
>   process_scope_var (stmt, decl, NULL_TREE, context_die);
> else
>   process_scope_var (stmt, NULL_TREE, decl, context_die);
> --- gcc/testsuite/gcc.dg/debug/pr80321.c.jj 2017-04-07 21:39:01.930615179 
> +0200
> +++ gcc/testsuite/gcc.dg/debug/pr80321.c2017-04-07 21:39:49.722982635 
> +0200
> @@ -0,0 +1,26 @@
> +/* PR debug/80321 */
> +/* { dg-do compile } */
> +/* { dg-options "-fkeep-inline-functions" } */
> +
> +void bar (void);
> +
> +static inline void
> +test (int x)
> +{
> +  inline void
> +  foo (int x)
> +  {
> +test (0);
> +asm volatile ("" : : : "memory");
> +  }
> +  if (x != 0)
> +foo (x);
> +  else
> +bar ();
> +}
> +
> +void
> +baz (int x)
> +{
> +  test (x);
> +}
> --- gcc/testsuite/gnat.dg/debug10.adb.jj2017-04-07 20:24:44.232473780 
> +0200
> +++ gcc/testsuite/gnat.dg/debug10.adb   2017-04-07 20:26:40.493980722 +0200
> @@ -0,0 +1,68 @@
> +-- PR debug/80321
> +
> +-- { dg-do compile }
> +-- { dg-options "-O2 -g" }
> +
> +with Debug10_Pkg; use Debug10_Pkg;
> +
> +procedure Debug10 (T : Entity_Id) is
> +
> +   procedure Inner (E : Entity_Id);
> +   pragma Inline (Inner);
> +
> +   procedure Inner (E : Entity_Id) is
> +   begin
> +  if E /= Empty
> + and then not Nodes (E + 3).Flag16
> +  then
> + Debug10 (E);
> +  end if;
> +   end Inner;
> +
> +   function Ekind (E : Entity_Id) return Entity_Kind is
> +   begin
> +  return N_To_E (Nodes (E + 1).Nkind);
> +   end Ekind;
> +
> +begin
> +
> +   if T = Empty then
> +  return;
> +   end if;
> +
> +   Nodes (T + 3).Flag16 := True;
> +
> +   if Ekind (T) in Object_Kind then
> +  Inner (T);
> +
> +   elsif Ekind (T) in Type_Kind then
> +  Inner (T);
> +
> +  if Ekind (T) in Record_Kind then
> +
> + if Ekind (T) = E_Class_Wide_Subtype then
> +Inner (T);
> + end if;
> +
> +  elsif Ekind (T) in Arr

[PATCH] Fix PR80416

2017-04-13 Thread Richard Biener

The following hopefully fixes asm constraints (works for ia64 now,
still works on x86_64).

Richard.

2017-04-13  Richard Biener  

PR testsuite/80416
* g++.dg/torture/pr79671.C: Fix asm constraints.

Index: gcc/testsuite/g++.dg/torture/pr79671.C
===
--- gcc/testsuite/g++.dg/torture/pr79671.C  (revision 246899)
+++ gcc/testsuite/g++.dg/torture/pr79671.C  (working copy)
@@ -13,7 +13,7 @@ int __attribute__((noinline)) foo()
   new (&x) B (0);
   y = x;
   B *q = reinterpret_cast (&y);
-  asm volatile ("" : "=r" (q) : "r" (q));
+  asm volatile ("" : "=r" (q) : "0" (q));
   return q->i;
 }
 


Re: [PATCH] Fix PR51513, switch statement with default case containing __builtin_unreachable leads to wild branch

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 10:14:51AM +0200, Richard Biener wrote:
> now I do not remember the reason why we keep __builtin_unreachable ()
> at the RTL level -- on GIMPLE we keep it to be able to extract

I believe we don't.  In RTL __builtin_unreachable () is represented
as a basic block without successors, followed by a BARRIER.

Jakub


[patch, gomp4, committed] Adjust copy/copyin/copyout/create for OpenACC 2.5

2017-04-13 Thread Chung-Lin Tang
The behavior of the copy/copyin/copyout/create clauses has been changed
in OpenACC 2.5 to be like the present_or_* variants, and the original
present_or_* syntax relegated to legacy status.

This patch removes the presence of any PRESENT_OR_* symbols, and
changes the mapping of the copy/copyin/copyout/create clauses to map
kinds without the FORCE flag. Library routines acc_[present_or_]copy, etc.
has also been updated.

This patch has been applied to the gomp-4_0-branch.

Chung-Lin

2017-04-13  Chung-Lin Tang  

gcc/
* gimplify.c (gimplify_oacc_declare_1): Remove GOMP_MAP_FORCE_* cases.

gcc/c-family/
* c-pragma.h (enum pragma_omp_clauses):
Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY/COPYIN/COPYOUT/CREATE.

gcc/c/
* c-parser.c (c_parser_omp_clause_name): Remove occurences of
PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY/COPYIN/COPYOUT/CREATE, adjust
them to non-PRESENT_OR values.
(c_parser_oacc_data_clause): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*
cases, remove FORCE from PRAGMA_OACC_CLAUSE_COPY/COPYIN/COPYOUT/CREATE
kinds, update comment description.
(c_parser_oacc_all_clauses): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*
cases.
(OACC_DATA_CLAUSE_MASK): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*.
(OACC_DECLARE_CLAUSE_MASK): Likewise.
(c_parser_oacc_declare): Remove GOMP_MAP_FORCE_ALLOC/TO, change to
COMP_MAP_ALLOC/TO.
(OACC_ENTER_DATA_CLAUSE_MASK): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*.
(OACC_KERNELS_CLAUSE_MASK): Likewise.
(OACC_PARALLEL_CLAUSE_MASK): Likewise.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Remove occurences of
PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY/COPYIN/COPYOUT/CREATE, adjust
them to non-PRESENT_OR values.
(cp_parser_oacc_data_clause): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*
cases, remove FORCE from PRAGMA_OACC_CLAUSE_COPY/COPYIN/COPYOUT/CREATE
kinds, update comment description.
(cp_parser_oacc_all_clauses): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*
cases.
(OACC_DATA_CLAUSE_MASK): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*.
(OACC_DECLARE_CLAUSE_MASK): Likewise.
(cp_parser_oacc_declare): Remove GOMP_MAP_FORCE_ALLOC/TO, change to
COMP_MAP_ALLOC/TO.
(OACC_ENTER_DATA_CLAUSE_MASK): Remove PRAGMA_OACC_CLAUSE_PRESENT_OR_*.
(OACC_KERNELS_CLAUSE_MASK): Likewise.
(OACC_PARALLEL_CLAUSE_MASK): Likewise.

gcc/fortran/
* openmp.c (enum omp_mask2):
Remove OMP_CLAUSE_PRESENT_OR_COPY/COPYIN/COPYOUT/CREATE.
(gfc_match_omp_clauses): Remove FORCE flag from OpenACC OMP_MAP_* cases,
adjust and remove PRESENT_OR_ values.
(OACC_PARALLEL_CLAUSES): Remove OMP_CLAUSE_PRESENT_OR_*.
(OACC_KERNELS_CLAUSES): Likewise.
(OACC_DATA_CLAUSES): Likewise.
(OACC_DECLARE_CLAUSES): Likewise.
(OACC_ENTER_DATA_CLAUSES): Likewise.

gcc/testsuite/
* gfortran.dg/goacc/data-tree.f95: Remove "force_" from dump scan.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
* gfortran.dg/goacc/data-tree.f95: Likewise.
* gfortran.dg/goacc/reduction-promotions.f90: Likewise.
* gfortran.dg/goacc/combined-directives.f90: Likewise.
* gfortran.dg/goacc/default-4.f: Likewise.
* gfortran.dg/declare-2.f95: Adjust error scan.
* c-c++-common/goacc/acc-data-chain.c: Remove "force_" from dump scan.
* c-c++-common/goacc/default-4.c: Likewise.
* c-c++-common/goacc/declare-2.c: Move present_or_copyin/create tests
from here to...
* c-c++-common/goacc/declare-1.c: ...here.

   libgomp/
* oacc-mem.c (acc_create): Add FLAG_PRESENT to call of
present_create_copy.
(acc_create_async): Likewise.
(acc_copyin): Likewise.
(acc_copyin_async): Likewise.
(acc_present_or_create): Remove definition and change to alias of
acc_create.
(acc_present_or_copyin): Remove definition and change to alias of
acc_copyin.
* oacc-parallel.c (GOACC_enter_exit_data): Add GOMP_MAP_FROM as
handled map kind.
* testsuite/libgomp.oacc-c-c++-common/data-already-1.c: Delete.
* testsuite/libgomp.oacc-c-c++-common/data-already-2.c: Delete.
* testsuite/libgomp.oacc-c-c++-common/data-already-3.c: Delete.
* testsuite/libgomp.oacc-c-c++-common/data-already-4.c: Delete.
* testsuite/libgomp.oacc-c-c++-common/data-already-5.c: Delete.
* testsuite/libgomp.oacc-c-c++-common/data-already-6.c: Delete.
* testsuite/libgomp.oacc-c-c++-common/data-already-7.c: Delete.
* testsuite/libgomp.oacc-c-c++-common/data-already-8.c: Delete.
* testsuite/libgomp.oacc-fortran/data-already-1.f: Delete.
* testsuite/libgomp.oacc-fortran/data-already-2.f: Delete.
* testsuite/libgomp.oacc-fortran/data-already-3.f: 

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Wilco Dijkstra
>On Wed, Apr 12, 2017 at 09:29:55AM +, Sudi Das wrote:
> > Hi all
> > 
> > This is a fix for PR 80131 
> > Currently the code A << (B - C) is not simplified.
>> However at least a more specific case of 1U << (C -x) where C = 
>> precision(type) - 1 can be simplified to (1 << C) >> x.
>
> Is that always a win though?

Yes assuming left and right shift have the same performance.

> Some constants have higher costs than others on various targets, some
> significantly higher.  This change might be beneficial only
> if if C is as expensive as 1, then you get rid of a one (typically cheap)
> operation.

Most targets can create the constant cheaply. Targets that can't would need 2
instructions (move+shift) or a literal load. That's not worse compared to the
original sequence (3 operations). The constant can be shared or lifted out of
a loop, so we're saving 1 subtract per use of the sequence.

> Which makes me wonder whether this should be done at GIMPLE time and not
> at RTL time (expansion or combine etc.) when one can compare the RTX costs.
> Or do this at match.pd as canonicalization and then have RTL transformation
> to rewrite such (1U << C) >> X as 1U << (C - X) if the latter is faster (or
> shorter).

I can't see why that would be useful for just this pattern. There are lots more 
useful
cases where GCC should simplify immediates to fit the target but it doesn't 
currently.
For example it changes x < 0x10 to x <= 0xf which is worse on many 
targets.

Wilco

[PATCH] Do not call memcpy with a NULL argument (PR gcov-profile/80413).

2017-04-13 Thread Martin Liška
Hello.

As the function is called with argument equal to zero, the memcpy should be
done just in case alloc is greater than zero.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
And gcov.exp works on x86_64-linux-gnu.

Ready to be installed?
Martin
>From 82f609465a9702a0c4ce0c4dbe9c3bd744a3ea34 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 13 Apr 2017 10:35:29 +0200
Subject: [PATCH] Do not call memcpy with a NULL argument (PR
 gcov-profile/80413).

gcc/ChangeLog:

2017-04-13  Martin Liska  

	PR gcov-profile/80413
	* gcov-io.c (gcov_write_string): Copy to buffer just when
	allocated size is greater than zero.
---
 gcc/gcov-io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c
index 3b6b022d143..18e0f7f4b1b 100644
--- a/gcc/gcov-io.c
+++ b/gcc/gcov-io.c
@@ -347,8 +347,12 @@ gcov_write_string (const char *string)
   buffer = gcov_write_words (1 + alloc);
 
   buffer[0] = alloc;
-  buffer[alloc] = 0;
-  memcpy (&buffer[1], string, length);
+
+  if (alloc > 0)
+{
+  buffer[alloc] = 0;
+  memcpy (&buffer[1], string, length);
+}
 }
 #endif
 
-- 
2.12.2



Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 11:16:08AM +, Wilco Dijkstra wrote:
> >On Wed, Apr 12, 2017 at 09:29:55AM +, Sudi Das wrote:
> > > Hi all
> > > 
> > > This is a fix for PR 80131 
> > > Currently the code A << (B - C) is not simplified.
> >> However at least a more specific case of 1U << (C -x) where C = 
> >> precision(type) - 1 can be simplified to (1 << C) >> x.
> >
> > Is that always a win though?
> 
> Yes assuming left and right shift have the same performance.
> 
> > Some constants have higher costs than others on various targets, some
> > significantly higher.  This change might be beneficial only
> > if if C is as expensive as 1, then you get rid of a one (typically cheap)
> > operation.
> 
> Most targets can create the constant cheaply. Targets that can't would need 2

No.  Some constants sometimes even 7 instructions (e.g. sparc64; not talking
in particular about 1ULL << 63 constant), or have one instruction
that is more expensive than normal small constant load.  Compare say x86_64
movl/movq vs. movabsq, I think the latter has 3 times longer latency on many
CPUs.  So no, I think it isn't an unconditional win.

Jakub


Re: [PATCH] Do not call memcpy with a NULL argument (PR gcov-profile/80413).

2017-04-13 Thread Nathan Sidwell

On 04/13/2017 07:19 AM, Martin Liška wrote:

Hello.

As the function is called with argument equal to zero, the memcpy should be
done just in case alloc is greater than zero.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
And gcov.exp works on x86_64-linux-gnu.


the patch is ok, but it tripped me up as the diff snipped out how alloc 
was calculated, and I'd forgotten what the buffer[alloc] = 0 bit was for.


If you could amend to add a comment, it's good to go.

something like:

+  buffer[alloc] = 0; /* place nul terminators.  */

nathan

--
Nathan Sidwell


Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Wilco Dijkstra
Jakub Jelinek wrote:
    
> No.  Some constants sometimes even 7 instructions (e.g. sparc64; not talking
> in particular about 1ULL << 63 constant), or have one instruction
> that is more expensive than normal small constant load.  Compare say x86_64
> movl/movq vs. movabsq, I think the latter has 3 times longer latency on many
> CPUs.  So no, I think it isn't an unconditional win.

We're specifically only talking about the constants (1L << 63), (1 << 31) and 
(1 << 15).
On all targets these need at most 2 simple instructions. That makes it an 
unconditional win.

Wilco

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 11:33:12AM +, Wilco Dijkstra wrote:
> Jakub Jelinek wrote:
>     
> > No.  Some constants sometimes even 7 instructions (e.g. sparc64; not talking
> > in particular about 1ULL << 63 constant), or have one instruction
> > that is more expensive than normal small constant load.  Compare say x86_64
> > movl/movq vs. movabsq, I think the latter has 3 times longer latency on many
> > CPUs.  So no, I think it isn't an unconditional win.
> 
> We're specifically only talking about the constants (1L << 63), (1 << 31) and 
> (1 << 15).
> On all targets these need at most 2 simple instructions. That makes it an 
> unconditional win.

It is not a win on at least Haswell-E:
__attribute__((noinline, noclone)) unsigned long long int
foo (int x)
{
  asm volatile ("" : : : "memory");
  return 1ULL << (63 - x);
}

__attribute__((noinline, noclone)) unsigned long long int
bar (int x)
{
  asm volatile ("" : : : "memory");
  return (1ULL << 63) >> x;
}

int
main (int argc, const char **argv)
{
  int i;
  if (argc == 1)
for (i = 0; i < 10; i++)
  asm volatile ("" : : "r" (foo (13)));
  else
for (i = 0; i < 10; i++)
  asm volatile ("" : : "r" (bar (13)));
  return 0;
}

$ time /tmp/test

real0m1.290s
user0m1.288s
sys 0m0.002s
$ time /tmp/test 1

real0m1.542s
user0m1.540s
sys 0m0.002s

As I said, movabsq is expensive.

Jakub


Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Richard Biener
On Thu, Apr 13, 2017 at 1:41 PM, Jakub Jelinek  wrote:
> On Thu, Apr 13, 2017 at 11:33:12AM +, Wilco Dijkstra wrote:
>> Jakub Jelinek wrote:
>>
>> > No.  Some constants sometimes even 7 instructions (e.g. sparc64; not 
>> > talking
>> > in particular about 1ULL << 63 constant), or have one instruction
>> > that is more expensive than normal small constant load.  Compare say x86_64
>> > movl/movq vs. movabsq, I think the latter has 3 times longer latency on 
>> > many
>> > CPUs.  So no, I think it isn't an unconditional win.
>>
>> We're specifically only talking about the constants (1L << 63), (1 << 31) 
>> and (1 << 15).
>> On all targets these need at most 2 simple instructions. That makes it an 
>> unconditional win.
>
> It is not a win on at least Haswell-E:
> __attribute__((noinline, noclone)) unsigned long long int
> foo (int x)
> {
>   asm volatile ("" : : : "memory");
>   return 1ULL << (63 - x);
> }
>
> __attribute__((noinline, noclone)) unsigned long long int
> bar (int x)
> {
>   asm volatile ("" : : : "memory");
>   return (1ULL << 63) >> x;
> }
>
> int
> main (int argc, const char **argv)
> {
>   int i;
>   if (argc == 1)
> for (i = 0; i < 10; i++)
>   asm volatile ("" : : "r" (foo (13)));
>   else
> for (i = 0; i < 10; i++)
>   asm volatile ("" : : "r" (bar (13)));
>   return 0;
> }
>
> $ time /tmp/test
>
> real0m1.290s
> user0m1.288s
> sys 0m0.002s
> $ time /tmp/test 1
>
> real0m1.542s
> user0m1.540s
> sys 0m0.002s
>
> As I said, movabsq is expensive.

It is IMHO a valid GIMPLE optimization / canonicalization.

movabsq $-9223372036854775808, %rax

so this should then have been generated as 1<<63?

At some point variable shifts were quite expensive as well..

Richard.

> Jakub


Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 01:49:01PM +0200, Richard Biener wrote:
> It is IMHO a valid GIMPLE optimization / canonicalization.

As I said, we can do it as GIMPLE canonicalization, but
we should have code to undo it if beneficial at RTL level.
And the patch has not included that.
> 
> movabsq $-9223372036854775808, %rax
> 
> so this should then have been generated as 1<<63?

Maybe.  But it seems to be still more expensive than the original code,
though better than movabsq:

__attribute__((noinline, noclone)) unsigned long long int
foo (int x)
{
  asm volatile ("" : : : "memory");
  return 1ULL << (63 - x);
}

__attribute__((noinline, noclone)) unsigned long long int
bar (int x)
{
  asm volatile ("" : : : "memory");
  return (1ULL << 63) >> x;
}

__attribute__((noinline, noclone)) unsigned long long int
baz (int x)
{
  unsigned long long int y = 1;
  asm volatile ("" : "+r" (y) : : "memory");
  return (y << 63) >> x;
}

int
main (int argc, const char **argv)
{
  int i;
  if (argc == 1)
for (i = 0; i < 10; i++)
  asm volatile ("" : : "r" (foo (13)));
  else if (argc == 2)
for (i = 0; i < 10; i++)
  asm volatile ("" : : "r" (bar (13)));
  else if (argc == 3)
for (i = 0; i < 10; i++)
  asm volatile ("" : : "r" (baz (13)));
  return 0;
}

Jakub


Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Wilco Dijkstra
Richard Biener wrote:
> It is IMHO a valid GIMPLE optimization / canonicalization.
>
>    movabsq $-9223372036854775808, %rax
>
> so this should then have been generated as 1<<63?
>
> At some point variable shifts were quite expensive as well..

Yes I don't see a major difference between movabsq and

movl$1, %eax
salq$63, %rax

on my Sandy Bridge, but if the above is faster then that is what the x64
backend should emit - it's 1 byte smaller as well, so probably better in all
cases.

Wilco

Re: [PATCH][PR sanitizer/80414] Fix segfault with -fsanitize=undefined on 32 bit host

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 12:28:40PM +0300, Denis Khalikov wrote:
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
> @@ -0,0 +1,11 @@
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=bounds" } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
> +
> +int main()
> +{
> +  long long offset = 10;
> +  char array[10];
> +  char c = array[offset];
> +  return 0;
> +}

I would expect you want to dg-output here the runtime diagnostics,
at least some part of it, to make it clear the testcase is UB and
to test whether the UB is detected.

> diff --git a/gcc/ubsan.c b/gcc/ubsan.c
> index c01d633..936 100644
> --- a/gcc/ubsan.c
> +++ b/gcc/ubsan.c
> @@ -672,7 +672,8 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
>  
>/* Pick up the arguments of the UBSAN_BOUNDS call.  */
>tree type = TREE_TYPE (TREE_TYPE (gimple_call_arg (stmt, 0)));
> -  tree index = gimple_call_arg (stmt, 1);
> +  tree index, orig_index;
> +  index = orig_index = gimple_call_arg (stmt, 1);
>tree orig_index_type = TREE_TYPE (index);

Instead of this I'd suggest:
   tree index = gimple_call_arg (stmt, 1);
-  tree orig_index_type = TREE_TYPE (index);
+  tree orig_index = index;

>tree bound = gimple_call_arg (stmt, 2);
>  
> @@ -708,9 +709,9 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
> ? BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS
> : BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT;
>tree fn = builtin_decl_explicit (bcode);
> -  tree val = force_gimple_operand_gsi (gsi, ubsan_encode_value (index),
> -true, NULL_TREE, true,
> -GSI_SAME_STMT);
> +  tree val
> + = force_gimple_operand_gsi (gsi, ubsan_encode_value (orig_index), true,
> + NULL_TREE, true, GSI_SAME_STMT);
>g = gimple_build_call (fn, 2, data, val);
>  }
>gimple_set_location (g, loc);

and replace orig_index_type use with TREE_TYPE (orig_index)

Jakub


Re: RFC: seeking insight on store_data_bypass_p (recog.c)

2017-04-13 Thread Pat Haugen
On 04/12/2017 06:33 PM, Kelvin Nilsen wrote:
> 
> 1. As input arguments, out_insn represents an rtl expression that
> potentially "produces" a store to memory and in_insn represents an rtl
> expression that potentially "consumes" a value recently stored to memory.
> 
You have this reversed, the code is trying to find situations where
out_insn is producing a value that in_insn will be storing to memory.

> 2. If the memory store produced matches the memory fetch consumed, this
> function returns true to indicate that this sequence of two instructions
> qualifies for a special "bypass" latency that represents the fact that
> the fetch will obtain the value out of the write buffer.  So, whereas
> the instruction scheduler might normally expect that this sequence of
> two instructions would experience Load-Hit-Store penalties associated
> with cache coherency hardware costs, since these two instruction qualify
> for the store_data_bypass optimization, the instruction scheduler counts
> the latency as only 1 or 2 cycles (potentially).  [This is what I
> understand, but I may be wrong, so please correct me if so.]
> 
In general, yes, if the function returns true then the sequence has been
identified and the target will take appropriate action (adjusting
latency or whatever).


As for the remainder below dealing with PARALLELs, I don't have any
history on that so hopefully others can chime in. For the rs6000 port, I
don't see the handling of multiple SET operations making much sense, but
then again I don't know if it will actually occur either based on the
places where store_data_bypass_p is used.

-Pat

> 3. Actually, what I described above is only the "simple" case.  It may
> be that the rtl for either out_insn or in_insn is really a parallel
> clause with multiple rtl trees beneath it.  In this case, we compare the
> subtrees in a "similar" way to see if the compound expressions qualify
> for the store_data_bypass_p "optimization".  (I've got some questions
> about how this is done below)  As currently implemented, special
> handling is given to a CLOBBER subtree as part of either PARALLEL
> expression: we ignore it.  This is because CLOBBER does not represent
> any real machine instructions.  It just represents semantic information
> that might be used by the compiler.
> 
> In addition to seeking confirmation of my existing understanding of the
> code as outlined above, the specific questions that I am seeking help
> with are:
> 
> 1. In the current implementation (as I understand it), near the top of
> the function body, we handle the case that the consumer (in_insn) rtl is
> a single SET expression and the producer (out_insn) rtl is a PARALLEL
> expression containing multiple sets.  The way I read this code, we are
> requiring that every one of the producer's parallel SET instructions
> produce the same value that is to be consumed in order to qualify this
> sequence as a "store data bypass".  That seems wrong to me.  I would
> expect that we only need "one" of the produced values to match the
> consumed value in order to qualify for the "store data bypass"
> optimization.  Please explain.  (The same confusing behavior happens
> below in the same function, in the case that the consumer rtl is a
> PARALLEL expression of multiple SETs: we require that every producer's
> stored value match every consumer's fetched value.)
> 
> 2. A "bigger" concern is that any time any SETs are buried within a
> PARALLEL tree, I'm not sure the answer produced by this function, as
> currently implemented, is at all reliable:
> 
>  a) PARALLEL does not necessarily mean all of its subtrees happen in
> parallel on hardware.  It just means that there is no sequencing imposed
> by the source code, so the final order in which the multiple subtrees
> beneath the PARALLEL node is not known at this stage of compilation.
> 
>  b) It seems to me that it doesn't really make sense to speak of whether
> a whole bunch of producers combined with a whole bunch of consumers
> qualify for an optimized store data bypass latency.  If we say that they
> do qualify (as a group), which pair(s) of producer and consumer machine
> instructions qualify?  It seems we need to know which producer matches
> with which consumer in order to know where the bypass latencies "fit"
> into the schedule.
> 
>  c) Furthermore, if it turns out that the "arbitrary" order in which the
> producer instructions and consumer instructions are emitted places too
> much "distance" between a producer and the matching consumer, then it is
> possible that by the time the hardware executes the consumer, the stored
> value is no longer in the write buffer, so even though we might have
> "thought" two PARALLEL rtl expressions qualified for the store bypass
> optimization, we really should have returned false.
> 
> Can someone help me understand this better?
> 
> Thanks much.
> 
> 



Re: [PATCH][PR sanitizer/80414] Fix segfault with -fsanitize=undefined on 32 bit host

2017-04-13 Thread Denis Khalikov

Thanks for review.

I updated the patch.


On 04/13/2017 04:10 PM, Jakub Jelinek wrote:

On Thu, Apr 13, 2017 at 12:28:40PM +0300, Denis Khalikov wrote:

--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
@@ -0,0 +1,11 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
+
+int main()
+{
+  long long offset = 10;
+  char array[10];
+  char c = array[offset];
+  return 0;
+}


I would expect you want to dg-output here the runtime diagnostics,
at least some part of it, to make it clear the testcase is UB and
to test whether the UB is detected.


diff --git a/gcc/ubsan.c b/gcc/ubsan.c
index c01d633..936 100644
--- a/gcc/ubsan.c
+++ b/gcc/ubsan.c
@@ -672,7 +672,8 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)

   /* Pick up the arguments of the UBSAN_BOUNDS call.  */
   tree type = TREE_TYPE (TREE_TYPE (gimple_call_arg (stmt, 0)));
-  tree index = gimple_call_arg (stmt, 1);
+  tree index, orig_index;
+  index = orig_index = gimple_call_arg (stmt, 1);
   tree orig_index_type = TREE_TYPE (index);


Instead of this I'd suggest:
   tree index = gimple_call_arg (stmt, 1);
-  tree orig_index_type = TREE_TYPE (index);
+  tree orig_index = index;


   tree bound = gimple_call_arg (stmt, 2);

@@ -708,9 +709,9 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
  ? BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS
  : BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT;
   tree fn = builtin_decl_explicit (bcode);
-  tree val = force_gimple_operand_gsi (gsi, ubsan_encode_value (index),
-  true, NULL_TREE, true,
-  GSI_SAME_STMT);
+  tree val
+   = force_gimple_operand_gsi (gsi, ubsan_encode_value (orig_index), true,
+   NULL_TREE, true, GSI_SAME_STMT);
   g = gimple_build_call (fn, 2, data, val);
 }
   gimple_set_location (g, loc);


and replace orig_index_type use with TREE_TYPE (orig_index)

Jakub



commit 5267088b655febb8dd9b675e5da7263ada41ead4
Author: Denis Khalikov 
Date:   Thu Apr 13 12:03:19 2017 +0300

PR sanitizer/80414
* ubsan.c (ubsan_expand_bounds_ifn): Fix wrong tree val generation
for 32 bit host.
* c-c++-common/ubsan/bounds-15.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3154103..283dbd6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-04-13  Denis Khalikov 
+
+	PR sanitizer/80414
+	* ubsan.c (ubsan_expand_bounds_ifn): Fix wrong tree val generation
+	for 32 bit host.
+
 2017-04-12  Jan Hubicka  
 
 	PR lto/69953 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index b1594f2..fe55233 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-04-13  Denis Khalikov  
+
+	PR sanitizer/80414
+	* c-c++-common/ubsan/bounds-15.c: New test.
+
 2017-04-12  Jakub Jelinek  
 
 	PR tree-optimization/79390
diff --git a/gcc/testsuite/c-c++-common/ubsan/bounds-15.c b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
new file mode 100644
index 000..d62f5d5
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
@@ -0,0 +1,13 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
+
+int main()
+{
+  long long offset = 10;
+  char array[10];
+  char c = array[offset];
+  return 0;
+}
+
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'char \\\[10\\\]'" } */
diff --git a/gcc/ubsan.c b/gcc/ubsan.c
index c01d633..4159cc5 100644
--- a/gcc/ubsan.c
+++ b/gcc/ubsan.c
@@ -673,7 +673,7 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
   /* Pick up the arguments of the UBSAN_BOUNDS call.  */
   tree type = TREE_TYPE (TREE_TYPE (gimple_call_arg (stmt, 0)));
   tree index = gimple_call_arg (stmt, 1);
-  tree orig_index_type = TREE_TYPE (index);
+  tree orig_index = index;
   tree bound = gimple_call_arg (stmt, 2);
 
   gimple_stmt_iterator gsi_orig = *gsi;
@@ -700,7 +700,7 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
   tree data
 	= ubsan_create_data ("__ubsan_out_of_bounds_data", 1, &loc,
 			 ubsan_type_descriptor (type, UBSAN_PRINT_ARRAY),
-			 ubsan_type_descriptor (orig_index_type),
+			 ubsan_type_descriptor (TREE_TYPE (orig_index)),
 			 NULL_TREE, NULL_TREE);
   data = build_fold_addr_expr_loc (loc, data);
   enum built_in_function bcode
@@ -708,9 +708,9 @@ ubsan_expand_bounds_ifn (gimple_stmt_iterator *gsi)
 	  ? BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS
 	  : BUILT_IN_UBSAN_HANDLE_OUT_OF_BOUNDS_ABORT;
   tree fn = builtin_decl_explicit (bcode);
-  tree val = force_gimple_operand_gsi (gsi, ubsan_encode_value (index),
-	   true, NULL_TREE, true,
-	   GSI_SAME_STMT);
+  tree val
+	= force_gimple_operand_gsi (gsi, ubsan_encode_value (orig_index), true,
+NULL_TREE, true, GSI_SAME_STMT);
   g = gimple_build_call (fn, 2, data, val);
 }
  

Re: [PATCH][PR sanitizer/80414] Fix segfault with -fsanitize=undefined on 32 bit host

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 04:55:52PM +0300, Denis Khalikov wrote:
> +2017-04-13  Denis Khalikov 
> +
> + PR sanitizer/80414
> + * ubsan.c (ubsan_expand_bounds_ifn): Fix wrong tree val generation
> + for 32 bit host.

I'd say here instead ...): Pass original index to ubsan_encode_value.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
> @@ -0,0 +1,13 @@
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=bounds" } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
> +
> +int main()
> +{
> +  long long offset = 10;
> +  char array[10];
> +  char c = array[offset];
> +  return 0;
> +}
> +
> +/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'char \\\[10\\\]'" 
> } */

And the initial \[^\n\r]* makes no sense, there is no ^, so the regexp tries
to match anywhere in the output (which is sufficient here).  Other tests
use /* { dg-output "\[^\n\r]*something..." } */, but only in the second
and further dg-output directive (the regexps from all dg-output are
concatenated together).  So just use
/* { dg-output "index 10 out of bounds for type 'char \\\[10\\\]'" } */

Ok for trunk with those changes, thanks.

Jakub


[PATCH] Fix subreg in debug_insn handling issue exposed on mn103

2017-04-13 Thread Jeff Law


The mn103 port fails to build newlib/libgcc because it's got a 
non-simplifyable (subreg (mem)) in a debug insn.


reload will call eliminate_regs_1 on the debug insn.  It'll see the 
subreg and call gen_rtx_SUBREG.  That asserts that the subreg is valid. 
Which (of course) fails.


The key here is that validity of a subreg expression varies depending on 
whether or not it appears in a DEBUG_INSN or elsewhere.



The fix is to have reload use gen_rtx_raw_SUBREG for subregs appearing 
in DEBUG_INSNs.  Doing so allows the mn103 port to build libgcc and 
newlib successfully.  LRA may also be affected (by inspection), but I 
haven't tried to fix it without a way to trigger the issue.


This has been further tested by building most of the *-elf, *-gnu and 
*-rtems targets through to newlib, glibc and newlib respectively.  The 
exceptions that fail do so for completely unrelated reasons.


While it shouldn't trigger anything, I also did a bootstrap & regression 
test cycle on x86_64-linux-gnu for sanity's sake.


This is a regression -- mn103 has certainly built libgcc/newlib in the 
past.  Installing on the trunk,


Jeff


Re: [PATCH] Fix subreg in debug_insn handling issue exposed on mn103

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 08:03:48AM -0600, Jeff Law wrote:
> 
> The mn103 port fails to build newlib/libgcc because it's got a
> non-simplifyable (subreg (mem)) in a debug insn.
> 
> reload will call eliminate_regs_1 on the debug insn.  It'll see the subreg
> and call gen_rtx_SUBREG.  That asserts that the subreg is valid. Which (of
> course) fails.
> 
> The key here is that validity of a subreg expression varies depending on
> whether or not it appears in a DEBUG_INSN or elsewhere.
> 
> 
> The fix is to have reload use gen_rtx_raw_SUBREG for subregs appearing in
> DEBUG_INSNs.  Doing so allows the mn103 port to build libgcc and newlib
> successfully.  LRA may also be affected (by inspection), but I haven't tried
> to fix it without a way to trigger the issue.
> 
> This has been further tested by building most of the *-elf, *-gnu and
> *-rtems targets through to newlib, glibc and newlib respectively.  The
> exceptions that fail do so for completely unrelated reasons.
> 
> While it shouldn't trigger anything, I also did a bootstrap & regression
> test cycle on x86_64-linux-gnu for sanity's sake.
> 
> This is a regression -- mn103 has certainly built libgcc/newlib in the past.
> Installing on the trunk,

ENOPATCH.  But what you describe looks reasonable.

Jakub


Re: [PATCH] Fix subreg in debug_insn handling issue exposed on mn103

2017-04-13 Thread Jeff Law

On 04/13/2017 08:06 AM, Jakub Jelinek wrote:

On Thu, Apr 13, 2017 at 08:03:48AM -0600, Jeff Law wrote:


The mn103 port fails to build newlib/libgcc because it's got a
non-simplifyable (subreg (mem)) in a debug insn.

reload will call eliminate_regs_1 on the debug insn.  It'll see the subreg
and call gen_rtx_SUBREG.  That asserts that the subreg is valid. Which (of
course) fails.

The key here is that validity of a subreg expression varies depending on
whether or not it appears in a DEBUG_INSN or elsewhere.


The fix is to have reload use gen_rtx_raw_SUBREG for subregs appearing in
DEBUG_INSNs.  Doing so allows the mn103 port to build libgcc and newlib
successfully.  LRA may also be affected (by inspection), but I haven't tried
to fix it without a way to trigger the issue.

This has been further tested by building most of the *-elf, *-gnu and
*-rtems targets through to newlib, glibc and newlib respectively.  The
exceptions that fail do so for completely unrelated reasons.

While it shouldn't trigger anything, I also did a bootstrap & regression
test cycle on x86_64-linux-gnu for sanity's sake.

This is a regression -- mn103 has certainly built libgcc/newlib in the past.
Installing on the trunk,


ENOPATCH.  But what you describe looks reasonable.

Bah.  Attached.


Jeff
commit e716863933c0cb1db0ba63d014ea989d8ff45c40
Author: law 
Date:   Thu Apr 13 14:02:33 2017 +

* reload1.c (eliminate_regs_1): Call gen_rtx_raw_SUBREG for SUBREGs
appearing in DEBUG_INSNs.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@246904 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 740ca66..caec440 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-04-13  Jeff Law  
+
+   * reload1.c (eliminate_regs_1): Call gen_rtx_raw_SUBREG for SUBREGs
+   appearing in DEBUG_INSNs.
+
 2017-04-13  Martin Liska  
 
PR gcov-profile/80413
diff --git a/gcc/reload1.c b/gcc/reload1.c
index c1ce7ca..4dc118e 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -2831,6 +2831,8 @@ eliminate_regs_1 (rtx x, machine_mode mem_mode, rtx insn,
  || x_size == new_size)
  )
return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
+ else if (insn && GET_CODE (insn) == DEBUG_INSN)
+   return gen_rtx_raw_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
  else
return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
}


Re: [PATCH][PR sanitizer/80414] Fix segfault with -fsanitize=undefined on 32 bit host

2017-04-13 Thread Maxim Ostapenko

Hi,

I've applied the patch for Denis in r246909. Also, the issue seems to 
appear at least in GCC 6 (and perhaps in 5, I need to check).

Is it OK to apply the patch on branches after testing?

-Maxim

On 13/04/17 17:03, Jakub Jelinek wrote:

On Thu, Apr 13, 2017 at 04:55:52PM +0300, Denis Khalikov wrote:

+2017-04-13  Denis Khalikov 
+
+   PR sanitizer/80414
+   * ubsan.c (ubsan_expand_bounds_ifn): Fix wrong tree val generation
+   for 32 bit host.

I'd say here instead ...): Pass original index to ubsan_encode_value.


--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-15.c
@@ -0,0 +1,13 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O0" } } */
+
+int main()
+{
+  long long offset = 10;
+  char array[10];
+  char c = array[offset];
+  return 0;
+}
+
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'char \\\[10\\\]'" } 
*/

And the initial \[^\n\r]* makes no sense, there is no ^, so the regexp tries
to match anywhere in the output (which is sufficient here).  Other tests
use /* { dg-output "\[^\n\r]*something..." } */, but only in the second
and further dg-output directive (the regexps from all dg-output are
concatenated together).  So just use
/* { dg-output "index 10 out of bounds for type 'char \\\[10\\\]'" } */

Ok for trunk with those changes, thanks.

Jakub







Re: [PR 80293] Don't totally-sRA char arrays

2017-04-13 Thread Martin Jambor
Hi,

On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
> On Wed, 12 Apr 2017, Martin Jambor wrote:
> 
> > Hi,
> > 
> > the patch below is an attempt to deal with PR 80293 as non-invasively
> > as possible.  Basically, it switches off total SRA scalarization of
> > any local aggregates which contains an array of elements that have one
> > byte (or less).
> > 
> > The logic behind this is that accessing such arrays element-wise
> > usually results in poor code and that such char arrays are often used
> > for non-statically-typed content anyway, and we do not want to copy
> > that byte per byte.
> > 
> > Alan, do you think this could impact your constant pool scalarization
> > too severely?
> 
> Hmm, isn't one of the issues that we have
> 
> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>   {
> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> <= max_scalarization_size)
>   {
> create_total_scalarization_access (var);
> 
> which limits the size of scalarizable vars but not the number
> of accesses we create for total scalarization?

Well, yes, but at least I understood from your comment #4 in the bug
that you did not want to limit the number of replacements for gcc 7?

> 
> Is scalarizable_type_p only used in contexts where we have no hint
> of the actual accesses?

There are should_scalarize_away_bitmap and
cannot_scalarize_away_bitmap hints.

Total scalarization is a hack process where we chop up small
aggregates according to their types - as opposed to actual accesses,
meaning it is an alien process to the rest of SRA - knowing that they
will go completely away afterwards (that is ensured by
cannot_scalarize_away_bitmap) and that this will facilitate copy
propagation (this is indicated by should_scalarize_away_bitmap, which
has a bit set if there is a non-scalar assignment _from_ (a part of)
the aggregate).

> That is, for the constant pool case we
> usually have
> 
>   x = .LC0;
>   .. = x[2];
> 
> so we have a "hint" that accesses on x are those we'd want to
> optimize to accesses to .LC0.

You don't need total scalarization for this, just the existence of
read from x[2] would be sufficient but thanks for pointing me to the
right direction.  For constant pool decl scalarization, it is not
important to introduce artificial accesses for x but for .LC0.
Therefore, I have adapted the patch to allow char arrays for const
decls only and verified that it scalarizes a const-pool array of chars
on Aarch64.  The (otherwise yet untested) result is below.

What do you think?

Martin


2017-04-13  Martin Jambor  

* tree-sra.c (scalarizable_type_p): New parameter const_decl, make
char arrays not totally scalarizable if it is false.
(analyze_all_variable_accesses): Pass correct value in the new
parameter.

testsuite/
* g++.dg/tree-ssa/pr80293.C: New test.
---
 gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 +
 gcc/tree-sra.c  | 21 ++-
 2 files changed, 60 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
new file mode 100644
index 000..7faf35ae983
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
@@ -0,0 +1,45 @@
+// { dg-do compile }
+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
+
+#include 
+
+// Return a copy of the underlying memory of an arbitrary value.
+template <
+typename T,
+typename = typename 
std::enable_if::value>::type
+>
+auto getMem(
+T const & value
+) -> std::array {
+auto ret = std::array{};
+__builtin_memcpy(ret.data(), &value, sizeof(T));
+return ret;
+}
+
+template <
+typename T,
+typename = typename 
std::enable_if::value>::type
+>
+auto fromMem(
+std::array const & buf
+) -> T {
+auto ret = T{};
+__builtin_memcpy(&ret, buf.data(), sizeof(T));
+return ret;
+}
+
+double foo1(std::uint64_t arg) {
+return fromMem(getMem(arg));
+}
+
+double foo2(std::uint64_t arg) {
+return *reinterpret_cast(&arg);
+}
+
+double foo3(std::uint64_t arg) {
+double ret;
+__builtin_memcpy(&ret, &arg, sizeof(arg));
+return ret;
+}
+
+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 02453d3ed9a..ab06be66131 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool write)
 
 /* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
ARRAY_TYPE with fields that are either of gimple register types (excluding
-   bit-fields) or (recursively) scalarizable types.  */
+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
+   we are considering a decl from constant pool.  If it is false, char arrays
+   will be refused.  */

patch to fix PR80343

2017-04-13 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80343

  The patch was successfully bootstrapped and tested on x86-64.

  Committed as rev. 246914.

Index: ChangeLog
===
--- ChangeLog	(revision 246907)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2017-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/80343
+	* lra-remat.c (update_scratch_ops): Assign original hard reg to
+	new scratch pseudo.
+
 2017-04-13  Jeff Law  
 
 	* reload1.c (eliminate_regs_1): Call gen_rtx_raw_SUBREG for SUBREGs
Index: lra-remat.c
===
--- lra-remat.c	(revision 246907)
+++ lra-remat.c	(working copy)
@@ -1024,6 +1024,7 @@ get_hard_regs (struct lra_insn_reg *reg,
 static void
 update_scratch_ops (rtx_insn *remat_insn)
 {
+  int hard_regno;
   lra_insn_recog_data_t id = lra_get_insn_recog_data (remat_insn);
   struct lra_static_insn_data *static_id = id->insn_static_data;
   for (int i = 0; i < static_id->n_operands; i++)
@@ -1034,9 +1035,17 @@ update_scratch_ops (rtx_insn *remat_insn
   int regno = REGNO (*loc);
   if (! lra_former_scratch_p (regno))
 	continue;
+  hard_regno = reg_renumber[regno];
   *loc = lra_create_new_reg (GET_MODE (*loc), *loc,
  lra_get_allocno_class (regno),
  "scratch pseudo copy");
+  if (hard_regno >= 0)
+	{
+	  reg_renumber[REGNO (*loc)] = hard_regno;
+	  if (lra_dump_file)
+	fprintf (lra_dump_file, "	 Assigning the same %d to r%d\n",
+		 REGNO (*loc), hard_regno);
+	}
   lra_register_new_scratch_op (remat_insn, i);
 }
   
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 246907)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2017-04-13  Vladimir Makarov  
+
+	PR rtl-optimization/80343
+	* gcc.target/powerpc/pr80343.c: New.
+
 2017-04-13  Richard Biener  
 
 	PR testsuite/80416
Index: testsuite/gcc.target/powerpc/pr80343.c
===
--- testsuite/gcc.target/powerpc/pr80343.c	(nonexistent)
+++ testsuite/gcc.target/powerpc/pr80343.c	(working copy)
@@ -0,0 +1,50 @@
+/* { dg-do compile { target powerpc-*-*spe } } */
+/* { dg-options "-O2 -ftracer -fPIC" } */
+long long int vi, ls;
+int wq, oa, to, fv;
+signed char zo;
+
+long long int
+ai (long long int ip, long long int jc, int gt)
+{
+  ip /= 3;
+  jc += ip;
+  if (ip != 0)
+vi = 0;
+  vi += ls;
+
+  if (wq != oa)
+{
+  int tz;
+
+  for (tz = 0; tz < 32; ++tz)
+zo -= wq & gt;
+
+  if ((gt & 5) > oa)
+{
+  zo += gt;
+  fv += zo + to;
+}
+
+  if (gt != 0)
+oa = 0;
+
+  if (fv != 0)
+{
+  vi += wq;
+  ls += ip;
+  jc += (vi != 0) ? ip : ls;
+}
+
+  while (tz != 0)
+{
+  zo = wq;
+  tz = zo;
+}
+
+  ++to;
+  wq = ip;
+}
+
+  return jc;
+}


Re: [Patch, fortran] PR34360 (Comment 28) - ICE when assigning item of a derived-component to a pointer

2017-04-13 Thread Paul Richard Thomas
Dear Dominique,

The attached fixes the problem withPR51218 and bootstraps and regtests
on FC23/x86_64 - OK for trunk?

Cheers

Paul

2017-04-13  Paul Thomas  

PR fortran/34640
* expr.c (gfc_check_pointer_assign): Exclude pointer array
components in test for 'subref_array_pointer' attribute.
(gfc_hidden_length_field): New function.
* gfortran.h : Prototype for the above.
* resolve.c (resolve_component): Call the above for deferred
character and pointer array components to provide the hidden
field for the character length or span.
* trans-array.c (gfc_conv_scalarized_array_ref); Use the hidden
span field provided by 'gfc_pointer_array_comp_ref' in the call
to 'gfc_build_array_ref'.
(build_array_ref): Add the new argument 'passed_span' and pass
its to 'gfc_build_array_ref'.
(gfc_conv_array_ref): Same as 'gfc_conv_scalarized_array_ref'.
(gfc_array_allocate): Set the hidden span field if it is passed
by 'gfc_pointer_array_comp_ref'.
(gfc_get_dataptr_offset): Pass a null to the 'passed_span' arg.
trans-expr.c (gfc_trans_pointer_assignment): Obtain the 'span'
for pointer array components and use if applicable.
* trans-io.c (gfc_trans_transfer): Scalarize if this is a
pointer array component, rather than using the library.
trans.c (gfc_build_addr_expr): Use the 'passed_span' arg.
(gfc_pointer_array_comp_ref): New function.
(hidden_length_field): New function.
(gfc_deferred_strlen): Now just calls previous.
(gfc_span_field): New function.
* trans.h : Add prototypes for 'gfc_pointer_array_comp_ref' and
'gfc_span_field'.

2017-04-13  Paul Thomas  

PR fortran/34640
* gfortran.dg/pointer_array_component_1.f90: New test.
* gfortran.dg/pointer_array_component_2.f90: New test.


On 9 April 2017 at 17:14, Dominique d'Humières  wrote:
> The original test in pr51218 is also miscomputed with the patch:
>
>  Before t:
>
> Program received signal SIGSEGV: Segmentation fault - invalid memory 
> reference.
>
> Dominique
>
>> Le 9 avr. 2017 à 16:41, Dominique d'Humières  a écrit :
>>
>> Dear Paul,
>>
>> Your patch fixes the tests in pr34640 comments 20 and 28 (I didn’t test the 
>> variants in comment 27) and in pr57733.
>> The tests in pr34640 in comments 0, 3, and 5, as well in all the other 
>> duplicates still fail.
>>
>> Thanks for working on the issue,
>>
>> Dominique
>>
>



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
Index: gcc/fortran/expr.c
===
*** gcc/fortran/expr.c  (revision 246903)
--- gcc/fortran/expr.c  (working copy)
*** gfc_check_pointer_assign (gfc_expr *lval
*** 3733,3739 
return false;
  }
  
!   if (rvalue->expr_type == EXPR_VARIABLE && is_subref_array (rvalue))
  lvalue->symtree->n.sym->attr.subref_array_pointer = 1;
  
attr = gfc_expr_attr (rvalue);
--- 3733,3742 
return false;
  }
  
!   /* Pointer array components are taken care of using the hidden 'span'
!  component.  */
!   if (rvalue->expr_type == EXPR_VARIABLE && is_subref_array (rvalue)
!   && lvalue->symtree->n.sym->ts.type != BT_DERIVED)
  lvalue->symtree->n.sym->attr.subref_array_pointer = 1;
  
attr = gfc_expr_attr (rvalue);
*** gfc_check_vardef_context (gfc_expr* e, b
*** 5504,5506 
--- 5507,5530 
  
return true;
  }
+ 
+ 
+ gfc_component *
+ gfc_hidden_length_field (gfc_symbol *sym, gfc_component *c,
+bool add_if_missing, const char *postfix)
+ {
+   char name[GFC_MAX_SYMBOL_LEN+9];
+   gfc_component *strlen;
+   sprintf (name, "_%s_%s", c->name, postfix);
+   strlen = gfc_find_component (sym, name, true, true, NULL);
+   if (strlen == NULL && add_if_missing)
+ {
+   if (!gfc_add_component (sym, name, &strlen))
+   return NULL;
+   strlen->ts.type = BT_INTEGER;
+   strlen->ts.kind = gfc_charlen_int_kind;
+   strlen->attr.access = ACCESS_PRIVATE;
+   strlen->attr.artificial = 1;
+ }
+   return strlen;
+ }
Index: gcc/fortran/gfortran.h
===
*** gcc/fortran/gfortran.h  (revision 246903)
--- gcc/fortran/gfortran.h  (working copy)
*** gfc_expr* gfc_find_stat_co (gfc_expr *);
*** 3157,3162 
--- 3157,3164 
  gfc_expr* gfc_build_intrinsic_call (gfc_namespace *, gfc_isym_id, const char*,
locus, unsigned, ...);
  bool gfc_check_vardef_context (gfc_expr*, bool, bool, bool, const char*);
+ gfc_component* gfc_hidden_length_field (gfc_symbol *, gfc_component *,
+   bool, const char *);
  
  
  /* st.c */
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 246903)
--- gcc/fortran/resolve.c   (working copy)
*** resolve

Re: [PATCH][PR sanitizer/80414] Fix segfault with -fsanitize=undefined on 32 bit host

2017-04-13 Thread Jakub Jelinek
On Thu, Apr 13, 2017 at 05:56:56PM +0300, Maxim Ostapenko wrote:
> Hi,
> 
> I've applied the patch for Denis in r246909. Also, the issue seems to appear
> at least in GCC 6 (and perhaps in 5, I need to check).
> Is it OK to apply the patch on branches after testing?

Yes.

Jakub


Re: [PR 80293] Don't totally-sRA char arrays

2017-04-13 Thread Richard Biener
On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor  wrote:
>Hi,
>
>On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
>> On Wed, 12 Apr 2017, Martin Jambor wrote:
>> 
>> > Hi,
>> > 
>> > the patch below is an attempt to deal with PR 80293 as
>non-invasively
>> > as possible.  Basically, it switches off total SRA scalarization of
>> > any local aggregates which contains an array of elements that have
>one
>> > byte (or less).
>> > 
>> > The logic behind this is that accessing such arrays element-wise
>> > usually results in poor code and that such char arrays are often
>used
>> > for non-statically-typed content anyway, and we do not want to copy
>> > that byte per byte.
>> > 
>> > Alan, do you think this could impact your constant pool
>scalarization
>> > too severely?
>> 
>> Hmm, isn't one of the issues that we have
>> 
>> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>>   {
>> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>> <= max_scalarization_size)
>>   {
>> create_total_scalarization_access (var);
>> 
>> which limits the size of scalarizable vars but not the number
>> of accesses we create for total scalarization?
>
>Well, yes, but at least I understood from your comment #4 in the bug
>that you did not want to limit the number of replacements for gcc 7?
>
>> 
>> Is scalarizable_type_p only used in contexts where we have no hint
>> of the actual accesses?
>
>There are should_scalarize_away_bitmap and
>cannot_scalarize_away_bitmap hints.
>
>Total scalarization is a hack process where we chop up small
>aggregates according to their types - as opposed to actual accesses,
>meaning it is an alien process to the rest of SRA - knowing that they
>will go completely away afterwards (that is ensured by
>cannot_scalarize_away_bitmap) and that this will facilitate copy
>propagation (this is indicated by should_scalarize_away_bitmap, which
>has a bit set if there is a non-scalar assignment _from_ (a part of)
>the aggregate).

OK, but for the copy x = y where x would go away it still depends on uses of x 
what pieces we actually want?  Or is total scalarization only done for x  = y; 
z = x;?
Thus no further accesses to x?

>> That is, for the constant pool case we
>> usually have
>> 
>>   x = .LC0;
>>   .. = x[2];
>> 
>> so we have a "hint" that accesses on x are those we'd want to
>> optimize to accesses to .LC0.
>
>You don't need total scalarization for this, just the existence of
>read from x[2] would be sufficient but thanks for pointing me to the
>right direction.  For constant pool decl scalarization, it is not
>important to introduce artificial accesses for x but for .LC0.
>Therefore, I have adapted the patch to allow char arrays for const
>decls only and verified that it scalarizes a const-pool array of chars
>on Aarch64.  The (otherwise yet untested) result is below.
>
>What do you think?

Why special case char arrays?  I'd simply disallow total scalarization of 
non-const arrays completely.

>Martin
>
>
>2017-04-13  Martin Jambor  
>
>   * tree-sra.c (scalarizable_type_p): New parameter const_decl, make
>   char arrays not totally scalarizable if it is false.
>   (analyze_all_variable_accesses): Pass correct value in the new
>   parameter.
>
>testsuite/
>   * g++.dg/tree-ssa/pr80293.C: New test.
>---
>gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
>+
> gcc/tree-sra.c  | 21 ++-
> 2 files changed, 60 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>new file mode 100644
>index 000..7faf35ae983
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>@@ -0,0 +1,45 @@
>+// { dg-do compile }
>+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
>+
>+#include 
>+
>+// Return a copy of the underlying memory of an arbitrary value.
>+template <
>+typename T,
>+typename = typename
>std::enable_if::value>::type
>+>
>+auto getMem(
>+T const & value
>+) -> std::array {
>+auto ret = std::array{};
>+__builtin_memcpy(ret.data(), &value, sizeof(T));
>+return ret;
>+}
>+
>+template <
>+typename T,
>+typename = typename
>std::enable_if::value>::type
>+>
>+auto fromMem(
>+std::array const & buf
>+) -> T {
>+auto ret = T{};
>+__builtin_memcpy(&ret, buf.data(), sizeof(T));
>+return ret;
>+}
>+
>+double foo1(std::uint64_t arg) {
>+return fromMem(getMem(arg));
>+}
>+
>+double foo2(std::uint64_t arg) {
>+return *reinterpret_cast(&arg);
>+}
>+
>+double foo3(std::uint64_t arg) {
>+double ret;
>+__builtin_memcpy(&ret, &arg, sizeof(arg));
>+return ret;
>+}
>+
>+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index 02453d3ed9a..ab06be66131 100644
>--- a/gcc/

Re: [wwwdocs,fortran] Update link to CHKSYS

2017-04-13 Thread Jerry DeLisle
On 04/13/2017 02:13 AM, Arjen Markus wrote:
> Hi Gerald, Jerry,
> 
> as the author of this program (and the maintainer of that site) I am
> honoured that you refer to this. But I would like to tell you a bit
> about the history:
> I started CHKSYS in the time that FORTRAN 77 was still the main
> standard, so much of the tests that are contained in this program are
> specific to all the variations I had encountered with various FORTRAN
> compilers. I did start a version for Fortran 90, but I was rather a
> novice wrt that standard back then and I never really achieved a
> satisfactory set of tests.
> 
> Some time last year I started anew with a set of programs - see the
> chkfeatures subdirectory in the source repository. These programs
> check whether a compiler supports features found in Fortran 2003 and
> 2008 and various common extensions that I am aware of. Since the
> checks frequently cause the compilers to complain - and are even
> designed for this - I have set this up as a collection of programs,
> rather than a single one.
> 
> Checking the web pages, I see I should really update the original page
> and integrate the new stuff with it. If you have any suggestions to
> make the reference easier (access to the source code from that page?),
> then let me know. I am no hero when it comes to web design, but a
> better set-up of links and such is well within my capacity.
> 

First, the original page link we had is broken. so clarify for us the URL you
intend to update.

A download link on each page is helpful. Our link was to CHKSYS so I had to
manually navigate up one level to hunt around for the actual source code. We
probably should link to your main page so one navigates down to what one is
looking for.  I think the download page link has one word like 'here' or
something.  Maybe pull that out to a separate line that says "Download Sources"
in bold or something. Just a suggestion, by all means do as you please.

Jerry


Re: [PATCH] Fix another fold-const.c type bug (PR sanitizer/80403)

2017-04-13 Thread Jakub Jelinek
On Wed, Apr 12, 2017 at 06:12:57PM +0200, Jakub Jelinek wrote:
> @@ -11542,7 +11544,7 @@ fold_ternary_loc (location_t loc, enum t
> && (code == VEC_COND_EXPR || !VECTOR_TYPE_P (type)))
>   return fold_build2_loc (loc, code == VEC_COND_EXPR ? BIT_AND_EXPR
>  : TRUTH_ANDIF_EXPR,
> - type, fold_convert_loc (loc, type, arg0), arg1);
> + type, op0, op1);

This part (just the fold_convert_loc -> op0) has been a thinko, type is the
required type of only the op1 and op2 arguments on *COND_EXPR, op0 can have
a different type, thus fold_convert is needed there.

I've bootstrapped/regtested following reversion of that + testcase and
committed as obvious to trunk.

2017-04-13  Jakub Jelinek  

PR sanitizer/80403
* fold-const.c (fold_ternary_loc): Revert
use op0 instead of fold_convert_loc (loc, type, arg0) part of
2017-04-12 change.

* g++.dg/ubsan/pr80403-2.C: New test.

--- gcc/fold-const.c.jj 2017-04-12 20:07:22.0 +0200
+++ gcc/fold-const.c2017-04-13 16:08:26.429745344 +0200
@@ -11544,7 +11544,7 @@ fold_ternary_loc (location_t loc, enum t
  && (code == VEC_COND_EXPR || !VECTOR_TYPE_P (type)))
return fold_build2_loc (loc, code == VEC_COND_EXPR ? BIT_AND_EXPR
   : TRUTH_ANDIF_EXPR,
-   type, op0, op1);
+   type, fold_convert_loc (loc, type, arg0), op1);
 
   /* Convert A ? B : 1 into !A || B if A and B are truth values.  */
   if (code == VEC_COND_EXPR ? integer_all_onesp (op2) : integer_onep (op2)
--- gcc/testsuite/g++.dg/ubsan/pr80403-2.C.jj   2017-04-13 16:11:46.277223411 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr80403-2.C  2017-04-13 16:11:11.0 
+0200
@@ -0,0 +1,14 @@
+// PR sanitizer/80403
+// { dg-do compile }
+// { dg-options "-fsanitize=undefined" }
+
+extern const long long int v;
+extern unsigned long int w;
+
+int
+foo ()
+{
+  int a = (0 - 40U <= (0 == 8)) << !w << (0 < v) == 0;
+  int b = ((0 ^ 0) < (long) (1066066618772207110 <= 0)) / 0 << 0;  // { 
dg-warning "division by zero" }
+  return a + b;
+}


Jakub


[PATCH] Fix NVPTX offloading with dynamic libcuda.so.1 loading on powerpc

2017-04-13 Thread Jakub Jelinek
Hi!

Finally got to debug why most libgomp offloading tests fail on
powerpc64le-linux when configured for NVPTX offloading.

The problem is -funsigned-char by default, so
if (cuda_lib_inited != -1)
was always true.  Fixed thusly, bootstrapped/regtested on x86_64-linux
and powerpc64le-linux, committed to trunk.

2017-04-13  Jakub Jelinek  

* plugin/plugin-nvptx.c (cuda_lib_inited): Use signed char type
instead of char.

--- libgomp/plugin/plugin-nvptx.c.jj2017-02-04 03:15:32.0 -0500
+++ libgomp/plugin/plugin-nvptx.c   2017-04-13 08:24:46.456531331 -0400
@@ -106,7 +106,7 @@ struct cuda_lib_s {
 
 /* -1 if init_cuda_lib has not been called yet, false
if it has been and failed, true if it has been and succeeded.  */
-static char cuda_lib_inited = -1;
+static signed char cuda_lib_inited = -1;
 
 /* Dynamically load the CUDA runtime library and initialize function
pointers, return false if unsuccessful, true if successful.  */


Jakub


Re: [PATCH] Fix PR51513, switch statement with default case containing __builtin_unreachable leads to wild branch

2017-04-13 Thread Peter Bergner
Bah, fixing up my return address.


On 4/13/17 3:14 AM, Richard Biener wrote:
> To recap the situation (from what I can deciper out of the ppc asm
> and the expand RTL) we seem to expand to
> 
>   if (cond > 2)
> __builtin_unreachable (); // jumps to the jump table data(?)
>   goto *tbl[cond];
> 
> now I do not remember the reason why we keep __builtin_unreachable ()
> at the RTL level -- on GIMPLE we keep it to be able to extract
> range information from the controlling condition.

As Jakub said, we do remove the blocks.  But we don't remove the branch
to those blocks.  That is true of whether the __builtin_unreachable() is
in the default case statement or in a normal case statement.  That is
what leads to the wild branch issue.


> Your patch comments suggest that handling of gaps is similarly
> improved but the testcase doesn't contain any gaps.

I didn't change the way the jump table gaps are handled.  They have
always been redirected to the "default label"... although the patch
doesn't use default_label anymore, it uses gap_label, as resetting
default_label to fallback_label when default_label is NULL was
what was stopping us from removing the range check.

What is changed, is that now we also handle case labels that point to
blocks with __builtin_unreachable() and those are also redirected to
the new gap_label.

I did not include a test case with an unreachable case statement,
because unlike the unreachable default case test case, there is no
compare/branch I can verify has been removed.  However, looking
closer, it seems the unpatched compiler leaves the unreachable
block as well as it's label in the jump table, so I should be able
to create an executable test case that uses a switch condition
that targets the unreachable case and see if we SEGV/SIGILL or not.
I cannot do that for the unreachable default case, since the range
check we remove is what saves us from accessing the jump table
with an index that is outside of the table.



> So I wonder why we don't simply apply the "transform" at the GIMPLE
> level, for example in the kitchen-sink pass_fold_builtins.  That is
> remove all __builtin_unreachable () labels and if the default label
> is amongst them make the first one the new default (or maybe the
> last one dependent on which would shrink jump table size most).

Well if the default case is unreachable and we compute a "new"
default, then we won't be able to eliminate the range check
(ie, compare/branch to default).  So isn't it performance wise
better to eliminate the range check, rather than choosing a
new default?  If you think we shouldn't care about that, then
I'd vote for choosing the first/last label with the higher
edge probability rather than trying to shrink the table.




> VRP2 removes the if because we put the range computed by VRP1
> on the SSA name used in the test (we have special code handling
> unreachable paths there).  For your testcase with the switch
> as cond_2 is not used after the switch we do not compute any
> range for it, otherwise we'd likely eliminate the default
> case as well.  For the small testcase above fold-all-builtins
> handles the removal as well if VRP is not run, so extending
> that to handle switch stmts sounds reasonable (see
> optimize_unreachable).  There's even a TODO in this function.

If we updated optimize_unreachable() to remove unreachable
default cases, would we still be able to disambiguate between
a switch statement written with no default case and one where
the default case was explicitly shown to be unreachable?
Maybe the default_label would be NULL for the unreachable
case and non-NULL in the other case?  If so, we'd still need
my change that doesn't set default_label to fallback_label
and instead uses the new var gap_label.

Peter



Re: port contrib/download_prerequisites script to macOS

2017-04-13 Thread Damian Rouson
Resending as plain text:

 On April 12, 2017 at 3:03:19 PM, Jeff Law 
(l...@redhat.com(mailto:l...@redhat.com)) wrote:
 
 > > 
 > > 2017-04-05 Damian Rouson 
 > > 
 > > * download_prerequisites (md5_check): New function emulates Linux 
 > > 'md5 --check' on macOS. Modified script for macOS compatibility. 
 > I wonder if we should just switch to curl from wget in general rather 
 > than conditionalizing the code at all. 
 
 Hi Jeff,
 
 
Thanks for your comments. The conditionals are more portable than hardwiring 
one choice. On macOS, curl is always present but not wget. On the Linux 
distribution that I use (Lubuntu), wget is always present but curl isn’t 
installed by default. My goal was to support as many users as possible without 
requiring them to install prerequisites just to download the prerequisites. :) 
As my first contribution of a patch to GCC, I took the baby step of allowing 
for the use of curl. If acceptable, my next step would be to allow for the use 
of ftp when neither curl nor wget is present. That’s what I do in the scripts 
from which I culled the code in the patch. In those scripts, I invoke ftp via 
the following function: 
 
https://github.com/sourceryinstitute/OpenCoarrays/blob/master/prerequisites/build-functions/ftp_url.sh
 
 > 
 > 
 > For the sums, rather than doing a check of the OS, just see if  
 > sha512/md5sum exists. If not, then fallback to the Darwin defaults. 
 
I think I tried that first and ran into some difficulties. I’ll make another 
attempt. My recent shell programming experience is with the bash shell and I 
have some experience with the C-shell from years ago, but I had no experience 
with writing Bourne shell scripts before writing this patch and I think the 
problem I encountered was related to differences between Bourne and bash.
 
 
 Damian


Re: [PATCH] Fix fixincludes for canadian cross builds

2017-04-13 Thread Bernd Edlinger
On 04/12/17 17:58, Yvan Roux wrote:
> Hi,
>
> On 20 February 2017 at 18:53, Bruce Korb  wrote:
>> On 02/18/17 01:01, Bernd Edlinger wrote:
>>> On 02/18/17 00:37, Bruce Korb wrote:
 On 02/06/17 10:44, Bernd Edlinger wrote:
> I tested this change with different arm-linux-gnueabihf cross
> compilers, and verified that mkheaders still works on the host system.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

 As long as you certify that this is correct for all systems we care about:

 +BUILD_SYSTEM_HEADER_DIR = `
 +echo $(CROSS_SYSTEM_HEADER_DIR) | \
 +sed -e :a -e 's,[^/]*/\.\.\/,,' -e ta`

 that is pretty obtuse sed-speak to me.  I suggest a comment
 explaining what sed is supposed to be doing.  What should
 "$(CROSS_SYSTEM_HEADER_DIR)" look like?

>>>
>>> I took it just from a few lines above, so I thought that comment would
>>> sufficiently explain the syntax:
>>
>> I confess, I didn't pull a new copy of gcc, sorry.
>> So it looks good to me.
>
>
> We just noticed that this patch brakes canadian cross builds when
> configured with --with-build-sysroot, since headers are searched into
> the target sysroot instead of the one specified for builds.
>
> Maybe there's a cleaner way to fix this and avoid the duplication but
> I didn't find another to test if --with-build-sysroot is used.  The
> attached patch fixes the issue.  Tested with a Full canadian cross
> build for i686-w64-mingw32 host and arm-linux-gnueabihf.
>
> Thanks
> Yvan
>
> 2017-04-12  Yvan Roux  
>
>* Makefile.in (BUILD_SYSTEM_HEADER_DIR): Set to SYSTEM_HEADER_DIR
>when configured with --with-build-sysroot.
>

Oops, sorry for the breakage...

However I think the patch simply restores the previous behavior, because
ifdef SYSROOT_CFLAGS_FOR_TARGET is always true, even if it's empty,
but it does still not work correctly.

I tried to build a cross with your patch and a --with-build-sysroot
but the target compiler does fix the wrong includes for me:

../gcc-trunk/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-cross 
--host=arm-linux-gnueabihf --target=arm-linux-gnueabihf 
--enable-languages=c,c++,ada,fortran --with-arch=armv7-a 
--with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard 
--with-build-sysroot=/home/ed/gnu/arm-linux-gnueabihf-3
=>
Fixing headers into 
/home/ed/gnu/gcc-build-arm-linux-gnueabihf-2/gcc/include-fixed for 
arm-unknown-linux-gnueabihf target
Forbidden identifiers: linux unix
Finding directories and links to directories
  Searching /usr/include/.
  Searching /usr/include/./c++/4.8.4
  Searching /usr/include/./numpy
  Searching /usr/include/./python2.7/numpy
Making symbolic directory links
Fixing directory /usr/include into 
/home/ed/gnu/gcc-build-arm-linux-gnueabihf-2/gcc/include-fixed

but it should fix headers in .../arm-linux-gnueabihf-3/usr/include

I think it would work if I use --with-sysroot together with
--with-build-sysroot in the config above, but why should the
target compiler use --with-sysroot ?
There is no need for that on the target, just the cross-compiler
might need to support that.

I tried to fix some possible combinations of --with-sysroot/
--with-build-sysroot, and moved the logic to the configure
script.  When I did that I noticed also some more glitches
when grabbing the TARGET_GLIBC_MAJOR/MINOR defines from sysroot
files.

This updated patch seems to work for me, could you give it a try?




Thanks
Bernd.
2017-04-14  Bernd Edlinger  

	* configure.ac (SYSTEM_HEADER_DIR, BUILD_SYSTEM_HEADER_DIR,
	target_header_dir): Set correctly.
	* configure: Regenerated.
	* Makefile.in (BUILD_SYSTEM_HEADER_DIR): Use directly.

Index: gcc/configure
===
--- gcc/configure	(revision 246899)
+++ gcc/configure	(working copy)
@@ -719,6 +719,7 @@ BUILD_CFLAGS
 CXX_FOR_BUILD
 CC_FOR_BUILD
 inhibit_libc
+BUILD_SYSTEM_HEADER_DIR
 SYSTEM_HEADER_DIR
 ALL
 CROSS
@@ -12214,14 +12215,15 @@ done
 CROSS=
 ALL=all.internal
 SYSTEM_HEADER_DIR='$(NATIVE_SYSTEM_HEADER_DIR)'
+BUILD_SYSTEM_HEADER_DIR=$SYSTEM_HEADER_DIR
 
 if test "x$with_build_sysroot" != x; then
-  build_system_header_dir=$with_build_sysroot'$${sysroot_headers_suffix}$(NATIVE_SYSTEM_HEADER_DIR)'
-else
+  BUILD_SYSTEM_HEADER_DIR=$with_build_sysroot'$${sysroot_headers_suffix}$(NATIVE_SYSTEM_HEADER_DIR)'
+elif test x$host != x$target; then
   # This value is used, even on a native system, because
   # CROSS_SYSTEM_HEADER_DIR is just
   # $(TARGET_SYSTEM_ROOT)$(NATIVE_SYSTEM_HEADER_DIR).
-  build_system_header_dir='$(CROSS_SYSTEM_HEADER_DIR)'
+  BUILD_SYSTEM_HEADER_DIR='$(CROSS_SYSTEM_HEADER_DIR)'
 fi
 
 if test x$host != x$target
@@ -12228,7 +12230,7 @@ if test x$host != x$target
 then
 	CROSS="-DCROSS_DIRECTORY_STRUCTURE"
 	ALL=all.cross
-	SYSTEM_HEADER_DIR=$build_system_header_dir
+	SYSTEM_HEADER_DIR=$BUILD_SYSTEM_HEADER_DIR
 	case $target in
 		*-*-mingw*)
 			if test "x$with_head

[PATCH] Fix newlib build failure for mips16

2017-04-13 Thread Jeff Law



The mips64vr-elf target will fail building newlib, particularly the 
mips16 newlib as we emit bogus assembly code.


In particular the compiler will emit something like

lwu $2,0($sp)

That's invalid in mips16 mode AFAICT.

That's emitted by the extendsidi pattern.  It's a case where the operand 
predicates are looser that the constraints.  The code we get out of 
reload is fine, but hard register propagation substitutes sp for a 
(valid mips16) hard register that as the same value.  Since hard 
register propagation tests predicates, not constraints, the substitution 
is successful and the bogus code is generated.


Sadly, we don't have a good predicate to use for the source operand of 
an extendsidi pattern.  In general sp+offset is a valid memory address, 
but not for lwu.


I briefly pondered disabling the pattern for mips16, but that seems 
somewhat anti-performant.  So I looked at alternatives.


I poked a bit at adding an appropriate operand predicate, but it just 
gets ugly as to do it right.  I'd want to use some of the 
GO_IF_LEGITIMATE_ADDRESS machinery to decompose the address.  But 
everything is static.  Exposing it would be possible I suppose.


I could also have passed in a flag to the GO_IF_LEGITIMATE_ADDRESS 
machinery to indicating we're handling lwu, but that seemed hacky.


Instead I added additional tests to the pattern's condition to verify 
that if TARGET_MIPS16 was active, the input operand was not a MEM where 
sp appears in the address.  That's fairly surgical so we're not going to 
adversely affect code generation and doesn't require hacking up the 
GO_IF_LEGITIMATE_ADDRESS machinery.


That allows mips64vr-elf to build newlib & libgcc.  It was certainly a 
regression as we've been able to build mips16 newlib multilibs in the past.


Installing on the trunk.

Jeff


ps.  bz74563 (P2 regression) is an unrelated mips16 issue introduced a 
couple years ago that probably makes classic mips16 unusable right now. 
I may take a stab at fixing that too since I have a reasonable idea 
what's happening.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 33b094e..788f029 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-04-13  Jeff Law  
+
+   * config/mips.mips.md (zero_extendsidi2): Do not allow SP to appear
+   in operands[1] if it is a MEM and TARGET_MIPS16 is active.
+   (zero_extendsidi2_dext): Likewise.
+
 2017-04-13  Jakub Jelinek  
 
PR sanitizer/80403
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 7acf00d..dd5e1e7 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -3493,7 +3493,10 @@
 (define_insn_and_split "*zero_extendsidi2"
   [(set (match_operand:DI 0 "register_operand" "=d,d")
 (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "d,W")))]
-  "TARGET_64BIT && !ISA_HAS_EXT_INS"
+  "TARGET_64BIT && !ISA_HAS_EXT_INS
+   && !(TARGET_MIPS16
+&& MEM_P (operands[1])
+&& reg_mentioned_p (stack_pointer_rtx, operands[1]))"
   "@
#
lwu\t%0,%1"
@@ -3509,7 +3512,10 @@
 (define_insn "*zero_extendsidi2_dext"
   [(set (match_operand:DI 0 "register_operand" "=d,d")
 (zero_extend:DI (match_operand:SI 1 "nonimmediate_operand" "d,W")))]
-  "TARGET_64BIT && ISA_HAS_EXT_INS"
+  "TARGET_64BIT && ISA_HAS_EXT_INS
+   && !(TARGET_MIPS16
+&& MEM_P (operands[1])
+&& reg_mentioned_p (stack_pointer_rtx, operands[1]))"
   "@
dext\t%0,%1,0,32
lwu\t%0,%1"