-fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Jan Hubicka
Hi,
this patch removes the flag_strict_aliasing kludge in expanding debug locations and
instead introduces an explicit parameter DEBUG that makes
set_mem_attributes_minus_bitpos not affect alias sets.  This is sanity
checked by comparing the number of alias sets before and after the point where we
originally overwrote flag_strict_aliasing.

I also added code to prevent memory attribute creation for !optimize and to
avoid the get_alias_set computation for !flag_strict_aliasing.  This slightly
optimizes -O0 builds, but the results seem to be down in the noise (I would not
object to leaving it out).

The patch should fix at least one (latent?) bug: call_stmt expansion
invokes expand_debug_expr without clearing flag_strict_aliasing.
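A minimal sketch of the behaviour described for the new DEBUG parameter follows; the emit-rtl.c hunk is not shown in this excerpt, so the exact shape below is an assumption based on the ChangeLog, not the actual patch:

/* Sketch only: assumed handling of the new DEBUG parameter.  */
void
set_mem_attributes_minus_bitpos (rtx ref, tree t, int objectp,
				 HOST_WIDE_INT bitpos, bool debug)
{
  struct mem_attrs attrs;

  /* At -O0 no memory attributes are needed at all.  */
  if (!optimize)
    return;

  memset (&attrs, 0, sizeof (attrs));

  /* For debug-only attributes never call get_alias_set, so no new alias
     set can be created and -fcompare-debug stays clean.  */
  if (!debug && flag_strict_aliasing)
    attrs.alias = get_alias_set (t);

  /* ... the remaining attribute setup is unchanged ...  */
}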

Bootstrapped/regtested x86_64-linux, also tested with compare-debug, OK?

Honza

* cfgexpand.c: Include alias.h
(expand_debug_expr): Pass debug=true to set_mem_attributes.
(expand_debug_locations): Do not fiddle with flag_strict_aliasing;
sanity check that no new alias set was introduced.
* varasm.c: Include alias.h
(make_decl_rtl): New parameter DEBUG; pass it to set_mem_attributes.
(make_decl_rtl_for_debug): Do not fiddle with flag_strict_aliasing;
assert that no new alias set was introduced.
* varasm.h (make_decl_rtl): New parameter debug.
* alias.h (num_alias_sets): New function.
* emit-rtl.c (set_mem_attributes_minus_bitpos): New parameter DEBUG;
exit early when not optimizing; do not introduce new alias set when
producing debug only attributes.
(set_mem_attributes): New parameter DEBUG.
* emit-rtl.h (set_mem_attributes, set_mem_attributes_minus_bitpos):
New parameters DEBUG.
(num_alias_sets): New function.

Index: cfgexpand.c
===
--- cfgexpand.c (revision 231122)
+++ cfgexpand.c (working copy)
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.
 #include "builtins.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "alias.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these
cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -4178,7 +4179,7 @@ expand_debug_expr (tree exp)
return NULL_RTX;
  op0 = gen_rtx_CONST_STRING (Pmode, TREE_STRING_POINTER (exp));
  op0 = gen_rtx_MEM (BLKmode, op0);
- set_mem_attributes (op0, exp, 0);
+ set_mem_attributes (op0, exp, 0, true);
  return op0;
}
   /* Fall through...  */
@@ -4346,7 +4347,7 @@ expand_debug_expr (tree exp)
return NULL;
 
   op0 = gen_rtx_MEM (mode, op0);
-  set_mem_attributes (op0, exp, 0);
+  set_mem_attributes (op0, exp, 0, true);
   if (TREE_CODE (exp) == MEM_REF
  && !is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)))
set_mem_expr (op0, NULL_TREE);
@@ -4372,7 +4373,7 @@ expand_debug_expr (tree exp)
 
   op0 = gen_rtx_MEM (mode, op0);
 
-  set_mem_attributes (op0, exp, 0);
+  set_mem_attributes (op0, exp, 0, true);
   set_mem_addr_space (op0, as);
 
   return op0;
@@ -4458,7 +4459,7 @@ expand_debug_expr (tree exp)
  op0 = copy_rtx (op0);
if (op0 == orig_op0)
  op0 = shallow_copy_rtx (op0);
-   set_mem_attributes (op0, exp, 0);
+   set_mem_attributes (op0, exp, 0, true);
  }
 
if (bitpos == 0 && mode == GET_MODE (op0))
@@ -5219,12 +5220,11 @@ expand_debug_locations (void)
 {
   rtx_insn *insn;
   rtx_insn *last = get_last_insn ();
-  int save_strict_alias = flag_strict_aliasing;
 
   /* New alias sets while setting up memory attributes cause
  -fcompare-debug failures, even though it doesn't bring about any
  codegen changes.  */
-  flag_strict_aliasing = 0;
+  int num = num_alias_sets ();
 
   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
 if (DEBUG_INSN_P (insn))
@@ -5284,7 +5284,7 @@ expand_debug_locations (void)
  avoid_complex_debug_insns (insn2, &INSN_VAR_LOCATION_LOC (insn2), 0);
   }
 
-  flag_strict_aliasing = save_strict_alias;
+  gcc_checking_assert (num == num_alias_sets ());
 }
 
 /* Performs swapping operands of commutative operations to expand
Index: varasm.c
===
--- varasm.c(revision 231122)
+++ varasm.c(working copy)
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
 #include "common/common-target.h"
 #include "asan.h"
 #include "rtl-iter.h"
+#include "alias.h"
 
 #ifdef XCOFF_DEBUGGING_INFO
 #include "xcoffout.h"  /* Needed for external data declarations.  */
@@ -1280,7 +1281,7 @@ ultimate_transparent_alias_target (tree
This is never called for PARM_DECL nodes.  */
 
 void
-make_decl_rtl (tree decl)
+make_decl_rtl (tree decl, bool debug)
 {
   const char *name = 0;
   int reg_number;
@@ -1470,7 +1471,7 @@ make_decl_rtl (tree decl)
 
   x 

-fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-02 Thread Jan Hubicka
Hi,
this patch makes the type system unchanged by flag_strict_aliasing.
This is needed to prevent optimization loss in flag_strict_aliasing code where
some !flag_strict_aliasing code put alias set 0 into a type (this happens
in all cases I modified in my original patch).  It is also necessary to validate
that the ipa-icf and operand_equal_p transformations are safe for the code transitions
!flag_strict_aliasing->flag_strict_aliasing that I wanted to do in the inliner.

This patch goes the opposite way from my previous attempt (and is short, unlike
the explanation ;).  Instead of adding an extra parameter to get_alias_set, it
makes get_alias_set ignore flag_strict_aliasing.  To make sure that no TBAA
is used when !flag_strict_aliasing, I can simply disable alias_set_subset_of and
alias_sets_conflict_p, which are the only ways the TBAA oracle can disambiguate
items.

Next, there are cases where optimizations are disabled to keep TBAA right.
I audited the code and found only function.c (which uses objects_must_conflict_p
for packing) and ipa-icf/fold-const.  This patch updates objects_must_conflict_p;
fold-const already checks flag_strict_aliasing, and I did not update ipa-icf
because I would have to disable the non-strict-aliasing path in the followup patch.

I checked that there is no code difference with -fno-strict-aliasing -fno-ipa-icf
with this patch on tramp3d and dealII.


Bootstrapped/regtested x86_64-linux and also lto-bootstrapped.  Looks OK?

* alias.c (alias_set_subset_of, alias_sets_conflict_p,
objects_must_conflict_p): Short circuit for !flag_strict_aliasing
(get_alias_set): Remove flag_strict_aliasing check.
(new_alias_set): Likewise.
Index: alias.c
===
--- alias.c (revision 231081)
+++ alias.c (working copy)
@@ -405,6 +405,10 @@ alias_set_subset_of (alias_set_type set1
 {
   alias_set_entry *ase2;
 
+  /* Disable TBAA oracle with !flag_strict_aliasing.  */
+  if (!flag_strict_aliasing)
+return true;
+
   /* Everything is a subset of the "aliases everything" set.  */
   if (set2 == 0)
 return true;
@@ -466,6 +470,10 @@ alias_sets_conflict_p (alias_set_type se
   alias_set_entry *ase1;
   alias_set_entry *ase2;
 
+  /* Disable TBAA oracle with !flag_strict_aliasing.  */
+  if (!flag_strict_aliasing)
+return true;
+
   /* The easy case.  */
   if (alias_sets_must_conflict_p (set1, set2))
 return 1;
@@ -561,6 +569,9 @@ objects_must_conflict_p (tree t1, tree t
 {
   alias_set_type set1, set2;
 
+  if (!flag_strict_aliasing)
+return 1;
+
   /* If neither has a type specified, we don't know if they'll conflict
  because we may be using them to store objects of various types, for
  example the argument and local variables areas of inlined functions.  */
@@ -816,10 +827,12 @@ get_alias_set (tree t)
 {
   alias_set_type set;
 
-  /* If we're not doing any alias analysis, just assume everything
- aliases everything else.  Also return 0 if this or its type is
- an error.  */
-  if (! flag_strict_aliasing || t == error_mark_node
+  /* We cannot give up with -fno-strict-aliasing because we need to build
+ proper type representation for possible functions which are built with
+ -fstrict-aliasing.  */
+
+  /* Return 0 if this or its type is an error.  */
+  if (t == error_mark_node
   || (! TYPE_P (t)
  && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
 return 0;
@@ -1085,15 +1098,10 @@ get_alias_set (tree t)
 alias_set_type
 new_alias_set (void)
 {
-  if (flag_strict_aliasing)
-{
-  if (alias_sets == 0)
-   vec_safe_push (alias_sets, (alias_set_entry *) NULL);
-  vec_safe_push (alias_sets, (alias_set_entry *) NULL);
-  return alias_sets->length () - 1;
-}
-  else
-return 0;
+  if (alias_sets == 0)
+vec_safe_push (alias_sets, (alias_set_entry *) NULL);
+  vec_safe_push (alias_sets, (alias_set_entry *) NULL);
+  return alias_sets->length () - 1;
 }
 
 /* Indicate that things in SUBSET can alias things in SUPERSET, but that


Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-02 Thread Richard Biener
On Tue, 1 Dec 2015, Jeff Law wrote:

> On 12/01/2015 11:33 AM, Alan Lawrence wrote:
> > 
> > I was not able to reduce the testcase below about 30k characters, with e.g.
> > #define T_VOID 0
> >  T_VOID 
> > producing the ICE, but manually changing to
> >  0 
> > preventing the ICE; as did running the preprocessor as a separate step, or a
> > wide variety of options (e.g. -fdump-tree-alias).
> Which is almost always an indication that there's a memory corruption, or
> uninitialized memory read or something similar.
> 
> 
> > 
> > In the end I traced this to loop_unswitch reading stale values from the edge
> > redirect map, which is keyed on 'edge' (a pointer to struct edge_def); the
> > map
> > entries had been left there by pass_dominator (on a different function), and
> > by
> > "chance" the edge *pointers* were the same as to some current edge_defs
> > (even
> > though they pointed to structures created by different allocations, the
> > first
> > of which had since been freed). Hence the fragility of the testcase and
> > environment.
> Right.  So the question I have is how/why did DOM leave anything in the map.
> And if DOM is fixed to not leave stuff lying around, can we then assert that
> nothing is ever left in those maps between passes?  There's certainly no good
> reason I'm aware of why DOM would leave things in this state.

It happens not only with DOM but with all passes doing edge redirection.
This is because the map is populated by GIMPLE cfg hooks just in case
it might be used.  But there is no such thing as a "start CFG manip"
and "end CFG manip" to cleanup such dead state.

IMHO the redirect-edge-var-map stuff is just about the most unclean
implementation possible. :(  (see how remove_edge "clears" stale info
from the map to avoid even more "interesting" stale data)

Ideally we could assert the map is empty whenever we leave a pass,
but as said it triggers all over the place.  Even cfg-cleanup causes
such stale data.

I agree that the patch is only a half-way "solution", but a full
solution would require something more explicit, like we do with
initialize_original_copy_tables/free_original_copy_tables.  Thus
require passes to explicitly request the edge data to be preserved
with an initialize_edge_var_map/free_edge_var_map call pair.
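For illustration, such an explicit pair might look roughly like this; the function names and the map type are assumptions mirroring initialize_original_copy_tables, not code that exists in the tree:

/* Hypothetical sketch of an explicit setup/teardown pair for the edge
   redirection map.  */
void
initialize_edge_var_map (void)
{
  gcc_checking_assert (edge_var_maps == NULL);
  edge_var_maps = new hash_map<edge, auto_vec<edge_var_map> > (11);
}

void
free_edge_var_map (void)
{
  delete edge_var_maps;
  edge_var_maps = NULL;
}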

Not appropriate at this stage IMHO (well, unless it turns out to be
a very localized patch).

Richard.


PR68146: Check for null SSA_NAME_DEF_STMTs in fold-const.c

2015-12-02 Thread Richard Sandiford
The problem in the testcase was that tree-complex.c was trying
to fold ABS_EXPRs of SSA names that didn't yet have a definition
(because the SSA names were real and imaginary parts of a complex
SSA name whose definition hadn't yet been visited by the pass).
tree-complex.c uses a straightforward walk in index order:

  /* ??? Ideally we'd traverse the blocks in breadth-first order.  */
  old_last_basic_block = last_basic_block_for_fn (cfun);
  FOR_EACH_BB_FN (bb, cfun)
{

and in the testcase, we have a block A with a single successor B that
comes before it.  B has no other predecessor and has a complex division
that uses an SSA name X defined in A, so we split the components of X
before we reach the definition of X.  (I imagine cfgcleanup would
clean this up by joining A and B.)

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68146
* fold-const.c (tree_binary_nonnegative_warnv_p): Check for null
SSA_NAME_DEF_STMTs.
(integer_valued_real_call_p): Likewise.

gcc/testsuite/
* gfortran.fortran-torture/compile/pr68146.f90: New test.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 16bff5f..c99e78e 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -12867,6 +12867,8 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, 
tree type, tree op0,
 bool
 tree_single_nonnegative_warnv_p (tree t, bool *strict_overflow_p, int depth)
 {
+  gimple *stmt;
+
   if (TYPE_UNSIGNED (TREE_TYPE (t)))
 return true;
 
@@ -12892,8 +12894,9 @@ tree_single_nonnegative_warnv_p (tree t, bool 
*strict_overflow_p, int depth)
 to provide it through dataflow propagation.  */
   return (!name_registered_for_update_p (t)
  && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
- && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
- strict_overflow_p, depth));
+ && (stmt = SSA_NAME_DEF_STMT (t))
+ && gimple_stmt_nonnegative_warnv_p (stmt, strict_overflow_p,
+ depth));
 
 default:
   return tree_simple_nonnegative_warnv_p (TREE_CODE (t), TREE_TYPE (t));
@@ -13508,6 +13511,7 @@ integer_valued_real_call_p (combined_fn fn, tree arg0, 
tree arg1, int depth)
 bool
 integer_valued_real_single_p (tree t, int depth)
 {
+  gimple *stmt;
   switch (TREE_CODE (t))
 {
 case REAL_CST:
@@ -13524,8 +13528,8 @@ integer_valued_real_single_p (tree t, int depth)
 to provide it through dataflow propagation.  */
   return (!name_registered_for_update_p (t)
  && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
- && gimple_stmt_integer_valued_real_p (SSA_NAME_DEF_STMT (t),
-   depth));
+ && (stmt = SSA_NAME_DEF_STMT (t))
+ && gimple_stmt_integer_valued_real_p (stmt, depth));
 
 default:
   break;
diff --git a/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90 
b/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90
new file mode 100644
index 000..7f75ec0
--- /dev/null
+++ b/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90
@@ -0,0 +1,11 @@
+subroutine foo(a1, a2, s1, s2, n)
+  integer :: n
+  complex(kind=8) :: a1(n), a2(n), s1, s2
+  do i = 1, n
+ a1(i) = i
+  end do
+  s1 = 20.0 / s2
+  do i = 1, n
+ a2(i) = i / s2
+  end do
+end subroutine foo



Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-02 Thread Richard Biener
On Tue, 1 Dec 2015, Alan Lawrence wrote:

> This follows on from discussion at
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03392.html
> To recap: Starting in r229479 and continuing until at least 229711, compiling
> polynom.c from spec2000 on aarch64-none-linux-gnu, with options
> -O3 -mcpu=cortex-a53 -ffast-math (on both cross, native bootstrapped, and 
> native
> --disable-bootstrap compilers), produced a verify_gimple ICE after unswitch:
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c: In function 
> 'NormalizeCoeffsListx':
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: 
> incompatible types in PHI argument 0
>  TypHandle NormalizeCoeffsListx ( hdC )
>^
> long int
> 
> int
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location 
> references block not in block tree
> l1_279 = PHI <1(28), l1_299(33)>
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: invalid 
> PHI argument
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: internal 
> compiler error: tree check: expected class 'type', have 'declaration' 
> (namespace_decl) in useless_type_conversion_p, at gimple-expr.c:84
> 0xd110ef tree_class_check_failed(tree_node const*, tree_code_class, char 
> const*, int, char const*)
> ../../gcc-fsf/gcc/tree.c:9643
> 0x82561b tree_class_check
> ../../gcc-fsf/gcc/tree.h:3042
> 0x82561b useless_type_conversion_p(tree_node*, tree_node*)
> ../../gcc-fsf/gcc/gimple-expr.c:84
> 0xaca043 verify_gimple_phi
> ../../gcc-fsf/gcc/tree-cfg.c:4673
> 0xaca043 verify_gimple_in_cfg(function*, bool)
> ../../gcc-fsf/gcc/tree-cfg.c:4967
> 0x9c2e0b execute_function_todo
> ../../gcc-fsf/gcc/passes.c:1967
> 0x9c360b do_per_function
> ../../gcc-fsf/gcc/passes.c:1659
> 0x9c3807 execute_todo
> ../../gcc-fsf/gcc/passes.c:2022
> 
> I was not able to reduce the testcase below about 30k characters, with e.g.
> #define T_VOID 0
>  T_VOID 
> producing the ICE, but manually changing to
>  0 
> preventing the ICE; as did running the preprocessor as a separate step, or a
> wide variety of options (e.g. -fdump-tree-alias).
> 
> In the end I traced this to loop_unswitch reading stale values from the edge
> redirect map, which is keyed on 'edge' (a pointer to struct edge_def); the map
> entries had been left there by pass_dominator (on a different function), and 
> by
> "chance" the edge *pointers* were the same as to some current edge_defs (even
> though they pointed to structures created by different allocations, the first
> of which had since been freed). Hence the fragility of the testcase and
> environment.
> 
> While the ICE is prevented merely by adding a call to
> redirect_edge_var_map_destroy at the end of pass_dominator::execute, given the
> fragility of the bug, difficulty of reducing the testcase, and the low 
> overhead
> of emptying an already-empty map, I believe the right fix is to empty the map
> as often as we can correctly do so, hence this patch - based substantially on
> Richard's comments in PR/68117.
> 
> Bootstrapped + check-gcc + check-g++ on x86_64 linux, based on r231105; I've
> also built SPEC2000 on aarch64-none-linux-gnu by applying this patch (roughly)
> onto the previously-failing r229711, which also passes aarch64 bootstrap, and
> a more recent bootstrap on aarch64 is ongoing. Assuming/if no regressions 
> there...
> 
> Is this ok for trunk?
> 
> This could also be a candidate for the 5.3 release; backporting depends only 
> on
> the (fairly trivial) r230357.

Looks good to me (for both, but backport only after 5.3 is released).  But
please wait for the discussion with Jeff to settle down.

Thanks,
Richard.
 
> gcc/ChangeLog:
> 
>   Alan Lawrence  
>   Richard Biener  
> 
>   * cfgexpand.c (pass_expand::execute): Replace call to
>   redirect_edge_var_map_destroy with redirect_edge_var_map_empty.
>   * tree-ssa.c (delete_tree_ssa): Likewise.
>   * function.c (set_cfun): Call redirect_edge_var_map_empty.
>   * passes.c (execute_one_ipa_transform_pass, execute_one_pass): Likewise.
>   * tree-ssa.h (redirect_edge_var_map_destroy): Remove.
>   (redirect_edge_var_map_empty): New.
>   * tree-ssa.c (redirect_edge_var_map_destroy): Remove.
>   (redirect_edge_var_map_empty): New.
> 
> ---
>  gcc/cfgexpand.c | 2 +-
>  gcc/function.c  | 2 ++
>  gcc/passes.c| 2 ++
>  gcc/tree-ssa.c  | 8 
>  gcc/tree-ssa.h  | 2 +-
>  5 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 1990e10..ede1b82 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6291,7 +6291,7 @@ pass_expand::execute (function *fun)
>expand_phi_nodes (&SA);
>  
>/* Release any stale SSA redirection data.  */
> -  redirect_edge_var_map_destroy ();
> +  redirect_edge_var_map_empty ();
>  
>/* Register rtl specific functions for cfg.  */
>rtl_register_cfg_hooks (

Re: [PATCH 1/2] destroy values as well as keys when removing them from hash maps

2015-12-02 Thread Richard Biener
On Wed, 2 Dec 2015, Trevor Saunders wrote:

> On Tue, Dec 01, 2015 at 07:43:35PM +, Richard Sandiford wrote:
> > tbsaunde+...@tbsaunde.org writes:
> > > -template 
> > > +template 
> > >  template 
> > >  inline void
> > > -simple_hashmap_traits ::remove (T &entry)
> > > +simple_hashmap_traits ::remove (T &entry)
> > >  {
> > >H::remove (entry.m_key);
> > > +  entry.m_value.~Value ();
> > >  }
> > 
> > This is just repeating my IRC comment really, but doesn't this mean that
> > we're calling the destructor on an object that was never constructed?
> > I.e. nothing ever calls placement new on the entry, the m_key, or the
> > m_value.
> 
> I believe you are correct that placement new is not called.  I'd say it's
> a bug waiting to happen, given that the usage of auto_vec seems to
> demonstrate that people expect objects to be initialized and destroyed.
> However, for now all values are either POD or auto_vec, and in either
> case the current 0 initialization has the same effect as the
> constructor.  So there may be a theoretical problem with how we
> initialize values that will become real when somebody adds a constructor
> that doesn't just 0 initialize.  So it should probably be improved at
> some point, but it doesn't seem necessary to mess with it now rather
> than in next stage 1.

Agreed.  You'll also need a more elaborate allocator/constructor
scheme for this, considering the case where no default constructor
is available.  See how alloc-pool.h tries to dance around this
using a "raw" allocate and an operator new...

Richard.

> Trev
> 
> > 
> > Thanks,
> > Richard
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-12-02 Thread Andreas Krebbel
On 11/30/2015 03:45 PM, Dominik Vogt wrote:
> On Mon, Nov 09, 2015 at 01:33:23PM +0100, Andreas Krebbel wrote:
>> On 11/04/2015 02:39 AM, Dominik Vogt wrote:
>>> On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote:
 Dominik Vogt wrote:
+++ b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
@@ -0,0 +1,11 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do assemble } */
+/* { dg-options "-dP -save-temps" } */

-save-temps is not necessary for a dg-do assemble test.

+# Additional md torture tests.
+torture-init
+set MD_TEST_OPTS [list \
+   {-Os -march=z900} {-Os -march=z13} \
+   {-O0 -march=z900} {-O0 -march=z13} \
+   {-O1 -march=z900} {-O1 -march=z13} \
+   {-O2 -march=z900} {-O2 -march=z13} \
+   {-O3 -march=z900} {-O3 -march=z13}]
+set-torture-options $MD_TEST_OPTS
+gcc-dg-runtest [lsort [glob -nocomplain $md_tests]] "" $DEFAULT_CFLAGS
 torture-finish

Does it really make sense to use different -march options for the md/ tests?
Whether a certain pattern will match usually depends on the -march level.  I
would say the -march option needs to be part of the testcase.
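For illustration, pinning the architecture per testcase would look like this (a sketch, not an actual committed test):

/* { dg-do compile } */
/* { dg-options "-O3 -march=z13" } */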

-Andreas-



Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Richard Biener
On Wed, 2 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch removes flag_strict_aliasing kludge in expanding debug locations 
> and
> instead it introduces explicit parameter DEBUG that makes
> set_mem_attributes_minus_bitpos to not affect alias sets.  This is sanity
> checked by comparing number of alias sets before and after at a time we
> originally overwritten flag_strict_aliasing.
> 
> I also added code to prevent memory attributes creation for !optimize and to
> avoid get_alias_set computation for !flag_strict_aliasing. This slightly
> optimizes -O0 builds but the results seems to be down in the noise (I would 
> not
> object to leave it out).
> 
> The patch should fix at least one (latent?) bug that call_stmt expansion
> invoke expand_debug_expr without clearing flag_strict_aliasing.
> 
> Bootstrapped/regtested x86_64-linux, also tested with compare-debug, OK?

First of all, why do debug MEMs need mem-attrs?  Second, I'd rather
refactor make_decl_rtl into a _raw part that can be used from
both make_decl_rtl and make_decl_rtl_for_debug avoiding the debug
parameter.

I don't think any of this is suitable for stage3.

Thanks,
Richard.

> Honza
> 
>   * cfgexpand.c: Include alias.h
>   (expand_debug_expr): Pass debug=true to set_mem_attributes.
>   (expand_debug_locations): Do not fiddle with flag_strict_aliasing;
>   sanity check that no new alias set was introduced.
>   * varasm.c: Include alias.h
>   (make_decl_rtl): New parameter DEBUG; pass it to set_mem_attributes.
>   (make_decl_rtl_for_debug): Do not fiddle with flag_strict_aliasing;
>   assert that no new alias set was introduced.
>   * varasm.h (make_decl_rtl): New parameter debug.
>   * alias.h (num_alias_sets): New function.
>   * emit-rtl.c (set_mem_attributes_minus_bitpos): New parameter DEBUG;
>   exit early when not optimizing; do not introduce new alias set when
>   producing debug only attributes.
>   (set_mem_attributes): New parameter DEBUG.
>   * emit-rtl.h (set_mem_attributes, set_mem_attributes_minus_bitpos):
>   New parameters DEBUG.
>   (num_alias_sets): New function.
> 
> Index: cfgexpand.c
> ===
> --- cfgexpand.c   (revision 231122)
> +++ cfgexpand.c   (working copy)
> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.
>  #include "builtins.h"
>  #include "tree-chkp.h"
>  #include "rtl-chkp.h"
> +#include "alias.h"
>  
>  /* Some systems use __main in a way incompatible with its use in gcc, in 
> these
> cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN 
> to
> @@ -4178,7 +4179,7 @@ expand_debug_expr (tree exp)
>   return NULL_RTX;
> op0 = gen_rtx_CONST_STRING (Pmode, TREE_STRING_POINTER (exp));
> op0 = gen_rtx_MEM (BLKmode, op0);
> -   set_mem_attributes (op0, exp, 0);
> +   set_mem_attributes (op0, exp, 0, true);
> return op0;
>   }
>/* Fall through...  */
> @@ -4346,7 +4347,7 @@ expand_debug_expr (tree exp)
>   return NULL;
>  
>op0 = gen_rtx_MEM (mode, op0);
> -  set_mem_attributes (op0, exp, 0);
> +  set_mem_attributes (op0, exp, 0, true);
>if (TREE_CODE (exp) == MEM_REF
> && !is_gimple_mem_ref_addr (TREE_OPERAND (exp, 0)))
>   set_mem_expr (op0, NULL_TREE);
> @@ -4372,7 +4373,7 @@ expand_debug_expr (tree exp)
>  
>op0 = gen_rtx_MEM (mode, op0);
>  
> -  set_mem_attributes (op0, exp, 0);
> +  set_mem_attributes (op0, exp, 0, true);
>set_mem_addr_space (op0, as);
>  
>return op0;
> @@ -4458,7 +4459,7 @@ expand_debug_expr (tree exp)
> op0 = copy_rtx (op0);
>   if (op0 == orig_op0)
> op0 = shallow_copy_rtx (op0);
> - set_mem_attributes (op0, exp, 0);
> + set_mem_attributes (op0, exp, 0, true);
> }
>  
>   if (bitpos == 0 && mode == GET_MODE (op0))
> @@ -5219,12 +5220,11 @@ expand_debug_locations (void)
>  {
>rtx_insn *insn;
>rtx_insn *last = get_last_insn ();
> -  int save_strict_alias = flag_strict_aliasing;
>  
>/* New alias sets while setting up memory attributes cause
>   -fcompare-debug failures, even though it doesn't bring about any
>   codegen changes.  */
> -  flag_strict_aliasing = 0;
> +  int num = num_alias_sets ();
>  
>for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
>  if (DEBUG_INSN_P (insn))
> @@ -5284,7 +5284,7 @@ expand_debug_locations (void)
> avoid_complex_debug_insns (insn2, &INSN_VAR_LOCATION_LOC (insn2), 0);
>}
>  
> -  flag_strict_aliasing = save_strict_alias;
> +  gcc_checking_assert (num == num_alias_sets ());
>  }
>  
>  /* Performs swapping operands of commutative operations to expand
> Index: varasm.c
> ===
> --- varasm.c  (revision 231122)
> +++ varasm.c  (working copy)
> @@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
>  

[committed] Check for invalid FAILs

2015-12-02 Thread Richard Sandiford
This patch makes it a compile-time error for an internal-fn optab
to FAIL.  There are certainly other optabs and patterns besides these
that aren't allowed to fail, but this at least deals with the immediate
point of controversy.

Tested normally on x86_64-linux-gnu.  Also tested by building one
configuration per cpu directory.  arc-elf and pdp11 didn't build
for unrelated reasons, but I checked that insn-emit.o built for
both without error.

The patch is mostly just moving code around, so it looks bigger and
more invasive than it really is.

optabs.def isn't technically part of the gen* code, but moving
the comment describing optabs.def to optabs.def seemed obvious.
Applied.

Thanks,
Richard


gcc/
* Makefile.in (GENSUPPORT_H): New macro.
(build/gensupport.o, build/read-rtl.o, build/genattr.o)
(build/genattr-common.o, build/genattrtab.o, build/genautomata.o)
(build/gencodes.o, build/genconditions.o, build/genconfig.o)
(build/genconstants.o, build/genextract.o, build/genflags.o)
(build/gentarget-def.o): Use it.
(build/genemit.o): Likewise.  Depend on internal-fn.def.
* genopinit.c: Move block comment to optabs.def.
(optab_tag, optab_def): Move to gensupport.h
(pattern): Likewise, renaming to optab_pattern.
(match_pattern): Move to gensupport.c
(gen_insn): Use find_optab.
(patterns, pattern_cmp): Replace pattern with optab_pattern.
(main): Likewise.  Use num_optabs.
* optabs.def: Add comment that was previously in genopinit.c.
* gensupport.h (optab_tag): Moved from genopinit.c
(optab_def): Likewise, expanding commentary.
(optab_pattern): Likewise, after renaming from pattern.
(optabs, num_optabs, find_optab): Declare.
* gensupport.c (optabs): Moved from genopinit.c.
(num_optabs): New variable.
(match_pattern): Moved from genopinit.c.
(find_optab): New function, extracted from genopinit.c:gen_insn.
* genemit.c (nofail_optabs): New variable.
(emit_c_code): New function.
(gen_expand): Check whether the instruction is an optab that isn't
allowed to fail.  Call emit_c_code.
(gen_split): Call emit_c_code here too.
(main): Initialize nofail_optabs.  Don't emit FAIL and DONE here.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in 2015-12-01 09:29:39.464193624 +
+++ gcc/Makefile.in 2015-12-02 09:04:05.753188970 +
@@ -978,6 +978,7 @@ GCC_PLUGIN_H = gcc-plugin.h highlev-plug
 PLUGIN_H = plugin.h $(GCC_PLUGIN_H)
 PLUGIN_VERSION_H = plugin-version.h configargs.h
 CONTEXT_H = context.h
+GENSUPPORT_H = gensupport.h read-md.h optabs.def
 
 #
 # Now figure out from those variables how to compile and link.
@@ -2476,7 +2477,7 @@ build/version.o:  version.c version.h \
 build/errors.o : errors.c $(BCONFIG_H) $(SYSTEM_H) errors.h
 build/gensupport.o: gensupport.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h  \
   $(GTM_H) $(RTL_BASE_H) $(OBSTACK_H) errors.h $(HASHTAB_H)\
-  $(READ_MD_H) gensupport.h
+  $(READ_MD_H) $(GENSUPPORT_H)
 build/ggc-none.o : ggc-none.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h \
   $(GGC_H)
 build/min-insn-modes.o : min-insn-modes.c $(BCONFIG_H) $(SYSTEM_H) \
@@ -2487,7 +2488,7 @@ build/read-md.o: read-md.c $(BCONFIG_H)
   $(HASHTAB_H) errors.h $(READ_MD_H)
 build/read-rtl.o: read-rtl.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h  \
   $(GTM_H) $(RTL_BASE_H) $(OBSTACK_H) $(HASHTAB_H) $(READ_MD_H)
\
-  gensupport.h
+  $(GENSUPPORT_H)
 build/rtl.o: rtl.c $(BCONFIG_H) coretypes.h $(GTM_H) $(SYSTEM_H)   \
   $(RTL_H) $(GGC_H) errors.h
 build/vec.o : vec.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h $(VEC_H)  \
@@ -2509,38 +2510,38 @@ build/gencondmd.o : \
 
 # ...these are the programs themselves.
 build/genattr.o : genattr.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \
-  coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h
+  coretypes.h $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
 build/genattr-common.o : genattr-common.c $(RTL_BASE_H) $(BCONFIG_H)   \
-  $(SYSTEM_H) coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h
+  $(SYSTEM_H) coretypes.h $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
 build/genattrtab.o : genattrtab.c $(RTL_BASE_H) $(OBSTACK_H)   \
   $(BCONFIG_H) $(SYSTEM_H) coretypes.h $(GTM_H) errors.h $(GGC_H)  \
-  $(READ_MD_H) gensupport.h $(FNMATCH_H)
+  $(READ_MD_H) $(GENSUPPORT_H) $(FNMATCH_H)
 build/genautomata.o : genautomata.c $(RTL_BASE_H) $(OBSTACK_H) \
   $(BCONFIG_H) $(SYSTEM_H) coretypes.h $(GTM_H) errors.h $(VEC_H)  \
-  $(HASHTAB_H) gensupport.h $(FNMATCH_H)
+  $(HASHTAB_H) $(GENSUPPORT_H) $(FNMATCH_H)
 build/gencheck.o : gencheck.c all-tree.def $(BCONFIG_H) $(GTM_H)   \
$(SYSTEM_H) coretypes.h tree.def c-family/c-common.def  \
$(lang_tree_files) gimple.def
 build/genchecksum.o : genchecksum

Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-12-02 Thread Dominik Vogt
On Wed, Dec 02, 2015 at 09:59:10AM +0100, Andreas Krebbel wrote:
> On 11/30/2015 03:45 PM, Dominik Vogt wrote:
> > On Mon, Nov 09, 2015 at 01:33:23PM +0100, Andreas Krebbel wrote:
> >> On 11/04/2015 02:39 AM, Dominik Vogt wrote:
> >>> On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote:
>  Dominik Vogt wrote:
> > +++ b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
> > @@ -0,0 +1,11 @@
> > +/* Machine description pattern tests.  */
> > +
> > +/* { dg-do assemble } */
> > +/* { dg-options "-dP -save-temps" } */
> 
> -save-temps is not necessary for a dg-do assemble test.

It *is* necessary for "assemble", but not for "compile", which
should be used here.  Anyway, I want to upgrade the test to a
"run" test that also verifies whether the generated code does the
right thing.

> > +# Additional md torture tests.
> > +torture-init
> > +set MD_TEST_OPTS [list \
> > +   {-Os -march=z900} {-Os -march=z13} \
> > +   {-O0 -march=z900} {-O0 -march=z13} \
> > +   {-O1 -march=z900} {-O1 -march=z13} \
> > +   {-O2 -march=z900} {-O2 -march=z13} \
> > +   {-O3 -march=z900} {-O3 -march=z13}]
> > +set-torture-options $MD_TEST_OPTS
> > +gcc-dg-runtest [lsort [glob -nocomplain $md_tests]] "" $DEFAULT_CFLAGS
> >  torture-finish
> 
> Does it really make sense to use different -march options for the
> md/ tests? Whether a certain pattern will match usually depends on
> the -march level. I would say the -march option needs to be part
> of the testcase.

Agreed, but I think with "run" tests various -march= and -O
options are useful.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-02 Thread Richard Biener
On Wed, 2 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch makes the type system to be unchanged by flag_strict_aliasing.
> This is needed to prevent optimization loss in flag_strict_aliasing code where
> some !flag_strict_aliasing code put alias set 0 into a type (this happens
> in all cases I modified in my original patch). It is also necessary to 
> validate
> ipa-icf and operand_equal_p transformations to be safe for code transitions
> !flag_strict_aliasing->flag_strict_aliasing that I wanted to do in the inliner.
> 
> This patch goes the opposite way than my previous attempt (and is short unlike
> the explanation ;).  Instead of adding extra parameter to get_alias_set it
> makes get_alias_set do ignore flag_strict_aliasing.  To make sure that no TBAA
> is used when !flag_strict_aliasing I can simply disable alias_set_subset_of 
> and
> alias_sets_conflict_p which are the only way TBAA oracle can disambiguate
> items.
> 
> Next there are cases where optimizations are disabled to keep TBAA right.  
> I audited the code and found only function.c (that uses object_must_conflict
> for packing) and ipa-icf/fold-const.  This patch updates 
> objects_must_conflict_p, fold-const
> already check flag_strict_aliasing and I did not update ipa-icf because I 
> would
> have to disable non-strict-aliasing path in the followup patch.
> 
> I checked that there is no code difference with -fno-strict-aliasing 
> -fno-ipa-icf
> with this patch on tramp3d and dealII
> 
> 
> Bootstrapped/regtested x86_64-linux and also lto-bootstraped. Looks OK?
> 
>   * alias.c (alias_set_subset_of, alias_sets_conflict_p,
>   objects_must_conflict_p): Short circuit for !flag_strict_aliasing
>   (get_alias_set): Remove flag_strict_aliasing check.
>   (new_alias_set): Likewise.
> Index: alias.c
> ===
> --- alias.c   (revision 231081)
> +++ alias.c   (working copy)
> @@ -405,6 +405,10 @@ alias_set_subset_of (alias_set_type set1
>  {
>alias_set_entry *ase2;
>  
> +  /* Disable TBAA oracle with !flag_strict_aliasing.  */
> +  if (!flag_strict_aliasing)
> +return true;
> +
>/* Everything is a subset of the "aliases everything" set.  */
>if (set2 == 0)
>  return true;
> @@ -466,6 +470,10 @@ alias_sets_conflict_p (alias_set_type se
>alias_set_entry *ase1;
>alias_set_entry *ase2;
>  
> +  /* Disable TBAA oracle with !flag_strict_aliasing.  */
> +  if (!flag_strict_aliasing)
> +return true;
> +
>/* The easy case.  */
>if (alias_sets_must_conflict_p (set1, set2))
>  return 1;
> @@ -561,6 +569,9 @@ objects_must_conflict_p (tree t1, tree t
>  {
>alias_set_type set1, set2;
>  
> +  if (!flag_strict_aliasing)
> +return 1;
> +

Rather than adjusting this function please adjust 
alias_sets_must_conflict_p.

Otherwise this looks ok and indeed much nicer.

Thanks,
Richard.

>/* If neither has a type specified, we don't know if they'll conflict
>   because we may be using them to store objects of various types, for
>   example the argument and local variables areas of inlined functions.  */
> @@ -816,10 +827,12 @@ get_alias_set (tree t)
>  {
>alias_set_type set;
>  
> -  /* If we're not doing any alias analysis, just assume everything
> - aliases everything else.  Also return 0 if this or its type is
> - an error.  */
> -  if (! flag_strict_aliasing || t == error_mark_node
> +  /* We can not give up with -fno-strict-aliasing because we need to build
> + proper type representation for possible functions which are build with
> + -fstirct-aliasing.  */
> +
> +  /* return 0 if this or its type is an error.  */
> +  if (t == error_mark_node
>|| (! TYPE_P (t)
> && (TREE_TYPE (t) == 0 || TREE_TYPE (t) == error_mark_node)))
>  return 0;
> @@ -1085,15 +1098,10 @@ get_alias_set (tree t)
>  alias_set_type
>  new_alias_set (void)
>  {
> -  if (flag_strict_aliasing)
> -{
> -  if (alias_sets == 0)
> - vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> -  vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> -  return alias_sets->length () - 1;
> -}
> -  else
> -return 0;
> +  if (alias_sets == 0)
> +vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> +  vec_safe_push (alias_sets, (alias_set_entry *) NULL);
> +  return alias_sets->length () - 1;
>  }
>  
>  /* Indicate that things in SUBSET can alias things in SUPERSET, but that


Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 10:05:13AM +0100, Richard Biener wrote:
> On Wed, 2 Dec 2015, Jan Hubicka wrote:
> 
> > Hi,
> > this patch removes flag_strict_aliasing kludge in expanding debug locations 
> > and
> > instead it introduces explicit parameter DEBUG that makes
> > set_mem_attributes_minus_bitpos to not affect alias sets.  This is sanity
> > checked by comparing number of alias sets before and after at a time we
> > originally overwritten flag_strict_aliasing.
> > 
> > I also added code to prevent memory attributes creation for !optimize and to
> > avoid get_alias_set computation for !flag_strict_aliasing. This slightly
> > optimizes -O0 builds but the results seems to be down in the noise (I would 
> > not
> > object to leave it out).
> > 
> > The patch should fix at least one (latent?) bug that call_stmt expansion
> > invoke expand_debug_expr without clearing flag_strict_aliasing.
> > 
> > Bootstrapped/regtested x86_64-linux, also tested with compare-debug, OK?
> 
> First of all, why do debug MEMs need mem-attrs?

For aliasing purposes, like any other MEMs.  var-tracking needs to be able
to find out whether some store through, say, some pointer could alias a certain
debug MEM (then we need to flush the corresponding VALUEs) or not (then that
VALUE can still be considered live at that MEM location).
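As a rough illustration (not the actual var-tracking code, which is more involved), the question boils down to an alias query against the debug MEM:

/* Sketch only: decide whether a store may clobber the location a debug
   MEM describes; without MEM attributes the answer degrades to "maybe"
   far more often.  */
static bool
store_may_clobber_debug_mem_p (rtx store_mem, rtx debug_mem)
{
  return true_dependence (store_mem, GET_MODE (store_mem), debug_mem);
}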

Jakub


Re: -fstrict-aliasing fixes 4/6: do not fiddle with flag_strict_aliasing when expanding debug locations

2015-12-02 Thread Richard Biener
On Wed, 2 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 02, 2015 at 10:05:13AM +0100, Richard Biener wrote:
> > On Wed, 2 Dec 2015, Jan Hubicka wrote:
> > 
> > > Hi,
> > > this patch removes flag_strict_aliasing kludge in expanding debug 
> > > locations and
> > > instead it introduces explicit parameter DEBUG that makes
> > > set_mem_attributes_minus_bitpos to not affect alias sets.  This is sanity
> > > checked by comparing number of alias sets before and after at a time we
> > > originally overwritten flag_strict_aliasing.
> > > 
> > > I also added code to prevent memory attributes creation for !optimize and 
> > > to
> > > avoid get_alias_set computation for !flag_strict_aliasing. This slightly
> > > optimizes -O0 builds but the results seems to be down in the noise (I 
> > > would not
> > > object to leave it out).
> > > 
> > > The patch should fix at least one (latent?) bug that call_stmt expansion
> > > invoke expand_debug_expr without clearing flag_strict_aliasing.
> > > 
> > > Bootstrapped/regtested x86_64-linux, also tested with compare-debug, OK?
> > 
> > First of all, why do debug MEMs need mem-attrs?
> 
> For aliasing purposes, like any other MEMs.  var-tracking needs to be able
> to find out if some store through say some pointer could alias certain debug
> MEM (then we need to flush the corresponding VALUEs), or not (then that
> VALUE can be still considered live at that MEM location).

Ok, so we then pessimize the alias-set for all of them (when they
don't have a DECL_RTL already) to use alias-set zero even if we
wouldn't need to create a new alias set?  (and even if I don't see
what issues that would cause apart from "messing up the dumps" which
should be fixed by stripping alias sets from dumps)

Richard.


Re: [gomp4] Adjust Fortran OACC async lib test

2015-12-02 Thread Chung-Lin Tang
Ping.

Hi Thomas, this is only for gomp4 ATM, okay to commit?

Thanks,
Chung-Lin

On 2015/11/23 7:09 PM, Chung-Lin Tang wrote:
> Hi Thomas,
> this fix adds more acc_wait's to libgomp.oacc-fortran/lib-1[23].f90.
> 
> For lib-12.f90, it's sort of a fix before we can resolve the issue
> of intended semantics for "wait+async".
> 
> As for lib-13.f90, I believe these added acc_wait calls are
> reasonable, since we can't immediately assume the async-launched parallels
> have already completed there.
> 
> Does this seem reasonable?
> 
> Thanks,
> Chung-Lin
> 
>   * testsuite/libgomp.oacc-fortran/lib-12.f90 (main): Add acc_wait()
>   after async parallel construct.
>   * testsuite/libgomp.oacc-fortran/lib-13.f90 (main): Add acc_wait()
>   calls after parallel construct launches.
> 



Re: [PATCH, C++] Wrap OpenACC wait in EXPR_STMT

2015-12-02 Thread Chung-Lin Tang
Ping.

On 2015/11/23 9:15 PM, Chung-Lin Tang wrote:
> The OpenACC wait directive is represented as a call to the runtime
> function "GOACC_wait" instead of a tree code.  I am seeing that when
> '#pragma acc wait' is used inside a template function, the CALL_EXPR
> to GOACC_wait is silently ignored/removed during tsubst_expr().
> 
> I think the correct way to organize this is that the call should be inside
> an EXPR_STMT, so here's a patch to do that; basically remove the
> add_stmt() call from the shared c_finish_oacc_wait() code, and add
> add_stmt()/finish_expr_stmt() in the corresponding C/C++ parts.
> 
> Tested with no regressions on trunk, okay to commit?
> 
> Thanks,
> Chung-Lin
> 
>   * c-family/c-omp.c (c_finish_oacc_wait): Remove add_stmt() call.
>   * c/c-parser.c (c_parser_oacc_wait): Add add_stmt() call.
>   * cp/parser.c (cp_parser_oacc_wait): Add finish_expr_stmt() call.
> 



Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 12:46:47AM +0100, Tom de Vries wrote:
> Does this explanation address your concern?

Yeah, for now it is fine I hope.

Jakub


Re: [PATCH 1/2] destroy values as well as keys when removing them from hash maps

2015-12-02 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 2 Dec 2015, Trevor Saunders wrote:
>> On Tue, Dec 01, 2015 at 07:43:35PM +, Richard Sandiford wrote:
>> > tbsaunde+...@tbsaunde.org writes:
>> > > -template 
>> > > +template 
>> > >  template 
>> > >  inline void
>> > > -simple_hashmap_traits ::remove (T &entry)
>> > > +simple_hashmap_traits ::remove (T &entry)
>> > >  {
>> > >H::remove (entry.m_key);
>> > > +  entry.m_value.~Value ();
>> > >  }
>> > 
>> > This is just repeating my IRC comment really, but doesn't this mean that
>> > we're calling the destructor on an object that was never constructed?
>> > I.e. nothing ever calls placement new on the entry, the m_key, or the
>> > m_value.
>> 
>> I believe you are correct that placement new is not called.  I'd say its
>> a bug waiting to happen given that the usage of auto_vec seems to
>> demonstrate that people expect objects to be initialized and destroyed.
>> However for now all values are either POD, or auto_vec and in either
>> case the current 0 initialization has the same effect as the
>> constructor.  So There may be a theoretical problem with how we
>> initialize values that will become real when somebody adds a constructor
>> that doesn't just 0 initialize.  So it should probably be improved at
>> some point, but it doesn't seem necessary to mess with it at this point
>> instead of next stage 1.
>
> Agreed.

OK.  I was just worried that (IIRC) we had cases where for:

 a.~foo ()
 a.x = ...;

the assignment to a.x was removed as dead since the object had been
destroyed.  Maybe that could happen again if we don't have an explicit
constructor to create a new object.

Thanks,
Richard

> You'll also need a more elaborate allocator/constructor
> scheme for this considering the case where no default constructor
> is available.  See how alloc-pool.h tries to dance around this
> using a "raw" allocate and a operator new...
>
> Richard.



Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-02 Thread Richard Biener
On Tue, 1 Dec 2015, Gary Funck wrote:

> On 12/01/15 12:12:29, Richard Biener wrote:
> > On Mon, 30 Nov 2015, Gary Funck wrote:
> > > At this time, we would like to re-submit the UPC patches for comment
> > > with the goal of introducing these changes into GCC 6.0.
> >
> >  First of all let me say that it is IMNSHO now too late for GCC 6.
> 
> I realize that stage 1 recently closed, and that if UPC were
> accepted for inclusion, it would be an exception.  To offset
> potential risk, we perform weekly merges and run a large suite
> of tests and apps. on differing hosts/cpu architectures.
> We have also tried to follow the sorts of re-factoring and C++
> changes made over the course of the last year/so.  I'd just ask
> that the changes be given some further consideration for 6.0.
> 
> > You claim bits in tree_base - are those bits really used for
> > all tree kinds?  The qualifiers look type specific where
> > eventually FE specific flags in type-lang-specific parts could
> > have been used (yeah, there are no spare bits in tree_type_*).
> > Similar the _factor stuff should not be on all tree kinds.
> 
> When we first started building the gupc branch, it was suggested
> that UPC be implemented as a separate language ala ObjC.
> In that case, we used "language bits".  Over time, this approach
> fell out of favor, and we were asked to move everything into
> the C front-end and middle-end, making compilation contingent
> upon -fupc, which is the way it is now.  Also, over the past
> couple of years, there has been work to minimize the number of
> bits used by tree nodes, so some additional changes were needed.
> 
> The main change recommended to reduce tree space was moving the
> "layout factor" (blocking factor) out of the tree node, and using
> only two bits there, one bit for a relatively common case of 0,
> and the other for > 1.  It was suggested that we use a hash
> table to map tree nodes to layout qualifiers for the case they
> are > 1.  This necessitated using a garbage collected tree map,
> which unfortunately meant that tree nodes needed special garbage
> collection logic.

I still don't see why it needs special garbage collection logic.
We have many tree -> X maps that just get away without.

> It is worth noting that the "layout qualifier" is an integral
> constant, currently represented as a tree node reference.
> It might be possible to represent it as a "wide int" instead.
> I did give that a go once, but it rippled through the code
> making some things awkward.  Perhaps not as awkward as a
> custom tree node GC routine; this could be re-visited.

As said, I don't see why you need a special GC collection logic
at all.  Please explain.

> > I find the names used a bit unspecific, please consider
> > prefixing them with upc_ (esp. shared_flag may be confused
> > with the similar private_flag).
> 
> When we previously asked for a review, it was noted that
> if the UPC bits were moved into what amounts to common/generic
> tree node fields that we should drop UPC_ or upc_ from the
> related node names and functions.  That's what we did.
> There is some middle ground, for example, where only
> TYPE_SHARED_P() is renamed to UPC_SHARED_TYPE_P()
> and the rest remain as is.
> 
> Since renames are straight forward, we can make any
> recommended changes quickly.
> 
> Originally, we were keeping the door open for UPC++, but
> there are complications with generalizing C++ into a multi-node
> environment, and that idea has been tabled for now.
> Therefore, the current structure/implementation is C only,
> with most of the new front-end/middle-end logic under
> the c/ directory.
> 
> > Are these and the new tree codes below living beyond the time
> > the frontend is in control?  That is, do they need to survive
> > throughout the middle-end?
> 
> I'm not sure where the line is drawn between the front-end and the middle-end.
> After upc_genericize() runs (just before c_genericize())
> all operations on tree nodes that are UPC-specific are lowered
> into operations on the internal representation of a pointer-to-shared
> and/or runtime calls that operate on the internal representation.
> The pointer-to-shared values/types still show up
> in the tree, but only as containers (pointers-to-shared
> are typically 2x the size of a regular "C" pointer).

The line between FE and middle-end is indeed a bit of a grey area.
I am considering everything after gimplification middle-end.  This
means that UPC lowering is done in the frontend.  And if indeed
none of the "special" pointers survive to middle-end code then
more of the bits needed could go into on-the-side structures.

I'd have done a hash_map and put all of the UPC state in there.  And just
hash the tree by pointer.
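For illustration, the kind of on-the-side table meant here might look roughly like the following; the struct, the map name and the accessors are assumptions, not existing GCC or GUPC code:

/* Hypothetical per-type UPC state, keyed by the tree pointer itself.  */
struct upc_type_info
{
  tree block_factor;	/* layout (blocking) factor, if any */
};

static hash_map<tree, upc_type_info> *upc_type_info_map;

static void
upc_set_block_factor (tree type, tree factor)
{
  if (!upc_type_info_map)
    upc_type_info_map = new hash_map<tree, upc_type_info> (37);
  upc_type_info_map->get_or_insert (type).block_factor = factor;
}

static tree
upc_get_block_factor (tree type)
{
  upc_type_info *info
    = upc_type_info_map ? upc_type_info_map->get (type) : NULL;
  return info ? info->block_factor : NULL_TREE;
}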

I realize that by using the C frontend you think you need to
make the UPC types variants of the C type.  I'm questioning that
but don't know too much about the issue (not needing that would
avoid exposing the UPC types to tree.[ch]).

Of course much of the mi

Re: [PATCH] Add testcase for tree-optimization/64769

2015-12-02 Thread Richard Biener
On Tue, Dec 1, 2015 at 4:38 PM, Marek Polacek  wrote:
> There's an open PR with an -fopenmp-simd testcase that used to ICE but is now
> fixed for 5/6, but not 4.9.
>
> Should I commit this right away to trunk, wait for gcc-5 branch to open and
> then commit it to 5 as well and then close the PR?

Ok for GCC 5.

Richard.

> Or just to trunk and close the PR?
>
> Tested on x86_64-linux.
>
> 2015-12-01  Marek Polacek  
>
> PR tree-optimization/64769
> * c-c++-common/gomp/pr64769.c: New test.
>
> diff --git gcc/testsuite/c-c++-common/gomp/pr64769.c 
> gcc/testsuite/c-c++-common/gomp/pr64769.c
> index e69de29..3a30149 100644
> --- gcc/testsuite/c-c++-common/gomp/pr64769.c
> +++ gcc/testsuite/c-c++-common/gomp/pr64769.c
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/64769 */
> +/* { dg-do compile } */
> +/* { dg-options "-fopenmp-simd" } */
> +
> +#pragma omp declare simd linear(i)
> +void
> +foo (int i)
> +{
> +}
>
> Marek


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-02 Thread Jakub Jelinek
On Fri, Nov 27, 2015 at 12:42:09PM +0100, Tom de Vries wrote:
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1366,10 +1366,12 @@ build_sender_ref (tree var, omp_context *ctx)
>return build_sender_ref ((splay_tree_key) var, ctx);
>  }
>  
> -/* Add a new field for VAR inside the structure CTX->SENDER_DECL.  */
> +/* Add a new field for VAR inside the structure CTX->SENDER_DECL.  If
> +   BASE_POINTERS_RESTRICT, declare the field with restrict.  */
>  
>  static void
> -install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
> +install_var_field_1 (tree var, bool by_ref, int mask, omp_context *ctx,
> +  bool base_pointers_restrict)

Ugh, why the renaming?  Just use default argument:
bool base_pointers_restrict = false

> +/* As install_var_field_1, but with base_pointers_restrict == false.  */
> +
> +static void
> +install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
> +{
> +  install_var_field_1 (var, by_ref, mask, ctx, false);
> +}

And avoid the wrapper.
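A sketch of the shape being asked for (an illustration of the default-argument suggestion, with the body elided; not the actual omp-low.c code):

/* Default argument instead of an install_var_field_1 helper plus a
   wrapper; the body stays exactly as in the original function.  */
static void
install_var_field (tree var, bool by_ref, int mask, omp_context *ctx,
		   bool base_pointers_restrict = false)
{
  /* ... unchanged body of install_var_field ...  */
}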

>  /* Instantiate decls as necessary in CTX to satisfy the data sharing
> -   specified by CLAUSES.  */
> +   specified by CLAUSES.  If BASE_POINTERS_RESTRICT, install var field with
> +   restrict.  */
>  
>  static void
> -scan_sharing_clauses (tree clauses, omp_context *ctx)
> +scan_sharing_clauses_1 (tree clauses, omp_context *ctx,
> + bool base_pointers_restrict)

Likewise.

Otherwise LGTM, but I'm worried that this might be related in some way to
PR68640 and might make things worse.

Jakub


Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-02 Thread Richard Biener
On Tue, Dec 1, 2015 at 10:32 PM, Jeff Law  wrote:
> On 10/09/2015 09:45 AM, Jeff Law wrote:
>>>
>>> Yes, but as you remove jump threading paths you could leave the CFG
>>> change to
>>> cfg-cleanup anyway?  To get better behavior wrt loop fixup at least?
>>
>> So go ahead and detect, remove the threading paths, but leave final
>> fixup to cfg-cleanup.  I can certainly try that.
>>
>> It'd actually be a good thing to experiement with regardless -- I've
>> speculated that removing the edges in DOM allows DOM to do a better job,
>> but never did the instrumentation to find out for sure.  Deferring the
>> final cleanup like you've suggested ought to give me most of what I'd
>> want to see if there's really any good secondary effects of cleaning up
>> the edges in DOM.
>
> So I started looking at this in response to 68619, where this approach does
> indeed solve the problem.
>
> Essentially DOM's optimization of those edges results in two irreducible
> loops becoming reducible.  The loop analysis code then complains because we
> don't have proper loop structures for the new natural loops.
>
> Deferring to cfg_cleanup works because if cfg_cleanup does anything, it sets
> LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems that
> the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
> that valuable in the end.  Anyway...

Yeah, I have partially working patches lying around to "fix" CFG cleanup to
avoid this.  Of course in the case of new loops appearing that's not easily
possible.

> There's some fallout which I'm still exploring.  For example, we have cases
> where removal of the edge by DOM results in removal of a PHI argument in the
> target, which in turn results in the PHI becoming a degenerate which we can
> then propagate away.  I have a possible solution for this that I'm playing
> with.
>
> I suspect the right path is to continue down this path.

Yeah, the issue here is that DOM isn't tracking which edges are executable
to handle merge PHIs (or to avoid doing work in unreachable regions).  It should
be possible to make it do that, much like I extended SCCVN to do this
(when doing the DOM walk, see if any incoming edge is marked executable
and, if not, mark all outgoing edges as not executable; if the block is
executable, then at the time we process the last stmt determine whether we
can compute the edge that ends up always executed and mark all others as
not executable).
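A hedged sketch of that executable-edge bookkeeping (it follows GCC conventions but is not the SCCVN or DOM code itself):

/* Return true if BB is reached over at least one executable edge.  */
static bool
block_reachable_p (basic_block bb)
{
  edge e;
  edge_iterator ei;

  if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
    return true;
  FOR_EACH_EDGE (e, ei, bb->preds)
    if (e->flags & EDGE_EXECUTABLE)
      return true;
  return false;
}

/* If BB is unreachable, mark its outgoing edges as not executable so
   the walk keeps ignoring the region below it.  */
static void
maybe_skip_block (basic_block bb)
{
  edge e;
  edge_iterator ei;

  if (!block_reachable_p (bb))
    FOR_EACH_EDGE (e, ei, bb->succs)
      e->flags &= ~EDGE_EXECUTABLE;
}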

Richard.

> Jeff
>
>


Re: Go patch committed: Fix array dimension handling on 32-bit host

2015-12-02 Thread Richard Biener
On Wed, Dec 2, 2015 at 2:28 AM, Ian Lance Taylor  wrote:
> The Go frontend code that handled array dimensions when generating
> reflection and mangling assumed that an array dimension would fit in
> an unsigned long.  That is of course not true when a 32-bit host is
> cross-compiling to a 64-bit target.  This patch fixes the problem.
> This was reported as GCC PR 65717.  Bootstrapped and ran Go tests on
> x86_64-pc-linux-gnu, and also on a 32-bit Solaris host crossing to a
> 64-bit Solaris target.  Committed to mainline.  Could be committed to
> GCC 5 branch but I'm not sure whether the branch is open yet.

It is not, it will be again after GCC 5.3.0 was released next week.

Richard.

> Ian


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-02 Thread Jakub Jelinek
On Fri, Nov 27, 2015 at 01:03:52PM +0100, Tom de Vries wrote:
> Handle non-declared variables in kernels alias analysis
> 
> 2015-11-27  Tom de Vries  
> 
>   * gimplify.c (gimplify_scan_omp_clauses): Initialize
>   OMP_CLAUSE_ORIG_DECL.
>   * omp-low.c (install_var_field_1): Handle base_pointers_restrict for
>   pointers.
>   (map_ptr_clause_points_to_clause_p)
>   (nr_map_ptr_clauses_pointing_to_clause): New function.
>   (omp_target_base_pointers_restrict_p): Handle GOMP_MAP_POINTER.
>   * tree-pretty-print.c (dump_omp_clause): Print OMP_CLAUSE_ORIG_DECL.
>   * tree.c (omp_clause_num_ops): Set num_ops for OMP_CLAUSE_MAP to 3.
>   * tree.h (OMP_CLAUSE_ORIG_DECL): New macro.
> 
>   * c-c++-common/goacc/kernels-alias-10.c: New test.
>   * c-c++-common/goacc/kernels-alias-9.c: New test.

I don't like this (mainly the addition of OMP_CLAUSE_ORIG_DECL),
but it also sounds wrong to me.
The primary question is how do you handle GOMP_MAP_POINTER
(which is something we don't use for C/C++ OpenMP anymore,
and Fortran OpenMP will stop using it in GCC 7 or 6.2?) on the OpenACC
libgomp side, does it work like GOMP_MAP_ALLOC or GOMP_MAP_FORCE_ALLOC?
Similarly GOMP_MAP_TO_PSET.  If it works like GOMP_MAP_ALLOC (it does
on the OpenMP side in target.c, so if something is already mapped, no
further pointer assignment happens), then your change looks wrong.
If it works like GOMP_MAP_FORCE_ALLOC, then you should just treat
GOMP_MAP_POINTER on all OpenACC constructs as an opcode that allows the
restrict operation.  If it should behave differently depending on
if the corresponding array section has been mapped with GOMP_MAP_FORCE_*
or without it, then supposedly you should use a different code for
those two.

Jakub


Re: PR68146: Check for null SSA_NAME_DEF_STMTs in fold-const.c

2015-12-02 Thread Richard Biener
On Wed, Dec 2, 2015 at 9:33 AM, Richard Sandiford
 wrote:
> The problem in the testcase was that tree-complex.c was trying
> to fold ABS_EXPRs of SSA names that didn't yet have a definition
> (because the SSA names were real and imaginary parts of a complex
> SSA name whose definition hadn't yet been visited by the pass).
> tree-complex.c uses a straightforward walk in index order:
>
>   /* ??? Ideally we'd traverse the blocks in breadth-first order.  */
>   old_last_basic_block = last_basic_block_for_fn (cfun);
>   FOR_EACH_BB_FN (bb, cfun)
> {
>
> and in the testcase, we have a block A with a single successor B that
> comes before it.  B has no other predecessor and has a complex division
> that uses an SSA name X defined in A, so we split the components of X
> before we reach the definition of X.  (I imagine cfgcleanup would
> clean this up by joining A and B.)
>
> Tested on x86_64-linux-gnu.  OK to install?

I think the new checks are just bogus because all SSA names in the IL ought
to have a def-stmt (if only a GIMPLE_NOP).

So I think what tree-complex.c does is just wrong(TM) and worked by
accident only.

So I suggest instead computing a proper CFG order and processing basic
blocks in that order.  (A domwalk doesn't work, as explained in the PR.)
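
Something along these lines (an untested sketch using the existing cfganal
helpers; the per-BB body would be the existing lowering code) instead of
the plain FOR_EACH_BB_FN walk:

  /* Process blocks in reverse post-order so a def is seen before its
     (non-PHI) uses; sketch only.  */
  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
  int n = pre_and_rev_post_order_compute (NULL, rpo, false);
  for (int i = 0; i < n; i++)
    {
      basic_block bb = BASIC_BLOCK_FOR_FN (cfun, rpo[i]);
      /* ... existing per-BB / per-stmt lowering ...  */
    }
  free (rpo);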

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> PR tree-optimization/68146
> * fold-const.c (tree_binary_nonnegative_warnv_p): Check for null
> SSA_NAME_DEF_STMTs.
> (integer_valued_real_call_p): Likewise.
>
> gcc/testsuite/
> * gfortran.fortran-torture/compile/pr68146.f90: New test.
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 16bff5f..c99e78e 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -12867,6 +12867,8 @@ tree_binary_nonnegative_warnv_p (enum tree_code code, 
> tree type, tree op0,
>  bool
>  tree_single_nonnegative_warnv_p (tree t, bool *strict_overflow_p, int depth)
>  {
> +  gimple *stmt;
> +
>if (TYPE_UNSIGNED (TREE_TYPE (t)))
>  return true;
>
> @@ -12892,8 +12894,9 @@ tree_single_nonnegative_warnv_p (tree t, bool 
> *strict_overflow_p, int depth)
>  to provide it through dataflow propagation.  */
>return (!name_registered_for_update_p (t)
>   && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
> - && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t),
> - strict_overflow_p, depth));
> + && (stmt = SSA_NAME_DEF_STMT (t))
> + && gimple_stmt_nonnegative_warnv_p (stmt, strict_overflow_p,
> + depth));
>
>  default:
>return tree_simple_nonnegative_warnv_p (TREE_CODE (t), TREE_TYPE (t));
> @@ -13508,6 +13511,7 @@ integer_valued_real_call_p (combined_fn fn, tree 
> arg0, tree arg1, int depth)
>  bool
>  integer_valued_real_single_p (tree t, int depth)
>  {
> +  gimple *stmt;
>switch (TREE_CODE (t))
>  {
>  case REAL_CST:
> @@ -13524,8 +13528,8 @@ integer_valued_real_single_p (tree t, int depth)
>  to provide it through dataflow propagation.  */
>return (!name_registered_for_update_p (t)
>   && depth < PARAM_VALUE (PARAM_MAX_SSA_NAME_QUERY_DEPTH)
> - && gimple_stmt_integer_valued_real_p (SSA_NAME_DEF_STMT (t),
> -   depth));
> + && (stmt = SSA_NAME_DEF_STMT (t))
> + && gimple_stmt_integer_valued_real_p (stmt, depth));
>
>  default:
>break;
> diff --git a/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90 
> b/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90
> new file mode 100644
> index 000..7f75ec0
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.fortran-torture/compile/pr68146.f90
> @@ -0,0 +1,11 @@
> +subroutine foo(a1, a2, s1, s2, n)
> +  integer :: n
> +  complex(kind=8) :: a1(n), a2(n), s1, s2
> +  do i = 1, n
> + a1(i) = i
> +  end do
> +  s1 = 20.0 / s2
> +  do i = 1, n
> + a2(i) = i / s2
> +  end do
> +end subroutine foo
>


Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-12-02 Thread Dominik Vogt
On Wed, Dec 02, 2015 at 10:11:54AM +0100, Dominik Vogt wrote:
> On Wed, Dec 02, 2015 at 09:59:10AM +0100, Andreas Krebbel wrote:
> > On 11/30/2015 03:45 PM, Dominik Vogt wrote:
> > > On Mon, Nov 09, 2015 at 01:33:23PM +0100, Andreas Krebbel wrote:
> > >> On 11/04/2015 02:39 AM, Dominik Vogt wrote:
> > >>> On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote:
> >  Dominik Vogt wrote:
> > > +++ b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
> > > @@ -0,0 +1,11 @@
> > > +/* Machine description pattern tests.  */
> > > +
> > > +/* { dg-do assemble } */
> > > +/* { dg-options "-dP -save-temps" } */
> > 
> > -save-temps is not necessary for a dg-do assemble test.
> 
> It *is* necessary for "assemble", but not for "compile" which
> should be used here.  Anyway, I want to upgrade the test to a
> "run" test that also veryfies whether the generated code does the
> right thing.
> 
> > > +# Additional md torture tests.
> > > +torture-init
> > > +set MD_TEST_OPTS [list \
> > > + {-Os -march=z900} {-Os -march=z13} \
> > > + {-O0 -march=z900} {-O0 -march=z13} \
> > > + {-O1 -march=z900} {-O1 -march=z13} \
> > > + {-O2 -march=z900} {-O2 -march=z13} \
> > > + {-O3 -march=z900} {-O3 -march=z13}]
> > > +set-torture-options $MD_TEST_OPTS
> > > +gcc-dg-runtest [lsort [glob -nocomplain $md_tests]] "" $DEFAULT_CFLAGS
> > >  torture-finish
> > 
> > Does it really make sense to use different -march options for the
> > md/ tests? Whether a certain pattern will match usually depends on
> > the -march level. I would say the -march option needs to be part
> > of testcase.
> 
> Agreed, but I think with "run" tests various -march= and -O
> options are useful.

Version 4 of the patch attached (enhanced test case).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("movstr", "*movstr"): Fix warning.
("movstr"): New indirect expanders used by "movstr".

testsuite/ChangeLog

* gcc.target/s390/md/movstr-1.c: New test.
* gcc.target/s390/s390.exp: Add subdir md.
Do not run hotpatch tests twice.
>From 552c1d416ef33acaa2a119149238666a171cbb1b Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 3 Nov 2015 18:03:02 +0100
Subject: [PATCH] S/390: Fix warning in "*movstr" pattern.

---
 gcc/config/s390/s390.md | 20 +---
 gcc/testsuite/gcc.target/s390/md/movstr-1.c | 29 +
 gcc/testsuite/gcc.target/s390/s390.exp  | 25 -
 3 files changed, 66 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/movstr-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e5db537..7eca315 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2910,13 +2910,27 @@
 ;
 
 (define_expand "movstr"
+  ;; The pattern is never generated.
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")]
+  ""
+{
+  if (TARGET_64BIT)
+emit_insn (gen_movstrdi (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_movstrsi (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "movstr"
   [(set (reg:SI 0) (const_int 0))
(parallel
 [(clobber (match_dup 3))
  (set (match_operand:BLK 1 "memory_operand" "")
 	  (match_operand:BLK 2 "memory_operand" ""))
- (set (match_operand 0 "register_operand" "")
-	  (unspec [(match_dup 1)
+ (set (match_operand:P 0 "register_operand" "")
+	  (unspec:P [(match_dup 1)
 		   (match_dup 2)
 		   (reg:SI 0)] UNSPEC_MVST))
  (clobber (reg:CC CC_REGNUM))])]
@@ -2937,7 +2951,7 @@
(set (mem:BLK (match_operand:P 1 "register_operand" "0"))
 	(mem:BLK (match_operand:P 3 "register_operand" "2")))
(set (match_operand:P 0 "register_operand" "=d")
-	(unspec [(mem:BLK (match_dup 1))
+	(unspec:P [(mem:BLK (match_dup 1))
 		 (mem:BLK (match_dup 3))
 		 (reg:SI 0)] UNSPEC_MVST))
(clobber (reg:CC CC_REGNUM))]
diff --git a/gcc/testsuite/gcc.target/s390/md/movstr-1.c b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
new file mode 100644
index 000..2fce743
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/md/movstr-1.c
@@ -0,0 +1,29 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do run } */
+/* { dg-options "-dP" } */
+
+void test(char *dest, const char *src)
+{
+  __builtin_stpcpy (dest, src);
+}
+
+/* { dg-final { scan-assembler-times {{[*]movstr}} 1 } } */
+
+#include <string.h>
+#include <stdio.h>
+
+#define LEN 200
+char buf[LEN];
+
+int main(void)
+{
+  memset(buf, 0, LEN);
+  test(buf, "hello world!");
+  if (strcmp(buf, "hello world!") != 0)
+{
+  fprintf(stderr, "error: test() failed\n");
+  return 1;
+}
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index 0b8f80ed..0d7a7eb 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -61,20 +61,35 @@ if ![info exists DEFAULT_CFLAGS] then {
 # Initialize `dg'.
 dg-init

Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-02 Thread Dominik Vogt
Hopefully, this is correct now; it does pass the functional test case
that's part of the patch.  Unfortunately the define_insn patterns
had to be duplicated because of the new subreg offsets.  Not sure
whether I've missed any "use" patterns that should be added.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.c (s390_expand_setmem): Use new expanders.
* config/s390/s390.md ("*setmem_long")
("*setmem_long_and", "*setmem_long_31z"): Fix warnings.
("*setmem_long_and_64", "*setmem_long_and_31", "*setmem_long_and_31z")
("*setmem_long_64", "*setmem_long_31"): Renamed and duplicated.
("setmem_long_"): New expanders.
("setmem_long"): Removed.

gcc/testsuite/ChangeLog

* gcc.target/s390/md/setmem_long-1.c: New test.
>From 922d200afbe8493e62b0ffb300fbac11356469c8 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 4 Nov 2015 03:16:24 +0100
Subject: [PATCH 1/1.5] S/390: Fix warnings in "*setmem_long..." patterns.

---
 gcc/config/s390/s390.c   |  7 +-
 gcc/config/s390/s390.md  | 89 ++--
 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c | 64 +
 3 files changed, 138 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 7e7ed45..1a77437 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5203,7 +5203,12 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
   else if (TARGET_MVCLE)
 {
   val = force_not_mem (convert_modes (Pmode, QImode, val, 1));
-  emit_insn (gen_setmem_long (dst, convert_to_mode (Pmode, len, 1), val));
+  if (TARGET_64BIT)
+	emit_insn (gen_setmem_long_di (dst, convert_to_mode (Pmode, len, 1),
+   val));
+  else
+	emit_insn (gen_setmem_long_si (dst, convert_to_mode (Pmode, len, 1),
+   val));
 }
 
   else
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 7eca315..27e5c7f 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -70,6 +70,9 @@
; Copy CC as is into the lower 2 bits of an integer register
UNSPEC_CC_TO_INT
 
+   ; Convert Pmode to BLKmode
+   UNSPEC_REPLICATE_BYTE
+
; GOT/PLT and lt-relative accesses
UNSPEC_LTREL_OFFSET
UNSPEC_LTREL_BASE
@@ -3281,12 +3284,12 @@
 
 ; Initialize a block of arbitrary length with (operands[2] % 256).
 
-(define_expand "setmem_long"
+(define_expand "setmem_long_"
   [(parallel
 [(clobber (match_dup 1))
  (set (match_operand:BLK 0 "memory_operand" "")
-  (match_operand 2 "shift_count_or_setmem_operand" ""))
- (use (match_operand 1 "general_operand" ""))
+	  (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
+		  (match_dup 4)] UNSPEC_REPLICATE_BYTE))
  (use (match_dup 3))
  (clobber (reg:CC CC_REGNUM))])]
   ""
@@ -3307,30 +3310,29 @@
   operands[0] = replace_equiv_address_nv (operands[0], addr0);
   operands[1] = reg0;
   operands[3] = reg1;
+  operands[4] = gen_lowpart (Pmode, operands[1]);
 })
 
-(define_insn "*setmem_long"
-  [(clobber (match_operand: 0 "register_operand" "=d"))
-   (set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
-(match_operand 2 "shift_count_or_setmem_operand" "Y"))
-   (use (match_dup 3))
-   (use (match_operand: 1 "register_operand" "d"))
+(define_insn "*setmem_long_64"
+  [(clobber (match_operand:TI 0 "register_operand" "=d"))
+   (set (mem:BLK (subreg:DI (match_operand:TI 3 "register_operand" "0") 0))
+(unspec:BLK [(match_operand:DI 2 "shift_count_or_setmem_operand" "Y")
+		 (subreg:DI (match_dup 3) 8)] UNSPEC_REPLICATE_BYTE))
+   (use (match_operand:TI 1 "register_operand" "d"))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_64BIT || !TARGET_ZARCH"
+  "TARGET_64BIT"
   "mvcle\t%0,%1,%Y2\;jo\t.-4"
   [(set_attr "length" "8")
(set_attr "type" "vs")])
 
-(define_insn "*setmem_long_and"
-  [(clobber (match_operand: 0 "register_operand" "=d"))
-   (set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
-(and (match_operand 2 "shift_count_or_setmem_operand" "Y")
-	 (match_operand 4 "const_int_operand" "n")))
-   (use (match_dup 3))
-   (use (match_operand: 1 "register_operand" "d"))
+(define_insn "*setmem_long_31"
+  [(clobber (match_operand:DI 0 "register_operand" "=d"))
+   (set (mem:BLK (subreg:SI (match_operand:DI 3 "register_operand" "0") 0))
+(unspec:BLK [(match_operand:SI 2 "shift_count_or_setmem_operand" "Y")
+		 (subreg:SI (match_dup 3) 4)] UNSPEC_REPLICATE_BYTE))
+   (use (match_operand:DI 1 "register_operand" "d"))
(clobber (reg:CC CC_REGNUM))]
-  "(TARGET_64BIT || !TARGET_ZARCH) &&
-   (INTVAL (operands[4]) & 255) == 255"
+  "!TARGET_64BIT && !TARGET_ZARCH"
   "mvcle\t%0,%1,%Y2\;jo\t.-4"
   [(set_attr "length" "8")
(set_attr "type" "vs")])
@@ -3338,8 +3340,8 @@
 (define_insn "

[PATCH] Add testcase for PR middle-end/68570

2015-12-02 Thread Marek Polacek
This PR got fixed along with PR68625, so I'd like to add the testcase and close
the bug.

Tested on x86_64-linux, ok for trunk?

2015-12-02  Marek Polacek  

PR middle-end/68570
* gcc.dg/torture/pr68570.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr68570.c 
gcc/testsuite/gcc.dg/torture/pr68570.c
index e69de29..a8f2843 100644
--- gcc/testsuite/gcc.dg/torture/pr68570.c
+++ gcc/testsuite/gcc.dg/torture/pr68570.c
@@ -0,0 +1,35 @@
+/* PR middle-end/68570 */
+/* { dg-do compile } */
+
+int a, d, e, f, h, i, k;
+
+void
+fn1 ()
+{
+  char m;
+  for (;;)
+{
+  for (;;)
+{
+  e = f = 1;
+  if (i)
+d = h = 0;
+  else
+a = 0;
+  break;
+}
+  k = 0;
+  if (f)
+a = 3;
+  if (d)
+f = 0;
+  if (a > (i < 1))
+{
+  if (e)
+break;
+}
+  else
+i = m;
+  k = i ? a : i;
+}
+}

Marek


Re: [PATCH] Add testcase for PR middle-end/68570

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 11:16:06AM +0100, Marek Polacek wrote:
> This PR got fixed along with PR68625, so I'd like to add the testcase and 
> close
> the bug.
> 
> Tested on x86_64-linux, ok for trunk?
> 
> 2015-12-02  Marek Polacek  
> 
>   PR middle-end/68570
>   * gcc.dg/torture/pr68570.c: New test.

Ok.

Jakub


Re: [PATCH] Avoid false vector mask conversion

2015-12-02 Thread Ilya Enkovich
Ping

2015-11-23 16:05 GMT+03:00 Ilya Enkovich :
> Ping
>
> 2015-11-13 16:17 GMT+03:00 Ilya Enkovich :
>> 2015-11-13 13:03 GMT+03:00 Richard Biener :
>>> On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  
>>> wrote:
 Hi,

 When we use LTO for Fortran we may have a mix of 32-bit and 1-bit scalar 
 booleans.  It means we may have a conversion of one scalar type to another, 
 which confuses the vectorizer because values with different scalar boolean 
 types may get the same vectype.
>>>
>>> Confuses aka fails to vectorize?
>>
>> Right.
>>
>>>
  This patch transforms such conversions into comparison.

 I managed to make a small Fortran test which gets vectorized with this 
 patch, but I didn't find how I can run a Fortran test with LTO and then scan 
 the tree dump to check it is vectorized.  BTW here is a loop from the test:

   real*8 a(18)
   logical b(18)
   integer i

   do i=1,18
  if(a(i).gt.0.d0) then
 b(i)=.true.
  else
 b(i)=.false.
  endif
   enddo
>>>
>>> This looks the the "error" comes from if-conversion - can't we do
>>> better there then?
>>
>> No, this loop is transformed into a single BB before if-conversion by
>> cselim + phiopt.
>>
>> Ilya
>>
>>>
>>> Richard.
>>>
 Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

 Thanks,
 Ilya


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 06:28:20PM +0300, Alexander Monakov wrote:
> The approach in OpenACC is to, outside of "vector" loops, 1) make threads 1-31
> "slaves" which just follow branches without any computation -- that requires
> extra jumps and broadcasting branch predicates, -- and 2) broadcast register
> state and stack state from master to slaves when entering "vector" regions.
> 
> I'm taking a different approach.  I want to execute all insns in all warp
> members, while ensuring that the effect (on global and local state) is the same
> as if any single thread was executing that instruction.  Most instructions
> automatically satisfy that: if threads have the same state, then executing an
> arithmetic instruction, normal memory load/store, etc. keep local state the
> same in all threads.

I don't know the HW well enough; is there any power consumption, heat etc.
difference between the two approaches?  I mean does the HW consume different
amount of power if only one thread in a warp executes code and the other
threads in the same warp just jump around it, vs. having all threads busy?

If it is the same, then I think your approach is reasonable, but my
understanding of PTX is limited.

How exactly does OpenACC copy the stack?  At least for OpenMP, one could
have automatic vars whose addresses are passed to simd regions in different
functions, say like:

void
baz (int x, int *arr)
{
  int i;
  #pragma omp simd
  for (i = 0; i < 128; i++)
arr[i] *= arr[i] + i + x; // Replace with something useful and expensive
}

void
bar (int x)
{
  int arr[128], i;
  for (i = 0; i < 128; i++)
arr[i] = i + x;
  baz (x, arr);
}
#pragma omp declare target to (bar, baz)

void
foo ()
{
  int i;
  #pragma omp target teams distribute parallel for
  for (i = 0; i < 131072; i++)
bar (i);
}
and without inlining you don't know if the arr in bar above will be shared
by all SIMD lanes (SIMT in PTX case) or not.

Jakub


Re: [gomp-nvptx 4/9] nvptx backend: add -mgomp option and multilib

2015-12-02 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 06:28:22PM +0300, Alexander Monakov wrote:
> Since OpenMP offloading requires both soft-stacks and "uniform SIMT", both
> non-traditional codegen variants, I'm building a multilib variant with those
> enabled.  This patch adds option -mgomp which enables -msoft-stack plus
> -muniform-simt, and builds a multilib with it.
> 
>   * config/nvptx/nvptx.c (nvptx_option_override): Handle TARGET_GOMP.
>   * config/nvptx/nvptx.opt (mgomp): New option.
>   * config/nvptx/t-nvptx (MULTILIB_OPTIONS): New.
>   * doc/invoke.texi (mgomp): Document.

I thought the MULTILIB* vars allow you to multilib on none of
-msoft-stack/-muniform-simt and both -msoft-stack/-muniform-simt, without
building other variants, so you wouldn't need this.
Furthermore, as I said, I believe for e.g. most of newlib libc / libm
it is enough if they are built as -muniform-simt -mno-soft-stack,
if those functions are leaf or don't call user routines that could have
#pragma omp parallel.  -msoft-stack would unnecessarily slow the routines
down.
So perhaps just multilib on -muniform-simt, and document that -muniform-simt
built code requires also that the soft-stack var is set up and thus
-msoft-stack can be used when needed?

Can you post sample code with assembly for -msoft-stack and -muniform-simt
showing how short interesting cases are expanded?
Is there really no way even in direct PTX assembly to have .local file scope
vars (rather than the global arrays indexed by %tid)?

Jakub


Re: [PATCH] rs6000: Optimise SImode cstore on 64-bit

2015-12-02 Thread Segher Boessenkool
On Tue, Dec 01, 2015 at 09:39:30PM -0600, Segher Boessenkool wrote:
> On Wed, Dec 02, 2015 at 01:50:46PM +1030, Alan Modra wrote:
> > On Wed, Dec 02, 2015 at 01:55:17AM +, Segher Boessenkool wrote:
> > > +  emit_insn (gen_subdi3 (tmp, op1, op2));
> > > +  emit_insn (gen_lshrdi3 (tmp2, tmp, GEN_INT (63)));
> > > +  emit_insn (gen_anddi3 (tmp3, tmp2, const1_rtx));
> > 
> > Why the AND?  The top 63 bits are already clear.
> 
> Ha, yes.  Thanks.  In a previous version I shifted by less, in which
> case GCC is smart enough to make it 63 anyway.  63 is always correct
> as well, and simpler because you don't need the AND.  But I forgot
> to take it out :-)
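
Just for the record, what the emitted sequence computes boils down to the
following illustrative C (my own sketch, not the expander itself); op1/op2
are the SImode inputs already sign- or zero-extended to 64 bits, so the
subtraction cannot wrap:

  #include <stdint.h>

  /* LT/LTU case of cstore_si_as_di.  */
  uint64_t
  cstore_lt (int64_t op1, int64_t op2)
  {
    uint64_t diff = (uint64_t) op1 - (uint64_t) op2;  /* subdi3 */
    return diff >> 63;   /* lshrdi3 by 63 already yields 0 or 1,
			    which is why the anddi3 was redundant.  */
  }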

For completeness, this is what I committed.


Segher

---
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a500d67..26b0962 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10564,6 +10564,51 @@ (define_expand "cstore4_unsigned"
   DONE;
 })
 
+(define_expand "cstore_si_as_di"
+  [(use (match_operator 1 "unsigned_comparison_operator"
+ [(match_operand:SI 2 "gpc_reg_operand")
+  (match_operand:SI 3 "reg_or_short_operand")]))
+   (clobber (match_operand:SI 0 "register_operand"))]
+  ""
+{
+  int uns_flag = unsigned_comparison_operator (operands[1], VOIDmode) ? 1 : 0;
+  enum rtx_code cond_code = signed_condition (GET_CODE (operands[1]));
+
+  rtx op1 = gen_reg_rtx (DImode);
+  rtx op2 = gen_reg_rtx (DImode);
+  convert_move (op1, operands[2], uns_flag);
+  convert_move (op2, operands[3], uns_flag);
+
+  if (cond_code == GT || cond_code == LE)
+{
+  cond_code = swap_condition (cond_code);
+  std::swap (op1, op2);
+}
+
+  rtx tmp = gen_reg_rtx (DImode);
+  rtx tmp2 = gen_reg_rtx (DImode);
+  emit_insn (gen_subdi3 (tmp, op1, op2));
+  emit_insn (gen_lshrdi3 (tmp2, tmp, GEN_INT (63)));
+
+  rtx tmp3;
+  switch (cond_code)
+{
+default:
+  gcc_unreachable ();
+case LT:
+  tmp3 = tmp2;
+  break;
+case GE:
+  tmp3 = gen_reg_rtx (DImode);
+  emit_insn (gen_xordi3 (tmp3, tmp2, const1_rtx));
+  break;
+}
+
+  convert_move (operands[0], tmp3, 1);
+
+  DONE;
+})
+
 (define_expand "cstore4_signed_imm"
   [(use (match_operator 1 "signed_comparison_operator"
  [(match_operand:GPR 2 "gpc_reg_operand")
@@ -10688,6 +10733,11 @@ (define_expand "cstore4"
 emit_insn (gen_cstore4_unsigned (operands[0], operands[1],
   operands[2], operands[3]));
 
+  /* For comparisons smaller than Pmode we can cheaply do things in Pmode.  */
+  else if (mode == SImode && Pmode == DImode)
+emit_insn (gen_cstore_si_as_di (operands[0], operands[1],
+   operands[2], operands[3]));
+
   /* For signed comparisons against a constant, we can do some simple
  bit-twiddling.  */
   else if (signed_comparison_operator (operands[1], VOIDmode)
-- 
1.9.3




Re: [gomp-nvptx 8/9] libgomp: update gomp_nvptx_main for -mgomp

2015-12-02 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 06:28:26PM +0300, Alexander Monakov wrote:
> +void
> +gomp_nvptx_main (void (*fn) (void *), void *fn_data)
> +{
> +  int tid, ntids;
> +  asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
> +  asm ("mov.u32 %0, %%ntid.y;" : "=r"(ntids));

Formatting (missing space before ( ).

Jakub


Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-02 Thread Andreas Krebbel
On 12/02/2015 11:12 AM, Dominik Vogt wrote:
> Hopefully, this is correct now; it does pass the functional test case
> that's part of the patch.  Unfortunately the define_insn patterns
> had to be duplicated because of the new subreg offsets.  

The number of patterns could possibly be reduced using the define_subst
machinery.  I'm looking into this for some other changes.  No need to do
this right now; we can do it later on top.

> Not sure
> whether I've missed any "use" patterns that should be added.

With the length operand added explicitly to the unspec we should have all
the uses of the register pair covered.  To my understanding it is correct
to remove these, as done in your patch.

+   ; Convert Pmode to BLKmode
+   UNSPEC_REPLICATE_BYTE

The comment does not match.

+++ b/gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
@@ -0,0 +1,64 @@
+/* Machine description pattern tests.  */
+
+/* { dg-do run } */
+/* { dg-options "-mmvcle -dP" } */
...
+/* Check that the right patterns are used.  */
+/* { dg-final { scan-assembler-times {c:12 .*{[*]setmem_long_[36][14]z?}} 1 } 
} */
+/* { dg-final { scan-assembler-times {c:17 .*{[*]setmem_long_[36][14]z?}} 1 } 
} */

Don't you need a --save-temps as part of the options?

Apart from these things the patch looks good to me now.  I'll wait two days
for other comments before applying it (I can do the remaining changes while
doing the commit).

Thanks!

-Andreas-


> 
> Ciao
> 
> Dominik ^_^  ^_^
> 



[PATCH] Fix undefined behavior in vect testcases

2015-12-02 Thread Richard Biener

Spotted by disabling init-regs.c (see PR61810).

Bah, parts of our testsuite should be -Wall clean, really.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-02  Richard Biener  

* gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c: Fix uninitialized
y guarding a call to abort ().
* gcc.dg/vect/vect-strided-a-u8-i8-gap7.c: Likewise.
* gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c
===
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c 
(revision 231163)
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7-big-array.c 
(working copy)
@@ -26,7 +26,7 @@ main1 ()
   s *ptr = arr;
   s check_res[N];
   s res[N];
-  unsigned char u, t, s, x, y, z, w;
+  unsigned char u, t, s, x, z, w;
 
   for (i = 0; i < N; i++)
 {
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
===
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c   (revision 
231163)
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c   (working copy)
@@ -25,7 +25,7 @@ main1 ()
   s arr[N];
   s *ptr = arr;
   s res[N];
-  unsigned char u, t, s, x, y, z, w;
+  unsigned char u, t, s, x, z, w;
 
   for (i = 0; i < N; i++)
 {
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c
===
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c   
(revision 231163)
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7-big-array.c   
(working copy)
@@ -26,7 +26,7 @@ main1 (s *arr)
   int i;
   s *ptr = arr;
   s res[N];
-  unsigned char u, t, s, x, y, z, w;
+  unsigned char u, t, s, x, z, w;
 
   for (i = 0; i < N; i++)
 {
@@ -65,7 +65,7 @@ int main (void)
 {
   int i;
   s arr[N];
-  unsigned char u, t, s, x, y, z, w;
+  unsigned char u, t, s, x, z, w;
 
   check_vect ();
 


Re: [gomp-nvptx 9/9] adjust SIMD loop lowering for SIMT targets

2015-12-02 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 06:28:27PM +0300, Alexander Monakov wrote:
> @@ -10218,12 +10218,37 @@ expand_omp_simd (struct omp_region *region, struct 
> omp_for_data *fd)
>  
>n1 = fd->loop.n1;
>n2 = fd->loop.n2;
> +  step = fd->loop.step;
> +  bool do_simt_transform
> += (cgraph_node::get (current_function_decl)->offloadable
> +   && !broken_loop
> +   && !safelen
> +   && !simduid
> +   && !(fd->collapse > 1));

expand_omp is depth-first expansion, so for the case where the simd
region is lexically (directly or indirectly) nested inside of a
target region, the above will not trigger.  You'd need to
use cgraph_node::get (current_function_decl)->offloadable or
just walk through outer fields of region up and see if this isn't in
a target region.

Also, please consider privatized variables in the simd loops.
int
foo (int *p)
{
  int r = 0, i;
  #pragma omp simd reduction(+:r)
  for (i = 0; i < 32; i++)
{
  p[i] += i;
  r += i;
}
  return r;
}
#pragma omp declare target to (foo)

int
main ()
{
  int p[32], err, i;
  for (i = 0; i < 32; i++)
p[i] = i;
  #pragma omp target map(tofrom:p) map(from:err)
  {
int r = 0;
#pragma omp simd reduction(+:r)
for (i = 0; i < 32; i++)
{
  p[i] += i;
  r += i;
}
err = r != 31 * 32 / 2;
err |= foo (p) != 31 * 32 / 2;
  }
  if (err)
__builtin_abort ();
  for (i = 0; i < 32; i++)
if (p[i] != 3 * i)
  __builtin_abort ();
  return 0;
}

Here, it would be nice to extend omp_max_vf in the host compiler,
such that if PTX offloading is enabled, and optimize && !optimize_debug
(and vectorizer on the host not disabled, otherwise it won't be cleaned up
on the host), it returns MIN (32, whatever it would return otherwise).
And then arrange for the stores to and other operations on the "omp simd array"
attributed arrays before/after the simd loop to be handled specially for
SIMT, basically you want those to be .local, if non-addressable handled as
any other scalars, the loop up to GOMP_SIMD_LANES run exactly once, and for
the various reductions or lastprivate selection reduce it the SIMT way or
pick value from the thread in warp that had the last SIMT lane, etc.

> +  if (do_simt_transform)
> +{
> +  tree simt_lane
> + = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_LANE,
> + integer_type_node, 0);
> +  simt_lane = fold_convert (TREE_TYPE (step), simt_lane);
> +  simt_lane = fold_build2 (MULT_EXPR, TREE_TYPE (step), step, simt_lane);
> +  cfun->curr_properties &= ~PROP_gimple_lomp_dev;

How does this even compile?  simt_lane is a local var in the if
(do_simt_transform) body.
> +}
> +
>if (gimple_omp_for_combined_into_p (fd->for_stmt))
>  {
>tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
>OMP_CLAUSE__LOOPTEMP_);
>gcc_assert (innerc);
>n1 = OMP_CLAUSE_DECL (innerc);
> +  if (do_simt_transform)
> + {
> +   n1 = fold_convert (type, n1);
> +   if (POINTER_TYPE_P (type))
> + n1 = fold_build_pointer_plus (n1, simt_lane);

And then you use it here, outside of its scope.

BTW, again, it would help if you posted a simple *.ompexp dump showing what
exactly you want it to look like.

Jakub


Re: [UPC 15/22] RTL changes

2015-12-02 Thread Richard Biener
On Tue, Dec 1, 2015 at 7:02 AM, Gary Funck  wrote:
>
> Background
> --
>
> An overview email, describing the UPC-related changes is here:
>   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg5.html
>
> The GUPC branch is described here:
>   http://gcc.gnu.org/projects/gupc.html
>
> The UPC-related source code differences are summarized here:
>   http://gccupc.org/gupc-changes
>
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.
>
> If you are on the cc-list, your name was chosen either
> because you are listed as a maintainer for the area that
> applies to the patches described in this email, or you
> were a frequent contributor of patches made to files listed
> in this email.
>
> In the change log entries included in each patch, the directory
> containing the affected files is listed, followed by the files.
> When the patches are applied, the change log entries will be
> distributed to the appropriate ChangeLog file.
>
> Overview
> 
>
> UPC pointers-to-shared have an internal representation which is
> defined as a 'struct' with three fields.  Special logic is
> needed in promote_mode() to handle this case.

Errr - but how are 'struct's ever REFERENCE_TYPE or POINTER_TYPE?

Richard.

> 2015-11-30  Gary Funck  
>
> gcc/
> * explow.c (promote_mode): For UPC pointer-to-shared values,
> return the mode of the UPC PTS representation type.
>
> Index: gcc/explow.c
> ===
> --- gcc/explow.c(.../trunk) (revision 231059)
> +++ gcc/explow.c(.../branches/gupc) (revision 231080)
> @@ -794,6 +794,8 @@ promote_mode (const_tree type ATTRIBUTE_
>  case REFERENCE_TYPE:
>  case POINTER_TYPE:
>*punsignedp = POINTERS_EXTEND_UNSIGNED;
> +  if (SHARED_TYPE_P (TREE_TYPE (type)))
> +return TYPE_MODE (upc_pts_type_node);
>return targetm.addr_space.address_mode
>(TYPE_ADDR_SPACE (TREE_TYPE (type)));
>break;


Re: [UPC 05/22] language hooks changes

2015-12-02 Thread Richard Biener
On Tue, Dec 1, 2015 at 7:02 AM, Gary Funck  wrote:
>
> Background
> --
>
> An overview email, describing the UPC-related changes is here:
>   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg5.html
>
> The GUPC branch is described here:
>   http://gcc.gnu.org/projects/gupc.html
>
> The UPC-related source code differences are summarized here:
>   http://gccupc.org/gupc-changes
>
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.
>
> If you are on the cc-list, your name was chosen either
> because you are listed as a maintainer for the area that
> applies to the patches described in this email, or you
> were a frequent contributor of patches made to files listed
> in this email.
>
> In the change log entries included in each patch, the directory
> containing the affected files is listed, followed by the files.
> When the patches are applied, the change log entries will be
> distributed to the appropriate ChangeLog file.
>
> Overview
> 
>
> Two new UPC-specific 'decl' language hooks are defined and then called from
> layout_decl() in stor-layout.c.  The layout_decl_p() function tests if
> this is a UPC shared array declaration that requires special handling.
> If it does, then layout_decl() is called.
>
> A few new UPC-specific language hooks are defined in a 'upc' sub-structure
> of the language hooks structure.  They are defined as
> hooks because they are called from code in the 'c-family/' directory,
> but are implemented in the 'c/' directory.

Please post these together with the langhook uses, otherwise it's really hard
to review.

I'll note that whatever "special" layout you want for your pointer
representation
should be better ensured via attributes if possible.

Richard.

> 2015-11-30  Gary Funck  
>
> gcc/
> * langhooks-def.h (lhd_do_nothing_b, lhd_do_nothing_t_t):
> New do nothing hook prototypes.
> (LANG_HOOKS_UPC_TOGGLE_KEYWORDS,
> LANG_HOOKS_UPC_PTS_INIT_TYPE, LANG_HOOKS_UPC_BUILD_INIT_FUNC,
> LANG_HOOKS_UPC_WRITE_GLOBAL_DECLS): New default UPC hooks.
> * langhooks-def.h (LANG_HOOKS_LAYOUT_DECL_P, LANG_HOOKS_LAYOUT_DECL):
> New language hook defaults.
> (LANG_HOOKS_UPC): New.  Define UPC hooks structure.
> * langhooks.c (lhd_do_nothing_b, lhd_do_nothing_t_t):
> New do nothing hooks.
> * langhooks.h (layout_decl_p, layout_decl): New language hooks.
> (lang_hooks_for_upc): New UPC language hooks structure.
> * stor-layout.c (layout_decl): Call the layout_decl_p() and
> and layout_decl() hooks.
> gcc/c/
> * c-lang.c: #include "c-upc-lang.h".
> #include "c-upc-low.h".
> (LANG_HOOKS_UPC_TOGGLE_KEYWORDS, LANG_HOOKS_UPC_PTS_INIT_TYPE,
> LANG_HOOKS_UPC_BUILD_INIT_FUNC, LANG_HOOKS_UPC_WRITE_GLOBAL_DECLS,
> LANG_HOOKS_LAYOUT_DECL_P, LANG_HOOKS_LAYOUT_DECL):
> Override defaults.  Define UPC-specific hook routines.
> * c-upc-lang.c: New.  Implement UPC-specific hook routines.
> * c-upc-lang.h: New.  Define UPC-specific hook prototypes.
>
> Index: gcc/langhooks-def.h
> ===
> --- gcc/langhooks-def.h (.../trunk) (revision 231059)
> +++ gcc/langhooks-def.h (.../branches/gupc) (revision 231080)
> @@ -35,7 +35,9 @@ struct diagnostic_info;
>  /* See langhooks.h for the definition and documentation of each hook.  */
>
>  extern void lhd_do_nothing (void);
> +extern void lhd_do_nothing_b (bool);
>  extern void lhd_do_nothing_t (tree);
> +extern void lhd_do_nothing_t_t (tree, tree);
>  extern void lhd_do_nothing_f (struct function *);
>  extern tree lhd_pass_through_t (tree);
>  extern bool lhd_post_options (const char **);
> @@ -175,6 +177,10 @@ extern tree lhd_make_node (enum tree_cod
>  #define LANG_HOOKS_GET_SUBRANGE_BOUNDS NULL
>  #define LANG_HOOKS_DESCRIPTIVE_TYPENULL
>  #define LANG_HOOKS_RECONSTRUCT_COMPLEX_TYPE reconstruct_complex_type
> +#define LANG_HOOKS_UPC_TOGGLE_KEYWORDS  lhd_do_nothing_b
> +#define LANG_HOOKS_UPC_PTS_INIT_TYPE  lhd_do_nothing
> +#define LANG_HOOKS_UPC_BUILD_INIT_FUNC lhd_do_nothing_t
> +#define LANG_HOOKS_UPC_WRITE_GLOBAL_DECLS lhd_do_nothing
>  #define LANG_HOOKS_ENUM_UNDERLYING_BASE_TYPE lhd_enum_underlying_base_type
>
>  #define LANG_HOOKS_FOR_TYPES_INITIALIZER { \
> @@ -219,6 +225,8 @@ extern tree lhd_make_node (enum tree_cod
>  #define LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR NULL
>  #define LANG_HOOKS_OMP_CLAUSE_DTOR hook_tree_tree_tree_null
>  #define LANG_HOOKS_OMP_FINISH_CLAUSE lhd_omp_finish_clause
> +#define LANG_HOOKS_LAYOUT_DECL_P hook_bool_tree_tree_false
> +#define LANG_HOOKS_LAYOUT_DECL lhd_do_nothing_t_t
>
>  #define LANG_HOOKS_DECLS { \
>LANG_HOOKS_GLOBAL_BINDINGS_P, \
> @@ -243,7 +251,9 @@ extern tree lhd_make_node (enum tree_cod
>LANG_HOOKS_OMP_CLAUSE_ASSIGN_OP, \
>LANG_HOOKS_

Re: [UPC 17/22] misc/common changes

2015-12-02 Thread Richard Biener
On Tue, Dec 1, 2015 at 7:02 AM, Gary Funck  wrote:
>
> Background
> --
>
> An overview email, describing the UPC-related changes is here:
>   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg5.html
>
> The GUPC branch is described here:
>   http://gcc.gnu.org/projects/gupc.html
>
> The UPC-related source code differences are summarized here:
>   http://gccupc.org/gupc-changes
>
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.
>
> If you are on the cc-list, your name was chosen either
> because you are listed as a maintainer for the area that
> applies to the patches described in this email, or you
> were a frequent contributor of patches made to files listed
> in this email.
>
> In the change log entries included in each patch, the directory
> containing the affected files is listed, followed by the files.
> When the patches are applied, the change log entries will be
> distributed to the appropriate ChangeLog file.
>
> Overview
> 
>
> Given that UPC pointers-to-shared (PTS's) have special arithmetic rules
> and their internal representation is a structure with
> three separate fields, they are not meaningfully convertible to integers
> and pointer arithmetic involving PTS's cannot be optimized in
> the same fashion as normal "C" pointer arithmetic.  Further,
> the representation of a NULL pointer-to-shared is different from
> a "C" null pointer.  Logic has been added to convert.c and jump.c
> to handle operations involving UPC PTS's.  In function.c,
> UPC pointers-to-shared which have an internal representation that
> is a 'struct' are treated as aggregates.  Also in function.c
> logic is added that prevents marking them as potential
> pointer register values.
>
> In varasm.c, a check is added for the linker section used by
> UPC to coalesce file scoped UPC shared variables.  This section
> is used only to assign offsets into UPC's shared data area for
> the UPC shared variables.  When UPC linker scripts are supported,
> this shared section is not loaded and has an origin of 0.

I think this also shows that using a POINTER_TYPE for a non-pointer
is bogus.  POINTER_TYPE is not for "semantically a pointer" but
for pointers.  Just use RECORD_TYPE here (and of course lower
things earlier).
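
A rough sketch of what such FE-side lowering could build (field names and
types here are purely illustrative, not GUPC's actual representation):

  /* Build a three-field record to stand in for the pointer-to-shared
     representation instead of (ab)using POINTER_TYPE.  */
  tree rep = make_node (RECORD_TYPE);
  tree f_vaddr = build_decl (BUILTINS_LOCATION, FIELD_DECL,
			     get_identifier ("vaddr"), size_type_node);
  tree f_thread = build_decl (BUILTINS_LOCATION, FIELD_DECL,
			      get_identifier ("thread"), unsigned_type_node);
  tree f_phase = build_decl (BUILTINS_LOCATION, FIELD_DECL,
			     get_identifier ("phase"), unsigned_type_node);
  DECL_FIELD_CONTEXT (f_vaddr) = rep;
  DECL_FIELD_CONTEXT (f_thread) = rep;
  DECL_FIELD_CONTEXT (f_phase) = rep;
  DECL_CHAIN (f_vaddr) = f_thread;
  DECL_CHAIN (f_thread) = f_phase;
  TYPE_FIELDS (rep) = f_vaddr;
  layout_type (rep);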

Richard.

> 2015-11-30  Gary Funck  
>
> gcc/
> * convert.c (convert_to_pointer): Add check for null
> UPC pointer-to-shared.
> (convert_to_integer): Do not optimize pointer
> subtraction for UPC pointers-to-shared.
> (convert_to_integer): Issue error for an attempt
> to convert a UPC pointer-to-shared to an integer.
> * dojump.c (do_jump): If a UPC pointer-to-shared conversion
> can change representation, it must be compared in the result type.
> * function.c (aggregate_value_p): Handle 'struct' pointer-to-shared
> values as an aggregate when passing them as a return value.
> (assign_parm_setup_reg): Do not target UPC pointers-to-shared that are
> represented as a 'struct' into a pointer register.
> * varasm.c (default_section_type_flags): Handle UPC's shared
> section as BSS, and if a UPC link script is supported,
> make it a non-loadable, read-only section.
>
> Index: gcc/convert.c
> ===
> --- gcc/convert.c   (.../trunk) (revision 231059)
> +++ gcc/convert.c   (.../branches/gupc) (revision 231080)
> @@ -53,6 +53,14 @@ convert_to_pointer_1 (tree type, tree ex
>if (TREE_TYPE (expr) == type)
>  return expr;
>
> +  if (integer_zerop (expr) && POINTER_TYPE_P (type)
> +  && SHARED_TYPE_P (TREE_TYPE (type)))
> +{
> +  expr = copy_node (upc_null_pts_node);
> +  TREE_TYPE (expr) = build_unshared_type (type);
> +  return expr;
> +}
> +
>switch (TREE_CODE (TREE_TYPE (expr)))
>  {
>  case POINTER_TYPE:
> @@ -437,6 +445,16 @@ convert_to_integer_1 (tree type, tree ex
>return error_mark_node;
>  }
>
> +  /* Can't optimize the conversion of UPC shared pointer difference.  */
> +  if (ex_form == MINUS_EXPR
> +  && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 0)))
> +  && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 1)))
> +  && SHARED_TYPE_P (TREE_TYPE (TREE_TYPE (TREE_OPERAND (expr, 0
> +  && SHARED_TYPE_P (TREE_TYPE (TREE_TYPE (TREE_OPERAND (expr, 1)
> +  {
> +  return build1 (CONVERT_EXPR, type, expr);
> +  }
> +
>if (ex_form == COMPOUND_EXPR)
>  {
>tree t = convert_to_integer_1 (type, TREE_OPERAND (expr, 1), dofold);
> @@ -581,6 +599,12 @@ convert_to_integer_1 (tree type, tree ex
>  {
>  case POINTER_TYPE:
>  case REFERENCE_TYPE:
> +  if (SHARED_TYPE_P (TREE_TYPE (intype)))
> +{
> +  error ("invalid conversion from a UPC pointer-to-shared "
> +"to an integer

Re: [UPC 15/22] RTL changes

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 01:10:44PM +0100, Richard Biener wrote:
> > UPC pointers-to-shared have an internal representation which is
> > defined as a 'struct' with three fields.  Special logic is
> > needed in promote_mode() to handle this case.
> 
> Errr - but how are 'struct's ever REFERENCE_TYPE or POINTER_TYPE?

Then obviously the FE should lower the pointers to RECORD_TYPEs
and present them to the middle-end as RECORD_TYPE.  For debug info,
the FEs lang flag or whatever on the side info should through some langhook
propagate the info that some var is a UPC pointer to shared into the debug
info, but the middle-end should treat it as any other RECORD_TYPE
afterwards.

> > 2015-11-30  Gary Funck  
> >
> > gcc/
> > * explow.c (promote_mode): For UPC pointer-to-shared values,
> > return the mode of the UPC PTS representation type.
> >
> > Index: gcc/explow.c
> > ===
> > --- gcc/explow.c(.../trunk) (revision 231059)
> > +++ gcc/explow.c(.../branches/gupc) (revision 231080)
> > @@ -794,6 +794,8 @@ promote_mode (const_tree type ATTRIBUTE_
> >  case REFERENCE_TYPE:
> >  case POINTER_TYPE:
> >*punsignedp = POINTERS_EXTEND_UNSIGNED;
> > +  if (SHARED_TYPE_P (TREE_TYPE (type)))
> > +return TYPE_MODE (upc_pts_type_node);
> >return targetm.addr_space.address_mode
> >(TYPE_ADDR_SPACE (TREE_TYPE (type)));
> >break;

Jakub


Re: [Patch 2/3][Aarch64] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-12-02 Thread James Greenhalgh
On Tue, Dec 01, 2015 at 04:34:01PM +, David Sherwood wrote:
> Hi,
> 
> Thanks for the comments James, I've moved the patterns around
> and added new comments to them. Hope this is ok.

This is fine.

Thanks,
James



Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-12-02 Thread Kirill Yukhin
Hello Jakub,

On 13 Nov 13:16, Jakub Jelinek wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/attr-simd.c
> 
> Similarly.
> 
> Ok for trunk with those changes.
It turns out that the current implementation of glibc does not
contain masked variants of math routines.  So, this attribute
is useless until it is capable of generating only [nonmasked|masked]
variants of the routines.

Patch in the bottom introduces `notinbranch' and `inbranch' flags to
the attribute.
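
For reference, the intended source-level usage is along these lines (my
own example, not part of the patch):

  /* Ask for only the unmasked (notinbranch) vector clones.  */
  __attribute__ ((__simd__ ("notinbranch")))
  double my_exp (double x);

  /* Without an argument both masked and unmasked clones are emitted.  */
  __attribute__ ((__simd__))
  double my_sin (double x);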

Bootstrapped and regtested. New tests pass.

Is it ok for trunk (GCC v6)?

gcc/
* c-family/c-common.c (c_common_attribute_table[]): Update max arguments
count for "simd" attribute.
(handle_simd_attribute): Parse "notinbranch" and "inbranch" arguments.
* doc/extend.texi ("simd"): Describe new flags.
gcc/testsuite/
* c-c++-common/attr-simd-4.c: New test.
* c-c++-common/attr-simd-5.c: New test.

--
Thanks, K

> 
>   Jakub

commit cf458a0a00214022556498bdda94a07d0af70574
Author: Kirill Yukhin 
Date:   Mon Nov 30 16:24:39 2015 +0300

[attr-simd] Add notinbranch/inbranch flags.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 369574f..0104306 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -818,7 +818,7 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_omp_declare_simd_attribute, false },
   { "cilk simd function", 0, -1, true,  false, false,
  handle_omp_declare_simd_attribute, false },
-  { "simd",  0, 0, true,  false, false,
+  { "simd",  0, 1, true,  false, false,
  handle_simd_attribute, false },
   { "omp declare target", 0, 0, true, false, false,
  handle_omp_declare_target_attribute, false },
@@ -9032,7 +9032,7 @@ handle_omp_declare_simd_attribute (tree *, tree, tree, 
int, bool *)
 /* Handle a "simd" attribute.  */
 
 static tree
-handle_simd_attribute (tree *node, tree name, tree, int, bool *no_add_attrs)
+handle_simd_attribute (tree *node, tree name, tree args, int, bool 
*no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
 {
@@ -9045,9 +9045,41 @@ handle_simd_attribute (tree *node, tree name, tree, int, 
bool *no_add_attrs)
  *no_add_attrs = true;
}
   else
-   DECL_ATTRIBUTES (*node)
- = tree_cons (get_identifier ("omp declare simd"),
-  NULL_TREE, DECL_ATTRIBUTES (*node));
+   {
+ tree t = get_identifier ("omp declare simd");
+ tree attr = NULL_TREE;
+ if (args)
+   {
+ tree id = TREE_VALUE (args);
+
+ if (TREE_CODE (id) != STRING_CST)
+   {
+ error ("attribute %qE argument not a string", name);
+ *no_add_attrs = true;
+ return NULL_TREE;
+   }
+
+ if (strcmp (TREE_STRING_POINTER (id), "notinbranch") == 0)
+   attr = build_omp_clause (DECL_SOURCE_LOCATION (*node),
+OMP_CLAUSE_NOTINBRANCH);
+ else
+   if (strcmp (TREE_STRING_POINTER (id), "inbranch") == 0)
+ attr = build_omp_clause (DECL_SOURCE_LOCATION (*node),
+  OMP_CLAUSE_INBRANCH);
+   else
+   {
+ error ("only % and % flags are "
+"allowed for %<__simd__%> attribute");
+ *no_add_attrs = true;
+ return NULL_TREE;
+   }
+   }
+
+ DECL_ATTRIBUTES (*node) = tree_cons (t,
+  build_tree_list (NULL_TREE,
+   attr),
+  DECL_ATTRIBUTES (*node));
+   }
 }
   else
 {
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 63fce0f..c517038 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3144,6 +3144,7 @@ At the function call it will create resolver 
@code{ifunc}, that will
 dynamically call a clone suitable for current architecture.
 
 @item simd
+@itemx simd("@var{mask}")
 @cindex @code{simd} function attribute.
 This attribute enables creation of one or more function versions that
 can process multiple arguments using SIMD instructions from a
@@ -3158,6 +3159,9 @@ attribute on the same function.
 If the attribute is specified and @code{#pragma omp declare simd}
 present on a declaration and @code{-fopenmp} or @code{-fopenmp-simd}
 switch is specified, then the attribute is ignored.
+The optional argument @var{mask} may have "notinbranch" or "inbranch"
+value and instructs the compiler to generate non-masked or masked
+clones correspondingly. By default, all clones are generated.
 
 @item target (@var{options})
 @cindex @code{target} function attribute
diff --git a/gcc/testsuite/c-c++-common

Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-02 Thread Ulrich Weigand
Andreas Krebbel wrote:
> On 12/02/2015 11:12 AM, Dominik Vogt wrote:
> > Hopefully, this is correct now; it does pass the functional test case
> > that's part of the patch.  Unfortunately the define_insn patterns
> > had to be duplicated because of the new subreg offsets.  
> 
> The number of patterns could possibly be reduced using the define_subst 
> machinery.  I'm looking into
> this for some other changes. No need to do this right now. We can do this 
> later on-top.

For this particular issue, shouldn't a simple mode_attr be OK?
I see that the sh port uses this:

(define_mode_attr lowpart_be [(QI "3") (HI "2")])

  [(set (reg:SI T_REG)
(eq:SI
  (subreg:QIHI
(and:SI (match_operand:SI 0 "arith_reg_operand")
(match_operand:SI 1 "arith_reg_operand")) )
  (const_int 0)))]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell

On 12/02/15 05:40, Jakub Jelinek wrote:

 Don't know the HW good enough, is there any power consumption, heat etc.
difference between the two approaches?  I mean does the HW consume different
amount of power if only one thread in a warp executes code and the other
threads in the same warp just jump around it, vs. having all threads busy?


Having all threads busy will increase power consumption.  It's also bad if the 
other vectors are executing memory access instructions.  However, for small 
blocks, it is probably a win over the jump around approach.  One of the 
optimizations for the future of the neutering algorithm is to add such 
predication for small blocks and keep branching for the larger blocks.



How exactly does OpenACC copy the stack?  At least for OpenMP, one could
have automatic vars whose addresses are passed to simd regions in different
functions, say like:


The stack frame of the current function is copied when entering a partitioned 
region.  (There is no visibility of caller's frame and such.) Again, 
optimization would be trying to only copy the stack that's used in the 
partitioned region.


nathan


[PATCH] Fix PRs 67800 and 68333

2015-12-02 Thread Richard Biener

This partly reverts / implements differently a patch done to support
SLP reductions for SAD_EXPRs (gcc.dg/vect/slp-reduc-sad.c).  Detecting
those patterns unconditionally causes missed vectorization opportunities
as we don't implement vectorizing them in non-reduction context.

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2015-12-02  Richard Biener  

PR tree-optimization/67800
PR tree-optimization/68333
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Restore
restriction to reduction contexts but allow SLP reductions as well.
(vect_recog_sad_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise.

* gcc.target/i386/vect-pr67800.c: New testcase.

Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c(revision 231167)
+++ gcc/tree-vect-patterns.c(working copy)
@@ -312,6 +312,9 @@ vect_recog_dot_prod_pattern (vec *s
 {
   gimple *def_stmt;
 
+  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def
+ && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo))
+   return NULL;
   plus_oprnd0 = gimple_assign_rhs1 (last_stmt);
   plus_oprnd1 = gimple_assign_rhs2 (last_stmt);
   if (!types_compatible_p (TREE_TYPE (plus_oprnd0), sum_type)
@@ -1152,6 +1158,10 @@ vect_recog_widen_sum_pattern (vec> 
SCALE);
+  i = (byte)(((R2I * r) + (G2I * g) + (B2I * b) + (1 << (SCALE - 1))) >> 
SCALE);
+  q = (byte)(((R2Q * r) + (G2Q * g) + (B2Q * b) + (1 << (SCALE - 1))) >> 
SCALE);
+
+  *out++ = y;
+ *out++ = i;
+ *out++ = q;
+  }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> On 12/02/15 05:40, Jakub Jelinek wrote:
> > Don't know the HW good enough, is there any power consumption, heat etc.
> >difference between the two approaches?  I mean does the HW consume different
> >amount of power if only one thread in a warp executes code and the other
> >threads in the same warp just jump around it, vs. having all threads busy?
> 
> Having all threads busy will increase power consumption.  It's also bad if
> the other vectors are executing memory access instructions.  However, for

Then the uniform SIMT approach might not be such a good idea.

> small blocks, it is probably a win over the jump around approach.  One of
> the optimizations for the future of the neutering algorithm is to add such
> predication for small blocks and keep branching for the larger blocks.
> 
> >How exactly does OpenACC copy the stack?  At least for OpenMP, one could
> >have automatic vars whose addresses are passed to simd regions in different
> >functions, say like:
> 
> The stack frame of the current function is copied when entering a
> partitioned region.  (There is no visibility of caller's frame and such.)
> Again, optimization would be trying to only copy the stack that's used in
> the partitioned region.

Always the whole stack, from the current stack pointer up to top of the
stack, so sometimes a few bytes, sometimes a few kilobytes or more each time?

Jakub


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-02 Thread Tom de Vries

On 02/12/15 10:45, Jakub Jelinek wrote:

On Fri, Nov 27, 2015 at 12:42:09PM +0100, Tom de Vries wrote:

--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1366,10 +1366,12 @@ build_sender_ref (tree var, omp_context *ctx)
return build_sender_ref ((splay_tree_key) var, ctx);
  }

-/* Add a new field for VAR inside the structure CTX->SENDER_DECL.  */
+/* Add a new field for VAR inside the structure CTX->SENDER_DECL.  If
+   BASE_POINTERS_RESTRICT, declare the field with restrict.  */

  static void
-install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
+install_var_field_1 (tree var, bool by_ref, int mask, omp_context *ctx,
+bool base_pointers_restrict)


Ugh, why the renaming?  Just use default argument:
bool base_pointers_restrict = false


+/* As install_var_field_1, but with base_pointers_restrict == false.  */
+
+static void
+install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
+{
+  install_var_field_1 (var, by_ref, mask, ctx, false);
+}


And avoid the wrapper.


  /* Instantiate decls as necessary in CTX to satisfy the data sharing
-   specified by CLAUSES.  */
+   specified by CLAUSES.  If BASE_POINTERS_RESTRICT, install var field with
+   restrict.  */

  static void
-scan_sharing_clauses (tree clauses, omp_context *ctx)
+scan_sharing_clauses_1 (tree clauses, omp_context *ctx,
+   bool base_pointers_restrict)


Likewise.

Otherwise LGTM,


Hi Jakub,

thanks for the review.


but I'm worried if this isn't related in any way to
PR68640 and might not make things worse.



AFAIU, they're sort of opposite cases:
- in the case of the PR, we add restrict in a function argument
  by accident
- in the case of this patch, we add restrict in a function argument
  by analysis

[ Btw, now that this patch (which exploits GOMP_MAP_FORCE_* mappings)
  is OK-ed, the patch "Fix oacc kernels default mapping for scalars" at
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03334.html becomes more
  relevant, since that one ensures that scalars by default
  get the GOMP_MAP_FORCE_COPY mapping (rather than the incorrect
  GOMP_MAP_COPY) ]

Thanks,
- Tom


Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-02 Thread Andreas Krebbel
On 12/02/2015 01:51 PM, Ulrich Weigand wrote:
> Andreas Krebbel wrote:
>> On 12/02/2015 11:12 AM, Dominik Vogt wrote:
>>> Hopefully, this is correct now; it does pass the functional test case
>>> that's part of the patch.  Unfortunately the define_insn patterns
>>> had to be duplicated because of the new subreg offsets.  
>>
>> The number of patterns could possibly be reduced using the define_subst 
>> machinery.  I'm looking into
>> this for some other changes. No need to do this right now. We can do this 
>> later on-top.
> 
> For this particular issue, shouldn't a simple mode_attr be OK?
> I see that the sh port uses this:
> 
> (define_mode_attr lowpart_be [(QI "3") (HI "2")])
> 
>   [(set (reg:SI T_REG)
> (eq:SI
>   (subreg:QIHI
> (and:SI (match_operand:SI 0 "arith_reg_operand")
> (match_operand:SI 1 "arith_reg_operand")) )
>   (const_int 0)))]

Unfortunately in our case the attribute value doesn't only depend on the mode.  
It also depends on
zarch/esa.  We would need some kind of conditional attribute.

-Andreas-




[PATCH] Avoid SAVE_EXPR generation from generic-match.c

2015-12-02 Thread Richard Biener

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This should solve the generic-match.c part of the C_MAYBE_CONST_EXPR
issues.

Richard.

2015-12-02  Richard Biener  

* tree.h (tree_invariant_p): Declare.
* tree.c (tree_invariant_p): Export.
* genmatch.c (dt_simplify::gen_1): For GENERIC code-gen never
create SAVE_EXPRs but reject patterns if we would need to.

Index: gcc/tree.c
===
*** gcc/tree.c  (revision 231167)
--- gcc/tree.c  (working copy)
*** decl_address_ip_invariant_p (const_tree
*** 3231,3238 
 not handle arithmetic; that's handled in skip_simple_arithmetic and
 tree_invariant_p).  */
  
- static bool tree_invariant_p (tree t);
- 
  static bool
  tree_invariant_p_1 (tree t)
  {
--- 3231,3236 
*** tree_invariant_p_1 (tree t)
*** 3282,3288 
  
  /* Return true if T is function-invariant.  */
  
! static bool
  tree_invariant_p (tree t)
  {
tree inner = skip_simple_arithmetic (t);
--- 3280,3286 
  
  /* Return true if T is function-invariant.  */
  
! bool
  tree_invariant_p (tree t)
  {
tree inner = skip_simple_arithmetic (t);
Index: gcc/tree.h
===
*** gcc/tree.h  (revision 231167)
--- gcc/tree.h  (working copy)
*** extern tree staticp (tree);
*** 4320,4325 
--- 4320,4329 
  
  extern tree save_expr (tree);
  
+ /* Return true if T is function-invariant.  */
+ 
+ extern bool tree_invariant_p (tree);
+ 
  /* Look inside EXPR into any simple arithmetic operations.  Return the
 outermost non-arithmetic or non-invariant node.  */
  
Index: gcc/genmatch.c
===
*** gcc/genmatch.c  (revision 231167)
--- gcc/genmatch.c  (working copy)
*** dt_simplify::gen_1 (FILE *f, int indent,
*** 3119,3126 
if (cinfo.info[i].result_use_count
> cinfo.info[i].match_use_count)
  fprintf_indent (f, indent,
! "captures[%d] = save_expr (captures[%d]);\n",
! i, i);
  }
  for (unsigned j = 0; j < e->ops.length (); ++j)
{
--- 3119,3126 
if (cinfo.info[i].result_use_count
> cinfo.info[i].match_use_count)
  fprintf_indent (f, indent,
! "if (! tree_invariant_p (captures[%d])) "
! "return NULL_TREE;\n", i);
  }
  for (unsigned j = 0; j < e->ops.length (); ++j)
{
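
[ Illustration of the effect on the generated generic-match.c code: instead
  of wrapping a multiply-used capture in a SAVE_EXPR, the generated
  simplification now refuses to fire when that capture is not invariant.
  A sketch of the emitted code, not actual generator output: ]

/* Before: force evaluate-once semantics via SAVE_EXPR.  */
captures[0] = save_expr (captures[0]);

/* After: reject the pattern instead of creating a SAVE_EXPR.  */
if (! tree_invariant_p (captures[0]))
  return NULL_TREE;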


Re: [PATCH] Fix oacc kernels default mapping for scalars

2015-12-02 Thread Jakub Jelinek
On Fri, Nov 27, 2015 at 12:29:21PM +0100, Tom de Vries wrote:
> Fix oacc kernels default mapping for scalars
> 
> 2015-11-27  Tom de Vries  
> 
>   * gimplify.c (enum gimplify_omp_var_data): Add enum value
>   GOVD_MAP_FORCE.
>   (oacc_default_clause): Fix default for scalars in oacc kernels.
>   (gimplify_adjust_omp_clauses_1): Handle GOVD_MAP_FORCE.
> 
>   * c-c++-common/goacc/kernels-default-2.c: New test.
>   * c-c++-common/goacc/kernels-default.c: New test.
> 
> ---
>  gcc/gimplify.c   | 19 ++-
>  gcc/testsuite/c-c++-common/goacc/kernels-default-2.c | 17 +
>  gcc/testsuite/c-c++-common/goacc/kernels-default.c   | 14 ++
>  3 files changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index fcac745..68d90bf 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -87,6 +87,9 @@ enum gimplify_omp_var_data
>/* Flag for GOVD_MAP, if it is always, to or always, tofrom mapping.  */
>GOVD_MAP_ALWAYS_TO = 65536,
>  
> +  /* Flag for GOVD_MAP, if it is a forced mapping.  */
> +  GOVD_MAP_FORCE = 131072,

The patch was already outdated when it was posted; there is GOVD_WRITTEN at
this spot.
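
[ In other words, GOVD_WRITTEN already occupies that bit on trunk, so the
  new flag presumably has to move to the next free one -- a sketch, exact
  values depend on current trunk: ]

  /* Already present on trunk at this spot.  */
  GOVD_WRITTEN = 131072,

  /* Flag for GOVD_MAP, if it is a forced mapping.  */
  GOVD_MAP_FORCE = 262144,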

Once you fix this, it is ok for trunk.

Jakub


[PATCH] Fix PR68639

2015-12-02 Thread Richard Biener

The testcase shows that we rely on consistently rejecting all group
members from vect_analyze_data_ref_access but that isn't done reliably
when they are not in the same loop level (and thus
nested_in_vect_loop_p doesn't return the same answer for all group
elements).  Of course such groups are hardly useful thus the
following patch fixes group detection for this (and not the still
somewhat fragile 'mark this as non-group' code in
vect_analyze_data_ref_access).
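
[ A C sketch of the situation described above -- two references with the
  same base and adjacent offsets, but at different loop depths, so
  nested_in_vect_loop_p disagrees for the would-be group members.
  Illustrative only; the actual testcase is the Fortran code below: ]

void
foo (double *a, double *b, int n, int m)
{
  for (int i = 0; i < n; i++)
    {
      a[2*i] = 1.0;            /* reference in the outer loop body  */
      for (int j = 0; j < m; j++)
        a[2*i + 1] += b[j];    /* reference in the nested loop body */
    }
}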

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-12-02  Richard Biener  

PR tree-optimization/68639
* tree-vect-data-refs.c (dr_group_sort_cmp): Split groups
belonging to different loops.
(vect_analyze_data_ref_accesses): Likewise.

* gfortran.fortran-torture/compile/pr68639.f90: New testcase.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 231163)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2597,6 +2597,12 @@ dr_group_sort_cmp (const void *dra_, con
   if (dra == drb)
 return 0;
 
+  /* DRs in different loops never belong to the same group.  */
+  loop_p loopa = gimple_bb (DR_STMT (dra))->loop_father;
+  loop_p loopb = gimple_bb (DR_STMT (drb))->loop_father;
+  if (loopa != loopb)
+return loopa->num < loopb->num ? -1 : 1;
+
   /* Ordering of DRs according to base.  */
   if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0))
 {
@@ -2688,6 +2694,12 @@ vect_analyze_data_ref_accesses (vec_info
 matters we can push those to a worklist and re-iterate
 over them.  The we can just skip ahead to the next DR here.  */
 
+ /* DRs in a different loop should not be put into the same
+interleaving group.  */
+ if (gimple_bb (DR_STMT (dra))->loop_father
+ != gimple_bb (DR_STMT (drb))->loop_father)
+   break;
+
  /* Check that the data-refs have same first location (except init)
 and they are both either store or load (not load and store,
 not masked loads or stores).  */
Index: gcc/testsuite/gfortran.fortran-torture/compile/pr68639.f90
===
--- gcc/testsuite/gfortran.fortran-torture/compile/pr68639.f90  (revision 0)
+++ gcc/testsuite/gfortran.fortran-torture/compile/pr68639.f90  (working copy)
@@ -0,0 +1,22 @@
+  SUBROUTINE makeCoulE0(natorb,Coul)
+INTEGER, PARAMETER :: dp=8
+REAL(KIND=dp), PARAMETER :: fourpi=432.42, oorootpi=13413.3142
+INTEGER :: natorb
+REAL(KIND=dp), DIMENSION(45, 45), &
+  INTENT(OUT):: Coul
+INTEGER  :: gpt, imA, imB, k1, k2, k3, &
+k4, lp, mp, np
+REAL(KIND=dp):: alpha, d2f(3,3), &
+d4f(3,3,3,3), f, ff, w
+REAL(KIND=dp), DIMENSION(3, 45)  :: M1A
+REAL(KIND=dp), DIMENSION(45) :: M0A
+DO imA=1, (natorb*(natorb+1))/2
+   DO imB=1, (natorb*(natorb+1))/2
+  w= M0A(imA)*M0A(imB)
+  DO k1=1,3
+w=w+ M1A(k1,imA)*M1A(k1,imB)
+  ENDDO
+  Coul(imA,imB)=Coul(imA,imB)-4.0_dp*alpha**3*oorootpi*w/3.0_dp
+   ENDDO
+ENDDO
+  END SUBROUTINE makeCoulE0


Re: [UPC 17/22] misc/common changes

2015-12-02 Thread Eric Botcazou
> I think this also shows that using a POINTER_TYPE for a non-pointer
> is bogus.  POINTER_TYPE is not for "semantically a pointer" but
> for pointers.  Just use RECORD_TYPE here (and of course lower
> things earlier).

FWIW that's what Ada does for its fat pointers.

-- 
Eric Botcazou


Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-02 Thread Andreas Krebbel
On 12/02/2015 02:11 PM, Andreas Krebbel wrote:
> On 12/02/2015 01:51 PM, Ulrich Weigand wrote:
>> Andreas Krebbel wrote:
>>> On 12/02/2015 11:12 AM, Dominik Vogt wrote:
 Hopefully, this is correct now; it does pass the functional test case
 that's part of the patch.  Unfortunately the define_insn patterns
 had to be duplicated because of the new subreg offsets.  
>>>
>>> The number of patterns could possibly be reduced using the define_subst 
>>> machinery.  I'm looking into
>>> this for some other changes. No need to do this right now. We can do this 
>>> later on-top.
>>
>> For this particular issue, shouldn't a simple mode_attr be OK?
>> I see that the sh port uses this:
>>
>> (define_mode_attr lowpart_be [(QI "3") (HI "2")])
>>
>>   [(set (reg:SI T_REG)
>> (eq:SI
>>   (subreg:QIHI
>> (and:SI (match_operand:SI 0 "arith_reg_operand")
>> (match_operand:SI 1 "arith_reg_operand")) )
>>   (const_int 0)))]
> 
> Unfortunately in our case the attribute value doesn't only depend on the 
> mode.  It also depends on
> zarch/esa.  We would need some kind of conditional attribute.

However, we could probably at least merge the two zarch variants.

The define_subst is probably the right thing for adding the AND around the 
padding byte operand.

Bye,

-Andreas-




Re: Gimple loop splitting v2

2015-12-02 Thread Michael Matz
Hi,

On Tue, 1 Dec 2015, Jeff Law wrote:

> > So, okay for trunk?
> -ENOPATCH

Sigh :)
Here it is.


Ciao,
Michael.
* common.opt (-fsplit-loops): New flag.
* passes.def (pass_loop_split): Add.
* opts.c (default_options_table): Add OPT_fsplit_loops entry at -O3.
(enable_fdo_optimizations): Add loop splitting.
* timevar.def (TV_LOOP_SPLIT): Add.
* tree-pass.h (make_pass_loop_split): Declare.
* tree-ssa-loop-manip.h (rewrite_into_loop_closed_ssa_1): Declare.
* tree-ssa-loop-unswitch.c: Include tree-ssa-loop-manip.h,
* tree-ssa-loop-split.c: New file.
* Makefile.in (OBJS): Add tree-ssa-loop-split.o.
* doc/invoke.texi (fsplit-loops): Document.
* doc/passes.texi (Loop optimization): Add paragraph about loop
splitting.

testsuite/
* gcc.dg/loop-split.c: New test.
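
[ For readers unfamiliar with the transformation, a conceptual sketch of
  what the new pass does; this example is not taken from the patch or its
  testcase: ]

void
f (double *a, double *b, double *c, int n, int j)
{
  /* Before: the IV-based condition i < j is true for a prefix of the
     iteration space and false for the rest.  */
  for (int i = 0; i < n; i++)
    {
      if (i < j)
        a[i] = b[i];
      else
        a[i] = c[i];
    }
}

void
f_split (double *a, double *b, double *c, int n, int j)
{
  /* After -fsplit-loops, conceptually: two loops, neither of which
     carries the branch in its body.  */
  int i;
  for (i = 0; i < n && i < j; i++)
    a[i] = b[i];
  for (; i < n; i++)
    a[i] = c[i];
}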

Index: common.opt
===
--- common.opt  (revision 231115)
+++ common.opt  (working copy)
@@ -2453,6 +2457,10 @@ funswitch-loops
 Common Report Var(flag_unswitch_loops) Optimization
 Perform loop unswitching.
 
+fsplit-loops
+Common Report Var(flag_split_loops) Optimization
+Perform loop splitting.
+
 funwind-tables
 Common Report Var(flag_unwind_tables) Optimization
 Just generate unwind tables for exception handling.
Index: passes.def
===
--- passes.def  (revision 231115)
+++ passes.def  (working copy)
@@ -252,6 +252,7 @@ along with GCC; see the file COPYING3.
  NEXT_PASS (pass_dce);
  NEXT_PASS (pass_tree_unswitch);
  NEXT_PASS (pass_scev_cprop);
+ NEXT_PASS (pass_loop_split);
  NEXT_PASS (pass_record_bounds);
  NEXT_PASS (pass_loop_distribution);
  NEXT_PASS (pass_copy_prop);
Index: opts.c
===
--- opts.c  (revision 231115)
+++ opts.c  (working copy)
@@ -532,6 +532,7 @@ static const struct default_options defa
regardless of them being declared inline.  */
 { OPT_LEVELS_3_PLUS_AND_SIZE, OPT_finline_functions, NULL, 1 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_finline_functions_called_once, NULL, 1 
},
+{ OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
@@ -1411,6 +1412,8 @@ enable_fdo_optimizations (struct gcc_opt
 opts->x_flag_ipa_cp_alignment = value;
   if (!opts_set->x_flag_predictive_commoning)
 opts->x_flag_predictive_commoning = value;
+  if (!opts_set->x_flag_split_loops)
+opts->x_flag_split_loops = value;
   if (!opts_set->x_flag_unswitch_loops)
 opts->x_flag_unswitch_loops = value;
   if (!opts_set->x_flag_gcse_after_reload)
Index: timevar.def
===
--- timevar.def (revision 231115)
+++ timevar.def (working copy)
@@ -182,6 +182,7 @@ DEFTIMEVAR (TV_LIM   , "
 DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST, "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH, "tree loop unswitching")
+DEFTIMEVAR (TV_LOOP_SPLIT, "loop splitting")
 DEFTIMEVAR (TV_COMPLETE_UNROLL   , "complete unrolling")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION, "tree vectorization")
Index: tree-pass.h
===
--- tree-pass.h (revision 231115)
+++ tree-pass.h (working copy)
@@ -370,6 +370,7 @@ extern gimple_opt_pass *make_pass_tree_n
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: tree-ssa-loop-manip.h
===
--- tree-ssa-loop-manip.h   (revision 231115)
+++ tree-ssa-loop-manip.h   (working copy)
@@ -24,6 +24,8 @@ typedef void (*transform_callback)(struc
 
 extern void create_iv (tree, tree, tree, struct loop *, gimple_stmt_iterator *,
   bool, tree *, tree *);
+extern void rewrite_into_loop_closed_ssa_1 (bitmap, unsigned, int,
+   struct loop *);
 extern void rewrite_into_loop_closed_ssa (bitmap, unsigned);
 extern void rewrite_virtuals_into_loop_closed_ssa (struct loop *);
 extern void verify_loop_closed_ssa (bool);
Index: Makefile.in
===

Re: [PATCH] Fix PR68559

2015-12-02 Thread Alan Lawrence

On 27/11/15 14:13, Richard Biener wrote:


The following fixes the excessive peeling for gaps we do when doing
SLP now that I removed most of the restrictions on having gaps in
the first place.

This should make low-trip vectorized loops more efficient (sth
also the combine-epilogue-with-vectorized-body-by-masking patches
claim to do).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-27  Richard Biener  

PR tree-optimization/68559
* tree-vect-data-refs.c (vect_analyze_group_access_1): Move
peeling for gap checks ...
* tree-vect-stmts.c (vectorizable_load): ... here and relax
for SLP.
* tree-vect-loop.c (vect_analyze_loop_2): Re-set
LOOP_VINFO_PEELING_FOR_GAPS before re-trying without SLP.

* gcc.dg/vect/slp-perm-4.c: Adjust again.
* gcc.dg/vect/pr45752.c: Likewise.


Since this, we have

FAIL: gcc.dg/vect/pr45752.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"gaps requires scalar epilogue loop" 0
FAIL: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "gaps requires scalar 
epilogue loop" 0


on aarch64 platforms (aarch64-none-linux-gnu, aarch64-none-elf, 
aarch64_be-none-elf).



Thanks, Alan



Re: [PATCH] Fix PR68559

2015-12-02 Thread Richard Biener
On Wed, 2 Dec 2015, Alan Lawrence wrote:

> On 27/11/15 14:13, Richard Biener wrote:
> > 
> > The following fixes the excessive peeling for gaps we do when doing
> > SLP now that I removed most of the restrictions on having gaps in
> > the first place.
> > 
> > This should make low-trip vectorized loops more efficient (sth
> > also the combine-epilogue-with-vectorized-body-by-masking patches
> > claim to do).
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > 
> > Richard.
> > 
> > 2015-11-27  Richard Biener  
> > 
> > PR tree-optimization/68559
> > * tree-vect-data-refs.c (vect_analyze_group_access_1): Move
> > peeling for gap checks ...
> > * tree-vect-stmts.c (vectorizable_load): ... here and relax
> > for SLP.
> > * tree-vect-loop.c (vect_analyze_loop_2): Re-set
> > LOOP_VINFO_PEELING_FOR_GAPS before re-trying without SLP.
> > 
> > * gcc.dg/vect/slp-perm-4.c: Adjust again.
> > * gcc.dg/vect/pr45752.c: Likewise.
> 
> Since this, we have
> 
> FAIL: gcc.dg/vect/pr45752.c -flto -ffat-lto-objects  scan-tree-dump-times vect
> "gaps requires scalar epilogue loop" 0
> FAIL: gcc.dg/vect/pr45752.c scan-tree-dump-times vect "gaps requires scalar
> epilogue loop" 0
> 
> on aarch64 platforms (aarch64-none-linux-gnu, aarch64-none-elf,
> aarch64_be-none-elf).

Can you open a bug and attach -details vectorizer dumps?

Richard.


Re: [gomp-nvptx 1/9] nvptx backend: allow emitting COND_EXEC insns

2015-12-02 Thread Bernd Schmidt

On 12/01/2015 04:28 PM, Alexander Monakov wrote:

This allows using COND_EXEC patterns on nvptx.  The backend is mostly ready
for that, although I had to slightly fix nvptx_print_operand.  I've also opted
to make calls predicable to make the uniform-simt patch simpler, and to that
end I need a small fixup in nvptx_output_call_insn.

RTL optimization won't emit COND_EXEC insns, because it's done only after
reload, and register allocation is not done.  I need this patch to create
COND_EXEC patterns in the backend during reorg.


This looks OK to me (in general, not in the sense of OK for trunk in 
stage 3).



Bernd



Re: [RFA] Compact EH Patch

2015-12-02 Thread Jonathan Wakely

On 01/12/15 16:33 -0500, Jason Merrill wrote:

On 11/25/2015 11:58 AM, Moore, Catherine wrote:




-Original Message-
From: Richard Henderson [mailto:r...@redhat.com]
Sent: Friday, September 18, 2015 3:25 PM
To: Moore, Catherine; gcc-patches@gcc.gnu.org
Cc: ja...@redhat.com; Matthew Fortune
Subject: Re: [RFA] Compact EH Patch


Index: libgcc/libgcc-std.ver.in
===
--- libgcc/libgcc-std.ver.in	(revision 226409)
+++ libgcc/libgcc-std.ver.in	(working copy)
@@ -1918,6 +1918,7 @@ GCC_4.6.0 {
   __morestack_current_segment
   __morestack_initial_sp
   __splitstack_find
+  _Unwind_GetEhEncoding
 }

 %inherit GCC_4.7.0 GCC_4.6.0
@@ -1938,3 +1939,8 @@ GCC_4.7.0 {
 %inherit GCC_4.8.0 GCC_4.7.0
 GCC_4.8.0 {
 }
+
+%inherit GCC_4.8.0 GCC_4.7.0
+GCC_4.8.0 {
+  __register_frame_info_header_bases
+}


You can't push new symbols into old versions.  These have to go into the
version for the current gcc.


Index: libstdc++-v3/config/abi/pre/gnu.ver
===
--- libstdc++-v3/config/abi/pre/gnu.ver (revision 226409)
+++ libstdc++-v3/config/abi/pre/gnu.ver (working copy)
@@ -1909,6 +1909,7 @@ CXXABI_1.3 {
 __gxx_personality_v0;
 __gxx_personality_sj0;
 __gxx_personality_seh0;
+__gnu_compact_pr2;
 __dynamic_cast;

 # *_type_info classes, ctor and dtor
Index: libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
===
--- libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver	(revision 226409)
+++ libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver	(working copy)
@@ -200,6 +200,7 @@ CXXABI_2.0 {
 __cxa_vec_new;
 __gxx_personality_v0;
 __gxx_personality_sj0;
+__gnu_compact_pr2;
 __dynamic_cast;

 # std::exception_ptr


Likewise.


I'm getting ready to post the updates to this patch -- hopefully, I can still 
get it in GCC 6.0.
I'm not sure how to tell what the current CXXABI is for these two files.  
Should it be CXXABI_2.0 for both of these?


Jonathan, can you answer this question?


It's OK to add to CXXABI_2.0 for the gnu-versioned-namespace file:
when you configure libstdc++ to use the versioned namespace you're
explicitly requesting no ABI compatibility, and everything gets the
same version.

For gnu.ver you need to create a new symbol version and add new
symbols in there. CXXABI_1.3.9 was present in GCC 5, so you need to
create CXXABI_1.3.10 for your new symbol.

CXXABI_1.3.10 {
    __gnu_compact_pr2;
} CXXABI_1.3.9;


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell

On 12/02/15 08:10, Jakub Jelinek wrote:

On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:



Always the whole stack, from the current stack pointer up to top of the
stack, so sometimes a few bytes, sometimes a few kilobytes or more each time?


The frame of the current function.  Not the whole stack.  As I said, there's no 
visibility of the stack beyond the current function.  (one could implement some 
kind of chaining, I guess)


PTX does not expose the concept of a stack at all.  No stack pointer, no link 
register, no argument pushing.


It does expose 'local' memory, which is private to a thread and only live during 
a function (not like function-scope 'static').  From that we construct stack frames.


The rules of PTX are such that one can (almost) determine the call graph 
statically.  I don't know whether the JIT implements .local as a stack or 
statically allocates it (and perhaps uses a liveness algorithm to determine 
which pieces may overlap).  Perhaps it depends on the physical device capabilities.


The 'almost' fails with indirect calls, except that
1) at an indirect call, you may specify the static set of fns you know it'll 
resolve to
2) if you don't know that, you have to specify the function prototype anyway. 
So the static set would be 'all functions of that type'.


I don't know if the JIT makes use of that information.

nathan



Re: [PATCH, PING*4] Track indirect calls for call site information in debug info.

2015-12-02 Thread Pierre-Marie de Rodat

On 11/24/2015 06:10 PM, Jakub Jelinek wrote:

The new pass is IMNSHO completely useless and undesirable, both for compile
time (another whole IL traversal) reasons and for the unnecessary creation
of memory allocations.


Understood. Thank you very much for explaining how you think it should 
be! Here’s the patch implementing this, bootstrapped and regtested 
without regression on x86_64-linux.


--
Pierre-Marie de Rodat
>From 41ed1a37921b4f9c5f762334265e72fd8e4b4a25 Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Thu, 13 Jun 2013 11:13:08 +0200
Subject: [PATCH] Track indirect calls for call site information in debug info

gcc/ChangeLog:

	* dwarf2out.c (dwarf2out_var_location): In addition to notes,
	process indirect calls whose target is compile-time known.
	Enhance pattern matching to get the SYMBOL_REF they embed.
	(gen_subprogram_die): Handle such calls.
	* final.c (final_scan_insn): For call instructions, invoke the
	var_location debug hook only after the call has been emitted.
---
 gcc/dwarf2out.c | 97 -
 gcc/final.c | 11 +--
 2 files changed, 84 insertions(+), 24 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 357f114..6af57b5 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -19268,7 +19268,9 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
 	  rtx tloc = NULL_RTX, tlocc = NULL_RTX;
 	  rtx arg, next_arg;
 
-	  for (arg = NOTE_VAR_LOCATION (ca_loc->call_arg_loc_note);
+	  for (arg = (ca_loc->call_arg_loc_note != NULL_RTX
+			  ? NOTE_VAR_LOCATION (ca_loc->call_arg_loc_note)
+			  : NULL_RTX);
 		   arg; arg = next_arg)
 		{
 		  dw_loc_descr_ref reg, val;
@@ -19291,18 +19293,23 @@ gen_subprogram_die (tree decl, dw_die_ref context_die)
 		}
 		  if (mode == VOIDmode || mode == BLKmode)
 		continue;
-		  if (XEXP (XEXP (arg, 0), 0) == pc_rtx)
+		  /* Get dynamic information about call target only if we
+		 have no static information: we cannot generate both
+		 DW_AT_abstract_origin and DW_AT_GNU_call_site_target
+		 attributes.  */
+		  if (ca_loc->symbol_ref == NULL_RTX)
 		{
-		  gcc_assert (ca_loc->symbol_ref == NULL_RTX);
-		  tloc = XEXP (XEXP (arg, 0), 1);
-		  continue;
-		}
-		  else if (GET_CODE (XEXP (XEXP (arg, 0), 0)) == CLOBBER
-			   && XEXP (XEXP (XEXP (arg, 0), 0), 0) == pc_rtx)
-		{
-		  gcc_assert (ca_loc->symbol_ref == NULL_RTX);
-		  tlocc = XEXP (XEXP (arg, 0), 1);
-		  continue;
+		  if (XEXP (XEXP (arg, 0), 0) == pc_rtx)
+			{
+			  tloc = XEXP (XEXP (arg, 0), 1);
+			  continue;
+			}
+		  else if (GET_CODE (XEXP (XEXP (arg, 0), 0)) == CLOBBER
+			   && XEXP (XEXP (XEXP (arg, 0), 0), 0) == pc_rtx)
+			{
+			  tlocc = XEXP (XEXP (arg, 0), 1);
+			  continue;
+			}
 		}
 		  reg = NULL;
 		  if (REG_P (XEXP (XEXP (arg, 0), 0)))
@@ -22289,6 +22296,7 @@ dwarf2out_var_location (rtx_insn *loc_note)
   char loclabel[MAX_ARTIFICIAL_LABEL_BYTES + 2];
   struct var_loc_node *newloc;
   rtx_insn *next_real, *next_note;
+  rtx_insn *call_insn = NULL;
   static const char *last_label;
   static const char *last_postcall_label;
   static bool last_in_cold_section_p;
@@ -22303,6 +22311,35 @@ dwarf2out_var_location (rtx_insn *loc_note)
 	  call_site_count++;
 	  if (SIBLING_CALL_P (loc_note))
 	tail_call_site_count++;
+	  if (optimize == 0 && !flag_var_tracking)
+	{
+	  /* When the var-tracking pass is not running, there is no note
+		 for indirect calls whose target is compile-time known. In this
+		 case, process such calls specifically so that we generate call
+		 sites for them anyway.  */
+	  rtx x = PATTERN (loc_note);
+	  if (GET_CODE (x) == PARALLEL)
+		x = XVECEXP (x, 0, 0);
+	  if (GET_CODE (x) == SET)
+		x = SET_SRC (x);
+	  if (GET_CODE (x) == CALL)
+		x = XEXP (x, 0);
+	  if (!MEM_P (x)
+		  || GET_CODE (XEXP (x, 0)) != SYMBOL_REF
+		  || !SYMBOL_REF_DECL (XEXP (x, 0))
+		  || (TREE_CODE (SYMBOL_REF_DECL (XEXP (x, 0)))
+		  != FUNCTION_DECL))
+		{
+		  call_insn = loc_note;
+		  loc_note = NULL;
+		  var_loc_p = false;
+
+		  next_real = next_real_insn (call_insn);
+		  next_note = NULL;
+		  cached_next_real_insn = NULL;
+		  goto create_label;
+		}
+	}
 	}
   return;
 }
@@ -22348,6 +22385,8 @@ dwarf2out_var_location (rtx_insn *loc_note)
   && !NOTE_DURING_CALL_P (loc_note))
 return;
 
+create_label:
+
   if (next_real == NULL_RTX)
 next_real = get_last_insn ();
 
@@ -22427,12 +22466,16 @@ dwarf2out_var_location (rtx_insn *loc_note)
 	}
 }
 
+  gcc_assert ((loc_note == NULL_RTX && call_insn != NULL_RTX)
+	  || (loc_note != NULL_RTX && call_insn == NULL_RTX));
+
   if (!var_loc_p)
 {
   struct call_arg_loc_node *ca_loc
 	= ggc_cleared_alloc ();
-  rtx_insn *prev = prev_real_insn (loc_note);
-  rtx x;
+  rtx_insn *prev
+= loc_note != NULL_RTX ? prev_real_insn (loc_note) : call_insn;
+
   c

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 08:38:56AM -0500, Nathan Sidwell wrote:
> On 12/02/15 08:10, Jakub Jelinek wrote:
> >On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> 
> >Always the whole stack, from the current stack pointer up to top of the
> >stack, so sometimes a few bytes, sometimes a few kilobytes or more each time?
> 
> The frame of the current function.  Not the whole stack.  As I said, there's
> no visibility of the stack beyond the current function.  (one could
> implement some kind of chaining, I guess)

So, how does OpenACC cope with this?

Or does the OpenACC execution model not allow anything like that, i.e.
have some function with an automatic variable pass the address of that
variable to some other function and that other function use #acc loop kind
that expects the caller to be at the worker level and splits the work among
the threads in the warp, on the array section pointed by that passed in
pointer?  See the OpenMP testcase I've posted in this thread.

Jakub


Re: [PING^2][PATCH] Improve C++ loop's backward-jump location

2015-12-02 Thread Alan Lawrence

On 24/11/15 14:55, Andreas Arnez wrote:

Ping?

   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01192.html

I guess we want C and C++ behave the same here?


gcc/cp/ChangeLog:

* cp-gimplify.c (genericize_cp_loop): Change LOOP_EXPR's location
to start of loop body instead of start of loop.

gcc/testsuite/ChangeLog:

* g++.dg/guality/pr67192.C: New test.






Since this, we've been seeing these tests fail natively on AArch64 and ARM:

FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  gcov: 3 failures in line counts, 0 in 
branch percentages, 0 in return percentages, 0 in intermediate format

FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 115: is 27:should be 14
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 58: is 18:should be 9
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 73: is 162:should be 81
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  gcov: 3 failures in line counts, 0 in 
branch percentages, 0 in return percentages, 0 in intermediate format

FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 115: is 27:should be 14
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 58: is 18:should be 9
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 73: is 162:should be 81
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  gcov: 3 failures in line counts, 0 in 
branch percentages, 0 in return percentages, 0 in intermediate format

FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 115: is 27:should be 14
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 58: is 18:should be 9
FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 73: is 162:should be 81

I've not had a chance to look any further yet.

Thanks, Alan



[PATCH, i386] Fix alignment check for AVX-512 masked store

2015-12-02 Thread Ilya Enkovich
Hi,

This patch fixes a wrong alignment check in the <avx512>_store<mode>_mask
pattern.  Currently we check a register operand instead of a memory
one.  This fixes segfault on 481.wrf compiled at -O3 for KNL target.
I bootstrapped and tested this patch on x86_64-unknown-linux-gnu.
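
[ A loop of roughly this shape (illustrative, not taken from 481.wrf or the
  testsuite) is the kind that vectorizes into an AVX-512 masked store at
  -O3 for KNL; with the wrong operand checked, an aligned vmovap[sd] can be
  emitted for a destination that is not actually aligned: ]

void
foo (double *dst, const double *src, int n)
{
  for (int i = 0; i < n; i++)
    if (src[i] > 0.0)
      dst[i] = src[i] * 2.0;   /* conditional store -> masked store */
}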

I got a bunch of new failures:

FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1

With the patch applied, the test generates vmovup[sd] because the memory
references don't have proper alignment set.  Since this is
another bug and it's actually a performance one, I think
this patch should go to trunk.

Thanks,
Ilya
--
gcc/

2015-12-02  Ilya Enkovich  

* config/i386/sse.md (<avx512>_store<mode>_mask): Fix
operand checked for alignment.


diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e7b517a..d65ed0c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1051,7 +1051,7 @@
   sse_suffix = "";
 }
 
-  if (misaligned_operand (operands[1], <MODE>mode))
+  if (misaligned_operand (operands[0], <MODE>mode))
 align = "u";
   else
 align = "a";


Re: [gomp-nvptx 9/9] adjust SIMD loop lowering for SIMT targets

2015-12-02 Thread Alexander Monakov
On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> expand_omp is depth-first expansion, so for the case where the simd
> region is in lexically (directly or indirectly) nested inside of a
> target region, the above will not trigger.  You'd need to
> use cgraph_node::get (current_function_decl)->offloadable or
> just walk through outer fields of region up and see if this isn't in
> a target region.

I've addressed this in my follow-up response to this patch.  Again, sorry for
the mishap, I was overconfident when adjusting the patch just before sending.

> Here, it would be nice to extend omp_max_vf in the host compiler,
> such that if PTX offloading is enabled, and optimize && !optimize_debug
> (and vectorizer on the host not disabled, otherwise it won't be cleaned up
> on the host), it returns MIN (32, whatever it would return otherwise).

Did you mean MAX (32, host_vf), not MIN?

> How does this even compile?  simt_lane is a local var in the if
> (do_simt_transform) body.

I addressed in this in the reposted patch too, a few hours after posting this
broken code.

> BTW, again, it would help if you post a simple *.ompexp dump on what exactly
> you want to look it up.

Sorry, I'm not following you here -- can you rephrase what I should post?

Thanks.
Alexander


Re: [PATCH, PING*4] Track indirect calls for call site information in debug info.

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 02:46:28PM +0100, Pierre-Marie de Rodat wrote:
> On 11/24/2015 06:10 PM, Jakub Jelinek wrote:
> >The new pass is IMNSHO completely useless and undesirable, both for compile
> >time (another whole IL traversal) reasons and for the unnecessary creation
> >of memory allocations.
> 
> Understood. Thank you very much for explaining how you think it should be!
> Here’s the patch implementing this, bootstrapped and regtested without
> regression on x86_64-linux.
> 
> -- 
> Pierre-Marie de Rodat

> >From 41ed1a37921b4f9c5f762334265e72fd8e4b4a25 Mon Sep 17 00:00:00 2001
> From: Pierre-Marie de Rodat 
> Date: Thu, 13 Jun 2013 11:13:08 +0200
> Subject: [PATCH] Track indirect calls for call site information in debug info
> 
> gcc/ChangeLog:
> 
>   * dwarf2out.c (dwarf2out_var_location): In addition to notes,
>   process indirect calls whose target is compile-time known.
>   Enhance pattern matching to get the SYMBOL_REF they embed.
>   (gen_subprogram_die): Handle such calls.
>   * final.c (final_scan_insn): For call instructions, invoke the
>   var_location debug hook only after the call has been emitted.

Ok, thanks.

Jakub


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Bernd Schmidt

On 12/02/2015 02:46 PM, Jakub Jelinek wrote:

Or does the OpenACC execution model not allow anything like that, i.e.
have some function with an automatic variable pass the address of that
variable to some other function and that other function use #acc loop kind
that expects the caller to be at the worker level and splits the work among
the threads in the warp, on the array section pointed by that passed in
pointer?  See the OpenMP testcase I've posted in this thread.


I believe you're making a mistake if you think that the OpenACC 
"specification" considers such cases.



Bernd



[hsa] Support internal functions and implement various __builtin_*

2015-12-02 Thread Martin Liška
Hello.

The following patch set adds support for internal functions that are either
expanded to an HSAIL instruction or for which a function call is generated.
Apart from that, utilizing bit string instructions, we support all builtins
that are based on this type of instruction.

Patch set:
00c2bb6 HSA: reorder BUILT_IN_* enum handling in a switch stmt
d1965a7 HSA: add initial support for internal functions
ad292fd HSA: implement __builtin_popcount
2e9d0a0 HSA: expand natively not handled builtins
8767212 HSA: generate HSAIL instructions for bit string insns
a185160 HSA: support 'unsigned long long' type for integer builtins
fc08ffd HSA: improve warning message in IPA HSA

The series has just been installed on the HSA branch.

Thanks,
Martin
>From 00c2bb6f1c8a04f9ac28767401919fd058cc2808 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 27 Nov 2015 11:22:22 +0100
Subject: [PATCH 1/7] HSA: reorder BUILT_IN_* enum handling in a switch stmt

gcc/ChangeLog:

2015-11-30  Martin Liska  

	* hsa-gen.c (gen_hsa_insns_for_call): Logically reorder cases
	in a switch.
---
 gcc/hsa-gen.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index ed47b35..cb1dc97 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -4626,18 +4626,6 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
   tree fndecl = gimple_call_fndecl (stmt);
   switch (DECL_FUNCTION_CODE (fndecl))
 {
-case BUILT_IN_OMP_GET_THREAD_NUM:
-  {
-	query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
-	break;
-  }
-
-case BUILT_IN_OMP_GET_NUM_THREADS:
-  {
-	query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
-	break;
-  }
-
 case BUILT_IN_FABS:
 case BUILT_IN_FABSF:
   gen_hsa_unaryop_for_builtin (BRIG_OPCODE_ABS, stmt, hbb);
@@ -4892,6 +4880,17 @@ gen_hsa_insns_for_call (gimple *stmt, hsa_bb *hbb)
 
 	break;
   }
+case BUILT_IN_OMP_GET_THREAD_NUM:
+  {
+	query_hsa_grid (stmt, BRIG_OPCODE_WORKITEMABSID, 0, hbb);
+	break;
+  }
+
+case BUILT_IN_OMP_GET_NUM_THREADS:
+  {
+	query_hsa_grid (stmt, BRIG_OPCODE_GRIDSIZE, 0, hbb);
+	break;
+  }
 case BUILT_IN_GOMP_TEAMS:
   {
 	gen_set_num_threads (gimple_call_arg (stmt, 1), hbb);
-- 
2.6.3

>From d1965a7744a14f6662cca9be40d6ab81b70e6d04 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 27 Nov 2015 15:32:45 +0100
Subject: [PATCH 2/7] HSA: add initial support for internal functions

gcc/ChangeLog:

2015-11-30  Martin Liska  

	* hsa-brig.c (emit_function_directives): Use
	hsa_function_representation::get_linkage.  Fill up code offset
	of internal functions.
	(emit_internal_fn_decl): New function.
	(emit_call_insn): Handle internal functions.
	(hsa_brig_emit_function): Likewise.
	(hsa_output_brig): Release memory of emitted_internal_decls.
	* hsa-dump.c (dump_hsa_insn_1): Print name of internal function.
	* hsa-gen.c (hsa_function_representation::~hsa_function_representation):
	Release internal function.
	(hsa_function_representation::get_linkage): New function.
	(gen_hsa_insns_for_direct_call): Fix comment of argument end
	block.
	(gen_hsa_insns_for_call_of_internal_fn): New function.
	(gen_hsa_unaryop_builtin_call): Dispatch between a function
	with declaration and an internal FN.
	(gen_hsa_insn_for_internal_fn_call): New function.
	(gen_hsa_insns_for_call): Handle internal functions.
	(hsa_generate_internal_fn_decl): New function.
	* hsa.c (hsa_float_for_bitsize): Dtto.
	(hsa_internal_fn::name): Dtto.
	(hsa_internal_fn::get_arity): Dtto.
	(hsa_internal_fn::get_argument_type): Dtto.
	* hsa.h (struct hsa_internal_fn_hasher): New structure.
	(hsa_internal_fn_hasher::hash): New function.
	(hsa_internal_fn_hasher::equal): New function.
---
 gcc/hsa-brig.c |  66 ++---
 gcc/hsa-dump.c |  14 +++-
 gcc/hsa-gen.c  | 222 +++--
 gcc/hsa.c  | 154 +++
 gcc/hsa.h  |  80 -
 5 files changed, 517 insertions(+), 19 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 9f65d50..234a6c9 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -99,9 +99,12 @@ static bool brig_initialized = false;
 /* Mapping between emitted HSA functions and their offset in code segment.  */
 static hash_map *function_offsets;
 
-/* Set of emitted function declarations.  */
+/* Hash map of emitted function declarations.  */
 static hash_map  *emitted_declarations;
 
+/* Hash table of emitted internal function declaration offsets.  */
+hash_table  *hsa_emitted_internal_decls;
+
 /* List of sbr instructions.  */
 static vec  *switch_instructions;
 
@@ -585,17 +588,27 @@ emit_function_directives (hsa_function_representation *f, bool is_declaration)
   fndir.firstInArg = htole32 (inarg_off);
   fndir.firstCodeBlockEntry = htole32 (scoped_off);
   fndir.nextModuleEntry = htole32 (next_toplev_off);
-  fndir.linkage = f->m_kern_p || TREE_PUBLIC (f->m_decl) ?
-BRIG_LINKAGE_PROGRAM : BRIG_LINK

Re: [gomp-nvptx 9/9] adjust SIMD loop lowering for SIMT targets

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 04:54:39PM +0300, Alexander Monakov wrote:
> On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> > expand_omp is depth-first expansion, so for the case where the simd
> > region is in lexically (directly or indirectly) nested inside of a
> > target region, the above will not trigger.  You'd need to
> > use cgraph_node::get (current_function_decl)->offloadable or
> > just walk through outer fields of region up and see if this isn't in
> > a target region.
> 
> I've addressed this in my follow-up response to this patch.  Again, sorry for
> the mishap, I was overconfident when adjusting the patch just before sending.
> 
> > Here, it would be nice to extend omp_max_vf in the host compiler,
> > such that if PTX offloading is enabled, and optimize && !optimize_debug
> > (and vectorizer on the host not disabled, otherwise it won't be cleaned up
> > on the host), it returns MIN (32, whatever it would return otherwise).
> 
> Did you mean MAX (32, host_vf), not MIN?

Sure, MAX.  Though, if the SIMTification treats "omp simd array" arrays
specially, it probably only cares whether it is > 1 (because 1 disables the
"omp simd array" handling).  If all we want to achieve is that those arrays
in PTX ACCEL_COMPILER become again scalars (or aggregates or whatever they
were before) with each thread in warp writing their own, it doesn't really
care about their size that much.

> > How does this even compile?  simt_lane is a local var in the if
> > (do_simt_transform) body.
> 
> I addressed in this in the reposted patch too, a few hours after posting this
> broken code.
> 
> > BTW, again, it would help if you post a simple *.ompexp dump on what exactly
> > you want to look it up.
> 
> Sorry, I'm not following you here -- can you rephrase what I should post?

Just wanted to see -fdump-tree-ompexp dump say from the testcase I've
posted.  Does your patchset have any dependencies that aren't on the trunk?
If not, I guess I just could apply the patchset and look at the results, but
if there are, it would need applying more.

Jakub


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell

On 12/02/15 08:46, Jakub Jelinek wrote:


Or does the OpenACC execution model not allow anything like that, i.e.
have some function with an automatic variable pass the address of that
variable to some other function and that other function use #acc loop kind
that expects the caller to be at the worker level and splits the work among
the threads in the warp, on the array section pointed by that passed in
pointer?  See the OpenMP testcase I've posted in this thread.


There are two cases to consider

1) the caller (& address taker) is already partitioned.  Thus the callers' 
frames are already copied.  The caller takes the address of the object in its 
own frame.


An example would be calling, say, __muldc3, where the return value location is 
passed by pointer.


2) the caller is not partitioned and calls a function containing a partitioned 
loop.  The caller takes the address of its instance of the variable.  As part of 
the RTL expansion we have to convert addresses (to be stored in registers) to 
the generic address space.  That conversion creates a pointer that may be used 
by any thread (on the same CTA)[*].  The function call is  executed by all 
threads (they're partially un-neutered before the call).  In the partitioned 
loop, each thread ends up accessing the location in the frame of the original 
calling active thread.


[*]  although .local is private to each thread, it's placed in memory that is 
reachable from anywhere, provided a generic address is used.  Essentially it's 
like TLS and genericization is simply adding the thread pointer to the local 
memory offset to create a generic address.
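
[ A sketch of case 2 in OpenACC terms; directives and names here are
  illustrative only and not from any testcase in this thread: ]

#pragma acc routine vector
void
scale (float *p, int n)
{
  /* Partitioned loop: all vector lanes of the warp work on *p.  */
  #pragma acc loop vector
  for (int i = 0; i < n; i++)
    p[i] *= 2.0f;
}

void
caller (float *out)
{
  #pragma acc parallel copyout (out[0:1])
  {
    float buf[64];    /* lives in the unpartitioned caller's frame      */
    for (int i = 0; i < 64; i++)
      buf[i] = i;
    scale (buf, 64);  /* &buf[0] is converted to a generic address and
                         then dereferenced by every lane in the warp    */
    out[0] = buf[0];
  }
}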


nathan



Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-02 Thread Jeff Law

On 12/02/2015 01:36 AM, Richard Biener wrote:



This could also be a candidate for the 5.3 release; backporting depends only on
the (fairly trivial) r230357.


Looks good to me (for both, but backport only after 5.3 is released).  But
please wait for the discussion with Jeff to settle down.
No need to wait on me.  I think if we want the little debug/verify 
function, that can go in as a follow-up.


jeff



Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-02 Thread Jeff Law

On 12/02/2015 01:33 AM, Richard Biener wrote:

Right.  So the question I have is how/why did DOM leave anything in the map.
And if DOM is fixed to not leave stuff lying around, can we then assert that
nothing is ever left in those maps between passes?  There's certainly no good
reason I'm aware of why DOM would leave things in this state.


It happens not only with DOM but with all passes doing edge redirection.
This is because the map is populated by GIMPLE cfg hooks just in case
it might be used.  But there is no such thing as a "start CFG manip"
and "end CFG manip" to cleanup such dead state.

Sigh.



IMHO the redirect-edge-var-map stuff is just the very most possible
unclean implementation possible. :(  (see how remove_edge "clears"
stale info from the map to avoid even more "interesting" stale
data)

Ideally we could assert the map is empty whenever we leave a pass,
but as said it triggers all over the place.  Even cfg-cleanup causes
such stale data.

I agree that the patch is only a half-way "solution", but a full
solution would require sth more explicit, like we do with
initialize_original_copy_tables/free_original_copy_tables.  Thus
require passes to explicitely request the edge data to be preserved
with a initialize_edge_var_map/free_edge_var_map call pair.

Not appropriate at this stage IMHO (well, unless it turns out to be
a very localized patch).
So maybe as a follow-up to aid folks in the future, how about a 
debugging verify_whatever function that we can call manually if 
debugging a problem in this space.  With a comment indicating why we 
can't call it unconditionally (yet).
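
[ A rough sketch of the kind of checker being suggested, assuming it sits
  next to the map in tree-ssa.c; the function name is made up: ]

/* Fail if a pass left stale entries behind in the edge var map.  Meant to
   be called manually while debugging; it cannot be wired into the generic
   pass machinery yet because cfg-cleanup and ordinary edge redirection
   still leave entries around.  */

DEBUG_FUNCTION void
verify_redirect_edge_var_map_empty (void)
{
  if (edge_var_maps && edge_var_maps->elements () != 0)
    internal_error ("redirect_edge_var_map not empty between passes");
}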



jeff


Re: [Fortran, Patch] (RFC, Coarray) Implement TS18508's EVENTS

2015-12-02 Thread Alessandro Fanfarillo
*PING*

2015-11-26 17:51 GMT+01:00 Steve Kargl :
> On Wed, Nov 25, 2015 at 06:24:49PM +0100, Alessandro Fanfarillo wrote:
>> Dear all,
>>
>> in attachment the previous patch compatible with the current trunk.
>> The patch also includes the changes introduced in the latest TS 18508.
>>
>> Built and regtested on x86_64-pc-linux-gnu.
>>
>> PS: I will add the test cases in a different patch.
>>
>
> I have now built and regression tested the patch on
> x86_64-*-freebsd and i386-*-freebsd.  There were no
> regressions.  In reading through the patch, nothing
> jumped out at me as suspicious/wrong.  Tobias, this
> is OK to commit.  If you don't commit it by Sunday,
> I'll do it for you.
>
> --
> steve


Re: [gomp-nvptx 4/9] nvptx backend: add -mgomp option and multilib

2015-12-02 Thread Alexander Monakov
On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> I thought the MULTILIB* vars allow you to multilib on none of
> -msoft-stack/-muniform-simt and both -msoft-stack/-muniform-simt, without
> building other variants, so you wouldn't need this.

The nice effect of having -mgomp is better factorization: if I need to change
what OpenMP needs, e.g. for going with your suggestion below and dropping
-msoft-stack, I need to only change one line.  Otherwise I'd have to change
mkoffload too.

> Furthermore, as I said, I believe for e.g. most of newlib libc / libm
> I think it is enough if they are built as -muniform-simt -mno-soft-stack,
> if those functions are leaf or don't call user routines that could have
> #pragma omp parallel.  -msoft-stack would unnecessarily slow the routines
> down.

Not obviously so.  Outside of SIMD regions, running on hard stacks pointlessly
amplifies cache/memory traffic for stack references, so there would have to be
some evaluation before deciding.

> So perhaps just multilib on -muniform-simt, and document that -muniform-simt
> built code requires also that the soft-stack var is set up and thus
> -msoft-stack can be used when needed?

It's an interesting point, but I have doubts.  Is that something you'd want me
to address short-term?

> Can you post sample code with assembly for -msoft-stack and -muniform-simt
> showing how are short interesting cases expanded?
> Is there really no way even in direct PTX assembly to have .local file scope
> vars (rather than the global arrays indexed by %tid)?

Allow me to post samples a bit later; as for .local, the PTX documentation
explicitly states it must not be done:

5.1.5. Local State Space
[...]
When compiling to use the Application Binary Interface (ABI), .local
state-space variables must be declared within function scope and are
allocated on the stack. In implementations that do not support a stack,
all local memory variables are stored at fixed addresses, recursive
function calls are not supported, and .local variables may be declared at
module scope. When compiling legacy PTX code (ISA versions prior to 3.0)
containing module-scoped .local variables, the compiler silently disables
use of the ABI.

(while I'm unsure how exactly "compiling to use the ABI" is defined,
I'm assuming that's what we want in GCC, and otherwise linking may not work)

Thanks.
Alexander


Re: [ping] pending patches

2015-12-02 Thread Jason Merrill

On 12/02/2015 02:58 AM, Eric Botcazou wrote:

DWARF-2 (debug info for Scalar_Storage_Order attribute):
   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01659.html


It doesn't look to me like DW_AT_endianity is applicable to array types 
or members in DWARF 3/4; instead, it should be applied to the underlying 
base type.



C++ (PR 68290: internal error with concepts):
   https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03301.html


OK, thanks.

Jason



Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 09:14:03AM -0500, Nathan Sidwell wrote:
> On 12/02/15 08:46, Jakub Jelinek wrote:
> 
> >Or does the OpenACC execution model not allow anything like that, i.e.
> >have some function with an automatic variable pass the address of that
> >variable to some other function and that other function use #acc loop kind
> >that expects the caller to be at the worker level and splits the work among
> >the threads in the warp, on the array section pointed by that passed in
> >pointer?  See the OpenMP testcase I've posted in this thread.
> 
> There are two cases to consider
> 
> 1) the caller (& address taker) is already partitioned.  Thus the callers'
> frames are already copied.  The caller takes the address of the object in
> its own frame.
> 
> An example would be calling say __mulcd3 where the return value location is
> passed by pointer.
> 
> 2) the caller is not partitioned and calls a function containing a
> partitioned loop.  The caller takes the address of its instance of the
> variable.  As part of the RTL expansion we have to convert addresses (to be
> stored in registers) to the generic address space.  That conversion creates
> a pointer that may be used by any thread (on the same CTA)[*].  The function
> call is  executed by all threads (they're partially un-neutered before the
> call).  In the partitioned loop, each thread ends up accessing the location
> in the frame of the original calling active thread.
> 
> [*]  although .local is private to each thread, it's placed in memory that
> is reachable from anywhere, provided a generic address is used.  Essentially
> it's like TLS and genericization is simply adding the thread pointer to the
> local memory offset to create a generic address.

I believe Alex' testing revealed that if you take address of the same .local
objects in several threads, the addresses are the same, and therefore you
refer to your own .local space rather than the other thread's.  Which is why
the -msoft-stack stuff has been added.
Perhaps we need to use it everywhere, at least for OpenMP, and do it
selectively, non-addressable vars can stay .local, addressable vars proven
not to escape to other threads (or other functions that could access them
from other threads) would go to soft stack.

Jakub


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell

On 12/02/15 09:22, Jakub Jelinek wrote:


I believe Alex' testing revealed that if you take address of the same .local
objects in several threads, the addresses are the same, and therefore you
refer to your own .local space rather than the other thread's.


Before or after applying cvta?

nathan



Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 09:23:11AM -0500, Nathan Sidwell wrote:
> On 12/02/15 09:22, Jakub Jelinek wrote:
> 
> >I believe Alex' testing revealed that if you take address of the same .local
> >objects in several threads, the addresses are the same, and therefore you
> >refer to your own .local space rather than the other thread's.
> 
> Before or after applying cvta?

I'll let Alex answer that.

Jakub


Re: [gomp-nvptx 9/9] adjust SIMD loop lowering for SIMT targets

2015-12-02 Thread Alexander Monakov
On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> Just wanted to see -fdump-tree-ompexp dump say from the testcase I've
> posted.  Does your patchset have any dependencies that aren't on the trunk?
> If not, I guess I just could apply the patchset and look at the results, but
> if there are, it would need applying more.

Hm, the testcase has a reduction, which would cause the loop to have a _SIMDUID
clause, which would in turn make my patch give up, setting do_simt_transform
to false.  So I'm using presence of SIMDUID to see whether the loop has any
reduction/lastprivate data, which I'm not handling for SIMT yet.

(I should really start a branch)

Alexander


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Alexander Monakov


On Wed, 2 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 02, 2015 at 09:23:11AM -0500, Nathan Sidwell wrote:
> > On 12/02/15 09:22, Jakub Jelinek wrote:
> > 
> > >I believe Alex' testing revealed that if you take address of the same 
> > >.local
> > >objects in several threads, the addresses are the same, and therefore you
> > >refer to your own .local space rather than the other thread's.
> > 
> > Before or after applying cvta?
> 
> I'll let Alex answer that.

Both before and after, see this email:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02081.html

Alexander


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell

On 12/02/15 09:24, Jakub Jelinek wrote:

On Wed, Dec 02, 2015 at 09:23:11AM -0500, Nathan Sidwell wrote:

On 12/02/15 09:22, Jakub Jelinek wrote:


I believe Alex' testing revealed that if you take address of the same .local
objects in several threads, the addresses are the same, and therefore you
refer to your own .local space rather than the other thread's.


Before or after applying cvta?


I'll let Alex answer that.


Nevermind, I've run an experiment, and it appears that local addresses converted 
to generic do give the same value regardless of executing thread.  I guess that 
means that genericization of local addresses to physical memory is done late at 
the load/store insn, rather than in the cvta insn.


When I added routine support, I did wonder whether the calling routine would 
need to clone its stack frame, but decided against it using the logic I wrote 
earlier.


nathan



Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Alexander Monakov
On Wed, 2 Dec 2015, Nathan Sidwell wrote:

> On 12/02/15 05:40, Jakub Jelinek wrote:
> > Don't know the HW good enough, is there any power consumption, heat etc.
> > difference between the two approaches?  I mean does the HW consume different
> > amount of power if only one thread in a warp executes code and the other
> > threads in the same warp just jump around it, vs. having all threads busy?
> 
> Having all threads busy will increase power consumption. >

Is that from general principles (i.e. "if it doesn't increase power
consumption, the GPU is poorly optimized"), or is that based on specific
knowledge on how existing GPUs operate (presumably reverse-engineered or
privately communicated -- I've never seen any public statements on this
point)?

The only certain case I imagine is instructions that go to SFU rather than
normal SPs -- but those are relatively rare.

> It's also bad if the other vectors are executing memory access instructions.

How so?  The memory accesses are the same independent of whether you're reading
the same data from 1 thread or 32 synchronous threads.

Alexander


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell

On 12/02/15 09:41, Alexander Monakov wrote:

On Wed, 2 Dec 2015, Nathan Sidwell wrote:


On 12/02/15 05:40, Jakub Jelinek wrote:

Don't know the HW good enough, is there any power consumption, heat etc.
difference between the two approaches?  I mean does the HW consume different
amount of power if only one thread in a warp executes code and the other
threads in the same warp just jump around it, vs. having all threads busy?


Having all threads busy will increase power consumption. >


Is that from general principles (i.e. "if it doesn't increase power
consumption, the GPU is poorly optimized"), or is that based on specific
knowledge on how existing GPUs operate (presumably reverse-engineered or
privately communicated -- I've never seen any public statements on this
point)?


Nvidia told me.


The only certain case I imagine is instructions that go to SFU rather than
normal SPs -- but those are relatively rare.


It's also bad if the other vectors are executing memory access instructions.


How so?  The memory accesses are the same independent of whether you reading
the same data from 1 thread or 32 synchronous threads.


Nvidia told me.



Re: [PING^2][PATCH] Improve C++ loop's backward-jump location

2015-12-02 Thread Andreas Arnez
On Wed, Dec 02 2015, Alan Lawrence wrote:

[...]

> Since this, we've been seeing these tests fail natively on AArch64 and ARM:
>
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  gcov: 3 failures in line
> counts, 0 in branch percentages, 0 in return percentages, 0 in
> intermediate format
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 115: is 27:should be 14
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 58: is 18:should be 9
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++11  line 73: is 162:should be 81
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  gcov: 3 failures in line
> counts, 0 in branch percentages, 0 in return percentages, 0 in
> intermediate format
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 115: is 27:should be 14
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 58: is 18:should be 9
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++14  line 73: is 162:should be 81
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  gcov: 3 failures in line
> counts, 0 in branch percentages, 0 in return percentages, 0 in
> intermediate format
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 115: is 27:should be 14
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 58: is 18:should be 9
> FAIL: g++.dg/gcov/gcov-1.C  -std=gnu++98  line 73: is 162:should be 81

Right, sorry about that.  See also:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68603

This should be fixed now.  Let me know if you still see problems.

--
Andreas



Re: [PATCH] Avoid false vector mask conversion

2015-12-02 Thread Richard Biener
On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  wrote:
> Hi,
>
> When we use LTO for fortran we may have a mix 32bit and 1bit scalar booleans. 
> It means we may have conversion of one scalar type to another which confuses 
> vectorizer because values with different scalar boolean type may get the same 
> vectype.  This patch transforms such conversions into comparison.
>
> I managed to make a small fortran test which gets vectorized with this patch 
> but I didn't find how I can run fortran test with LTO and then scan tree dump 
> to check it is vectorized.  BTW here is a loop from the test:
>
>   real*8 a(18)
>   logical b(18)
>   integer i
>
>   do i=1,18
>  if(a(i).gt.0.d0) then
> b(i)=.true.
>  else
> b(i)=.false.
>  endif
>   enddo
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-12  Ilya Enkovich  
>
> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
> Transform useless boolean conversion into assignment.
>
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index b9d900c..62070da 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec 
> *stmts, tree *type_in,
>if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>  return NULL;
>
> +  /* Check conversion between boolean types of different sizes.
> + If no vectype is specified, then we have a regular mask
> + assignment with no actual conversion.  */
> +  if (rhs_code == CONVERT_EXPR

CONVERT_EXPR_CODE_P (rhs_code)

> +  && !STMT_VINFO_DATA_REF (stmt_vinfo)
> +  && !STMT_VINFO_VECTYPE (stmt_vinfo))
> +{
> +  if (TREE_CODE (rhs1) != SSA_NAME)
> +   return NULL;
> +
> +  rhs1_type = search_type_for_mask (rhs1, vinfo);
> +  if (!rhs1_type)
> +   return NULL;
> +
> +  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
> +
> +  if (!vectype1)
> +   return NULL;
> +
> +  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +  pattern_stmt = gimple_build_assign (lhs, rhs1);

So what's the actual issue here?  That the conversion is spurious?
Why can't you accept this simply in vectorizable_assignment then?

Richard.

> +  *type_out = vectype1;
> +  *type_in = vectype1;
> +  stmts->safe_push (last_stmt);
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location,
> + "vect_recog_mask_conversion_pattern: detected:\n");
> +
> +  return pattern_stmt;
> +}
> +
>if (rhs_code != BIT_IOR_EXPR
>&& rhs_code != BIT_XOR_EXPR
>&& rhs_code != BIT_AND_EXPR)


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Alexander Monakov
On Wed, 2 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> > On 12/02/15 05:40, Jakub Jelinek wrote:
> > > Don't know the HW good enough, is there any power consumption, heat etc.
> > >difference between the two approaches?  I mean does the HW consume 
> > >different
> > >amount of power if only one thread in a warp executes code and the other
> > >threads in the same warp just jump around it, vs. having all threads busy?
> > 
> > Having all threads busy will increase power consumption.  It's also bad if
> > the other vectors are executing memory access instructions.  However, for
> 
> Then the uniform SIMT approach might not be that good idea.

Why?  Remember that the tradeoff is copying registers (and in OpenACC, stacks
too).  We don't know how the costs balance.  My intuition is that copying is
worse compared to what I'm doing.

Anyhow, for good performance the offloaded code needs to be running in vector
regions most of the time, where the concern doesn't apply.

Alexander


[patch] libstdc++/56383 Fix ambiguity with multiple enable_shared_from_this bases

2015-12-02 Thread Jonathan Wakely

The friend function defined in enable_shared_from_this did not match
the declaration at namespace scope, so instead of defining the
previously declared function it added a new overload, which was always
a better match than the declared (but never defined) one.

That worked fine for a single base class, the friend function got
found by ADL, but with two enable_shared_from_this base classes the
two friend overloads were ambiguous.

This changes the friend to match the declaration, so that it can only
be called when an unambiguous enable_shared_from_this base can be
deduced, and so fails silently (as it is supposed to) when there is
not an unambiguous base class.
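
For illustration, the kind of hierarchy affected (a minimal sketch; the
actual test added is testsuite/20_util/enable_shared_from_this/56383.cc,
and the class names here are made up):

  #include <memory>

  struct A : std::enable_shared_from_this<A> { };
  struct B : std::enable_shared_from_this<B> { };
  struct C : A, B { };   // two enable_shared_from_this bases

  int main()
  {
    // Previously this failed to compile: ADL found one friend
    // definition of __enable_shared_from_this_helper per base, and the
    // two overloads were ambiguous.  With the fix it compiles, and the
    // enabling helper is simply not selected because no unambiguous
    // enable_shared_from_this base can be deduced.
    std::shared_ptr<C> p(new C);
  }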

Tested powerpc64le-linux, committed to trunk.

This fix is simple enough that I'm going to backport it after 5.3 is
released.

commit 1fa7fd2d7699c41f147b42ae96f70bf7b9e8e2d6
Author: Jonathan Wakely 
Date:   Wed Dec 2 14:43:34 2015 +

Fix ambiguity with multiple enable_shared_from_this bases

	PR libstdc++/56383
	* testsuite/20_util/enable_shared_from_this/56383.cc: New.
	* include/bits/shared_ptr_base.h (__enable_shared_from_this): Make
	friend declaration match previous declaration of
	__enable_shared_from_this_helper.
	* include/bits/shared_ptr.h (enable_shared_from_this): Likewise.

diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h
index 2413b1b..26a0ad3 100644
--- a/libstdc++-v3/include/bits/shared_ptr.h
+++ b/libstdc++-v3/include/bits/shared_ptr.h
@@ -582,19 +582,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_M_weak_assign(_Tp1* __p, const __shared_count<>& __n) const noexcept
 	{ _M_weak_this._M_assign(__p, __n); }
 
-  template
+  template
 	friend void
-	__enable_shared_from_this_helper(const __shared_count<>& __pn,
-	 const enable_shared_from_this* __pe,
-	 const _Tp1* __px) noexcept
-	{
-	  if (__pe != nullptr)
-	__pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
-	}
+	__enable_shared_from_this_helper(const __shared_count<>&,
+	 const enable_shared_from_this<_Tp1>*,
+	 const _Tp2*) noexcept;
 
   mutable weak_ptr<_Tp>  _M_weak_this;
 };
 
+  template
+inline void
+__enable_shared_from_this_helper(const __shared_count<>& __pn,
+ const enable_shared_from_this<_Tp1>*
+ __pe, const _Tp2* __px) noexcept
+{
+  if (__pe != nullptr)
+	__pe->_M_weak_assign(const_cast<_Tp2*>(__px), __pn);
+}
+
   /**
*  @brief  Create an object that is owned by a shared_ptr.
*  @param  __a An allocator.
diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
index 1a96b4c..f4f98e6 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -1546,19 +1546,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	_M_weak_assign(_Tp1* __p, const __shared_count<_Lp>& __n) const noexcept
 	{ _M_weak_this._M_assign(__p, __n); }
 
-  template
+  template<_Lock_policy _Lp1, typename _Tp1, typename _Tp2>
 	friend void
-	__enable_shared_from_this_helper(const __shared_count<_Lp>& __pn,
-	 const __enable_shared_from_this* __pe,
-	 const _Tp1* __px) noexcept
-	{
-	  if (__pe != nullptr)
-	__pe->_M_weak_assign(const_cast<_Tp1*>(__px), __pn);
-	}
+	__enable_shared_from_this_helper(const __shared_count<_Lp1>&,
+	 const __enable_shared_from_this<_Tp1,
+	 _Lp1>*, const _Tp2*) noexcept;
 
   mutable __weak_ptr<_Tp, _Lp>  _M_weak_this;
 };
 
+  template<_Lock_policy _Lp1, typename _Tp1, typename _Tp2>
+inline void
+__enable_shared_from_this_helper(const __shared_count<_Lp1>& __pn,
+ const __enable_shared_from_this<_Tp1,
+ _Lp1>* __pe,
+ const _Tp2* __px) noexcept
+{
+  if (__pe != nullptr)
+	__pe->_M_weak_assign(const_cast<_Tp2*>(__px), __pn);
+}
 
   template
 inline __shared_ptr<_Tp, _Lp>
diff --git a/libstdc++-v3/testsuite/20_util/enable_shared_from_this/56383.cc b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/56383.cc
new file mode 100644
index 000..ea0f28d
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/enable_shared_from_this/56383.cc
@@ -0,0 +1,56 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11"

[PTX] simplify movs

2015-12-02 Thread Nathan Sidwell
The PTX md file goes to a lot of effort handling SC and DC movs, including
unspecs to move low and high parts around.  However, these code paths are not
exercised in any gcc test or the build of newlib.  The generic handling of
these movs deals with type punning (using the stack frame, if needed).  There
doesn't appear to be a need for a separate punbuffer.


Thus this patch deletes a lot of that machinery.

nathan
2015-12-02  Nathan Sidwell  

	* config/nvptx/nvptx-protos.h (nvptx_output_mov_insn): Declare.
	(nvptx_underlying_object_mode): Delete.
	* config/nvptx/nvptx.c (nvptx_underlying_object_mode): Delete.
	(output_reg): New.
	(nvptx_declare_function_name): Use output_reg.  Remove punning
	buffer.
	(nvptx_output_mov_insn): New.
	(nvptx_print_operand): Separate SUBREG handling, remove 'f' case,
	Use output_reg. Merge 't' and 'u' handling.
	* config/nvptx/nvptx.h (NVPTX_PUNNING_BUFFER_REGNUM): Delete.
	(struct machine_function): Remove punning_buffer_size.
	(REGISTER_NAMES): Remove %punbuffer.
	* config/nvptx/nvptx.md (UNSPEC_CPLX_LOWPART,
	UNSPEC_CPLX_HIGHPART): Delete.
	(*mov_insn [QHSDIM): Remove unnecessary constraints, use
	nvptx_output_mov_insn.
	(*mov_insn [SDFM): Reorder constraints to match integer
	mov.  Use nvptx_output_mov_insn.
	(highpartscsf2,  set_highpartscsf2, lowpartscsf2,
	set_lowpartscsf2): Delete.
	(mov [SDCM]): Delete.

Index: config/nvptx/nvptx-protos.h
===
--- config/nvptx/nvptx-protos.h	(revision 231177)
+++ config/nvptx/nvptx-protos.h	(working copy)
@@ -38,9 +38,9 @@ extern void nvptx_expand_oacc_join (unsi
 extern void nvptx_expand_call (rtx, rtx);
 extern rtx nvptx_expand_compare (rtx);
 extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
+extern const char *nvptx_output_mov_insn (rtx, rtx);
 extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
 extern const char *nvptx_output_return (void);
-extern machine_mode nvptx_underlying_object_mode (rtx);
 extern const char *nvptx_section_from_addr_space (addr_space_t);
 extern bool nvptx_hard_regno_mode_ok (int, machine_mode);
 extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231177)
+++ config/nvptx/nvptx.c	(working copy)
@@ -155,23 +155,6 @@ nvptx_option_override (void)
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 }
 
-/* Return the mode to be used when declaring a ptx object for OBJ.
-   For objects with subparts such as complex modes this is the mode
-   of the subpart.  */
-
-machine_mode
-nvptx_underlying_object_mode (rtx obj)
-{
-  if (GET_CODE (obj) == SUBREG)
-obj = SUBREG_REG (obj);
-  machine_mode mode = GET_MODE (obj);
-  if (mode == TImode)
-return DImode;
-  if (COMPLEX_MODE_P (mode))
-return GET_MODE_INNER (mode);
-  return mode;
-}
-
 /* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
deal with ptx ideosyncracies.  */
 
@@ -257,6 +240,37 @@ maybe_split_mode (machine_mode mode)
   return VOIDmode;
 }
 
+/* Output a register, subreg, or register pair (with optional
+   enclosing braces).  */
+
+static void
+output_reg (FILE *file, unsigned regno, machine_mode inner_mode,
+	int subreg_offset = -1)
+{
+  if (inner_mode == VOIDmode)
+{
+  if (HARD_REGISTER_NUM_P (regno))
+	fprintf (file, "%s", reg_names[regno]);
+  else
+	fprintf (file, "%%r%d", regno);
+}
+  else if (subreg_offset >= 0)
+{
+  output_reg (file, regno, VOIDmode);
+  fprintf (file, "$%d", subreg_offset);
+}
+  else
+{
+  if (subreg_offset == -1)
+	fprintf (file, "{");
+  output_reg (file, regno, inner_mode, GET_MODE_SIZE (inner_mode));
+  fprintf (file, ",");
+  output_reg (file, regno, inner_mode, 0);
+  if (subreg_offset == -1)
+	fprintf (file, "}");
+}
+}
+
 /* Emit forking instructions for MASK.  */
 
 static void
@@ -724,16 +738,12 @@ nvptx_declare_function_name (FILE *file,
 	{
 	  machine_mode mode = PSEUDO_REGNO_MODE (i);
 	  machine_mode split = maybe_split_mode (mode);
+
 	  if (split != VOIDmode)
-	{
-	  fprintf (file, "\t.reg%s %%r%d$%d;\n",
-		   nvptx_ptx_type_from_mode (split, true), i, 0);
-	  fprintf (file, "\t.reg%s %%r%d$%d;\n",
-		   nvptx_ptx_type_from_mode (split, true), i, 1);
-	}
-	  else
-	fprintf (file, "\t.reg%s %%r%d;\n",
-		 nvptx_ptx_type_from_mode (mode, true), i);
+	mode = split;
+	  fprintf (file, "\t.reg%s ", nvptx_ptx_type_from_mode (mode, true));
+	  output_reg (file, i, split, -2);
+	  fprintf (file, ";\n");
 	}
 }
 
@@ -754,15 +764,6 @@ nvptx_declare_function_name (FILE *file,
 	   BITS_PER_WORD);
 }
 
-  if (cfun->machine->punning_buffer_size > 0)
-{
-  fprintf (file, "\t.reg.u%d %%punbuffer;\n"
-	   "\t.local.align 8 .b8 %%punbuffer_ar[%d];\n",
-	   BITS_PER_WORD, cfun->machine->punning_buffer_size);
-  

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Jakub Jelinek
On Wed, Dec 02, 2015 at 05:54:51PM +0300, Alexander Monakov wrote:
> On Wed, 2 Dec 2015, Jakub Jelinek wrote:
> 
> > On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> > > On 12/02/15 05:40, Jakub Jelinek wrote:
> > > > Don't know the HW good enough, is there any power consumption, heat etc.
> > > >difference between the two approaches?  I mean does the HW consume 
> > > >different
> > > >amount of power if only one thread in a warp executes code and the other
> > > >threads in the same warp just jump around it, vs. having all threads 
> > > >busy?
> > > 
> > > Having all threads busy will increase power consumption.  It's also bad if
> > > the other vectors are executing memory access instructions.  However, for
> > 
> > Then the uniform SIMT approach might not be that good idea.
> 
> Why?  Remember that the tradeoff is copying registers (and in OpenACC, stacks
> too).  We don't know how the costs balance.  My intuition is that copying is
> worse compared to what I'm doing.
> 
> Anyhow, for good performance the offloaded code needs to be running in vector
> regions most of the time, where the concern doesn't apply.

But you never know if people actually use #pragma omp simd regions or not,
sometimes they will, sometimes they won't, and if the uniform SIMT increases
power consumption, it might not be desirable.

If we have a reasonable IPA pass to discover which addressable variables can
be shared by multiple threads and which can't, then we could use soft-stack
for those that can be shared by multiple PTX threads (different warps, or
same warp, different threads in it), then we shouldn't need to copy any
stack, just broadcast the scalar vars.

Jakub


[PATCH] Fix PR66051

2015-12-02 Thread Richard Biener

This fixes the vectorizer part of PR66051 (an x86 target part remains
for the testcase in the PR - PR68655).  The issue is again a
misplaced check for SLP detection:

  /* Check that the size of interleaved loads group is not
 greater than the SLP group size.  */
  unsigned ncopies
= vectorization_factor / TYPE_VECTOR_SUBPARTS (vectype);
  if (is_a  (vinfo)
  && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) == stmt
  && ((GROUP_SIZE (vinfo_for_stmt (stmt))
   - GROUP_GAP (vinfo_for_stmt (stmt)))
  > ncopies * group_size))
{
  if (dump_enabled_p ())
{
  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
   "Build SLP failed: the number "
   "of interleaved loads is greater than "
   "the SLP group size ");
  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
stmt, 0);
  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
}
  /* Fatal mismatch.  */
  matches[0] = false;
  return false;
}

I've relaxed this check multiple times, but that still doesn't make it necessary.
It also uses a vectorization factor estimate as the vectorization factor
is not yet determined.  A good side-effect of the patch is that we
can get rid of that estimate completely.

Tested on the x86_64 vectorization tests so far.

Bootstrap & regtest pending and I'll make sure SPEC CPU 2006 is
happy as well.

Thanks,
Richard.

2015-12-02  Richard Biener  

PR tree-optimization/66051
* tree-vect-slp.c (vect_build_slp_tree_1): Remove restriction
on load group size.  Do not pass in vectorization_factor.
(vect_transform_slp_perm_load): Do not require any permute support.
(vect_build_slp_tree): Do not pass in vectorization factor.
(vect_analyze_slp_instance): Do not compute vectorization
factor estimate.  Use vector size instead of vectorization factor
estimate to split store groups for BB vectorization.

* gcc.dg/vect/slp-42.c: New testcase.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 231167)
--- gcc/tree-vect-slp.c (working copy)
*** static bool
*** 430,437 
  vect_build_slp_tree_1 (vec_info *vinfo,
   vec stmts, unsigned int group_size,
   unsigned nops, unsigned int *max_nunits,
!  unsigned int vectorization_factor, bool *matches,
!  bool *two_operators)
  {
unsigned int i;
gimple *first_stmt = stmts[0], *stmt = stmts[0];
--- 430,436 
  vect_build_slp_tree_1 (vec_info *vinfo,
   vec stmts, unsigned int group_size,
   unsigned nops, unsigned int *max_nunits,
!  bool *matches, bool *two_operators)
  {
unsigned int i;
gimple *first_stmt = stmts[0], *stmt = stmts[0];
*** vect_build_slp_tree_1 (vec_info *vinfo,
*** 523,533 
  
/* In case of multiple types we need to detect the smallest type.  */
if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
! {
!   *max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
!   if (is_a  (vinfo))
! vectorization_factor = *max_nunits;
! }
  
if (gcall *call_stmt = dyn_cast  (stmt))
{
--- 522,528 
  
/* In case of multiple types we need to detect the smallest type.  */
if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
!   *max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
  
if (gcall *call_stmt = dyn_cast  (stmt))
{
*** vect_build_slp_tree_1 (vec_info *vinfo,
*** 700,730 
  else
{
  /* Load.  */
-   /* Check that the size of interleaved loads group is not
-  greater than the SLP group size.  */
- unsigned ncopies
-   = vectorization_factor / TYPE_VECTOR_SUBPARTS (vectype);
-   if (is_a  (vinfo)
- && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) == stmt
-   && ((GROUP_SIZE (vinfo_for_stmt (stmt))
-  - GROUP_GAP (vinfo_for_stmt (stmt)))
- > ncopies * group_size))
- {
-   if (dump_enabled_p ())
- {
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "Build SLP failed: the number "
-  "of interleaved loads is greater than "
-  "the SLP group s

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-02 Thread Nathan Sidwell

On 12/02/15 10:12, Jakub Jelinek wrote:


If we have a reasonable IPA pass to discover which addressable variables can
be shared by multiple threads and which can't, then we could use soft-stack
for those that can be shared by multiple PTX threads (different warps, or
same warp, different threads in it), then we shouldn't need to copy any
stack, just broadcast the scalar vars.


Note the current scalar (.reg) broadcasting uses the live register set, not
the subset that is actually read within the partitioned region.  That'd be a
relatively straightforward optimization, I think.


nathan


Re: [PATCH] Avoid false vector mask conversion

2015-12-02 Thread Ilya Enkovich
2015-12-02 17:52 GMT+03:00 Richard Biener :
> On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> When we use LTO for fortran we may have a mix 32bit and 1bit scalar 
>> booleans. It means we may have conversion of one scalar type to another 
>> which confuses vectorizer because values with different scalar boolean type 
>> may get the same vectype.  This patch transforms such conversions into 
>> comparison.
>>
>> I managed to make a small fortran test which gets vectorized with this patch 
>> but I didn't find how I can run fortran test with LTO and then scan tree 
>> dump to check it is vectorized.  BTW here is a loop from the test:
>>
>>   real*8 a(18)
>>   logical b(18)
>>   integer i
>>
>>   do i=1,18
>>  if(a(i).gt.0.d0) then
>> b(i)=.true.
>>  else
>> b(i)=.false.
>>  endif
>>   enddo
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-11-12  Ilya Enkovich  
>>
>> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
>> Transform useless boolean conversion into assignment.
>>
>>
>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>> index b9d900c..62070da 100644
>> --- a/gcc/tree-vect-patterns.c
>> +++ b/gcc/tree-vect-patterns.c
>> @@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec 
>> *stmts, tree *type_in,
>>if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>>  return NULL;
>>
>> +  /* Check conversion between boolean types of different sizes.
>> + If no vectype is specified, then we have a regular mask
>> + assignment with no actual conversion.  */
>> +  if (rhs_code == CONVERT_EXPR
>
> CONVERT_EXPR_CODE_P (rhs_code)
>
>> +  && !STMT_VINFO_DATA_REF (stmt_vinfo)
>> +  && !STMT_VINFO_VECTYPE (stmt_vinfo))
>> +{
>> +  if (TREE_CODE (rhs1) != SSA_NAME)
>> +   return NULL;
>> +
>> +  rhs1_type = search_type_for_mask (rhs1, vinfo);
>> +  if (!rhs1_type)
>> +   return NULL;
>> +
>> +  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
>> +
>> +  if (!vectype1)
>> +   return NULL;
>> +
>> +  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
>> +  pattern_stmt = gimple_build_assign (lhs, rhs1);
>
> So what's the actual issue here?  That the conversion is spurious?
> Why can't you accept this simply in vectorizable_assignment then?

The problem is that the conversion is supposed to be handled by
vectorizable_conversion, but it fails to do so because it is not actually a
conversion.  I suppose it could be handled in vectorizable_assignment, but I
chose this pattern because it's meant to handle mask conversion issues.
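
For illustration, a C-level analogue of what the pattern matches (a sketch
with made-up type names; the real transform is on GIMPLE statements whose
boolean types map to the same vector mask type):

  #include <stdint.h>

  typedef unsigned char bool1;    // stands in for the 1-bit boolean
  typedef int32_t       bool32;   // stands in for Fortran LOGICAL(4)

  void f (const double *a, bool32 *b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        bool1  t = a[i] > 0.0;    // mask produced by the comparison
        bool32 u = (bool32) t;    // boolean-to-boolean "conversion"
        b[i] = u;                 // the pattern rewrites the conversion
      }                           // into a plain mask assignment u = t
  }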

Thanks,
Ilya

>
> Richard.
>
>> +  *type_out = vectype1;
>> +  *type_in = vectype1;
>> +  stmts->safe_push (last_stmt);
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_NOTE, vect_location,
>> + "vect_recog_mask_conversion_pattern: detected:\n");
>> +
>> +  return pattern_stmt;
>> +}
>> +
>>if (rhs_code != BIT_IOR_EXPR
>>&& rhs_code != BIT_XOR_EXPR
>>&& rhs_code != BIT_AND_EXPR)


Re: [PATCH] Avoid false vector mask conversion

2015-12-02 Thread Richard Biener
On Wed, Dec 2, 2015 at 4:24 PM, Ilya Enkovich  wrote:
> 2015-12-02 17:52 GMT+03:00 Richard Biener :
>> On Thu, Nov 12, 2015 at 5:08 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> When we use LTO for fortran we may have a mix 32bit and 1bit scalar 
>>> booleans. It means we may have conversion of one scalar type to another 
>>> which confuses vectorizer because values with different scalar boolean type 
>>> may get the same vectype.  This patch transforms such conversions into 
>>> comparison.
>>>
>>> I managed to make a small fortran test which gets vectorized with this 
>>> patch but I didn't find how I can run fortran test with LTO and then scan 
>>> tree dump to check it is vectorized.  BTW here is a loop from the test:
>>>
>>>   real*8 a(18)
>>>   logical b(18)
>>>   integer i
>>>
>>>   do i=1,18
>>>  if(a(i).gt.0.d0) then
>>> b(i)=.true.
>>>  else
>>> b(i)=.false.
>>>  endif
>>>   enddo
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2015-11-12  Ilya Enkovich  
>>>
>>> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
>>> Transform useless boolean conversion into assignment.
>>>
>>>
>>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>>> index b9d900c..62070da 100644
>>> --- a/gcc/tree-vect-patterns.c
>>> +++ b/gcc/tree-vect-patterns.c
>>> @@ -3674,6 +3674,38 @@ vect_recog_mask_conversion_pattern (vec 
>>> *stmts, tree *type_in,
>>>if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE)
>>>  return NULL;
>>>
>>> +  /* Check conversion between boolean types of different sizes.
>>> + If no vectype is specified, then we have a regular mask
>>> + assignment with no actual conversion.  */
>>> +  if (rhs_code == CONVERT_EXPR
>>
>> CONVERT_EXPR_CODE_P (rhs_code)
>>
>>> +  && !STMT_VINFO_DATA_REF (stmt_vinfo)
>>> +  && !STMT_VINFO_VECTYPE (stmt_vinfo))
>>> +{
>>> +  if (TREE_CODE (rhs1) != SSA_NAME)
>>> +   return NULL;
>>> +
>>> +  rhs1_type = search_type_for_mask (rhs1, vinfo);
>>> +  if (!rhs1_type)
>>> +   return NULL;
>>> +
>>> +  vectype1 = get_mask_type_for_scalar_type (rhs1_type);
>>> +
>>> +  if (!vectype1)
>>> +   return NULL;
>>> +
>>> +  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
>>> +  pattern_stmt = gimple_build_assign (lhs, rhs1);
>>
>> So what's the actual issue here?  That the conversion is spurious?
>> Why can't you accept this simply in vectorizable_assignment then?
>
> The problem is that conversion is supposed to be handled by
> vectorizable_conversion,
> but it fails to because it is not actually a conversion. I suppose it
> may be handled
> in vectorizable_assignment but I chose this pattern because it's meant
> to handle mask
> conversion issues.

I think it's always better to avoid patterns if you can.

Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> +  *type_out = vectype1;
>>> +  *type_in = vectype1;
>>> +  stmts->safe_push (last_stmt);
>>> +  if (dump_enabled_p ())
>>> +   dump_printf_loc (MSG_NOTE, vect_location,
>>> + "vect_recog_mask_conversion_pattern: 
>>> detected:\n");
>>> +
>>> +  return pattern_stmt;
>>> +}
>>> +
>>>if (rhs_code != BIT_IOR_EXPR
>>>&& rhs_code != BIT_XOR_EXPR
>>>&& rhs_code != BIT_AND_EXPR)


Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Tom de Vries

On 30/11/15 20:30, Julian Brown wrote:

 libgomp/
 * oacc-parallel.c (GOACC_host_data): New function.
 * libgomp.map (GOACC_host_data): Add to GOACC_2.0.1.
 * testsuite/libgomp.oacc-c-c++-common/host_data-1.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-2.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-3.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-4.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-5.c: New test.
 * testsuite/libgomp.oacc-c-c++-common/host_data-6.c: New test.



Hi,

At r231169, I'm seeing these failures for a no-accelerator setup:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-2.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-4.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test

...

Thanks,
- Tom


Re: [PTX] simplify movs

2015-12-02 Thread Bernd Schmidt

On 12/02/2015 04:09 PM, Nathan Sidwell wrote:

The PTX md file goes to a lot of effort handling SC and DC movs,
including for unspecs to mov low and high parts around.  However, these
code paths are not exercised in any gcc test or the build of newlib.
The generic handling of these movs deals with type punning, (using the
stack frame, if needed).  There doesn't appear a need for a separate
punbuffer.

Thus this patch deletes a lot of that machinery.


Hmm, that was definitely necessary at one point. I wonder what changed?


Bernd



Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-02 Thread Jeff Law

On 12/02/2015 02:54 AM, Richard Biener wrote:

Deferring to cfg_cleanup works because if cfg_cleanup does anything, it sets
LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems that
the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
that valuable in the end.  Anyway...


Yeah, I have partially working patches lying around to "fix" CFG cleanup to
avoid this.  Of course in the case of new loops appearing that's not easily
possible.
And that may argue that it's largely inevitable if we collapse a 
conditional (and thus delete an edge).






There's some fallout which I'm still exploring.  For example, we have cases
where removal of the edge by DOM results in removal of a PHI argument in the
target, which in turn results in the PHI becoming a degenerate which we can
then propagate away.  I have a possible solution for this that I'm playing
with.

I suspect the right path is to continue down this path.


Yeah, the issue here is that DOM isn't tracking which edges are executable
to handle merge PHIs (or to avoid doing work in unreachable regions).

Right.


It should

be possible to make it do that much like I extended SCCVN to do this
(when doing the DOM walk see if any incoming edge is marked executable
and if not, mark all outgoing edges as not executable, if the block is
executable
at the time we process the last stmt determine if we can compute the edge
that ends up always executed and mark all others as not executable)
Essentially yes. I'm using the not-executable flag and bypassing things 
when it's discovered.
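
For illustration, that scheme amounts to roughly the following (a minimal
sketch with made-up CFG types, not GCC's actual basic_block/edge API):

  #include <vector>

  struct edge;
  struct basic_block { std::vector<edge *> preds, succs; };
  struct edge { basic_block *src, *dest; bool executable = true; };

  // Stub: the single outgoing edge known to be taken once the block's
  // final conditional folds to a constant, or nullptr if unknown.
  edge *known_taken_edge (basic_block *) { return nullptr; }

  void dom_visit (basic_block *bb)
  {
    bool reachable = bb->preds.empty ();   // entry block
    for (edge *e : bb->preds)
      reachable |= e->executable;

    if (!reachable)
      {
        // No executable incoming edge: mark all outgoing edges as not
        // executable so the whole unreachable region is skipped.
        for (edge *e : bb->succs)
          e->executable = false;
        return;
      }

    /* ... optimize the block's statements as usual ... */

    // If the last statement is known to take one edge, the others are
    // not executable; merge PHIs downstream can then ignore them.
    if (edge *taken = known_taken_edge (bb))
      for (edge *e : bb->succs)
        e->executable = (e == taken);
  }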


The most interesting side effect, and one I haven't fully analyzed yet,
is an unexpected jump thread -- which I've traced back to differences in
what the alias oracle is able to find when we walk unaliased vuses.
It makes no sense that it's unable to find the unaliased vuse
in the simplified CFG, but finds it when we don't remove the
unexecutable edge.  As I said, it makes no sense to me yet and I'm still
digging.


jeff


Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-02 Thread Richard Biener
On Wed, Dec 2, 2015 at 4:31 PM, Jeff Law  wrote:
> On 12/02/2015 02:54 AM, Richard Biener wrote:
>>>
>>> Deferring to cfg_cleanup works because if cfg_cleanup does anything, it
>>> sets
>>> LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems
>>> that
>>> the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
>>> that valuable in the end.  Anyway...
>>
>>
>> Yeah, I have partially working patches lying around to "fix" CFG cleanup
>> to
>> avoid this.  Of course in the case of new loops appearing that's not
>> easily
>> possible.
>
> And that may argue that it's largely inevitable if we collapse a conditional
> (and thus delete an edge).
>
>
>>
>>> There's some fallout which I'm still exploring.  For example, we have
>>> cases
>>> where removal of the edge by DOM results in removal of a PHI argument in
>>> the
>>> target, which in turn results in the PHI becoming a degenerate which we
>>> can
>>> then propagate away.  I have a possible solution for this that I'm
>>> playing
>>> with.
>>>
>>> I suspect the right path is to continue down this path.
>>
>>
>> Yeah, the issue here is that DOM isn't tracking which edges are executable
> to handle merge PHIs (or to avoid doing work in unreachable regions).
>
> Right.
>
>
> It should
>>
>> be possible to make it do that much like I extended SCCVN to do this
>> (when doing the DOM walk see if any incoming edge is marked executable
>> and if not, mark all outgoing edges as not executable, if the block is
>> executable
>> at the time we process the last stmt determine if we can compute the edge
>> that ends up always executed and mark all others as not executable)
>
> Essentially yes. I'm using the not-executable flag and bypassing things when
> it's discovered.
>
> The most interesting side effect, and one I haven't fully analyzed yet is an
> unexpected jump thread -- which I've traced back to differences in what the
> alias oracle is able to find when we walk unaliased vuses. Which makes
> totally no sense that it's unable to find the unaliased vuse in the
> simplified CFG, but finds it when we don't remove the unexecutable edge.  As
> I said, it makes no sense to me yet and I'm still digging.

The walking of PHI nodes is quite simplistic to avoid doing too much work so
an extra (not executable) edge may confuse it enough.  So this might be
"expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
trusted would be an option (also helping SCCVN).

Richard.

> jeff


RE: [mips] Rotate stack checking loop

2015-12-02 Thread Moore, Catherine


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Eric Botcazou
> Sent: Thursday, November 12, 2015 4:51 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [mips] Rotate stack checking loop
> 
> Hi,
> 
> this patch rotates the loop generated in the prologue to do stack checking
> when -fstack-check is specified, thereby saving one branch instruction.  It
> was initially implemented as a WHILE loop to match the generic
> implementation but can be turned into a DO-WHILE loop because the
> amount of stack to be checked is known at compile time (since it's the static
> part of the frame).
> 
> Unfortunately I don't have access to MIPS hardware any more so I only
> verified that the assembly code is as expected and can be assembled.
> OK for mainline?
> 
> 
> 2015-11-12  Eric Botcazou  
> 
>   * config/mips/mips.c (mips_emit_probe_stack_range): Adjust.
>   (mips_output_probe_stack_range): Rotate the loop and simplify.
> 
This is OK.
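
For illustration, the rotation is the classic while-to-do-while
transformation, sketched here at the C level (illustrative only, not the
actual emitted MIPS sequence; probe_stack stands in for the store that
touches each probed location):

  extern void probe_stack (char *);

  // Before: WHILE form, with an extra branch back to the loop test.
  void probe_while (char *sp, unsigned long size, unsigned long interval)
  {
    char *p = sp;
    while (p > sp - size)
      {
        p -= interval;
        probe_stack (p);
      }
  }

  // After: DO-WHILE form, with a single conditional branch at the
  // bottom.  Valid because 'size' -- the static part of the frame -- is
  // known at compile time to require at least one probe, so the body
  // always executes at least once.
  void probe_do_while (char *sp, unsigned long size, unsigned long interval)
  {
    char *p = sp;
    do
      {
        p -= interval;
        probe_stack (p);
      }
    while (p > sp - size);
  }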


Re: [PATCH, PING*4] Track indirect calls for call site information in debug info.

2015-12-02 Thread Pierre-Marie de Rodat

On 12/02/2015 02:57 PM, Jakub Jelinek wrote:

Ok, thanks.


Great, thank you! I’ve pushed the change.

--
Pierre-Marie de Rodat


Re: [OpenACC 0/7] host_data construct

2015-12-02 Thread Thomas Schwinge
Hi!

Cesar and Jim copied, for help with Fortran and generally testsuite
things.

On Mon, 30 Nov 2015 19:30:34 +, Julian Brown  
wrote:
> [patch]

First, thanks!

> Tests look OK (libgomp/gcc/g++/libstdc++), and the new ones pass.

I see a regression (ICE) in gfortran.dg/goacc/coarray.f95 (done: XFAILed,
and obsolete dg-excess-errors directives removed; compare to
gfortran.dg/goacc/coarray_2.f90), and I see new FAILs for non-offloading
execution of libgomp.oacc-c-c++-common/host_data-2.c,
libgomp.oacc-c-c++-common/host_data-4.c, and
libgomp.oacc-c-c++-common/host_data-5.c (done: see below); confirmed by a
number of reports on the  and
 mailing lists.  I can understand that you
didn't see the Fortran problem if not running Fortrant testing (but
why?), but it's strange that you didn't see the libgomp C/C++ FAILs.

A few patch review items, some of which I've already addressed (see
below).

> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -10279,6 +10279,8 @@ c_parser_omp_clause_name (c_parser *parser)
>   result = PRAGMA_OMP_CLAUSE_UNTIED;
> else if (!strcmp ("use_device_ptr", p))
>   result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR;
> +   else if (!strcmp ("use_device", p))
> + result = PRAGMA_OACC_CLAUSE_USE_DEVICE;

"use_device" sorts before "use_device_ptr".  (Done.)

> @@ -12940,6 +12951,10 @@ c_parser_oacc_all_clauses (c_parser *parser, 
> omp_clause_mask mask,
> clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
> c_name = "self";
> break;
> + case PRAGMA_OACC_CLAUSE_USE_DEVICE:
> +   clauses = c_parser_oacc_clause_use_device (parser, clauses);
> +   c_name = "use_device";
> +   break;
>   case PRAGMA_OACC_CLAUSE_SEQ:
> clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
>   clauses);

Sorting?  (Done.)

> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -29232,6 +29232,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
>   result = PRAGMA_OMP_CLAUSE_UNTIED;
> else if (!strcmp ("use_device_ptr", p))
>   result = PRAGMA_OMP_CLAUSE_USE_DEVICE_PTR;
> +   else if (!strcmp ("use_device", p))
> + result = PRAGMA_OACC_CLAUSE_USE_DEVICE;
> break;

Likewise.  (Done.)

> @@ -31598,6 +31600,11 @@ cp_parser_oacc_all_clauses (cp_parser *parser, 
> omp_clause_mask mask,
> clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
> c_name = "self";
> break;
> + case PRAGMA_OACC_CLAUSE_USE_DEVICE:
> +   clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_USE_DEVICE,
> + clauses);
> +   c_name = "use_device";
> +   break;
>   case PRAGMA_OACC_CLAUSE_SEQ:
> clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
>clauses, here);

Likewise.  (Done.)

> +#define OACC_HOST_DATA_CLAUSE_MASK   \
> +  ( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_USE_DEVICE) )
> +
> +/* OpenACC 2.0:
> +  # pragma acc host_data  new-line
> +  structured-block  */

Define OACC_HOST_DATA_CLAUSE_MASK after the "accepted syntax" comment.
(Done.)

There is no handling of OMP_CLAUSE_USE_DEVICE in
gcc/cp/pt.c:tsubst_omp_clauses.  (Done.)

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c

> @@ -6418,6 +6422,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
|if (!lang_GNU_Fortran ())
|  switch (code)
|{
|case OMP_TARGET:
>case OMP_TARGET_DATA:
>case OMP_TARGET_ENTER_DATA:
>case OMP_TARGET_EXIT_DATA:
> +  case OACC_HOST_DATA:
>   ctx->target_firstprivatize_array_bases = true;
>default:
>   break;

I understand it's not yet relevant/supported for OpenMP in Fortran, but
why is C/C++ vs. Fortran being handled differently here for OpenACC
host_data?

> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c

> +void
> +GOACC_host_data (int device, size_t mapnum,
> +  void **hostaddrs, size_t *sizes, unsigned short *kinds)
> +{
> +  bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
> +  struct target_mem_desc *tgt;
> +
> +#ifdef HAVE_INTTYPES_H
> +  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n",
> +   __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds);
> +#else
> +  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n",
> +   __FUNCTION__, (unsigned long) mapnum, hostaddrs, sizes, kinds);
> +#endif
> +
> +  goacc_lazy_initialize ();
> +
> +  struct goacc_thread *thr = goacc_thread ();
> +  struct gomp_device_descr *acc_dev = thr->dev;
> +
> +  /* Host fallback or 'do nothing'.  */
> +  if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> +  || host_fallback)
> +{
> +  tgt = gomp_map_vars (NULL, 0, NULL, NULL, NULL, NULL, true,
> +GOMP_MAP_VARS_OPENACC);
> +  tgt

Re: Add fuzzing coverage support

2015-12-02 Thread Dmitry Vyukov
ping

Number of bugs found with this coverage in kernel already crossed 40:
https://github.com/google/syzkaller/wiki/Found-Bugs




On Fri, Nov 27, 2015 at 3:30 PM, Dmitry Vyukov  wrote:
> +syzkaller group
>
> On Fri, Nov 27, 2015 at 3:28 PM, Dmitry Vyukov  wrote:
>> Hello,
>>
>> This patch adds support for coverage-guided fuzzing:
>> https://codereview.appspot.com/280140043
>>
>> Coverage-guided fuzzing is a powerful randomized testing technique
>> that uses coverage feedback to determine new interesting inputs to a
>> program. Some examples of the systems that use it are AFL
>> (http://lcamtuf.coredump.cx/afl/) and LibFuzzer
>> (http://llvm.org/docs/LibFuzzer.html). Compiler coverage
>> instrumentation allows to make fuzzing more efficient as compared to
>> binary instrumentation.
>>
>> Flag that enables coverage is named -fsanitize-coverage=trace-pc to
>> make it consistent with similar functionality in clang, which already
>> supports a set of fuzzing coverage modes:
>> http://clang.llvm.org/docs/SanitizerCoverage.html
>> -fsanitize-coverage=trace-pc is not yet supported in clang, but we
>> plan to add it.
>>
>> This particular coverage mode simply inserts function calls into every
>> basic block.
>> I've built syzkaller, a Linux system call fuzzer
>> (https://github.com/google/syzkaller), using this functionality. The
>> fuzzer has found 30+ previously unknown bugs in kernel
>> (https://github.com/google/syzkaller/wiki/Found-Bugs) in slightly more
>> than a month (while kernel was extensively fuzzed with trinity --
>> non-guided fuzzer). Quentin also built some kernel fuzzer on top of
>> it.
>>
>> Why not gcov. Typical fuzzing loop looks as follows: (1) reset
>> coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
>> typical coverage can be just a dozen of basic blocks (e.g. invalid
>> input). In such context gcov becomes prohibitively expensive as
>> reset/collect coverage steps depend on total number of basic
>> blocks/edges in program (in case of kernel it is about 2M). Cost of
>> this "tracing" mode depends on number of executed basic blocks/edges.
>> On top of that, kernel required per-thread coverage because there are
>> always background threads and unrelated processes that also produce
>> coverage. With inlined gcov instrumentation per-thread coverage is not
>> possible.
>> Inlined fast paths do not make lots of sense in fuzzing scenario,
>> because lots of code is executed just once in between resets. Also it
>> is not really possible to inline accesses to per-cpu and per-task data
>> structures for kernel.
>>
>> OK for trunk?
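
For reference, a minimal sketch of the runtime side of such a trace-pc
build (this assumes the instrumentation emits a call to
__sanitizer_cov_trace_pc in each basic block, as in the clang scheme
referenced above; the buffer layout and the fuzz_one driver are
illustrative, not part of the patch, and the runtime itself would be
compiled without the flag to avoid self-instrumentation):

  #include <stdint.h>
  #include <stdio.h>

  // Record the PC of every instrumented basic block that executes.  A
  // real fuzzer resets this buffer before each input and inspects it
  // afterwards to decide whether the input produced new coverage.
  static uintptr_t cov_pcs[1 << 16];
  static unsigned long cov_len;

  extern "C" void __sanitizer_cov_trace_pc (void)
  {
    if (cov_len < sizeof cov_pcs / sizeof cov_pcs[0])
      cov_pcs[cov_len++] = (uintptr_t) __builtin_return_address (0);
  }

  // The fuzzing loop described above: reset, execute, collect.
  void fuzz_one (void (*target) (const uint8_t *, size_t),
                 const uint8_t *data, size_t len)
  {
    cov_len = 0;                     // (1) reset coverage
    target (data, len);              // (2) execute a bit of code
    printf ("%lu PCs\n", cov_len);   // (3) collect coverage
  }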

