date:20151108

Re: [PATCH] PR fortran/68244 -- Check for NULL() in an array spec.

2015-11-08 Thread Paul Richard Thomas

Hi Steve,

That's OK for trunk.

Thanks for the patch.

Paul

On 7 November 2015 at 21:20, Steve Kargl wrote:
> NULL() can only appear in a few situations.  It cannot
> be part of an array spec.  See testcase for example.
> OK to commit?
>
> 2015-11-07  Steven G. Kargl  
>
> PR fortran/68224
> * array.c (match_array_element_spec): Check of invalid NULL().
> While here, fix nearby comments.
>
> 2015-11-07  Steven G. Kargl  
>
> PR fortran/68224
> * gfortran.dg/pr68224.f90: New test.
>
> --
> Steve



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-08 Thread Richard Biener

On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng" wrote:
>On Fri, Nov 6, 2015 at 9:24 PM, Richard Biener wrote:
>> On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng  wrote:
>>> Hi,
>>> PR52272 reported a performance regression in spec2006/410.bwaves once
>>> GCC is prevented from representing address of one memory object using
>>> address of another memory object.  Also as I commented in that PR, we
>>> have two possible fixes for this:
>>> 1) Improve how TMR.base is deduced, so that we can represent addr of
>>> mem obj using another one, while not breaking PR50955.
>>> 2) Add iv candidates with base object stripped.  In this way, we use
>>> the common base-stripped part to represent all address expressions, in
>>> the form of [base_1 + common], [base_2 + common], ..., [base_n + common].
>>>
>>> In terms of code generation, method 2) is at least as good as 1),
>>> actually better in my opinion.  The problem of 2) is we need to tell
>>> when iv candidates should be added for the common part and when
>>> shouldn't.  This issue can be generalized and described as: We know IVO
>>> tries to add candidates by deriving from iv uses.  One disadvantage is
>>> that candidates are derived from iv use independently.  It doesn't take
>>> common sub expression among different iv uses into consideration.  As a
>>> result, candidate for common sub expression is not added, while many
>>> useless candidates are added.
>>>
>>> As a matter of fact, candidate derived from iv use is useful only if
>>> it's common enough and could be shared among different uses.  A
>>> candidate is most likely useless if it's derived from a single use and
>>> could not be shared by others.  This patch works in this way by firstly
>>> recording all kinds candidates derived from iv uses, then adding
>>> candidates for common ones.
>>>
>>> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression
>>> for 400.perlbench and small regression for 401.bzip on x86_64, but I
>>> can confirm they are false alarms caused by align issues.
>>> For aarch64, fp cases are obviously improved for both spec2000 and
>>> spec2006.  Also the patch causes 2-3% regression for 459.GemsFDTD,
>>> which I think is another irrelevant issue caused by heuristic candidate
>>> selecting algorithm.  Unfortunately, I don't have fix to it currently.
>>>
>>> This patch may add more candidates in some cases, but generally
>>> candidates number is smaller because we don't need to add useless
>>> candidates now.  Statistic data shows there are quite fewer loops with
>>> more than 30 candidates when building spec2k6 on x86_64 using this
>>> patch.
>>>
>>> Bootstrap and test on x86_64.  I will re-test it against latest trunk
>>> on AArch64.  Is it OK?
>>
>> +inline bool
>> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
>> +  const iv_common_cand *ccand2)
>> +{
>> +  return ccand1->hash == ccand2->hash
>> +&& operand_equal_p (ccand1->base, ccand2->base, 0)
>> +&& operand_equal_p (ccand1->step, ccand2->step, 0)
>> +&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
>> + == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>>
>Hi Richard,
>Thanks for reviewing.
>
>> I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
>Because operand_equal_p doesn't check type precision for constant int
>nodes, and IVO needs to take precision into consideration.

Ok

>> and b) what kind of tree is base so that it is safe to inspect
>> TYPE_PRECISION unconditionally?
>Both SCEV and IVO work on expressions with type satisfying
>POINTER_TYPE_P or INTEGRAL_TYPE_P, so it's safe to access precision
>unconditionally?

Yes.  Patch is OK then.

Richard.
>
>>
>> +  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
>> +  if (*slot == NULL)
>> +{
>> +  *slot = XNEW (struct iv_common_cand);
>>
>> allocate from the IV obstack instead?  I see we do a lot of heap
>> allocations in IVOPTs, so we can improve that as followup as well.
>>
>Yes, small structures in IVO like iv, iv_use, iv_cand, iv_common_cand
>are better allocated in an obstack.  Actually I have already made
>that change to struct iv.  Others will follow too.
>
>Thanks,
>bin
>> We probably should empty the obstack after each processed loop.
>>
>> Thanks,
>> Richard.
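
As a rough sketch of the obstack idea discussed above (allocate the small
records from an obstack and empty it after each processed loop), the
following uses the GNU obstack API from glibc.  struct demo_cand and
demo_process_loops are hypothetical names for illustration; this is not the
IVOPTs code.

```c
#include <assert.h>
#include <obstack.h>
#include <stdlib.h>

/* The obstack macros expect these two names to be defined.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical small record, standing in for iv_common_cand.  */
struct demo_cand
{
  long base;
  long step;
};

/* Simulate N_LOOPS loops that each allocate PER_LOOP records from one
   obstack and release them together when the loop has been processed.
   Returns the total number of records allocated.  */
static long
demo_process_loops (int n_loops, int per_loop)
{
  struct obstack ob;
  long total = 0;

  assert (n_loops >= 0 && per_loop >= 0);
  obstack_init (&ob);
  for (int loop = 0; loop < n_loops; loop++)
    {
      struct demo_cand *first = NULL;
      for (int i = 0; i < per_loop; i++)
        {
          struct demo_cand *c = obstack_alloc (&ob, sizeof *c);
          c->base = loop;
          c->step = i;
          if (first == NULL)
            first = c;
          total++;
        }
      /* Freeing the first object also frees everything allocated
         after it, which empties the obstack for the next loop.  */
      if (first != NULL)
        obstack_free (&ob, first);
    }
  /* Passing NULL releases the obstack's remaining storage.  */
  obstack_free (&ob, NULL);
  return total;
}
```

The point of the design is that no per-record free is needed: one
obstack_free call per loop discards all records of that loop.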
>>
>>
>>> Thanks,
>>> bin
>>>
>>> 2015-11-03  Bin Cheng  
>>>
>>> PR tree-optimization/52272
>>> * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
>>> (struct iv_common_cand_hasher): New struct.
>>> (iv_common_cand_hasher::hash): New function.
>>> (iv_common_cand_hasher::equal): New function.
>>> (struct ivopts_data): New fields, iv_common_cand_tab and
>>> iv_common_cands.
>>> (tree_ssa_iv_optimize_init): Initialize above fields.
>>> (record_common_cand, common_cand_cmp): New functions.
>>> (add_iv_candidate_derived_from_uses): New function.
>>> (add_iv_candidate_for_use): R

Re: [PATCH] Add -fchecking

2015-11-08 Thread Richard Biener

On November 8, 2015 1:40:57 AM GMT+01:00, Jeff Law  wrote:
>On 11/07/2015 01:47 PM, Gerald Pfeifer wrote:
>> On Tue, 27 Oct 2015, Richard Biener wrote:
>>> This adds -fchecking as a way to enable internal consistency checks
>>> even in release builds (or disable checking with -fno-checking - up to
>>> a certain extent - with checking enabled).
>>
>> How (much) do we want to advertize this?
>I don't think much -- it's really a developer-centric option.

Might be worth mentioning on bugs.html as sometimes miscompiles can be detected 
by -fchecking.

Richard.

>Jeff

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-08 Thread Kugan


Thanks Richard for the comments.  Please find the attached patches which
now pass bootstrap with x86_64-none-linux-gnu, aarch64-linux-gnu and
ppc64-linux-gnu. Regression testing is ongoing. Please find the comments
for your questions/suggestions below.

> 
> I notice
> 
> diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
> index 82fd4a1..80fcf70 100644
> --- a/gcc/tree-ssanames.c
> +++ b/gcc/tree-ssanames.c
> @@ -207,7 +207,8 @@ set_range_info (tree name, enum value_range_type 
> range_type,
>unsigned int precision = TYPE_PRECISION (TREE_TYPE (name));
> 
>/* Allocate if not available.  */
> -  if (ri == NULL)
> +  if (ri == NULL
> +  || (precision != ri->get_min ().get_precision ()))
> 
> and I think you need to clear range info on promoted SSA vars in the
> promotion pass.

Done.

> 
> The basic "structure" thing still remains.  You walk over all uses and
> defs in all stmts
> in promote_all_stmts which ends up calling promote_ssa_if_not_promoted on all
> uses and defs which in turn promotes (the "def") and then fixes up all
> uses in all stmts.

Done.

> 
> Instead of this you should, in promote_all_stmts, walk over all uses doing
> what fixup_uses does and then walk over all defs, doing what promote_ssa does.
> 
> +case GIMPLE_NOP:
> +   {
> + if (SSA_NAME_VAR (def) == NULL)
> +   {
> + /* Promote def by fixing its type for anonymous def.  */
> + TREE_TYPE (def) = promoted_type;
> +   }
> + else
> +   {
> + /* Create a promoted copy of parameters.  */
> + bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> 
> I think the uninitialized vars are somewhat tricky and it would be best
> to create a new uninit anonymous SSA name for them.  You can
> have SSA_NAME_VAR != NULL and def _not_ being a parameter
> btw.

Done. I also had to make some changes in a couple of other places to
reflect this.
They are:
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -302,6 +302,7 @@ phi_rank (gimple *stmt)
 {
   tree arg = gimple_phi_arg_def (stmt, i);
   if (TREE_CODE (arg) == SSA_NAME
+ && SSA_NAME_VAR (arg)
  && !SSA_NAME_IS_DEFAULT_DEF (arg))
{
  gimple *def_stmt = SSA_NAME_DEF_STMT (arg);
@@ -434,7 +435,8 @@ get_rank (tree e)
   if (gimple_code (stmt) == GIMPLE_PHI)
return phi_rank (stmt);

-  if (!is_gimple_assign (stmt))
+  if (!is_gimple_assign (stmt)
+ && !gimple_nop_p (stmt))
return bb_rank[gimple_bb (stmt)->index];

and

--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -752,7 +752,8 @@ verify_use (basic_block bb, basic_block def_bb,
use_operand_p use_p,
   TREE_VISITED (ssa_name) = 1;

   if (gimple_nop_p (SSA_NAME_DEF_STMT (ssa_name))
-  && SSA_NAME_IS_DEFAULT_DEF (ssa_name))
+  && (SSA_NAME_IS_DEFAULT_DEF (ssa_name)
+ || SSA_NAME_VAR (ssa_name) == NULL))
 ; /* Default definitions have empty statements.  Nothing to do.  */
   else if (!def_bb)
 {

Does this look OK?

> 
> +/* Return true if it is safe to promote the defined SSA_NAME in the STMT
> +   itself.  */
> +static bool
> +safe_to_promote_def_p (gimple *stmt)
> +{
> +  enum tree_code code = gimple_assign_rhs_code (stmt);
> +  if (gimple_vuse (stmt) != NULL_TREE
> +  || gimple_vdef (stmt) != NULL_TREE
> +  || code == ARRAY_REF
> +  || code == LROTATE_EXPR
> +  || code == RROTATE_EXPR
> +  || code == VIEW_CONVERT_EXPR
> +  || code == BIT_FIELD_REF
> +  || code == REALPART_EXPR
> +  || code == IMAGPART_EXPR
> +  || code == REDUC_MAX_EXPR
> +  || code == REDUC_PLUS_EXPR
> +  || code == REDUC_MIN_EXPR)
> +return false;
> +  return true;
> 
> huh, I think this function has an odd name, maybe
> can_promote_operation ()?  Please
> use TREE_CODE_CLASS (code) == tcc_reference for all _REF trees.

Done.
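
For readers following along, the suggestion above (test the code class once
instead of listing every _REF code) can be sketched as below.  The enums and
the class table are a toy model with hypothetical DEMO_ names; they are not
GCC's tree codes or its tree_code_type table.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy code classes and codes, loosely modelled on the names in the
   thread.  The classification below is part of the toy model.  */
enum demo_code_class { DEMO_TCC_REFERENCE, DEMO_TCC_BINARY };

enum demo_code
{
  DEMO_ARRAY_REF,
  DEMO_BIT_FIELD_REF,
  DEMO_VIEW_CONVERT_EXPR,
  DEMO_LROTATE_EXPR,
  DEMO_PLUS_EXPR,
  DEMO_NUM_CODES
};

static const enum demo_code_class demo_class_of[DEMO_NUM_CODES] = {
  [DEMO_ARRAY_REF] = DEMO_TCC_REFERENCE,
  [DEMO_BIT_FIELD_REF] = DEMO_TCC_REFERENCE,
  [DEMO_VIEW_CONVERT_EXPR] = DEMO_TCC_REFERENCE,
  [DEMO_LROTATE_EXPR] = DEMO_TCC_BINARY,
  [DEMO_PLUS_EXPR] = DEMO_TCC_BINARY,
};

/* One class test covers every reference code; only the remaining
   special cases (here a rotate) are still listed individually.  */
static bool
demo_can_promote_operation (enum demo_code code)
{
  assert (code < DEMO_NUM_CODES);
  if (demo_class_of[code] == DEMO_TCC_REFERENCE)
    return false;
  return code != DEMO_LROTATE_EXPR;
}
```

A new reference code added to the table is then rejected automatically,
without touching the predicate.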

> 
> Note that as followup things like the rotates should be "expanded" like
> we'd do on RTL (open-coding the thing).  And we'd need a way to
> specify zero-/sign-extended loads.
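
One way to read "open-coding" a rotate on a promoted value is sketched below
for an 8-bit rotate carried out in a 32-bit register.  A native 32-bit
rotate would move bits through positions 8..31, so the narrow rotate is
built from shifts and a mask instead.  demo_rotl8_promoted is a hypothetical
helper, and the sketch assumes the input is already zero-extended.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate the low 8 bits of X left by N, computing in 32 bits.
   X is assumed to be zero-extended (bits 8..31 clear).  */
static uint32_t
demo_rotl8_promoted (uint32_t x, unsigned n)
{
  assert ((x & ~UINT32_C (0xff)) == 0);
  n &= 7;                       /* Rotation count modulo the 8-bit width.  */
  if (n == 0)
    return x;
  /* Shift both ways, then mask so the result stays zero-extended.  */
  return ((x << n) | (x >> (8 - n))) & UINT32_C (0xff);
}
```

For example, rotating 0x81 left by one gives 0x03, and rotating 0x12 left by
four gives 0x21, matching what an 8-bit rotate would produce.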
> 
> +/* Return true if it is safe to promote the use in the STMT.  */
> +static bool
> +safe_to_promote_use_p (gimple *stmt)
> +{
> +  enum tree_code code = gimple_assign_rhs_code (stmt);
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_vuse (stmt) != NULL_TREE
> +  || gimple_vdef (stmt) != NULL_TREE
> 
> I think the vuse/vdef check is bogus, you can have a use of 'i_3' in say
> _2 = a[i_3];
> 
When I remove this, I see errors in stmts like:

unsigned char
unsigned int
# .MEM_197 = VDEF <.MEM_187>
fs_9(D)->fde_encoding = _154;


> +  || code == VIEW_CONVERT_EXPR
> +  || code == LROTATE_EXPR
> +  || code == RROTATE_EXPR
> +  || code == CONSTRUCTOR
> +  || code == BIT_FIELD_REF
> +  || code == COMPLEX_EXPR
> +  || code == ASM_EXPR
> +  || VECTOR_TYPE_P (TREE_TYPE (lhs)))
> +return false;
> +  return true;
> 
> ASM_EXPR can never appear here.  I think PROMOTE_MODE

Re: [PATCH] Merge from gomp-4_5-branch to trunk

2015-11-08 Thread Thomas Schwinge

Hi!

On Thu, 5 Nov 2015 16:29:36 +0100, Jakub Jelinek  wrote:
> I've merged the current state of gomp-4_5-branch into trunk, after
> bootstrapping/regtesting it on x86_64-linux and i686-linux.

Merged trunk r229814 into gomp-4_0-branch in r229947:

commit 7a6eb6b7cf9b72cf68a72b29d3bdc33e89dae58b
Merge: f782e15 9561765
Author: tschwinge 
Date:   Sun Nov 8 11:11:22 2015 +

svn merge -r 229809:229814 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229947 
138bc75d-0d04-0410-961f-82ee72b054a4

> There are
> +FAIL: gfortran.dg/goacc/private-3.f95   -O  (test for excess errors)

[...]/gcc/testsuite/gfortran.dg/goacc/private-3.f95:19:0: Error: reduction 
variable 'k' is private in outer context

> +[more FAILs of the same kind]
> regressions, but I really don't know why OpenACC allows reductions against
> private variables, so either the testcases are wrong, or if OpenACC
> reduction can work against private vars (automatic vars inside of parallel
> too?), then [...]

These FAILs are not seen in the merged sources, and, as Nathan noted,
they also disappear when applying the OpenACC firstprivate support patch
on trunk.

> This is much smaller merge than the one from 3 weeks ago

Confirmed.  ;-)


Grüße
 Thomas



Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-08 Thread Andreas Schwab

This is causing a bootstrap comparison failure in gcc/go/gogo.o.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

RFC: Experimental use of Sphinx for GCC documentation

2015-11-08 Thread David Malcolm

I've been experimenting with using Sphinx [1] for GCC's documentation.

You can see an HTML sample of GCC docs built with Sphinx here:
https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/gcc.html
(it's a work-in-progress; i.e. there are bugs).

Compare with:
 https://gcc.gnu.org/onlinedocs/gcc/index.html


In particular, note how options get stable, clickable URLs:
https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/option-summary.html

as compared to:
https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html#Option-Summary


Example of an option URL, for "-ftree-loop-if-convert-stores" (also
showing syntax-highlighted example code):
https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/options-that-control-optimization.html#cmdoption-ftree-loop-if-convert-stores

as compared to:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-ftree-loop-if-convert-1054
 (where I had to use "View Source" to find that URL, and what's up with
that "-1054" wart? Note also that the number can change, making the URL
unstable)


Example of a stable URL for "What does -O2 do?":
https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/options-that-control-optimization.html#cmdoption-O2

...etc


Every HTML page also gets a "Show Source" link, showing the input
markup.


Sphinx is a modern, well-maintained documentation toolchain, implemented
in Python.  The input format is an easy minimal-markup "semantic"
format; the output is high-quality HTML, with PDF and texinfo supported
amongst other output formats.   It was created for documenting the
Python programming language, and is in use by many FLOSS projects for
their docs, including e.g. LLVM (http://llvm.org/docs/).  See also:
http://sphinx-doc.org/examples.html (though this list is far from
complete).   It's BSD-licensed.

We are currently using Sphinx for libgccjit:
  https://gcc.gnu.org/onlinedocs/jit/index.html
and, I believe, for Ada:
https://docs.adacore.com/gnat_ugn-docs/html/gnat_ugn/gnat_ugn.html [2]

I've also used it for generating slides for Cauldron presentations:
https://dmalcolm.fedorapeople.org/presentations/cauldron-2014/rtl/
https://dmalcolm.fedorapeople.org/presentations/cauldron-2014/jit/

and for gcc-python-plugin:
 https://gcc-python-plugin.readthedocs.org/en/latest/



I've written a tool called texi2rst which attempts to convert a .texi
based document to .rst ("restructured text", the input format for
Sphinx):
 https://github.com/davidmalcolm/texi2rst

This is what generated the examples above.

It doesn't *quite* do a direct .texi to .rst conversion yet: it can take
the XML output from texinfo's "makeinfo --xml", and generate either one
big .rst file, or a group of smaller .rst files.

My hope was that for every gcc/docs/foo.texi file we have, my tool would
be able to generate a gcc/docs/foo.rst (maybe retaining the name, to
allow for sane diff and hence sane patch review).

Unfortunately, "makeinfo --xml" resolves includes and conditional
processing, so the underlying input structure of .texinfo files is lost
at that point.

To fix that, I've been working on a frontend for texi2rst that
re-implements the .texi to xml processing, retaining information on
includes, and directives, so that I can translate them to
corresponding .rst directives.  Unfortunately it's clear that I'm not
going to finish that before stage 1 closes - but I think it's feasible
in the stage3 timeframe.

Hence in the example posted above, the doc is split into pages based on
nodes, named after the nodes, and thus get rather long names e.g.
options-that-control-optimization.html, generated from
options-that-control-optimization.rst.  In a more polished version,
these names would be saner.

The primary advantages of .rst/sphinx over .texi/texinfo I see are in
the generated HTML:

* sane, stable URLs (so e.g. there is a reliable URL for the docs for,
say, "-Wall").

* a page-splitting structure that makes sense, to me, at least [3]

* much more use of markup, with restrained and well-chosen CSS
(texinfo's HTML seems to ignore much of the inline markup in
the .texinfo file)

* autogenerated internal links, so that almost everything is clickable,
and will take you somewhere sane, by default

* syntax-highlighting of code examples, with support for multiple
programming languages (note the mixture of C, C++, Fortran, etc in the
docs for the gcc options).

* looks modern and fresh (IMHO), letting casual observers see that the
project is alive and kicking.


Thoughts?
Dave

[1] http://sphinx-doc.org/
[2] I couldn't find Sphinx-built HTML for Ada on the gcc website, just
the Texinfo output here:
https://gcc.gnu.org/onlinedocs/gnat_ugn/index.html
[3] I have never fathomed the way texinfo's navigation works, for HTML,
at least, and I believe I'm not the only one; I generally pick the
all-in-one-HTML-page option when viewing texinfo-html docs and do
textual searches, since otherwise I usually can't find the thing I'm
looking for (or have t

Re: [OpenACC] internal fn folding

2015-11-08 Thread Thomas Schwinge

Hi!

On Thu, 5 Nov 2015 10:48:02 -0500, Nathan Sidwell  wrote:
> On 11/04/15 05:02, Bernd Schmidt wrote:
> > On 11/02/2015 02:56 PM, Nathan Sidwell wrote:
> >> On 10/28/15 14:40, Nathan Sidwell wrote:
> >>> Richard,
> >>> this patch adds folding for the new GOACC_DIM_POS and GOACC_DIM_SIZE
> >>> internal
> >>> functions.  IIUC gimple_fold_call is the right place to add this.
> >>>
> >>> The size of a compute dimension is very often a compile-time
> >>> constant.  On the
> >>> host, in particular it's 1, which means we can deduce the POS must be
> >>> zero.

> This is what I committed, using the helpers I recently added. (I realized we
> can only get here for functions with the oacc attribute already set)

> --- gimple-fold.c (revision 229809)
> +++ gimple-fold.c (working copy)

> +/* Transform IFN_GOACC_DIM_SIZE and IFN_GOACC_DIM_POS internal
> +   function calls to constants, where possible.  */
> +
> +static tree
> +fold_internal_goacc_dim (const gimple *call)
> +{
> +  int axis = get_oacc_ifn_dim_arg (call);
> +  int size = get_oacc_fn_dim_size (current_function_decl, axis);
> +  bool is_pos = gimple_call_internal_fn (call) == IFN_GOACC_DIM_POS;
> +  tree result = NULL_TREE;
> +
> +  /* If the size is 1, or we only want the size and it is not dynamic,
> + we know the answer.  */
> +  if (size == 1 || (!is_pos && size))
> +{
> +  tree type = TREE_TYPE (gimple_call_lhs (call));
> +  result = build_int_cst (type, size - is_pos);
> +}
> +
> +  return result;
> +}

> @@ -3106,6 +3129,10 @@ gimple_fold_call (gimple_stmt_iterator *
> return true;
>   }
> break;
> + case IFN_GOACC_DIM_SIZE:
> + case IFN_GOACC_DIM_POS:
> +   result = fold_internal_goacc_dim (stmt);
> +   break;
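
The folding rule in the quoted function can be modelled in isolation as
follows.  This is only a sketch of the arithmetic: demo_fold_goacc_dim is a
hypothetical helper working on plain ints, with a size of 0 standing for a
dimension whose size is dynamic, as the quoted comment implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the folding decision.  SIZE is the dimension size, with 0
   meaning "not known at compile time".  IS_POS selects the position
   query rather than the size query.  On success the folded constant
   is stored in *RESULT and true is returned.  */
static bool
demo_fold_goacc_dim (int size, bool is_pos, int *result)
{
  assert (size >= 0 && result != NULL);
  /* A size of 1 answers both queries (size 1, position 0); any other
     known size answers only the size query.  */
  if (size == 1 || (!is_pos && size != 0))
    {
      *result = size - (is_pos ? 1 : 0);
      return true;
    }
  return false;
}
```

So a size query on a dimension of 32 folds to 32, both queries fold when the
size is 1, and a position query on a larger or dynamic dimension is left
alone.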

Merging this into gomp-4_0-branch, we'd run into a lot of regressions
(for OpenACC kernels construct), for example:

[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c: In 
function 'main._omp_fn.0':
[...]/source-gcc/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c:8:0: 
internal compiler error: Segmentation fault
0xac387f crash_signal
[...]/source-gcc/gcc/toplev.c:336
0x9b8399 tree_int_cst_elt_check
[...]/source-gcc/gcc/tree.h:3129
0x9b8399 get_oacc_fn_dim_size(tree_node*, int)
[...]/source-gcc/gcc/omp-low.c:12630
0x86c530 fold_internal_goacc_dim
[...]/source-gcc/gcc/gimple-fold.c:2917
0x86c530 gimple_fold_call
[...]/source-gcc/gcc/gimple-fold.c:3134
0x86dfe4 fold_stmt_1
[...]/source-gcc/gcc/gimple-fold.c:3702
0xbf9953 execute
[...]/source-gcc/gcc/tree-ssa-forwprop.c:2310

The dims in gcc/omp-low.c:get_oacc_fn_dim_size don't have values set, so
the "TREE_INT_CST_LOW (TREE_VALUE (dims))" fails.  I have not analyzed
what exactly is going wrong; I just figured out that it's related to the
IFN_GOACC_DIM_POS without LHS usage that Tom introduced in
gomp-4_0-branch r228735 for OpenACC kernels to "neuter gang-single code
in gang-redundant mode".

So, in r229948 I merged Nathan's trunk r229816 into gomp-4_0-branch with
an additional hack as indicated ("++" prefix) by the following three-way
diff:

commit b421a6415fc223866bc97f8248a1fbd0a524505e
Merge: 7a6eb6b b0ccb4e
Author: tschwinge 
Date:   Sun Nov 8 13:49:46 2015 +

svn merge -r 229814:229816 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229948 
138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/ChangeLog |  6 ++
 gcc/gimple-fold.c | 33 +
 2 files changed, 39 insertions(+)

diff --cc gcc/gimple-fold.c
index c9b9593,45840af..869c6c2
--- gcc/gimple-fold.c
+++ gcc/gimple-fold.c
@@@ -2906,6 -2907,28 +2907,34 @@@ gimple_fold_builtin (gimple_stmt_iterat
return false;
  }
  
+ /* Transform IFN_GOACC_DIM_SIZE and IFN_GOACC_DIM_POS internal
+function calls to constants, where possible.  */
+ 
+ static tree
+ fold_internal_goacc_dim (const gimple *call)
+ {
++  /* TODO.  There is something going wrong here, for the gang_single
++ IFN_GOACC_DIM_POS without LHS, generated in gcc/omp-low.c:lower_omp_target
++ for is_oacc_kernels (see gomp-4_0-branch r228735).  */
++  if (gimple_call_lhs (call) == NULL_TREE)
++return NULL_TREE;
++
+   int axis = get_oacc_ifn_dim_arg (call);
+   int size = get_oacc_fn_dim_size (current_function_decl, axis);
+   bool is_pos = gimple_call_internal_fn (call) == IFN_GOACC_DIM_POS;
+   tree result = NULL_TREE;
+ 
+   /* If the size is 1, or we only want the size and it is not dynamic,
+  we know the answer.  */
+   if (size == 1 || (!is_pos && size))
+ {
+   tree type = TREE_TYPE (gimple_call_lhs (call));
+   result = build_int_cst (type, size - is_pos);
+ }
+ 
+   return result;
+ }
+ 
  /* Return true if ARG0 CODE ARG1 in infinite s

Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-08 Thread Arnaud Charlet

We've switched the Ada doc to sphinx indeed, so can only be
in favor of this change for the rest of GCC.

We do have also a texi2rst script which handles 90% of the work, the
rest requiring manual adaptations. I can send the script we've used if
this can help.

Arno

Re: OpenACC declare directive updates

2015-11-08 Thread James Norris


Jakub,

The attached patch and ChangeLog reflect the updates from your
review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00714.html.
All of the issues pointed out have been addressed.

With the changes made in this patch I think I'm handling the
situation that you pointed out here correctly:

On Fri, Nov 06, 2015 at 01:45:09PM -0600, James Norris wrote:

Also, wonder about BLOCK stmt in Fortran, that can give you variables that
don't live through the whole function, but only a portion of it even in
Fortran.

OK to commit to trunk?

Thanks!
Jim

2015-XX-XX  James Norris  
Cesar Philippidis  

gcc/fortran/
* dump-parse-tree.c (show_namespace): Handle declares.
* f95-lang.c (gfc_attribute_table): New entry.
* gfortran.h (struct symbol_attribute): New fields.
(enum gfc_omp_map_map): Add OMP_MAP_DEVICE_RESIDENT and OMP_MAP_LINK.
(OMP_LIST_LINK): New enum.
(struct gfc_oacc_declare): New structure.
(gfc_get_oacc_declare): New definition.
(struct gfc_namespace): Change type.
(enum gfc_exec_op): Add EXEC_OACC_DECLARE.
(struct gfc_code): New field.
* module.c (enum ab_attribute): Add AB_OACC_DECLARE_CREATE,
AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR,
AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK.
(attr_bits): Add new initializers.
(mio_symbol_attribute): Handle new attributes.
* openmp.c (gfc_free_oacc_declare_clauses): New function.
(OMP_CLAUSE_LINK): New definition.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_LINK.
(OACC_DECLARE_CLAUSES): Add OMP_CLAUSE_LINK.
(gfc_match_oacc_declare): Add checking and module handling.
(gfc_resolve_oacc_declare): Use duplicate detection.
* parse.c (case_decl): Add ST_OACC_DECLARE.
(parse_spec): Remove handling.
(parse_progunit): Remove handling.
* parse.h (struct gfc_state_data): Change type.
* resolve.c (gfc_resolve_blocks): Handle EXEC_OACC_DECLARE.
* st.c (gfc_free_statement): Handle EXEC_OACC_DECLARE.
* symbol.c (check_conflict): Add conflict checks.
(gfc_add_oacc_declare_create, gfc_add_oacc_declare_copyin, 
gfc_add_oacc_declare_deviceptr, gfc_add_oacc_declare_device_resident):
New functions.
(gfc_copy_attr): Handle new symbols.
* trans-decl.c (add_attributes_to_decl): Create identifier.
(struct oacc_return): New structure.
(find_oacc_return, add_clause, find_module_oacc_declare_clauses,
finish_oacc_declare): New functions.
(gfc_generate_function_code): Replace with call.
* trans-openmp.c (gfc_trans_omp_clauses): Add conditional.
(gfc_trans_oacc_declare): Reimplement.
(gfc_trans_oacc_directive): Handle EXEC_OACC_DECLARE.
* trans-stmt.c (gfc_trans_block_construct): Replace with call.
* trans-stmt.h (gfc_trans_oacc_declare): Remove argument.
* trans.c (trans_code): Handle EXEC_OACC_DECLARE.

gcc/testsuite
* gfortran.dg/goacc/declare-1.f95: Update test.
* gfortran.dg/goacc/declare-2.f95: New test.

libgomp/
* testsuite/libgomp.oacc-fortran/declare-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-3.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-4.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise.
diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 83ecbaa..48476af 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -2570,12 +2570,16 @@ show_namespace (gfc_namespace *ns)
   for (eq = ns->equiv; eq; eq = eq->next)
 show_equiv (eq);
 
-  if (ns->oacc_declare_clauses)
+  if (ns->oacc_declare)
 {
+  struct gfc_oacc_declare *decl;
   /* Dump !$ACC DECLARE clauses.  */
-  show_indent ();
-  fprintf (dumpfile, "!$ACC DECLARE");
-  show_omp_clauses (ns->oacc_declare_clauses);
+  for (decl = ns->oacc_declare; decl; decl = decl->next)
+	{
+	  show_indent ();
+	  fprintf (dumpfile, "!$ACC DECLARE");
+	  show_omp_clauses (decl->clauses);
+	}
 }
 
   fputc ('\n', dumpfile);
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 2e91470..a8458b0 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -99,6 +99,8 @@ static const struct attribute_spec gfc_attribute_table[] =
affects_type_identity } */
   { "omp declare target", 0, 0, true,  false, false,
 gfc_handle_omp_declare_target_attribute, false },
+  { "oacc declare", 0, 0, true,  false, false,
+gfc_handle_omp_declare_target_attribute, false },
   { NULL,		  0, 0, false, false, false, NULL, false }
 };
 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index e13b4d4..3965b08 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -841,6 +841,13 @

Re: [OpenACC] declare directive

2015-11-08 Thread James Norris

Jakub,

The attached patch and ChangeLog reflect the updates from your
review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00703.html.

The most significant change was the one that you suggested:

On 11/06/2015 02:28 PM, Jakub Jelinek wrote:
> Perhaps what would work is stick the exit clauses you need for automatic
> vars in the function inside of some pointer map / hash table / whatever,
> and then in gimplify_bind_expr in the
>/* Add clobbers for all variables that go out of scope.  */
> if if flag_openacc && the pointer_map /hash table has any entries look
> up each variable in there and collect the clauses from those vars that go
> out of scope, after the loop if any were collected construct the statement
> you need prepend it to cleanup (so that it works before restoring VLA memory
> and before the clobber stmts).

This particular change allowed for the removal of all the
'stuff looks broken' code. Thanks for the suggestion.

The following change I did not address:

On 11/06/2015 01:03 PM, Jakub Jelinek wrote:
>> @@ -5841,6 +5863,8 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree decl,

>> flags |= GOVD_FIRSTPRIVATE;
>> break;
>>   case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
>> +  if (is_global_var (decl) && device_resident_p (decl))
>> +flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;
>
> I don't think you want to do this except for (selected or all?)
> OpenACC contexts.  Say, I don't see why one couldn't e.g. try to mix
> OpenMP host parallelization or tasking with OpenACC offloading,
> and that affecting in weird way OpenMP semantics.

A colleague is adding code to allow for the detection of OpenACC contexts.
This change has yet to make it to trunk. I need some guidance from you on
whether I can leave the code as is and resolve the issue at stage 3 time,
or remove the code and the associated function device_resident_p ()
and address the issue at stage 3.

OK to commit to trunk?

Thanks!
Jim

2015-XX-XX  James Norris  
Joseph Myers  

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add entry for declare directive. 
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE.
(enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and
PRAGMA_OACC_CLAUSE_LINK.

gcc/c/
* c-parser.c (c_parser_pragma): Handle PRAGMA_OACC_DECLARE.
(c_parser_omp_clause_name): Handle 'device_resident' clause.
(c_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OACC_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(c_parser_oacc_declare): New function.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle 'device_resident'
clause.
(cp_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(cp_parser_oacc_declare): New function.
(cp_parser_pragma): Handle PRAGMA_OACC_DECLARE.
* pt.c (tsubst_expr): Handle OACC_DECLARE.

gcc/
* gimple-pretty-print.c (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE. 
* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DECLARE.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* gimplify.c (gimplify_bind_expr): Prepend 'exit' stmt to cleanup.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): New builtin.
* omp-low.c (expand_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE and BUILTIN_GOACC_DECLARE.
(build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
(lower_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE,
GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK.
(make_gimple_omp_edges): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.

gcc/testsuite
* c-c++-common/goacc/declare-1.c: New test.
* c-c++-common/goacc/declare-2.c: Likewise.

include/
* gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT
and GOMP_MAP_LINK.

libgomp/

* libgomp.map (GOACC_2.0.1): Export GOACC_declare.
* oacc-parallel.c (GOACC_declare): New function.
* testsuite/libgomp.oacc-c-c++-common/declare-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/declare-5.c: Likewise.
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index ac11838..cd0cc27 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1207,6 +1207,7 @@ static const struct omp_pragma_def oacc_pragmas[] = {
   { "atomic", PRAGMA_OACC_ATOMIC },
   { "cache", PRAGMA_OACC_CACHE },
   { "data", PRAGMA_OACC_DATA },
+  { "declare", PRAGMA_OACC_DECLARE },
   { "enter", PRAGMA_

Re: [PATCH 10/9] ENABLE_CHECKING refactoring: remove remaining occurrences

2015-11-08 Thread Mikhail Maltsev

On 11/01/2015 11:34 PM, Bernhard Reutner-Fischer wrote:
> Mikhail,
> 
> On November 1, 2015 9:19:19 PM GMT+01:00, Mikhail Maltsev 
>  wrote:
>> This patch cleans up remaining bits related to ENABLE_CHECKING. After
>> applying
>> this patch (on top of part 9) we will no longer have any references to
>> ENABLE_CHECKING in the source code.
> 
> I don't remember if you sent size(1) comparison for the frontends and driver, 
> BTW. How bad is it?
> 
> TIA,
> 

Here are the results for r228786 "base" (bootstrapped with
--enable-checking=release) and for the same revision after applying
ENABLE_CHECKING-related patches "head":

                   text   data      bss       dec
filename rev
cc1      base  21442165  70624  1343960  22856749
         head  21489220  70624  1343992  22903836
cc1plus  base  22772252  70712  1369624  24212588
         head  22820770  70712  1369656  24261138
f951     base  22230769  80136  1349592  23660497
         head  22277096  80136  1349624  23706856
lto1     base  20611962  69792  1342008  22023762
         head  20658721  69792  1342040  22070553
gcc      base   1606938  28400    17056   1652394
         head   1608434  28400    17056   1653890

Relative difference:

              text  data       bss       dec
filename
cc1       0.002195     0  0.000024  0.002060
cc1plus   0.002131     0  0.000023  0.002005
f951      0.002084     0  0.000024  0.001959
lto1      0.002269     0  0.000024  0.002125
gcc       0.000931     0  0.000000  0.000905
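
For reference, the relative figures are simply (head - base) / base for each
size(1) column.  A minimal sketch of that arithmetic follows; the helper name
rel_diff is only an illustration and not part of any patch in this series:

```c
#include <assert.h>

/* Relative growth of one size(1) column: (head - base) / base.
   BASE must be nonzero.  The name is hypothetical, chosen for this sketch.  */
double
rel_diff (unsigned long base, unsigned long head)
{
  assert (base != 0);
  return ((double) head - (double) base) / (double) base;
}
```

With the cc1 text sizes above, rel_diff (21442165, 21489220) is roughly
0.0022, i.e. about 0.2% growth.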

-- 
Regards,
Mikhail Maltsev

[gomp4, committed] Simplify get_omp_data_i_param

2015-11-08 Thread Tom de Vries


Hi,

this patch simplifies get_omp_data_i_param by using ssa_default_def.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Simplify get_omp_data_i_param

2015-11-08  Tom de Vries  

	* tree-parloops.c: Include tree-dfa.h.
	(get_omp_data_i_param): Simplify using ssa_default_def.
---
 gcc/tree-parloops.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index b4039ad..1551eec 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params-enum.h"
 #include "tree-ssa-alias.h"
 #include "tree-eh.h"
+#include "tree-dfa.h"
 
 /* This pass tries to distribute iterations of loops into several threads.
The implementation is straightforward -- for each loop we test whether its
@@ -2607,15 +2608,7 @@ get_omp_data_i_param (void)
 {
   tree decl = DECL_ARGUMENTS (cfun->decl);
   gcc_assert (DECL_CHAIN (decl) == NULL_TREE);
-  for (unsigned int i = 0; i < num_ssa_names; ++i)
-{
-  tree name = ssa_name (i);
-  if (name != NULL_TREE
-	  && SSA_NAME_VAR (name) == decl)
-	return name;
-}
-
-  gcc_unreachable ();
+  return ssa_default_def (cfun, decl);
 }
 
 /* Try to initialize REDUCTION_LIST for code generation part.
-- 
1.9.1

[gomp4, committed] Insert IFN_GOACC_DIM_POS for oacc kernels call in parloops

2015-11-08 Thread Tom de Vries


Hi,

this patch postpones insertion of the IFN_GOACC_DIM_POS call until we 
actually need it, in parloops.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Insert IFN_GOACC_DIM_POS for oacc kernels call in parloops

2015-11-08  Tom de Vries  

	* omp-low.c (lower_omp_target): Remove insertion of IFN_GOACC_DIM_POS
	call in kernels region.
	* tree-parloops.c: Include gomp-constants.h.
	(oacc_entry_exit_single_gang): Remove gang_pos parameter.  Insert
	IFN_GOACC_DIM_POS call, if required.
	(oacc_entry_exit_ok): Remove gang_pos variable.
---
 gcc/omp-low.c   |  8 
 gcc/tree-parloops.c | 33 -
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d263609..57ac2aa 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -16321,14 +16321,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
  false, NULL, NULL, &fork_seq, &join_seq, ctx);
 	}
 
-  if (is_oacc_kernels (ctx))
-	{
-	  tree arg = build_int_cst (integer_type_node, GOMP_DIM_GANG);
-	  gcall *gang_single
-	= gimple_build_call_internal (IFN_GOACC_DIM_POS, 1, arg);
-	  gimple_seq_add_stmt (&new_body, gang_single);
-	}
-
   if (offloaded)
 	gimple_seq_add_stmt (&new_body, gimple_build_omp_entry_end ());
 
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 1551eec..313637b 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params-enum.h"
 #include "tree-ssa-alias.h"
 #include "tree-eh.h"
+#include "gomp-constants.h"
 #include "tree-dfa.h"
 
 /* This pass tries to distribute iterations of loops into several threads.
@@ -2984,13 +2985,14 @@ oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
 }
 
 /* Find stores inside REGION_BBS and outside IN_LOOP_BBS, and guard them with
-   GANG_POS == 0, except when the stores are REDUCTION_STORES.  Return true
+   gang_pos == 0, except when the stores are REDUCTION_STORES.  Return true
if any changes were made.  */
 
 static bool
 oacc_entry_exit_single_gang (bitmap in_loop_bbs, vec<basic_block> region_bbs,
-			 bitmap reduction_stores, tree gang_pos)
+			 bitmap reduction_stores)
 {
+  tree gang_pos = NULL_TREE;
   bool changed = false;
 
   unsigned i;
@@ -3030,6 +3032,20 @@ oacc_entry_exit_single_gang (bitmap in_loop_bbs, vec region_bbs,
 
 	  changed = true;
 
+	  if (gang_pos == NULL_TREE)
+	{
+	  tree arg = build_int_cst (integer_type_node, GOMP_DIM_GANG);
+	  gcall *gang_single
+		= gimple_build_call_internal (IFN_GOACC_DIM_POS, 1, arg);
+	  gang_pos = make_ssa_name (integer_type_node);
+	  gimple_call_set_lhs (gang_single, gang_pos);
+	  gimple_stmt_iterator start
+		= gsi_start_bb (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	  tree vuse = ssa_default_def (cfun, gimple_vop (cfun));
+	  gimple_set_vuse (gang_single, vuse);
+	  gsi_insert_before (&start, gang_single, GSI_SAME_STMT);
+	}
+
 	  if (dump_file)
 	{
 	  fprintf (dump_file,
@@ -3093,14 +3109,6 @@ oacc_entry_exit_ok (struct loop *loop,
   vec<basic_block> region_bbs
 = get_all_dominated_blocks (CDI_DOMINATORS, ENTRY_BLOCK_PTR_FOR_FN (cfun));
 
-  gimple_stmt_iterator gsi
-= gsi_start_bb (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
-  gimple *stmt = gsi_stmt (gsi);
-  gcc_assert (gimple_call_internal_p (stmt)
-	  && gimple_call_internal_fn (stmt) == IFN_GOACC_DIM_POS);
-  tree gang_pos = make_ssa_name (integer_type_node);
-  gimple_call_set_lhs (stmt, gang_pos);
-
   bitmap in_loop_bbs = BITMAP_ALLOC (NULL);
   bitmap_clear (in_loop_bbs);
   for (unsigned int i = 0; i < loop->num_nodes; i++)
@@ -3112,9 +3120,8 @@ oacc_entry_exit_ok (struct loop *loop,
 
   if (res)
 {
-  bool changed
-	= oacc_entry_exit_single_gang (in_loop_bbs, region_bbs,
-   reduction_stores, gang_pos);
+  bool changed = oacc_entry_exit_single_gang (in_loop_bbs, region_bbs,
+		  reduction_stores);
   if (changed)
 	{
 	  free_dominance_info (CDI_DOMINATORS);
-- 
1.9.1

Re: [OpenACC] internal fn folding

2015-11-08 Thread Tom de Vries


On 08/11/15 15:04, Thomas Schwinge wrote:

Hi!

On Thu, 5 Nov 2015 10:48:02 -0500, Nathan Sidwell  wrote:

>On 11/04/15 05:02, Bernd Schmidt wrote:

> >On 11/02/2015 02:56 PM, Nathan Sidwell wrote:

> >>On 10/28/15 14:40, Nathan Sidwell wrote:

> >>>Richard,
> >>>this patch adds folding for the new GOACC_DIM_POS and GOACC_DIM_SIZE
> >>>internal
> >>>functions.  IIUC gimple_fold_call is the right place to add this.
> >>>
> >>>The size of a compute dimension is very often a compile-time
> >>>constant.  On the
> >>>host, in particular it's 1, which means we can deduce the POS must be
> >>>zero.

>This is what I committed, using the helpers I recently added. (I realized we can
>only get here for functions with the oacc attribute already set)
>--- gimple-fold.c   (revision 229809)
>+++ gimple-fold.c   (working copy)
>+/* Transform IFN_GOACC_DIM_SIZE and IFN_GOACC_DIM_POS internal
>+   function calls to constants, where possible.  */
>+
>+static tree
>+fold_internal_goacc_dim (const gimple *call)
>+{
>+  int axis = get_oacc_ifn_dim_arg (call);
>+  int size = get_oacc_fn_dim_size (current_function_decl, axis);
>+  bool is_pos = gimple_call_internal_fn (call) == IFN_GOACC_DIM_POS;
>+  tree result = NULL_TREE;
>+
>+  /* If the size is 1, or we only want the size and it is not dynamic,
>+ we know the answer.  */
>+  if (size == 1 || (!is_pos && size))
>+{
>+  tree type = TREE_TYPE (gimple_call_lhs (call));
>+  result = build_int_cst (type, size - is_pos);
>+}
>+
>+  return result;
>+}
>@@ -3106,6 +3129,10 @@ gimple_fold_call (gimple_stmt_iterator *
>  return true;
>}
>  break;
>+   case IFN_GOACC_DIM_SIZE:
>+   case IFN_GOACC_DIM_POS:
>+ result = fold_internal_goacc_dim (stmt);
>+ break;

Merging this into gomp-4_0-branch, we'd run into a lot of regressions
(for OpenACC kernels construct), for example:

 [...]/source-gcc/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c: In function 'main._omp_fn.0':
 [...]/source-gcc/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c:8:0: internal compiler error: Segmentation fault
 0xac387f crash_signal
 [...]/source-gcc/gcc/toplev.c:336
 0x9b8399 tree_int_cst_elt_check
 [...]/source-gcc/gcc/tree.h:3129
 0x9b8399 get_oacc_fn_dim_size(tree_node*, int)
 [...]/source-gcc/gcc/omp-low.c:12630
 0x86c530 fold_internal_goacc_dim
 [...]/source-gcc/gcc/gimple-fold.c:2917
 0x86c530 gimple_fold_call
 [...]/source-gcc/gcc/gimple-fold.c:3134
 0x86dfe4 fold_stmt_1
 [...]/source-gcc/gcc/gimple-fold.c:3702
 0xbf9953 execute
 [...]/source-gcc/gcc/tree-ssa-forwprop.c:2310

The dims in gcc/omp-low.c:get_oacc_fn_dim_size don't have values set, so
the "TREE_INT_CST_LOW (TREE_VALUE (dims))" fails.  I have not analyzed
what exactly is going wrong; I just figured out that it's related to the
IFN_GOACC_DIM_POS without LHS usage that Tom introduced in
gomp-4_0-branch r228735 for OpenACC kernels to "neuter gang-single code
in gang-redundant mode",
.


I've just removed that ( 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00847.html ).


Thanks,
- Tom
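
For readers following the thread, the folding rule in the quoted
fold_internal_goacc_dim can be modelled outside the compiler.  The sketch
below is a toy model only: fold_dim_model is a made-up name, and it assumes
(from the quoted comment and condition) that a size of 0 stands for a
dynamic dimension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the folding decision; none of these names are GCC APIs.
   SIZE is the compile-time size of the axis, with 0 assumed to mean
   "dynamic".  Returns true and stores the constant in *RESULT when the
   call can be folded, false otherwise.  */
bool
fold_dim_model (int size, bool is_pos, int *result)
{
  assert (size >= 0 && result != NULL);

  /* A size of 1 fixes both answers: the size is 1 and the position is 0.
     A known size larger than 1 only fixes the size query.  */
  if (size == 1 || (!is_pos && size != 0))
    {
      *result = size - (is_pos ? 1 : 0);
      return true;
    }
  return false;
}
```

So a position query on an axis of size 1 folds to 0, a size query on an axis
of known size folds to that size, and everything else is left alone.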

Re: Combined constructs' clause splitting

2015-11-08 Thread Tom de Vries


On 07/11/15 12:45, Thomas Schwinge wrote:

Hi!

On Fri, 6 Nov 2015 15:31:23 -0800, Cesar Philippidis wrote:

I've applied this patch to gomp-4_0-branch which backports most of my
front end changes from trunk. Note that I found a regression while
testing, which is also present in trunk. It looks like
kernels-acc-loop-reduction.c is failing because I'm incorrectly
propagating the reduction variable to both the kernels and loop
constructs for combined 'acc kernels loop'. The problem here is that
kernels don't support the reduction clause. I'll fix that next week.


Always need to consider both what the specification allows -- and thus
what the front ends accept/refuse -- as well as what we might do
differently, internally in later processing stages.  I have not analyzed
whether it makes sense to have the OMP_CLAUSE_REDUCTION of a combined
"kernels loop reduction([...])" construct be attached to the outer
OACC_KERNELS or inner OACC_LOOP, or duplicated for both.

Tom, if you need a solution for that right now/want to restore the
previous behavior (attached to inner OACC_LOOP only), here's what you
should try: in gcc/c-family/c-omp.c:c_oacc_split_loop_clauses remove the
special handling for OMP_CLAUSE_REDUCTION, and move it to "Loop clauses"
section,


Committed to gomp-4_0-branch, as attached.

Thanks,
- Tom


and in
gcc/fortran/trans-openmp.c:gfc_trans_oacc_combined_directive I don't see
reduction clauses being handled, hmm, maybe the Fortran front end is
doing that differently?

Ignore reduction clause on kernels directive

2015-11-08  Tom de Vries  

	* c-omp.c (c_oacc_split_loop_clauses): Don't copy OMP_CLAUSE_REDUCTION,
	classify as loop clause.
---
 gcc/c-family/c-omp.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 8b30844..907d329 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -867,7 +867,7 @@ c_omp_check_loop_iv_exprs (location_t stmt_loc, tree declv, tree decl,
 tree
 c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 {
-  tree next, loop_clauses, t;
+  tree next, loop_clauses;
 
   loop_clauses = *not_loop_clauses = NULL_TREE;
   for (; clauses ; clauses = next)
@@ -886,16 +886,11 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_PRIVATE:
+	case OMP_CLAUSE_REDUCTION:
 	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
 	  loop_clauses = clauses;
 	  break;
 
-	  /* Reductions belong in both constructs.  */
-	case OMP_CLAUSE_REDUCTION:
-	  t = copy_node (clauses);
-	  OMP_CLAUSE_CHAIN (t) = loop_clauses;
-	  loop_clauses = t;
-
 	  /* FIXME: device_type */
 
 	  /* Parallel/kernels clauses.  */
-- 
1.9.1
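
The effect of the patch above can be sketched with a self-contained toy
model.  The types and names below (struct clause, the CL_* codes,
split_loop_clauses) are hypothetical stand-ins and not the GCC tree API; the
point is only that a reduction is now moved to the loop list like any other
loop clause, rather than being copied so that it ends up on both constructs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OMP_CLAUSE_* codes; not GCC's enum.  */
enum clause_kind { CL_GANG, CL_PRIVATE, CL_REDUCTION, CL_COPYIN, CL_NUM_GANGS };

struct clause
{
  enum clause_kind kind;
  struct clause *chain;
};

/* Nonzero if KIND belongs on the inner loop construct.  In this model,
   reductions are classified here instead of being duplicated.  */
static int
is_loop_clause (enum clause_kind kind)
{
  return kind == CL_GANG || kind == CL_PRIVATE || kind == CL_REDUCTION;
}

/* Split CLAUSES destructively.  Returns the loop clauses and stores the
   remaining clauses in *NOT_LOOP.  Nodes are pushed on the front of each
   output list, so both lists come out in reverse input order.  */
struct clause *
split_loop_clauses (struct clause *clauses, struct clause **not_loop)
{
  struct clause *loop = NULL, *next;

  assert (not_loop != NULL);
  *not_loop = NULL;
  for (; clauses; clauses = next)
    {
      next = clauses->chain;
      struct clause **dest = is_loop_clause (clauses->kind) ? &loop : not_loop;
      clauses->chain = *dest;
      *dest = clauses;
    }
  return loop;
}
```

Splitting a "gang, reduction, num_gangs" list this way leaves the reduction
and gang clauses on the loop list and only num_gangs on the other one.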

[gomp4, committed] Add pass_dce in oacc kernels pass group

2015-11-08 Thread Tom de Vries


Hi,

now that I've moved insertion of the IFN_GOACC_DIM_POS call to parloops 
( https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00847.html ) , there's 
no need anymore for the dead_load_p hack in parloops.


This patch adds a pass_dce invocation in the kernels pass group.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Add pass_dce in oacc kernels pass group

2015-11-08  Tom de Vries  

	* passes.def: Add pass_dce in oacc kernels pass group.
	* tree-parloops.c (dead_load_p): Remove.
	(ref_conflicts_with_region, oacc_entry_exit_ok_1): Remove handling of
	dead loads.

	* g++.dg/tree-ssa/copyprop-1.C: Update for extra pass_dce in pass list.
	* gcc.dg/pr23911.c: Same.
	* gcc.dg/tree-ssa/20030709-2.c: Same.
	* gcc.dg/tree-ssa/20030731-2.c: Same.
	* gcc.dg/tree-ssa/20040729-1.c: Same.
	* gcc.dg/tree-ssa/cfgcleanup-1.c: Same.
	* gcc.dg/tree-ssa/loop-36.c: Same.
	* gcc.dg/tree-ssa/pr21086.c: Same.
	* gcc.dg/tree-ssa/ssa-dce-1.c: Same.
	* gcc.dg/tree-ssa/ssa-dce-2.c: Same.
	* gcc.dg/vect/pr26359.c: Same.
---
 gcc/passes.def   |  1 +
 gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C   |  4 +--
 gcc/testsuite/gcc.dg/pr23911.c   |  6 ++---
 gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c   |  8 +++---
 gcc/testsuite/gcc.dg/tree-ssa/20030731-2.c   |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/20040729-1.c   |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/cfgcleanup-1.c |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/loop-36.c  |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/pr21086.c  |  6 ++---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-1.c|  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-2.c|  4 +--
 gcc/testsuite/gcc.dg/vect/pr26359.c  |  4 +--
 gcc/tree-parloops.c  | 39 
 13 files changed, 27 insertions(+), 65 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 5683bb7..2420b3b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -101,6 +101,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_scev_cprop);
 	  NEXT_PASS (pass_tree_loop_done);
 	  NEXT_PASS (pass_dominator_oacc_kernels);
+	  NEXT_PASS (pass_dce);
 	  NEXT_PASS (pass_tree_loop_init);
   	  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	  NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C b/gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C
index 5ff289c..34a9f7b 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce2" } */
+/* { dg-options "-O -fdump-tree-dce3" } */
 
 /* Verify that we can eliminate the useless conversions to/from
const qualified pointer types
@@ -27,4 +27,4 @@ int foo(Object&o)
 
 /* Remaining should be two loads.  */
 
-/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "dce2" } } */
+/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "dce3" } } */
diff --git a/gcc/testsuite/gcc.dg/pr23911.c b/gcc/testsuite/gcc.dg/pr23911.c
index 2c27397..3fa0412 100644
--- a/gcc/testsuite/gcc.dg/pr23911.c
+++ b/gcc/testsuite/gcc.dg/pr23911.c
@@ -1,7 +1,7 @@
 /* This was a missed optimization in tree constant propagation
that CSE would catch later on.  */
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce2" } */
+/* { dg-options "-O -fdump-tree-dce3" } */
 
 double _Complex *a; 
 static const double _Complex b[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; 
@@ -16,5 +16,5 @@ test (void)
 
 /* After DCE2 which runs after FRE, the expressions should be fully
constant folded.  There should be no loads from b left.  */
-/* { dg-final { scan-tree-dump-times "__complex__ \\\(1.0e\\\+0, 0.0\\\)" 2 "dce2" } } */
-/* { dg-final { scan-tree-dump-times "= b" 0 "dce2" } } */
+/* { dg-final { scan-tree-dump-times "__complex__ \\\(1.0e\\\+0, 0.0\\\)" 2 "dce3" } } */
+/* { dg-final { scan-tree-dump-times "= b" 0 "dce3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
index d4f42f9..5009cd6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce2" } */
+/* { dg-options "-O -fdump-tree-dce3" } */
   
 struct rtx_def;
 typedef struct rtx_def *rtx;
@@ -42,13 +42,13 @@ get_alias_set (t)
 
 /* There should be precisely one load of ->decl.rtl.  If there is
more than, then the dominator optimizations failed.  */
-/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "dce2"} } */
+/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "dce3"} } */
   
 /* There should be no loads of .rtmem since the complex return statement
is just "return 0".  */
-/* { dg-final { scan-tree-dump-times ".rtmem" 0 "dce2"} } */
+/* { dg-final { scan-tree-dump-times ".rtmem" 0 "dce3"} } */
   
 /* There should be one IF statement (the complex return statement should
collapse down to a simp

Re: [Patch, fortran] PR68196 [4.9/5/6 Regression] ICE on function result with procedure pointer component

2015-11-08 Thread Paul Richard Thomas

Committed as revision 229954.

Thanks for checking it out.

Paul

On 7 November 2015 at 16:57, Steve Kargl
 wrote:
> On Wed, Nov 04, 2015 at 04:03:10PM +0100, Paul Richard Thomas wrote:
>>
>> 2015-11-04  Paul Thomas  
>>
>> PR fortran/68196
>> * class.c (has_finalizer_component): Prevent infinite recursion
>> through this function if the derived type and that of its
>> component are the same.
>> * trans-types.c (gfc_get_derived_type): Do the same for proc
>> pointers by ignoring the explicit interface for the component.
>>
>> PR fortran/66465
>> * check.c (same_type_check): If either of the expressions is
>> BT_PROCEDURE, use the typespec from the symbol, rather than the
>> expression.
>>
>> 2015-11-04  Paul Thomas  
>>
>> PR fortran/68196
>> * gfortran.dg/proc_ptr_47.f90: New test.
>>
>> PR fortran/66465
>> * gfortran.dg/pr66465.f90: New test.
>
> OK.  Thanks for the patch.
>
> --
> steve



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

Re: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-08 Thread Paul Richard Thomas

Dear Joost,

These cause regressions on my tree:
f951: sorry, unimplemented: Graphite loop optimizations cannot be used
(ISL is not available)(-fgraphite, -fgraphite-identity, -floop-block,
-floop-interchange, -floop-strip-mine, -floop-parallelize-all,
-floop-unroll-and-jam, and -ftree-loop-linear)

I suppose that I can deal with it. However, is there some way to
detect the presence of ISL? I'll try and figure out some dejagnu-ery
to pass the tests if this message comes up.

Cheers

Paul

On 6 November 2015 at 10:45, VandeVondele  Joost
 wrote:
> Thanks Paul. I believe PR53852 won't be fixed on 4.9/5 as it seems to depend 
> on the recent graphite cleanup work and recent isl. As such I'll commit to 
> trunk only.



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

Re: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-08 Thread Andre Vehreschild

Hi Paul,

I fell for the same issue. You need a more recent isl library than
0.14.2. I installed 0.12.something when I started gfortran hacking and
didn't upgrade since. 

You can get the ISL lib from:

ftp://gcc.gnu.org/pub/gcc/infrastructure/

or from your distribution (Fedora 21 is shipping 0.14.3, which is
sufficient to run the test.)

Regards,
Andre

On Sun, 8 Nov 2015 18:02:15 +0100
Paul Richard Thomas  wrote:

> Dear Joost,
> 
> These cause regressions on my tree:
> f951: sorry, unimplemented: Graphite loop optimizations cannot be used
> (ISL is not available)(-fgraphite, -fgraphite-identity, -floop-block,
> -floop-interchange, -floop-strip-mine, -floop-parallelize-all,
> -floop-unroll-and-jam, and -ftree-loop-linear)
> 
> I suppose that I can deal with it. However, is there some way to
> detect the presence of ISL? I'll try and figure out some dejagnu-ery
> to pass the tests if this message comes up.
> 
> Cheers
> 
> Paul
> 
> On 6 November 2015 at 10:45, VandeVondele  Joost
>  wrote:
> > Thanks Paul. I believe PR53852 won't be fixed on 4.9/5 as it seems to 
> > depend on the recent graphite cleanup work and recent isl. As such I'll 
> > commit to trunk only.
> 
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de

Re: libgo patch committed: Update to Go 1.5 release

2015-11-08 Thread Rainer Orth

Ian Lance Taylor  writes:

> On Fri, Nov 6, 2015 at 5:01 AM, Rainer Orth  
> wrote:
>> Ian Lance Taylor  writes:
>>
>>> I have committed a patch to libgo to update it to the Go 1.5 release.
>>>
>>> As usual for libgo updates, the actual patch is too large to attach to
>>> this e-mail message.  I've attached the changes to the gccgo-specific
>>> files.
>>>
>>> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
>>> to mainline.
>>>
>>> This may cause trouble on non-GNU/Linux operating systems.  Please let
>>> me know about any problems you encounter.
>>
>> It does indeed (first tried on i386-pc-solaris2.10):
>>
>> *
>>
>> /vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c: In function 
>> '__go_ioctl':
>> /vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c:63:10: error: 
>> implicit declaration of function 'ioctl' 
>> [-Werror=implicit-function-declaration]
>>return ioctl (d, request, arg);
>>   ^
>>
>>   Needs , the following patch works:
>>
>>
>>
>> *
>>
>> /vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_bsd.go:107:7: error: 
>> incompatible types in assignment (cannot use type int as type Pid_t)
>> r1 = raw_getpid()
>>^
>>
>> I can cast to Pid_t and this works.  The underlying error to me seems
>> that raw_getpid in the generated libcalls.go is wrong, casting
>> c_getpid return value to int while pid_t can be long.
>>
>> *
>>
>> /vol/gcc/src/hg/trunk/local/libgo/go/net/hook_cloexec.go:13:70: error: 
>> reference to undefined identifier 'syscall.Accept4'
>>   accept4Func func(int, int) (int, syscall.Sockaddr, error) = syscall.Accept4
>>   ^
>>
>> No accept4 on Solaris (and certainly other systems, thence configure
>> test), but used unconditionally.
>>
>> *
>>
>> /vol/gcc/src/hg/trunk/local/libgo/go/net/sendfile_solaris.go:78:22: error: 
>> reference to undefined identifier 'syscall.Sendfile'
>>n, err1 := syscall.Sendfile(dst, src, &pos1, n)
>>   ^
>>
>> Only in go/syscall/libcall_linux.go!?
>>
>> *
>>
>> /vol/gcc/src/hg/trunk/local/libgo/go/net/tcpsockopt_solaris.go:34:103: 
>> error: reference to undefined identifier 'syscall.TCP_KEEPALIVE_THRESHOLD'
>>   return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(fd.sysfd, 
>> syscall.IPPROTO_TCP, syscall.TCP_KEEPALIVE_THRESHOLD, msecs))
>>
>>^
>>
>> Not in Solaris 10, only Solaris 11 and 12 have it.
>
> Thanks for the notes.  I committed this patch to address these problems.

Worked like a charm, thanks.

There were two remaining problems:

* Before Solaris 12, sendfile only lives in libsendfile.  This led to
  link failures in gotools.

* Solaris 12 introduced a couple more types that use _in6_addr_t, which
  are filtered out by mksysinfo.sh, leading to compilation failures.

The following patch addresses both issues.  Solaris 10 and 11 bootstraps
have completed, a Solaris 12 bootstrap is still running make check.

diff --git a/libgo/configure.ac b/libgo/configure.ac
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -482,6 +482,9 @@ AC_CACHE_CHECK([for socket libraries], l
 		 [AC_CHECK_LIB(nsl, main,
 		 	[libgo_cv_lib_sockets="$libgo_cv_lib_sockets -lnsl"])])
unset ac_cv_func_gethostbyname
+   AC_CHECK_FUNC(sendfile, ,
+		 [AC_CHECK_LIB(sendfile, main,
+		 	[libgo_cv_lib_sockets="$libgo_cv_lib_sockets -lsendfile"])])
LIBS=$libgo_old_libs
 ])
 NET_LIBS="$libgo_cv_lib_sockets"
diff --git a/libgo/mksysinfo.sh b/libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh
+++ b/libgo/mksysinfo.sh
@@ -1488,4 +1488,24 @@ grep '^type _zone_net_addr_t ' gen-sysin
 sed -e 's/_in6_addr/[16]byte/' \
 >> ${OUT}
 
+# The Solaris 12 _flow_arp_desc_t struct.
+grep '^type _flow_arp_desc_t ' gen-sysinfo.go | \
+sed -e 's/_in6_addr_t/[16]byte/g' \
+>> ${OUT}
+
+# The Solaris 12 _flow_l3_desc_t struct.
+grep '^type _flow_l3_desc_t ' gen-sysinfo.go | \
+sed -e 's/_in6_addr_t/[16]byte/g' \
+>> ${OUT}
+
+# The Solaris 12 _mac_ipaddr_t struct.
+grep '^type _mac_ipaddr_t ' gen-sysinfo.go | \
+sed -e 's/_in6_addr_t/[16]byte/g' \
+>> ${OUT}
+
+# The Solaris 12 _mactun_info_t struct.
+grep '^type _mactun_info_t ' gen-sysinfo.go | \
+sed -e 's/_in6_addr_t/[16]byte/g' \
+>> ${OUT}
+
 exit $?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-08 Thread Martin Sebor


On 11/06/2015 05:50 AM, Andreas Schwab wrote:

I see this failure on m68k:

FAIL: g++.dg/warn/Wplacement-new-size.C  -std=gnu++11 (test for excess errors)
Excess errors:
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:189:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:191:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:194:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]
/daten/aranym/gcc/gcc-20151106/gcc/testsuite/g++.dg/warn/Wplacement-new-size.C:198:19:
 warning: placement new constructing an object of type 'int' and size '4' in a 
region of type 'char [4]' and size '0' [-Wplacement-new]

That appears to be a 32-bit problem, the test also fails here on x86-64
with -m32 
or here on powerpc



This should be fixed now via r229959 (tested on x86_64 with -m32).

The problem was caused by assuming that the POINTER_PLUS_EXPR offset
which is stored as sizetype, an unsigned 32-bit type in ILP32, can
be "extracted" as an unsigned HOST_WIDE_INT (a 64-bit type when
the host compiler is LP64), and converted to signed to obtain the
original negative offset.
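
To illustrate the failure mode (this is only a sketch with made-up helper
names, not the code from r229959): a negative offset stored in a 32-bit
unsigned type has to be sign-extended from 32 bits, not zero-extended to
64 bits and then converted to signed.

```c
#include <assert.h>
#include <stdint.h>

/* The problematic conversion: the 32-bit value is zero-extended to 64 bits
   first, so an offset of -4 comes back as 4294967292.  */
int64_t
offset_zero_extended (uint32_t sizetype_offset)
{
  uint64_t wide = sizetype_offset;	/* Zero-extends.  */
  return (int64_t) wide;
}

/* Sign-extend from the 32-bit precision of the stored offset instead.  */
int64_t
offset_sign_extended (uint32_t sizetype_offset)
{
  int64_t wide = (int64_t) sizetype_offset;

  /* If bit 31 is set, the stored value represents a negative offset:
     subtract 2^32 to recover it.  */
  if (sizetype_offset & UINT32_C (0x80000000))
    wide -= INT64_C (0x100000000);

  assert (wide >= INT32_MIN && wide <= INT32_MAX);
  return wide;
}
```

With the first helper an offset of -4 looks like a huge positive offset,
while the second helper returns the original -4.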

Martin

Re: [patch 0/6] scalar-storage-order merge (2)

2015-11-08 Thread Eric Botcazou

> See https://gcc.gnu.org/ml/gcc/2015-06/msg00126.html for the proposal.

The branch has been merged into mainline but without C++ support.  This means 
that the scalar_storage_order attribute is supported in C, Objective-C and Ada 
only for the time being; the branch will remain open to host the C++ support.
I'm also going to submit a small patch to add minimal debug info support.

The result of the merge was bootstrapped/regtested on x86/Linux, x86/Solaris, 
x86-64/Linux, PowerPC/Linux, IA-64/Linux and SPARC/Solaris.

Bugzilla tickets pertaining to the attribute should be tagged with [sso] and 
ebotca...@gcc.gnu.org be added to the CC field.

-- 
Eric Botcazou

RE: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-08 Thread VandeVondele Joost

I see, graphite is optional. Trying to find the dejagnu-ery, I think the 
obvious thing is to move the tests from gfortran.dg/ to gfortran.dg/graphite/ 
I'll do that under the obvious rule, unless this gets preapproved before that 
...

RE: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-08 Thread VandeVondele Joost

r229967

2015-11-08  Joost VandeVondele 

* gfortran.dg/PR67518.f90: move from here...
* gfortran.dg/graphite/PR67518.f90: to here.
* gfortran.dg/PR53852.f90: move from here...
* gfortran.dg/graphite/PR53852.f90: to here.

Enable pointer TBAA for LTO

2015-11-08 Thread Jan Hubicka

Hi,
this patch adds basic TBAA for pointers to LTO.  The basic scheme is simple;
because TYPE_CANONICAL is not really needed by get_alias_set, we completely
drop the calculation of these (which also saves about 50% canonical type hash
searches) and update get_alias_set to not punt on pointers with
TYPE_STRUCTURAL_EQUALITY.

The patch makes quite nice improvements (32%) on number of disambiguations on
dealII (that is my random C++ testbed):

Before:
[WPA] GIMPLE canonical type table: size 16381, 817 elements, 35453 searches, 91 
collisions (ratio: 0.002567)
[WPA] GIMPLE canonical type pointer-map: 817 elements, 15570 searches   

after:
[WPA] GIMPLE canonical type table: size 16381, 822 elements, 14863 searches, 
114 collisions (ratio: 0.007670)
[WPA] GIMPLE canonical type pointer-map: 822 elements, 12663 searches   

The number of disambiguations goes 1713472->2331078 (32%)
and number of queries goes 3387753->3669698 (8%)
We get code size growth 677527->701782 (3%)

Also a query is disambiguated 63% of the time instead of 50% we had before.

Clearly there are many areas for improvements (since functions are
TYPE_STRUCTURAL_EQUALITY in LTO we use the ptr_type_node alias set on them),
but that can wait for next stage1.

lto-bootstrapped/regtested x86_64-linux and also used it in my tree for quite
a while, so the patch was tested on Firefox and other applications.

OK?
Honza

* alias.c (get_alias_set): Do structural equality for pointer types;
drop LTO specific path.
* lto.c (iterative_hash_canonical_type): Do not compute TYPE_CANONICAL
for pointer types.
(gimple_register_canonical_type_1): Likewise.
(gimple_register_canonical_type): Likewise.
(lto_register_canonical_types): Do not clear canonical types of pointer
types.
Index: lto/lto.c
===
--- lto/lto.c   (revision 229968)
+++ lto/lto.c   (working copy)
@@ -396,8 +396,13 @@ iterative_hash_canonical_type (tree type
 
   /* All type variants have same TYPE_CANONICAL.  */
   type = TYPE_MAIN_VARIANT (type);
+
+  /* We do not compute TYPE_CANONICAL of POINTER_TYPE because the aliasing
+ code never uses it anyway.  */
+  if (POINTER_TYPE_P (type))
+v = hash_canonical_type (type);
   /* An already processed type.  */
-  if (TYPE_CANONICAL (type))
+  else if (TYPE_CANONICAL (type))
 {
   type = TYPE_CANONICAL (type);
   v = gimple_canonical_type_hash (type);
@@ -445,7 +450,9 @@ gimple_register_canonical_type_1 (tree t
 {
   void **slot;
 
-  gcc_checking_assert (TYPE_P (t) && !TYPE_CANONICAL (t));
+  gcc_checking_assert (TYPE_P (t) && !TYPE_CANONICAL (t)
+  && type_with_alias_set_p (t)
+  && TREE_CODE (t) != POINTER_TYPE);
 
   slot = htab_find_slot_with_hash (gimple_canonical_types, t, hash, INSERT);
   if (*slot)
@@ -478,7 +485,7 @@ gimple_register_canonical_type_1 (tree t
 static void
 gimple_register_canonical_type (tree t)
 {
-  if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t))
+  if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t) || POINTER_TYPE_P (t))
 return;
 
   /* Canonical types are same among all complete variants.  */
@@ -498,14 +505,13 @@ static void
 lto_register_canonical_types (tree node, bool first_p)
 {
   if (!node
-  || !TYPE_P (node))
+  || !TYPE_P (node) || POINTER_TYPE_P (node))
 return;
 
   if (first_p)
 TYPE_CANONICAL (node) = NULL_TREE;
 
-  if (POINTER_TYPE_P (node)
-  || TREE_CODE (node) == COMPLEX_TYPE
+  if (TREE_CODE (node) == COMPLEX_TYPE
   || TREE_CODE (node) == ARRAY_TYPE)
 lto_register_canonical_types (TREE_TYPE (node), first_p);
 
Index: tree.c
===
--- tree.c  (revision 229968)
+++ tree.c  (working copy)
@@ -13198,6 +13198,7 @@ gimple_canonical_types_compatible_p (con
   /* If the types have been previously registered and found equal
  they still are.  */
   if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
+  && !POINTER_TYPE_P (t1) && !POINTER_TYPE_P (t2)
   && trust_type_canonical)
 return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
 
Index: alias.c
===
--- alias.c (revision 229968)
+++ alias.c (working copy)
@@ -869,13 +874,19 @@ get_alias_set (tree t)
   set = lang_hooks.get_alias_set (t);
   if (set != -1)
return set;
-  return 0;
+  /* LTO frontend does not assign canonical types to pointers (which we
+ignore anyway) and we compute them.  The following path may be
+probably enabled for non-LTO, too, and it may improve TBAA for
+pointers to types with structural equality.  */
+  if (!in_lto_p || !POINTER_TYPE_P (t))
+return 0;
+}
+  else
+{
+  t = TYPE_CANONICAL (t);
+  /* The canonical type should not require structural equality checks.  */
+  gcc_checkin

Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-08 Thread Segher Boessenkool

On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote:
> This patch stops combine from generating widening muls of anything else
> but registers (immediates, memory, ...).  This probably is a reasonable
> tradeoff for all targets, even those (if any) that have such insns.
> 
> > >I'll let you put it through its paces on your setup :)
> 
> > I'll let Segher give the final yes/no on this, but it generally looks 
> > good to me.
> 
> It looks okay to me too.  Testing now, combine patches have the tendency
> to do unforeseen things on other targets ;-)

Testing shows it makes a difference only very rarely.  For many targets
it makes no difference, for a few it is a small win.  For 32-bit x86 it
creates slightly bigger code.

I think it looks good, but let's wait to hear Uros' opinion.


Segher

[gomp4,committed] Revert "Add pass_dominator::jump_threading_p ()"

2015-11-08 Thread Tom de Vries


Hi,

this patch reverts the patch that added 
pass_dominator::jump_threading_p. We no longer require this 
functionality for the oacc kernels pass group.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Revert "Add pass_dominator::jump_threading_p ()"

2015-11-08  Tom de Vries  

	revert:
	2015-10-12  Tom de Vries  

	* tree-ssa-dom.c (dom_opt_dom_walker::dom_opt_dom_walker): Add
	jump_threading_p parameters.
	(dom_opt_dom_walker::m_jump_threading_p): New private var.
	(pass_dominator::jump_threading_p): New protected virtual function.
	(pass_dominator::execute): Handle jump_threading_p.
	(dom_opt_dom_walker::before_dom_children)
	(dom_opt_dom_walker::after_dom_children): Handle m_jump_threading_p.
---
 gcc/tree-ssa-dom.c | 112 +
 1 file changed, 45 insertions(+), 67 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 44253bf..10110d7 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -493,14 +493,11 @@ class dom_opt_dom_walker : public dom_walker
 public:
   dom_opt_dom_walker (cdi_direction direction,
 		  class const_and_copies *const_and_copies,
-		  class avail_exprs_stack *avail_exprs_stack,
-		  bool jump_threading_p)
+		  class avail_exprs_stack *avail_exprs_stack)
 : dom_walker (direction),
   m_const_and_copies (const_and_copies),
   m_avail_exprs_stack (avail_exprs_stack),
-  m_dummy_cond (NULL),
-  m_jump_threading_p (jump_threading_p)
-  {}
+  m_dummy_cond (NULL) {}
 
   virtual void before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
@@ -513,7 +510,6 @@ private:
   class avail_exprs_stack *m_avail_exprs_stack;
 
   gcond *m_dummy_cond;
-  bool m_jump_threading_p;
 };
 
 /* Jump threading, redundancy elimination and const/copy propagation.
@@ -533,8 +529,6 @@ class dominator_base : public gimple_opt_pass
 
   unsigned int execute (function *);
 
-  /* Return true if pass should perform jump threading.  */
-  virtual bool jump_threading_p (void) { return true; }
 }; // class dominator_base
 
 const pass_data pass_data_dominator =
@@ -594,29 +588,25 @@ dominator_base::execute (function *fun)
   /* Initialize the value-handle array.  */
   threadedge_initialize_values ();
 
-  if (jump_threading_p ())
-{
-  /* We need accurate information regarding back edges in the CFG
-	 for jump threading; this may include back edges that are not part of
-	 a single loop.  */
-  mark_dfs_back_edges ();
-
-  /* We want to create the edge info structures before the dominator walk
-	 so that they'll be in place for the jump threader, particularly when
-	 threading through a join block.
-
-	 The conditions will be lazily updated with global equivalences as
-	 we reach them during the dominator walk.  */
-  basic_block bb;
-  FOR_EACH_BB_FN (bb, fun)
-	record_edge_info (bb);
-}
+  /* We need accurate information regarding back edges in the CFG
+ for jump threading; this may include back edges that are not part of
+ a single loop.  */
+  mark_dfs_back_edges ();
+
+  /* We want to create the edge info structures before the dominator walk
+ so that they'll be in place for the jump threader, particularly when
+ threading through a join block.
+
+ The conditions will be lazily updated with global equivalences as
+ we reach them during the dominator walk.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+record_edge_info (bb);
 
   /* Recursively walk the dominator tree optimizing statements.  */
   dom_opt_dom_walker walker (CDI_DOMINATORS,
 			 const_and_copies,
-			 avail_exprs_stack,
-			 jump_threading_p ());
+			 avail_exprs_stack);
   walker.walk (fun->cfg->x_entry_block_ptr);
 
   {
@@ -636,13 +626,10 @@ dominator_base::execute (function *fun)
  duplication and CFG manipulation.  */
   update_ssa (TODO_update_ssa);
 
-  if (jump_threading_p ())
-{
-  free_all_edge_infos ();
+  free_all_edge_infos ();
 
-  /* Thread jumps, creating duplicate blocks as needed.  */
-  cfg_altered |= thread_through_all_blocks (first_pass_instance);
-}
+  /* Thread jumps, creating duplicate blocks as needed.  */
+  cfg_altered |= thread_through_all_blocks (first_pass_instance);
 
   if (cfg_altered)
 free_dominance_info (CDI_DOMINATORS);
@@ -749,11 +736,6 @@ public:
 
  private:
   bitmap m_regions;
-
-protected:
-  /* dominator_base methods: */
-  /* Return true if pass should perform jump threading.  */
-  virtual bool jump_threading_p (void) { return false; }
 }; // class pass_dominator_oacc_kernels
 
 } // anon namespace
@@ -1375,8 +1357,7 @@ dom_opt_dom_walker::before_dom_children (basic_block bb)
 optimize_stmt (bb, gsi, m_const_and_copies, m_avail_exprs_stack);
 
   /* Now prepare to process dominated blocks.  */
-  if (m_jump_threading_p)
-record_edge_info (bb);
+  record_edge_info (bb);
   cprop_into_successor_phis (bb, m_const_and_copies);
 }
 
@@ -1389,38 +1370

[gomp4, committed] Add dominator_base::may_peel_loop_headers_p

2015-11-08 Thread Tom de Vries


Hi,

This patch eliminates the first_pass_instance test in 
dominator_base::execute, and introduces a virtual function 
dominator_base::may_peel_loop_headers_p that is tested instead.  This 
allows us to choose may_peel_loop_headers_p == false for 
pass_dominator_oacc_kernels.


Thanks,
- Tom
Add dominator_base::may_peel_loop_headers_p

2015-11-08  Tom de Vries  

	* tree-ssa-dom.c (dominator_base::may_peel_loop_headers_p)
	(pass_dominator::may_peel_loop_headers_p)
	(pass_dominator_oacc_kernels::may_peel_loop_headers_p): New function.
	(dominator_base::execute): Use may_peel_loop_headers_p.
---
 gcc/tree-ssa-dom.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 10110d7..a2abb8e 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -529,6 +529,8 @@ class dominator_base : public gimple_opt_pass
 
   unsigned int execute (function *);
 
+ protected:
+  virtual bool may_peel_loop_headers_p (void) { return true; }
 }; // class dominator_base
 
 const pass_data pass_data_dominator =
@@ -554,6 +556,9 @@ public:
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_dominator (m_ctxt); }
   virtual bool gate (function *) { return flag_tree_dom != 0; }
+
+ protected:
+  virtual bool may_peel_loop_headers_p (void) { return first_pass_instance; }
 }; // class pass_dominator
 
 unsigned int
@@ -629,7 +634,7 @@ dominator_base::execute (function *fun)
   free_all_edge_infos ();
 
   /* Thread jumps, creating duplicate blocks as needed.  */
-  cfg_altered |= thread_through_all_blocks (first_pass_instance);
+  cfg_altered |= thread_through_all_blocks (may_peel_loop_headers_p ());
 
   if (cfg_altered)
 free_dominance_info (CDI_DOMINATORS);
@@ -736,6 +741,9 @@ public:
 
  private:
   bitmap m_regions;
+
+ protected:
+  virtual bool may_peel_loop_headers_p (void) { return false; }
 }; // class pass_dominator_oacc_kernels
 
 } // anon namespace
-- 
1.9.1

[gomp4, committed] Remove omp-low.h include

2015-11-08 Thread Tom de Vries


Hi,

this patch removes a superfluous include in tree-ssa-dom.c.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Remove omp-low.h include

2015-11-08  Tom de Vries  

	* tree-ssa-dom.c: Remove omp-low.h include.
---
 gcc/tree-ssa-dom.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index a2abb8e..c6e5744 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -44,7 +44,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dom.h"
 #include "gimplify.h"
 #include "tree-cfgcleanup.h"
-#include "omp-low.h"
 
 /* This file implements optimizations on the dominator tree.  */
 
-- 
1.9.1

[gomp4, committed] Remove superfluous pass_expand_omp_ssa

2015-11-08 Thread Tom de Vries


Hi,

This patch removes a superfluous pass_expand_omp_ssa. It used to trigger 
if the kernels pass group didn't run. But now that the kernels region is 
split off at the first omp-expand, that isn't necessary anymore.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Remove superfluous pass_expand_omp_ssa

2015-11-08  Tom de Vries  

	* passes.def: Remove superfluous pass_expand_omp_ssa.
---
 gcc/passes.def | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 2420b3b..a7fd9a7 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -122,7 +122,6 @@ along with GCC; see the file COPYING3.  If not see
 	  late.  */
 	  NEXT_PASS (pass_split_functions);
   POP_INSERT_PASSES ()
-  NEXT_PASS (pass_expand_omp_ssa);
   NEXT_PASS (pass_release_ssa_names);
   NEXT_PASS (pass_rebuild_cgraph_edges);
   NEXT_PASS (pass_inline_parameters);
-- 
1.9.1

[PATCH] PR fortran/68053 -- Reduce initialization expression to constant value

2015-11-08 Thread Steve Kargl

The attached patch has been built and regression tested
on i386-*-freebsd and x86_64-*-freebsd.  If an array
index in an initialization expression is an array element
from an array named constant, the array index needs to be
reduced.  This patch causes the reduction to occur.
OK to commit?

2015-11-08  Steven G. Kargl

PR fortran/68053
* decl.c (add_init_expr_to_sym):  Try to reduce initialization
expression before testing for a constant value.

2015-11-08  Steven G. Kargl

PR fortran/68053
* gfortran.dg/pr68053.f90: New test.
-- 
Steve

Re: [PATCH] PR fortran/68053 -- Reduce initialization expression to constant value

2015-11-08 Thread Steve Kargl

On Sun, Nov 08, 2015 at 02:35:58PM -0800, Steve Kargl wrote:
> The attached patch has been built and regression tested
> on i386-*-freebsd and x86_64-*-freebsd.  If an array
> index in an initialization expression is an array element
> from an array named constant, the array index needs to be
> reduced.  This patch causes the reduction to occur.
> OK to commit?
> 
> 2015-11-08  Steven G. Kargl
> 
>   PR fortran/68053
>   * decl.c (add_init_expr_to_sym):  Try to reduce initialization
>   expression before testing for a constant value.
> 
> 2015-11-08  Steven G. Kargl
> 
>   PR fortran/68053
>   * gfortran.dg/pr68053.f90: New test.

Now with the patch attached!

-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 229970)
+++ gcc/fortran/decl.c	(working copy)
@@ -1529,26 +1529,34 @@ add_init_expr_to_sym (const char *name, 
 	  for (dim = 0; dim < sym->as->rank; ++dim)
 	{
 	  int k;
-	  gfc_expr* lower;
-	  gfc_expr* e;
+	  gfc_expr *e, *lower;
 
 	  lower = sym->as->lower[dim];
-	  if (lower->expr_type != EXPR_CONSTANT)
+
+	  /* If the lower bound is an array element from another 
+		 parameterized array, then it is marked with EXPR_VARIABLE and
+		 is an initialization expression.  Try to reduce it.  */
+	  if (lower->expr_type == EXPR_VARIABLE)
+		gfc_reduce_init_expr (lower);
+
+	  if (lower->expr_type == EXPR_CONSTANT)
+		{
+		  /* All dimensions must be without upper bound.  */
+		  gcc_assert (!sym->as->upper[dim]);
+
+		  k = lower->ts.kind;
+		  e = gfc_get_constant_expr (BT_INTEGER, k, &sym->declared_at);
+		  mpz_add (e->value.integer, lower->value.integer,
+			   init->shape[dim]);
+		  mpz_sub_ui (e->value.integer, e->value.integer, 1);
+		  sym->as->upper[dim] = e;
+		}
+	  else
 		{
 		  gfc_error ("Non-constant lower bound in implied-shape"
 			 " declaration at %L", &lower->where);
 		  return false;
 		}
-
-	  /* All dimensions must be without upper bound.  */
-	  gcc_assert (!sym->as->upper[dim]);
-
-	  k = lower->ts.kind;
-	  e = gfc_get_constant_expr (BT_INTEGER, k, &sym->declared_at);
-	  mpz_add (e->value.integer,
-		   lower->value.integer, init->shape[dim]);
-	  mpz_sub_ui (e->value.integer, e->value.integer, 1);
-	  sym->as->upper[dim] = e;
 	}
 
 	  sym->as->type = AS_EXPLICIT;
Index: gcc/testsuite/gfortran.dg/pr68053.f90
===
--- gcc/testsuite/gfortran.dg/pr68053.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/pr68053.f90	(working copy)
@@ -0,0 +1,10 @@
+! { dg-do run }
+! PR fortran/68053
+! Original code contributed by Gerhard Steinmetz
+! 
+program p
+   integer, parameter :: n(3) = [1,2,3]
+   integer, parameter :: x(1) = 7
+   integer, parameter :: z(n(2):*) = x
+   if (lbound(z,1) /= 2) call abort
+end
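
The bound arithmetic in the patch above can be followed with a tiny
stand-alone sketch.  This is not gfortran code: implied_shape_upper is a
hypothetical helper that only mirrors the mpz_add / mpz_sub_ui steps,
assuming the lower bound has already been reduced to a constant.

```cpp
// Hypothetical helper, not a gfortran function: given the constant lower
// bound of an implied-shape dimension and the extent taken from the
// initializer's shape, return the upper bound (lower + extent - 1).
static long
implied_shape_upper (long lower, long extent)
{
  return lower + extent - 1;
}
```

For the test case, z(n(2):*) = x has a lower bound of n(2) = 2 and an
extent of 1, so the sketch gives bounds 2:2, which agrees with the
lbound check in pr68053.f90.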

Re: Add null identifiers to genmatch

2015-11-08 Thread Jeff Law


On 11/07/2015 07:31 AM, Pedro Alves wrote:

Hi Richard,

Passerby comment below.

On 11/07/2015 01:21 PM, Richard Sandiford wrote:

-/* Lookup the identifier ID.  */
+/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */

  id_base *
-get_operator (const char *id)
+get_operator (const char *id, bool allow_null = false)
  {
+  if (allow_null && strcmp (id, "null") == 0)
+return null_id;
+
id_base tem (id_base::CODE, id);


Boolean params are best avoided if possible, IMO.  In this case,
it seems this could instead be a new wrapper function, like:

This hasn't been something we've required for GCC.  I've come across 
this recommendation a few times over the last several months as I 
continue to look at refactoring and best practices for codebases such as 
GCC.


By encoding the boolean in the function's signature, it (IMHO) does make 
the code a bit easier to read, primarily because you don't have to go 
look up the tense of the boolean.  The problem is when the boolean is 
telling us some property of an argument, but there's more than one argument 
and other similar situations.


I wonder if the real benefit is in the refactoring necessary to do 
things in this way without a ton of code duplication.
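
To make the trade-off concrete, here is a minimal sketch of the wrapper
style.  It is only an illustration: id_base, null_id and plus_id are
simplified stand-ins for the genmatch types, and the wrapper name
get_operator_or_null is an assumption, not something taken from the
patch or from Pedro's mail.

```cpp
#include <cstring>

// Hypothetical stand-ins for genmatch's id_base and null_id.
struct id_base { const char *name; };

static id_base null_id = { "null" };
static id_base plus_id = { "plus" };

// Plain lookup: never accepts "null".  Returns nullptr for unknown names.
static id_base *
get_operator (const char *id)
{
  if (std::strcmp (id, plus_id.name) == 0)
    return &plus_id;
  return nullptr;
}

// Wrapper (hypothetical name) that puts the "allow null" choice into the
// function name instead of a bool argument at every call site.
static id_base *
get_operator_or_null (const char *id)
{
  if (std::strcmp (id, "null") == 0)
    return &null_id;
  return get_operator (id);
}
```

A call site then reads get_operator_or_null (id) rather than
get_operator (id, true), so the reader does not need to check which way
the flag goes.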


Jeff

Re: [PATCH 1/7] New obstack_next_free is not an lvalue

2015-11-08 Thread Jeff Law


On 11/07/2015 01:07 AM, Alan Modra wrote:

New obstack.h casts obstack_next_free to (void *), resulting in it
being a non-lvalue, and warnings on pointer arithmetic.
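
A rough illustration of the problem, using a made-up miniature rather
than the real obstack.h: mini_obstack, mini_next_free, mini_shrink and
bytes_used are all hypothetical names.  The cast to (void *) yields a
value that can neither be assigned to nor used directly in pointer
arithmetic, so callers cast it back or adjust the object another way.

```cpp
#include <cstddef>

// Hypothetical miniature of an obstack; only the field needed here.
struct mini_obstack { char *next_free; };

// Modelled on the description above: the cast makes the result a
// non-lvalue of type void *.
#define mini_next_free(h) ((void *) (h)->next_free)

// Shrink by N bytes through the object itself, since
// "mini_next_free (h) -= n" would no longer compile.
static void
mini_shrink (mini_obstack *h, std::size_t n)
{
  h->next_free -= n;
}

// Pointer arithmetic needs the value cast back to a character pointer.
static std::size_t
bytes_used (mini_obstack *h, const char *base)
{
  return (std::size_t) ((char *) mini_next_free (h) - base);
}
```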

gcc/
* gensupport.c (add_mnemonic_string): Make len param a size_t.
(gen_mnemonic_setattr): Make "size" var a size_t.  Use
obstack_blank_fast to shrink obstack.  Cast obstack_next_free
return value.
gcc/objc/
* objc-encoding.c (encode_aggregate_within): Cast obstack_next_free
return value.

Richard S. already approved gensupport.  I'll go ahead and approve the 
objc-encoding changes.


Please install.

Thanks,
jeff

Re: [PATCH 2/7] Correct libvtv obstack use

2015-11-08 Thread Jeff Law


On 11/07/2015 01:08 AM, Alan Modra wrote:

Fixes a compile error with both old and new obstacks due to
obstack_chunk_free having the wrong signature.  Also, setting chunk
size and alignment before obstack_init is pointless since they are
overwritten.

* vtv_malloc.cc (obstack_chunk_free): Correct param type.
(__vtv_malloc_init): Use obstack_specify_allocation.

OK
jeff

Re: [PATCH 3/7] Update libsanitizer obstack interceptors

2015-11-08 Thread Jeff Law


On 11/07/2015 01:08 AM, Alan Modra wrote:

New obstack uses sensible types, size_t instead of int for length
params.  Since libsanitizer does not use prototypes from obstack.h to
call the real functions, it's necessary to update the libsanitizer
function declarations emitted by the INTERCEPTOR macro.

As per the comment added to configure.ac, it would be nice if we could
update to a more recent autoconf, but what I have should do given the
limited target support for libsanitizer.

I'll be pushing this one upstream too, when I figure out something
reasonable for cmake.

* sanitizer_common/sanitizer_common_interceptors.inc: Update size
params for _obstack_begin_1, _obstack_begin, _obstack_newchunk
interceptors.
* configure.ac: Substitute OBSTACK_DEFS.
* asan/Makefile.am: Add OBSTACK_DEFS to DEFS.
* tsan/Makefile.am: Likewise.
* configure: Regenerate.
* Makefile.in: Regenerate.
* asan/Makefile.in: Regenerate.
* interception/Makefile.in: Regenerate.
* libbacktrace/Makefile.in: Regenerate.
* lsan/Makefile.in: Regenerate.
* sanitizer_common/Makefile.in: Regenerate.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.in: Regenerate.

I'm going to consider this a portability fix, which means it can go in 
now rather than wait for an upstream merge.


OK.  But please do continue to coordinate with upstream so that we don't 
have to carry this as a GCC specific change.


jeff

Re: [PATCH 4/7] Copy gnulib obstack files

2015-11-08 Thread Jeff Law


On 11/07/2015 01:09 AM, Alan Modra wrote:

This copies obstack.[ch] from gnulib, and updates the docs.  The next
patch should be applied if someone repeats the import at a later date.

include/
PR gdb/17133
* obstack.h: Import current gnulib file.
libiberty/
PR gdb/17133
* obstack.c: Import current gnulib file.
* obstacks.texi: Updated doc, from glibc's manual/memory.texi.

I didn't really walk through this patch since it's coming directly from 
gnulib. OK for the trunk.


jeff

Re: [PATCH 5/7] Modify obstack.[hc] to avoid having to include other gnulib files

2015-11-08 Thread Jeff Law


On 11/07/2015 01:10 AM, Alan Modra wrote:

Using the standard gnulib obstack source requires importing quite a
lot of other files from gnulib, and requires build changes.

If one did want to use gnulib obstack directly, then it would need to
go in a sub-directory and after ".../gnulib-tool --import obstack"
we'd have the following:

./lib:
alignof.h   gettext.h    obstack.h    stdlib.in.h     unistd.in.h
exitfail.c  Makefile.am  stddef.in.h  sys_types.in.h
exitfail.h  obstack.c    stdint.in.h  unistd.c

./m4:
00gnulib.m4         gnulib-comp.m4   obstack.m4   stdint.m4       wchar_t.m4
absolute-header.m4  gnulib-tool.m4   off_t.m4     stdlib_h.m4
extern-inline.m4    include_next.m4  onceonly.m4  sys_types_h.m4
gnulib-cache.m4     longlong.m4      ssize_t.m4   unistd_h.m4
gnulib-common.m4    multiarch.m4     stddef_h.m4  warn-on-use.m4

./snippet:
arg-nonnull.h  c++defs.h  _Noreturn.h  warn-on-use.h

include/
PR gdb/17133
* obstack.h (__attribute_pure__): Expand _GL_ATTRIBUTE_PURE.
libiberty/
PR gdb/17133
* obstack.c (__alignof__): Expand alignof_type from alignof.h.
(obstack_exit_failure): Don't use exitfail.h.
(_): Include libintl.h when HAVE_LIBINTL_H and nls enabled.
Provide default.  Don't include gettext.h.
(_Noreturn): Define.
* obstacks.texi: Adjust node references to external libc info files.

Ewww.  I suspect we'll probably want to go with direct use of gnulib 
obstack at some point, but this hack-ish patch is OK for now.


jeff

Re: [PATCH 6/7] Silence obstack.c -Wc++compat warning

2015-11-08 Thread Jeff Law


On 11/07/2015 01:11 AM, Alan Modra wrote:

Fixes
warning: request for implicit conversion from ‘void *’ to ‘struct 
_obstack_chunk *’ not permitted in C++ [-Wc++-compat]

I moved the assignment to h->chunk to fix an overlong line, then
decided it would be better after the alloc failure check just to do
things the same way as in _obstack_newchunk.

* obstack.c (_obstack_newchunk): Silence -Wc++compat warning.
(_obstack_begin_worker): Likewise.  Move assignment to h->chunk
after alloc failure check.

OK.  Please consider feeding this to gnulib since it looks like 
something they may want to fix.


jeff

Re: [PATCH 7/7] Configury changes for obstack optimization

2015-11-08 Thread Jeff Law


On 11/07/2015 01:11 AM, Alan Modra wrote:

Provides defines used to determine whether glibc obstacks are
compatible.  Generally speaking, 32-bit targets won't need to use
obstack.o from libiberty if glibc is used, while 64-bit targets will,
until glibc gets the new obstack code.

* configure.ac: Check size of size_t.
* configure: Regenerate.

OK.
jeff

State of support for the ISO C++ Transactional Memory TS and remaining work

2015-11-08 Thread Torvald Riegel

Hi,

I'd like to summarize the current state of support for the TM TS, and
outline the current plan for the work that remains to complete the
support.

I'm aware we're at the end of stage 1, but I'm confident we can still
finish this work and hope to include it in GCC 6 because:
(1) most of the support is already in GCC, and we have a big head start
in the space of TM so it would be unfortunate to waste that by not
delivering support for the TM TS,
(2) this is a TS and support for it is considered experimental,
(3) most of the affected code is in libitm or the compiler's TM passes,
which has to be enabled explicitly by the user.

Currently, we have complete support for the syntax and all necessary
instrumentation except the exception handling bits listed below.  libitm
has a good set of STM and HTM-based algorithms.


What is missing on the compiler side is essentially a change of how we
support atomic_noexcept and atomic_cancel, in particular exception
handling.  Instead of just using a finally block as done currently, the
compiler needs to build a catch clause so that it can actively intercept
exceptions that escape an atomic_noexcept or atomic_cancel.  For
atomic_noexcept, the compiler needs to include a call to abort() in the
catch clause.
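
As a rough picture of the atomic_noexcept case, the following plain C++
sketch shows the shape of the handling described above.  It is not what
the compiler emits and involves no transactional instrumentation;
run_like_atomic_noexcept is a hypothetical helper used only to show a
catch clause that intercepts an escaping exception and calls abort().

```cpp
#include <cstdlib>

// Hypothetical helper: run BODY and, if any exception tries to escape,
// intercept it in a catch clause and terminate via abort ().
template <typename Body>
void
run_like_atomic_noexcept (Body body)
{
  try
    {
      body ();
    }
  catch (...)
    {
      // An exception escaped the block: this is a fatal error.
      std::abort ();
    }
}
```

A body that completes normally just returns; a body that throws never
reaches the caller's handlers, which is the difference from the current
finally-based approach.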


For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch
clause, and use NULL as exception argument.  This can then be used by
libitm to look at the currently being handled exception and (a) check
whether the type supports transaction cancellation as specified by the TS
and (b) pick out the allocations that belong to this exception and roll
back everything else before rethrowing this exception.

For (a), it's probably best to place this check into libstdc++
(specifically, libsupc++ I suppose) because support for transaction
cancellation is a property that library parts of the standard (or the
TS) require, and that has to match the implementation in libstdc++.
Attached is a patch by Jason that implements this check.  This adds one
symbol, which should be okay we hope.

For (b), our plan is to track the additional allocations that happen
during construction of the exception types that support
cancellation (eg, creating the what() string for logic_error).  There
are several ways to do that, one of them being that we create custom
transactional clones of those constructors that tell libitm that either
such a constructor is currently running or explicitly list the
allocations that have been made by the constructor; eventually, we would
always (copy) construct into memory returned by cxa_allocate_exception,
which then makes available the necessary undo information when such an
exception is handled in libitm.


The other big piece of missing support is making sure that the functions
that are specified in the TS as transaction_safe are indeed that.  I
believe we do not need to actually add such annotations to any libstdc++
functions that are already transaction-safe and completely defined in
headers -- those functions are implicitly transaction-safe, and we can
thus let the compiler instrument them at the point of use inside of a
transaction.

If a supposedly transaction-safe function is not defined in a header,
we'd need a transaction_safe annotation at the declaration.  Jason has
implemented the TM TS feature test macro, so we can add the
annotation only if the user has enabled support for the TM TS in the
respective compilation process.
We also need to ensure that there is a transactional clone of the function.
This will add symbols to libstdc++, but these all have their own special
prefix in the mangled name.  I'd like to get feedback on how to best
trigger the instrumentation and make it a part of a libstdc++ build.
(If that would show to be too problematic, we could still fall back to
writing transactional clones manually.)
For the clones of the constructors of the types that support
cancellation, I suppose manually written clones might be easier than
automatic instrumentation.

I've not yet created tests for the full list of functions specified as
transaction-safe in the TS, but my understanding is that this list was
created after someone from the ISO C++ TM study group looked at
libstdc++'s implementation and investigated which functions might be
feasible to be declared transaction-safe in it.

I'm looking forward to your feedback.

Thanks,

Torvald
// -*- C++ -*- GNU C++ atomic_cancel support.
// Copyright (C) 2015 Free Software Foundation, Inc.
//
// This file is part of GCC.
//
// GCC is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 3, or (at your option)
// any later version.
//
// GCC is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
//

[PATCH v2] libitm: Support sized delete.

2015-11-08 Thread Torvald Riegel

This patch supports the sized variants of operator delete.
Some changes compared to v1.
Tested on x86_64-linux.
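
For readers skimming the diff, here is a simplified stand-alone model of
the idea.  It is a sketch under assumptions, not libitm code: alloc_log
and finish are hypothetical names, and a std::map with one entry per
address stands in for the real log structure.  Each entry carries either
a plain or a sized free function, and the outermost commit or abort
picks whichever is set.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Simplified model of one log entry: a plain or a sized free function.
struct alloc_action
{
  void (*free_fn) (void *) = nullptr;
  void (*free_fn_sz) (void *, std::size_t) = nullptr;
  std::size_t sz = 0;
  bool allocated = false;
};

// Hypothetical log; the map keeps a single entry per address.
struct alloc_log
{
  std::map<std::uintptr_t, alloc_action> actions;

  // Memory allocated inside the transaction; freed only on abort.
  void record_allocation (void *ptr, void (*free_fn) (void *))
  {
    alloc_action &a = actions[(std::uintptr_t) ptr];
    a.free_fn = free_fn;
    a.free_fn_sz = nullptr;
    a.allocated = true;
  }

  // Sized deallocation requested inside the transaction; the free is
  // deferred and only performed on commit.
  void forget_allocation (void *ptr, std::size_t sz,
                          void (*free_fn_sz) (void *, std::size_t))
  {
    alloc_action &a = actions[(std::uintptr_t) ptr];
    a.free_fn = nullptr;
    a.free_fn_sz = free_fn_sz;
    a.sz = sz;
    a.allocated = false;
  }

  // Outermost commit (revert_p == false) or abort (revert_p == true).
  void finish (bool revert_p)
  {
    for (auto &kv : actions)
      {
        alloc_action &a = kv.second;
        if (revert_p != a.allocated)
          continue;
        void *ptr = (void *) kv.first;
        if (a.free_fn_sz != nullptr)
          a.free_fn_sz (ptr, a.sz);
        else
          a.free_fn (ptr);
      }
    actions.clear ();
  }
};
```

On commit only the deferred deallocations run; on abort only the
allocations made inside the transaction are released, matching the
"revert_p == a->allocated" test in commit_allocations_1.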
commit df00a283f2e37bd3c69f37783fa81dde7ccf1f94
Author: Torvald Riegel 
Date:   Thu Oct 29 18:52:20 2015 +0100

Support sized delete.

This adds transactional clones of the sized version of operator delete.

diff --git a/libitm/alloc.cc b/libitm/alloc.cc
index bb292da..7b8786c 100644
--- a/libitm/alloc.cc
+++ b/libitm/alloc.cc
@@ -29,26 +29,38 @@ namespace GTM HIDDEN {
 void
 gtm_thread::record_allocation (void *ptr, void (*free_fn)(void *))
 {
-  uintptr_t iptr = (uintptr_t) ptr;
-
-  gtm_alloc_action *a = this->alloc_actions.find(iptr);
-  if (a == 0)
-a = this->alloc_actions.insert(iptr);
+  // We do not deallocate before outermost commit, so we should never have
+  // an existing log entry for a new allocation.
+  gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr);
 
   a->free_fn = free_fn;
+  a->free_fn_sz = 0;
   a->allocated = true;
 }
 
 void
 gtm_thread::forget_allocation (void *ptr, void (*free_fn)(void *))
 {
-  uintptr_t iptr = (uintptr_t) ptr;
-
-  gtm_alloc_action *a = this->alloc_actions.find(iptr);
-  if (a == 0)
-a = this->alloc_actions.insert(iptr);
-
+  // We do not deallocate before outermost commit, so we should never have
+  // an existing log entry for a deallocation at the same address.  We may
+  // have an existing entry for a matching allocation, but this is handled
+  // correctly because both are complementary in that only one of these will
+  // cause an action at commit or abort.
+  gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr);
   a->free_fn = free_fn;
+  a->free_fn_sz = 0;
+  a->allocated = false;
+}
+
+void
+gtm_thread::forget_allocation (void *ptr, size_t sz,
+			   void (*free_fn_sz)(void *, size_t))
+{
+  // Same as forget_allocation but with a size.
+  gtm_alloc_action *a = this->alloc_actions.insert((uintptr_t) ptr);
+  a->free_fn = 0;
+  a->free_fn_sz = free_fn_sz;
+  a->sz = sz;
   a->allocated = false;
 }
 
@@ -67,31 +79,27 @@ commit_allocations_2 (uintptr_t key, gtm_alloc_action *a, void *data)
 
   if (cb_data->revert_p)
 {
-  // Roll back nested allocations.
+  // Roll back nested allocations, discard deallocations.
   if (a->allocated)
-	a->free_fn (ptr);
+	{
+	  if (a->free_fn_sz != 0)
+	a->free_fn_sz (ptr, a->sz);
+	  else
+	a->free_fn (ptr);
+	}
 }
   else
 {
-  if (a->allocated)
-	{
-	  // Add nested allocations to parent transaction.
-	  gtm_alloc_action* a_parent = cb_data->parent->insert(key);
-	  *a_parent = *a;
-	}
-  else
-	{
-	  // ??? We could eliminate a parent allocation that matches this
-	  // memory release, if we had support for removing all accesses
-	  // to this allocation from the transaction's undo and redo logs
-	  // (otherwise, the parent transaction's undo or redo might write to
-	  // data that is already shared again because of calling free()).
-	  // We don't have this support currently, and the benefit of this
-	  // optimization is unknown, so just add it to the parent.
-	  gtm_alloc_action* a_parent;
-	  a_parent = cb_data->parent->insert(key);
-	  *a_parent = *a;
-	}
+  // Add allocations and deallocations to parent.
+  // ??? We could eliminate a (parent) allocation that matches this
+  // deallocation, if we had support for removing all accesses
+  // to this allocation from the transaction's undo and redo logs
+  // (otherwise, the parent transaction's undo or redo might write to
+  // data that is already shared again because of calling free()).
+  // We don't have this support currently, and the benefit of this
+  // optimization is unknown, so just add it to the parent.
+  gtm_alloc_action* a_parent = cb_data->parent->insert(key);
+  *a_parent = *a;
 }
 }
 
@@ -99,10 +107,15 @@ static void
 commit_allocations_1 (uintptr_t key, gtm_alloc_action *a, void *cb_data)
 {
   void *ptr = (void *)key;
-  uintptr_t revert_p = (uintptr_t) cb_data;
+  bool revert_p = (bool) (uintptr_t) cb_data;
 
-  if (a->allocated == revert_p)
-a->free_fn (ptr);
+  if (revert_p == a->allocated)
+{
+  if (a->free_fn_sz != 0)
+	a->free_fn_sz (ptr, a->sz);
+  else
+	a->free_fn (ptr);
+}
 }
 
 /* Permanently commit allocated memory during transaction.
diff --git a/libitm/alloc_cpp.cc b/libitm/alloc_cpp.cc
index 8514618..13185a7 100644
--- a/libitm/alloc_cpp.cc
+++ b/libitm/alloc_cpp.cc
@@ -35,41 +35,50 @@ using namespace GTM;
 
 #define _ZnwX			S(_Znw,MANGLE_SIZE_T)
 #define _ZnaX			S(_Zna,MANGLE_SIZE_T)
+#define _ZdlPvX			S(_ZdlPv,MANGLE_SIZE_T)
 #define _ZnwXRKSt9nothrow_t	S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t)
 #define _ZnaXRKSt9nothrow_t	S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t)
+#define _ZdlPvXRKSt9nothrow_t	S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)
 
 #define _ZGTtnwX		S(_ZGTtnw,MANGLE_SIZE_T)
 #define _ZGTtnaX		S(_ZGTtna,MANGLE_SIZE_T)
+#define _ZGTtdlPvX		S(_ZGTtdlPv,MANGLE_

[PATCH] libitm: Support __cxa_free_exception and fix exception handling.

2015-11-08 Thread Torvald Riegel

See the added overview comments in eh_cpp.cc.
This is still lacking updated docs in the ABI spec, which I can add
later.  We don't need to change __cxa_tm_cleanup in libsupc++, but don't
use the first two arguments of it anymore, which we may want to document
too.

Thoughts?
commit 0a67dc5a13fd17a24fc667a251d000a73cd5159e
Author: Torvald Riegel 
Date:   Tue Nov 3 15:38:22 2015 +0100

Support __cxa_free_exception and fix exception handling.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index c3ed11b..86f7b39 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -132,6 +132,8 @@ GTM::gtm_thread::gtm_thread ()
   number_of_threads_changed(number_of_threads - 1, number_of_threads);
   serial_lock.write_unlock ();
 
+  init_cpp_exceptions ();
+
   if (pthread_once(&thr_release_once, thread_exit_init))
 GTM_fatal("Initializing thread release TLS key failed.");
   // Any non-null value is sufficient to trigger destruction of this
@@ -383,6 +385,11 @@ GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 #endif
 }
 
+  // Log the number of uncaught exceptions if we might have to roll back this
+  // state.
+  if (tx->cxa_uncaught_count_ptr != 0)
+tx->cxa_uncaught_count = *tx->cxa_uncaught_count_ptr;
+
   // Run dispatch-specific restart code. Retry until we succeed.
   GTM::gtm_restart_reason rr;
   while ((rr = disp->begin_or_restart()) != NO_RESTART)
@@ -411,7 +418,7 @@ GTM::gtm_transaction_cp::save(gtm_thread* tx)
   id = tx->id;
   prop = tx->prop;
   cxa_catch_count = tx->cxa_catch_count;
-  cxa_unthrown = tx->cxa_unthrown;
+  cxa_uncaught_count = tx->cxa_uncaught_count;
   disp = abi_disp();
   nesting = tx->nesting;
 }
@@ -583,7 +590,6 @@ GTM::gtm_thread::trycommit ()
   undolog.commit ();
   // Reset further transaction state.
   cxa_catch_count = 0;
-  cxa_unthrown = NULL;
   restart_total = 0;
 
   // Ensure privatization safety, if necessary.
diff --git a/libitm/eh_cpp.cc b/libitm/eh_cpp.cc
index a86dbf1..e28c057 100644
--- a/libitm/eh_cpp.cc
+++ b/libitm/eh_cpp.cc
@@ -26,6 +26,50 @@
 
 using namespace GTM;
 
+/* Exceptions can exist in three phases: (1) after having been allocated by
+   __cxa_allocate_exception but before being handed off to __cxa_throw,
+   (2) when they are in flight, so between __cxa_throw and __cxa_begin_catch,
+   and (3) when they are being handled (between __cxa_begin_catch and
+   __cxa_end_catch).
+
+   We can get aborts in all three phases, for example in (1) during
+   construction of the exception object, or in (2) during destructors called
+   while unwinding the stack.  The transaction that created an exception
+   object can commit in phase (2) but not in phases (1) and (3) because both
+   throw expressions and catch clauses are properly nested wrt transactions.
+
+   We handle phase (1) by dealing with exception objects similar to how we
+   deal with other (de)allocations, which also ensures that we can have more
+   than one exception object allocated at the same time (e.g., if the
+   throw expression itself throws an exception and thus calls
+   __cxa_allocate_exception).  However, on the call to __cxa_begin_catch
+   we hand off the exception to the special handling of phase (3) and
+   remove the undo log entry of the allocation.  Note that if the allocation
+   happened outside of this transaction, we do not need to do anything.
+
+   When an exception reaches phase (2) due to a call to __cxa_throw, the count
+   of uncaught exceptions is incremented.  We roll back this effect by saving
+   and restoring this number in the structure returned from __cxa_get_globals.
+   This also takes care of increments of this count when rethrowing an
+   exception.
+
+   For phase (3), we keep track of the number of times __cxa_begin_catch
+   has been called without a matching call to __cxa_end_catch.  This count
+   is then used by __cxa_tm_cleanup to roll back the exception handling state
+   by calling __cxa_end_catch for the exceptions that have not been finished
+   yet (without running destructors though because we roll back the memory
+   anyway).
+   Once an exception that was allocated in this transaction enters phase (3),
+   it does not need to be deallocated on abort anymore because the calls to
+   __cxa_end_catch will take care of that.
+
+   We require all code executed by the transaction to be transaction_safe (or
+   transaction_pure, or to have wrappers) if the transaction is to be rolled
+   back.  However, we take care to not require this for transactions that
+   just commit; this way, transactions that enter serial mode and then call
+   uninstrumented code continue to work.
+   */
+
 /* Everything from libstdc++ is weak, to avoid requiring that library
to be linked into plain C applications using libitm.so.  */
 
@@ -33,85 +77,138 @@ using namespace GTM;
 
 extern "C" {
 
+struct __cxa_eh_globals
+{
+  void *	caughtExceptions;
+  unsigned int	uncaughtExceptions;
+};
+
 extern

Re: [PATCH], Add power9 support to GCC, patch #1 (revised)

2015-11-08 Thread Michael Meissner

This is patch #1 that I revised.  I changed -mfusion-toc to -mtoc-fusion.  I
changed the references to ISA 2.08 to 3.0.  I added two new debug switches for
code in future patches that is undergoing development and is not ready to be on
by default.

I have done a bootstrap build on a little endian power8 system and there were
no regressions in this patch.  Is it ok to install in the trunk?

2015-11-08  Michael Meissner  

* config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
ISA 3.0 (power9).
(-mpower9-vector): Likewise.
(-mpower9-dform): Likewise.
(-mpower9-minmax): Likewise.
(-mtoc-fusion): Likewise.
(-mmodulo): Likewise.
(-mfloat128-hardware): Likewise.

* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
mask for ISA 3.0 (power9).
(POWERPC_MASKS): Add new ISA 3.0 switches.
(power9 cpu): Add power9 cpu.

* config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
power9.
(ASM_CPU_SPEC): Likewise.
(EXTRA_SPECS): Likewise.

* config/rs6000/rs6000-opts.h (enum processor_type): Add
PROCESSOR_POWER9.

* config/rs6000/rs6000.c (power9_cost): Initial cost setup for
power9.
(rs6000_debug_reg_global): Add support for power9 fusion.
(rs6000_setup_reg_addr_masks): Cache mode size.
(rs6000_option_override_internal): Until real power9 tuning is
added, use -mtune=power8 for -mcpu=power9.
(rs6000_setup_reg_addr_masks): Do not allow pre-increment,
pre-decrement, or pre-modify on SFmode/DFmode if we allow the use
of Altivec registers.
(rs6000_option_override_internal): Add support for ISA 3.0
switches.
(rs6000_loop_align): Add support for power9 cpu.
(rs6000_file_start): Likewise.
(rs6000_adjust_cost): Likewise.
(rs6000_issue_rate): Likewise.
(insn_must_be_first_in_group): Likewise.
(insn_must_be_last_in_group): Likewise.
(force_new_group): Likewise.
(rs6000_register_move_cost): Likewise.
(rs6000_opt_masks): Likewise.

* config/rs6000/rs6000.md (cpu attribute): Add power9.
* config/rs6000/rs6000-tables.opt: Regenerate.

* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
_ARCH_PWR9 if power9 support is available.

* config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
* config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.

* configure.ac: Determine if the assembler supports the ISA 3.0
instructions.
* config.in (HAVE_AS_POWER9): Likewise.
* configure: Regenerate.

* doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
switches.
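
As a rough illustration of the _ARCH_PWR9 macro mentioned in the rs6000-c.c entry above, user code could test for it in the same way as the existing _ARCH_PWR* macros.  This is a sketch only; built_for_power9 is a hypothetical helper name, and the macro is simply undefined on compilers or targets without the support.

```c
/* Return 1 if this translation unit was compiled with power9 support
   enabled (for example with -mcpu=power9), 0 otherwise.  */
static int
built_for_power9 (void)
{
#ifdef _ARCH_PWR9
  return 1;
#else
  return 0;
#endif
}
```
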


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.opt
===
--- gcc/config/rs6000/rs6000.opt(revision 229970)
+++ gcc/config/rs6000/rs6000.opt(working copy)
@@ -601,6 +601,34 @@ moptimize-swaps
 Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
+mpower9-fusion
+Target Report Mask(P9_FUSION) Var(rs6000_isa_flags)
+Fuse certain operations together for better performance on power9.
+
+mpower9-vector
+Target Report Mask(P9_VECTOR) Var(rs6000_isa_flags)
+Use/do not use vector and scalar instructions added in ISA 3.0.
+
+mpower9-dform
+Target Undocumented Mask(P9_DFORM) Var(rs6000_isa_flags)
+Use/do not use vector and scalar instructions added in ISA 3.0.
+
+mpower9-minmax
+Target Undocumented Mask(P9_MINMAX) Var(rs6000_isa_flags)
+Use/do not use the new min/max instructions defined in ISA 3.0.
+
+mtoc-fusion
+Target Undocumented Mask(TOC_FUSION) Var(rs6000_isa_flags)
+Fuse medium/large code model toc references with the memory instruction.
+
+mmodulo
+Target Report Mask(MODULO) Var(rs6000_isa_flags)
+Generate the integer modulo instructions.
+
 mfloat128
 Target Report Mask(FLOAT128) Var(rs6000_isa_flags)
 Enable/disable IEEE 128-bit floating point via the __float128 keyword.
+
+mfloat128-hardware
+Target Report Mask(FLOAT128_HW) Var(rs6000_isa_flags)
+Enable/disable using IEEE 128-bit floating point instructions.
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 229970)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -60,6 +60,15 @@
 | OPTION_MASK_QUAD_MEMORY_ATOMIC   \
 | OPTION_MASK_UPPER_REGS_SF)
 
+/* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
+   P9_DFORM or P9_MINMAX until they are fully debugged.  */
+#define ISA_3_0_MASKS_SERVER   (ISA_2_7_MASKS_SERVER

Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)

2015-11-08 Thread Michael Meissner

This is patch #2.  It adds support for the new modulus instructions that are
being added in ISA 3.0 (power9).

I have built this patch (along with patches #3 and #4) with a bootstrap build
on a power8 little endian system.  There were no regressions in the test
suite.  Is this patch ok to install in the trunk once patch #1 has been
installed?
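
For readers unfamiliar with what the new instructions replace: without a hardware modulus instruction, a remainder is typically computed with a divide, a multiply and a subtract, which is why the tests check that no multiply or divide appears.  The sketch below shows the two forms in C; the function names are hypothetical and the instruction selection is of course up to the compiler.

```c
/* Remainder the multi-instruction way: divide, multiply, subtract.  */
static long
mod_via_div (long a, long b)
{
  return a - (a / b) * b;
}

/* Remainder as written in the source; with modulus instructions available
   this is a candidate for a single instruction.  */
static long
mod_direct (long a, long b)
{
  return a % b;
}
```

Both functions return the same value for any b != 0 (excluding the overflowing LONG_MIN / -1 case), since C division truncates toward zero.
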

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
modulus instructions if we have hardware support.

* config/rs6000/rs6000.md (mod3): Add support for ISA 3.0
modulus instructions.
(umod3): Likewise.
(divmod peephole): Likewise.
(udivmod peephole): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* lib/target-supports.exp (check_p9vector_hw_available): Add
checks for power9 availability.
(check_effective_target_powerpc_p9vector_ok): Likewise.
(check_vect_support_and_set_flags): Likewise.

* gcc.target/powerpc/mod-1.c: New test.
* gcc.target/powerpc/mod-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/testsuite/gcc.target/powerpc/mod-1.c
===
--- gcc/testsuite/gcc.target/powerpc/mod-1.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-1.c(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int ismod (int a, int b) { return a%b; }
+long lsmod (long a, long b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+unsigned long lumod (unsigned long a, unsigned long b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "modsd " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-times "modud " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "mulld "   } } */
+/* { dg-final { scan-assembler-not   "divw "} } */
+/* { dg-final { scan-assembler-not   "divd "} } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
+/* { dg-final { scan-assembler-not   "divdu "   } } */
Index: gcc/testsuite/gcc.target/powerpc/mod-2.c
===
--- gcc/testsuite/gcc.target/powerpc/mod-2.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/mod-2.c(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int ismod (int a, int b) { return a%b; }
+unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
+
+/* { dg-final { scan-assembler-times "modsw " 1 } } */
+/* { dg-final { scan-assembler-times "moduw " 1 } } */
+/* { dg-final { scan-assembler-not   "mullw "   } } */
+/* { dg-final { scan-assembler-not   "divw "} } */
+/* { dg-final { scan-assembler-not   "divwu "   } } */
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 229970)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -1635,6 +1635,30 @@ proc check_p8vector_hw_available { } {
 }]
 }
 
+# Return 1 if the target supports executing power9 vector instructions, 0
+# otherwise.  Cache the result.
+
+proc check_p9vector_hw_available { } {
+return [check_cached_effective_target p9vector_hw_available {
+   # Some simulators are known to not support VSX/power8 instructions.
+   # For now, disable on Darwin
+   if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
+   expr 0
+   } else {
+   set options "-mpower9-vector"
+   check_runtime_nocache p9vector_hw_available {
+   int main()
+   {
+ long e = -1;
+ vector double v = (vector double) { 0.0, 0.0 };
+ asm ("xsxexpdp %0,%1" : "+r" (e) : "wa" (v));
+ return e;
+   }
+   } $options
+   }
+}]
+}
+
 # Return 1 if the target supports executing VSX instructions, 0
 # otherwise.  Cache the result.
 
@@ -3358,6 +3382,31 @@ proc check_effective_target_powerpc_p8ve
 }
 }
 
+# Return 1 if this is a PowerPC target supporting -mpower9-vecto

Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)

2015-11-08 Thread Michael Meissner

This patch adds support for the scalar count trailing zeros instruction that is
being added to ISA 3.0 (power9).

I have built this patch (along with patches #2 and #4) with a bootstrap build
on a power8 little endian system.  There were no regressions in the test
suite.  Is this patch ok to install in the trunk once patch #1 has been
installed?
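
As background, a count of trailing zeros can be derived from a count of leading zeros by first isolating the lowest set bit.  The following sketch shows that identity in C using the GCC builtins; ctz_via_clz is a hypothetical name, and this is meant as an illustration of the kind of multi-instruction sequence a single hardware instruction would replace, not as a copy of the existing expander.

```c
#include <limits.h>

/* Count trailing zeros of a nonzero value using only a leading-zero count.
   x & -x keeps just the lowest set bit; its position from the top is then
   converted into a position from the bottom.  Undefined for x == 0, as is
   __builtin_ctz itself.  */
static int
ctz_via_clz (unsigned int x)
{
  int bits = (int) (sizeof (unsigned int) * CHAR_BIT);
  return (bits - 1) - __builtin_clz (x & -x);
}
```
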

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
count trailing zero instruction if we have hardware support.

* config/rs6000/rs6000.h (TARGET_CTZ): Add support for count
trailing zero instruction in ISA 3.0.
* config/rs6000/rs6000.c (ctz2): Likewise.
(ctz2_h): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* gcc.target/powerpc/ctz-1.c: Add test for count trailing zero
instruction support.
* gcc.target/powerpc/ctz-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 229973)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -31850,6 +31850,9 @@ rs6000_rtx_costs (rtx x, machine_mode mo
   return false;
 
 case CTZ:
+  *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4);
+  return false;
+
 case FFS:
   *total = COSTS_N_INSNS (4);
   return false;
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 229972)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -565,6 +565,7 @@ extern int rs6000_vector_align[];
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
 #define TARGET_FCTIWUZ TARGET_POPCNTD
+#define TARGET_CTZ TARGET_MODULO
 
 #define TARGET_XSCVDPSPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 229973)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -2101,12 +2101,25 @@ (define_expand "ctz2"
  (clobber (reg:GPR CA_REGNO))])]
   ""
 {
+  if (TARGET_CTZ)
+{
+  emit_insn (gen_ctz2_hw (operands[0], operands[1]));
+  DONE;
+}
+
   operands[2] = gen_reg_rtx (mode);
   operands[3] = gen_reg_rtx (mode);
   operands[4] = gen_reg_rtx (mode);
   operands[5] = GEN_INT (GET_MODE_BITSIZE (mode) - 1);
 })
 
+(define_insn "ctz2_hw"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+   (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
+  "TARGET_CTZ"
+  "cnttz %0,%1"
+  [(set_attr "type" "cntlz")])
+
 (define_expand "ffs2"
   [(set (match_dup 2)
(neg:GPR (match_operand:GPR 1 "gpc_reg_operand" "")))
Index: gcc/testsuite/gcc.target/powerpc/ctz-1.c
===
--- gcc/testsuite/gcc.target/powerpc/ctz-1.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-1.c(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+int l_trailing_zero (long a) { return __builtin_ctzl (a); }
+int ll_trailing_zero (long long a) { return __builtin_ctzll (a); }
+
+/* { dg-final { scan-assembler "cnttzw " } } */
+/* { dg-final { scan-assembler "cnttzd " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */
+/* { dg-final { scan-assembler-not "cntlzd " } } */
Index: gcc/testsuite/gcc.target/powerpc/ctz-2.c
===
--- gcc/testsuite/gcc.target/powerpc/ctz-2.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ctz-2.c(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { powerpc*-*-* && ilp32 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+int i_trailing_zero (int a) { return __builtin_ctz (a); }
+
+/* { dg-final { scan-assembler "cnttzw " } } */
+/* { dg-final { scan-assembler-not "cntlzw " } } */

Re: [PATCH], Add power9 support to GCC, patch #4

2015-11-08 Thread Michael Meissner

This patch adds support for the EXTSWSLI instruction that is being added to
PowerPC ISA 3.0 (power9).

I have built this patch (along with patches #2 and #3) with a bootstrap build
on a power8 little endian system.  There were no regressions in the test
suite.  Is this patch ok to install in the trunk once patch #1 has been
installed?
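
For context, the operation in question is a sign extension of a 32-bit word to 64 bits followed by a left shift by a constant, which is what indexing an array of 8-byte elements with an int does.  A small C model of the value being computed is sketched below; sext_shift_left and is_u6bit are hypothetical names, the latter mirroring the 0..63 range of the new u6bit_cint_operand predicate.

```c
#include <stdint.h>

/* Sign-extend a 32-bit word to 64 bits, then shift left by sh (0..63).
   The shift is done on the unsigned type to avoid undefined behavior when
   the value is negative; the final conversion assumes the usual
   two's-complement wraparound, as GCC provides.  */
static int64_t
sext_shift_left (int32_t w, unsigned int sh)
{
  return (int64_t) ((uint64_t) (int64_t) w << sh);
}

/* Range check corresponding to an unsigned 6-bit constant.  */
static int
is_u6bit (long v)
{
  return v >= 0 && v <= 63;
}
```
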

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/predicates.md (u6bit_cint_operand): New
predicate, recognize 0..63.

* config/rs6000/rs6000.c (rs6000_rtx_costs): Adjust the costs if
the EXTSWSLI instruction is generated.

* config/rs6000/rs6000.h (TARGET_EXTSWSLI): Add support for ISA
3.0 EXTSWSLI instruction.
* config/rs6000/rs6000.md (ashdi3_extswsli): Likewise.
(ashdi3_extswsli_dot): Likewise.
(ashdi3_extswsli_dot2): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* gcc.target/powerpc/extswsli-1.c: New file to test EXTSWSLI
instruction generation.
* gcc.target/powerpc/extswsli-2.c: Likewise.
* gcc.target/powerpc/extswsli-3.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 229970)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -142,6 +142,11 @@ (define_predicate "u5bit_cint_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) >= 0 && INTVAL (op) <= 31")))
 
+;; Return 1 if op is a unsigned 6-bit constant integer.
+(define_predicate "u6bit_cint_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 63")))
+
 ;; Return 1 if op is a signed 8-bit constant integer.
 ;; Integer multiplication complete more quickly
 (define_predicate "s8bit_cint_operand"
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 229974)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -31927,6 +31927,17 @@ rs6000_rtx_costs (rtx x, machine_mode mo
   return false;
 
 case ASHIFT:
+  /* The EXTSWSLI instruction is a combined instruction.  Don't count both
+the sign extend and shift separately within the insn.  */
+  if (TARGET_EXTSWSLI && mode == DImode
+ && GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+ && GET_MODE (XEXP (XEXP (x, 0), 0)) == SImode)
+   {
+ *total = 0;
+ return false;
+   }
+  /* fall through */
+ 
 case ASHIFTRT:
 case LSHIFTRT:
 case ROTATE:
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 229974)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -566,6 +566,7 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIDUZ TARGET_POPCNTD
 #define TARGET_FCTIWUZ TARGET_POPCNTD
 #define TARGET_CTZ TARGET_MODULO
+#define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
 
 #define TARGET_XSCVDPSPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 229974)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -3933,6 +3933,127 @@ (define_insn_and_split "*ashl3_dot
(set_attr "dot" "yes")
(set_attr "length" "4,8")])
 
+;; Pretend we have a memory form of extswsli until register allocation is done
+;; so that we use LWZ to load the value from memory, instead of LWA.
+(define_insn_and_split "ashdi3_extswsli"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+   (ashift:DI
+(sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,m"))
+(match_operand:DI 2 "u6bit_cint_operand" "n,n")))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli %0,%1,%2
+   #"
+  "&& reload_completed && MEM_P (operands[1])"
+  [(set (match_dup 3)
+   (match_dup 1))
+   (set (match_dup 0)
+   (ashift:DI (sign_extend:DI (match_dup 3))
+  (match_dup 2)))]
+{
+  operands[3] = gen_lowpart (SImode, operands[0]);
+}
+  [(set_attr "type" "shift")
+   (set_attr "maybe_var_shift" "no")])
+
+
+(define_insn_and_split "*ashdi3_extswsli_dot"
+  [(set (match_operand:CC 3 "cc_reg_operand" "=x,?y,?x,??y")
+   (compare:CC
+(ashift:DI
+ (sign_extend:DI (match_operand:SI 1 "reg_or_mem_operand" "r,r,m,m"))
+ (match_operand:DI 2 "u6bit_cint_operand" "n,n,n,n"))
+(const_int 0)))
+   (clobber (match_scratch:DI 0 "=r,r,r,r"))]
+  "TARGET_EXTSWSLI"
+  "@
+   extswsli. %0,%1,%2
+   #
+   #
+   #"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[3], CCmode)
+   || memory_operand (operands[1], SImode))"
+

Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-08 Thread Michael Meissner

This patch adds support for new fusion forms in ISA 3.0 (power9).  In
particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
stores, and some constant generation that ISA 2.07 (power8) could not
generate.

I have built this patch with a bootstrap build on a power8 little endian
system.  There were no regressions in the test suite.  Is this patch ok to
install in the trunk once patch #1 has been installed?
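
As background for the addis-based fusion that the ChangeLog entries below refer to: an address offset is split into an upper part, added with addis, and a signed 16-bit displacement carried by the memory instruction itself.  The sketch below shows that arithmetic in C under the assumption of the usual high-adjusted split; struct addr_split and split_offset are hypothetical names and are not taken from the patch.

```c
#include <stdint.h>

struct addr_split
{
  int64_t hi; /* upper part, the value for the addis immediate */
  int64_t lo; /* signed 16-bit displacement for the memory instruction */
};

/* Split off so that hi * 65536 + lo == off with -32768 <= lo <= 32767.  */
static struct addr_split
split_offset (int64_t off)
{
  struct addr_split s;
  /* Sign-extend the low 16 bits.  */
  s.lo = ((off & 0xffff) ^ 0x8000) - 0x8000;
  /* off - lo is a multiple of 65536, so the division is exact.  */
  s.hi = (off - s.lo) / 65536;
  return s;
}
```

Because the low part is signed, the upper part is one larger than a plain right shift would give whenever bit 15 of the offset is set.
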

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/constraints.md (wF constraint): New constraints
for power9/toc fusion.
(wG constraint): Likewise.

* config/rs6000/predicates.md (upper16_cint_operand): New
predicate for power9 and toc fusion.
(fpr_reg_operand): Likewise.
(toc_fusion_or_p9_reg_operand): Likewise.
(toc_fusion_mem_raw): Likewise.
(toc_fusion_mem_wrapped): Likewise.
(fusion_gpr_addis): If power9 fusion, allow fusion for a larger
address range.
(fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
instead.
(fusion_addis_mem_combo_load): Add support for power9 fusion of
floating point loads, floating point stores, and gpr stores.
(fusion_addis_mem_combo_store): Likewise.
(fusion_offsettable_mem_operand): Likewise.

* config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
declarations.
(emit_fusion_load_store): Likewise.
(fusion_p9_p): Likewise.
(expand_fusion_p9_load): Likewise.
(expand_fusion_p9_store): Likewise.
(emit_fusion_p9_load): Likewise.
(emit_fusion_p9_store): Likewise.
(fusion_wrap_memory_address): Likewise.

* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
elements for power9 fusion.
(rs6000_debug_print_mode): Rework debug information to print more
information about fusion.
(rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
support.
(rs6000_legitimate_address_p): Recognize toc fusion as a valid
offsettable memory address.
(emit_fusion_gpr_load): Move most of the code from
emit_fusion_gpr_load into emit_fusion_addis that handles both
power8 and power9 fusion.
(emit_fusion_addis): Likewise.
(emit_fusion_load_store): Likewise.
(fusion_wrap_memory_address): Add support for TOC fusion.
(fusion_split_address): Likewise.
(fusion_p9_p): Add support for power9 fusion.
(expand_fusion_p9_load): Likewise.
(expand_fusion_p9_store): Likewise.
(emit_fusion_p9_load): Likewise.
(emit_fusion_p9_store): Likewise.

* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
power9 fusion support.
(TARGET_TOC_FUSION_FP): Likewise.

* config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
fusion unspecs.
(UNSPEC_FUSION_ADDIS): Likewise.
(QHSI mode iterator): New iterator for power9 fusion.
(GPR_FUSION): Likewise.
(FPR_FUSION): Likewise.
(power9 fusion splitter): New power9/toc fusion support.
(toc_fusionload_): Likewise.
(toc_fusionload_di): Likewise.
(fusion_gpr_load_): Update predicate function.
(power9 fusion peephole2s): New power9/toc fusion support.
(fusion_gpr___load): Likewise.
(fusion_gpr___store): Likewise.
(fusion_fpr___load): Likewise.
(fusion_fpr___store): Likewise.
(fusion_p9__constant): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
and allow the test on PowerPC LE.
* gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.

* gcc.target/powerpc/fusion3.c: New file, test power9 fusion.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 229970)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -137,6 +137,16 @@ (define_constraint "wD"
   (and (match_code "const_int")
(match_test "TARGET_VSX && (ival == VECTOR_ELEMENT_SCALAR_64BIT)")))
 
+;; Extended fusion store
+(define_memory_constraint "wF"
+  "Memory operand suitable for power9 fusion load/stores"
+  (match_operand 0 "fusion_addis_mem_combo_load"))
+
+;; Fusion gpr load.
+(define_memory_constraint "wG"
+  "Memory operand suitable for TOC fusion memory references"
+  (match_operand 0 "toc_fusion_mem_wrapped"))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predic

Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)

2015-11-08 Thread Michael Meissner

This patch adds support for the IEEE 128-bit hardware instructions that are
being added to the PowerPC ISA 3.0 (power9).  With this patch, users on power7
and power8 will use the software emulation functions that are committed, but
still need some enhancement.  On ISA 3.0/power9, they would be able to use the
direct instructions.

I have built this patch with a bootstrap build on a power8 little endian
system.  There were no regressions in the test suite.  Is this patch ok to
install in the trunk?
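
The rs6000_generate_compare comment in the patch below notes that the software comparison functions return a value in 0..15 laid out like a CR field.  A sketch of how such a result could be decoded is given here; the enum and function names are hypothetical, and the bit values are an assumption based on the usual CR field order (LT, GT, EQ, UN from most to least significant bit).

```c
/* Assumed bit layout of a 4-bit CR-style comparison result.  */
enum cmp_bits
{
  CMP_UNORDERED = 0x1, /* either operand is a NaN */
  CMP_EQUAL = 0x2,     /* a == b */
  CMP_GREATER = 0x4,   /* a > b */
  CMP_LESS = 0x8       /* a < b */
};

/* Return nonzero if the comparison result means a <= b.  */
static int
result_is_le (int result)
{
  return (result & (CMP_LESS | CMP_EQUAL)) != 0;
}

/* Return nonzero if the operands were unordered.  */
static int
result_is_unordered (int result)
{
  return (result & CMP_UNORDERED) != 0;
}
```

With hardware IEEE 128-bit support, the patch skips this library path entirely, which is what the added !TARGET_FLOAT128_HW test expresses.
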

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/rs6000-protos.h (convert_float128_to_int): Add
declaration.
(convert_int_to_float128): Likewise.
(rs6000_generate_compare): Add support for ISA 3.0 (power9)
hardware support for IEEE 128-bit floating point.
(rs6000_expand_float128_convert): Likewise.
(convert_float128_to_int): Likewise.
(convert_int_to_float128): Likewise.

* config/rs6000/rs6000.md (UNSPEC_ROUND_TO_ODD): New unspecs for
ISA 3.0 hardware IEEE 128-bit floating point.
(UNSPEC_IEEE128_MOVE): Likewise.
(UNSPEC_IEEE128_CONVERT): Likewise.
(FMA_F): Add support for IEEE 128-bit floating point hardware
support.
(Ff): Add support for DImode.
(Fv): Likewise.
(any_fix code iterator): New and updated iterators for IEEE
128-bit floating point hardware support.
(any_float code iterator): Likewise.
(s code attribute): Likewise.
(su code attribute): Likewise.
(az code attribute): Likewise.
(neg2, FLOAT128 iterator): Add support for IEEE 128-bit
floating point hardware support.
(abs2, FLOAT128 iterator): Likewise.
(add3, IEEE128 iterator): New insns for IEEE 128-bit
floating point hardware.
(sub3, IEEE128 iterator): Likewise.
(mul3, IEEE128 iterator): Likewise.
(div3, IEEE128 iterator): Likewise.
(copysign3, IEEE128 iterator): Likewise.
(sqrt2, IEEE128 iterator): Likewise.
(neg2, IEEE128 iterator): Likewise.
(abs2, IEEE128 iterator): Likewise.
(nabs2, IEEE128 iterator): Likewise.
(fma4_hw, IEEE128 iterator): Likewise.
(fms4_hw, IEEE128 iterator): Likewise.
(nfma4_hw, IEEE128 iterator): Likewise.
(nfms4_hw, IEEE128 iterator): Likewise.
(extend2_hw): Likewise.
(truncdf2_hw, IEEE128 iterator): Likewise.
(truncsf2_hw, IEEE128 iterator): Likewise.
(fix_fixuns code attribute): Likewise.
(float_floatuns code attribute): Likewise.
(_si2_hw): Likewise.
(_di2_hw): Likewise.
(_si2_hw): Likewise.
(_di2_hw): Likewise.
(xscvqpwz_): Likewise.
(xscvqpdz_): Likewise.
(xscvdqp_df2_odd): Likewise.
(cmp_h): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* gcc.target/powerpc/float128-hw.c: New test for IEEE 128-bit
hardware floating point support.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 229976)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -55,6 +55,8 @@ extern const char *rs6000_output_move_12
 extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
+extern void convert_float128_to_int (rtx *, enum rtx_code);
+extern void convert_int_to_float128 (rtx *, enum rtx_code);
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 229976)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20504,11 +20504,12 @@ rs6000_generate_compare (rtx cmp, machin
   emit_insn (cmp);
 }
 
-  /* IEEE 128-bit support in VSX registers.  The comparison functions
- (__cmpokf2 and __cmpukf2) returns 0..15 that is laid out the same way as
- the PowerPC CR register would for a normal floating point comparison from
- the fcmpo and fcmpu instructions.  */
-  else if (FLOAT128_IEEE_P (mode))
+  /* IEEE 128-bit support in VSX registers.  If we do not have IEEE 128-bit
+ hardware, the comparison functions (__cmpokf2 and __cmpukf2) returns 0..15
+ that is laid out the same way as the PowerPC CR register would for a
+ normal floating point comparison from the fcmpo and fcmpu
+ instructions.  */
+  else if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
 {
   rtx and_reg = gen_reg_rtx (SImode);
   rtx dest = gen_reg_rtx (SImode);
@@ -20647,7 +20648,7 @@ rs6000_generate

Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)

2015-11-08 Thread Michael Meissner

This patch adds support for the new direct move instructions (MFVSRLD and
MTVSRDD) that simplify moving 128-bit data between GPRs and vector registers.

I have built previous versions of this patch with no regressions.  At the
moment, I have built a non-bootstrap build and run the PowerPC tests, with no
regressions.  Assuming the bootstrap build that I've started has no
regressions, is it ok to install in the trunk?
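
To make the data movement concrete, here is a small C model of what the two instructions are understood to do.  This is a sketch under the assumption that MTVSRDD fills both doublewords of a vector register from two GPRs and that MFVSRLD reads back the lower doubleword (doubleword 1); struct vsr_model and the *_model functions are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Model of a 128-bit vector register as two 64-bit doublewords;
   dw[0] is the upper doubleword, dw[1] the lower one.  */
struct vsr_model
{
  uint64_t dw[2];
};

/* Model of MTVSRDD: build a 128-bit value from two GPRs in one step.  */
static struct vsr_model
mtvsrdd_model (uint64_t ra, uint64_t rb)
{
  struct vsr_model v;
  v.dw[0] = ra;
  v.dw[1] = rb;
  return v;
}

/* Model of MFVSRD: read the upper doubleword into a GPR.  */
static uint64_t
mfvsrd_model (struct vsr_model v)
{
  return v.dw[0];
}

/* Model of MFVSRLD: read the lower doubleword into a GPR.  */
static uint64_t
mfvsrld_model (struct vsr_model v)
{
  return v.dw[1];
}
```

Under this model, a 128-bit move in either direction needs no scratch register, which matches the simplification the ChangeLog describes for rs6000_secondary_reload_simple_move.
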

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/constraints.md (we constraint): New constraint for
64-bit power9 vector support.
(wL constraint): New constraint for the element in a vector that
can be addressed by the MFVSRLD instruction.

* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0
debugging.
(rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we
constraint.  Disable the VSX<->GPR direct move helpers if we have
the MFVSRLD and MTVSRDD instructions.
(rs6000_secondary_reload_simple_move): Add support for doing
vector direct moves directly without additional scratch registers
if we have ISA 3.0 instructions.
(rs6000_secondary_reload_direct_move): Update comments.
(rs6000_output_move_128bit): Add support for ISA 3.0 vector
instructions.

* config/rs6000/vsx.md (vsx_mov): Add support for ISA 3.0
direct move instructions.
(vsx_movti_64bit): Likewise.
(vsx_extract_): Likewise.

* config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
macros for ISA 3.0 direct move instructions.
(TARGET_DIRECT_MOVE_128): Likewise.

* config/rs6000/rs6000.md (128-bit GPR splitters): Don't split a
128-bit move that is a direct move between GPR and vector
registers using ISA 3.0 direct move instructions.

* doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
constraints.  Update wa documentation to say not to use %x on
instructions that only take Altivec registers.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
vector direct move instructions.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: Re: OpenACC declare directive updates

2015-11-08 Thread Cesar Philippidis

On 11/08/2015 07:29 AM, James Norris wrote:

> The attached patch and ChangeLog reflect the updates from your
> review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00714.html.
> All of the issues pointed out have been addressed.
> 
> With the changes made in this patch I think I'm handling the
> situation that you pointed out here correctly:
> 
> On Fri, Nov 06, 2015 at 01:45:09PM -0600, James Norris wrote:
> 
> Also, wonder about BLOCK stmt in Fortran, that can give you variables that
> don't live through the whole function, but only a portion of it even in
> Fortran.
> 
> OK to commit to trunk?

I'll defer to Jakub, but here are a couple of comments.

>  void
>  gfc_resolve_oacc_declare (gfc_namespace *ns)
>  {
>int list;
>gfc_omp_namelist *n;
>locus loc;
> +  gfc_oacc_declare *oc;
>  
> -  if (ns->oacc_declare_clauses == NULL)
> +  if (ns->oacc_declare == NULL)
>  return;
>  
> -  loc = ns->oacc_declare_clauses->loc;
> +  loc = gfc_current_locus;
>  
> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> -   list <= OMP_LIST_DEVICE_RESIDENT; list++)
> -for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
> -  {
> - n->sym->mark = 0;
> - if (n->sym->attr.flavor == FL_PARAMETER)
> -   gfc_error ("PARAMETER object %qs is not allowed at %L", n->sym->name, 
> &loc);
> -  }
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_DEVICE_RESIDENT;
> +list <= OMP_LIST_DEVICE_RESIDENT; list++)

Why is this loop necessary?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   {
> + n->sym->mark = 0;
> + if (n->sym->attr.flavor == FL_PARAMETER)
> +   gfc_error ("PARAMETER object %qs is not allowed at %L",
> +  n->sym->name, &loc);
> +   }
>  
> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> -   list <= OMP_LIST_DEVICE_RESIDENT; list++)
> -for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
> -  {
> - if (n->sym->mark)
> -   gfc_error ("Symbol %qs present on multiple clauses at %L",
> -  n->sym->name, &loc);
> - else
> -   n->sym->mark = 1;
> -  }
> +  for (list = OMP_LIST_DEVICE_RESIDENT;
> + list <= OMP_LIST_DEVICE_RESIDENT; list++)

And here.

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   {
> + if (n->sym->mark)
> +   gfc_error ("Symbol %qs present on multiple clauses at %L",
> +  n->sym->name, &loc);
> + else
> +   n->sym->mark = 1;
> +   }
>  
> -  for (n = ns->oacc_declare_clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n;
> -   n = n->next)
> -check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
> -}
> +  for (n = oc->clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n; n = n->next)

This is better.

> + check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
> +
> +  for (n = oc->clauses->lists[OMP_LIST_MAP]; n; n = n->next)
> + {
> +   if (n->expr && n->expr->ref->type == REF_ARRAY)
> +   gfc_error ("Array sections: %qs not allowed in"
> +  " $!ACC DECLARE at %L", n->sym->name, &loc);
> + }
> +}
> +
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)

?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   n->sym->mark = 0;
> +}
>  
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)

?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   {
> + if (n->sym->mark)
> +   gfc_error ("Symbol %qs present on multiple clauses at %L",
> +  n->sym->name, &loc);
> + else
> +   n->sym->mark = 1;
> +   }
> +}
> +
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)

?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   n->sym->mark = 0;
> +}
> +}

I only noticed these because I thought I fixed them in the patch you
asked me to revert from gomp-4_0-branch. At the very least, please try
to be consistent on iterating OMP_LIST_*.
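
To make the point concrete, here is a minimal sketch of the mark-bit check the quoted hunks perform, written so that the one list of interest is walked directly instead of through a loop whose bounds are the same constant. The types are hypothetical stand-ins, not the gfortran structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a symbol with a scratch mark field.
struct Symbol {
  std::string name;
  int mark;
};

// One clause list: the symbols named on a single clause.
typedef std::vector<Symbol *> ClauseList;

// Return the names of symbols that appear more than once across CLAUSES.
// Pass 1 clears every mark; pass 2 sets the mark on first sight and
// reports the symbol if the mark is already set.
inline std::vector<std::string>
symbols_on_multiple_clauses (const std::vector<ClauseList> &clauses)
{
  for (const ClauseList &list : clauses)
    for (Symbol *sym : list)
      sym->mark = 0;

  std::vector<std::string> repeated;
  for (const ClauseList &list : clauses)
    for (Symbol *sym : list)
      {
        if (sym->mark)
          repeated.push_back (sym->name);
        else
          sym->mark = 1;
      }
  return repeated;
}
```

A loop of the form `for (list = X; list <= X; list++)` runs its body exactly once, so indexing the list with X directly, as the DEVICE_RESIDENT hunk already does, says the same thing with less noise.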

Cesar

[PATCH] Fix bb-reorder problem with degenerate cond_jump (PR68182)

2015-11-08 Thread Segher Boessenkool

The code mistakenly thinks any cond_jump has two successors.  This is
not true if both destinations are the same, as can happen with weird
patterns as in the PR.
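
A minimal sketch of the ordering issue, using a hypothetical block type rather than GCC's CFG API: the single-successor test has to come first, because a conditional jump whose two destinations coincide has only one outgoing edge.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block; not GCC's basic_block.
struct Block {
  bool ends_in_condjump;   // the last insn is a conditional jump
  std::vector<int> succs;  // ids of successor blocks, one per outgoing edge
};

// Collect the outgoing edges that are candidates for reordering.
inline std::vector<int> collect_candidate_edges (const Block &bb)
{
  std::vector<int> edges;
  if (bb.succs.size () == 1)
    // Covers unconditional jumps and degenerate conditional jumps alike.
    edges.push_back (bb.succs[0]);
  else if (bb.ends_in_condjump)
    {
      // Only now is it safe to assume two distinct edges.
      assert (bb.succs.size () == 2);
      edges.push_back (bb.succs[0]);
      edges.push_back (bb.succs[1]);
    }
  return edges;
}
```

With the tests in the opposite order, the degenerate block would take the conditional-jump path and read a second successor that does not exist.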

Bootstrapped and tested on powerpc64-linux; also tested the simplified
test in the PR on an x86_64-linux cross.

Sorry for the breakage.  Is this okay for trunk?


Segher


2015-11-09  Segher Boessenkool  

* gcc/bb-reorder.c (reorder_basic_blocks_simple): Treat a conditional
branch with only one successor just like unconditional branches.

---
 gcc/bb-reorder.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 5f1c2cc..950b1a1 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2304,7 +2304,9 @@ reorder_basic_blocks_simple (void)
   if (JUMP_P (end) && extract_asm_operands (end))
continue;
 
-  if (any_condjump_p (end))
+  if (single_succ_p (bb))
+   edges[n++] = EDGE_SUCC (bb, 0);
+  else if (any_condjump_p (end))
{
  edge e0 = EDGE_SUCC (bb, 0);
  edge e1 = EDGE_SUCC (bb, 1);
@@ -2315,8 +2317,6 @@ reorder_basic_blocks_simple (void)
  edges[n++] = e0;
  edges[n++] = e1;
}
-  else if (single_succ_p (bb))
-   edges[n++] = EDGE_SUCC (bb, 0);
 }
 
   /* Sort the edges, the most desirable first.  When optimizing for size
-- 
1.9.3

[PATCH] Remove backedge handling support in tree-ssa-threadupdate.c

2015-11-08 Thread Jeff Law

With the FSM threader handling all edges with EDGE_DFS_BACK set, we can 
simplify a variety of bits in tree-ssa-threadupdate.c which is precisely 
what this patch does.


The patch also introduces some checking bits to ensure that backedges do 
not appear in old-fashioned jump threading paths.


Bootstrapped and regression tested on x86_64-linux-gnu.  Installing on 
the trunk.


This is the last threading cleanup for stage1.  I am hoping to do one 
more cleanup in the ssa name manager before stage1 closes, then onward 
to bugfixing.


Jeff
commit a22a1fbd306cf41a797a3457a8c8ebf4ba07a276
Author: law 
Date:   Mon Nov 9 03:19:09 2015 +

[PATCH] Remove backedge handling support in tree-ssa-threadupdate.c

* tree-ssa-threadupdate.c (register_jump_thread): Assert that a
non-FSM path has no edges marked with EDGE_DFS_BACK.
(ssa_redirect_edges): No longer call mark_loop_for_removal.
(thread_single_edge, def_split_header_continue_p): Remove.
(bb_ends_with_multiway_branch): Likewise.
(thread_through_loop_header): Remove cases of threading from
latch through the header.  Simplify knowing we won't thread
the latch.
(thread_through_all_blocks): Simplify knowing that only the FSM
threader needs to handle backedges.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229982 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 81664bf..6401c43 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2015-11-08  Jeff Law 
+
+   * tree-ssa-threadupdate.c (register_jump_thread): Assert that a
+   non-FSM path has no edges marked with EDGE_DFS_BACK.
+   (ssa_redirect_edges): No longer call mark_loop_for_removal.
+   (thread_single_edge, def_split_header_continue_p): Remove.
+   (bb_ends_with_multiway_branch): Likewise.
+   (thread_through_loop_header): Remove cases of threading from
+   latch through the header.  Simplify knowing we won't thread
+   the latch.
+   (thread_through_all_blocks): Simplify knowing that only the FSM
+   threader needs to handle backedges.
+
 2015-11-08  Eric Botcazou  
 
* doc/extend.texi (type attributes): Document scalar_storage_order.
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 68650e5..184cf34 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1406,10 +1406,6 @@ ssa_redirect_edges (struct redirection_data **slot,
fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",
 e->src->index, e->dest->index, rd->dup_blocks[0]->index);
 
- /* If we redirect a loop latch edge cancel its loop.  */
- if (e->src == e->src->loop_father->latch)
-   mark_loop_for_removal (e->src->loop_father);
-
  /* Redirect the incoming edge (possibly to the joiner block) to the
 appropriate duplicate block.  */
  e2 = redirect_edge_and_branch (e, rd->dup_blocks[0]);
@@ -1630,67 +1626,6 @@ thread_block (basic_block bb, bool noloop_only)
   return retval;
 }
 
-
-/* Threads edge E through E->dest to the edge THREAD_TARGET (E).  Returns the
-   copy of E->dest created during threading, or E->dest if it was not necessary
-   to copy it (E is its single predecessor).  */
-
-static basic_block
-thread_single_edge (edge e)
-{
-  basic_block bb = e->dest;
-  struct redirection_data rd;
-  vec<jump_thread_edge *> *path = THREAD_PATH (e);
-  edge eto = (*path)[1]->e;
-
-  delete_jump_thread_path (path);
-  e->aux = NULL;
-
-  thread_stats.num_threaded_edges++;
-
-  if (single_pred_p (bb))
-{
-  /* If BB has just a single predecessor, we should only remove the
-control statements at its end, and successors except for ETO.  */
-  remove_ctrl_stmt_and_useless_edges (bb, eto->dest);
-
-  /* And fixup the flags on the single remaining edge.  */
-  eto->flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE | EDGE_ABNORMAL);
-  eto->flags |= EDGE_FALLTHRU;
-
-  return bb;
-}
-
-  /* Otherwise, we need to create a copy.  */
-  if (e->dest == eto->src)
-update_bb_profile_for_threading (bb, EDGE_FREQUENCY (e), e->count, eto);
-
-  vec<jump_thread_edge *> *npath = new vec<jump_thread_edge *> ();
-  jump_thread_edge *x = new jump_thread_edge (e, EDGE_START_JUMP_THREAD);
-  npath->safe_push (x);
-
-  x = new jump_thread_edge (eto, EDGE_COPY_SRC_BLOCK);
-  npath->safe_push (x);
-  rd.path = npath;
-
-  create_block_for_threading (bb, &rd, 0, NULL);
-  remove_ctrl_stmt_and_useless_edges (rd.dup_blocks[0], NULL);
-  create_edge_and_update_destination_phis (&rd, rd.dup_blocks[0], 0);
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",
-e->src->index, e->dest->index, rd.dup_blocks[0]->index);
-
-  rd.dup_blocks[0]->count = e->count;
-  rd.dup_blocks[0]->frequency = EDGE_FREQUENCY (e);
-  single_succ_edge (rd.dup_blocks[0])->count = e->count;
-  redirect_edge_and_branch (e, rd.d

Re: [PATCH] Fix bb-reorder problem with degenerate cond_jump (PR68182)

2015-11-08 Thread Jeff Law


On 11/08/2015 08:09 PM, Segher Boessenkool wrote:

The code mistakenly thinks any cond_jump has two successors.  This is
not true if both destinations are the same, as can happen with weird
patterns as in the PR.

Bootstrapped and tested on powerpc64-linux; also tested the simplified
test in the PR on an x86_64-linux cross.

Sorry for the breakage.  Is this okay for trunk?


Segher


2015-11-09  Segher Boessenkool  

* gcc/bb-reorder.c (reorder_basic_blocks_simple): Treat a conditional
branch with only one successor just like unconditional branches.

OK.  Though this begs the question, should something have cleaned that 
up prior to bb-reorder?


Don't forget the PR marker in the committed ChangeLog.

jeff

[PATCH] Partially fix PR c++/12277 (Warn on dynamic cast with known NULL results)

2015-11-08 Thread Patrick Palka

When either the static type or the target type of a dynamic cast is
marked 'final', and the type marked final is not derived from the other,
then there is no way to declare a class that is derived from both types
(since one of the types is marked final) so there is no way for such a
dynamic cast to possibly succeed.
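
A minimal sketch of the runtime behaviour behind this reasoning, using class names chosen to mirror the test case. The helper functions are illustrative only; whether a compiler warns on the first cast depends on the compiler and on this patch.

```cpp
#include <cassert>

struct C1 { virtual ~C1 () { } };
struct C2 final { virtual ~C2 () { } };   // final and unrelated to C1

struct B1 { virtual ~B1 () { } };
struct B2 final : B1 { };                 // final, but derived from B1

// No class can derive from both C1 and C2, so this cast always yields null.
inline bool unrelated_final_cast_succeeds ()
{
  C1 obj;
  C1 *p = &obj;
  return dynamic_cast<C2 *> (p) != nullptr;
}

// B2 is final but derives from B1, so a downcast can succeed.
inline bool derived_final_cast_succeeds ()
{
  B2 obj;
  B1 *p = &obj;
  return dynamic_cast<B2 *> (p) != nullptr;
}

// The same downcast fails when the dynamic type really is B1.
inline bool base_object_cast_succeeds ()
{
  B1 obj;
  B1 *p = &obj;
  return dynamic_cast<B2 *> (p) != nullptr;
}
```

Only the first case is decidable at compile time, which is why the warning is limited to a final type that is not derived from the other type.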

This patch detects this case and emits a warning accordingly.

Bootstrap + regtest in progress. Is this OK to commit if testing
succeeds?

gcc/cp/ChangeLog:

PR c++/12277
* rtti.c (build_dynamic_cast_1): Warn on dynamic_cast that can
never succeed due to either the target type or the static type
being marked final.

gcc/testsuite/ChangeLog:

PR c++/12277
* g++.dg/rtti/dyncast8.C: New test.
---
 gcc/cp/rtti.c| 18 ++
 gcc/testsuite/g++.dg/rtti/dyncast8.C | 47 
 2 files changed, 65 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/rtti/dyncast8.C

diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index be25be4..b823f8b 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -698,6 +698,24 @@ build_dynamic_cast_1 (tree type, tree expr, tsubst_flags_t 
complain)
 
  target_type = TYPE_MAIN_VARIANT (TREE_TYPE (type));
  static_type = TYPE_MAIN_VARIANT (TREE_TYPE (exprtype));
+
+ if ((CLASSTYPE_FINAL (static_type)
+  && !DERIVED_FROM_P (target_type, static_type))
+ || (CLASSTYPE_FINAL (target_type)
+ && !DERIVED_FROM_P (static_type, target_type)))
+   {
+ if (complain & tf_warning)
+   {
+ if (VAR_P (old_expr))
+   warning (0, "dynamic_cast of %q#D to %q#T can never 
succeed",
+   old_expr, type);
+ else
+   warning (0, "dynamic_cast of %q#E to %q#T can never 
succeed",
+   old_expr, type);
+   }
+ return build_zero_cst (type);
+   }
+
  td2 = get_tinfo_decl (target_type);
  if (!mark_used (td2, complain) && !(complain & tf_error))
return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/rtti/dyncast8.C 
b/gcc/testsuite/g++.dg/rtti/dyncast8.C
new file mode 100644
index 000..d98878c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/dyncast8.C
@@ -0,0 +1,47 @@
+// PR c++/12277
+// { dg-do require { target c++11 } }
+
+struct A1 { virtual ~A1 () { } };
+struct A2 { virtual ~A2 () { } };
+
+struct B1 { virtual ~B1 () { } };
+struct B2 final : B1 { virtual ~B2 () { } };
+
+struct C1 { virtual ~C1 () { } };
+struct C2 final { virtual ~C2 () { } };
+
+A1 *a1;
+
+B1 *b1;
+B2 *b2;
+
+C1 *c1;
+C2 *c2;
+
+void
+foo (void)
+{
+  {
+A2 *x = dynamic_cast<A2 *> (a1); // { dg-bogus "can never succeed" }
+  }
+
+  {
+B2 *x = dynamic_cast<B2 *> (b1); // { dg-bogus "can never suceed" }
+B2 &y = dynamic_cast<B2 &> (*b1); // { dg-bogus "can never suceed" }
+  }
+
+  {
+B1 *x = dynamic_cast<B1 *> (b2); // { dg-bogus "can never suceed" }
+B1 &y = dynamic_cast<B1 &> (*b2); // { dg-bogus "can never suceed" }
+  }
+
+  {
+C2 *x = dynamic_cast<C2 *> (c1); // { dg-warning "can never succeed" }
+C2 &y = dynamic_cast<C2 &> (*c1); // { dg-warning "can never suceed" }
+  }
+
+  {
+C1 *x = dynamic_cast<C1 *> (c2); // { dg-warning "can never succeed" }
+C1 &y = dynamic_cast<C1 &> (*c2); // { dg-warning "can never suceed" }
+  }
+}
-- 
2.6.3.412.gac8e876.dirty

Re: [PATCH] Partially fix PR c++/12277 (Warn on dynamic cast with known NULL results)

2015-11-08 Thread Patrick Palka

On Sun, Nov 8, 2015 at 10:30 PM, Patrick Palka  wrote:
> When either the static type or the target type of a dynamic cast is
> marked 'final', and the type marked final is not derived from the other,
> then there is no way to declare a class that is derived from both types
> (since one of the types is marked final) so there is no way for such a
> dynamic cast to possibly succeed.
>
> This patch detects this case and emits a warning accordingly.
>
> Bootstrap + regtest in progress. Is this OK to commit if testing
> succeeds?
>
> gcc/cp/ChangeLog:
>
> PR c++/12277
> * rtti.c (build_dynamic_cast_1): Warn on dynamic_cast that can
> never succeed due to either the target type or the static type
> being marked final.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/12277
> * g++.dg/rtti/dyncast8.C: New test.
> ---
>  gcc/cp/rtti.c| 18 ++
>  gcc/testsuite/g++.dg/rtti/dyncast8.C | 47 
> 
>  2 files changed, 65 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/rtti/dyncast8.C
>
> diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
> index be25be4..b823f8b 100644
> --- a/gcc/cp/rtti.c
> +++ b/gcc/cp/rtti.c
> @@ -698,6 +698,24 @@ build_dynamic_cast_1 (tree type, tree expr, 
> tsubst_flags_t complain)
>
>   target_type = TYPE_MAIN_VARIANT (TREE_TYPE (type));
>   static_type = TYPE_MAIN_VARIANT (TREE_TYPE (exprtype));
> +
> + if ((CLASSTYPE_FINAL (static_type)
> +  && !DERIVED_FROM_P (target_type, static_type))
> + || (CLASSTYPE_FINAL (target_type)
> + && !DERIVED_FROM_P (static_type, target_type)))
> +   {
> + if (complain & tf_warning)
> +   {
> + if (VAR_P (old_expr))
> +   warning (0, "dynamic_cast of %q#D to %q#T can never 
> succeed",
> +   old_expr, type);
> + else
> +   warning (0, "dynamic_cast of %q#E to %q#T can never 
> succeed",
> +   old_expr, type);
> +   }
> + return build_zero_cst (type);
> +   }
> +
>   td2 = get_tinfo_decl (target_type);
>   if (!mark_used (td2, complain) && !(complain & tf_error))
> return error_mark_node;
> diff --git a/gcc/testsuite/g++.dg/rtti/dyncast8.C 
> b/gcc/testsuite/g++.dg/rtti/dyncast8.C
> new file mode 100644
> index 000..d98878c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/rtti/dyncast8.C
> @@ -0,0 +1,47 @@
> +// PR c++/12277
> +// { dg-do require { target c++11 } }
> +
> +struct A1 { virtual ~A1 () { } };
> +struct A2 { virtual ~A2 () { } };
> +
> +struct B1 { virtual ~B1 () { } };
> +struct B2 final : B1 { virtual ~B2 () { } };
> +
> +struct C1 { virtual ~C1 () { } };
> +struct C2 final { virtual ~C2 () { } };
> +
> +A1 *a1;
> +
> +B1 *b1;
> +B2 *b2;
> +
> +C1 *c1;
> +C2 *c2;
> +
> +void
> +foo (void)
> +{
> +  {
> +A2 *x = dynamic_cast<A2 *> (a1); // { dg-bogus "can never succeed" }
> +  }
> +
> +  {
> +B2 *x = dynamic_cast<B2 *> (b1); // { dg-bogus "can never suceed" }
> +B2 &y = dynamic_cast<B2 &> (*b1); // { dg-bogus "can never suceed" }
> +  }
> +
> +  {
> +B1 *x = dynamic_cast<B1 *> (b2); // { dg-bogus "can never suceed" }
> +B1 &y = dynamic_cast<B1 &> (*b2); // { dg-bogus "can never suceed" }
> +  }
> +
> +  {
> +C2 *x = dynamic_cast<C2 *> (c1); // { dg-warning "can never succeed" }
> +C2 &y = dynamic_cast<C2 &> (*c1); // { dg-warning "can never suceed" }
> +  }
> +
> +  {
> +C1 *x = dynamic_cast<C1 *> (c2); // { dg-warning "can never succeed" }
> +C1 &y = dynamic_cast<C1 &> (*c2); // { dg-warning "can never suceed" }
> +  }
> +}
> --
> 2.6.3.412.gac8e876.dirty
>

Oops, this is the wrong test case (forgot to amend the patch before
sending).  The correct test case has s/require/compile/ and
s/suceed/succeed/g done on it.

Re: [PATCH] Fix bb-reorder problem with degenerate cond_jump (PR68182)

2015-11-08 Thread Segher Boessenkool

On Sun, Nov 08, 2015 at 08:21:47PM -0700, Jeff Law wrote:
> On 11/08/2015 08:09 PM, Segher Boessenkool wrote:
> >The code mistakenly thinks any cond_jump has two successors.  This is
> >not true if both destinations are the same, as can happen with weird
> >patterns as in the PR.
> >
> >Bootstrapped and tested on powerpc64-linux; also tested the simplified
> >test in the PR on an x86_64-linux cross.
> >
> >Sorry for the breakage.  Is this okay for trunk?
> >
> >
> >Segher
> >
> >
> >2015-11-09  Segher Boessenkool  
> >
> > * gcc/bb-reorder.c (reorder_basic_blocks_simple): Treat a conditional
> > branch with only one successor just like unconditional branches.
> OK.  Though this begs the question, should something have cleaned that 
> up prior to bb-reorder?

It normally does (which is why I hadn't noticed it), but there is no
unconditional version of this in the machine description.

It seems to create a conditional branch so that it can do a move from a
pseudo (that it sets in the branch pattern itself, it's a parallel) to
the AX register.  bb-reorder runs after RA so that move has been folded,
the pseudo itself is in AX already, so both arms of the conditional now
point to the next insn.

I don't know why the backend cannot put AX directly in this pattern (or
while expanding it).  Either way, bb-reorder should be able to handle
the situation.

> Don't forget the PR marker in the committed ChangeLog.

Uh yes, thanks.

Segher

Re: Add a combined_fn enum

2015-11-08 Thread Jeff Law


On 11/07/2015 05:22 AM, Richard Sandiford wrote:

I'm working on a patch series that needs to be able to treat built-in
functions and internal functions in a similar way.  This patch adds a
new enum, combined_fn, that combines the two together.  It also adds
utility functions for seeing which combined_fn (if any) is called by
a given CALL_EXPR or gcall.
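
One way such a combined enum can be laid out, as a minimal sketch: the helper names are taken from the ChangeLog below, but the tiny enums and the offset-based encoding are assumptions made for illustration, not something this message spells out.

```cpp
#include <cassert>

// Hypothetical, drastically shortened stand-ins for the two existing enums.
enum built_in_function { BUILT_IN_SQRT, BUILT_IN_MEMCPY, END_BUILTINS };
enum internal_fn { IFN_LOAD_LANES, IFN_STORE_LANES, IFN_LAST };

// Assumed encoding: built-in codes keep their values and internal-function
// codes are offset by END_BUILTINS, so both share one code space.
enum combined_fn : int { CFN_LAST = int (END_BUILTINS) + int (IFN_LAST) };

inline combined_fn as_combined_fn (built_in_function fn)
{
  return combined_fn (int (fn));
}

inline combined_fn as_combined_fn (internal_fn fn)
{
  return combined_fn (int (fn) + int (END_BUILTINS));
}

inline bool builtin_fn_p (combined_fn code)
{
  return int (code) < int (END_BUILTINS);
}

inline bool internal_fn_p (combined_fn code)
{
  return int (code) >= int (END_BUILTINS);
}

inline built_in_function as_builtin_fn (combined_fn code)
{
  assert (builtin_fn_p (code));
  return built_in_function (int (code));
}

inline internal_fn as_internal_fn (combined_fn code)
{
  assert (internal_fn_p (code));
  return internal_fn (int (code) - int (END_BUILTINS));
}
```

With a layout like this, code that handles both kinds of function can switch on a single combined_fn value and convert back only where the distinction matters.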

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* tree-core.h (internal_fn): Move immediately after the definition
of built_in_function.
(combined_fn): New enum.
* tree.h (as_combined_fn, builtin_fn_p, as_builtin_fn)
(internal_fn_p, as_internal_fn): New functions.
(get_call_combined_fn, combined_fn_name): Declare.
* tree.c (get_call_combined_fn): New function.
(combined_fn_name): Likewise.
* gimple.h (gimple_call_combined_fn): Declare.
* gimple.c (gimple_call_combined_fn): New function.

OK.
jeff

Re: OpenACC declare directive updates

2015-11-08 Thread James Norris


Cesar,


I only noticed these because I thought I fixed them in the patch you
asked me to revert from gomp-4_0-branch. At the very least, please try
to be consistent on iterating OMP_LIST_*.

Thank you for noticing!

Jakub,

The attached patch and ChangeLog reflect the updates from your
review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00714.html
and Cesar's review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00885.html.

With the changes made in this patch I think I'm handling the
situation that you pointed out here correctly:

"Also, wonder about BLOCK stmt in Fortran, that can give you variables that
don't live through the whole function, but only a portion of it even in
Fortran."

OK to commit to trunk?

Thanks!
Jim

2015-XX-XX  James Norris  
Joseph Myers  

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add entry for declare directive. 
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE.
(enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and
PRAGMA_OACC_CLAUSE_LINK.

gcc/c/
* c-parser.c (c_parser_pragma): Handle PRAGMA_OACC_DECLARE.
(c_parser_omp_clause_name): Handle 'device_resident' clause.
(c_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OACC_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(c_parser_oacc_declare): New function.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle 'device_resident'
clause.
(cp_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(cp_parser_oacc_declare): New function.
(cp_parser_pragma): Handle PRAGMA_OACC_DECLARE.
* pt.c (tsubst_expr): Handle OACC_DECLARE.

gcc/
* gimple-pretty-print.c (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE. 
* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DECLARE.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* gimplify.c (gimplify_bind_expr): Prepend 'exit' stmt to cleanup.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): New builtin.
* omp-low.c (expand_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE and BUILTIN_GOACC_DECLARE.
(build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
(lower_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE,
GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK.
(make_gimple_omp_edges): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.

gcc/testsuite
* c-c++-common/goacc/declare-1.c: New test.
* c-c++-common/goacc/declare-2.c: Likewise.

include/
* gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT
and GOMP_MAP_LINK.

libgomp/

* libgomp.map (GOACC_2.0.1): Export GOACC_declare.
* oacc-parallel.c (GOACC_declare): New function.
* testsuite/libgomp.oacc-c-c++-common/declare-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/declare-5.c: Likewise.
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index ac11838..cd0cc27 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1207,6 +1207,7 @@ static const struct omp_pragma_def oacc_pragmas[] = {
   { "atomic", PRAGMA_OACC_ATOMIC },
   { "cache", PRAGMA_OACC_CACHE },
   { "data", PRAGMA_OACC_DATA },
+  { "declare", PRAGMA_OACC_DECLARE },
   { "enter", PRAGMA_OACC_ENTER_DATA },
   { "exit", PRAGMA_OACC_EXIT_DATA },
   { "kernels", PRAGMA_OACC_KERNELS },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 953c4e3..c6a2981 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -30,6 +30,7 @@ enum pragma_kind {
   PRAGMA_OACC_ATOMIC,
   PRAGMA_OACC_CACHE,
   PRAGMA_OACC_DATA,
+  PRAGMA_OACC_DECLARE,
   PRAGMA_OACC_ENTER_DATA,
   PRAGMA_OACC_EXIT_DATA,
   PRAGMA_OACC_KERNELS,
@@ -151,6 +152,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_CREATE,
   PRAGMA_OACC_CLAUSE_DELETE,
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
+  PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
   PRAGMA_OACC_CLAUSE_INDEPENDENT,
@@ -175,7 +177,8 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF,
   PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
-  PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION
+  PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION,
+  PRAGMA_OACC_CLAUSE_LINK = PRAGMA_OMP_CLAUSE_LINK
 };
 
 extern struct cpp_reader* parse_in;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser

[PATCH] RFC: Enable graphite at -O3 -fprofile_use

2015-11-08 Thread hiraditya

Since graphite will not modify the CFG when it does not do any optimization,
we would like to propose that graphite's polyhedral optimizer be enabled at
-O3 -fprofile-use, where compile time is of lesser concern.

gcc/ChangeLog:

2015-11-08  Aditya Kumar  
Sebastian Pop  

* graphite.c (gate_graphite_transforms): Enable graphite on -O3 
-fprofile-use.
---
 gcc/graphite.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/graphite.c b/gcc/graphite.c
index 5316bc4..49595ac 100644
--- a/gcc/graphite.c
+++ b/gcc/graphite.c
@@ -383,6 +383,9 @@ gate_graphite_transforms (void)
   || flag_loop_optimize_isl)
 flag_graphite = 1;
 
+  if (optimize >= 3 && flag_profile_use)
+flag_loop_optimize_isl = 1;
+
   return flag_graphite != 0;
 }
 
-- 
2.1.4

[PATCH] Preserve the original program while using graphite.

2015-11-08 Thread aditya kumar

Earlier, graphite translated portions of the original program after
SCoP detection in order to represent the SCoP in the polyhedral model.
This was required because each basic block was represented as an independent
basic block in the polyhedral model, so all the cross-basic-block dependencies
were translated out-of-SSA.
With this patch those dependencies are also exposed to ISL, so there
is no need to modify the original structure of the program. A little
work remains, but we are posting the patch before stage 1 closes.
The complete patch should be ready in a few days.

After this patch we should be able to enable graphite at some default
optimization level.

The patch is attached:

Aditya Kumar
Compiler Engineer


0001-Preserve-the-original-program-while-running-graphite.patch
Description: Binary data

Re: [PATCH] PR fortran/68053 -- Reduce initialization expression to constant value

2015-11-08 Thread Paul Richard Thomas

Dear Steve,

Thanks for beavering away on these front-end issues.

OK for trunk

Paul

On 8 November 2015 at 23:37, Steve Kargl
 wrote:
> On Sun, Nov 08, 2015 at 02:35:58PM -0800, Steve Kargl wrote:
>> The attached patch has been built and regression tested
>> on i386-*-freebsd and x86_64-*-freebsd.  If an array
>> index in an initialization expression is an array element
>> from an array named constant, the array index needs to be
>> reduced.  This patch causes the reduction to occur.
>> OK to commit?
>>
>> 2015-11-08  Steven g. Kargl  
>>
>>   PR fortran/68053
>>   * decl.c (add_init_expr_to_sym):  Try to reduce initialization 
>> expression
>>   before testing for a constant value.
>>
>> 2015-11-08  Steven g. Kargl  
>>
>>   PR fortran/68053
>>   * gfortran.dg/pr68053.f90: New test.
>
> Now with the patch attached!
>
> --
> Steve



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

Re: [Aarch64] Use vector wide add for mixed-mode adds

2015-11-08 Thread Michael Collison


This is a followup patch to my earlier patch here:

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00408.html

and comments here:

https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01300.html

This patch fixes the failure in slp-reduc-3.c by adding aarch64 support in
check_effective_target_vect_widen_sum_hi_to_si_pattern in target-supports.exp.
The remaining failures in slp-multitypes-[45].c and vect-125.c appear to be
deficiencies in the vectorizer, as the same failures are seen on PowerPC and
ia64. See here:

PowerPC: https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg03293.html
ia64: https://gcc.gnu.org/ml/gcc-testresults/2015-10/msg03176.html

Thanks to James Greenhalgh at Arm for pointing this out. My patch disables
these tests for targets with widening adds that support V8HI to V4SI. Tested on
aarch64-none-elf, aarch64_be-none-elf, and aarch64-none-linux-gnu.
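
For context, here is a scalar sketch of what the 128-bit widen_ssum expander in the patch computes for the V8HI to V4SI case: one widening add on the low half of the input, then one on the high half. The function names are illustrative, and treating elements 0..3 as the low half is an assumption that ignores endianness.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::array<std::int16_t, 8> V8HI;
typedef std::array<std::int32_t, 4> V4SI;

// Model of SADDW: acc[i] + sign_extend (low half of v).
inline V4SI model_saddw (const V4SI &acc, const V8HI &v)
{
  V4SI out{};
  for (std::size_t i = 0; i < 4; i++)
    out[i] = acc[i] + static_cast<std::int32_t> (v[i]);
  return out;
}

// Model of SADDW2: acc[i] + sign_extend (high half of v).
inline V4SI model_saddw2 (const V4SI &acc, const V8HI &v)
{
  V4SI out{};
  for (std::size_t i = 0; i < 4; i++)
    out[i] = acc[i] + static_cast<std::int32_t> (v[i + 4]);
  return out;
}

// Model of the expander: fold all eight narrow elements into four wide sums.
inline V4SI model_widen_ssum (const V8HI &v, const V4SI &acc)
{
  return model_saddw2 (model_saddw (acc, v), v);
}
```

Each wide lane ends up holding its accumulator plus two narrow elements, which is all a reduction needs since the lanes are summed afterwards anyway.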


2015-11-06  Michael Collison 
* config/aarch64/aarch64-simd.md (widen_ssum, widen_usum)
(aarch64_w_internal): New patterns.
* config/aarch64/iterators.md (Vhalf, VDBLW): New mode attributes.
* gcc.target/aarch64/saddw-1.c: New test.
* gcc.target/aarch64/saddw-2.c: New test.
* gcc.target/aarch64/uaddw-1.c: New test.
* gcc.target/aarch64/uaddw-2.c: New test.
* gcc.target/aarch64/uaddw-3.c: New test.
* gcc.dg/vect/slp-multitypes-4.c: Disable test for
targets with widening adds from V8HI=>V4SI.
* gcc.dg/vect/slp-multitypes-5.c: Ditto.
* gcc.dg/vect/vect-125.c: Ditto.
* lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern):
Add aarch64 to list of supported targets.

Okay to commit?

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 65a2b6f..acb7cf0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2750,6 +2750,60 @@
 
 ;; w.
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (sign_extend: (match_operand:VQW 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_saddw_internal (temp, operands[2],
+		operands[1], p));
+emit_insn (gen_aarch64_saddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (sign_extend:
+		   (match_operand:VD_BHSI 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_saddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (zero_extend: (match_operand:VQW 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+  {
+rtx p = aarch64_simd_vect_par_cnst_half (mode, false);
+rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
+
+emit_insn (gen_aarch64_uaddw_internal (temp, operands[2],
+		 operands[1], p));
+emit_insn (gen_aarch64_uaddw2 (operands[0], temp, operands[1]));
+DONE;
+  }
+)
+
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "register_operand" "")
+	(plus: (zero_extend:
+		   (match_operand:VD_BHSI 1 "register_operand" ""))
+		  (match_operand: 2 "register_operand" "")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_uaddw (operands[0], operands[2], operands[1]));
+  DONE;
+})
+
 (define_insn "aarch64_w"
   [(set (match_operand: 0 "register_operand" "=w")
 (ADDSUB: (match_operand: 1 "register_operand" "w")
@@ -2760,6 +2814,18 @@
   [(set_attr "type" "neon__widen")]
 )
 
+(define_insn "aarch64_w_internal"
+  [(set (match_operand: 0 "register_operand" "=w")
+(ADDSUB: (match_operand: 1 "register_operand" "w")
+			(ANY_EXTEND:
+			  (vec_select:
+			   (match_operand:VQW 2 "register_operand" "w")
+			   (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))))]
+  "TARGET_SIMD"
+  "w\\t%0., %1., %2."
+  [(set_attr "type" "neon__widen")]
+)
+
 (define_insn "aarch64_w2_internal"
   [(set (match_operand: 0 "register_operand" "=w")
 (ADDSUB: (match_operand: 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 964f8f1..f851dca 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -455,6 +455,13 @@
 			 (V4SF "V2SF")  (V4HF "V2HF")
 			 (V8HF "V4HF")  (V2DF  "DF")])
 
+;; Half modes of all vector modes, in lower-case.
+(define_mode_attr Vhalf [(V8QI "v4qi")  (V16QI "v8qi")
+			 (V4HI "v2hi")  (V8HI  "v4hi")
+			 (V2SI "si")(V4SI  "v2si")
+			 (V2DI "di")(V2SF  "sf")
+			 (V4SF "v2sf")  (V2DF  "df")])
+
 ;; Double modes of vector modes.
 (define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI")
 			(V4HF "V8H

Re: [PATCH 1/2] s/390: Implement "target" attribute.

2015-11-08 Thread Andreas Krebbel

On 10/31/2015 06:58 PM, Dominik Vogt wrote:
> But what the heck is this "exact power of 2" limitation good for
> in the first place?  Why is a stack size of 1, 2 or
> 36028797018963968 valid, but not 800?  Shouldn't the stack size
> (and the size of the stack guard) just be multiples of the stack
> slot size?

That's because of the way we implement the stack check.  We use test under
mask to check for certain bits instead of doing the full math and compare.
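
A rough C sketch of the difference, for illustration only. It is not the
s390 backend code, and it assumes the stack area is aligned to its size and
that the guard occupies the lowest bytes of that area; all function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
is_power_of_2 (uint64_t v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

/* Mask-based check: true when SP lies in the lowest GUARD bytes of its
   SIZE-aligned block.  This only works when SIZE and GUARD are powers of
   two, because only then do the bits between them form one clean mask.  */
bool
in_guard_by_mask (uint64_t sp, uint64_t size, uint64_t guard)
{
  assert (is_power_of_2 (size) && is_power_of_2 (guard) && guard < size);
  uint64_t mask = (size - 1) & ~(guard - 1);
  return (sp & mask) == 0;
}

/* The "full math and compare" alternative: works for any guard size, but
   needs the base address, a subtraction and a comparison.  */
bool
in_guard_by_compare (uint64_t sp, uint64_t base, uint64_t guard)
{
  return sp - base < guard;
}
```

With a size of 800 there is no such mask: 799 is not a run of low one-bits,
so testing bits of the stack pointer could not decide whether it is inside
the guard area.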

-Andreas-

[PATCH 2/2] rs6000: Extend 20050603-3.c testcase to 64-bit

2015-11-08 Thread Segher Boessenkool

The testcase used to fail on 64-bit, but it was disabled there.
This patch makes it run there, and beefs up the checking of the
generated code a bit.

Tested on powerpc64-linux (-m32,-m32/-mpowerpc64,-m64).
Is this okay for trunk?


Segher


2015-11-09  Segher Boessenkool  

gcc/testsuite/
* gcc.target/powerpc/20050603-3.c: Don't restrict to ilp32.  Do more
tests for the expected generated code.

---
 gcc/testsuite/gcc.target/powerpc/20050603-3.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
index 0f328e1..4017d34 100644
--- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
@@ -1,15 +1,19 @@
-/* { dg-do compile { target { ilp32 } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2" } */
-struct Q 
+struct Q
 {
   long x:20;
   long y:4;
   long z:8;
 }b;
-/* This should generate a single rl[w]imi. */
+/* This should generate a single rl[wd]imi. */
 void rotins (unsigned int x)
 {
   b.y = (x<<12) | (x>>20);
 }
 
-/* { dg-final { scan-assembler-not "inm" } } */
+/* { dg-final { scan-assembler-not {\mrlwinm} } } */
+/* { dg-final { scan-assembler-not {\mrldic} } } */
+/* { dg-final { scan-assembler-not {\mrot[lr]} } } */
+/* { dg-final { scan-assembler-not {\ms[lr][wd]} } } */
+/* { dg-final { scan-assembler-times {\mrl[wd]imi} 1 } } */
-- 
1.9.3

[PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt

2015-11-08 Thread Segher Boessenkool

If we have

(truncate:M1 (and:M2 (lshiftrt:M2 (x:M2) C) C2))

we can write it instead as

(and:M1 (lshiftrt:M1 (truncate:M1 (x:M2)) C) C2)

(if that is valid, of course), which has smaller modes for the
binary ops, and the truncate can often simplify further (if "x"
is a register, for example).
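
The equivalence can be checked with plain C integers. This is only a sketch
under assumptions: M2 is taken to be a 32-bit mode and M1 an 8-bit mode,
the shift is logical, and the helper names are invented for illustration.
The validity test mirrors the mask comparison used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* (truncate:QI (and:SI (lshiftrt:SI x c) c2)) */
uint8_t
wide_form (uint32_t x, unsigned c, uint32_t c2)
{
  assert (c < 32);
  return (uint8_t) ((x >> c) & c2);
}

/* (and:QI (lshiftrt:QI (truncate:QI x) c) c2) */
uint8_t
narrow_form (uint32_t x, unsigned c, uint32_t c2)
{
  assert (c < 8);
  uint8_t t = (uint8_t) x;
  return (uint8_t) ((t >> c) & (uint8_t) c2);
}

/* The rewrite is valid when the shift count fits the narrow mode and C2
   selects the same bits from the shifted narrow mask as from the shifted
   wide mask, i.e. no selected bit comes from above the narrow mode.  */
bool
rewrite_is_valid (unsigned c, uint32_t c2)
{
  return c < 8 && ((0xffu >> c) & c2) == ((0xffffffffu >> c) & c2);
}
```

For example, with C = 4 and C2 = 0x0f both forms agree for every x, while
with C = 4 and C2 = 0xff the wide form would pick up bits 8..11 of x that
the narrow form no longer has, so the rewrite must be rejected.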

This fixes gcc.target/powerpc/20050603-3.c for -m32 -mpowerpc64;
also that test is currently restricted to ilp32, but we can run
it with lp64 just fine, in which case it fixes that, too.

Bootstrapped and tested on powerpc64-linux (-m32,-m32/-mpowerpc64,-m64).
Is this okay for trunk?


Segher


2015-11-09  Segher Boessenkool  

* gcc/simplify-rtx.c (simplify_truncation): Simplify TRUNCATE
of AND of [LA]SHIFTRT.

---
 gcc/simplify-rtx.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 17568ba..1adb393 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -714,6 +714,31 @@ simplify_truncation (machine_mode mode, rtx op,
 return simplify_gen_binary (ASHIFT, mode,
XEXP (XEXP (op, 0), 0), XEXP (op, 1));
 
+  /* Likewise (truncate:QI (and:SI (lshiftrt:SI (x:SI) C) C2)) into
+ (and:QI (lshiftrt:QI (truncate:QI (x:SI)) C) C2) for suitable C
+ and C2.  */
+  if (GET_CODE (op) == AND
+  && (GET_CODE (XEXP (op, 0)) == LSHIFTRT
+ || GET_CODE (XEXP (op, 0)) == ASHIFTRT)
+  && CONST_INT_P (XEXP (XEXP (op, 0), 1))
+  && CONST_INT_P (XEXP (op, 1))
+  && UINTVAL (XEXP (XEXP (op, 0), 1)) < precision
+  && ((GET_MODE_MASK (mode) >> UINTVAL (XEXP (XEXP (op, 0), 1)))
+ & UINTVAL (XEXP (op, 1)))
+== ((GET_MODE_MASK (op_mode) >> UINTVAL (XEXP (XEXP (op, 0), 1)))
+& UINTVAL (XEXP (op, 1))))
+{
+  rtx op0 = simplify_gen_unary (TRUNCATE, mode, XEXP (XEXP (op, 0), 0),
+   op_mode);
+  if (op0)
+   {
+ op0 = simplify_gen_binary (LSHIFTRT, mode, op0,
+XEXP (XEXP (op, 0), 1));
+ if (op0)
+   return simplify_gen_binary (AND, mode, op0, XEXP (op, 1));
+   }
+}
+
   /* Recognize a word extraction from a multi-word subreg.  */
   if ((GET_CODE (op) == LSHIFTRT
|| GET_CODE (op) == ASHIFTRT)
-- 
1.9.3

Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-08 Thread Uros Bizjak

On Sun, Nov 8, 2015 at 9:58 PM, Segher Boessenkool
 wrote:
> On Fri, Nov 06, 2015 at 04:00:08PM -0600, Segher Boessenkool wrote:
>> This patch stops combine from generating widening muls of anything else
>> but registers (immediates, memory, ...).  This probably is a reasonable
>> tradeoff for all targets, even those (if any) that have such insns.
>>
>> > >I'll let you put it through its paces on your setup :)
>>
>> > I'll let Segher give the final yes/no on this, but it generally looks
>> > good to me.
>>
>> It looks okay to me too.  Testing now, combine patches have the tendency
>> to do unforeseen things on other targets ;-)
>
> Testing shows it makes a difference only very rarely.  For many targets
> it makes no difference, for a few it is a small win.  For 32-bit x86 it
> creates slightly bigger code.
>
> I think it looks good, but let's wait to hear Uros' opinion.

From the original patch submission, it looks like this patch would
also benefit x86_32.

Regarding the above code size increase: do you perhaps have a
testcase that shows what causes the difference? It isn't necessarily
due to the patch itself; perhaps some loads are moved into the insn and
aren't CSE'd anymore.

Uros.
