date:20140228

Re: copyright dates in binutils (and includes/)

2014-02-28 Thread Alan Modra

On Thu, Feb 27, 2014 at 06:47:17PM +, Joseph S. Myers wrote:
> On Thu, 27 Feb 2014, Joel Brobecker wrote:
> 
> > I should mention, however, that for us to use ranges like this,
> > the FSF asked us to add a note explaining that the copyright years
> > could be abbreviated into a range. See gdb/README (at the end).
> > I suspect that you'll need the same note for binutils.

Thanks Joel.  I'll copy that or the gcc wording.

> And, where a gap in the years is being implicitly filled in by conversion 
> to a range, make sure that either (a) there was a public version control 
> repository for binutils during that year, or (b) there was a release 
> (including beta releases, Cygnus releases etc., not just official 
> releases) during that year.

It looks like the earliest binutils files that are edited by
update-copyright.py have copyright dates starting at 1985.  Of those,
quite a few have skipped years.  eg. binutils/filemode.c is
Copyright 1985, 1990,...

So, CVS goes back to 1991, and there are copies of old binutils
releases for all years from 1988 to 2002 except for 1999 at
ftp://sourceware.org/pub/binutils/old-releases/

Joseph, do you know why implicitly adding years to the claimed
copyright years is a problem?  I'm guessing the file needs to be
published somewhere for each year claimed.

-- 
Alan Modra
Australia Development Lab, IBM
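
As a rough illustration of Joseph's two criteria, the check for whether a gap between two listed years may be folded into a range can be sketched as below.  This is only a sketch: the function names are made up for the example, and the coverage data is an assumption taken from the figures quoted in this message (public CVS from 1991 onward, old releases for 1988 to 2002 except 1999).

```c
#include <stdbool.h>

/* Assumption taken from the message above: a public CVS repository
   exists from 1991 onward, and old releases are available for every
   year from 1988 to 2002 except 1999.  */
static bool year_has_public_record(int year)
{
    bool in_repository = year >= 1991;
    bool has_release = year >= 1988 && year <= 2002 && year != 1999;
    return in_repository || has_release;
}

/* Return true if every year strictly between two listed copyright
   years has a public record, so that "first, last" could be written
   as the range "first-last".  Hypothetical helper, for illustration.  */
bool gap_can_be_filled(int first, int last)
{
    for (int year = first + 1; year < last; year++)
        if (!year_has_public_record(year))
            return false;
    return true;
}
```

Under these assumptions the "1985, 1990" pair from binutils/filemode.c would not qualify, because 1986 and 1987 have neither a repository nor a release on record.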

RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

2014-02-28 Thread Gopalasubramanian, Ganesh

Since the instruction pattern receives the locality value, I think it would 
be safe to handle it in the prefetch instruction.
This helps especially because AArch64 has prefetch instructions that can 
express this locality.

+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+             (match_operand:QI 1 "const_int_operand" "n")
+             (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+    /* non temporal locality */
+    return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" :
+      \"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" :
+    \"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+

I have also attached a patch that implements the following:
*   Prefetch with immediate offset in the range 0 to 32760 (multiple of 8).
    Added a predicate for this.
*   Prefetch with immediate offset in the range -256 to 255 (generated
    only when we have a negative offset; generates a prfum instruction).
    Added a predicate for this.
*   Prefetch with register offset (modified to print the locality).

Regards
Ganesh
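
To make the selection logic in the pattern above easier to follow, here is a small C sketch of how the prefetch operation name is chosen from the write flag (operands[1]) and the locality (operands[2]).  The function name prfm_operation is hypothetical and the code is not part of the patch; the mapping from locality value to cache level is copied from the pattern as posted.

```c
#include <assert.h>
#include <stdio.h>

/* Write the prefetch operation name (for example "PLDL1STRM") into buf.
   is_write mirrors operands[1] and locality mirrors operands[2] of the
   posted pattern.  Hypothetical helper, for illustration only.  */
void prfm_operation(char *buf, size_t size, int is_write, int locality)
{
    assert(locality >= 0 && locality <= 3);

    const char *kind = is_write ? "PST" : "PLD";

    if (locality == 0)
        /* No temporal locality: streaming hint at L1.  */
        snprintf(buf, size, "%sL1STRM", kind);
    else
        /* Temporal locality: the posted pattern prints the locality
           value itself as the cache level (the %2 in the template).  */
        snprintf(buf, size, "%sL%dKEEP", kind, locality);
}
```

Note that the sketch copies the posted mapping, in which a larger locality value selects a higher-numbered cache level; whether that is the intended direction is a question for the patch review, not something this sketch settles.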

-----Original Message-----
From: Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] 
Sent: Wednesday, February 19, 2014 2:40 AM
To: gcc-patches@gcc.gnu.org
Cc: philipp.toms...@theobroma-systems.com
Subject: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

---
 gcc/config/aarch64/aarch64.md | 17 +
 gcc/config/arm/types.md   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md 
index 99a6ac8..b972a1b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -293,6 +293,23 @@
   [(set_attr "type" "no_insn")]
 )
 
+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "register_operand" "r")
+(match_operand:QI 1 "const_int_operand" "n")
+(match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  if (INTVAL(operands[2]) == 0)
+ /* no temporal locality */
+ return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" : 
+\"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : 
+\"prfm\\tPLDL1KEEP, [%0, #0]\"; }"
+  [(set_attr "type" "prefetch")]
+)
+
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 8))]
   ""
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index cc39cd1..1d1280d 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -117,6 +117,7 @@
 ; mvn_shift_reg  inverting move instruction, shifted operand by a register.
 ; no_insn        an insn which does not represent an instruction in the
 ;                final output, thus having no impact on scheduling.
+; prefetch  a prefetch instruction
 ; rbit   reverse bits.
 ; rev            reverse bytes.
 ; sdiv   signed division.
@@ -553,6 +554,7 @@
   call,\
   clz,\
   no_insn,\
+  prefetch,\
   csel,\
   crc,\
   extend,\
--
1.9.0



Attachment: prefetchdiff.log

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread Richard Biener

On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu  wrote:
> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng  wrote:
>> Hi,
>> This patch fixes the regression reported in PR60280 by removing forwarder
>> loop preheaders/latches in cfg cleanup where possible.  Several tests are
>> broken by this change, since cfg cleanup is shared by all optimizers.  Some
>> tests have already been fixed by recent patches; I went through and fixed
>> the others.  One case that needs clarifying is
>> "gcc.dg/tree-prof/update-loopch.c".  When GCC removes a basic block, it
>> checks profile information by calling check_bb_profile after redirecting
>> the incoming edges of the bb.  This results in warnings about invalid
>> profile information and causes the test to fail.  I will send a patch in
>> stage 1 to skip checking profile information for a basic block that is
>> being removed, if that sounds reasonable.  For now I just tweaked the test
>> case itself.
>>
>> Bootstrapped and tested on x86_64 and arm_a15.
>>
>> Is it OK?
>>
>>
>> 2014-02-25  Bin Cheng  
>>
>> PR target/60280
>> * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>> preheaders and latches only if requested.  Fix latch if it
>> is removed.
>> * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>> LOOPS_HAVE_PREHEADERS.
>>
>
> This change:
>
> if (dest->loop_father->header == dest)
> -  return false;
> +  {
> +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
> +&& bb->loop_father->header != dest)
> +  return false;
> +
> +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
> +&& bb->loop_father->header == dest)
> +  return false;
> +  }
>  }
>
> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>
> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
> -fuse-linker-plugin
>
> This patch changes loops without LOOPS_HAVE_PREHEADERS
> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
> true.  I don't have a small testcase.  But this patch:
>
> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
> index b5c384b..2ba673c 100644
> --- a/gcc/tree-cfgcleanup.c
> +++ b/gcc/tree-cfgcleanup.c
> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>  && bb->loop_father->header == dest)
>return false;
> +
> +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
> +&& !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
> +  return false;
>}
>  }
>
> fixes the regression.  Does it make any sense?

I think the preheader test isn't fully correct (bb may be in an inner loop
for example).  So a more conservative variant would be

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 208169)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
   /* Protect loop preheaders and latches if requested.  */
   if (dest->loop_father->header == dest)
{
- if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
- && bb->loop_father->header != dest)
-   return false;
-
- if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
- && bb->loop_father->header == dest)
-   return false;
+ if (bb->loop_father == dest->loop_father)
+   return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
+ else if (bb->loop_father == loop_outer (dest->loop_father))
+   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
+ /* Always preserve other edges into loop headers that are
+not simple latches or preheaders.  */
+ return false;
}
 }

that makes sure we can properly update loop information.  It's also
a more conservative change at this point which should still successfully
remove simple latches and preheaders created by loop discovery.

Does it fix 435.gromacs?

Thanks,
Richard.
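
For readers without the GCC sources at hand, the decision made by the proposed hunk can be modelled in isolation as below.  This is a sketch only: struct demo_loop, struct demo_loop_state and may_remove_forwarder are hypothetical stand-ins for GCC's loop structures, the loops_state_satisfies_p flags and the relevant part of tree_forwarder_block_p, and the other checks in that function are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a loop in the loop tree (hypothetical).  */
struct demo_loop {
    struct demo_loop *outer;   /* enclosing loop, NULL for the root */
};

/* Stand-in for the LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES
   state flags (hypothetical).  */
struct demo_loop_state {
    bool have_preheaders;
    bool have_simple_latches;
};

/* bb_loop is the loop containing the candidate forwarder block;
   header_loop is the loop whose header the block forwards to.
   Returns true if the block may be treated as removable.  */
bool may_remove_forwarder(const struct demo_loop *bb_loop,
                          const struct demo_loop *header_loop,
                          struct demo_loop_state state)
{
    if (bb_loop == header_loop)
        /* The block sits inside the loop, so it acts as a latch.  */
        return !state.have_simple_latches;
    if (bb_loop == header_loop->outer)
        /* The block sits in the immediately enclosing loop, so it acts
           as a preheader.  */
        return !state.have_preheaders;
    /* Any other edge into a loop header is always preserved.  */
    return false;
}
```

In this model a block that lives in some unrelated or more deeply nested loop never qualifies, which is the case the original preheader test did not rule out.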



>
> --
> H.J.

RE: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

2014-02-28 Thread Gopalasubramanian, Ganesh

Avoided top-posting and resending.

+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL1KEEP, [%0, #0]\" : 
+\"prfm\\tPLDL1KEEP, [%0, #0]\"; }"
+  [(set_attr "type" "prefetch")]
+)
+

Since the instruction pattern receives the locality value, I think it would 
be safe to handle it in the prefetch instruction.
This helps especially because AArch64 has prefetch instructions that can 
express this locality.

+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+             (match_operand:QI 1 "const_int_operand" "n")
+             (match_operand:QI 2 "const_int_operand" "n"))]
+  ""
+  "*
+{
+  int locality = INTVAL (operands[2]);
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  if (locality == 0)
+    /* non temporal locality */
+    return (INTVAL(operands[1])) ? \"prfm\\tPSTL1STRM, [%0, #0]\" :
+      \"prfm\\tPLDL1STRM, [%0, #0]\";
+
+  /* temporal locality */
+  return (INTVAL(operands[1])) ? \"prfm\\tPSTL%2KEEP, [%0, #0]\" :
+    \"prfm\\tPLDL%2KEEP, [%0, #0]\";
+}"
+  [(set_attr "type" "prefetch")]
+)
+

I have also attached a patch that implements the following:
*   Prefetch with immediate offset in the range 0 to 32760 (multiple of 8).
    Added a predicate for this.
*   Prefetch with immediate offset in the range -256 to 255 (generated
    only when we have a negative offset; generates a prfum instruction).
    Added a predicate for this.
*   Prefetch with register offset (modified to print the locality).

Regards
Ganesh
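
The two offset ranges listed above can be written as plain C predicates, roughly as follows.  This is an illustrative sketch, not the predicates from the attached patch: all names are hypothetical, and the fallback to a register address for offsets that fit neither form is an assumption made for the example.

```c
#include <stdbool.h>

/* Scaled, unsigned immediate form: 0 to 32760 in steps of 8.  */
bool offset_fits_prfm(long offset)
{
    return offset >= 0 && offset <= 32760 && offset % 8 == 0;
}

/* Unscaled, signed immediate form: -256 to 255.  */
bool offset_fits_prfum(long offset)
{
    return offset >= -256 && offset <= 255;
}

typedef enum { USE_PRFM, USE_PRFUM, USE_REGISTER } prefetch_form;

/* Pick an addressing form for a constant offset.  Following the
   message, prfum is chosen only for negative offsets; anything that
   fits neither immediate form falls back to a register address
   (assumed fallback).  */
prefetch_form choose_prefetch_form(long offset)
{
    if (offset_fits_prfm(offset))
        return USE_PRFM;
    if (offset < 0 && offset_fits_prfum(offset))
        return USE_PRFUM;
    return USE_REGISTER;
}
```

With these definitions an offset of 16 selects the prfm form, -8 selects prfum, and 32768 or -257 fall through to the register form.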


Attachment: prefetchdiff.log

[gomp4 1/2] Initial support for the OpenACC kernels construct: GIMPLE_OACC_KERNELS.

2014-02-28 Thread Thomas Schwinge

From: tschwinge 

gcc/
* gimple.def (GIMPLE_OACC_KERNELS): New code.
* doc/gimple.texi: Document it.
* gimple.h (gimple_has_substatements, CASE_GIMPLE_OMP)
(is_gimple_omp_oacc_specifically): Handle it.
(gimple_statement_oacc_kernels): New struct.
(gimple_build_oacc_kernels): New prototype.
(gimple_oacc_kernels_clauses, gimple_oacc_kernels_clauses_ptr)
(gimple_oacc_kernels_set_clauses, gimple_oacc_kernels_child_fn)
(gimple_oacc_kernels_child_fn_ptr)
(gimple_oacc_kernels_set_child_fn, gimple_oacc_kernels_data_arg)
(gimple_oacc_kernels_data_arg_ptr)
(gimple_oacc_kernels_set_data_arg): New inline functions.
* gimple.c (gimple_build_oacc_kernels): New function.
(gimple_copy): Handle GIMPLE_OACC_KERNELS.
* gimple-low.c (lower_stmt): Likewise.
* gimple-walk.c (walk_gimple_op, walk_gimple_stmt): Likewise.
* gimple-pretty-print.c (pp_gimple_stmt_1): Likewise.
(dump_gimple_oacc_parallel): Rename to dump_gimple_oacc_offload.
Also handle GIMPLE_OACC_KERNELS.  Update all callers.
* gimplify.c (gimplify_omp_workshare, gimplify_expr): Handle
OACC_KERNELS.
* oacc-builtins.def (BUILT_IN_GOACC_KERNELS): New builtin.
* omp-low.c (scan_oacc_parallel, expand_oacc_parallel)
(lower_oacc_parallel): Rename to scan_oacc_offload,
expand_oacc_offload, and lower_oacc_offload.  Also handle
GIMPLE_OACC_KERNELS.  Update all callers.
(scan_sharing_clauses, scan_omp_1_stmt, expand_omp, lower_omp_1)
(diagnose_sb_0, diagnose_sb_1, diagnose_sb_2)
(make_gimple_omp_edges): Handle GIMPLE_OACC_KERNELS.
* tree-inline.c (remap_gimple_stmt, estimate_num_insns): Likewise.
* tree-nested.c (convert_nonlocal_reference_stmt)
(convert_local_reference_stmt, convert_tramp_reference_stmt)
(convert_gimple_call): Likewise.
libgomp/
* libgomp.map (GOACC_2.0): Add GOACC_kernels.
* libgomp_g.h (GOACC_kernels): New prototype.
* oacc-parallel.c (GOACC_kernels): New function.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208215 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp|  36 +
 gcc/doc/gimple.texi   |   7 +++
 gcc/gimple-low.c  |   1 +
 gcc/gimple-pretty-print.c |  48 -
 gcc/gimple-walk.c |  16 ++
 gcc/gimple.c  |  18 +++
 gcc/gimple.def|  22 +++-
 gcc/gimple.h  | 130 --
 gcc/gimplify.c|   6 ++-
 gcc/oacc-builtins.def |   6 ++-
 gcc/omp-low.c | 116 -
 gcc/tree-inline.c |   2 +
 gcc/tree-nested.c |   4 ++
 libgomp/ChangeLog.gomp|   6 +++
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp_g.h   |   6 ++-
 libgomp/oacc-parallel.c   |  12 -
 17 files changed, 389 insertions(+), 48 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 3d9b06d..79030d6 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,39 @@
+2014-02-28  Thomas Schwinge  
+
+   * gimple.def (GIMPLE_OACC_KERNELS): New code.
+   * doc/gimple.texi: Document it.
+   * gimple.h (gimple_has_substatements, CASE_GIMPLE_OMP)
+   (is_gimple_omp_oacc_specifically): Handle it.
+   (gimple_statement_oacc_kernels): New struct.
+   (gimple_build_oacc_kernels): New prototype.
+   (gimple_oacc_kernels_clauses, gimple_oacc_kernels_clauses_ptr)
+   (gimple_oacc_kernels_set_clauses, gimple_oacc_kernels_child_fn)
+   (gimple_oacc_kernels_child_fn_ptr)
+   (gimple_oacc_kernels_set_child_fn, gimple_oacc_kernels_data_arg)
+   (gimple_oacc_kernels_data_arg_ptr)
+   (gimple_oacc_kernels_set_data_arg): New inline functions.
+   * gimple.c (gimple_build_oacc_kernels): New function.
+   (gimple_copy): Handle GIMPLE_OACC_KERNELS.
+   * gimple-low.c (lower_stmt): Likewise.
+   * gimple-walk.c (walk_gimple_op, walk_gimple_stmt): Likewise.
+   * gimple-pretty-print.c (pp_gimple_stmt_1): Likewise.
+   (dump_gimple_oacc_parallel): Rename to dump_gimple_oacc_offload.
+   Also handle GIMPLE_OACC_KERNELS.  Update all callers.
+   * gimplify.c (gimplify_omp_workshare, gimplify_expr): Handle
+   OACC_KERNELS.
+   * oacc-builtins.def (BUILT_IN_GOACC_KERNELS): New builtin.
+   * omp-low.c (scan_oacc_parallel, expand_oacc_parallel)
+   (lower_oacc_parallel): Rename to scan_oacc_offload,
+   expand_oacc_offload, and lower_oacc_offload.  Also handle
+   GIMPLE_OACC_KERNELS.  Update all callers.
+   (scan_sharing_clauses, scan_omp_1_stmt, expand_omp, lower_omp_1)
+   (diagnose_sb_0, diagnose_sb_1, diagnose_sb_2)
+   (make_gimple_omp_edges): Handle GIMPLE_OACC_KERNELS.
+   * tree-inline.c (remap_gimple_stmt, estimate_num_insns)

[gomp4 2/2] Initial support for the OpenACC kernels construct in the C front end.

2014-02-28 Thread Thomas Schwinge

From: tschwinge 

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add "kernels".
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_KERNELS.
gcc/c/
* c-parser.c (OACC_KERNELS_CLAUSE_MASK): New macro definition.
(c_parser_oacc_kernels): New function.
(c_parser_omp_construct): Handle PRAGMA_OACC_KERNELS.
* c-tree.h (c_finish_oacc_kernels): New prototype.
* c-typeck.c (c_finish_oacc_kernels): New function.
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-fail-1.c: Extend for OpenACC
kernels construct.
* c-c++-common/goacc/clauses-fail.c: Likewise.
* c-c++-common/goacc/data-clause-duplicate-1.c: Likewise.
* c-c++-common/goacc/deviceptr-1.c: Likewise.
* c-c++-common/goacc/nesting-fail-1.c: Likewise.
* c-c++-common/goacc/kernels-1.c: New file.
* gcc.dg/goacc/parallel-sb-1.c: Rename to...
* gcc.dg/goacc/sb-1.c: ... this new file, and extend for OpenACC
kernels and data constructs.
* gcc.dg/goacc/parallel-sb-2.c: Rename to...
* gcc.dg/goacc/sb-2.c: ... this new file, and extend for OpenACC
kernels and data constructs.
libgomp/
* testsuite/libgomp.oacc-c/goacc_kernels.c: New file.
* testsuite/libgomp.oacc-c/kernels-1.c: Likewise.
* testsuite/libgomp.oacc-c/parallel-1.c: Add one missing test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208216 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c-family/ChangeLog.gomp|   5 +
 gcc/c-family/c-pragma.c|   1 +
 gcc/c-family/c-pragma.h|   1 +
 gcc/c/ChangeLog.gomp   |   8 +
 gcc/c/c-parser.c   |  42 +
 gcc/c/c-tree.h |   1 +
 gcc/c/c-typeck.c   |  19 +++
 gcc/testsuite/ChangeLog.gomp   |  16 ++
 .../c-c++-common/goacc-gomp/nesting-fail-1.c   |  84 ++
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c|   3 +
 .../c-c++-common/goacc/data-clause-duplicate-1.c   |   4 +-
 gcc/testsuite/c-c++-common/goacc/deviceptr-1.c |  18 +--
 gcc/testsuite/c-c++-common/goacc/kernels-1.c   |   6 +
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |  20 +++
 gcc/testsuite/gcc.dg/goacc/parallel-sb-1.c |  22 ---
 gcc/testsuite/gcc.dg/goacc/parallel-sb-2.c |  10 --
 gcc/testsuite/gcc.dg/goacc/sb-1.c  |  54 +++
 gcc/testsuite/gcc.dg/goacc/sb-2.c  |  22 +++
 libgomp/ChangeLog.gomp |   4 +
 libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c   |  25 +++
 libgomp/testsuite/libgomp.oacc-c/kernels-1.c   | 170 +
 libgomp/testsuite/libgomp.oacc-c/parallel-1.c  |  14 ++
 22 files changed, 506 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-1.c
 delete mode 100644 gcc/testsuite/gcc.dg/goacc/parallel-sb-1.c
 delete mode 100644 gcc/testsuite/gcc.dg/goacc/parallel-sb-2.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/sb-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/sb-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/goacc_kernels.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/kernels-1.c

diff --git gcc/c-family/ChangeLog.gomp gcc/c-family/ChangeLog.gomp
index 3da377f..3b4a335 100644
--- gcc/c-family/ChangeLog.gomp
+++ gcc/c-family/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-02-28  Thomas Schwinge  
+
+   * c-pragma.c (oacc_pragmas): Add "kernels".
+   * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_KERNELS.
+
 2014-02-21  Thomas Schwinge  
 
* c-pragma.c (oacc_pragmas): Add "data".
diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c
index 08374aa..ee0ee93 100644
--- gcc/c-family/c-pragma.c
+++ gcc/c-family/c-pragma.c
@@ -1170,6 +1170,7 @@ static vec registered_pp_pragmas;
 struct omp_pragma_def { const char *name; unsigned int id; };
 static const struct omp_pragma_def oacc_pragmas[] = {
   { "data", PRAGMA_OACC_DATA },
+  { "kernels", PRAGMA_OACC_KERNELS },
   { "parallel", PRAGMA_OACC_PARALLEL },
 };
 static const struct omp_pragma_def omp_pragmas[] = {
diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h
index d092f9f..d55a511 100644
--- gcc/c-family/c-pragma.h
+++ gcc/c-family/c-pragma.h
@@ -28,6 +28,7 @@ typedef enum pragma_kind {
   PRAGMA_NONE = 0,
 
   PRAGMA_OACC_DATA,
+  PRAGMA_OACC_KERNELS,
   PRAGMA_OACC_PARALLEL,
   PRAGMA_OMP_ATOMIC,
   PRAGMA_OMP_BARRIER,
diff --git gcc/c/ChangeLog.gomp gcc/c/ChangeLog.gomp
index 9b95725..0551026 100644
--- gcc/c/ChangeLog.gomp
+++ gcc/c/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2014-02-28  Thomas Schwinge  
+
+   * c-parser.c (OACC_KERNELS_CLAUSE_MASK): New macro definition.
+   (c_parser_oacc_kernels): New function.
+   (c_parser_omp_construct): Handle PRAGMA_OACC_KERNELS.
+   * c-tree
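
For orientation, the construct that these two gomp4 patches add support for looks roughly like the following in user code.  This is a sketch based on the OpenACC syntax for the kernels construct with data clauses; it is not taken from the new test files, and whether the branch at this revision accepts exactly these clauses has not been checked here.  The function name scale is made up for the example.

```c
/* Scale each element of in[] into out[].  When compiled without
   OpenACC support the pragma is ignored and the loop simply runs
   sequentially on the host.  */
void scale(const float *in, float *out, int n, float factor)
{
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
    {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * factor;
    }
}
```

The result is the same with or without offloading, which makes a function like this convenient for comparing host and accelerator runs.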

Re: [AArch64 05/14] Add AArch64 'prefetch'-pattern.

2014-02-28 Thread Dr. Philipp Tomsich

Ganesh,

On 28 Feb 2014, at 10:13 , Gopalasubramanian, Ganesh 
 wrote:

> I also have attached a patch that implements the following. 
> * Prefetch with immediate offset in the range 0 to 32760 (multiple of 8). 
> Added a predicate for this.
> * Prefetch with immediate offset - in the range -256 to 255 (Gets 
> generated only when we have a negative offset. Generates prfum instruction). 
> Added a predicate for this.
> * Prefetch with register offset. (modified for printing the locality)

These changes look good to me.
We’ll try them out on the benchmarks that caused us to add prefetching in the 
first place.

Best,
Philipp.

[Patch AArch64] Define TARGET_FLAGS_REGNUM

2014-02-28 Thread Ramana Radhakrishnan


Hi,

This defines TARGET_FLAGS_REGNUM for AArch64 to be CC_REGNUM.  This 
turns on the cmpelim pass after reload; in a few examples and a couple 
of benchmarks I noticed a number of comparisons being deleted.  
A similar patch for AArch32 is being tested.


Tested cross with aarch64-none-elf on a model with no regressions.

Ok for stage1 ?

regards
Ramana

  Ramana Radhakrishnan  

* config/aarch64/aarch64.c (TARGET_FLAGS_REGNUM): Define.


--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 901ad3de793c2dd6ca3a2458dc6268e56322400a..617f4de494b1c9fa366dcf4a9fc7f22e7d11642a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8536,6 +8536,9 @@ aarch64_cannot_change_mode_class (enum machine_mode from,
 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS aarch64_fixed_condition_code_regs
 
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM CC_REGNUM
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
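
As a hedged illustration of the kind of source where a comparison can become redundant: in the function below the subtraction result is immediately tested against zero, so a flag-setting subtract could in principle stand in for a separate compare.  Whether GCC with this patch actually removes the compare for this exact function has not been verified; the function name is made up for the example.

```c
/* Return 1 when a equals b, otherwise the difference a - b.  The
   test "d == 0" follows directly after the subtraction that computes
   d, which is the shape a compare-elimination pass looks for.  */
long diff_or_one(long a, long b)
{
    long d = a - b;
    if (d == 0)
        return 1;
    return d;
}
```

Comparing the assembly output with and without the TARGET_FLAGS_REGNUM definition would be one way to see whether the compare is deleted.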

Re: [PATCH/AARCH64 1/3] Add AARCH64 ILP32 PCH support

2014-02-28 Thread Richard Earnshaw

On 26/02/14 02:25, Andrew Pinski wrote:
> 
> Hi,
>   Just like most of the targets out there we should define
> TRY_EMPTY_VM_SPACE to have better PCH support.
> 
> OK?  Built and tested on aarch64-linux-gnu with no regressions.
> 
> Thanks,
> Andrew Pinski
> 
>   * config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
>   definition.
> ---
>  gcc/ChangeLog   |5 +
>  gcc/config/host-linux.c |4 +++-
>  2 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 616d8ec..fd2b6cd 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2014-02-25  Andrew Pinski  
> +
> + * config/host-linux.c (TRY_EMPTY_VM_SPACE): Change aarch64 ilp32
> + definition.
> +
>  2014-02-25  Vladimir Makarov  
>  
>   PR rtl-optimization/60317
> diff --git a/gcc/config/host-linux.c b/gcc/config/host-linux.c
> index 17048d7..b298a17 100644
> --- a/gcc/config/host-linux.c
> +++ b/gcc/config/host-linux.c
> @@ -86,8 +86,10 @@
>  # define TRY_EMPTY_VM_SPACE  0x6000
>  #elif defined(__mc68000__)
>  # define TRY_EMPTY_VM_SPACE  0x4000
> -#elif defined(__aarch64__)
> +#elif defined(__aarch64__) && defined(__LP64__)
>  # define TRY_EMPTY_VM_SPACE  0x10
> +#elif defined(__aarch64__)
> +# define TRY_EMPTY_VM_SPACE  0x6000
>  #elif defined(__ARM_EABI__)
>  # define TRY_EMPTY_VM_SPACE 0x6000
>  #elif defined(__mips__) && defined(__LP64__)
> 

I'd prefer to see this written as:


-#elif defined(__aarch64__)
+#elif defined(__aarch64__) && defined(__ILP32__)
 # define TRY_EMPTY_VM_SPACE  0x6000
+#elif defined(__aarch64__)
+# define TRY_EMPTY_VM_SPACE  0x10


since I'd expect a much higher likelihood of another variant that uses
64-bit pointers (e.g. LLP64) than of another variant that uses 32-bit
pointers.

R.
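
The ordering argument can be seen in a stand-alone sketch: test for the one 32-bit ABI first and let every other AArch64 ABI fall through to the 64-bit default.  The DEMO_* names and values below are hypothetical placeholders, not the real address hints from host-linux.c.

```c
/* Hypothetical placeholder values; the real hints are addresses
   defined in gcc/config/host-linux.c.  */
#define DEMO_HINT_NOT_AARCH64      0
#define DEMO_HINT_ILP32            1
#define DEMO_HINT_64BIT_POINTERS   2

#if defined(__aarch64__) && defined(__ILP32__)
/* The only AArch64 ABI with 32-bit pointers gets the special case.  */
# define DEMO_TRY_EMPTY_VM_SPACE DEMO_HINT_ILP32
#elif defined(__aarch64__)
/* LP64, and any future ABI with 64-bit pointers, lands here.  */
# define DEMO_TRY_EMPTY_VM_SPACE DEMO_HINT_64BIT_POINTERS
#else
# define DEMO_TRY_EMPTY_VM_SPACE DEMO_HINT_NOT_AARCH64
#endif

/* Report which branch of the conditional was selected at build time.  */
int demo_vm_space_hint(void)
{
    return DEMO_TRY_EMPTY_VM_SPACE;
}
```

With the test written this way, a new ABI with 64-bit pointers needs no change to the conditional, whereas the __LP64__ form would silently give it the 32-bit hint.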

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread Richard Biener

On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
 wrote:
> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu  wrote:
>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng  wrote:
>>> Hi,
>>> This patch fixes the regression reported in PR60280 by removing forwarder
>>> loop preheaders/latches in cfg cleanup where possible.  Several tests are
>>> broken by this change, since cfg cleanup is shared by all optimizers.  Some
>>> tests have already been fixed by recent patches; I went through and fixed
>>> the others.  One case that needs clarifying is
>>> "gcc.dg/tree-prof/update-loopch.c".  When GCC removes a basic block, it
>>> checks profile information by calling check_bb_profile after redirecting
>>> the incoming edges of the bb.  This results in warnings about invalid
>>> profile information and causes the test to fail.  I will send a patch in
>>> stage 1 to skip checking profile information for a basic block that is
>>> being removed, if that sounds reasonable.  For now I just tweaked the test
>>> case itself.
>>>
>>> Bootstrapped and tested on x86_64 and arm_a15.
>>>
>>> Is it OK?
>>>
>>>
>>> 2014-02-25  Bin Cheng  
>>>
>>> PR target/60280
>>> * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>> preheaders and latches only if requested.  Fix latch if it
>>> is removed.
>>> * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>> LOOPS_HAVE_PREHEADERS.
>>>
>>
>> This change:
>>
>> if (dest->loop_father->header == dest)
>> -  return false;
>> +  {
>> +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>> +&& bb->loop_father->header != dest)
>> +  return false;
>> +
>> +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>> +&& bb->loop_father->header == dest)
>> +  return false;
>> +  }
>>  }
>>
>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>
>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>> -fuse-linker-plugin
>>
>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>> true.  I don't have a small testcase.  But this patch:
>>
>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>> index b5c384b..2ba673c 100644
>> --- a/gcc/tree-cfgcleanup.c
>> +++ b/gcc/tree-cfgcleanup.c
>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>>  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>  && bb->loop_father->header == dest)
>>return false;
>> +
>> +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>> +&& !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>> +  return false;
>>}
>>  }
>>
>> fixes the regression.  Does it make any sense?
>
> I think the preheader test isn't fully correct (bb may be in an inner loop
> for example).  So a more conservative variant would be
>
> Index: gcc/tree-cfgcleanup.c
> ===
> --- gcc/tree-cfgcleanup.c   (revision 208169)
> +++ gcc/tree-cfgcleanup.c   (working copy)
> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>/* Protect loop preheaders and latches if requested.  */
>if (dest->loop_father->header == dest)
> {
> - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
> - && bb->loop_father->header != dest)
> -   return false;
> -
> - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
> - && bb->loop_father->header == dest)
> -   return false;
> + if (bb->loop_father == dest->loop_father)
> +   return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
> + else if (bb->loop_father == loop_outer (dest->loop_father))
> +   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
> + /* Always preserve other edges into loop headers that are
> +not simple latches or preheaders.  */
> + return false;
> }
>  }
>
> that makes sure we can properly update loop information.  It's also
> a more conservative change at this point which should still successfully
> remove simple latches and preheaders created by loop discovery.

I think the patch makes sense anyway and thus I'll install it once it
has passed bootstrap / regtesting.

Another fix that may make sense is to restrict it to
!loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
itself can end up setting that ... which we eventually should fix if it
still happens.  That is, check if

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 208169)
+++ gcc/tree-cfgcleanup.c   (working copy)

@@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)

   timevar_pop (TV_TREE_CLEANUP_CFG);

-  if (changed && current_loops)
-loops_state_set (LOOPS_NEED_FIXUP);
+  if (changed && current_loops
+  && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+verify_loop_stru

Re: [AArch64] 64-bit float vreinterpret implementation

2014-02-28 Thread Alex Velenko


On 25/02/14 18:15, Richard Henderson wrote:

On 02/25/2014 09:02 AM, Alex Velenko wrote:

+(define_expand "aarch64_reinterpretdf"
+  [(match_operand:DF 0 "register_operand" "")
+   (match_operand:VD_RE 1 "register_operand" "")]
+  "TARGET_SIMD"
+{
+  aarch64_simd_reinterpret (operands[0], operands[1]);
+  DONE;
+})


I believe you want to implement these in aarch64_fold_builtin to fold to a
VIEW_CONVERT_EXPR.  No sense in leaving these opaque until rtl expansion.


r~



Hi Richard,
Thank you for your suggestion. Attached is a patch that implements
it. The testsuite was run on LE and BE compilers with no regressions.

Here is the description of the patch:

This patch implements vreinterpret for vectors with 64-bit float lanes 
and adds a testcase for those intrinsics.


Thanks,
Alex
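
As a portable illustration of what a reinterpret means (same bits, different type, no value conversion), here is a scalar sketch using memcpy.  The function names are hypothetical stand-ins and are not the arm_neon.h intrinsics; the real intrinsics operate on vector types and are folded to a VIEW_CONVERT_EXPR as Richard suggested.

```c
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of a double as a 64-bit unsigned integer.
   No numeric conversion takes place.  */
uint64_t demo_reinterpret_u64_f64(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Inverse operation: treat a 64-bit pattern as a double.  */
double demo_reinterpret_f64_u64(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

On a platform where double is IEEE 754 binary64, demo_reinterpret_u64_f64 (1.0) yields 0x3FF0000000000000, and a round trip through both functions returns the original value unchanged.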

gcc/

2014-02-28  Alex Velenko  

* config/aarch64/aarch64-builtins.c (TYPES_REINTERP): Removed.
(aarch64_types_signed_unsigned_qualifiers): Qualifier added.
(aarch64_types_signed_poly_qualifiers): Likewise.
(aarch64_types_unsigned_signed_qualifiers): Likewise.
(aarch64_types_poly_signed_qualifiers): Likewise.
(TYPES_REINTERP_SS): Type macro added.
(TYPES_REINTERP_SU): Likewise.
(TYPES_REINTERP_SP): Likewise.
(TYPES_REINTERP_US): Likewise.
(TYPES_REINTERP_PS): Likewise.
(aarch64_fold_builtin): New expression folding added.
* config/aarch64/aarch64-simd-builtins.def (REINTERP):
Declarations removed.
(REINTERP_SS): Declarations added.
(REINTERP_US): Likewise.
(REINTERP_PS): Likewise.
(REINTERP_SU): Likewise.
(REINTERP_SP): Likewise.
* config/aarch64/arm_neon.h (vreinterpret_p8_f64): Implemented.
(vreinterpretq_p8_f64): Likewise.
(vreinterpret_p16_f64): Likewise.
(vreinterpretq_p16_f64): Likewise.
(vreinterpret_f32_f64): Likewise.
(vreinterpretq_f32_f64): Likewise.
(vreinterpret_f64_f32): Likewise.
(vreinterpret_f64_p8): Likewise.
(vreinterpret_f64_p16): Likewise.
(vreinterpret_f64_s8): Likewise.
(vreinterpret_f64_s16): Likewise.
(vreinterpret_f64_s32): Likewise.
(vreinterpret_f64_s64): Likewise.
(vreinterpret_f64_u8): Likewise.
(vreinterpret_f64_u16): Likewise.
(vreinterpret_f64_u32): Likewise.
(vreinterpret_f64_u64): Likewise.
(vreinterpretq_f64_f32): Likewise.
(vreinterpretq_f64_p8): Likewise.
(vreinterpretq_f64_p16): Likewise.
(vreinterpretq_f64_s8): Likewise.
(vreinterpretq_f64_s16): Likewise.
(vreinterpretq_f64_s32): Likewise.
(vreinterpretq_f64_s64): Likewise.
(vreinterpretq_f64_u8): Likewise.
(vreinterpretq_f64_u16): Likewise.
(vreinterpretq_f64_u32): Likewise.
(vreinterpretq_f64_u64): Likewise.
(vreinterpret_s64_f64): Likewise.
(vreinterpretq_s64_f64): Likewise.
(vreinterpret_u64_f64): Likewise.
(vreinterpretq_u64_f64): Likewise.
(vreinterpret_s8_f64): Likewise.
(vreinterpretq_s8_f64): Likewise.
(vreinterpret_s16_f64): Likewise.
(vreinterpretq_s16_f64): Likewise.
(vreinterpret_s32_f64): Likewise.
(vreinterpretq_s32_f64): Likewise.
(vreinterpret_u8_f64): Likewise.
(vreinterpretq_u8_f64): Likewise.
(vreinterpret_u16_f64): Likewise.
(vreinterpretq_u16_f64): Likewise.
(vreinterpret_u32_f64): Likewise.
(vreinterpretq_u32_f64): Likewise.

gcc/testsuite/

2014-02-28  Alex Velenko  

* gcc.target/aarch64/vreinterpret_f64_1.c: new_testcase
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 5e0e9b94653deb1530955d62d9842c39da95058a..8241f918e3fcfb71144daf1c873ba1ed481a4385 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -147,6 +147,23 @@ aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned };
 #define TYPES_UNOPU (aarch64_types_unopu_qualifiers)
 #define TYPES_CREATE (aarch64_types_unop_qualifiers)
+#define TYPES_REINTERP_SS (aarch64_types_unop_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unop_su_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned };
+#define TYPES_REINTERP_SU (aarch64_types_unop_su_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unop_sp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_poly };
+#define TYPES_REINTERP_SP (aarch64_types_unop_sp_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unop_us_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define TYPES_REINTERP_US (aarch64_types_unop_us_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unop_ps_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_pol

Re: [C++ Patch] PR 60314 (ICE with decltype(auto))

2014-02-28 Thread Paolo Carlini


Hi,

On 02/27/2014 08:29 PM, Jason Merrill wrote:
> On 02/25/2014 05:03 AM, Paolo Carlini wrote:
>> here we ICE exactly as we did in c++/53756: the only difference is the
>> use of decltype(auto) instead of auto. Now, if we compare is_cxx_auto to
>> is_auto (the front-end helper), evidently there is an inconsistency
>> about the handling of decltype(auto) and the below fixes the ICE.
>> However, also clearly the patchlet needs a review, because an out of
>> class decltype(auto) is already fine. Also, I'm not 100% sure we don't
>> need a decltype_auto_die, etc.
>
> I think we do need a decltype_auto_die.

Ok, then I tested on x86_64-linux the below.

Thanks!
Paolo.

///
2014-02-28  Paolo Carlini  

PR c++/60314
* dwarf2out.c (decltype_auto_die): New static.
(gen_subprogram_die): Handle 'decltype(auto)' like 'auto'.
(gen_type_die_with_usage): Handle 'decltype(auto)'.
(is_cxx_auto): Likewise.

/testsuite
2014-02-28  Paolo Carlini  

PR c++/60314
* g++.dg/cpp1y/auto-fn24.C: New.
Index: dwarf2out.c
===
--- dwarf2out.c (revision 208214)
+++ dwarf2out.c (working copy)
@@ -250,6 +250,9 @@ static GTY(()) section *cold_text_section;
 /* The DIE for C++1y 'auto' in a function return type.  */
 static GTY(()) dw_die_ref auto_die;
 
+/* The DIE for C++1y 'decltype(auto)' in a function return type.  */
+static GTY(()) dw_die_ref decltype_auto_die;
+
 /* Forward declarations for functions defined in this file.  */
 
 static char *stripattributes (const char *);
@@ -10230,7 +10233,8 @@ is_cxx_auto (tree type)
   tree name = TYPE_NAME (type);
   if (TREE_CODE (name) == TYPE_DECL)
name = DECL_NAME (name);
-  if (name == get_identifier ("auto"))
+  if (name == get_identifier ("auto")
+ || name == get_identifier ("decltype(auto)"))
return true;
 }
   return false;
@@ -18022,10 +18026,11 @@ gen_subprogram_die (tree decl, dw_die_ref context_
  if (get_AT_unsigned (old_die, DW_AT_decl_line) != (unsigned) s.line)
add_AT_unsigned (subr_die, DW_AT_decl_line, s.line);
 
- /* If the prototype had an 'auto' return type, emit the real
-type on the definition die.  */
+ /* If the prototype had an 'auto' or 'decltype(auto)' return type,
+emit the real type on the definition die.  */
  if (is_cxx() && debug_info_level > DINFO_LEVEL_TERSE
- && get_AT_ref (old_die, DW_AT_type) == auto_die)
+ && (get_AT_ref (old_die, DW_AT_type) == auto_die
+ || get_AT_ref (old_die, DW_AT_type) == decltype_auto_die))
add_type_attribute (subr_die, TREE_TYPE (TREE_TYPE (decl)),
0, 0, context_die);
}
@@ -19852,13 +19857,18 @@ gen_type_die_with_usage (tree type, dw_die_ref con
 default:
   if (is_cxx_auto (type))
{
- if (!auto_die)
+ tree name = TYPE_NAME (type);
+ if (TREE_CODE (name) == TYPE_DECL)
+   name = DECL_NAME (name);
+ dw_die_ref *die = (name == get_identifier ("auto")
+? &auto_die : &decltype_auto_die);
+ if (!*die)
{
- auto_die = new_die (DW_TAG_unspecified_type,
- comp_unit_die (), NULL_TREE);
- add_name_attribute (auto_die, "auto");
+ *die = new_die (DW_TAG_unspecified_type,
+ comp_unit_die (), NULL_TREE);
+ add_name_attribute (*die, IDENTIFIER_POINTER (name));
}
- equate_type_number_to_die (type, auto_die);
+ equate_type_number_to_die (type, *die);
  break;
}
   gcc_unreachable ();
Index: testsuite/g++.dg/cpp1y/auto-fn24.C
===
--- testsuite/g++.dg/cpp1y/auto-fn24.C  (revision 0)
+++ testsuite/g++.dg/cpp1y/auto-fn24.C  (working copy)
@@ -0,0 +1,12 @@
+// PR c++/60314
+// { dg-options "-std=c++1y -g" }
+
+// fine
+decltype(auto) qux() { return 42; }
+
+struct foo
+{
+  // also ICEs if not static 
+  static decltype(auto) bar()
+  { return 42; }
+};
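
As a reminder of why 'auto' and 'decltype(auto)' are distinct placeholders at
the language level (and so plausibly deserve distinct names in debug info),
here is a minimal C++14 sketch. The names counter, by_auto, by_decltype_auto
and placeholders_differ are made up for illustration and do not come from the
patch or its testcase.

```cpp
#include <cassert>
#include <type_traits>

int counter = 42;

// Plain 'auto' deduces by value, so the parentheses make no difference:
// the return type is 'int'.
auto by_auto () { return (counter); }

// 'decltype(auto)' applies decltype rules to the returned expression.
// '(counter)' is an lvalue expression, so the return type is 'int &'.
decltype(auto) by_decltype_auto () { return (counter); }

static_assert (std::is_same<decltype (by_auto ()), int>::value,
               "auto deduces a value type");
static_assert (std::is_same<decltype (by_decltype_auto ()), int &>::value,
               "decltype(auto) keeps the reference");

// Run-time view of the same difference: writing through the reference
// returned by by_decltype_auto changes 'counter' itself.
inline bool placeholders_differ ()
{
  by_decltype_auto () = 7;
  return counter == 7 && by_auto () == 7;
}
```

Building this with -std=c++1y -g (the options used by the new testcase) should
exercise both placeholders in the debug-info path.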

Re: [PATCH v4] PR middle-end/60281

2014-02-28 Thread lin zuojian

On 2014-02-28 15:58, lin zuojian wrote:
> Hi Bernd,
> I agree with you about the mode problem.
>
> And I have not changed the stack alignment. What I changed is the virtual
> register base's alignment.
> Realignment must be done on a !STRICT_ALIGNMENT machine, or emitting
> efficient code is impossible.
Sorry, it should be "Realignment must be done on a STRICT_ALIGNMENT machine".
> For example, four "set mem:QI X, REG:QI Y" will not combine into one
> "set mem:SI X1, REG:SI Y1" if X is not known to be SImode aligned.
> To make sure X is SImode aligned, the virtual register base must be realigned.
>
> For this patch, I only intend to make it right. Making it best is the next task.
> --
> Regards
> lin zuojian.
>
> On 2014-02-28 15:47, Bernd Edlinger wrote:
>> Hi,
>>
>> I see the problem too.
>>
>> But I think it is not necessary to change the stack alignment
>> to solve the problem.
>>
>> It appears to me that the code in asan_emit_stack_protection
>> is just wrong. It uses SImode when the memory is not aligned
>> enough for that mode. This would not happen if that code
>> is rewritten to use get_best_mode, and by the way, even on
>> x86_64 the emitted code is not optimal, because that target
>> could work with DImode more efficiently.
>>
>> So, to fix that, it would be better to concentrate on that function,
>> and use word_mode instead of SImode, and let get_best_mode
>> choose the required mode.
>>
>>
>> Regards
>> Bernd Edlinger.
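
The alignment argument in this thread can be sketched as follows: on a
strict-alignment target a wide store is only usable when the address is known
to be aligned for that width, so the widest usable store shrinks as alignment
drops. The helper below is hypothetical and is not GCC's get_best_mode; it
assumes word_bytes is a power of two and remaining is at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Widest power-of-two store, in bytes, that may be used at ADDR.
// It is limited by the machine word size, by the number of bytes still
// to be written, and by the natural alignment of the address.
// Hypothetical helper for illustration only.
inline std::size_t
widest_aligned_store (std::uintptr_t addr, std::size_t remaining,
                      std::size_t word_bytes)
{
  std::size_t width = word_bytes;
  // Halve the width until it fits the remaining bytes and the address
  // is a multiple of it.  Width 1 always qualifies.
  while (width > 1 && (width > remaining || addr % width != 0))
    width /= 2;
  return width;
}
```

With a 4-byte word, an address that is only 2-byte aligned yields a 2-byte
store, which is the situation where four byte stores cannot be merged into a
single word store.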

Re: [patch] [arm] Fix PR60169 - thumb1 far jump

2014-02-28 Thread Ramana Radhakrishnan

On Fri, Feb 28, 2014 at 2:42 AM, Joey Ye  wrote:
> Ping. OK for trunk and 4.8?

Ok if no regressions.

Ramana

>
>> -Original Message-
>> From: Joey Ye [mailto:joey...@arm.com]
>> Sent: 21 February 2014 19:32
>> To: gcc-patches@gcc.gnu.org
>> Subject: [patch] [arm] Fix PR60169 - thumb1 far jump
>>
>> Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced
>> this ICE:
>>
>> 1. thumb1 estimates whether far_jump is used based on function insn size.
>> 2. During reload, after the stack layout is finalized, it does
>> reload_as_needed. This however increases insn size, which changes the
>> estimation result of far_jump, which in turn requires saving lr and
>> changing the stack layout again. Since there is no chance to change it
>> at that point, GCC crashes.
>>
>> Solution:
>> Do not change estimation result of far_jump if reload_in_progress or
>> reload_completed is true.
>>
>> Likely no need to fix LRA, according to Vlad:
>> http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html
>>
>> ChangeLog:
>> * config/arm/arm.c (thumb_far_jump_used_p): Don't change
>>   if reload in progress or completed.
>>
>> * gcc.target/arm/thumb1-far-jump-3.c: New case.

[PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.

2014-02-28 Thread Kirill Yukhin

Hello,
This is a relatively obvious patch which eliminates comparison
of infinities for the exp2 AVX-512 test and properly compares floats
for avx512f-sqrtps-2.c.

Tests pass.

Is it ok for trunk?

gcc/testsuite/
* gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
argument to avoid inf values.
* gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with
UNION_FP_CHECK machinery.

--
Thanks, K

---
 gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c 
b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
index 06ef68c..ab911c0 100644
--- a/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512er-vexp2ps-2.c
@@ -25,7 +25,7 @@ avx512er_test (void)

   for (i = 0; i < 16; i++)
 {
-  src.a[i] = 179.345 - 6.5645 * i;
+  src.a[i] = 79.345 - 6.5645 * i;
   res2.a[i] = DEFAULT_VALUE;
 }

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c 
b/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
index 5249bbd..f5a7b78 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vdivps-2.c
@@ -46,10 +46,10 @@ TEST (void)
 abort ();

   MASK_MERGE () (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN,) (res2, res_ref))
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res2, res_ref))
 abort ();

   MASK_ZERO () (res_ref, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN,) (res3, res_ref))
+  if (UNION_FP_CHECK (AVX512F_LEN,) (res3, res_ref))
 abort ();
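
To see why lowering the base from 179.345 to 79.345 avoids inf values, here is
a small sketch that counts how many of the 16 lanes get an exponent argument
at or above FLT_MAX_EXP. It assumes IEEE single precision (FLT_MAX_EXP is 128),
where 2^x for x >= 128 exceeds FLT_MAX and so becomes +inf as a float. The
function name lanes_overflowing_float is made up for illustration.

```cpp
#include <cassert>
#include <cfloat>

// Count the lanes whose exponent argument is at least FLT_MAX_EXP.
// For such x, 2^x is larger than FLT_MAX, so a float result is +inf.
inline int
lanes_overflowing_float (double base)
{
  int count = 0;
  for (int i = 0; i < 16; i++)
    {
      // Same formula as the test's initialisation of src.a[i].
      double x = base - 6.5645 * i;
      if (x >= FLT_MAX_EXP)
        count++;
    }
  return count;
}
```

Under these assumptions the old base puts lanes 0 through 7 at or above the
limit, while the new base puts none there.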

Re: [PATCH] [libgcc,arm] Fix PR 60166 - NAN fraction bits

2014-02-28 Thread Ramana Radhakrishnan

On Fri, Feb 28, 2014 at 7:16 AM, Joey Ye  wrote:
> This patch is a mirror copy from approved patch in glibc:
> http://sourceware.org/ml/libc-alpha/2014-02/msg00741.html
>
> OK to trunk, 4.8 and 4.7?

OK everywhere.

Ramana

>
> ChangeLog.libgcc:
>
> * config/arm/sfp-machine.h (_FP_NANFRAC_H,
>   _FP_NANFRAC_S, _FP_NANFRAC_D, _FP_NANFRAC_Q):
>   Set to zero.
>
> diff --git a/libgcc/config/arm/sfp-machine.h
> b/libgcc/config/arm/sfp-machine.h
> index bb34895..8d45320 100644
> --- a/libgcc/config/arm/sfp-machine.h
> +++ b/libgcc/config/arm/sfp-machine.h
> @@ -19,10 +19,12 @@ typedef int __gcc_CMPtype __attribute__ ((mode
> (__libgcc_cmp_return__)));
>  #define _FP_DIV_MEAT_D(R,X,Y)  _FP_DIV_MEAT_2_udiv(D,R,X,Y)
>  #define _FP_DIV_MEAT_Q(R,X,Y)  _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
>
> -#define _FP_NANFRAC_H  ((_FP_QNANBIT_H << 1) - 1)
> -#define _FP_NANFRAC_S  ((_FP_QNANBIT_S << 1) - 1)
> -#define _FP_NANFRAC_D  ((_FP_QNANBIT_D << 1) - 1), -1
> -#define _FP_NANFRAC_Q  ((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
> +/* According to RTABI, QNAN is only with the most significant bit of the
> +   significand set, and all other significand bits zero.  */
> +#define _FP_NANFRAC_H  0
> +#define _FP_NANFRAC_S  0
> +#define _FP_NANFRAC_D  0, 0
> +#define _FP_NANFRAC_Q  0, 0, 0, 0
>  #define _FP_NANSIGN_H  0
>  #define _FP_NANSIGN_S  0
>  #define _FP_NANSIGN_D  0
>
>
>
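
The effect of the macro change on single precision can be illustrated with
plain bit arithmetic. This is only a sketch: the thread does not show how
soft-fp combines _FP_NANFRAC_S with the quiet bit, so the code below assumes
the quiet bit is ORed into the fraction. kQuietBit stands in for
_FP_QNANBIT_S; all names here are made up.

```cpp
#include <cassert>
#include <cstdint>

// IEEE single precision: 23 stored fraction bits, of which the top one
// is the quiet bit; the exponent field is 8 bits starting at bit 23.
constexpr std::uint32_t kQuietBit = UINT32_C (1) << 22;
constexpr std::uint32_t kExpAllOnes = UINT32_C (0xff) << 23;

// Assemble a positive quiet NaN from a fraction value.
// Assumption: the quiet bit is ORed in on top of the fraction.
constexpr std::uint32_t
make_qnan (std::uint32_t frac)
{
  return kExpAllOnes | kQuietBit | frac;
}

// Old definition: ((_FP_QNANBIT_S << 1) - 1), i.e. every fraction bit set.
constexpr std::uint32_t kOldFrac = (kQuietBit << 1) - 1;
// New definition: 0, i.e. only the quiet bit remains set.
constexpr std::uint32_t kNewFrac = 0;

static_assert (make_qnan (kOldFrac) == 0x7fffffffu, "all fraction bits set");
static_assert (make_qnan (kNewFrac) == 0x7fc00000u, "only the quiet bit set");
```

The second pattern matches the comment added by the patch: only the most
significant bit of the significand is set.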

[PATCH] Restrict and fix the PR60280 fix

2014-02-28 Thread Richard Biener


This narrows down the effect of the PR60280 fix (removing more
forwarder blocks during cfg-cleanup when loops are present) to
only remove forwarders of the kind loop_optimizer_init would create.
It also fixes the loop latch updating in remove_forwarder_block
(though that doesn't have any immediate effect as we fix up loops
anyway) - it was set to the wrong loop.  Which also made me
figure that we don't honor !LOOPS_MAY_HAVE_MULTIPLE_LATCHES
properly (also fixed).

Maybe any of the above will fix the gromacs miscompare HJ is seeing
(can't reproduce it).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-02-28  Richard Biener  

PR target/60280
* tree-cfgcleanup.c (tree_forwarder_block_p): Restrict
previous fix and only allow to remove trivial pre-headers
and latches.  Also honor LOOPS_MAY_HAVE_MULTIPLE_LATCHES.
(remove_forwarder_block): Properly update the latch of
a loop.

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 208216)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -316,13 +316,22 @@ tree_forwarder_block_p (basic_block bb,
   /* Protect loop preheaders and latches if requested.  */
   if (dest->loop_father->header == dest)
{
- if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
- && bb->loop_father->header != dest)
-   return false;
-
- if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
- && bb->loop_father->header == dest)
-   return false;
+ if (bb->loop_father == dest->loop_father)
+   {
+ if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
+   return false;
+ /* If bb doesn't have a single predecessor we'd make this
+loop have multiple latches.  Don't do that if that
+would in turn require disambiguating them.  */
+ return (single_pred_p (bb)
+ || loops_state_satisfies_p
+  (LOOPS_MAY_HAVE_MULTIPLE_LATCHES));
+   }
+ else if (bb->loop_father == loop_outer (dest->loop_father))
+   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
+ /* Always preserve other edges into loop headers that are
+not simple latches or preheaders.  */
+ return false;
}
 }
 
@@ -417,6 +426,10 @@ remove_forwarder_block (basic_block bb)
 
   can_move_debug_stmts = MAY_HAVE_DEBUG_STMTS && single_pred_p (dest);
 
+  basic_block pred = NULL;
+  if (single_pred_p (bb))
+pred = single_pred (bb);
+
   /* Redirect the edges.  */
   for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
 {
@@ -510,7 +523,7 @@ remove_forwarder_block (basic_block bb)
   /* Adjust latch infomation of BB's parent loop as otherwise
  the cfg hook has a hard time not to kill the loop.  */
   if (current_loops && bb->loop_father->latch == bb)
-bb->loop_father->latch = dest;
+bb->loop_father->latch = pred;
 
   /* And kill the forwarder block.  */
   delete_basic_block (bb);
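
The new decision logic for a forwarder block whose destination is a loop
header can be modelled in isolation. The sketch below is a toy model, not GCC
code: the flag names and the bb_position enum are hypothetical stand-ins for
the loops_state_satisfies_p checks and the loop_father comparisons in the
patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's loop-state flags.
enum loop_state : unsigned
{
  HAVE_PREHEADERS = 1u << 0,
  HAVE_SIMPLE_LATCHES = 1u << 1,
  MAY_HAVE_MULTIPLE_LATCHES = 1u << 2
};

// Where the forwarder block BB sits relative to the loop headed by DEST.
enum class bb_position { same_loop, immediately_outer_loop, elsewhere };

// Models only the branch taken when DEST is a loop header.
inline bool
may_remove_forwarder_into_header (bb_position pos, bool bb_has_single_pred,
                                  unsigned state)
{
  switch (pos)
    {
    case bb_position::same_loop:
      // BB is a latch.  Keep it if simple latches are required.
      if (state & HAVE_SIMPLE_LATCHES)
        return false;
      // Removing a latch with several predecessors creates multiple
      // latches, which is only fine if the loop state allows that.
      return bb_has_single_pred || (state & MAY_HAVE_MULTIPLE_LATCHES);
    case bb_position::immediately_outer_loop:
      // BB acts as a preheader.  Keep it if preheaders are required.
      return !(state & HAVE_PREHEADERS);
    default:
      // Any other edge into a loop header is always preserved.
      return false;
    }
}
```

For example, a latch with two predecessors is removable in this model only
when MAY_HAVE_MULTIPLE_LATCHES is set and HAVE_SIMPLE_LATCHES is not.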

Re: [PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.

2014-02-28 Thread Uros Bizjak

On Fri, Feb 28, 2014 at 1:14 PM, Kirill Yukhin  wrote:
> Hello,
> This is a relatively obvious patch which eliminates comparison
> of infinities for the exp2 AVX-512 test and properly compares floats
> for avx512f-sqrtps-2.c.
>
> Tests pass.
>
> Is it ok for trunk?
>
> gcc/testsuite/
> * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
> argument to avoid inf values.
> * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with
> UNION_FP_CHECK machinery.

You are talking about avx512f-sqrtps-2.c, the ChangeLog refers to
avx512er-vexp2ps-2.c, but the patch is modifying avx512f-vdivps-2.c.

Uros.

Re: copyright dates in binutils (and includes/)

2014-02-28 Thread Joel Brobecker

> Joseph, do you know why implicitly adding years to the claimed
> copyright years is a problem?  I'm guessing the file needs to be
> published somewhere for each year claimed.

IANAL, but from 2 discussions with copyright-clerk:

  1. We start claiming copyright the year the file was committed
 to a medium (hard drive), not the year it was published.

  2. As long as we have evidence of a copyrightable change each year,
 we can include that year in the list of copyright years in
 all files' headers.

For (2), this is how I asked the FSF:

> My question is: As we have evidence of copyrightable changes to the
> GDB project every year since 1986, is it acceptable to fix the copyright
> headers to add the missing holes? And if yes, is it acceptable to go
> straight to the next step, which is reducing the copyright years to
> a single range, even if the original list had holes in it? (we will
> make sure that the first year of the range is always 1986 or later,
> or else investigate to make sure that the range is correct).
>
> For example, we would reduce:
>
> > Copyright (C) 1986, 1988-1989, 1991-1993, 1999-2000, 2007-2012 Free
> > Software Foundation, Inc.
>
> into:
>
> > 1986-2012 Free Software Foundation, Inc.
>
> Naturally, if the initial year was 1995, then it would be the year
> used as the start of the range!

... to which they answered that it would be acceptable.

Does it mean that the sources needed to be made public that year for
us to be able to claim copyright that year? It did not seem so to me.
But you could ask the FSF (copyright DASH clerk AT fsf DOT org).

-- 
Joel

[jit] New API entrypoint: gcc_jit_block_get_function

2014-02-28 Thread David Malcolm

Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit.h (gcc_jit_block_get_function): New.
* libgccjit.map (gcc_jit_block_get_function): New.
* libgccjit++.h (gccjit::block::get_function): New method.
* libgccjit.c (gcc_jit_block_get_function): New.
---
 gcc/jit/ChangeLog.jit | 7 +++
 gcc/jit/libgccjit++.h | 8 
 gcc/jit/libgccjit.c   | 8 
 gcc/jit/libgccjit.h   | 4 
 gcc/jit/libgccjit.map | 1 +
 5 files changed, 28 insertions(+)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index c7b2395..6c43ce9 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,3 +1,10 @@
+2014-02-28  David Malcolm  
+
+   * libgccjit.h (gcc_jit_block_get_function): New.
+   * libgccjit.map (gcc_jit_block_get_function): New.
+   * libgccjit++.h (gccjit::block::get_function): New method.
+   * libgccjit.c (gcc_jit_block_get_function): New.
+
 2014-02-27  David Malcolm  
 
* libgccjit.h (gcc_jit_label): Delete in favor of...
diff --git a/gcc/jit/libgccjit++.h b/gcc/jit/libgccjit++.h
index a8801a3..7c1c3be 100644
--- a/gcc/jit/libgccjit++.h
+++ b/gcc/jit/libgccjit++.h
@@ -316,6 +316,8 @@ namespace gccjit
 
 gcc_jit_block *get_inner_block () const;
 
+function get_function () const;
+
 void add_eval (rvalue rvalue,
   location loc = location ());
 
@@ -1109,6 +,12 @@ function::new_local (type type_,
 name.c_str ()));
 }
 
+inline function
+block::get_function () const
+{
+  return function (gcc_jit_block_get_function ( get_inner_block ()));
+}
+
 inline void
 block::add_eval (rvalue rvalue,
 location loc)
diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index 1146261..ce7987c 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -591,6 +591,14 @@ gcc_jit_block_as_object (gcc_jit_block *block)
   return static_cast <gcc_jit_object *> (block->as_object ());
 }
 
+gcc_jit_function *
+gcc_jit_block_get_function (gcc_jit_block *block)
+{
+  RETURN_NULL_IF_FAIL (block, NULL, "NULL block");
+
+  return static_cast <gcc_jit_function *> (block->get_function ());
+}
+
 gcc_jit_lvalue *
 gcc_jit_context_new_global (gcc_jit_context *ctxt,
gcc_jit_location *loc,
diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h
index c24fddd..f00d672 100644
--- a/gcc/jit/libgccjit.h
+++ b/gcc/jit/libgccjit.h
@@ -503,6 +503,10 @@ gcc_jit_function_new_block (gcc_jit_function *func,
 extern gcc_jit_object *
 gcc_jit_block_as_object (gcc_jit_block *block);
 
+/* Which function is this block within?  */
+extern gcc_jit_function *
+gcc_jit_block_get_function (gcc_jit_block *block);
+
 /**
  lvalues, rvalues and expressions.
  **/
diff --git a/gcc/jit/libgccjit.map b/gcc/jit/libgccjit.map
index 48fd9d2..9f6a466 100644
--- a/gcc/jit/libgccjit.map
+++ b/gcc/jit/libgccjit.map
@@ -11,6 +11,7 @@
 gcc_jit_block_end_with_jump;
 gcc_jit_block_end_with_return;
 gcc_jit_block_end_with_void_return;
+gcc_jit_block_get_function;
 gcc_jit_context_acquire;
 gcc_jit_context_compile;
 gcc_jit_context_dump_to_file;
-- 
1.7.11.7

Re: [C++ Patch] PR 60314 (ICE with decltype(auto))

2014-02-28 Thread Jason Merrill


OK, thanks.

Jason

Re: [C++ Patch] PR 58610

2014-02-28 Thread Jason Merrill


OK.

Jason

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Ilya Verbin

2014-02-20 22:27 GMT+04:00 Bernd Schmidt :
>  * Functions and variables now go into different tables, otherwise
>intermixing between them could be a problem that causes tables to
>go out of sync between host and target (imagine one big table being
>generated by ptx lto1/mkoffload, and multiple small table fragments
>being linked together on the host side).

If you need 2 different tables for funcs and vars, we can also use
them. But I still don't understand how it will help synchronization
between host and target tables.

>  * I've put the begin/end fragments for the host tables into crtstuff,
>which seems like the standard way of doing things.

Our plan was that the host side descriptor __OPENMP_TARGET__ will
contain (in addition to func/var table) pointers to the images for all
enabled accelerators (e.g. omp_image_nvptx_start and
omp_image_intelmic_start), therefore we generated it in the
lto-wrapper. But if the number of accelerators and their types/names
will be defined during configuration, then it's ok to generate the
descriptor in crtstuff.

>  * Is there a reason to call a register function for the host tables?
>The way I've set it up, we register a target function/variable table
>while also passing a pointer to the __OPENMP_TARGET__ symbol which
>holds information about the host side tables.

In our case we can't register target table with a call to libgomp, it
can be obtained only from the accelerator. Therefore we propose a
target-independent approach: during device initialization libgomp
calls 2 functions from the plugin (or this can be implemented by a
single function):
1. devicep->device_load_image_func, which will load target image (its
pointer will be taken from the host descriptor);
2. devicep->device_get_table_func, which in our case connects to the
device and receives its table. And in your case it will return
func_mappings and var_mappings. Will it work for you?

>  * An offload compiler is built with --enable-as-accelerator-for=, which
>eliminates the need for -fopenmp-target, and changes install paths so
>that the host compiler knows where to find it. No need for
>OFFLOAD_TARGET_COMPILERS anymore.

Unfortunately I don't fully understand this configure magic... When a
user specifies 2 or 3 accelerators during configuration with
--enable-accelerators, will several different accel-gccs be built?

Thanks,
  -- Ilya

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread H.J. Lu

On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
 wrote:
> On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
>  wrote:
>> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu  wrote:
>>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng  wrote:
>>>> Hi,
>>>> This patch is to fix regression reported in PR60280 by removing forward loop
>>>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>>>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>>>> already been fixed by recent patches, I went through and fixed the others.
>>>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>>>> GCC removing a basic block, it checks profile information by calling
>>>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>>>> results in warnings about invalid profile information and causes the case to
>>>> fail.  I will send a patch to skip checking profile information for a
>>>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>>>> twisted the case itself.
>>>>
>>>> Bootstrap and tested on x86_64 and arm_a15.
>>>>
>>>> Is it OK?
>>>>
>>>>
>>>> 2014-02-25  Bin Cheng
>>>>
>>>> PR target/60280
>>>> * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>>> preheaders and latches only if requested.  Fix latch if it
>>>> is removed.
>>>> * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>>> LOOPS_HAVE_PREHEADERS.

>>>
>>> This change:
>>>
>>> if (dest->loop_father->header == dest)
>>> -  return false;
>>> +  {
>>> +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> +&& bb->loop_father->header != dest)
>>> +  return false;
>>> +
>>> +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>> +&& bb->loop_father->header == dest)
>>> +  return false;
>>> +  }
>>>  }
>>>
>>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>>
>>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>>> -fuse-linker-plugin
>>>
>>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>>> true.  I don't have a small testcase.  But this patch:
>>>
>>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>>> index b5c384b..2ba673c 100644
>>> --- a/gcc/tree-cfgcleanup.c
>>> +++ b/gcc/tree-cfgcleanup.c
>>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool 
>>> phi_wanted)
>>>  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>  && bb->loop_father->header == dest)
>>>return false;
>>> +
>>> +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> +&& !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>>> +  return false;
>>>}
>>>  }
>>>
>>> fixes the regression.  Does it make any senses?
>>
>> I think the preheader test isn't fully correct (bb may be in an inner loop
>> for example).  So a more conservative variant would be
>>
>> Index: gcc/tree-cfgcleanup.c
>> ===
>> --- gcc/tree-cfgcleanup.c   (revision 208169)
>> +++ gcc/tree-cfgcleanup.c   (working copy)
>> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>>/* Protect loop preheaders and latches if requested.  */
>>if (dest->loop_father->header == dest)
>> {
>> - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>> - && bb->loop_father->header != dest)
>> -   return false;
>> -
>> - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>> - && bb->loop_father->header == dest)
>> -   return false;
>> + if (bb->loop_father == dest->loop_father)
>> +   return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
>> + else if (bb->loop_father == loop_outer (dest->loop_father))
>> +   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
>> + /* Always preserve other edges into loop headers that are
>> +not simple latches or preheaders.  */
>> + return false;
>> }
>>  }
>>
>> that makes sure we can properly update loop information.  It's also
>> a more conservative change at this point which should still successfully
>> remove simple latches and preheaders created by loop discovery.
>
> I think the patch makes sense anyway and thus I'll install it once it
> passed bootstrap / regtesting.
>
> Another fix that may make sense is to restrict it to
> !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
> itself can end up setting that ... which we eventually should fix if it
> still happens.  That is, check if
>
> Index: gcc/tree-cfgcleanup.c
> ===
> --- gcc/tree-cfgcleanup.c   (revision 208169)
> +++ gcc/tree-cfgcleanup.c   (working copy)
>
> @@ -729,8 +729,9 @@ cleanup_tree_cfg_noloop (void)
>
>time

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Bernd Schmidt


On 02/28/2014 05:09 PM, Ilya Verbin wrote:

> 2014-02-20 22:27 GMT+04:00 Bernd Schmidt :
>>  * Functions and variables now go into different tables, otherwise
>>    intermixing between them could be a problem that causes tables to
>>    go out of sync between host and target (imagine one big table being
>>    generated by ptx lto1/mkoffload, and multiple small table fragments
>>    being linked together on the host side).
>
> If you need 2 different tables for funcs and vars, we can also use
> them. But I still don't understand how it will help synchronization
> between host and target tables.


I think it won't help that much - I still think this entire scheme is 
likely to fail on nvptx. I'll try to construct an example at some point.


One other thing about the split tables is that we don't have to write a 
useless size of 1 for functions.



>>  * I've put the begin/end fragments for the host tables into crtstuff,
>>    which seems like the standard way of doing things.
>
> Our plan was that the host side descriptor __OPENMP_TARGET__ will
> contain (in addition to func/var table) pointers to the images for all
> enabled accelerators (e.g. omp_image_nvptx_start and
> omp_image_intelmic_start), therefore we generated it in the
> lto-wrapper.


The concept of "image" is likely to vary somewhat between accelerators. 
For ptx, it's just a string and it can't really be generated the same 
way as for your target where you can manipulate ELF images. So I think 
it is better to have a call to a gomp registration function for every 
offload target. That should also give you the ordering you said you 
wanted between shared libraries.



>>  * Is there a reason to call a register function for the host tables?
>>    The way I've set it up, we register a target function/variable table
>>    while also passing a pointer to the __OPENMP_TARGET__ symbol which
>>    holds information about the host side tables.
>
> In our case we can't register target table with a call to libgomp, it
> can be obtained only from the accelerator. Therefore we propose a
> target-independent approach: during device initialization libgomp
> calls 2 functions from the plugin (or this can be implemented by a
> single function):
> 1. devicep->device_load_image_func, which will load target image (its
> pointer will be taken from the host descriptor);
> 2. devicep->device_get_table_func, which in our case connects to the
> device and receives its table. And in your case it will return
> func_mappings and var_mappings. Will it work for you?


Probably. I think the constructor call to the gomp registration function 
would contain an opaque pointer to whatever data the target wants, so it 
can arrange its image/table data in whatever way it likes.


It would help to see the code you have on the libgomp side, I don't 
believe that's been posted yet?



> Unfortunately I don't fully understand this configure magic... When a
> user specifies 2 or 3 accelerators during configuration with
> --enable-accelerators, will several different accel-gccs be built?


No - the idea is that --enable-accelerator= is likely specific to ptx, 
where we really just want to build a gcc and no target libraries, so 
building it alongside the host in an accel-gcc subdirectory is ideal.


For your use case, I'd imagine the offload compiler would be built 
relatively normally as a full build with 
"--enable-as-accelerator-for=x86_64-linux", which would install it into 
locations where the host will eventually be able to find it. Then the 
host compiler would be built with another new configure option (as yet 
unimplemented in my patch set) "--enable-offload-targets=mic,..." which 
would tell the host compiler about the pre-built offload target 
compilers. On the ptx side, "--enable-accelerator=ptx" would then also 
add ptx to the list of --enable-offload-targets.
Naming of all these configure options can be discussed, I have no real 
preference for any of them.



Bernd

Re: [PATCH] Fix epilogue bb expansion (PR middle-end/60175)

2014-02-28 Thread Richard Henderson

On 02/17/2014 11:45 AM, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2014-02-17  Jakub Jelinek  
> 
>   PR middle-end/60175
>   * function.c (expand_function_end): Don't emit
>   clobber_return_register sequence if clobber_after is a BARRIER.
>   * cfgexpand.c (construct_exit_block): Append instructions before
>   return_label to prev_bb.

Ok.


r~

[wwwdocs] GSoC2014 and POWER8 News items

2014-02-28 Thread David Edelsohn

I added a news item for GSoC2014. I also realized that POWER8 support
had not been added to the News announcements, so I inserted an item.

Thanks, David


Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.905
diff -c -p -r1.905 index.html
*** index.html  17 Feb 2014 08:28:36 -  1.905
--- index.html  28 Feb 2014 16:41:17 -
*** mission statement.
*** 48,58 
  


-
  News

  

  Intel AVX-512 support
  [2014-02-17]
  Intel AVX-512 support was added to GCC.  That includes inline
--- 48,63 
  


  News

  

+ GCC Google Summer of Code 2014
+ [2014-02-24]
+ GCC has been accepted as a
+ http://www.google-melange.com/gsoc/org2/google/gsoc2014/gcc";>Goog
le Summer of Code 2014 project.
+ Students, mentors and project ideas welcome!
+
  Intel AVX-512 support
  [2014-02-17]
  Intel AVX-512 support was added to GCC.  That includes inline
*** mission statement.
*** 109,114 
--- 114,126 
  https://plus.google.com/108467477471815191158"; rel="publisher" ta
rget="_blank">Google+
   to help developers stay informed of progress.

+ IBM POWER8 support
+ [2013-07-15]
+ Support for the POWER8 processor has been contributed by IBM.
+ This includes new VSX, HTM and atomic instructions, new intrinsics,
+ and scheduling improvements. Little Endian support also has been
+ enhanced, including control over vector element endianness.
+
  GCC 4.8.1 released
  [2013-05-31]

Re: [PATCH] Handle more COMDAT profiling issues

2014-02-28 Thread Teresa Johnson

>>> Here's the new patch. The only changes from the earlier patch are in
>>> handle_missing_profiles, where we now get the counts off of the entry
>>> and call stmt bbs, and in tree_profiling, where we call
>>> handle_missing_profiles earlier and I have removed the outlined cgraph
>>> rebuilding code since it doesn't need to be reinvoked.
>>>
>>> Honza, does this look ok for trunk when stage 1 reopens? David, I can
>>> send a similar patch for review to google-4_8 if it looks good.
>>>
>>> Thanks,
>>> Teresa
>>>
...
>
> Spec testing of my earlier patch hit an issue with the call to
> gimple_bb in this routine, since the caller was a thunk and therefore
> the edge did not have a call_stmt set. I've attached a slightly
> modified patch that guards the call by a check to
> cgraph_function_with_gimple_body_p. Regression and spec testing are
> clean.
>
> Teresa

Ping - Honza, does this patch look ok for stage 1?

Thanks,
Teresa


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches

2014-02-28 Thread H.J. Lu

On Fri, Feb 28, 2014 at 8:11 AM, H.J. Lu  wrote:
> On Fri, Feb 28, 2014 at 2:09 AM, Richard Biener
>  wrote:
>> On Fri, Feb 28, 2014 at 10:09 AM, Richard Biener
>>  wrote:
>>> On Fri, Feb 28, 2014 at 1:52 AM, H.J. Lu  wrote:
>>>> On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng  wrote:
>>>>> Hi,
>>>>> This patch is to fix regression reported in PR60280 by removing forward loop
>>>>> headers/latches in cfg cleanup if possible.  Several tests are broken by
>>>>> this change since cfg cleanup is shared by all optimizers.  Some tests has
>>>>> already been fixed by recent patches, I went through and fixed the others.
>>>>> One case needs to be clarified is "gcc.dg/tree-prof/update-loopch.c".  When
>>>>> GCC removing a basic block, it checks profile information by calling
>>>>> check_bb_profile after redirecting incoming edges of the bb.  This certainly
>>>>> results in warnings about invalid profile information and causes the case to
>>>>> fail.  I will send a patch to skip checking profile information for a
>>>>> removing basic block in stage 1 if it sounds reasonable.  For now I just
>>>>> twisted the case itself.
>>>>>
>>>>> Bootstrap and tested on x86_64 and arm_a15.
>>>>>
>>>>> Is it OK?
>>>>>
>>>>>
>>>>> 2014-02-25  Bin Cheng  
>>>>>
>>>>> PR target/60280
>>>>> * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop
>>>>> preheaders and latches only if requested.  Fix latch if it
>>>>> is removed.
>>>>> * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set
>>>>> LOOPS_HAVE_PREHEADERS.
>>>>>
>>>>
>>>> This change:
>>>>
>>>> if (dest->loop_father->header == dest)
>>>> -  return false;
>>>> +  {
>>>> +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>>> +&& bb->loop_father->header != dest)
>>>> +  return false;
>>>> +
>>>> +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>> +&& bb->loop_father->header == dest)
>>>> +  return false;
>>>> +  }
>>>>  }
>>>>
>>>> miscompiled 435.gromacs in SPEC CPU 2006 on x32 with
>>>>
>>>> -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
>>>> -fuse-linker-plugin
>>>>
>>>> This patch changes loops without LOOPS_HAVE_PREHEADERS
>>>> nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning
>>>> true.  I don't have a small testcase.  But this patch:
>>>>
>>>> diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
>>>> index b5c384b..2ba673c 100644
>>>> --- a/gcc/tree-cfgcleanup.c
>>>> +++ b/gcc/tree-cfgcleanup.c
>>>> @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted)
>>>>  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>>>  && bb->loop_father->header == dest)
>>>>    return false;
>>>> +
>>>> +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>>> +&& !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
>>>> +  return false;
>>>>    }
>>>>  }
>>>>
>>>> fixes the regression.  Does it make any senses?
>>>
>>> I think the preheader test isn't fully correct (bb may be in an inner loop
>>> for example).  So a more conservative variant would be
>>>
>>> Index: gcc/tree-cfgcleanup.c
>>> ===
>>> --- gcc/tree-cfgcleanup.c   (revision 208169)
>>> +++ gcc/tree-cfgcleanup.c   (working copy)
>>> @@ -316,13 +316,13 @@ tree_forwarder_block_p (basic_block bb,
>>>/* Protect loop preheaders and latches if requested.  */
>>>if (dest->loop_father->header == dest)
>>> {
>>> - if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS)
>>> - && bb->loop_father->header != dest)
>>> -   return false;
>>> -
>>> - if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)
>>> - && bb->loop_father->header == dest)
>>> -   return false;
>>> + if (bb->loop_father == dest->loop_father)
>>> +   return !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES);
>>> + else if (bb->loop_father == loop_outer (dest->loop_father))
>>> +   return !loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS);
>>> + /* Always preserve other edges into loop headers that are
>>> +not simple latches or preheaders.  */
>>> + return false;
>>> }
>>>  }
>>>
>>> that makes sure we can properly update loop information.  It's also
>>> a more conservative change at this point which should still successfully
>>> remove simple latches and preheaders created by loop discovery.
>>
>> I think the patch makes sense anyway and thus I'll install it once it
>> passed bootstrap / regtesting.
>>
>> Another fix that may make sense is to restrict it to
>> !loops_state_satisfies_p (LOOPS_NEED_FIXUP), though cfgcleanup
>> itself can end up setting that ... which we eventually should fix if it
>> still happens.  That is, check if
>>
>> Index: gcc/tree-cfgcleanup.c
>>
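
The three-way rule in the more conservative variant quoted above can be sketched outside GCC as a small standalone function. This is only a toy model: `Loop`, `LoopState` and `may_remove_forwarder` are hypothetical names, the flag values are arbitrary, and the sketch assumes the destination block is already known to be a loop header. In GCC itself the same decisions are made with loops_state_satisfies_p and loop_outer, as shown in the patch.

```cpp
#include <cassert>

// Toy stand-in for GCC's loop structure; all names here are hypothetical.
struct Loop {
  const Loop *outer;  // enclosing loop, or nullptr for the outermost one
};

// Toy loop-state flags (values chosen for this sketch only).
enum LoopState : unsigned {
  HAVE_PREHEADERS = 1u << 0,
  HAVE_SIMPLE_LATCHES = 1u << 1
};

// bb_loop is the loop containing the forwarder block; dest_loop is the loop
// whose header the forwarder jumps to.  Returns true if the forwarder may be
// removed under the three-way rule of the conservative patch.
bool may_remove_forwarder(const Loop *bb_loop, const Loop *dest_loop,
                          unsigned state) {
  if (bb_loop == dest_loop)
    // Forwarder acts as a latch: keep it only if simple latches are required.
    return !(state & HAVE_SIMPLE_LATCHES);
  if (bb_loop == dest_loop->outer)
    // Forwarder acts as a preheader: keep it only if preheaders are required.
    return !(state & HAVE_PREHEADERS);
  // Any other edge into a loop header is always preserved.
  return false;
}
```

With neither flag set, latch-like and preheader-like forwarders are removable, while a forwarder sitting in an unrelated loop is never removed, which matches the "always preserve other edges" comment in the patch.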

[PATCH, AArch64] Sync merge libffi - fix call frame information in ffi_closure_SYSV

2014-02-28 Thread Yufeng Zhang


Hi,

The attached patch fixes a bug in ./src/aarch64/sysv.S:ffi_closure_SYSV 
where stack unwinding information was not generated correctly.  The 
change has been reviewed, approved and merged into the stand-alone 
libffi release tree**.


OK for the trunk?

Thanks,
Yufeng

** http://github.com/atgreen/libffi


2014-02-28  Yufeng Zhang  

* src/aarch64/sysv.S (ffi_closure_SYSV): Use x29 as the
main CFA reg; update cfi_rel_offset.

diff --git a/libffi/src/aarch64/sysv.S b/libffi/src/aarch64/sysv.S
index b8cd421..ffb16f8 100644
--- a/libffi/src/aarch64/sysv.S
+++ b/libffi/src/aarch64/sysv.S
@@ -231,13 +231,13 @@ ffi_closure_SYSV:
 cfi_rel_offset (x30, 8)
 
 mov x29, sp
+cfi_def_cfa_register (x29)
 
 sub sp, sp, #ffi_closure_SYSV_FS
-   cfi_adjust_cfa_offset (ffi_closure_SYSV_FS)
 
 stp x21, x22, [x29, #-16]
-cfi_rel_offset (x21, 0)
-cfi_rel_offset (x22, 8)
+cfi_rel_offset (x21, -16)
+cfi_rel_offset (x22, -8)
 
 /* Load x21 with &call_context.  */
 mov x21, sp
@@ -295,7 +295,7 @@ ffi_closure_SYSV:
 cfi_restore (x22)
 
 mov sp, x29
-   cfi_adjust_cfa_offset (-ffi_closure_SYSV_FS)
+cfi_def_cfa_register (sp)
 
 ldp x29, x30, [sp], #16
cfi_adjust_cfa_offset (-16)
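
To see why the new offsets are -16 and -8, it may help to convert them to CFA-relative form. The sketch below assumes that cfi_rel_offset expands to the assembler's .cfi_rel_offset directive (an offset from the current CFA register) and that the CFA sits 16 bytes above x29 once the x29/x30 frame record has been pushed, as the matching 16-byte pop in the epilogue suggests. The helper name `cfa_relative` is made up for illustration.

```cpp
// Convert a .cfi_rel_offset operand (offset from the current CFA register)
// into the equivalent CFA-relative offset.
// CFA = cfa_register + cfa_offset, so a slot at cfa_register + rel_offset
// lies at CFA + (rel_offset - cfa_offset).
constexpr long cfa_relative(long rel_offset, long cfa_offset) {
  return rel_offset - cfa_offset;
}

// Under the assumptions above (CFA register x29, cfa_offset 16):
//   x29 saved at [x29, #0]    -> CFA - 16
//   x30 saved at [x29, #8]    -> CFA - 8
//   x21 saved at [x29, #-16]  -> CFA - 32
//   x22 saved at [x29, #-8]   -> CFA - 24
```

Because the stp stores x21/x22 relative to x29, describing them relative to x29 keeps the annotation numbers identical to the addressing in the instruction.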

Re: copyright dates in binutils (and includes/)

2014-02-28 Thread Joseph S. Myers

On Fri, 28 Feb 2014, Joel Brobecker wrote:

> > Joseph, do you know why implicitly adding years to the claimed
> > copyright years is a problem?  I'm guessing the file needs to be
> > published somewhere for each year claimed.
> 
> IANAL, but from 2 discussions with copyright-clerk:
> 
>   1. We start claiming copyright the year the file was committed
>  to a medium (hard drive), not the year it was published.

I don't think it counts unless the version in question got published at 
some point.  The question is about versions that weren't published at the 
time, but were published later when the version control history was 
released.

There was a discussion on bug-standards starting Jan 2012.  Karl's revised 
wording from 11 May 2012 seems to indicate that if a version was committed 
to a version control history that was later released, the dates from that 
history count as copyrightable years (so reducing the number of cases 
where it may not be possible to fill in gaps) - but that revised wording 
doesn't seem to have been committed.

-- 
Joseph S. Myers
jos...@codesourcery.com
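
One way to read the gap-filling rule from this thread is as a check over the years that a range would implicitly add. The sketch below is an assumption-laden illustration, not what update-copyright.py does: `unbacked_gap_years` is a hypothetical helper, and "evidence" stands for years in which a release or a later-published version control history exists.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns the years in [first, last] that are neither listed in the copyright
// notice nor backed by publication evidence (a release, or a version control
// history covering that year).  An empty result means the listed years could
// be abbreviated to the range first-last under the rule discussed above.
std::vector<int> unbacked_gap_years(const std::set<int> &listed,
                                    const std::set<int> &evidence,
                                    int first, int last) {
  std::vector<int> missing;
  for (int year = first; year <= last; ++year)
    if (!listed.count(year) && !evidence.count(year))
      missing.push_back(year);
  return missing;
}
```

For a notice reading "1985, 1990" with old releases available only from 1988 onwards, the function reports 1986 and 1987 as the years that would need checking before writing 1985-1990.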

Re: [C++ Patch] PR 58610

2014-02-28 Thread Paolo Carlini


On 02/28/2014 04:57 PM, Jason Merrill wrote:
> OK.

Thanks. I'm going to commit as obvious the additional lambda.c hunk 
below, which removes another now redundant STRIP_TEMPLATE use.


Thanks,
Paolo.


/cp
2014-02-28  Paolo Carlini  

PR c++/58610
* cp-tree.h (DECL_DELETED_FN): Use LANG_DECL_FN_CHECK.
* call.c (print_z_candidate): Remove STRIP_TEMPLATE use.
* lambda.c (maybe_add_lambda_conv_op): Likewise.

/testsuite
2014-02-28  Paolo Carlini  

PR c++/58610
* g++.dg/cpp0x/constexpr-ice11.C: New.
Index: cp/call.c
===
--- cp/call.c   (revision 208224)
+++ cp/call.c   (working copy)
@@ -3237,7 +3237,7 @@ print_z_candidate (location_t loc, const char *msg
 inform (cloc, "%s%T ", msg, candidate->fn);
   else if (candidate->viable == -1)
 inform (cloc, "%s%#D ", msg, candidate->fn);
-  else if (DECL_DELETED_FN (STRIP_TEMPLATE (candidate->fn)))
+  else if (DECL_DELETED_FN (candidate->fn))
 inform (cloc, "%s%#D ", msg, candidate->fn);
   else
 inform (cloc, "%s%#D", msg, candidate->fn);
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 208224)
+++ cp/cp-tree.h(working copy)
@@ -3222,7 +3222,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a
 
 /* Nonzero if DECL was declared with '= delete'.  */
 #define DECL_DELETED_FN(DECL) \
-  (DECL_LANG_SPECIFIC (FUNCTION_DECL_CHECK (DECL))->u.base.threadprivate_or_deleted_p)
+  (LANG_DECL_FN_CHECK (DECL)->min.base.threadprivate_or_deleted_p)
 
 /* Nonzero if DECL was declared with '= default' (maybe implicitly).  */
 #define DECL_DEFAULTED_FN(DECL) \
Index: cp/lambda.c
===
--- cp/lambda.c (revision 208224)
+++ cp/lambda.c (working copy)
@@ -975,7 +975,7 @@ maybe_add_lambda_conv_op (tree type)
  the conversion op is used.  */
   if (varargs_function_p (callop))
 {
-  DECL_DELETED_FN (STRIP_TEMPLATE (fn)) = 1;
+  DECL_DELETED_FN (fn) = 1;
   return;
 }
 
Index: testsuite/g++.dg/cpp0x/constexpr-ice11.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ice11.C(revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ice11.C(working copy)
@@ -0,0 +1,9 @@
+// PR c++/58610
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  template A();
+};
+
+constexpr A a;  // { dg-error "literal|matching" }

Re: [AArch64] Improve vst4_lane intrinsics

2014-02-28 Thread Marcus Shawcroft

On 13 February 2014 16:03, James Greenhalgh  wrote:
>
> Hi,
>
> This patch rewrites the vst4_lane intrinsics in terms of RTL builtins.
>
> Tested on aarch64-none-elf with no issues.
>
> OK to queue for Stage 1?

OK for stage 1
/Marcus

Re: RFA: RL78: Add missing instruction patterns

2014-02-28 Thread DJ Delorie


>   * config/rl78/rl78-real.md (cbranchsi4_real_signed): Add
>   anti-canonical alternatives.
>   (negandhi3_real): New pattern.
>   * config/rl78/rl78-virt.md (negandhi3_virt): New pattern.

These are fine, although I don't know why gcc would require a negandhi3 
pattern...

Re: [PATCH,GRAPHITE] Fix for P1 bug 58028

2014-02-28 Thread Mircea Namolaru

Hi,

Thanks. Here is the updated patch.

2014-02-26  Tobias Grosser  
Mircea Namolaru  

 PR tree-optimization/58028
 * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar
   dimensions.

Index: gcc/graphite-clast-to-gimple.c
===
--- gcc/graphite-clast-to-gimple.c  (revision 207298)
+++ gcc/graphite-clast-to-gimple.c  (working copy)
@@ -1522,6 +1522,13 @@
  variables.  */
   options->save_domains = 1;
 
+  /* Do not remove scalar dimensions.  CLooG by default removes scalar 
+ dimensions very early from the input schedule.  However, they are 
+ necessary to correctly derive from the saved domains 
+ (options->save_domains) the relationship between the generated loops 
+ and the schedule dimensions they are generated from.  */ 
+  options->noscalars = 1;
+
   /* Disable optimizations and make cloog generate source code closer to the
  input.  This is useful for debugging,  but later we want the optimized
  code.

Mircea

RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jason Merrill

Multiple large C++ projects (KDE and libreoffice, at least) have been 
breaking when GCC speculatively devirtualizes a call to an 
implicitly-declared virtual destructor, because this leads to references 
to base destructors and vtables that might be hidden in another DSO. 
This patch avoids this problem by avoiding speculative devirtualization 
of calls to implicitly-declared functions.


Tested x86_64-pc-linux-gnu.  OK for trunk?

commit 94eb5df9fb20c796d09151d7293ae89ac012ae79
Author: Jason Merrill 
Date:   Fri Feb 28 14:03:19 2014 -0500

	PR c++/58678
	* ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared
	function.

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 21649cb..27dc27d 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1710,7 +1710,7 @@ ipa_devirt (void)
 
   int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
   int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
-  int nwrong = 0, nok = 0, nexternal = 0;;
+  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
 
   FOR_EACH_DEFINED_FUNCTION (n)
 {	
@@ -1820,6 +1820,16 @@ ipa_devirt (void)
 		nexternal++;
 		continue;
 	  }
+	/* Don't use an implicitly-declared destructor (c++/58678).  */
+	struct cgraph_node *real_target
+	  = cgraph_function_node (likely_target);
+	if (DECL_ARTIFICIAL (real_target->decl))
+	  {
+		if (dump_file)
+		  fprintf (dump_file, "Target is implicitly declared\n\n");
+		nartificial++;
+		continue;
+	  }
 	if (cgraph_function_body_availability (likely_target)
 		<= AVAIL_OVERWRITABLE
 		&& symtab_can_be_discarded (likely_target))
@@ -1862,10 +1872,10 @@ ipa_devirt (void)
 	 " %i speculatively devirtualized, %i cold\n"
 	 "%i have multiple targets, %i overwritable,"
 	 " %i already speculated (%i agree, %i disagree),"
-	 " %i external, %i not defined\n",
+	 " %i external, %i not defined, %i artificial\n",
 	 npolymorphic, ndevirtualized, nconverted, ncold,
 	 nmultiple, noverwritable, nspeculated, nok, nwrong,
-	 nexternal, nnotdefined);
+	 nexternal, nnotdefined, nartificial);
   return ndevirtualized ? TODO_remove_functions : 0;
 }
 
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-28.C b/gcc/testsuite/g++.dg/ipa/devirt-28.C
new file mode 100644
index 000..35c8df1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/devirt-28.C
@@ -0,0 +1,17 @@
+// PR c++/58678
+// { dg-options "-O3 -fdump-ipa-devirt" }
+
+struct A {
+  virtual ~A();
+};
+struct B : A {
+  virtual int m_fn1();
+};
+void fn1(B* b) {
+  delete b;
+}
+
+// { dg-final { scan-assembler-not "_ZN1AD2Ev" } }
+// { dg-final { scan-assembler-not "_ZN1BD0Ev" } }
+// { dg-final { scan-ipa-dump "Target is implicitly declared" "devirt" } }
+// { dg-final { cleanup-ipa-dump "devirt" } }

Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jan Hubicka

> Multiple large C++ projects (KDE and libreoffice, at least) have
> been breaking when GCC speculatively devirtualizes a call to an
> implicitly-declared virtual destructor, because this leads to
> references to base destructors and vtables that might be hidden in
> another DSO. This patch avoids this problem by avoiding speculative
> devirtualization of calls to implicitly-declared functions.
> 
> Tested x86_64-pc-linux-gnu.  OK for trunk?
> 

> commit 94eb5df9fb20c796d09151d7293ae89ac012ae79
> Author: Jason Merrill 
> Date:   Fri Feb 28 14:03:19 2014 -0500
> 
>   PR c++/58678
>   * ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared
>   function.
> 
> diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
> index 21649cb..27dc27d 100644
> --- a/gcc/ipa-devirt.c
> +++ b/gcc/ipa-devirt.c
> @@ -1710,7 +1710,7 @@ ipa_devirt (void)
>  
>int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
>int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
> -  int nwrong = 0, nok = 0, nexternal = 0;;
> +  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
>  
>FOR_EACH_DEFINED_FUNCTION (n)
>  {
> @@ -1820,6 +1820,16 @@ ipa_devirt (void)
>   nexternal++;
>   continue;
> }
> + /* Don't use an implicitly-declared destructor (c++/58678).  */
> + struct cgraph_node *real_target
> +   = cgraph_function_node (likely_target);
> + if (DECL_ARTIFICIAL (real_target->decl))

I think we can safely test here DECL_ARTIFICIAL && (DECL_EXTERNAL ||
DECL_COMDAT).  If the dtor is going to be output anyway, we are safe to use it.

Are those programs valid by the C++ standard? (I believe it is not valid to include
stuff whose implementation you do not link with.)  If we just want to avoid
breaking python and libreoffice (I fixed the libreoffice part, however), we may just
go with the ipa-devirt change as you propose (with the external&comdat check).

If this is a correctness issue, I think we want to be sure that other
optimizations won't do the same.  In that case your check seems misplaced.

If DECL_ARTIFICIAL destructors are not safe to inline, I would add it into
function_attribute_inlinable_p.  If the dtor is not safe to refer to, then I would
add it into can_refer_decl_in_current_unit_p.

Both such changes would however inhibit quite a lot of optimization, since
artificial destructors are quite a common case, right?  Or is there some reason why
only speculative devirtualization could possibly work out references to these?

Honza

Re: [C++ patch] for C++/52369

2014-02-28 Thread Fabien Chêne

2014-02-27 19:25 GMT+01:00 Jason Merrill :
> On 02/23/2014 02:36 PM, Fabien Chêne wrote:
>>
>>  * cp/method.c (walk_field_subobs): improve the diagnostic
>>  locations for both REFERENCE_TYPEs and non-static const members.
>
>
> It's important to have the error location be the place where the actual
> problem is, namely the constructor definition.  Instead of changing it,
> please add an inform pointing out the location of the member in question.

Well, I am not very happy with the c++11 diagnostic compared to the c++98 one.
Below is the original c++11 diagnostic for pr44086.C:

struct A
{
int const i : 2;
};

void f()
{
A a;  // { dg-error "deleted|uninitialized const" }
new A;  // { dg-error "deleted|uninitialized const" }
A();  // { dg-error "deleted" "" { target c++11 } }
new A();  // { dg-error "deleted" "" { target c++11 } }
}

testsuite/g++.dg/init/uninitialized1.C:10:3: error: use of deleted
function 'A::A()'
testsuite/g++.dg/init/uninitialized1.C:3:8: note: 'A::A()' is
implicitly deleted because the default definition would be ill-formed:
testsuite/g++.dg/init/uninitialized1.C:3:8: error: uninitialized
non-static const member 'const int A::value1'

The first two lines are fine in my opinion. The third line should
actually be split into an error + an inform. By doing that, I think we
also need to reformulate the error message like this:
testsuite/g++.dg/init/pr44086.C:4:8: error: 'struct A' needs its
non-static const members to be initialized
testsuite/g++.dg/init/pr44086.C:6:19: note: 'A::i' should be initialized

What do you think ? (before I bother adjusting the testsuite)

Incidentally, while moving the diagnostic concerning the uninitialized
field from an error to an inform, I realized that the syntactic sugar
%q#D is no longer honored and is treated as %qD, is it expected ?

-- 
Fabien
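
The deleted default constructor behind these diagnostics can be reproduced with a compile-time check. The sketch below is only an illustration of the language rule being diagnosed; the struct names `Uninit` and `Init` are made up, and the second struct drops the bit-field because default member initializers on bit-fields are not available in C++11.

```cpp
#include <type_traits>

// No initializer for the const member: the implicit default constructor is
// deleted in C++11, which is what "use of deleted function" reports.
struct Uninit {
  int const i : 2;
};

// A default member initializer gives the const member a value, so the
// implicit default constructor is usable again.
struct Init {
  int const i = 1;
};

static_assert(!std::is_default_constructible<Uninit>::value,
              "default constructor of Uninit is deleted");
static_assert(std::is_default_constructible<Init>::value,
              "Init can be default-constructed");
```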

Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jason Merrill


On 02/28/2014 03:56 PM, Jan Hubicka wrote:
> I think we can safely test here DECL_ARTIFICIAL && (DECL_EXTERNAL ||
> DECL_COMDAT).  If the dtor is going to be output anyway, we are safe to use it.

We already skipped DECL_EXTERNAL decls, and artificial members are 
always DECL_COMDAT, but I'll add the COMDAT check.

> Are those programs valid by C++ standard? (I believe it is not valid to include
> stuff whose implementation you do not link with.).

Symbol visibility is outside the scope of the standard.

> If we just want to avoid
> breaking python and libreoffice (I fixed libreoffice part however), we may just
> go with the ipa-devirt change as you propose (with external&comdat check).
>
> If this is a correctness issue, I think we want to be sure that other
> optimizations won't do the same. In that case your check seems misplaced.
>
> If DECL_ARTIFICIAL destructors are not safe to inline, I would add it into
> function_attribute_inlinable_p.  If the dtor is not safe to refer, then I would
> add it into can_refer_decl_in_current_unit_p
>
> Both such changes would however inhibit quite some optimization, since
> artificial destructors are quite common case, right? Or is there some reason why
> only speculative devirtualization could possibly work out reference to these?


Normally, it's fine to inline destructors, and refer to them.  The 
problem comes when we turn what had been a virtual call (which goes 
through the vtable that is hidden in the DSO) into a direct call to a 
hidden function.  We don't do that for user-defined virtual functions 
because the user controls whether or not they are defined in the header, 
and we don't devirtualize if no definition is available, but 
implicitly-declared functions are different because the user has no way 
to prevent the definition from being available.


This also isn't a problem for cprop devirtualization, because in that 
situation we must have already referred to the vtable.


Jason

commit 2a05a09c268ce3abb373aa86cf731d20aac8dd7a
Author: Jason Merrill 
Date:   Fri Feb 28 14:03:19 2014 -0500

	PR c++/58678
	* ipa-devirt.c (ipa_devirt): Don't choose an implicitly-declared
	function.

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 21649cb..2f84f17 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1710,7 +1710,7 @@ ipa_devirt (void)
 
   int npolymorphic = 0, nspeculated = 0, nconverted = 0, ncold = 0;
   int nmultiple = 0, noverwritable = 0, ndevirtualized = 0, nnotdefined = 0;
-  int nwrong = 0, nok = 0, nexternal = 0;;
+  int nwrong = 0, nok = 0, nexternal = 0, nartificial = 0;
 
   FOR_EACH_DEFINED_FUNCTION (n)
 {	
@@ -1820,6 +1820,17 @@ ipa_devirt (void)
 		nexternal++;
 		continue;
 	  }
+	/* Don't use an implicitly-declared destructor (c++/58678).  */
+	struct cgraph_node *non_thunk_target
+	  = cgraph_function_node (likely_target);
+	if (DECL_ARTIFICIAL (non_thunk_target->decl)
+		&& DECL_COMDAT (non_thunk_target->decl))
+	  {
+		if (dump_file)
+		  fprintf (dump_file, "Target is artificial\n\n");
+		nartificial++;
+		continue;
+	  }
 	if (cgraph_function_body_availability (likely_target)
 		<= AVAIL_OVERWRITABLE
 		&& symtab_can_be_discarded (likely_target))
@@ -1862,10 +1873,10 @@ ipa_devirt (void)
 	 " %i speculatively devirtualized, %i cold\n"
 	 "%i have multiple targets, %i overwritable,"
 	 " %i already speculated (%i agree, %i disagree),"
-	 " %i external, %i not defined\n",
+	 " %i external, %i not defined, %i artificial\n",
 	 npolymorphic, ndevirtualized, nconverted, ncold,
 	 nmultiple, noverwritable, nspeculated, nok, nwrong,
-	 nexternal, nnotdefined);
+	 nexternal, nnotdefined, nartificial);
   return ndevirtualized ? TODO_remove_functions : 0;
 }
 
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-28.C b/gcc/testsuite/g++.dg/ipa/devirt-28.C
new file mode 100644
index 000..e18b818
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/devirt-28.C
@@ -0,0 +1,17 @@
+// PR c++/58678
+// { dg-options "-O3 -fdump-ipa-devirt" }
+
+struct A {
+  virtual ~A();
+};
+struct B : A {
+  virtual int m_fn1();
+};
+void fn1(B* b) {
+  delete b;
+}
+
+// { dg-final { scan-assembler-not "_ZN1AD2Ev" } }
+// { dg-final { scan-assembler-not "_ZN1BD0Ev" } }
+// { dg-final { scan-ipa-dump "Target is artificial" "devirt" } }
+// { dg-final { cleanup-ipa-dump "devirt" } }
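
The filter added by the revised patch can be modeled in isolation as follows. This is a hypothetical sketch: `Candidate`, `Verdict` and `classify` are illustrative names, the three booleans stand in for DECL_EXTERNAL, DECL_ARTIFICIAL and DECL_COMDAT on the (non-thunk) target, and the other checks ipa_devirt performs, such as body availability, are left out.

```cpp
#include <cassert>

// Hypothetical summary of the properties the patch inspects on a candidate
// target of speculative devirtualization.
struct Candidate {
  bool external;    // stands in for DECL_EXTERNAL
  bool artificial;  // stands in for DECL_ARTIFICIAL (implicitly declared)
  bool comdat;      // stands in for DECL_COMDAT
};

enum class Verdict { Speculate, SkipExternal, SkipArtificial };

// Mirrors the order of the checks in the patch: external targets are
// rejected first, then artificial COMDAT targets such as an
// implicitly-declared virtual destructor.
Verdict classify(const Candidate &c) {
  if (c.external)
    return Verdict::SkipExternal;
  if (c.artificial && c.comdat)
    return Verdict::SkipArtificial;
  return Verdict::Speculate;
}
```

In the devirt-28.C test above, the implicitly-declared destructor of B would correspond to a candidate that is artificial and COMDAT, so the call through `delete b` stays a virtual call.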

Re: [C++ patch] for C++/52369

2014-02-28 Thread Jason Merrill


On 02/28/2014 04:03 PM, Fabien Chêne wrote:
> The first two lines are fine in my opinion. The third line should
> actually be split into an error + an inform. By doing that, I think we
> also need to reformulate the error message like this:
> testsuite/g++.dg/init/pr44086.C:4:8: error: 'struct A' needs its
> non-static const members to be initialized
> testsuite/g++.dg/init/pr44086.C:6:19: note: 'A::i' should be initialized
>
> What do you think ? (before I bother adjusting the testsuite)

Let's change the C++11 diagnostic to match the C++98 diagnostic.  So, 
"uninitialized const member in %q#T" + "%qD should be initialized".

> Incidentally, while moving the diagnostic concerning the uninitialized
> field from an error to an inform, I realized that the syntactic sugar
> %q#D is no longer honored and is treated as %qD, is it expected ?

No, how do you mean?

Jason

[jit] New API entrypoint: gcc_jit_context_new_cast

2014-02-28 Thread David Malcolm

Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit.h (gcc_jit_context_new_cast): New.
* libgccjit.map (gcc_jit_context_new_cast): New.
* libgccjit++.h (gccjit::context::new_cast): New method.
* libgccjit.c (gcc_jit_context_new_cast): New.

* internal-api.h (gcc::jit::recording::context::new_cast): New method.
(gcc::jit::recording::cast): New subclass of rvalue.
(gcc::jit::playback::context::new_cast): New method.
(gcc::jit::playback::context::build_cast): New method.

* internal-api.c (convert): New.
(gcc::jit::recording::context::new_cast): New.
(gcc::jit::recording::cast::replay_into): New.
(gcc::jit::recording::cast::make_debug_string): New.
(gcc::jit::playback::context::build_cast): New.
(gcc::jit::playback::context::new_cast): New.

* TODO.rst: Update.

gcc/testsuite/
* jit.dg/test-expressions.c (make_test_of_cast): New, to test new
entrypoint gcc_jit_context_new_cast.
(make_tests_of_casts): New.
(create_code): Add call to make_tests_of_casts.
(verify_code): Add call to verify_casts.
---
 gcc/jit/ChangeLog.jit   |  21 ++
 gcc/jit/TODO.rst|   9 +--
 gcc/jit/internal-api.c  | 102 ++
 gcc/jit/internal-api.h  |  33 +
 gcc/jit/libgccjit++.h   |  15 
 gcc/jit/libgccjit.c |  13 
 gcc/jit/libgccjit.h |  11 +++
 gcc/jit/libgccjit.map   |   1 +
 gcc/testsuite/ChangeLog.jit |   8 ++
 gcc/testsuite/jit.dg/test-expressions.c | 126 
 10 files changed, 332 insertions(+), 7 deletions(-)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 6c43ce9..625e01a 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,26 @@
 2014-02-28  David Malcolm  
 
+   * libgccjit.h (gcc_jit_context_new_cast): New.
+   * libgccjit.map (gcc_jit_context_new_cast): New.
+   * libgccjit++.h (gccjit::context::new_cast): New method.
+   * libgccjit.c (gcc_jit_context_new_cast): New.
+
+   * internal-api.h (gcc::jit::recording::context::new_cast): New method.
+   (gcc::jit::recording::cast): New subclass of rvalue.
+   (gcc::jit::playback::context::new_cast): New method.
+   (gcc::jit::playback::context::build_cast): New method.
+
+   * internal-api.c (convert): New.
+   (gcc::jit::recording::context::new_cast): New.
+   (gcc::jit::recording::cast::replay_into): New.
+   (gcc::jit::recording::cast::make_debug_string): New.
+   (gcc::jit::playback::context::build_cast): New.
+   (gcc::jit::playback::context::new_cast): New.
+
+   * TODO.rst: Update.
+
+2014-02-28  David Malcolm  
+
* libgccjit.h (gcc_jit_block_get_function): New.
* libgccjit.map (gcc_jit_block_get_function): New.
* libgccjit++.h (gccjit::block::get_function): New method.
diff --git a/gcc/jit/TODO.rst b/gcc/jit/TODO.rst
index 227113a..8a2308e 100644
--- a/gcc/jit/TODO.rst
+++ b/gcc/jit/TODO.rst
@@ -23,13 +23,6 @@ Initial Release
 
 * expose the statements in the API? (mostly so they can be stringified?)
 
-* explicit casts::
-
-extern gcc_jit_rvalue *
-gcc_jit_rvalue_cast (gcc_jit_rvalue *, gcc_jit_type *);
-
-  e.g. (void*) to (struct foo*)
-
 * support more arithmetic ops and comparison modes
 
 * access to a function by address::
@@ -119,6 +112,8 @@ Initial Release
   have each block have its own stmt_list, avoiding the need for this
   traversal, and having the block structure show up within tree dumps.
 
+* Implement more kinds of casts e.g. pointers
+
 Bugs
 
 * INTERNAL functions don't seem to work (see e.g. test-quadratic, on trying
diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c
index fa08e56..573dc67 100644
--- a/gcc/jit/internal-api.c
+++ b/gcc/jit/internal-api.c
@@ -16,12 +16,29 @@
 #include "diagnostic-core.h"
 #include "dumpfile.h"
 #include "tree-cfg.h"
+#include "target.h"
+#include "convert.h"
 
 #include 
 
 #include "internal-api.h"
 #include "jit-builtins.h"
 
+/* gcc::jit::playback::context::build_cast uses the convert.h API,
+   which in turn requires the frontend to provide a "convert"
+   function, apparently as a fallback.
+
+   Hence we provide this dummy one, with the requirement that any casts
+   are handled before reaching this.  */
+extern tree convert (tree type, tree expr);
+
+tree
+convert (tree /*type*/, tree /*expr*/)
+{
+  error ("unhandled conversion");
+  return error_mark_node;
+}
+
 namespace gcc {
 namespace jit {
 
@@ -474,6 +491,16 @@ recording::context::new_comparison (recording::location *loc,
 }
 
 recording::rvalue *
+recording::context::new_cast (recording::location *loc,
+ recording::rvalue *expr,
+ recording::type *type_)
+{
+  recording::rvalue

Re: [jit] Major API change: blocks rather than labels

2014-02-28 Thread David Malcolm

On Thu, 2014-02-27 at 17:25 -0500, David Malcolm wrote:
> On Thu, 2014-02-27 at 17:11 -0500, David Malcolm wrote:
> 
> [...]
> 
> > With this commit, the API changes to using basic blocks instead: blocks
> > are created within functions, and statements are added to blocks, rather
> > than to functions.
> 
> [...]
> 
> I've also ported the "jittest" example to the new API, as of this
> commit:
> https://github.com/davidmalcolm/jittest/commit/af66efe0386e52a9292b7527174ae402c0af5e43
> 
> (though currently it falls foul of type-checking, due to int vs bool
> issues in conditionals; upon hacking out the type-checking from
> libgccjit it compiles and runs OK).

jittest is now fixed, as of:
https://github.com/davidmalcolm/jittest/commit/7af0765c018e15d600016d41f7b444273cc0389a

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Bernd Schmidt


On 02/28/2014 05:21 PM, Bernd Schmidt wrote:

On 02/28/2014 05:09 PM, Ilya Verbin wrote:

Unfortunately I don't fully understand this configure magic... When a
user specifies 2 or 3 accelerators during configuration with
--enable-accelerators, will several different accel-gccs be built?


No - the idea is that --enable-accelerator= is likely specific to ptx,
where we really just want to build a gcc and no target libraries, so
building it alongside the host in an accel-gcc subdirectory is ideal.

For your use case, I'd imagine the offload compiler would be built
relatively normally as a full build with
"--enable-as-accelerator-for=x86_64-linux", which would install it into
locations where the host will eventually be able to find it. Then the
host compiler would be built with another new configure option (as yet
unimplemented in my patch set) "--enable-offload-targets=mic,..." which
would tell the host compiler about the pre-built offload target
compilers. On the ptx side, "--enable-accelerator=ptx" would then also
add ptx to the list of --enable-offload-targets.
Naming of all these configure options can be discussed, I have no real
preference for any of them.


IOW, something like the following on top of the other patches. Ideally 
we'd also add error checking to make sure the offload compilers exist in 
the places we'll be looking for them.
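
To illustrate how a consumer of such a list might read it, here is a
minimal sketch. It assumes the list ends up as a colon-separated string,
as the OFFLOAD_TARGETS define in the configure fragment suggests. The
helper name nth_offload_target is hypothetical and is not part of any
posted patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the INDEX-th colon-separated entry of LIST
   into BUF.  Returns 1 on success, 0 if there is no such entry, the
   entry is empty, or BUF is too small.  */
static int
nth_offload_target (const char *list, size_t index, char *buf, size_t buf_len)
{
  const char *start = list;

  for (;;)
    {
      /* Length of the current entry, up to the next ':' or the end.  */
      size_t len = strcspn (start, ":");

      if (index == 0)
        {
          if (len == 0 || len >= buf_len)
            return 0;
          memcpy (buf, start, len);
          buf[len] = '\0';
          return 1;
        }

      /* No further entries after this one.  */
      if (start[len] == '\0')
        return 0;

      start += len + 1;
      index--;
    }
}
```

A caller would loop over increasing indices until the helper returns 0,
which is also where a check that each offload compiler exists could be
hooked in.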



Bernd

Index: gomp-4_0-branch/gcc/config.in
===
--- gomp-4_0-branch.orig/gcc/config.in
+++ gomp-4_0-branch/gcc/config.in
@@ -1748,6 +1748,12 @@
 #endif
 
 
+/* Define to hold the list of target names suitable for offloading. */
+#ifndef USED_FOR_TARGET
+#undef OFFLOAD_TARGETS
+#endif
+
+
 /* Define to the address where bug reports for this package should be sent. */
 #ifndef USED_FOR_TARGET
 #undef PACKAGE_BUGREPORT
Index: gomp-4_0-branch/gcc/configure
===
--- gomp-4_0-branch.orig/gcc/configure
+++ gomp-4_0-branch/gcc/configure
@@ -908,6 +908,7 @@ with_bugurl
 enable_languages
 enable_accelerator
 enable_as_accelerator_for
+enable_offload_targets
 with_multilib_list
 enable_rpath
 with_libiconv_prefix
@@ -1618,6 +1619,8 @@ Optional Features:
   --enable-accelerator build accelerator [ARG={no,device-triplet}]
   --enable-as-accelerator-for
   build compiler as accelerator target for given host
+  --enable-offload-targets=LIST
+  enable offloading to devices from LIST
   --disable-rpath do not hardcode runtime library paths
   --enable-sjlj-exceptions
   arrange to use setjmp/longjmp exception handling
@@ -7299,12 +7302,14 @@ else
 fi
 
 
+offload_targets=
 # Check whether --enable-accelerator was given.
 if test "${enable_accelerator+set}" = set; then :
   enableval=$enable_accelerator;
   case $enable_accelerator in
   no) ;;
   *)
+offload_targets=$enable_accelerator
 
 $as_echo "#define ENABLE_OFFLOADING 1" >>confdefs.h
 
@@ -7343,6 +7348,31 @@ fi
 
 
 
+# Check whether --enable-offload-targets was given.
+if test "${enable_offload_targets+set}" = set; then :
+  enableval=$enable_offload_targets;
+  if test x$enable_offload_targets = x; then
+as_fn_error "no offload targets specified" "$LINENO" 5
+  else
+if test x$offload_targets = x; then
+  offload_targets=$enable_offload_targets
+else
+  offload_targets=$offload_targets,$enable_offload_targets
+fi
+  fi
+
+else
+  enable_accelerator=no
+fi
+
+
+offload_targets=`echo $offload_targets | sed -e 's#,#:#g'`
+
+cat >>confdefs.h <<_ACEOF
+#define OFFLOAD_TARGETS "$offload_targets"
+_ACEOF
+
+
 
 # Check whether --with-multilib-list was given.
 if test "${with_multilib_list+set}" = set; then :
@@ -17983,7 +18013,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 17986 "configure"
+#line 18016 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18089,7 +18119,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18092 "configure"
+#line 18122 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: gomp-4_0-branch/gcc/configure.ac
===
--- gomp-4_0-branch.orig/gcc/configure.ac
+++ gomp-4_0-branch/gcc/configure.ac
@@ -839,12 +839,14 @@ AC_ARG_ENABLE(languages,
 esac],
 [enable_languages=c])
 
+offload_targets=
 AC_ARG_ENABLE(accelerator,
 [AS_HELP_STRING([--enable-accelerator], [build accelerator @<:@ARG={no,device-triplet}@:>@])],
 [
   case $enable_accelerator in
   no) ;;
   *)
+offload_targets=$enable_accelerator
 AC_DEFINE(ENABLE_OFFLOADING, 1,
  [Define this to enable support for offloading.])
 AC_DEFINE_UNQUOTED(ACCEL_TARGET,"${enable_accelerator}",
@@ -871,6 +873,25 @@ AC_ARG_ENABLE(as-accelerator-for,
 ], [enable_as_accelera

Re: [C++ patch] for C++/52369

2014-02-28 Thread Fabien Chêne

2014-02-28 22:27 GMT+01:00 Jason Merrill :
> Let's change the C++11 diagnostic to match the C++98 diagnostic.  So,
> "uninitialized const member in %q#T" + "%qD should be initialized".

OK.

>> Incidentally, while moving the diagnostic concerning the uninitialized
>> field from an error to an inform, I realized that the syntactic sugar
>> %q#D is no longer honored and is treated as %qD; is that expected?
>
>
> No, how do you mean?

I must be tired, false alarm, sorry.

-- 
Fabien

Re: [C++ patch] for C++/52369

2014-02-28 Thread Fabien Chêne

2014-02-28 22:52 GMT+01:00 Fabien Chêne :
>>> Incidentally, while moving the diagnostic concerning the uninitialized
>>> field from an error to an inform, I realized that the syntactic sugar
>>> %q#D is no longer honored and is treated as %qD; is that expected?
>>
>>
>> No, how do you mean?
>
> I must be tired, false alarm, sorry.

I guess my mistake comes from the fact that %q#D is not present in the
C++98 diagnostic. Shall we homogenize that as well,
in favor of %q#D?

-- 
Fabien

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-28 Thread Ilya Verbin

On 28 Feb 17:21, Bernd Schmidt wrote:
> It would help to see the code you have on the libgomp side, I don't
> believe that's been posted yet?

It was posted here: http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01777.html
And below is the updated version.

---
 libgomp/libgomp.map |1 +
 libgomp/target.c|  138 ---
 2 files changed, 132 insertions(+), 7 deletions(-)

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index cb52e45..d33673d 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -208,6 +208,7 @@ GOMP_3.0 {
 
 GOMP_4.0 {
   global:
+   GOMP_register_lib;
GOMP_barrier_cancel;
GOMP_cancel;
GOMP_cancellation_point;
diff --git a/libgomp/target.c b/libgomp/target.c
index a6a5505..7fafa9a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -84,6 +84,19 @@ struct splay_tree_key_s {
   bool copy_from;
 };
 
+enum library_descr {
+  DESCR_TABLE_START,
+  DESCR_TABLE_END,
+  DESCR_IMAGE_START,
+  DESCR_IMAGE_END
+};
+
+/* Array of pointers to target shared library descriptors.  */
+static void **libraries;
+
+/* Total number of target shared libraries.  */
+static int num_libraries;
+
 /* Array of descriptors of all available devices.  */
 static struct gomp_device_descr *devices;
 
@@ -107,6 +120,12 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 #include "splay-tree.h"
 
+struct target_table_s
+{
+  void **entries;
+  int num_entries;
+};
+
 /* This structure describes accelerator device.
It contains name of the corresponding libgomp plugin, function handlers for
interaction with the device, ID-number of the device, and information about
@@ -117,15 +136,21 @@ struct gomp_device_descr
  TARGET construct.  */
   int id;
 
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+
   /* Plugin file handler.  */
   void *plugin_handle;
 
   /* Function handlers.  */
-  bool (*device_available_func) (void);
+  bool (*device_available_func) (int);
+  void (*device_init_func) (int);
+  struct target_table_s (*device_load_image_func) (void *, int);
   void *(*device_alloc_func) (size_t);
   void (*device_free_func) (void *);
   void *(*device_dev2host_func)(void *, const void *, size_t);
   void *(*device_host2dev_func)(void *, const void *, size_t);
+  void (*device_run_func) (void *, void *);
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s dev_splay_tree;
@@ -471,6 +496,80 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum,
   gomp_mutex_unlock (&devicep->dev_env_lock);
 }
 
+void
+GOMP_register_lib (const void *openmp_target)
+{
+  libraries = realloc (libraries, (num_libraries + 1) * sizeof (void *));
+
+  if (libraries == NULL)
+return;
+
+  libraries[num_libraries] = (void *) openmp_target;
+
+  num_libraries++;
+}
+
+static void
+gomp_init_device (struct gomp_device_descr *devicep)
+{
+  /* Initialize the target device.  */
+  devicep->device_init_func (devicep->id);
+
+  /* Load shared libraries into target device and
+ perform host-target address mapping.  */
+  int i;
+  for (i = 0; i < num_libraries; i++)
+{
+  /* Get the pointer to the target image from the library descriptor.  */
+  void **lib = libraries[i];
+
+  /* FIXME: Select the proper target image, if there are several.  */
+  void *target_image = lib[DESCR_IMAGE_START];
+  int target_img_size = lib[DESCR_IMAGE_END] - lib[DESCR_IMAGE_START];
+
+  /* Calculate the size of host address table.  */
+  void **host_table_start = lib[DESCR_TABLE_START];
+  void **host_table_end = lib[DESCR_TABLE_END];
+  int host_table_size = host_table_end - host_table_start;
+
+  /* Load library into target device and receive its address table.  */
+  struct target_table_s target_table
+   = devicep->device_load_image_func (target_image, target_img_size);
+
+  if (host_table_size != target_table.num_entries)
+   gomp_fatal ("Can't map target objects");
+
+  void **host_entry, **target_entry;
+  for (host_entry = host_table_start, target_entry = target_table.entries;
+  host_entry < host_table_end; host_entry += 2, target_entry += 2)
+   {
+ struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+ tgt->refcount = 1;
+ tgt->array = gomp_malloc (sizeof (*tgt->array));
+ tgt->tgt_start = (uintptr_t) *target_entry;
+ tgt->tgt_end = tgt->tgt_start + *((uint64_t *) target_entry + 1);
+ tgt->to_free = NULL;
+ tgt->list_count = 0;
+ tgt->device_descr = devicep;
+ splay_tree_node node = tgt->array;
+ splay_tree_key k = &node->key;
+ k->host_start = (uintptr_t) *host_entry;
+ k->host_end = k->host_start + *((uint64_t *) host_entry + 1);
+ k->tgt_offset = 0;
+ k->tgt = tgt;
+ node->left = NULL;
+ node->right = NULL;
+ splay_tree_insert (&devicep->dev_spla

[GOOGLE] Remove size check when loop is very hot

2014-02-28 Thread Dehao Chen

This patch removes the size limit for loop unroll/peel when the loop
is truly hot. This makes the implementation easier to maintain across
FDO and AutoFDO.
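
The change to the guarding condition can be sketched in isolation. This
is a minimal model rather than the code in loop-unroll.c: the variable
names follow the diff, but the two function names are hypothetical and
the body of the guarded branch is left out.

```c
#include <stdbool.h>

/* Condition guarding the "very hot loop" branch before the patch: the
   application had to be smaller than twice the codesize limit.  */
static bool
hot_branch_before (long long num_hot_counters, long long size_threshold,
                   long long header_count)
{
  return num_hot_counters < size_threshold * 2 && header_count > 0;
}

/* Condition after the patch: only a non-zero header count is required,
   so large applications reach the hotness-ratio check as well.  */
static bool
hot_branch_after (long long header_count)
{
  return header_count > 0;
}
```

For an application with num_hot_counters at or above twice
size_threshold, the first predicate is false and the second is true for
any loop whose header count is positive.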

Bootstrapped and loadtest perf show neutral impact.

OK for google-4_8?

Thanks,
Dehao

Index: gcc/loop-unroll.c
===
--- gcc/loop-unroll.c (revision 208233)
+++ gcc/loop-unroll.c (working copy)
@@ -347,11 +347,9 @@ code_size_limit_factor(struct loop *loop)
   /* Next, set the value of the codesize-based unroll factor divisor which in
  most loops will need to be set to a value that will reduce or eliminate
  unrolling/peeling.  */
-  if (num_hot_counters < size_threshold * 2
-  && loop->header->count > 0)
+  if (loop->header->count > 0)
 {
-  /* For applications that are less than twice the codesize limit, allow
- limited unrolling for very hot loops.  */
+  /* Allow limited unrolling for very hot loops.  */
   sum_to_header_ratio = profile_info->sum_all / loop->header->count;
   hotness_ratio_threshold = PARAM_VALUE
(PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
   /* When the profile count sum to loop entry header ratio is smaller than

Re: [GOOGLE] Remove size check when loop is very hot

2014-02-28 Thread Teresa Johnson

Looks good to me.
Thanks, Teresa

On Fri, Feb 28, 2014 at 2:17 PM, Dehao Chen  wrote:
> This patch removes the size limit for loop unroll/peel when the loop
> is truly hot. This makes the implementation easier to maintain across
> FDO and AutoFDO.
>
> Bootstrapped and loadtest perf show neutral impact.
>
> OK for google-4_8?
>
> Thanks,
> Dehao
>
> Index: gcc/loop-unroll.c
> ===
> --- gcc/loop-unroll.c (revision 208233)
> +++ gcc/loop-unroll.c (working copy)
> @@ -347,11 +347,9 @@ code_size_limit_factor(struct loop *loop)
>/* Next, set the value of the codesize-based unroll factor divisor which in
>   most loops will need to be set to a value that will reduce or eliminate
>   unrolling/peeling.  */
> -  if (num_hot_counters < size_threshold * 2
> -  && loop->header->count > 0)
> +  if (loop->header->count > 0)
>  {
> -  /* For applications that are less than twice the codesize limit, allow
> - limited unrolling for very hot loops.  */
> +  /* Allow limited unrolling for very hot loops.  */
>sum_to_header_ratio = profile_info->sum_all / loop->header->count;
>hotness_ratio_threshold = PARAM_VALUE
> (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
>/* When the profile count sum to loop entry header ratio is smaller 
> than



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

calloc = malloc + memset

2014-02-28 Thread Marc Glisse


Hello,

this is a stage 1 patch, and I'll ping it then, but if you have comments 
now...


Passes bootstrap+testsuite on x86_64-linux-gnu.

2014-02-28  Marc Glisse  

PR tree-optimization/57742
gcc/
* tree-ssa-forwprop.c (simplify_malloc_memset): New function.
(simplify_builtin_call): Call it.
gcc/testsuite/
* g++.dg/tree-ssa/calloc.C: New testcase.
* gcc.dg/tree-ssa/calloc.c: Likewise.

--
Marc Glisse
Index: gcc/testsuite/g++.dg/tree-ssa/calloc.C
===
--- gcc/testsuite/g++.dg/tree-ssa/calloc.C  (revision 0)
+++ gcc/testsuite/g++.dg/tree-ssa/calloc.C  (working copy)
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu++11 -O3 -fdump-tree-optimized" } */
+
+#include <new>
+#include <vector>
+#include <cstdlib>
+
+void g(void*);
+inline void* operator new(std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
+{
+  void *p;
+
+  if (sz == 0)
+sz = 1;
+
+  // Slightly modified from the libsupc++ version, that one has 2 calls
+  // to malloc which makes it too hard to optimize.
+  while ((p = std::malloc (sz)) == 0)
+{
+  std::new_handler handler = std::get_new_handler ();
+  if (! handler)
+_GLIBCXX_THROW_OR_ABORT(std::bad_alloc());
+  handler ();
+}
+  return p;
+}
+
+void f(void*p,int n){
+  new(p)std::vector(n);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: gcc/testsuite/g++.dg/tree-ssa/calloc.C
___
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Index: gcc/testsuite/gcc.dg/tree-ssa/calloc.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/calloc.c  (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/calloc.c  (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdlib.h>
+#include <string.h>
+
+extern int a;
+extern int* b;
+int n;
+void* f(long*q){
+  int*p=malloc(n);
+  ++*q;
+  if(p){
+++*q;
+a=2;
+memset(p,0,n);
+*b=3;
+  }
+  return p;
+}
+void* g(void){
+  float*p=calloc(8,4);
+  return memset(p,0,32);
+}
+
+/* { dg-final { scan-tree-dump-times "calloc" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: gcc/testsuite/gcc.dg/tree-ssa/calloc.c
___
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c (revision 208224)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -1487,20 +1487,149 @@ constant_pointer_difference (tree p1, tr
 }
 
   for (i = 0; i < cnt[0]; i++)
 for (j = 0; j < cnt[1]; j++)
   if (exps[0][i] == exps[1][j])
return size_binop (MINUS_EXPR, offs[0][i], offs[1][j]);
 
   return NULL_TREE;
 }
 
+/* Optimize
+   ptr = malloc (n);
+   memset (ptr, 0, n);
+   into
+   ptr = calloc (n);
+   gsi_p is known to point to a call to __builtin_memset.  */
+static bool
+simplify_malloc_memset (gimple_stmt_iterator *gsi_p)
+{
+  /* First make sure we have:
+ ptr = malloc (n);
+ memset (ptr, 0, n);  */
+  gimple stmt2 = gsi_stmt (*gsi_p);
+  if (!integer_zerop (gimple_call_arg (stmt2, 1)))
+return false;
+  tree ptr1, ptr2 = gimple_call_arg (stmt2, 0);
+  tree size = gimple_call_arg (stmt2, 2);
+  if (TREE_CODE (ptr2) != SSA_NAME) 
+return false;
+  gimple stmt1 = SSA_NAME_DEF_STMT (ptr2);
+  tree callee1;
+  /* Handle the case where STMT1 is a unary PHI, which happens
+ for instance with:
+ while (!(p = malloc (n))) { ... }
+ memset (p, 0, n);  */
+  if (!stmt1)
+return false;
+  if (gimple_code (stmt1) == GIMPLE_PHI
+  && gimple_phi_num_args (stmt1) == 1)
+{
+  ptr1 = gimple_phi_arg_def (stmt1, 0);
+  if (TREE_CODE (ptr1) != SSA_NAME)
+   return false;
+  stmt1 = SSA_NAME_DEF_STMT (ptr1);
+}
+  else
+ptr1 = ptr2;
+  if (!stmt1
+  || !is_gimple_call (stmt1)
+  || !(callee1 = gimple_call_fndecl (stmt1)))
+return false;
+
+  bool is_calloc;
+  if (DECL_FUNCTION_CODE (callee1) == BUILT_IN_MALLOC)
+{
+  is_calloc = false;
+  if (!operand_equal_p (gimple_call_arg (stmt1, 0), size, 0))
+   return false;
+}
+  else if (DECL_FUNCTION_CODE (callee1) == BUILT_IN_CAL

Re: [C++ patch] for C++/52369

2014-02-28 Thread Jason Merrill


On 02/28/2014 05:04 PM, Fabien Chêne wrote:

I guess my mistake comes from the fact that %q#D is not present in the
C++98 diagnostic. Shall we homogenize that as well,
in favor of %q#D?


OK.

Jason

[jit] Add typechecking to binary ops and comparisons

2014-02-28 Thread David Malcolm

Committed to branch dmalcolm/jit:

gcc/jit/
* libgccjit.c (gcc_jit_context_new_binary_op): Check that the
operands have the same type.
(gcc_jit_context_new_comparison): Likewise.
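
The shape of the new check can be sketched outside the library. The
struct and function names below are hypothetical stand-ins, not the
libgccjit types; only the pointer comparison of the operand types and
the wording of the message are taken from the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the recording types.  */
struct sketch_type
{
  const char *name;
};

struct sketch_rvalue
{
  const char *debug_string;
  const struct sketch_type *type;
};

/* Returns 1 if A and B have the identical type object.  Otherwise
   writes a message in the style of the patch into ERR and returns 0.  */
static int
check_same_type (const char *what,
                 const struct sketch_rvalue *a,
                 const struct sketch_rvalue *b,
                 char *err, size_t err_len)
{
  if (a->type == b->type)
    return 1;

  snprintf (err, err_len,
            "mismatching types for %s:"
            " a: %s (type: %s) b: %s (type: %s)",
            what,
            a->debug_string, a->type->name,
            b->debug_string, b->type->name);
  return 0;
}
```

Comparing an int operand against a bool operand fails this check, which
is the kind of int vs bool mismatch in conditionals that jittest ran
into.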
---
 gcc/jit/ChangeLog.jit |  6 ++
 gcc/jit/libgccjit.c   | 18 ++
 2 files changed, 24 insertions(+)

diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit
index 625e01a..f2fea8c 100644
--- a/gcc/jit/ChangeLog.jit
+++ b/gcc/jit/ChangeLog.jit
@@ -1,5 +1,11 @@
 2014-02-28  David Malcolm  
 
+   * libgccjit.c (gcc_jit_context_new_binary_op): Check that the
+   operands have the same type.
+   (gcc_jit_context_new_comparison): Likewise.
+
+2014-02-28  David Malcolm  
+
* libgccjit.h (gcc_jit_context_new_cast): New.
* libgccjit.map (gcc_jit_context_new_cast): New.
* libgccjit++.h (gccjit::context::new_cast): New method.
diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c
index 6c078ce..d9f63cf 100644
--- a/gcc/jit/libgccjit.c
+++ b/gcc/jit/libgccjit.c
@@ -752,6 +752,15 @@ gcc_jit_context_new_binary_op (gcc_jit_context *ctxt,
   RETURN_NULL_IF_FAIL (result_type, ctxt, "NULL result_type");
   RETURN_NULL_IF_FAIL (a, ctxt, "NULL a");
   RETURN_NULL_IF_FAIL (b, ctxt, "NULL b");
+  RETURN_NULL_IF_FAIL_PRINTF4 (
+a->get_type () == b->get_type (),
+ctxt,
+"mismatching types for binary op:"
+" a: %s (type: %s) b: %s (type: %s)",
+a->get_debug_string (),
+a->get_type ()->get_debug_string (),
+b->get_debug_string (),
+b->get_type ()->get_debug_string ());
 
   return (gcc_jit_rvalue *)ctxt->new_binary_op (loc, op, result_type, a, b);
 }
@@ -766,6 +775,15 @@ gcc_jit_context_new_comparison (gcc_jit_context *ctxt,
   /* op is checked by the inner function.  */
   RETURN_NULL_IF_FAIL (a, ctxt, "NULL a");
   RETURN_NULL_IF_FAIL (b, ctxt, "NULL b");
+  RETURN_NULL_IF_FAIL_PRINTF4 (
+a->get_type () == b->get_type (),
+ctxt,
+"mismatching types for comparison:"
+" a: %s (type: %s) b: %s (type: %s)",
+a->get_debug_string (),
+a->get_type ()->get_debug_string (),
+b->get_debug_string (),
+b->get_type ()->get_debug_string ());
 
   return (gcc_jit_rvalue *)ctxt->new_comparison (loc, op, a, b);
 }
-- 
1.7.11.7

[PATCH, rs6000] Restrict reload use of FLOAT_REGS

2014-02-28 Thread Bill Schmidt

Hi,

We've encountered a rare bug that occurs when attempting to reload for
an unaligned store in DImode.  For an unaligned store, using stfd gets
preference over std since stfd doesn't have an alignment restriction and
therefore the "m" constraint matches.  However, when there is not a
register available for the REG to be stored, register elimination can
replace the REG with its REG_EQUIV.  When this is a PLUS, we end up with
an attempt to compute an integer add into a floating-point register, and
things rapidly go downhill.

We had some internal discussion and determined the best way to fix this
is to avoid ever using FLOAT_REGS for a PLUS in
rs6000_preferred_reload_class, similar to what's currently done to avoid
loading constants into FLOAT_REGS.  Uli Weigand pointed out that this
existing test is actually a bit too strong, as rclass could be ALL_REGS
and this prevents us from using GENERAL_REGS in that case.  So I've
relaxed that test to only look for superclasses of FLOAT_REGS.  (If you
feel this is too risky, I can avoid that change.)
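
The difference between the two predicates can be seen on a toy model
that treats each register class as a bit mask. This is a sketch under
that assumption only: the SKETCH_* masks and the function names are
hypothetical, and the real reg_classes_intersect_p and
reg_class_subset_p operate on GCC's register class tables.

```c
#include <stdbool.h>

/* Hypothetical register classes, one bit per register.  */
enum
{
  SKETCH_GENERAL = 0x0f,  /* registers 0-3 */
  SKETCH_FLOAT   = 0xf0,  /* registers 4-7 */
  SKETCH_ALL     = 0xff,  /* every register */
  SKETCH_PARTIAL = 0x3c   /* overlaps both, contains neither */
};

/* True if classes A and B share at least one register.  */
static bool
classes_intersect_p (unsigned a, unsigned b)
{
  return (a & b) != 0;
}

/* True if every register of class A is also in class B.  */
static bool
class_subset_p (unsigned a, unsigned b)
{
  return (a & ~b) == 0;
}
```

In this model a class that contains every FLOAT register (SKETCH_ALL)
satisfies both tests, a class disjoint from FLOAT satisfies neither, and
the two tests differ only for a class such as SKETCH_PARTIAL that
overlaps FLOAT without containing it.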

The patch below fixes the one case where we've observed this bug in the
wild (it occurred for a particular snapshot of code for an internal
build that doesn't match any public branch).  Because it's dependent on
register spill, it is very difficult to try to produce a test case that
isn't too fragile, so I haven't tried to add one.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu
(--with-cpu=power8) and powerpc64-unknown-linux-gnu (--with-cpu=power7)
with no regressions.  Is this ok for trunk?

Thanks,
Bill


2014-02-28  Bill Schmidt  

* config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
constraint on constants to only prevent them from being reloaded
into a superset of FLOAT_REGS.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 208207)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -16751,7 +16751,8 @@ rs6000_preferred_reload_class (rtx x, enum reg_cla
   && easy_vector_constant (x, mode))
 return ALTIVEC_REGS;
 
-  if (CONSTANT_P (x) && reg_classes_intersect_p (rclass, FLOAT_REGS))
+  if ((CONSTANT_P (x) || GET_CODE (x) == PLUS)
+  && reg_class_subset_p (FLOAT_REGS, rclass))
 return NO_REGS;
 
   if (GET_MODE_CLASS (mode) == MODE_INT && rclass == NON_SPECIAL_REGS)

[PATCH, rs6000] Document reserved use of "wc" constraint

2014-02-28 Thread Bill Schmidt

Hi,

Hal Finkel requested that we define a constraint for representing
individual CR bits.  We agreed to reserve "wc" for this purpose to
maintain compatibility with LLVM.  This patch documents that use.

A pro-forma regstrap is in progress.  Assuming no problems, is this ok
for trunk?

Thanks,
Bill


2014-02-28  Bill Schmidt  

* config/rs6000/constraints.md: Document reserved use of "wc".


Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 208237)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -56,6 +56,9 @@
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "Any VSX register if the -mvsx option was used or NO_REGS.")
 
+;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits.
+;; It is currently used for that purpose in LLVM.
+
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")

[PATCH, LIBITM] Backport libitm bug fixes to FSF 4.8

2014-02-28 Thread Peter Bergner

I'd like to ask for permission to backport the following two LIBITM bug
fixes to the FSF 4.8 branch.  Although these are not technically fixing
regressions, they do fix the libitm.c/reentrant.c testsuite failure on
s390 and powerpc (or at least it will when we finally get our power8
code backported to FSF 4.8).  It also fixes a real bug on x86 that is
latent because we don't currently have a test case that warms up the
x86's RTM hardware enough such that its xbegin succeeds exposing the
bug.  I'd like this backport so that the 4.8 based distros won't need
to carry this as an add-on patch.

It should also be fairly safe as well, since the fixed code is limited
to the arches (x86, s390 and powerpc) that define USE_HTM_FASTPATH,
so all others definitely won't see a difference.

I'll note I CC'd some of the usual suspects interested in TM as well
as the normal RMs, because LIBITM doesn't seem to have a maintainer
or reviewer listed in the MAINTAINERS file.  Is that an oversight or???

Peter

Backport from mainline
2013-06-20  Torvald Riegel  

* query.cc (_ITM_inTransaction): Abort when using the HTM fastpath.
(_ITM_getTransactionId): Same.
* config/x86/target.h (htm_transaction_active): New.

2013-06-20  Torvald Riegel  

PR libitm/57643
* beginend.cc (gtm_thread::begin_transaction): Handle reentrancy in
the HTM fastpath.

Index: libitm/beginend.cc
===
--- libitm/beginend.cc  (revision 208151)
+++ libitm/beginend.cc  (working copy)
@@ -197,6 +197,8 @@
  // We are executing a transaction now.
  // Monitor the writer flag in the serial-mode lock, and abort
  // if there is an active or waiting serial-mode transaction.
+ // Note that this can also happen due to an enclosing
+ // serial-mode transaction; we handle this case below.
  if (unlikely(serial_lock.is_write_locked()))
htm_abort();
  else
@@ -219,6 +221,14 @@
  tx = new gtm_thread();
  set_gtm_thr(tx);
}
+ // Check whether there is an enclosing serial-mode transaction;
+ // if so, we just continue as a nested transaction and don't
+ // try to use the HTM fastpath.  This case can happen when an
+ // outermost relaxed transaction calls unsafe code that starts
+ // a transaction.
+ if (tx->nesting > 0)
+   break;
+ // Another thread is running a serial-mode transaction.  Wait.
  serial_lock.read_lock(tx);
  serial_lock.read_unlock(tx);
  // TODO We should probably reset the retry count t here, unless
Index: libitm/config/x86/target.h
===
--- libitm/config/x86/target.h  (revision 208151)
+++ libitm/config/x86/target.h  (working copy)
@@ -125,6 +125,13 @@
 {
   return begin_ret & _XABORT_RETRY;
 }
+
+/* Returns true iff a hardware transaction is currently being executed.  */
+static inline bool
+htm_transaction_active ()
+{
+  return _xtest() != 0;
+}
 #endif


Index: libitm/query.cc
===
--- libitm/query.cc (revision 208151)
+++ libitm/query.cc (working copy)
@@ -43,6 +43,15 @@
 _ITM_howExecuting ITM_REGPARM
 _ITM_inTransaction (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // If we use the HTM fastpath, we cannot reliably detect whether we are
+  // in a transaction because this function can be called outside of
+  // a transaction and thus we can't deduce this by looking at just the serial
+  // lock.  This function isn't used in practice currently, so the easiest
+  // way to handle it is to just abort.
+  if (htm_fastpath && htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   if (tx && (tx->nesting > 0))
 {
@@ -58,6 +67,11 @@
 _ITM_transactionId_t ITM_REGPARM
 _ITM_getTransactionId (void)
 {
+#if defined(USE_HTM_FASTPATH)
+  // See ITM_inTransaction.
+  if (htm_fastpath && htm_transaction_active())
+htm_abort();
+#endif
   struct gtm_thread *tx = gtm_thr();
   return (tx && (tx->nesting > 0)) ? tx->id : _ITM_noTransactionId;
 }
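
For readers skimming the beginend.cc hunk, the reentrancy check can be modelled in isolation. This is only a rough sketch, not libitm code: `ToyTx`, `Path` and `choose_path` are hypothetical names, and the real `gtm_thread::begin_transaction` does far more (serial lock handling, retries, and so on). It shows just the added decision: a thread that already has `nesting > 0` does not attempt the HTM fastpath.

```cpp
// Hypothetical stand-in for libitm's per-thread descriptor; only the
// nesting counter matters for this sketch.
struct ToyTx {
  int nesting = 0;
};

// The two outcomes modelled here (names are illustrative assumptions).
enum class Path { HtmFastpath, NestedInEnclosing };

// Models the check the patch adds: if this thread is already inside a
// transaction (nesting > 0), skip the hardware fastpath and continue as
// a nested transaction; otherwise the fastpath may be tried.
Path choose_path(const ToyTx &tx) {
  if (tx.nesting > 0)
    return Path::NestedInEnclosing;
  return Path::HtmFastpath;
}
```

With `nesting == 0` the sketch picks the fastpath; once an outer (for example serial-mode) transaction has bumped `nesting`, an inner begin falls through to the nested path, which is the case the patch comment describes.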

Re: RFA: ipa-devirt PATCH for c++/58678 (devirt causes KDE build failure)

2014-02-28 Thread Jason Merrill

I went ahead and checked in my patch so that the regression is fixed 
over the weekend.


Jason

Re: [PATCH, rs6000] Document reserved use of "wc" constraint

2014-02-28 Thread David Edelsohn

On Fri, Feb 28, 2014 at 7:23 PM, Bill Schmidt
 wrote:
> Hi,
>
> Hal Finkel requested that we define a constraint for representing
> individual CR bits.  We agreed to reserve "wc" for this purpose to
> maintain compatibility with LLVM.  This patch documents that use.
>
> A pro-forma regstrap is in progress.  Assuming no problems, is this ok
> for trunk?

You're not going to implement the new register class?

Okay

Thanks, David

Re: [PATCH, rs6000] Restrict reload use of FLOAT_REGS

2014-02-28 Thread David Edelsohn

On Fri, Feb 28, 2014 at 7:11 PM, Bill Schmidt
 wrote:
> Hi,
>
> We've encountered a rare bug that occurs when attempting to reload for
> an unaligned store in DImode.  For an unaligned store, using stfd gets
> preference over std since stfd doesn't have an alignment restriction and
> therefore the "m" constraint matches.  However, when there is not a
> register available for the REG to be stored, register elimination can
> replace the REG with its REG_EQUIV.  When this is a PLUS, we end up with
> an attempt to compute an integer add into a floating-point register, and
> things rapidly go downhill.
>
> We had some internal discussion and determined the best way to fix this
> is to avoid ever using FLOAT_REGS for a PLUS in
> rs6000_preferred_reload_class, similar to what's currently done to avoid
> loading constants into FLOAT_REGS.  Uli Weigand pointed out that this
> existing test is actually a bit too strong, as rclass could be ALL_REGS
> and this prevents us from using GENERAL_REGS in that case.  So I've
> relaxed that test to only look for superclasses of FLOAT_REGS.  (If you
> feel this is too risky, I can avoid that change.)
>
> The patch below fixes the one case where we've observed this bug in the
> wild (it occurred for a particular snapshot of code for an internal
> build that doesn't match any public branch).  Because it's dependent on
> register spill, it is very difficult to try to produce a test case that
> isn't too fragile, so I haven't tried to add one.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu
> (--with-cpu=power8) and powerpc64-unknown-linux-gnu (--with-cpu=power7)
> with no regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2014-02-28  Bill Schmidt  
>
> * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow
> PLUS rtx's from reloading into a superset of FLOAT_REGS; relax
> constraint on constants to only prevent them from being reloaded
> into a superset of FLOAT_REGS.

This is okay with me. Uli is the best one to comment if this is the right test.

Thanks, David
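
The rule described in the quoted ChangeLog can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual rs6000.c change: register classes are reduced to two-bit masks, `RtxKind`, `is_superset` and `preferred_reload_class` are hypothetical names, and falling back to GENERAL_REGS when the class allows it (NO_REGS otherwise) is one reading of the description above.

```cpp
#include <cstdint>

// Toy register classes as bitmasks (assumption: GCC's real classes are
// sets of hard registers, not two-bit masks).
enum RegClass : std::uint8_t {
  NO_REGS = 0,
  GENERAL_REGS = 1 << 0,
  FLOAT_REGS = 1 << 1,
  ALL_REGS = GENERAL_REGS | FLOAT_REGS,
};

// Hypothetical, heavily simplified view of the value being reloaded.
enum class RtxKind { Reg, ConstInt, Plus };

// True if class a contains every register kind in class b.
bool is_superset(RegClass a, RegClass b) { return (a & b) == b; }

// Constants and PLUS expressions should not be reloaded into a class
// that includes FLOAT_REGS: narrow to GENERAL_REGS when the requested
// class permits it, otherwise report that no class is suitable.
RegClass preferred_reload_class(RtxKind x, RegClass rclass) {
  bool integer_value = (x == RtxKind::ConstInt || x == RtxKind::Plus);
  if (integer_value && is_superset(rclass, FLOAT_REGS))
    return is_superset(rclass, GENERAL_REGS) ? GENERAL_REGS : NO_REGS;
  return rclass;
}
```

In this model a PLUS requested in ALL_REGS is steered to GENERAL_REGS rather than rejected outright, which is the relaxation attributed to Uli Weigand's observation, while a PLUS requested in FLOAT_REGS alone yields NO_REGS.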

Re: [PATCH i386 14/8] [AVX-512] Fix exp2 and sqrt tests.

2014-02-28 Thread Kirill Yukhin

Hello Uroš,
On 28 Feb 13:55, Uros Bizjak wrote:
> On Fri, Feb 28, 2014 at 1:14 PM, Kirill Yukhin  
> wrote:
> > Hello,
> > This is a relatively obvious patch which eliminates comparison
> > of infinities for the exp2 AVX-512 test and properly compares floats
> > for avx512f-sqrtps-2.c.
> >
> > Tests pass.
> >
> > Is it ok for trunk?
> >
> > gcc/testsuite/
> > * gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
> > argument to avoid inf values.
> > * gcc.target/i386/avx512er-vexp2ps-2.c: Compare results with
> > UNION_FP_CHECK machinery.
> 
> You are talking about avx512f-sqrtps-2.c, the ChangeLog refers to
> avx512er-vexp2ps-2.c, but the patch is modifying avx512f-vdivps-2.c.
Sorry for the mess.
The broken test was avx512f-vdivps-2.c.

Updated testsuite/ChangeLog:
* gcc.target/i386/avx512er-vexp2ps-2.c: Decrease exponent
argument to avoid inf values.
* gcc.target/i386/avx512f-vdivps-2.c: Compare results with
UNION_FP_CHECK machinery.

--
Thanks, K
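
As a rough illustration of why infinities are awkward in this kind of test, here is a minimal sketch of a tolerance-based float comparison. It is not the UNION_FP_CHECK macro from the testsuite; `close_enough` and its `eps` parameter are hypothetical names, and a difference-based check is only one possible way to compare results.

```cpp
#include <cmath>

// Hypothetical helper: accept `got` if it lies within `eps` of `want`.
// When both values are infinite, got - want is NaN, and any ordered
// comparison involving NaN is false, so the check rejects the pair.
bool close_enough(float got, float want, float eps) {
  return std::fabs(got - want) <= eps;
}
```

For finite values such as `std::exp2(10.0f)` against `1024.0f` the helper behaves as expected. With a large exponent such as `std::exp2(200.0f)` the float result overflows to infinity, and comparing it against another infinity this way fails, which is one reason to keep the exponent argument small.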
