Re: Fix prototype for print_insn in rtl.h

2015-10-20 Thread Nikolai Bozhenov


On 10/15/2015 09:42 PM, Trevor Saunders wrote:

Sorry, a little late to the party... but why is print_insn even in rtl.h?
It seems that sched-vis.c is the only thing that uses it...

Andrew

I'm going to use it in the scheduler...

but then wouldn't something like sched-int.h make more sense? On
the other hand it seems like printing insns is generally useful
functionality, so I'm curious why the scheduler needs its own way of
doing it.

Trev

As for me, I believe sched-int.h is an inappropriate place for the print_insn
prototype because the function has nothing scheduler-specific about it. And I like
Jeff's idea of removing sched-vis.c and moving everything from it into
print-rtl.[hc]. It would be nice if such code motion also resulted in some
unification of the interface and implementation of the regular and slim dumpers.

Thanks,
Nikolai



Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)

2015-10-20 Thread Jakub Jelinek
On Mon, Oct 19, 2015 at 07:43:59PM +0300, Ilya Verbin wrote:
> On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> > Chung-Lin, would you please have a look at the following (on
> > gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> > 
> > PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> > [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c 
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> > PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> > [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c 
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> 
> Maybe it was caused by this change in gomp_unmap_vars?
> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html
> 
> Looking at the code, I don't see any difference in async_refcount handling, 
> but
> I was unable to test it without having hardware :(

I think that is the only patch that could have affected it.
The copy_from change moves away from the old behavior, where basically all
concurrent mappings ORed into the copy_from flag and, when the refcount went to
0, data was copied back if there were any mappings with from or tofrom.
The OpenMP 4.5 behavior is that whether data is copied from the device
is determined solely by the mapping kind of the mapping that performs the
refcount decrease to 0.  Plus there is the always flag, which requests
the data copying operation unconditionally, no matter what the refcount is (either on
the mapping/refcount increase side, or the unmapping/refcount decrease side).
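
To make the difference concrete, a hypothetical C/OpenMP sketch (not one of
the failing OpenACC tests) of the 4.5 rule:

int a[100];

void f (void)
{
  #pragma omp target data map(alloc: a)   /* refcount 0 -> 1, kind "alloc" */
  {
    #pragma omp target map(tofrom: a)     /* refcount 2 -> 1 on exit: no copy */
    a[0] = 42;
  }
  /* Refcount 1 -> 0 here.  Under OpenMP 4.5 only the "alloc" kind of this
     mapping matters, so nothing is copied back; the old behavior would have
     ORed in the inner "tofrom" and copied a[] back to the host.  */
}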

Jakub


Re: [PATCH] Move cproj simplification to match.pd

2015-10-20 Thread Richard Biener
On Mon, 19 Oct 2015, Christophe Lyon wrote:

> On 19 October 2015 at 15:54, Richard Biener  wrote:
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> 
> Hi Richard,
> 
> This patch caused arm and aarch64 builds of newlib to ICE:
> In file included from
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdlib.h:11:0,
>  from
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:13:
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:
> In function '_mktm_r':
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:28:9:
> internal compiler error: Segmentation fault
>  _DEFUN (_mktm_r, (tim_p, res, is_gmtime),
> 0xa90205 crash_signal
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:353
> 0x7b3b0c tree_class_check
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3055
> 0x7b3b0c tree_single_nonnegative_warnv_p(tree_node*, bool*, int)
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13025
> 0x814053 gimple_phi_nonnegative_warnv_p
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6239
> 0x814053 gimple_stmt_nonnegative_warnv_p(gimple*, bool*, int)
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6264
> 0x7b5c94 tree_expr_nonnegative_p(tree_node*)
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13325
> 0xe2f657 gimple_simplify_108
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:5116
> 0xe3060d gimple_simplify_TRUNC_MOD_EXPR
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:24762
> 0xe0809b gimple_simplify
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:34389
> 0xe08c2b gimple_resimplify2(gimple**, code_helper*, tree_node*,
> tree_node**, tree_node* (*)(tree_node*))
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:193
> 0xe17600 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**,
> tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:762
> 0x81c694 fold_stmt_1
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:3605
> 0xad0f6c replace_uses_by(tree_node*, tree_node*)
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1835
> 0xad1a2f gimple_merge_blocks
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1921
> 0x67d325 merge_blocks(basic_block_def*, basic_block_def*)
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:776
> 0xae06da cleanup_tree_cfg_bb
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:654
> 0xae1118 cleanup_tree_cfg_1
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:686
> 0xae1118 cleanup_tree_cfg_noloop
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:738
> 0xae1118 cleanup_tree_cfg()
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:793
> 0x9c5c94 execute_function_todo
> 
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:1920
> Please submit a full bug report,
> 
> This happens for instance with GCC configured
> --target arm-none-eabi
> --with-cpu cortex-a9
> 
> You can download logs of a failed build from
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-build/trunk/228970/build.html
> 
> Sorry, I'm out of office for one week, so I can't provide further details.

Ok, without preprocessed source it's hard to track down but from the
backtrace I figure it's an issue similar to that of PR67815.  So a
testcase would be really nice.

Richard.

> Christophe
> 
> 
> > Richard.
> >
> > 2015-10-19  Richard Biener  
> >
> > * gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function.
> > (gimple_stmt_nonnegative_warnv_p): Use it.
> > * match.pd (CPROJ): New operator list.
> > (cproj (complex ...)): Move simplifications from ...
> > * builtins.c (fold_builtin_cproj): ... here.
> >
> > * gcc.dg/torture/builtin-cproj-1.c: Skip for -O0.
> >
> > Index: gcc/gimple-fold.c
> > ===
> > --- gcc/gimple-fold.c   (revision 228877)
> > +++ gcc/gimple-fold.c   (working copy)
> > @@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple
> > strict_overflow_p, depth);
> >  }
> >
> > +/* Return true if return value of call STMT is known to be non-negative.
> > +   If

Re: [ARM] Use vector wide add for mixed-mode adds

2015-10-20 Thread Michael Collison

Hi Kyrill,

Since your email I have done the following:

1. Added ENDIAN_LANE_N to the define_expand patterns for big endian 
targets. The big endian changes produced no change in the test results. 
I still have several execution failures when targeting big endian with 
LTO enabled.


2. I diff'd the RTL dumps from a big endian compiler with LTO enabled 
and disabled. I also examined the assembly language and there were no 
differences except for the .ascii directives.


I want to ask a question about existing patterns in neon.md that utilize 
vec_select and all the lanes as my example does: why are the 
following patterns not matched if the target is big endian?



(define_insn "neon_vec_unpack_lo_"
  [(set (match_operand: 0 "register_operand" "=w")
(SE: (vec_select:
  (match_operand:VU 1 "register_operand" "w")
  (match_operand:VU 2 "vect_par_constant_low" ""]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl. %q0, %e1"
  [(set_attr "type" "neon_shift_imm_long")]
)

(define_insn "neon_vec_unpack_hi_"
  [(set (match_operand: 0 "register_operand" "=w")
(SE: (vec_select:
  (match_operand:VU 1 "register_operand" "w")
  (match_operand:VU 2 "vect_par_constant_high" ""]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl. %q0, %f1"
  [(set_attr "type" "neon_shift_imm_long")]

These patterns are similar to the new patterns I am adding, and I am 
wondering whether my patterns should also exclude BYTES_BIG_ENDIAN.


On 10/08/2015 04:02 AM, Kyrill Tkachov wrote:

Hi Michael,

On 01/10/15 11:05, Michael Collison wrote:

Kyrill,

I have modified the patch to address your comments. I also modified
check_effective_target_vect_widen_sum_hi_to_si_pattern in target-supports.exp
to indicate that arm neon supports vector widen sum of HImode to SImode.
This resolved several test suite failures.

Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf. I have
four related execution test failures on armeb-none-linux-gnueabihf with -flto only.

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


We'd want to get to the bottom of these before committing.
Does codegen before and after the patch show anything?
When it comes to big-endian and NEON, the fiddly parts are
usually lane numbers. Do you need to select the proper lanes with
ENDIAN_LANE_N like Charles in his patch at:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00656.html?

Thanks,
Kyrill



I am debugging but have not tracked down the root cause yet. Feedback?

2015-07-22  Michael Collison 

  * config/arm/neon.md (widen_sum): New patterns
  where mode is VQI to improve mixed mode vectorization.
  * config/arm/neon.md (vec_sel_widen_ssum_lo3): New
  define_insn to match low half of signed vaddw.
  * config/arm/neon.md (vec_sel_widen_ssum_hi3): New
  define_insn to match high half of signed vaddw.
  * config/arm/neon.md (vec_sel_widen_usum_lo3): New
  define_insn to match low half of unsigned vaddw.
  * config/arm/neon.md (vec_sel_widen_usum_hi3): New
  define_insn to match high half of unsigned vaddw.
  * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
  * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
  * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
  * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
  * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
  * testsuite/lib/target-supports.exp
  (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
  that arm neon supports vector widen sum of HImode to SImode.


Note that the testsuite changes should have their own ChangeLog entry
with the paths there starting relative to gcc/testsuite/



On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:

Hi Michael,

On 23/09/15 00:52, Michael Collison wrote:

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes. These patches have been submitted
separately and approved.

This patch is designed to address code that was not being vectorized due
to missing widening patterns in the ARM backend. Code such as:

int t6(int len, void * dummy, short * __restrict x)
{
 len = len & ~31;
 int result = 0;
 __asm volatile ("");
 for (int i = 0; i < len; i++)
   result += x[i];
 return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-09-22  Michael Collison 

   * config/arm/neon.md (widen_sum): New patterns
   where mode is VQI to improve mixed mode add vectorization.


Please list all the new define_expands and define_insns
in the ChangeLog. Also, please add a ChangeLog entry for
the testsuite additions.

The approach looks ok to me with a few comments on some
p

Re: [PATCH] Move cproj simplification to match.pd

2015-10-20 Thread Bernhard Reutner-Fischer
On October 19, 2015 3:54:05 PM GMT+02:00, Richard Biener  
wrote:
>

>+static bool
>+gimple_phi_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
>+  int depth)
>+{

Shouldn't all such depth parms be unsigned short or at least unsigned int?

Thanks,



Re: [PATCH, wwwdocs] Add -march=skylake-avx512 to gcc-6/changes.html.

2015-10-20 Thread Kirill Yukhin
On 17 Oct 02:37, Gerald Pfeifer wrote:
> On Fri, 16 Oct 2015, Kirill Yukhin wrote:
> > Is it ok to install?
> 
> Yes, just add a "the" before "following".
> 
> > This switch was backported to gcc-5.
> > Is it ok to create a new section `GCC 5.3' and put it there
> > or do I need to wait for the actual release?
> 
> How about if you add it now, and add "(pending)" to that title?

Thanks a lot, Gerald. I've implemented both suggestions.

Checked in.
> 
> Gerald

--
Thanks, K


Move more cproj simplifications to match.pd

2015-10-20 Thread Richard Sandiford
Richi and I both had patches for cproj.  I thought I might as well
post a rebased version of what I had, since for the other patches I'd
been moving the constant handling into fold_builtin_1 (where functions
without combinatorial folds also handled constants).  I'm hoping the
switch statement there will eventually become the switch statement
for folding a function with constant arguments.

The patch also makes build_complex_cproj available globally and uses it
for the existing match.pd rules.

Certainly not a big deal or a big improvement, but... tested on
x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.  OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_cproj): Delete.
(fold_builtin_1): Handle constant arguments here.
(build_complex_cproj): Move to...
* tree.c: ...here.
* tree.h (build_complex_cproj): Declare.
* match.pd: Fold cproj(x)->x if x has no infinity.
Use build_complex_cproj for existing cproj rules.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index a9872c4..16f3bfd 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7539,50 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
   return NULL_TREE;
 }
 
-/* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
-   complex tree type of the result.  If NEG is true, the imaginary
-   zero is negative.  */
-
-static tree
-build_complex_cproj (tree type, bool neg)
-{
-  REAL_VALUE_TYPE rinf, rzero = dconst0;
-  
-  real_inf (&rinf);
-  rzero.sign = neg;
-  return build_complex (type, build_real (TREE_TYPE (type), rinf),
-   build_real (TREE_TYPE (type), rzero));
-}
-
-/* Fold call to builtin cproj, cprojf or cprojl with argument ARG.  TYPE is the
-   return type.  Return NULL_TREE if no simplification can be made.  */
-
-static tree
-fold_builtin_cproj (location_t loc, tree arg, tree type)
-{
-  if (!validate_arg (arg, COMPLEX_TYPE)
-  || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
-return NULL_TREE;
-
-  /* If there are no infinities, return arg.  */
-  if (! HONOR_INFINITIES (type))
-return non_lvalue_loc (loc, arg);
-
-  /* Calculate the result when the argument is a constant.  */
-  if (TREE_CODE (arg) == COMPLEX_CST)
-{
-  const REAL_VALUE_TYPE *real = TREE_REAL_CST_PTR (TREE_REALPART (arg));
-  const REAL_VALUE_TYPE *imag = TREE_REAL_CST_PTR (TREE_IMAGPART (arg));
-  
-  if (real_isinf (real) || real_isinf (imag))
-   return build_complex_cproj (type, imag->sign);
-  else
-   return arg;
-}
-
-  return NULL_TREE;
-}
-
 /* Fold function call to builtin tan, tanf, or tanl with argument ARG.
Return NULL_TREE if no simplification can be made.  */
 
@@ -9505,7 +9461,20 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
   break;
 
 CASE_FLT_FN (BUILT_IN_CPROJ):
-  return fold_builtin_cproj (loc, arg0, type);
+  if (TREE_CODE (arg0) == COMPLEX_CST
+ && TREE_CODE (TREE_TYPE (TREE_TYPE (arg0))) == REAL_TYPE)
+   {
+ const REAL_VALUE_TYPE *real
+   = TREE_REAL_CST_PTR (TREE_REALPART (arg0));
+ const REAL_VALUE_TYPE *imag
+   = TREE_REAL_CST_PTR (TREE_IMAGPART (arg0));
+
+ if (real_isinf (real) || real_isinf (imag))
+   return build_complex_cproj (type, imag->sign);
+ else
+   return arg0;
+   }
+  break;
 
 CASE_FLT_FN (BUILT_IN_CSIN):
   if (validate_arg (arg0, COMPLEX_TYPE)
diff --git a/gcc/match.pd b/gcc/match.pd
index aaca3a0..7d16c52 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2448,30 +2448,24 @@ along with GCC; see the file COPYING3.  If not see
   (CABS (complex @0 @0))
   (mult (abs @0) { build_real_truncate (type, dconst_sqrt2 ()); })))
 
+/* cproj(x) -> x if we're ignoring infinities.  */
+(simplify
+ (CPROJ @0)
+ (if (!HONOR_INFINITIES (type))
+   @0))
+
 /* If the real part is inf and the imag part is known to be
nonnegative, return (inf + 0i).  */
 (simplify
  (CPROJ (complex REAL_CST@0 tree_expr_nonnegative_p@1))
  (if (real_isinf (TREE_REAL_CST_PTR (@0)))
-  (with
-{
-  REAL_VALUE_TYPE rinf;
-  real_inf (&rinf);
-}
-   { build_complex (type, build_real (TREE_TYPE (type), rinf),
-   build_zero_cst (TREE_TYPE (type))); })))
+  { build_complex_cproj (type, false); }))
+
 /* If the imag part is inf, return (inf+I*copysign(0,imag)).  */
 (simplify
  (CPROJ (complex @0 REAL_CST@1))
  (if (real_isinf (TREE_REAL_CST_PTR (@1)))
-  (with
-{
-  REAL_VALUE_TYPE rinf, rzero = dconst0;
-  real_inf (&rinf);
-  rzero.sign = TREE_REAL_CST_PTR (@1)->sign;
-}
-   { build_complex (type, build_real (TREE_TYPE (type), rinf),
-   build_real (TREE_TYPE (type), rzero)); })))
+  { build_complex_cproj (type, TREE_REAL_CST_PTR (@1)->sign); }))
 
 
 /* Narrowing of arithmetic and logical operations. 
diff --git a/gcc/tree.c b/gcc/tree.c
index 9d0e9de..1405328 100644
--- a/gcc/tree.c
+++ b

Move tan simplifications to match.pd

2015-10-20 Thread Richard Sandiford
Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_tan): Delete.
(fold_builtin_1): Handle constant tan arguments here.
* match.pd: Simplify (tan (atan x)) to x.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 16f3bfd..3244538 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -160,7 +160,6 @@ static rtx expand_builtin_fabs (tree, rtx, rtx);
 static rtx expand_builtin_signbit (tree, rtx);
 static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
 static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
-static tree fold_builtin_tan (tree, tree);
 static tree fold_builtin_trunc (location_t, tree, tree);
 static tree fold_builtin_floor (location_t, tree, tree);
 static tree fold_builtin_ceil (location_t, tree, tree);
@@ -7539,33 +7538,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
   return NULL_TREE;
 }
 
-/* Fold function call to builtin tan, tanf, or tanl with argument ARG.
-   Return NULL_TREE if no simplification can be made.  */
-
-static tree
-fold_builtin_tan (tree arg, tree type)
-{
-  enum built_in_function fcode;
-  tree res;
-
-  if (!validate_arg (arg, REAL_TYPE))
-return NULL_TREE;
-
-  /* Calculate the result when the argument is a constant.  */
-  if ((res = do_mpfr_arg1 (arg, type, mpfr_tan, NULL, NULL, 0)))
-return res;
-
-  /* Optimize tan(atan(x)) = x.  */
-  fcode = builtin_mathfn_code (arg);
-  if (flag_unsafe_math_optimizations
-  && (fcode == BUILT_IN_ATAN
- || fcode == BUILT_IN_ATANF
- || fcode == BUILT_IN_ATANL))
-return CALL_EXPR_ARG (arg, 0);
-
-  return NULL_TREE;
-}
-
 /* Fold function call to builtin sincos, sincosf, or sincosl.  Return
NULL_TREE if no simplification can be made.  */
 
@@ -9613,7 +9585,9 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
   break;
 
 CASE_FLT_FN (BUILT_IN_TAN):
-  return fold_builtin_tan (arg0, type);
+  if (validate_arg (arg0, REAL_TYPE))
+   return do_mpfr_arg1 (arg0, type, mpfr_tan, NULL, NULL, 0);
+  break;
 
 CASE_FLT_FN (BUILT_IN_CEXP):
   return fold_builtin_cexp (loc, arg0, type);
diff --git a/gcc/match.pd b/gcc/match.pd
index 7d16c52..cd02b04 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 (define_operator_list SIN BUILT_IN_SINF BUILT_IN_SIN BUILT_IN_SINL)
 (define_operator_list COS BUILT_IN_COSF BUILT_IN_COS BUILT_IN_COSL)
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
+(define_operator_list ATAN BUILT_IN_ATANF BUILT_IN_ATAN BUILT_IN_ATANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
 (define_operator_list CPROJ BUILT_IN_CPROJF BUILT_IN_CPROJ BUILT_IN_CPROJL)
@@ -2343,7 +2344,14 @@ along with GCC; see the file COPYING3.  If not see
   /* cbrt(expN(x)) -> expN(x/3).  */
   (simplify
(cbrts (exps @0))
-   (exps (mult @0 { build_real_truncate (type, dconst_third ()); })
+   (exps (mult @0 { build_real_truncate (type, dconst_third ()); }
+
+ /* tan(atan(x)) -> x.  */
+ (for tans (TAN)
+  atans (ATAN)
+  (simplify
+   (tans (atans @0))
+   @0)))
 
 /* cabs(x+0i) or cabs(0+xi) -> abs(x).  */
 (simplify



Re: using scratchpads to enhance RTL-level if-conversion: revised patch

2015-10-20 Thread Richard Biener
On Tue, Oct 20, 2015 at 7:43 AM, Jeff Law  wrote:
> On 10/14/2015 01:15 PM, Bernd Schmidt wrote:
>>
>> On 10/14/2015 07:43 PM, Jeff Law wrote:
>>>
>>> Obviously some pessimization relative to current code is necessary to
>>> fix some of the problems WRT thread safety and avoiding things like
>>> introducing faults in code which did not previously fault.
>>
>>
>> Huh? This patch is purely an (attempt at) optimization, not something
>> that fixes any problems.
>
> Then I must be mentally merging two things Abe has been working on then.
> He's certainly had an if-converter patch that was designed to avoid
> introducing races in code that didn't previously have races.
>
> Looking back through the archives that appears to be the case. His patches
> to avoid racing are for the tree level if converter, not the RTL if
> converter.

Even for the tree level this wasn't the case; he just ran into a bug
in the existing converter that I have since fixed.

> Sigh, sorry for the confusion.  It's totally my fault.  Assuming Abe doesn't
> have a correctness case at all here, then I don't see any way for the code
> to go forward as-is since it's likely making things significantly worse.
>
>>
>> I can't test valgrind right now, it fails to run on my machine, but I
>> guess it could adapt to allow stores slightly below the stack (maybe
>> warning once)? It seems like a bit of an edge case to worry about, but
>> if supporting it is critical and it can't be changed to adapt to new
>> optimizations, then I think we're probably better off entirely without
>> this scratchpad transformation.
>>
>> Alternatively I can think of a few other possible approaches which
>> wouldn't require this kind of bloat:
>>   * add support for allocating space in the stack redzone. That could be
>> interesting for the register allocator as well. Would help only
>> x86_64, but that's a large fraction of gcc's userbase.
>>   * add support for opportunistically finding unused alignment padding
>> in the existing stack frame. Less likely to work but would produce
>> better results when it does.
>>   * on embedded targets we probably don't have to worry about valgrind,
>> so do the optimal (sp - x) thing there
>>   * allocate a single global as the dummy target. Might be more
>> expensive to load the address on some targets though.
>>   * at least find a way to express costs for this transformation.
>> Difficult since you don't yet necessarily know if the function is
>> going to have a stack frame. Hence, IMO this approach is flawed.
>> (You'll still want cost estimates even when not allocating stuff in
>> the normal stack frame, because generated code will still execute
>> between two and four extra instructions).
>
> One could argue these should all be on the table.  However, I tend to really
> dislike using area beyond the current stack.  I realize it's throw-away
> data, but it just seems like a bad idea to me -- even on embedded targets
> that don't support valgrind.
>
>


Re: [PR fortran/63858] Fix mix of OpenACC and OpenMP sentinels in continuations

2015-10-20 Thread Jakub Jelinek
On Fri, Oct 09, 2015 at 12:15:24PM +0200, Thomas Schwinge wrote:
> diff --git gcc/fortran/scanner.c gcc/fortran/scanner.c
> index bfb7d45..1e1ea84 100644
> --- gcc/fortran/scanner.c
> +++ gcc/fortran/scanner.c
> @@ -935,6 +935,63 @@ skip_free_comments (void)
>return false;
>  }
>  
> +/* Return true if MP was matched in fixed form.  */
> +static bool
> +skip_omp_attribute_fixed (locus *start)

Technically, this isn't an attribute, but a sentinel.
So, skip_fixed_omp_sentinel?  I know the free-form functions
are called attribute; perhaps we should rename them too
(a patch to do so is preapproved).

> +{
> +  gfc_char_t c;
> +  if (((c = next_char ()) == 'm' || c == 'M')
> +  && ((c = next_char ()) == 'p' || c == 'P'))
> +{
> +  c = next_char ();
> +  if (c != '\n'
> +   && (continue_flag

The original code checked here
(openmp_flag && continue_flag)
instead.  Is that change intentional?
Looking around, we probably don't have testcase coverage for, say,
fixed form:

C*OMP+PARALLEL DO
do ...

(i.e. where it starts with an OpenMP (or OpenACC) continuation, without a
non-continued line first), or for free form where:

something &
!$omp & parallel

(ditto for OpenACC).

> +   while (gfc_is_whitespace (c));
> +   if (c != '\n' && c != '!')
> + {
> +   /* Canonicalize to *$omp.  */

The comment has a pasto, by storing * you canonicalize to *$acc.

> -   if (flag_openacc)
> +   if (flag_openacc || (flag_openmp || flag_openmp_simd))

I'd just write
if (flag_openacc || flag_openmp || flag_openmp_simd)
the ()s around are just misleading.

Anyway, if the removal of "openmp_flag &&" is intentional, then
the patch is ok with the above mentioned changes.  We can deal with the
cases I've mentioned up above in a follow-up.

Jakub


Re: Move more cproj simplifications to match.pd

2015-10-20 Thread Richard Biener
On Tue, Oct 20, 2015 at 11:03 AM, Richard Sandiford
 wrote:
> Richi and I both had patches for cproj.  I thought I might as well
> post a rebased version of what I had, since for the other patches I'd
> been moving the constant handling into fold_builtin_1 (where functions
> without combinatorial folds also handled constants).  I'm hoping the
> switch statement there will eventually become the switch statement
> for folding a function with constant arguments.
>
> The patch also makes build_complex_cproj available globally and uses it
> for the existing match.pd rules.
>
> Certainly not a big deal or a big improvement, but... tested on
> x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok with build_complex_cproj renamed to something more descriptive - build_complex_inf?
After all, it builds a complex Inf with an imag -0 eventually.

Richard.

> Thanks,
> Richard
>
>
> gcc/
> * builtins.c (fold_builtin_cproj): Delete.
> (fold_builtin_1): Handle constant arguments here.
> (build_complex_cproj): Move to...
> * tree.c: ...here.
> * tree.h (build_complex_cproj): Declare.
> * match.pd: Fold cproj(x)->x if x has no infinity.
> Use build_complex_cproj for existing cproj rules.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index a9872c4..16f3bfd 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -7539,50 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree 
> arg)
>return NULL_TREE;
>  }
>
> -/* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
> -   complex tree type of the result.  If NEG is true, the imaginary
> -   zero is negative.  */
> -
> -static tree
> -build_complex_cproj (tree type, bool neg)
> -{
> -  REAL_VALUE_TYPE rinf, rzero = dconst0;
> -
> -  real_inf (&rinf);
> -  rzero.sign = neg;
> -  return build_complex (type, build_real (TREE_TYPE (type), rinf),
> -   build_real (TREE_TYPE (type), rzero));
> -}
> -
> -/* Fold call to builtin cproj, cprojf or cprojl with argument ARG.  TYPE is 
> the
> -   return type.  Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_cproj (location_t loc, tree arg, tree type)
> -{
> -  if (!validate_arg (arg, COMPLEX_TYPE)
> -  || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
> -return NULL_TREE;
> -
> -  /* If there are no infinities, return arg.  */
> -  if (! HONOR_INFINITIES (type))
> -return non_lvalue_loc (loc, arg);
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if (TREE_CODE (arg) == COMPLEX_CST)
> -{
> -  const REAL_VALUE_TYPE *real = TREE_REAL_CST_PTR (TREE_REALPART (arg));
> -  const REAL_VALUE_TYPE *imag = TREE_REAL_CST_PTR (TREE_IMAGPART (arg));
> -
> -  if (real_isinf (real) || real_isinf (imag))
> -   return build_complex_cproj (type, imag->sign);
> -  else
> -   return arg;
> -}
> -
> -  return NULL_TREE;
> -}
> -
>  /* Fold function call to builtin tan, tanf, or tanl with argument ARG.
> Return NULL_TREE if no simplification can be made.  */
>
> @@ -9505,7 +9461,20 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
>break;
>
>  CASE_FLT_FN (BUILT_IN_CPROJ):
> -  return fold_builtin_cproj (loc, arg0, type);
> +  if (TREE_CODE (arg0) == COMPLEX_CST
> + && TREE_CODE (TREE_TYPE (TREE_TYPE (arg0))) == REAL_TYPE)
> +   {
> + const REAL_VALUE_TYPE *real
> +   = TREE_REAL_CST_PTR (TREE_REALPART (arg0));
> + const REAL_VALUE_TYPE *imag
> +   = TREE_REAL_CST_PTR (TREE_IMAGPART (arg0));
> +
> + if (real_isinf (real) || real_isinf (imag))
> +   return build_complex_cproj (type, imag->sign);
> + else
> +   return arg0;
> +   }
> +  break;
>
>  CASE_FLT_FN (BUILT_IN_CSIN):
>if (validate_arg (arg0, COMPLEX_TYPE)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index aaca3a0..7d16c52 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2448,30 +2448,24 @@ along with GCC; see the file COPYING3.  If not see
>(CABS (complex @0 @0))
>(mult (abs @0) { build_real_truncate (type, dconst_sqrt2 ()); })))
>
> +/* cproj(x) -> x if we're ignoring infinities.  */
> +(simplify
> + (CPROJ @0)
> + (if (!HONOR_INFINITIES (type))
> +   @0))
> +
>  /* If the real part is inf and the imag part is known to be
> nonnegative, return (inf + 0i).  */
>  (simplify
>   (CPROJ (complex REAL_CST@0 tree_expr_nonnegative_p@1))
>   (if (real_isinf (TREE_REAL_CST_PTR (@0)))
> -  (with
> -{
> -  REAL_VALUE_TYPE rinf;
> -  real_inf (&rinf);
> -}
> -   { build_complex (type, build_real (TREE_TYPE (type), rinf),
> -   build_zero_cst (TREE_TYPE (type))); })))
> +  { build_complex_cproj (type, false); }))
> +
>  /* If the imag part is inf, return (inf+I*copysign(0,imag)).  */
>  (simplify
>   (CPROJ (complex @0 REAL_CST@1))
>   (if (real_isinf (TREE_REAL_CST_PTR (@1)))
> -  (with
> -{

Re: Move tan simplifications to match.pd

2015-10-20 Thread Richard Biener
On Tue, Oct 20, 2015 at 11:14 AM, Richard Sandiford
 wrote:
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> * builtins.c (fold_builtin_tan): Delete.
> (fold_builtin_1): Handle constant tan arguments here.
> * match.pd: Simplify (tan (atan x)) to x.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 16f3bfd..3244538 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -160,7 +160,6 @@ static rtx expand_builtin_fabs (tree, rtx, rtx);
>  static rtx expand_builtin_signbit (tree, rtx);
>  static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
> -static tree fold_builtin_tan (tree, tree);
>  static tree fold_builtin_trunc (location_t, tree, tree);
>  static tree fold_builtin_floor (location_t, tree, tree);
>  static tree fold_builtin_ceil (location_t, tree, tree);
> @@ -7539,33 +7538,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree 
> arg)
>return NULL_TREE;
>  }
>
> -/* Fold function call to builtin tan, tanf, or tanl with argument ARG.
> -   Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_tan (tree arg, tree type)
> -{
> -  enum built_in_function fcode;
> -  tree res;
> -
> -  if (!validate_arg (arg, REAL_TYPE))
> -return NULL_TREE;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_tan, NULL, NULL, 0)))
> -return res;
> -
> -  /* Optimize tan(atan(x)) = x.  */
> -  fcode = builtin_mathfn_code (arg);
> -  if (flag_unsafe_math_optimizations
> -  && (fcode == BUILT_IN_ATAN
> - || fcode == BUILT_IN_ATANF
> - || fcode == BUILT_IN_ATANL))
> -return CALL_EXPR_ARG (arg, 0);
> -
> -  return NULL_TREE;
> -}
> -
>  /* Fold function call to builtin sincos, sincosf, or sincosl.  Return
> NULL_TREE if no simplification can be made.  */
>
> @@ -9613,7 +9585,9 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
>break;
>
>  CASE_FLT_FN (BUILT_IN_TAN):
> -  return fold_builtin_tan (arg0, type);
> +  if (validate_arg (arg0, REAL_TYPE))
> +   return do_mpfr_arg1 (arg0, type, mpfr_tan, NULL, NULL, 0);
> +  break;
>
>  CASE_FLT_FN (BUILT_IN_CEXP):
>return fold_builtin_cexp (loc, arg0, type);
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7d16c52..cd02b04 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
>  (define_operator_list SIN BUILT_IN_SINF BUILT_IN_SIN BUILT_IN_SINL)
>  (define_operator_list COS BUILT_IN_COSF BUILT_IN_COS BUILT_IN_COSL)
>  (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
> +(define_operator_list ATAN BUILT_IN_ATANF BUILT_IN_ATAN BUILT_IN_ATANL)
>  (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
>  (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
>  (define_operator_list CPROJ BUILT_IN_CPROJF BUILT_IN_CPROJ BUILT_IN_CPROJL)
> @@ -2343,7 +2344,14 @@ along with GCC; see the file COPYING3.  If not see
>/* cbrt(expN(x)) -> expN(x/3).  */
>(simplify
> (cbrts (exps @0))
> -   (exps (mult @0 { build_real_truncate (type, dconst_third ()); })
> +   (exps (mult @0 { build_real_truncate (type, dconst_third ()); }
> +
> + /* tan(atan(x)) -> x.  */
> + (for tans (TAN)
> +  atans (ATAN)
> +  (simplify
> +   (tans (atans @0))
> +   @0)))
>
>  /* cabs(x+0i) or cabs(0+xi) -> abs(x).  */
>  (simplify
>


Re: [mask-vec_cond, patch 3/2] SLP support

2015-10-20 Thread Ilya Enkovich
2015-10-19 19:05 GMT+03:00 Jeff Law :
> On 10/19/2015 05:21 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch adds missing support for cond_expr with no embedded comparison
>> in SLP.  No new test added because vec cmp SLP test becomes (due to changes
>> in bool patterns by the first patch) a regression test for this patch.  Does
>> it look OK?
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-10-19  Ilya Enkovich  
>>
>> * tree-vect-slp.c (vect_get_and_check_slp_defs): Allow
>> cond_exp with no embedded comparison.
>> (vect_build_slp_tree_1): Likewise.
>
> Is it even valid gimple to have a COND_EXPR that is anything other than a
> conditional?
>
> From looking at gimplify_cond_expr, it looks like we could have a SSA_NAME
> that's a bool as the conditional.  Presumably we're allowing a vector of
> bools as the conditional once we hit the vectorizer, which seems fairly
> natural.

Currently the vectorizer doesn't handle such a COND_EXPR and never produces
a VEC_COND_EXPR with an SSA_NAME as the condition.  Expand treats such a
VEC_COND_EXPR as an implicit (OP < 0) case (we just have no optab to
expand it without a comparison).  But the first patch in this series [1]
allows such conditions, to enable re-using a vector comparison result in
multiple VEC_COND_EXPRs.
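
As a hypothetical source-level illustration (not from the patch or its
tests), the kind of code this targets is one comparison feeding several
selects, which after vectorization becomes a single vector comparison
reused by multiple VEC_COND_EXPRs:

void
f (int *restrict a, int *restrict b, int *restrict lo, int *restrict hi, int n)
{
  for (int i = 0; i < n; i++)
    {
      int cmp = a[i] < b[i];     /* one comparison ...            */
      lo[i] = cmp ? a[i] : b[i]; /* ... reused by this select ...  */
      hi[i] = cmp ? b[i] : a[i]; /* ... and by this one.           */
    }
}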

>
> OK.  Please install when the prerequisites are installed.
>
> Thanks,
> jeff
>

Thanks!
Ilya

[1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00862.html


[Ada] Pragma Constant_After_Elaboration

2015-10-20 Thread Arnaud Charlet
This patch allows external tools that utilize the front end abstract syntax
tree to query for pragma Constant_After_Elaboration in the N_Contract node of
a variable. No change in behavior, no need for a test.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Hristian Kirtchev  

* einfo.adb (Get_Pragma): Minor reformatting. Rename local constant
Is_CDG to Is_CLS. Add pragma Constant_After_Elaboration to the list of
classification pragmas.

Index: einfo.adb
===
--- einfo.adb   (revision 229023)
+++ einfo.adb   (working copy)
@@ -6980,30 +6980,40 @@

 
function Get_Pragma (E : Entity_Id; Id : Pragma_Id) return Node_Id is
-  Is_CDG  : constant Boolean :=
-  Id = Pragma_Abstract_State or else
-  Id = Pragma_Async_Readers  or else
-  Id = Pragma_Async_Writers  or else
-  Id = Pragma_Depends or else
-  Id = Pragma_Effective_Reads or else
-  Id = Pragma_Effective_Writes   or else
-  Id = Pragma_Extensions_Visible or else
-  Id = Pragma_Global or else
-  Id = Pragma_Initial_Condition  or else
-  Id = Pragma_Initializes or else
-  Id = Pragma_Part_Of or else
-  Id = Pragma_Refined_Depends or else
-  Id = Pragma_Refined_Global or else
-  Id = Pragma_Refined_State;
+
+  --  Classification pragmas
+
+  Is_CLS : constant Boolean :=
+ Id = Pragma_Abstract_State or else
+ Id = Pragma_Async_Readers  or else
+ Id = Pragma_Async_Writers  or else
+ Id = Pragma_Constant_After_Elaboration or else
+ Id = Pragma_Depends or else
+ Id = Pragma_Effective_Reads or else
+ Id = Pragma_Effective_Writes   or else
+ Id = Pragma_Extensions_Visible or else
+ Id = Pragma_Global or else
+ Id = Pragma_Initial_Condition  or else
+ Id = Pragma_Initializes or else
+ Id = Pragma_Part_Of or else
+ Id = Pragma_Refined_Depends or else
+ Id = Pragma_Refined_Global or else
+ Id = Pragma_Refined_State;
+
+  --  Contract / test case pragmas
+
   Is_CTC : constant Boolean :=
-  Id = Pragma_Contract_Cases or else
+  Id = Pragma_Contract_Cases or else
   Id = Pragma_Test_Case;
+
+  --  Pre / postcondition pragmas
+
   Is_PPC : constant Boolean :=
-  Id = Pragma_Precondition   or else
-  Id = Pragma_Postcondition  or else
+  Id = Pragma_Precondition  or else
+  Id = Pragma_Postcondition or else
   Id = Pragma_Refined_Post;
 
-  In_Contract : constant Boolean := Is_CDG or Is_CTC or Is_PPC;
+  In_Contract : constant Boolean := Is_CLS or Is_CTC or Is_PPC;
 
   Item  : Node_Id;
   Items : Node_Id;
@@ -7018,7 +7028,7 @@
  if No (Items) then
 return Empty;
 
- elsif Is_CDG then
+ elsif Is_CLS then
 Item := Classifications (Items);
 
  elsif Is_CTC then


[Ada] Avoid overflow in Write_Int

2015-10-20 Thread Arnaud Charlet
This patch avoids an overflow on the most-negative number in Output.Write_Int.
No simple test available.
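
A C analogue of the same trick, purely for illustration (the actual change
is the Ada patch below): recurse on the non-positive value so that the
most-negative number never has to be negated.

#include <stdio.h>

/* Print |val| for val <= 0, most significant digit first.  */
static void
write_abs (int val)
{
  if (val < -9)
    write_abs (val / 10);
  putchar ('0' - (val % 10));
}

void
write_int (int val)
{
  if (val < 0)
    {
      putchar ('-');
      write_abs (val);   /* already non-positive, no negation needed */
    }
  else
    write_abs (-val);    /* negating a non-negative value is safe */
}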

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Bob Duff  

* output.adb (Write_Int): Work with negative numbers in order to avoid
negating Int'First and thereby causing overflow.

Index: output.adb
===
--- output.adb  (revision 229023)
+++ output.adb  (working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2013, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -350,6 +350,7 @@
 
procedure Write_Char (C : Character) is
begin
+  pragma Assert (Next_Col in Buffer'Range);
   if Next_Col = Buffer'Length then
  Write_Eol;
   end if;
@@ -406,17 +407,29 @@
---
 
procedure Write_Int (Val : Int) is
+  --  Type Int has one extra negative number (i.e. two's complement), so we
+  --  work with negative numbers here. Otherwise, negating Int'First will
+  --  overflow.
+
+  subtype Nonpositive is Int range Int'First .. 0;
+  procedure Write_Abs (Val : Nonpositive);
+  --  Write out the absolute value of Val
+
+  procedure Write_Abs (Val : Nonpositive) is
+  begin
+ if Val < -9 then
+Write_Abs (Val / 10); -- Recursively write higher digits
+ end if;
+
+ Write_Char (Character'Val (-(Val rem 10) + Character'Pos ('0')));
+  end Write_Abs;
+
begin
   if Val < 0 then
  Write_Char ('-');
- Write_Int (-Val);
-
+ Write_Abs (Val);
   else
- if Val > 9 then
-Write_Int (Val / 10);
- end if;
-
- Write_Char (Character'Val ((Val mod 10) + Character'Pos ('0')));
+ Write_Abs (-Val);
   end if;
end Write_Int;
 


[Ada] Do not inline No_Return procedures in GNATprove mode

2015-10-20 Thread Arnaud Charlet
Inlining in GNATprove mode is done to improve the results of formal
verification in GNATprove. It should not be done on procedures marked
No_Return, which are handled specially in GNATprove.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Yannick Moy  

* inline.adb (Can_Be_Inlined_In_GNATprove_Mode): Return
False for procedures marked No_Return.
* sem_util.ads (Enclosing_Declaration): Improve comment.

Index: inline.adb
===
--- inline.adb  (revision 229023)
+++ inline.adb  (working copy)
@@ -1534,6 +1534,12 @@
   elsif In_Package_Visible_Spec (Id) then
  return False;
 
+  --  Do not inline subprograms marked No_Return, possibly used for
+  --  signaling errors, which GNATprove handles specially.
+
+  elsif No_Return (Id) then
+ return False;
+
   --  Do not inline subprograms that have a contract on the spec or the
   --  body. Use the contract(s) instead in GNATprove.
 
Index: sem_util.ads
===
--- sem_util.ads(revision 229023)
+++ sem_util.ads(working copy)
@@ -532,7 +532,8 @@
--  Returns the closest ancestor of Typ that is a CPP type.
 
function Enclosing_Declaration (N : Node_Id) return Node_Id;
-   --  Returns the declaration node enclosing N, if any, or Empty otherwise
+   --  Returns the declaration node enclosing N (including possibly N itself),
+   --  if any, or Empty otherwise
 
function Enclosing_Generic_Body
  (N : Node_Id) return Node_Id;


[Ada] Support for recording bind time environment info

2015-10-20 Thread Arnaud Charlet
This change adds support for recording a set of key=value pairs
at the time an application is built (or more precisely, at bind time),
and for making this information available at run time. A typical use case
is to record a build timestamp:

$ gnatmake record_build_time -bargs -VBUILD_TIME="`LANG=C date`"
gcc -c record_build_time.adb
gnatbind -VBUILD_TIME=Tue Oct 20 05:51:21 EDT 2015 -x record_build_time.ali
gnatlink record_build_time.ali

$ ./record_build_time 
BT=<>

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Thomas Quinot  

* Makefile.rtl: add the following...
* g-binenv.ads, g-binenv.adb: New unit providing runtime access
to bind time captured values ("bind environment")
* init.c, s-init.ads: declare new global variable
__gl_bind_env_addr.
* bindgen.ads, bindgen.adb (Set_Bind_Env): record a bind
environment key=value pair.
(Gen_Bind_Env_String): helper to produce the bind environment data
called  in the binder generated file.
(Gen_Output_File_Ada): Call the above (Gen_Adainit): Set
__gl_bind_env_addr accordingly.
* switch-b.adb: Support for command line switch -V (user interface
to set a build environment key=value pair)
* bindusg.adb: Document the above

Index: impunit.adb
===
--- impunit.adb (revision 229023)
+++ impunit.adb (working copy)
@@ -238,6 +238,7 @@
 ("g-alvevi", F),  -- GNAT.Altivec.Vector_Views
 ("g-arrspl", F),  -- GNAT.Array_Split
 ("g-awk   ", F),  -- GNAT.AWK
+("g-binenv", F),  -- GNAT.Bind_Environment
 ("g-boubuf", F),  -- GNAT.Bounded_Buffers
 ("g-boumai", F),  -- GNAT.Bounded_Mailboxes
 ("g-bubsor", F),  -- GNAT.Bubble_Sort
Index: bindusg.adb
===
--- bindusg.adb (revision 229023)
+++ bindusg.adb (working copy)
@@ -4,9 +4,9 @@
 --  --
 -- B I N D U S G--
 --  --
---B o d y   --
+-- B o d y  --
 --  --
---  Copyright (C) 1992-2014, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -228,6 +228,10 @@
   Write_Line ("  -vVerbose mode. Error messages, " &
   "header, summary output to stdout");
 
+  --  Line for -V switch
+
+  Write_Line ("  -Vkey=val Record bind-time variable key " &
+  "with value val");
   --  Line for -w switch
 
   Write_Line ("  -wx   Warning mode. (x=s/e for " &
Index: bindgen.adb
===
--- bindgen.adb (revision 229023)
+++ bindgen.adb (working copy)
@@ -35,6 +35,7 @@
 with Osint.B;  use Osint.B;
 with Output;   use Output;
 with Rident;   use Rident;
+with Stringt;  use Stringt;
 with Table;use Table;
 with Targparm; use Targparm;
 with Types;use Types;
@@ -43,6 +44,7 @@
 with System.WCh_Con; use System.WCh_Con;
 
 with GNAT.Heap_Sort_A; use GNAT.Heap_Sort_A;
+with GNAT.HTable;
 
 package body Bindgen is
 
@@ -89,6 +91,9 @@
Lib_Final_Built : Boolean := False;
--  Flag indicating whether the finalize_library rountine has been built
 
+   Bind_Env_String_Built : Boolean := False;
+   --  Flag indicating whether a bind environment string has been built
+
CodePeer_Wrapper_Name : constant String := "call_main_subprogram";
--  For CodePeer, introduce a wrapper subprogram which calls the
--  user-defined main subprogram.
@@ -124,6 +129,22 @@
  Table_Increment  => 200,
  Table_Name   => "PSD_Pragma_Settings");
 
+   
+   -- Bind_Environment Table --
+   
+
+   subtype Header_Num is Int range 0 .. 36;
+
+   function Hash (Nam : Name_Id) return Header_Num;
+
+   package Bind_Environment is new GNAT.HTable.Simple_HTable
+ (Header_Num => Header_Num,
+  Element=> Name_Id,
+  No_Element => No_Name,
+  Key=> Name_Id,
+  Hash   => Hash,
+  Equal  => "=");
+
--
-- Run-Time Globals --
--
@@ -246,6 +267,9 @@
procedure Gen_Adafinal;
--  Generate the Adafinal procedure
 
+   procedure Gen_Bind_Env_String;
+   --  Generate the bind environment buffer
+
procedure G

[Ada] Improve error message for missing dependency item

2015-10-20 Thread Arnaud Charlet
This patch modifies the analysis of pragma Depends to emit a clearer message
concerning a missing dependency item.


-- Source --


--  message.ads

package Message
  with Abstract_State => State,
   Initializes=> State,
   SPARK_Mode
is
   procedure Proc (X : in Integer; Y : out Integer)
 with Global  => (In_Out => State),
  Depends => (State => State,
  Y => X);
end Message;

--  message.adb

package body Message
  with Refined_State => (State => N),
   SPARK_Mode
is
   N : Natural := 0;

   procedure Proc(X : in Integer; Y : out Integer)
 with Refined_Global  => (In_Out => N),
  Refined_Depends => (N => N) 
   is
   begin
  N := N + 1;
  Y := X;
   end Proc;
end Message;


-- Compilation and output --


$ gcc -c message.adb
message.adb:9:11: parameter "X" is missing from input dependence list

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Hristian Kirtchev  

* sem_prag.adb (Check_Usage): Update the calls to Usage_Error.
(Usage_Error): Remove formal parameter Item. Emit a clearer message
concerning a missing dependency item and place it on the related pragma.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 229031)
+++ sem_prag.adb(working copy)
@@ -1220,14 +1220,14 @@
  Used_Items : Elist_Id;
  Is_Input   : Boolean)
   is
- procedure Usage_Error (Item : Node_Id; Item_Id : Entity_Id);
+ procedure Usage_Error (Item_Id : Entity_Id);
  --  Emit an error concerning the illegal usage of an item
 
  -
  -- Usage_Error --
  -
 
- procedure Usage_Error (Item : Node_Id; Item_Id : Entity_Id) is
+ procedure Usage_Error (Item_Id : Entity_Id) is
 Error_Msg : Name_Id;
 
  begin
@@ -1245,10 +1245,10 @@
 
   Add_Item_To_Name_Buffer (Item_Id);
   Add_Str_To_Name_Buffer
-(" & must appear in at least one input dependence list");
+(" & is missing from input dependence list");
 
   Error_Msg := Name_Find;
-  SPARK_Msg_NE (Get_Name_String (Error_Msg), Item, Item_Id);
+  SPARK_Msg_NE (Get_Name_String (Error_Msg), N, Item_Id);
end if;
 
 --  Output case (SPARK RM 6.1.5(10))
@@ -1258,10 +1258,10 @@
 
Add_Item_To_Name_Buffer (Item_Id);
Add_Str_To_Name_Buffer
- (" & must appear in exactly one output dependence list");
+ (" & is missing from output dependence list");
 
Error_Msg := Name_Find;
-   SPARK_Msg_NE (Get_Name_String (Error_Msg), Item, Item_Id);
+   SPARK_Msg_NE (Get_Name_String (Error_Msg), N, Item_Id);
 end if;
  end Usage_Error;
 
@@ -1297,13 +1297,13 @@
   and then not Contains (Used_Items, Item_Id)
 then
if Is_Formal (Item_Id) then
-  Usage_Error (Item, Item_Id);
+  Usage_Error (Item_Id);
 
--  States and global objects are not used properly only when
--  the subprogram is subject to pragma Global.
 
elsif Global_Seen then
-  Usage_Error (Item, Item_Id);
+  Usage_Error (Item_Id);
end if;
 end if;
 


[Ada] Spurious error in instantiation of formal package with attribute

2015-10-20 Thread Arnaud Charlet
When verifying that a function that is an actual of a formal package matches
the corresponding function in the corresponding actual package, functions
given by attributes must be handled specially because each of them ends up
renaming a different generated body, and we must check that the attribute
references themselves match.

No short example available.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Ed Schonberg  

* sem_ch12.adb (Same_Instantiated_Function): New predicate in
Check_Formal_Package_Instance, used to verify that the formal
and the actual of an actual package match when both are functions
given as attribute references.

Index: sem_ch12.adb
===
--- sem_ch12.adb(revision 229023)
+++ sem_ch12.adb(working copy)
@@ -5759,6 +5759,11 @@
   --  same entity we may have to traverse several definitions to recover
   --  the ultimate entity that they refer to.
 
+  function Same_Instantiated_Function (E1, E2 : Entity_Id) return Boolean;
+  --  The formal and the actual must be identical, but if both are
+  --  given by attributes they end up renaming different generated bodies,
+  --  and we must verify that the attributes themselves match.
+
   function Same_Instantiated_Variable (E1, E2 : Entity_Id) return Boolean;
   --  Similarly, if the formal comes from a nested formal package, the
   --  actual may designate the formal through multiple renamings, which
@@ -5834,6 +5839,35 @@
   end Same_Instantiated_Constant;
 
   
+  -- Same_Instantiated_Function --
+  
+
+  function Same_Instantiated_Function
+(E1, E2 : Entity_Id) return Boolean
+  is
+ U1, U2 : Node_Id;
+  begin
+ if Alias (E1) = Alias (E2) then
+return True;
+
+ elsif Present (Alias (E2)) then
+U1 := Original_Node (Unit_Declaration_Node (E1));
+U2 := Original_Node (Unit_Declaration_Node (Alias (E2)));
+
+return Nkind (U1) = N_Subprogram_Renaming_Declaration
+  and then Nkind (Name (U1)) = N_Attribute_Reference
+
+  and then Nkind (U2) = N_Subprogram_Renaming_Declaration
+  and then Nkind (Name (U2)) = N_Attribute_Reference
+
+  and then
+Attribute_Name (Name (U1)) = Attribute_Name (Name (U2));
+ else
+return False;
+ end if;
+  end Same_Instantiated_Function;
+
+  
   -- Same_Instantiated_Variable --
   
 
@@ -6050,7 +6084,8 @@
 
 else
Check_Mismatch
- (Ekind (E2) /= Ekind (E1) or else (Alias (E1)) /= Alias (E2));
+ (Ekind (E2) /= Ekind (E1)
+or else not Same_Instantiated_Function (E1, E2));
 end if;
 
  else


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Jakub Jelinek
On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > How's the following (complete patch instead of incremental patch; the
> > driver changes are still the same as before)?  The changes are:
> > 
> >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> >   * libgomp/target.c:resolve_device and
> > libgomp/oacc-init.c:resolve_device verify that a default device
> > (OpenMP device-var ICV, and acc_device_default, respectively) is
> > actually enabled, or resort to host fallback if not.
> >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> > to enable devices specified by -foffload.  Can be called multiple
> > times (executable, any shared libraries); the set of enabled devices
> > is the union of all those ever requested.
> >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> > changed to enable all devices.  This is to maintain compatibility
> > with old executables and shared libraries built without the -foffload
> > constructor support.

Any reason not to pass the bitmask of the enabled targets to
GOMP_offload_register_ver instead, to decrease the number of ctors and
the number of times you take the various locks during initialization, or just to
automatically enable the devices you load data for during GOMP_offload_register_ver?
I mean, GOMP_offload_register would enable for compatibility all devices,
GOMP_offload_register_ver would enable the device it is registered for.
For -foffload=disable on all shared libraries/binaries, naturally you would
not register anything, thus would not enable any devices (only host fallback
would work).

Or are you worried about the case where one shared library is compiled
with say -foffload=intelmic,ptx but doesn't actually contain any
#pragma omp target/#pragma omp declare target (or similar OpenACC
directives), but only contains #pragma omp target data and/or the device
query/copying routines, and then dlopens some other shared library that actually
has the offloading device code?
That could be solved by adding the call you are talking about, but
if we really should care about that unlikely case, it would be better to
only arrange for it if really needed by the shared library (i.e. if it calls
one of the OpenMP or OpenACC library routines that talk to the devices, or
has #pragma omp target data or similar constructs).
I'd strongly prefer not to have constructors in code that just got compiled
with -fopenmp, even in configurations where some offloading is configured by
default, when nothing in the code really cares about offloading.

> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -401,6 +401,8 @@ static const char 
> *compare_debug_auxbase_opt_spec_function (int, const char **);
>  static const char *pass_through_libs_spec_func (int, const char **);
>  static const char *replace_extension_spec_func (int, const char **);
>  static const char *greater_than_spec_func (int, const char **);
> +static const char *add_omp_infile_spec_func (int, const char **);
> +
>  static char *convert_white_space (char *);
>  
>  /* The Specs Language

I'd like to defer review of the driver bits, can Joseph or Bernd please have
a look at those?

> diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
> index 24fbb94..5da4fa7 100644
> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -48,7 +48,8 @@ enum offload_target_type
>OFFLOAD_TARGET_TYPE_HOST = 2,
>/* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
>OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
> -  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
> +  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
> +  OFFLOAD_TARGET_TYPE_HWM

What is HWM?  Is OFFLOAD_TARGET_TYPE_LAST what you mean?

> diff --git a/libgomp/target.c b/libgomp/target.c
> index b767410..df51bfb 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -72,6 +72,9 @@ static int num_offload_images;
>  /* Array of descriptors for all available devices.  */
>  static struct gomp_device_descr *devices;
>  
> +/* Set of enabled devices.  */
> +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];

I must say I don't like the locking for this.
If all you ever change here is flipping it from 0 to 1,
then supposedly just storing it with __atomic_store, perhaps with
release semantics, and reading it with __atomic_load, with acquire semantics,
would be good enough?  And perhaps change it into an int array,
so that it is actually atomic even on the old Alphas (if there are any
around).
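
A minimal sketch of that suggestion (illustrative only; the function names
are made up here, and the array/enum follow the patch under discussion
rather than existing libgomp code):

#include <stdbool.h>

/* int rather than bool so plain loads/stores are atomic even on targets
   without byte-wide atomic accesses.  */
static int devices_enabled[OFFLOAD_TARGET_TYPE_HWM];

static void
gomp_enable_device (enum offload_target_type type)
{
  /* The flag only ever goes from 0 to 1, so a plain atomic store with
     release semantics suffices; no lock or read-modify-write is needed.  */
  __atomic_store_n (&devices_enabled[type], 1, __ATOMIC_RELEASE);
}

static bool
gomp_device_enabled_p (enum offload_target_type type)
{
  return __atomic_load_n (&devices_enabled[type], __ATOMIC_ACQUIRE) != 0;
}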

Jakub


Re: [gomp4.1] map clause parsing improvements

2015-10-20 Thread Jakub Jelinek
On Mon, Oct 19, 2015 at 05:00:33PM +0200, Thomas Schwinge wrote:
>   n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
>   if ((ctx->region_type & ORT_TARGET) != 0
>   && !(n->value & GOVD_SEEN)
>   && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0
>   || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT))
> {
>   remove = true;
> 
> I'd suggest turning GOMP_MAP_FLAG_ALWAYS into GOMP_MAP_FLAG_SPECIAL_2,
> and then providing a GOMP_MAP_ALWAYS_P that evaluates to true just for the
> three "always,to", "always,from", and "always,tofrom" cases.

Yeah, that can be done, I'll add it to my todo list.
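
For illustration, such a predicate might look roughly like this (a sketch
only; the three "always" enumerator names are assumptions, not necessarily
the real GOMP_MAP_* values):

/* True only for the "always,to", "always,from" and "always,tofrom" kinds.  */
#define GOMP_MAP_ALWAYS_P(X)        \
  ((X) == GOMP_MAP_ALWAYS_TO        \
   || (X) == GOMP_MAP_ALWAYS_FROM   \
   || (X) == GOMP_MAP_ALWAYS_TOFROM)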

Jakub


[Ada] Crash in compile-only mode with dynamic predicates.

2015-10-20 Thread Arnaud Charlet
This patch fixes a compiler abort on a program with a type that has defined
dynamic predicate and a call to a procedure with a parameter of that type,
when the program is compiled with -gnatc.

The following must compile quietly:

gcc -c -gnatc p-q.adb

---
with Unchecked_Conversion;
package P is
   function Is_Valid_Name (Item : String) return Boolean;
   subtype Name_T is String with Dynamic_Predicate => Is_Valid_Name (Name_T);

   generic
  type T is private;
   package G is
function Conv is new Unchecked_Conversion (Source => T, Target => Integer);
   end G;
end P;
---
package P.Q is
   type Rec is record
  B : Integer;
   end record;

   package My_G is new G (Rec);
   procedure Proc (Name : Name_T);
end P.Q;
---
package body P.Q is
   procedure Proc (Name : Name_T) is
   begin
  null;
   end;
end P.Q;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Ed Schonberg  

* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Generate freeze
node for subprogram in Compile_Only mode.

Index: sem_ch6.adb
===
--- sem_ch6.adb (revision 229023)
+++ sem_ch6.adb (working copy)
@@ -3215,18 +3215,17 @@
  --  the freeze actions that include the bodies. In particular, extra
  --  formals for accessibility or for return-in-place may need to be
  --  generated. Freeze nodes, if any, are inserted before the current
- --  body. These freeze actions are also needed in ASIS mode to enable
- --  the proper back-annotations.
+ --  body. These freeze actions are also needed in ASIS mode and in
+ --  Compile_Only mode to enable the proper back-end type annotations.
+ --  They are necessary in any case to insure order of elaboration
+ --  in gigi.
 
  if not Is_Frozen (Spec_Id)
-   and then (Expander_Active or ASIS_Mode)
+   and then (Expander_Active
+   or else ASIS_Mode
+   or else (Operating_Mode = Check_Semantics
+  and then Serious_Errors_Detected = 0))
  then
---  Force the generation of its freezing node to ensure proper
---  management of access types in the backend.
-
---  This is definitely needed for some cases, but it is not clear
---  why, to be investigated further???
-
 Set_Has_Delayed_Freeze (Spec_Id);
 Freeze_Before (N, Spec_Id);
  end if;


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Bernd Schmidt

On 10/20/2015 12:02 PM, Jakub Jelinek wrote:

I'd like to defer review of the driver bits, can Joseph or Bernd please have
a look at those?


Last time around I think I asked for some minor changes, like updated 
documentation for give_switch. Other than that, I'm ok with the patch 
iff you are happy with the overall approach.



Bernd




[Ada] Volatile functions

2015-10-20 Thread Arnaud Charlet
This patch implements SPARK volatile functions described by the following
rules:

   A type is said to be effectively volatile if it is either a volatile
   type, an array type whose Volatile_Component aspect is True, or an
   array type whose component type is effectively volatile, a protected
   type, or a descendant of Ada.Synchronous_Task_Control.Suspension_Object.
   An effectively volatile object is a volatile object or an object of an
   effectively volatile type.

   The Boolean aspect Volatile_Function may be specified as part of
   the (explicit) initial declaration of a function. A function whose
   Volatile_Function aspect is True is said to be a volatile function.
   A protected function is also defined to be a volatile function.

   A subprogram whose Volatile_Function aspect is True shall not override an
   inherited primitive operation of a tagged type whose Volatile_Function
   aspect is False.

   A global_item of a nonvolatile function shall not denote either an
   effectively volatile object or an external state abstraction.

   A nonvolatile function shall not have a formal parameter (or result) of an
   effectively volatile type.

   Contrary to the general SPARK 2014 rule that expression evaluation cannot
   have side effects, a read of an effectively volatile object with the
   properties Async_Writers or Effective_Reads set to True is considered to
   have an effect when read. To reconcile this discrepancy, a name denoting
   such an object shall only occur in a non-interfering context. A name occurs
   in a non-interfering context if it is:

  the (protected) prefix of a name denoting a protected operation; or

  the return expression of a simple_return_statement which applies to a
  volatile function; or

  the initial value expression of the extended_return_object_declaration
  of an extended_return_statement which applies to a volatile function; or

The patch also changes the implementation of aspects/pragmas Async_Readers,
Async_Writers, Effective_Reads and Effective_Writes to remove their forward
referencing capabilities and require a static Boolean expression (if any).


-- Source --


--  volatile types.ads

with Ada.Synchronous_Task_Control; use Ada.Synchronous_Task_Control;

package Volatile_Types is
   type Vol_Typ_1 is null record with Volatile;
   type Vol_Typ_1_Ptr is access all Vol_Typ_1;

   type Vol_Typ_2 is array (1 .. 5) of Vol_Typ_1;

   type Vol_Typ_3 is array (1 .. 5) of Boolean with Volatile_Components;

   protected type Vol_Typ_4 is
  entry E;
  procedure P;
   end Vol_Typ_4;

   type Vol_Typ_5 is new Suspension_Object;
end Volatile_Types;

--  volatile types.adb

package body Volatile_Types is
   protected body Vol_Typ_4 is
  entry E when True is begin null; end E;
  procedure P is begin null; end P;
   end Vol_Typ_4;
end Volatile_Types;

--  lr712.ads

package LR712 is

   --  7.1.2 - The Boolean aspect Volatile_Function may be specified as part of
   --  the initial declaration of a function. Function whose Volatile_Function
   --  aspect is True is said to be a volatile function. A protected function
   --  is also defined to be a volatile function.

   function OK_1 return Boolean with Volatile_Function;  --  OK
   function OK_2 return Boolean is (True) with Volatile_Function;--  OK
   function OK_3 return Boolean;
   pragma Volatile_Function; --  OK

   function Error_1 return Boolean;
   function Error_2 return Boolean;
   function Error_3 return Boolean;
   function Error_4 return Boolean;
   function Error_4 return Boolean is (True)
 with Volatile_Function; --  Error
   function Error_5 return Boolean
 with Volatile_Function => Junk; --  Error
   function Error_6 return Boolean
 with Volatile_Function => "Junk";   --  Error
   function Error_7 return Boolean;

   Obj : Integer := 1 with Volatile_Function;--  Error
end LR712;

--  lr712.adb

package body LR712 is
   function OK_1 return Boolean is begin return True; end OK_1;
   function OK_3 return Boolean is begin return True; end OK_3;

   function Error_1 return Boolean
 with Volatile_Function  --  Error
   is begin return True; end Error_1;
   function Error_2 return Boolean is (True) with Volatile_Function; --  Error
   function Error_3 return Boolean is separate;
   function Error_5 return Boolean is begin return True; end Error_5;
   function Error_6 return Boolean is begin return True; end Error_6;
   function Error_7 return Boolean is
  pragma Volatile_Function;  --  Error
   begin return True; end Error_7;
end LR712;

--  lr712-error_3.adb

separate (LR712)

function Error_3 return Boolean
  with Volatile_Function -- 

[Ada] Uninitialized variable on illegal placement of aspect specification

2015-10-20 Thread Arnaud Charlet
This patch improves the handling of programs with illegal placement of
aspect specifications, ensuring that the flag that controls the parsing of
a list of declarations is properly set in the presence of errors.

No simple test available.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Ed Schonberg  

* par-ch3.adb (P_Declarative_Items): In case of misplaced
aspect specifications, ensure that flag Done is properly set to
continue parse.

Index: par-ch3.adb
===
--- par-ch3.adb (revision 229023)
+++ par-ch3.adb (working copy)
@@ -4425,6 +4425,12 @@
 
else
   Error_Msg_SC ("aspect specifications not allowed here");
+
+  --  Assume that this is a misplaced aspect specification
+  --  within a declarative list. After discarding the
+  --  misplaced aspects we can continue the scan.
+
+  Done := False;
end if;
 
declare


[Ada] Abort in front end client due to locked scope table

2015-10-20 Thread Arnaud Charlet
This patch reimplements the way the front end detects whether a type is a
descendant of Ada.Synchronous_Task_Control.Suspension_Object to avoid using
the RTSfind mechanism. This ensures that external clients of the front end
will not fail due to a locked scope table accessed during analysis performed
by RTSfind.

No reproducer possible as this requires an external front end client.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Hristian Kirtchev  

* rtsfind.ads Remove the entries for Ada.Synchronous_Task_Control
and Suspension_Object from tables RE_Id, RE_Unit_Table and RTU_Id.
* sem_util.adb (Is_Descendant_Of_Suspension_Object): Update
the comment on usage. Use routine Is_Suspension_Object to detect
whether a type matches Suspension_Object.
(Is_Suspension_Object): New routine.
* snames.ads-tmpl: Add predefined names for Suspension_Object
and Synchronous_Task_Control.

Index: rtsfind.ads
===
--- rtsfind.ads (revision 229047)
+++ rtsfind.ads (working copy)
@@ -131,7 +131,6 @@
   Ada_Real_Time,
   Ada_Streams,
   Ada_Strings,
-  Ada_Synchronous_Task_Control,
   Ada_Tags,
   Ada_Task_Identification,
   Ada_Task_Termination,
@@ -607,8 +606,6 @@
 
  RE_Unbounded_String,-- Ada.Strings.Unbounded
 
- RE_Suspension_Object,   -- Ada.Synchronous_Task_Control
-
  RE_Access_Level,-- Ada.Tags
  RE_Alignment,   -- Ada.Tags
  RE_Address_Array,   -- Ada.Tags
@@ -1840,8 +1837,6 @@
 
  RE_Unbounded_String => Ada_Strings_Unbounded,
 
- RE_Suspension_Object=> Ada_Synchronous_Task_Control,
-
  RE_Access_Level => Ada_Tags,
  RE_Alignment=> Ada_Tags,
  RE_Address_Array=> Ada_Tags,
Index: sem_util.adb
===
--- sem_util.adb(revision 229047)
+++ sem_util.adb(working copy)
@@ -11397,9 +11397,7 @@
   function Is_Descendant_Of_Suspension_Object
 (Typ : Entity_Id) return Boolean;
   --  Determine whether type Typ is a descendant of type Suspension_Object
-  --  defined in Ada.Synchronous_Task_Control. This routine is similar to
-  --  Sem_Util.Is_Descendent_Of, however this version does not load unit
-  --  Ada.Synchronous_Task_Control.
+  --  defined in Ada.Synchronous_Task_Control.
 
   
   -- Is_Descendant_Of_Suspension_Object --
@@ -11408,24 +11406,39 @@
   function Is_Descendant_Of_Suspension_Object
 (Typ : Entity_Id) return Boolean
   is
- Cur_Typ : Entity_Id;
- Par_Typ : Entity_Id;
+ function Is_Suspension_Object (Id : Entity_Id) return Boolean;
+ --  Determine whether arbitrary entity Id denotes Suspension_Object
+ --  defined in Ada.Synchronous_Task_Control.
 
-  begin
- --  Do not attempt to load Ada.Synchronous_Task_Control in No_Run_Time
- --  mode. The unit contains tagged types and those are not allowed in
- --  this mode.
+ --
+ -- Is_Suspension_Object --
+ --
 
- if No_Run_Time_Mode then
-return False;
+ function Is_Suspension_Object (Id : Entity_Id) return Boolean is
+ begin
+--  This approach does an exact name match rather than to rely on
+--  RTSfind. Routine Is_Effectively_Volatile is used by clients of
+--  the front end at point where all auxiliary tables are locked
+--  and any modifications to them are treated as violations. Do not
+--  tamper with the tables, instead examine the Chars fields of all
+--  the scopes of Id.
 
- --  Unit Ada.Synchronous_Task_Control is not available, the type
- --  cannot possibly be a descendant of Suspension_Object.
+return
+  Chars (Id) = Name_Suspension_Object
+and then Present (Scope (Id))
+and then Chars (Scope (Id)) = Name_Synchronous_Task_Control
+and then Present (Scope (Scope (Id)))
+and then Chars (Scope (Scope (Id))) = Name_Ada;
+ end Is_Suspension_Object;
 
- elsif not RTE_Available (RE_Suspension_Object) then
-return False;
- end if;
+ --  Local variables
 
+ Cur_Typ : Entity_Id;
+ Par_Typ : Entity_Id;
+
+  --  Start of processing for Is_Descendant_Of_Suspension_Object
+
+  begin
  --  Climb the type derivation chain checking each parent type against
  --  Suspension_Object.
 
@@ -11435,7 +11448,7 @@
 
 --  The current type is a match
 
-if Is_RTE (Cur_Typ, R

Re: [PATCH][AArch64] Fix insn types

2015-10-20 Thread Kyrill Tkachov

Hi Evandro,

On 19/10/15 22:05, Evandro Menezes wrote:

The type assigned to some insn definitions was seemingly not correct:

 * "movi %d0, %1" was of type "fmov"
 * "fmov %s0, wzr" was of type "fconstd"
 * "mov %0, {-1,1}" were of type "csel"

This patch changes their types to:

 * "movi %d0, %1" to type "neon_move"
 * "fmov %s0, wzr" to type "f_mcr"
 * "mov %0, {-1,1}" to type "mov_imm"

Please, commit if it's alright.

Thank you,



Looking at your ChangeLog...

gcc/
* config/aarch64/aarch64.md
(*movdi_aarch64): Change the type of "movi %d0, %1" to "neon_move".
(*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr".
(*cmov_insn): Change the types of "mov %0, {-1,1}" to 
"mov_imm".
(*cmovsi_insn_uxtw): Idem

The preferred form is "Likewise" rather than "Idem" AFAIK. Also, full stop at 
the end.

--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1130,7 +1130,7 @@
ldrh\\t%w0, %1
strh\\t%w1, %0
mov\\t%w0, %w1"
-  [(set_attr "type" "neon_from_gp,neon_to_gp,fmov,\
+  [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\
  f_loads,f_stores,load1,store1,mov_reg")
(set_attr "simd" "yes,yes,yes,*,*,*,*,*")
(set_attr "fp"   "*,*,*,yes,yes,*,*,*")]


I don't think this matches up with your changelog entry.
This isn't the *movdi_aarch64 pattern. From what I can see the *movdi_aarch64 
pattern
already has the type neon_move on the movi\\t%d0, %1 alternative (the last one).

In fact, if I apply your patch using "patch -p1" I see it being applied to the 
*movhf_aarch64 pattern.
Is that what you intended?

Thanks,
Kyrill



[PATCH v2 0/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Maxim Ostapenko

Hi,

this is the second attempt to perform the libsanitizer merge from upstream.
In the previous patch set
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01212.html) we uncovered an
issue with the heuristic for old/new style ubsan_data that needed to be fixed
upstream, plus some errors in the compiler changes.  I thought it would be
too messy to continue the review in the previous thread, so I'm creating a
new one.


The first patch is the merge itself. Since the heuristic fix for old/new 
style ubsan_data selection was applied upstream, I'm bumping revision to 
r250806.
Since there aren't significant changes from r249633, I think this should 
be fine.


The second one combines all compiler-related changes and addresses 
Jakub's nits from the previous review.


Patches 3, 4 and 5 apply to the library and were preapproved in the 
previous review, but I'm attaching them here to form the full patch set.


In patch 6, I'm trying to add brief instructions on how to perform the 
merge.  This is just a documentation patch.


Tested and {A, UB}San bootstrapped on x86-linux-gnu, x86_64-linux-gnu 
and aarch64-linux-gnu targets.


Thanks,
-Maxim


Re: [PATCH, sh][v3] musl support for sh

2015-10-20 Thread Oleg Endo
On Mon, 2015-10-19 at 20:01 +0100, Szabolcs Nagy wrote:
> On 17/10/15 02:14, Oleg Endo wrote:
> > On Fri, 2015-10-16 at 17:06 +0100, Szabolcs Nagy wrote:
> >> Revision of
> >> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01636.html
> >>
> >> The musl dynamic linker name is /lib/ld-musl-sh{-nofpu}{-fdpic}.so.1
> >>
> >> New in this revision:
> >>
> >> Add -fdpic to the name, will be useful with the pending sh2 FDPIC support.
> >>
> >> 2015-10-16  Gregor Richards  
> >>Szabolcs Nagy  
> >>
> >>* config/sh/linux.h (MUSL_DYNAMIC_LINKER): Define.
> >>(MUSL_DYNAMIC_LINKER_E, MUSL_DYNAMIC_LINKER_FP): Define.
> >
> >
> >> +#if TARGET_CPU_DEFAULT & (MASK_HARD_SH2A_DOUBLE | MASK_SH4)
> >> +/* "-nofpu" if any nofpu option is specified.  */
> >> +#define MUSL_DYNAMIC_LINKER_FP \
> >> +  "%{m1|m2|m2a-nofpu|m3|m4-nofpu|m4-100-nofpu|m4-200-nofpu|m4-300-nofpu|" 
> >> \
> >> +  "m4-340|m4-400|m4-500|m4al|m5-32media-nofpu|m5-64media-nofpu|" \
> >> +  "m5-compact-nofpu:-nofpu}"
> >> +#else
> >> +/* "-nofpu" if none of the hard fpu options are specified.  */
> >> +#define MUSL_DYNAMIC_LINKER_FP \
> >> +  
> >> "%{m2a|m4|m4-100|m4-200|m4-300|m4a|m5-32media|m5-64media|m5-compact:;:-nofpu}"
> >> +#endif
> >
> > SH5 has been declared obsolete.  Please do not add any new SH5 related
> > things.  In this case, drop the m5-* thingies.
> >
> 
> removed m5*.
> OK to commit with this?

OK for trunk.

Cheers,
Oleg



Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Thomas Schwinge
Hi Jakub!

Thanks for the review.

On Tue, 20 Oct 2015 12:02:45 +0200, Jakub Jelinek  wrote:
> On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > > How's the following (complete patch instead of incremental patch; the
> > > driver changes are still the same as before)?  The changes are:
> > > 
> > >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> > >   * libgomp/target.c:resolve_device and
> > > libgomp/oacc-init.c:resolve_device verify that a default device
> > > (OpenMP device-var ICV, and acc_device_default, respectively) is
> > > actually enabled, or resort to host fallback if not.
> > >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> > > to enable devices specified by -foffload.  Can be called multiple
> > > times (executable, any shared libraries); the set of enabled devices
> > > is the union of all those ever requested.
> > >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> > > changed to enable all devices.  This is to maintain compatibility
> > > with old executables and shared libraries built without the -foffload
> > > constructor support.
> 
> Any reason not to pass the bitmask of the enabled targets to
> GOMP_offload_register_ver instead, to decrease the amount of ctors and
> the times you lock the various locks during initialization, or just enable
> automatically the devices you load data for during GOMP_offload_register_ver?
> I mean, GOMP_offload_register would enable for compatibility all devices,
> GOMP_offload_register_ver would enable the device it is registered for.
> For -foffload=disable on all shared libraries/binaries, naturally you would
> not register anything, thus would not enable any devices (only host fallback
> would work).

As explained a few times already: GOMP_offload_register_ver constructors
will only be generated if there actually are offloaded code regions, but
for example:

#include 
int main()
{
  __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
  return 0;
}

... is a valid OpenACC program (untested), which doesn't contain any
offloaded code regions.  As a user I'd expect it to return different
answers if compiled with -foffload=nvptx-none in contrast to
-foffload=disable.  Actually, I can foresee exactly such code to be used
to probe for offloading being available, for example in testsuites.  And,
I guess we agree that under -foffload=disable we'd like the
compilation/runtime system to be configured in a way that no offloading
will happen?

Always creating (dummy) GOMP_offload_register_ver constructors has been
another suggestion that I had voiced much earlier in this thread (months
ago), but everyone (including me) taking part in the discussion agreed
that it'd cause even higher compile-time overhead.

> Or are you worried about the case where one shared library is compiled
> with say -foffload=intelmic,ptx but doesn't actually contain any
> #pragma omp target/#pragma omp declare target (or OpenACC similar
> #directives), but only contains #pragma omp target data and/or the device
> query/copying routines, then dlopens some other shared library that actually
> has the offloading device code?

That's another example, yes.

> That could be solved by adding the call you are talking about, but
> if we really should care about that unlikely case, it would be better to
> only arrange for it if really needed by the shared library (i.e. if it calls
> one of the OpenMP or OpenACC library routines that talk to the devices, or
> has #pragma omp target data or similar constructs;
> I'd strongly prefer not to have constructors in code that just got compiled
> with -fopenmp, even in configuration where some offloading is configured by
> default, when nothing in the code really cares about offloading.

So, how to resolve our different opinions?  I mean, for any serious
program code, there will be constructor calls into libgomp already; are
you expecting that adding one more really will cause any noticeable
overhead?

I agree that enabling devices for GOMP_offload_register_ver calls makes
sense.  (I indeed had considered this earlier, but it didn't lead to
solving the problem complete -- see above.)  Can we come up with a scheme
to do it this way, and only generate the GOMP_enable_offload_targets
constructor if no GOMP_offload_register_ver constructors have been
generated?  But I have no idea how to implement that in a non-convoluted
way.  (And, it sounds excessive to me in terms of implementation overhead
on our side, in contrast to execution overhead of one libgomp constructor
call.)

> > --- a/gcc/gcc.c
> > +++ b/gcc/gcc.c
> > @@ -401,6 +401,8 @@ static const char 
> > *compare_debug_auxbase_opt_spec_function (int, const char **);
> >  static const char *pass_through_libs_spec_func (int, const char **);
> >  static const char *replace_extension_spec_func (int, const char **);
> >  static const char *greater_th

[PATCH v2 2/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Maxim Ostapenko
This patch introduces the required compiler changes.  We no longer version 
asan_init; instead there is a special __asan_version_mismatch_check_v[n] 
symbol for this.  asan_stack_malloc_[n] doesn't take a local stack as a 
second parameter anymore, so we don't pass it.  Also, 
ubsan_instrument_float_cast was adjusted to pass the source location to 
libubsan when possible.
gcc/ChangeLog:

2015-10-20  Maxim Ostapenko  

gcc/

	* asan.c (asan_emit_stack_protection): Don't pass local stack to
	asan_stack_malloc_[n] anymore. Check if asan_stack_malloc_[n] returned
	NULL and use local stack then.
	(asan_finish_file): Insert __asan_version_mismatch_check_v[n] call
	in addition to __asan_init.
	* sanitizer.def (BUILT_IN_ASAN_INIT): Rename to __asan_init.
	(BUILT_IN_ASAN_VERSION_MISMATCH_CHECK): Add new builtin call.
	* asan.h (asan_intercepted_p): Handle new string builtins.
	* ubsan.c (ubsan_use_new_style_p): New function.
	(ubsan_instrument_float_cast): If location is unknown, assign
	input_location to loc. Propagate loc to ubsan_create_data if
	ubsan_use_new_style_p returned true.

config/

	* bootstrap-asan.mk: Replace ASAN_OPTIONS=detect_leaks with
	LSAN_OPTIONS=detect_leaks.

gcc/testsuite/ChangeLog:

2015-10-20  Maxim Ostapenko  

	* c-c++-common/ubsan/float-cast-overflow-10.c: Adjust test.
	* c-c++-common/ubsan/float-cast-overflow-8.c: Likewise.
	* c-c++-common/ubsan/float-cast-overflow-9.c: Likewise.
	* g++.dg/asan/default-options-1.C: Likewise.

Index: gcc/asan.c
===
--- gcc/asan.c	(revision 228817)
+++ gcc/asan.c	(working copy)
@@ -1132,12 +1132,16 @@
   snprintf (buf, sizeof buf, "__asan_stack_malloc_%d",
 		use_after_return_class);
   ret = init_one_libfunc (buf);
-  rtx addr = convert_memory_address (ptr_mode, base);
-  ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode, 2,
+  ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode, 1,
  GEN_INT (asan_frame_size
 	  + base_align_bias),
- TYPE_MODE (pointer_sized_int_node),
- addr, ptr_mode);
+ TYPE_MODE (pointer_sized_int_node));
+  /* __asan_stack_malloc_[n] returns a pointer to fake stack if succeeded
+	 and NULL otherwise.  Check RET value is NULL here and jump over the
+	 BASE reassignment in this case.  Otherwise, reassign BASE to RET.  */
+  int very_unlikely = REG_BR_PROB_BASE / 2000 - 1;
+  emit_cmp_and_jump_insns (ret, const0_rtx, EQ, NULL_RTX,
+			   VOIDmode, 0, lab, very_unlikely);
   ret = convert_memory_address (Pmode, ret);
   emit_move_insn (base, ret);
   emit_label (lab);
@@ -2470,6 +2474,8 @@
 {
   tree fn = builtin_decl_implicit (BUILT_IN_ASAN_INIT);
   append_to_statement_list (build_call_expr (fn, 0), &asan_ctor_statements);
+  fn = builtin_decl_implicit (BUILT_IN_ASAN_VERSION_MISMATCH_CHECK);
+  append_to_statement_list (build_call_expr (fn, 0), &asan_ctor_statements);
 }
   FOR_EACH_DEFINED_VARIABLE (vnode)
 if (TREE_ASM_WRITTEN (vnode->decl)
Index: gcc/sanitizer.def
===
--- gcc/sanitizer.def	(revision 228817)
+++ gcc/sanitizer.def	(working copy)
@@ -27,8 +27,11 @@
for other FEs by asan.c.  */
 
 /* Address Sanitizer */
-DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT, "__asan_init_v4",
+DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT, "__asan_init",
 		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_VERSION_MISMATCH_CHECK,
+		  "__asan_version_mismatch_check_v6",
+		  BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 /* Do not reorder the BUILT_IN_ASAN_{REPORT,CHECK}* builtins, e.g. cfgcleanup.c
relies on this order.  */
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD1, "__asan_report_load1",
Index: gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c
===
--- gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c	(revision 228817)
+++ gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c	(working copy)
@@ -10,70 +10,37 @@
 
 /* _Decimal32 */
 /* { dg-output "value  is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*value  is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value  is outside the range of representable values of type 'char'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*value  is outside the range of representable values of type 'char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value  is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*value  is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value  is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n

Re: [PATCH v2 0/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Maxim Ostapenko
This is just a reapplication of the SPARC patch by David S. Miller.  The 
patch was preapproved here: 
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01214.html).
2015-10-20  Maxim Ostapenko  

	PR sanitizer/63958
	Reapply:
	2014-10-14  David S. Miller  

	* sanitizer_common/sanitizer_platform_limits_linux.cc (time_t):
	Define at __kernel_time_t, as needed for sparc.
	(struct __old_kernel_stat): Don't check if __sparc__ is defined.
	* libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
	(__sanitizer): Define struct___old_kernel_stat_sz,
	struct_kernel_stat_sz, and struct_kernel_stat64_sz for sparc.
	(__sanitizer_ipc_perm): Adjust for sparc targets.
	(__sanitizer_shmid_ds): Likewsie.
	(__sanitizer_sigaction): Likewise.
	(IOC_SIZE): Likewsie.

Index: libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc
===
--- libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc	(revision 250059)
+++ libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc	(working copy)
@@ -38,6 +38,7 @@
 #define uid_t __kernel_uid_t
 #define gid_t __kernel_gid_t
 #define off_t __kernel_off_t
+#define time_t __kernel_time_t
 // This header seems to contain the definitions of _kernel_ stat* structs.
 #include 
 #undef ino_t
@@ -62,7 +63,7 @@
 }  // namespace __sanitizer
 
 #if !defined(__powerpc64__) && !defined(__x86_64__) && !defined(__aarch64__)\
-&& !defined(__mips__)
+&& !defined(__mips__) && !defined(__sparc__)
 COMPILER_CHECK(struct___old_kernel_stat_sz == sizeof(struct __old_kernel_stat));
 #endif
 
Index: libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
===
--- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h	(revision 250059)
+++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h	(working copy)
@@ -83,6 +83,14 @@
   const unsigned struct_kernel_stat_sz = 144;
   #endif
   const unsigned struct_kernel_stat64_sz = 104;
+#elif defined(__sparc__) && defined(__arch64__)
+  const unsigned struct___old_kernel_stat_sz = 0;
+  const unsigned struct_kernel_stat_sz = 104;
+  const unsigned struct_kernel_stat64_sz = 144;
+#elif defined(__sparc__) && !defined(__arch64__)
+  const unsigned struct___old_kernel_stat_sz = 0;
+  const unsigned struct_kernel_stat_sz = 64;
+  const unsigned struct_kernel_stat64_sz = 104;
 #endif
   struct __sanitizer_perf_event_attr {
 unsigned type;
@@ -105,7 +113,7 @@
 
 #if defined(__powerpc64__)
   const unsigned struct___old_kernel_stat_sz = 0;
-#else
+#elif !defined(__sparc__)
   const unsigned struct___old_kernel_stat_sz = 32;
 #endif
 
@@ -184,6 +192,18 @@
 unsigned short __pad1;
 unsigned long __unused1;
 unsigned long __unused2;
+#elif defined(__sparc__)
+# if defined(__arch64__)
+unsigned mode;
+unsigned short __pad1;
+# else
+unsigned short __pad1;
+unsigned short mode;
+unsigned short __pad2;
+# endif
+unsigned short __seq;
+unsigned long long __unused1;
+unsigned long long __unused2;
 #else
 unsigned short mode;
 unsigned short __pad1;
@@ -201,6 +221,26 @@
 
   struct __sanitizer_shmid_ds {
 __sanitizer_ipc_perm shm_perm;
+  #if defined(__sparc__)
+  # if !defined(__arch64__)
+u32 __pad1;
+  # endif
+long shm_atime;
+  # if !defined(__arch64__)
+u32 __pad2;
+  # endif
+long shm_dtime;
+  # if !defined(__arch64__)
+u32 __pad3;
+  # endif
+long shm_ctime;
+uptr shm_segsz;
+int shm_cpid;
+int shm_lpid;
+unsigned long shm_nattch;
+unsigned long __glibc_reserved1;
+unsigned long __glibc_reserved2;
+  #else
   #ifndef __powerpc__
 uptr shm_segsz;
   #elif !defined(__powerpc64__)
@@ -238,6 +278,7 @@
 uptr __unused4;
 uptr __unused5;
   #endif
+#endif
   };
 #elif SANITIZER_FREEBSD
   struct __sanitizer_ipc_perm {
@@ -555,9 +596,13 @@
 #else
 __sanitizer_sigset_t sa_mask;
 #ifndef __mips__
+#if defined(__sparc__)
+unsigned long sa_flags;
+#else
 int sa_flags;
 #endif
 #endif
+#endif
 #if SANITIZER_LINUX
 void (*sa_restorer)();
 #endif
@@ -799,7 +844,7 @@
 
 #define IOC_NRBITS 8
 #define IOC_TYPEBITS 8
-#if defined(__powerpc__) || defined(__powerpc64__) || defined(__mips__)
+#if defined(__powerpc__) || defined(__powerpc64__) || defined(__mips__) || defined(__sparc__)
 #define IOC_SIZEBITS 13
 #define IOC_DIRBITS 3
 #define IOC_NONE 1U
@@ -829,7 +874,17 @@
 #define IOC_DIR(nr) (((nr) >> IOC_DIRSHIFT) & IOC_DIRMASK)
 #define IOC_TYPE(nr) (((nr) >> IOC_TYPESHIFT) & IOC_TYPEMASK)
 #define IOC_NR(nr) (((nr) >> IOC_NRSHIFT) & IOC_NRMASK)
+
+#if defined(__sparc__)
+// In sparc the 14 bits SIZE field overlaps with the
+// least significant bit of DIR, so either IOC_READ or
+// IOC_WRITE shall be 1 in order to get a non-zero SIZE.
+# define IOC_SIZE(nr)   \
+  ((nr) >> 29) & 0x7) & (4U|2U)) == 0)? \
+   0

[PATCH v2 4/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Maxim Ostapenko
This reapplies Jakub's patch that disables ODR violation detection. 
The patch was preapproved here: 
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01215.html).
2015-10-20  Maxim Ostapenko  

	PR bootstrap/63888
	Reapply:
	2015-02-20  Jakub Jelinek  

	* asan/asan_globals.cc (RegisterGlobal): Disable detect_odr_violation
	support until it is rewritten upstream.

	* c-c++-common/asan/pr63888.c: New test.

Index: libsanitizer/asan/asan_globals.cc
===
--- libsanitizer/asan/asan_globals.cc	(revision 250059)
+++ libsanitizer/asan/asan_globals.cc	(working copy)
@@ -146,7 +146,9 @@
   CHECK(AddrIsInMem(g->beg));
   CHECK(AddrIsAlignedByGranularity(g->beg));
   CHECK(AddrIsAlignedByGranularity(g->size_with_redzone));
-  if (flags()->detect_odr_violation) {
+  // This "ODR violation" detection is fundamentally incompatible with
+  // how GCC registers globals.  Disable as useless until rewritten upstream.
+  if (0 && flags()->detect_odr_violation) {
 // Try detecting ODR (One Definition Rule) violation, i.e. the situation
 // where two globals with the same name are defined in different modules.
 if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {


[PATCH v2 5/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Maxim Ostapenko
This patch adjusts the fix for 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771 to extract the last 
PC from the stack frame if no valid FP is available for ARM. The patch 
was preapproved here: 
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01217.html).
2015-10-20  Maxim Ostapenko  

	* sanitizer_common/sanitizer_stacktrace.cc (GetCanonicFrame): Assume we
	compiled code with GCC when extracting the caller PC for ARM if no
	valid frame pointer is available.

Index: libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
===
--- libsanitizer/sanitizer_common/sanitizer_stacktrace.cc	(revision 250059)
+++ libsanitizer/sanitizer_common/sanitizer_stacktrace.cc	(working copy)
@@ -62,8 +62,8 @@
   // Nope, this does not look right either. This means the frame after next does
   // not have a valid frame pointer, but we can still extract the caller PC.
   // Unfortunately, there is no way to decide between GCC and LLVM frame
-  // layouts. Assume LLVM.
-  return bp_prev;
+  // layouts. Assume GCC.
+  return bp_prev - 1;
 #else
   return (uhwptr*)bp;
 #endif


[PATCH v2 6/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Maxim Ostapenko
In this patch, I'm trying to add a general instruction how to perform 
the merge. This is just a documentation patch, any suggestions and 
opinions are welcome.
Index: libsanitizer/HOWTO_MERGE
===
--- libsanitizer/HOWTO_MERGE	(revision 0)
+++ libsanitizer/HOWTO_MERGE	(working copy)
@@ -0,0 +1,26 @@
+In general, merging process should not be very difficult, but we need to
+track various ABI changes and GCC-specific patches carefully.  Here is a
+general list of actions required to perform the merge:
+
+- Checkout recent GCC tree.
+- Run merge.sh script from libsanitizer directory.
+- Modify Makefile.am files into asan/tsan/lsan/ubsan/sanitizer_common/interception
+  directories if needed.  In particular, you may need to add new source files
+  and remove old ones in source files list, add new flags to {C, CXX}FLAGS if
+  needed and update DEFS with new defined variables.
+- Apply all needed GCC-specific patches to libsanitizer (note that some of
+  them might be already included to upstream).
+- Apply all necessary compiler changes.  Be especially careful here, you must
+  not break ABI between compiler and library.
+- Modify configure.ac file if needed (e.g. if you need to add link against new
+  library for sanitizer lilbs).
+- Remove unused (deleted by merge) files from all source and include
+  directories.  Be especially careful with headers, because they aren't listed
+  in Makefiles explicitly.
+- Regenerate configure script and all Makefiles by autoreconf.  You should use
+  exactly the same autotools version as for other GCC directories (current
+  version is 2.64, https://www.gnu.org/software/automake/faq/autotools-faq.html
+  for details how to install/use it).
+- Run regression testing on at least three platforms (e.g. x86-linux-gnu,
+  x86_64-linux-gnu, aarch64-linux-gnu).
+- Run {A, UB}San bootstrap on at least three platforms.


[Ada] Better recovery for misplaced keyword in discriminant specification

2015-10-20 Thread Arnaud Charlet
This patch improves the handling of an error in a discriminant specification
for an access discriminant.

Compiling b.ads must yield:

   b.ads:3:43: "constant" must appear after "access"

---
package B is

type Constant_Reference (D : constant access Integer) is null record
   with Implicit_Dereference => D;
end B;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Ed Schonberg  

* par-ch3.adb (P_Known_Discriminant_Part_Opt): Handle properly
a misplaced "constant" keyword in a discriminant specification.

Index: par-ch3.adb
===
--- par-ch3.adb (revision 229049)
+++ par-ch3.adb (working copy)
@@ -3030,8 +3030,23 @@
   Set_Discriminant_Type
 (Specification_Node,
  P_Access_Definition (Not_Null_Present));
+
+   --  Catch out-of-order keywords
+
+   elsif Token = Tok_Constant then
+  Scan;
+
+  if Token = Tok_Access then
+ Error_Msg_SC ("CONSTANT must appear after ACCESS");
+ Set_Discriminant_Type
+   (Specification_Node,
+P_Access_Definition (Not_Null_Present));
+
+  else
+ Error_Msg_SC ("misplaced CONSTANT");
+  end if;
+
else
-
   Set_Discriminant_Type
 (Specification_Node, P_Subtype_Mark);
   No_Constraint;


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Bernd Schmidt

On 10/20/2015 01:17 PM, Thomas Schwinge wrote:


As explained a few times already: GOMP_offload_register_ver constructors
will only be generated if there actually are offloaded code regions, but
for example:

 #include 
 int main()
 {
   __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
   return 0;
 }

... is a valid OpenACC program (untested), which doesn't contain any
offloaded code regions.  As a user I'd expect it to return different
answers if compiled with -foffload=nvptx-none in contrast to
-foffload=disable.  Actually, I can foresee exactly such code to be used
to probe for offloading being available, for example in testsuites.  And,
I guess we agree that under -foffload=disable we'd like the
compilation/runtime system to be configured in a way that no offloading
will happen?


Both of you can ignore me if you feel I'm not making sense, but what 
exactly is the use case for -foffload=disable? Isn't it slightly 
redundant with -fno-openacc? IMO it's not an option that alters the 
available devices, that's a question that is answered at run-time and 
doesn't (or shouldn't) really depend on compiler switches. As a user I'd 
expect -foffload=disable to just prevent generation of offloaded code 
for the things I'm compiling. As Jakub pointed out, shared libraries may 
still contain other pieces that are offloadable.


I guess I don't fully understand why you want to go to great lengths to 
disable devices at run-time based on a compile-time switch. What's the 
reasoning here?



Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
told me this means "High Water Mark".  I have no strong opinion on the
name to use, just want to mention that "*_LAST" sounds to me like that
one still is part of the accepted set, whereas in this case it'd be the
first enumerator outside of the accepted ones.  (And I guess, we agree
that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
"OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
ugly?)


Nah, just rename HWM to LAST, that's fairly common usage I think.


Bernd


[Ada] Abort while processing "of" container loop

2015-10-20 Thread Arnaud Charlet
This patch classifies an "of" loop parameter as never needing finalization
actions.


-- Source --


--  vectors.ads

with Ada.Unchecked_Deallocation;
with Ada.Unchecked_Conversion;
with System;

generic
   type Element_Type is private;
   Small_Vector_Capacity : Natural := 0;

package Vectors is
   type Elements_Array is array (Natural) of Element_Type;
   type Elements_Array_Access is access all Elements_Array;

   function To_Pointer is
 new Ada.Unchecked_Conversion (System.Address, Elements_Array_Access);

   procedure Free is new Ada.Unchecked_Deallocation
 (Elements_Array, Elements_Array_Access);

   type Small_Array_Type
   is array (0 .. Small_Vector_Capacity - 1) of Element_Type;

   type Vector is private
 with Iterable =>
   (First   => First_Index,
Next=> Next,
Has_Element => Has_Element,
Element => Get);

   procedure Append (Self : in out Vector; Element : Element_Type);
   pragma Inline_Always (Append);

   function Get (Self : Vector; Index : Natural) return Element_Type;
   pragma Inline_Always (Get);

   procedure Destroy (Self : in out Vector);
   pragma Inline_Always (Destroy);

   procedure Clear (Self : in out Vector);
   pragma Inline_Always (Clear);

   function First_Element (Self : Vector) return Element_Type;

   function Last_Element (Self : Vector) return Element_Type;

   function Length (Self : Vector) return Natural;
   pragma Inline_Always (Length);

   function First_Index (Self : Vector) return Natural is (0);
   pragma Inline_Always (First_Index);

   function Last_Index (Self : Vector) return Integer
 is (Length (Self) - 1);
   pragma Inline_Always (Last_Index);

   function Next (Self : Vector; N : Natural) return Natural is (N + 1);
   pragma Inline_Always (Next);

   function Has_Element (Self : Vector; N : Natural) return Boolean;
   pragma Inline_Always (Has_Element);

private
   type Vector is record
  E: Elements_Array_Access := null;
  Size : Natural := 0;
  Capacity : Natural := Small_Vector_Capacity;
  SV   : Small_Array_Type;
   end record;

   procedure Reserve (Self : in out Vector; Capacity : Positive);
   pragma Inline_Always (Reserve);

   function Has_Element (Self : Vector; N : Natural) return Boolean
 is (N < Self.Size);
end Vectors;

--  vectors.adb

with System; use type System.Address;
use System;
with System.Memory; use System.Memory;

package body Vectors is
   El_Size : constant size_t := Elements_Array'Component_Size / Storage_Unit;

   procedure Reserve (Self : in out Vector; Capacity : Positive) is
  Siz : constant size_t :=
size_t (Capacity) * El_Size;
   begin
  if Small_Vector_Capacity > 0 then
 if Self.Capacity = Small_Vector_Capacity then
Self.E := To_Pointer (Alloc (Siz));
for I in 0 .. Self.Size - 1 loop
   Self.E.all (I) := Self.SV (I);
end loop;
 else
Self.E := To_Pointer (Realloc (Self.E.all'Address, Siz));
 end if;
  else
 if Self.E /= null then
Self.E := To_Pointer (Realloc (Self.E.all'Address, Siz));
 else
Self.E := To_Pointer (Alloc (Siz));
 end if;
  end if;
  Self.Capacity := Capacity;
   end Reserve;

   procedure Append (Self : in out Vector; Element : Element_Type) is
   begin
  if Self.Capacity = Self.Size then
 Reserve (Self, (Self.Capacity * 2) + 1);
  end if;

  if Small_Vector_Capacity = 0 then
 Self.E.all (Self.Size) := Element;
  else
 if Self.Capacity = Small_Vector_Capacity then
Self.SV (Self.Size) := Element;
 else
Self.E.all (Self.Size) := Element;
 end if;
  end if;

  Self.Size := Self.Size + 1;
   end Append;

   function Get (Self : Vector; Index : Natural) return Element_Type is
   begin
  if Small_Vector_Capacity = 0 then
 return Self.E (Index);
  else
 if Self.Capacity = Small_Vector_Capacity then
return Self.SV (Index);
 else
return Self.E (Index);
 end if;
  end if;
   end Get;

   procedure Destroy (Self : in out Vector) is
   begin
  Free (Self.E);
   end Destroy;

   procedure Clear (Self : in out Vector) is
   begin
  Self.Size := 0;
   end Clear;

   function Last_Element (Self : Vector) return Element_Type
 is (Get (Self, Self.Size - 1));

   function First_Element (Self : Vector) return Element_Type
 is (Get (Self, 0));

   function Length (Self : Vector) return Natural is (Self.Size);
end Vectors;

--  ast.ads

with Vectors;

package AST is
   type AST_Node_Type is abstract tagged null record;
   type AST_Node is access all AST_Node_Type'Class;

   function Image (Node : access AST_Node_Type) return String is abstract;

   generic
  type Node_Type is abstract new AST_Node_Type with private;
  type Node_Access is access all Node_Type'Class;

   p

[Ada] Discriminants and protected units

2015-10-20 Thread Arnaud Charlet
This patch implements the following SPARK RM rule:

   7.1.3(5) - An effectively volatile type other than a protected type shall
   not have a discriminated part.


-- Source --


--  discrims.ads

package Discrims with SPARK_Mode is
   type Vol_1 (Discr : Natural) is null record with Volatile;
   type Vol_2 (Discr : Natural) is null record;

   protected type Prot (Discr : Natural) is
  entry E;
   end Prot;
end Discrims;

--  discrims.adb

package body Discrims with SPARK_Mode is
   protected body Prot is
  entry E when True is begin null; end E;
   end Prot;

   Obj_1 : Vol_1 (1);
   Obj_2 : Vol_2 (2) with Volatile;
   Obj_3 : Prot  (3);
   Obj_4 : Prot  (4) with Volatile;
end Discrims;


-- Compilation and output --


$ gcc -c discrims.adb
discrims.adb:7:04: discriminated object "Obj_2" cannot be volatile
discrims.ads:2:09: discriminated type "Vol_1" cannot be volatile

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Hristian Kirtchev  

* sem_ch3.adb (Analyze_Object_Contract):
A protected type or a protected object is allowed to have a
discriminated part.

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 229049)
+++ sem_ch3.adb (working copy)
@@ -3347,9 +3347,11 @@
  Obj_Id);
 
--  An object of a discriminated type cannot be effectively
-   --  volatile (SPARK RM C.6(4)).
+   --  volatile except for protected objects (SPARK RM 7.1.3(5)).
 
-   elsif Has_Discriminants (Obj_Typ) then
+   elsif Has_Discriminants (Obj_Typ)
+ and then not Is_Protected_Type (Obj_Typ)
+   then
   Error_Msg_N
 ("discriminated object & cannot be volatile", Obj_Id);
 


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 01:17:45PM +0200, Thomas Schwinge wrote:
> Always creating (dummy) GOMP_offload_register_ver constructors has been
> another suggestion that I had voiced much earlier in this thread (months
> ago), but everyone (including me) taking part in the discussion agreed
> that it'd cause even higher compile-time overhead.

I'd prefer to just set a flag like "force creation of the GOMP offloading
sections" whenever you see one of the APIs or constructs used in the TU,
and if that flag is set, even when there are no offloaded vars or
functions/kernels, force creation of the corresponding data sections.
Either these can be standard offloading LTO sections, just not containing
anything, or, if you want to improve compile time, they could be special too,
so that the linker plugin can quickly identify those that only need
offloading support, but don't have any offloaded vars or code.
But that can certainly be done as an incremental optimization.

For OpenMP that would be whenever
#pragma omp target{, data, enter data, exit data} construct is seen
(e.g. during gimplification or OMP region nesting checking even better),
or for

omp_set_default_device
omp_get_default_device
omp_get_num_devices
omp_is_initial_device
omp_get_initial_device
omp_target_alloc
omp_target_free
omp_target_is_present
omp_target_memcpy
omp_target_memcpy_rect
omp_target_associate_ptr
omp_target_disassociate_ptr

calls.  Guess for OpenACC you have similar set of calls.
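
As a concrete (hypothetical) example, a TU like the following contains no
target constructs at all, only one of the API calls listed above, yet under
this scheme it would still get the (empty) offloading sections so that the
-foffload= information reaches libgomp:

#include <stdio.h>
#include <omp.h>

int
main (void)
{
  /* Device query only, no #pragma omp target anywhere in the TU.  */
  printf ("%d\n", omp_get_num_devices ());
  return 0;
}
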
The thing is, while the OpenACC standard is pretty much solely about
offloading, OpenMP is not, and in many cases programs just use host OpenMP
parallelization (at least right now, I bet such programs are a significantly
larger set than programs that use OpenACC or OpenMP offloading combined).
Distributions and others will eventually configure the compilers they are
shipping to enable the offloading, and if that forces a constructor to every
TU or even every shared library just because it has been compiled with
-fopenmp, it is unacceptable overhead.

For the vendor shipped binary compilers, I'm envisioning ideal would be to
be able to configure gcc for many offloading targets, then build such main
compiler and offloading target compilers, but package them separately (one
package (or set of packages) the base compiler, and then another package (or
set of them) for each offloading target.  What the -foffload= actually will
be in the end from the linked shared library or binary POV would depend both
on the configured offloading target, but also on whether the mkoffload
binaries are found (or whatever else is needed first from the offloading
target).  That would mean that we'd not issue hard error or any kind of
diagnostics if mkoffload is missing.  Is that acceptable, or should that
e.g. be limited just to the compiled in configure default (i.e. explicit
-foffload= would error if the requested mkoffload is missing, default
-foffload= would silently skip unavailable ones; I guess this would be my
preference), or should we have two ways of configuring the offloading
targets, as hard requirements and as optional support?

> So, how to resolve our different opinions?  I mean, for any serious
> program code, there will be constructor calls into libgomp already; are
> you expecting that adding one more really will cause any noticeable
> overhead?

See above, that is really not the case.  Most OpenMP code doesn't have
any constructor calls into libgomp at all; the only exception is
GOMP_offload_register{,_ver} at this point.

> > What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?
> 
> Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
> told me this means "High Water Mark".  I have no strong opinion on the
> name to use, just want to mention that "*_LAST" sounds to me like that
> one still is part of the accepted set, whereas in this case it'd be the
> first enumerator outside of the accepted ones.  (And I guess, we agree
> that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
> "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
> ugly?)

*_LAST or *_last is actually what we use pretty much everywhere, see e.g.
lots of places in tree-core.h.

> Are you worried about the performance issues of a very short locking
> cycle that in the majority of all cases should happen without blocking,
> in comparison to performance issues related to host/device memory
> transfers or kernel launches that will follow after the call to
> gomp_offload_target_enabled_p?  I don't really think that is reasonable
> to worry about.

Yes, I'm worried about that.  The lock could be contended, and if you take
the lock many times for each construct, it can show up; I'm also worried about
cache effects, etc.  It is already bad enough that we take/release the locks
for the same device e.g. in each of:
  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);

  struct target_mem_desc *tgt_vars
= gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, fals

Re: [PATCH v2 1/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 02:18:25PM +0300, Maxim Ostapenko wrote:
> This is the merge itself. Added DCAN_SANITIZE_UB={0, 1} and nuked
> DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0 (not used now) in corresponding
> Makefiles.

Ok.

Jakub


Re: [PATCH v2 2/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 02:19:51PM +0300, Maxim Ostapenko wrote:
> +   a) Location's filename must not be NULL.
> +   b) Location's filename must not be equal to "".
> +   c) Location's filename must not be equal to "\1".
> +   d) First two bytes of filename msut not contain '\xff' symbol.  */

must

> +
> +static bool
> +ubsan_use_new_style_p (location_t loc)
> +{
> +  if (loc == UNKNOWN_LOCATION)
> +return false;
> +
> +  expanded_location xloc = expand_location (loc);
> +  if (xloc.file == NULL || strncmp (xloc.file, "\1", 2) == 0
> +  || strncmp (xloc.file, "", 1) == 0 || xloc.file[0] == '\xff'

I'd use || xloc.file[0] == '\0' || xloc.file[0] == '\xff'
instead, but not a big deal for me.

> +  || xloc.file[1] == '\xff')
> +return false;
> +
> +  return true;
> +}
> +

Ok with the typo fix.

Jakub


Re: [PATCH v2 6/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 02:29:44PM +0300, Maxim Ostapenko wrote:
> In this patch, I'm trying to add a general instruction how to perform the
> merge. This is just a documentation patch, any suggestions and opinions are
> welcome.

I'd add a line saying that the diff to the tests in lib/asan/tests since the
last merge (looking at gcc/testsuite/ChangeLog*, the 2014-05-30 entry
mentions r209283) should be incorporated into the gcc testsuite too (if
possible).

Otherwise, LGTM.

> Index: libsanitizer/HOWTO_MERGE
> ===
> --- libsanitizer/HOWTO_MERGE  (revision 0)
> +++ libsanitizer/HOWTO_MERGE  (working copy)
> @@ -0,0 +1,26 @@
> +In general, merging process should not be very difficult, but we need to
> +track various ABI changes and GCC-specific patches carefully.  Here is a
> +general list of actions required to perform the merge:
> +
> +- Checkout recent GCC tree.
> +- Run merge.sh script from libsanitizer directory.
> +- Modify Makefile.am files into 
> asan/tsan/lsan/ubsan/sanitizer_common/interception
> +  directories if needed.  In particular, you may need to add new source files
> +  and remove old ones in source files list, add new flags to {C, CXX}FLAGS if
> +  needed and update DEFS with new defined variables.
> +- Apply all needed GCC-specific patches to libsanitizer (note that some of
> +  them might be already included to upstream).
> +- Apply all necessary compiler changes.  Be especially careful here, you must
> +  not break ABI between compiler and library.
> +- Modify configure.ac file if needed (e.g. if you need to add link against 
> new
> +  library for sanitizer lilbs).
> +- Remove unused (deleted by merge) files from all source and include
> +  directories.  Be especially careful with headers, because they aren't 
> listed
> +  in Makefiles explicitly.
> +- Regenerate configure script and all Makefiles by autoreconf.  You should 
> use
> +  exactly the same autotools version as for other GCC directories (current
> +  version is 2.64, 
> https://www.gnu.org/software/automake/faq/autotools-faq.html
> +  for details how to install/use it).
> +- Run regression testing on at least three platforms (e.g. x86-linux-gnu,
> +  x86_64-linux-gnu, aarch64-linux-gnu).
> +- Run {A, UB}San bootstrap on at least three platforms.

Jakub


Re: Move some bit and binary optimizations in simplify and match

2015-10-20 Thread Richard Biener
On Tue, Oct 20, 2015 at 8:46 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
>>> +/* Fold X + (X / CST) * -CST to X % CST.  */
>>> This one is still wrong
> Removed.
>
>>> I don't understand the point of the FLOAT_TYPE_P check.
> The check was there in fold-const. So, just had the same check.
>
>>> Will we also simplify (A & B) - (A & ~B) into B - (A ^ B) ?
> Done.
>
>>> or maybe integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1))
> Done.
>
>>> :c on bit_ior? It should also allow you to merge the 2 CST versions into 
>>> one.
> Had it. But due to the error on "logical_inverted_value"; confused it with
> this pattern and had duplicated it.
>
>>> I am not really in favor of the indentation change (or the comment).
> Done.
>
>>> This patch is ok when bootstrapped / tested and with a proper changelog 
>>> entry.
> Regression tested on X86_64 with no extra failures.

+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1))
+ (if (integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1)))
+  (minus (bit_xor @0 @1) @1)))

use if (wi::bit_and (@2, @1) == 0)

and instead of the 2nd group

+/* Fold (A & B) - (A & ~B) into B - (A ^ B).  */
+(simplify
+ (minus (bit_and:s @0 @1) (bit_and:s @0 (bit_not @1)))
+  (minus @1 (bit_xor @0 @1)))
+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@1) (bit_and:s @0 INTEGER_CST@2))
+ (if (integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1)))
+  (minus @1 (bit_xor @0 @1

place a :c on the minus of the one not matching INTEGER_CSTs.

+(simplify
+ (bit_ior:c (bit_and @0 INTEGER_CST@2) (bit_and (bit_not @0) INTEGER_CST@1))
+  (if (integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1)))
+   (bit_xor @0 @1)))

see above, use wi::bit_and instead

Otherwise looks ok to me.

Richard.




> 2015-10-20  Richard Biener  
>  Naveen H.S  
>
> * fold-const.c (fold_binary_loc) : Move (-A) * (-B) -> A * B
> to match.pd.
> Move (a * (1 << b)) is (a << b) to match.pd.
> Move convert (C1/X)*C2 into (C1*C2)/X to match.pd.
> Move ~X & X, (X == 0) & X, and !X & X are zero to match.pd.
> Move X & ~X , X & (X == 0), and X & !X are zero to match.pd.
>
> * match.pd (mult:c @0 (convert? (lshift integer_onep@1 @2))):
> New simplifier.
> (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2): New simplifier.
> (bit_and:c (convert? @0) (convert? (bit_not @0))): New simplifier.
> (bit_ior (bit_and:s @0 (bit_not:s @1)) (bit_and:s (bit_not:s @0) @1))
> : New simplifier.
> (mult:c (convert1? (negate @0)) (convert2? negate_expr_p@1)):
> New simplifier.
> (match (logical_inverted_value @0) (truth_not @0)) : New Predicate.
>
>
>
> 2015-10-20  Richard Biener  
>  Naveen H.S  
>
> * fold-const.c (fold_binary_loc) : Move Fold (A & ~B) - (A & B)
> into (A ^ B) - B to match.pd
> Move (X & ~Y) | (~X & Y) is X ^ Y to match.pd.
>
> * match.pd (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @1)):
> New simplifier.
> (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1)):
> New simplifier.
> (minus (bit_and:s @0 @1) (bit_and:s @0 (bit_not @1))): New simplifier.
> (minus (bit_and:s @0 INTEGER_CST@1) (bit_and:s @0 INTEGER_CST@2)):
> New simplifier.
> (bit_ior:c (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1)):
> New simplifier.
> (bit_ior:c (bit_and @0 INTEGER_CST@2) (bit_and (bit_not @0) 
> INTEGER_CST@1)):
> New simplifier.


Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)

2015-10-20 Thread Jakub Jelinek
On Tue, Oct 20, 2015 at 01:45:37PM +0200, Bernd Schmidt wrote:
> Both of you can ignore me if you feel I'm not making sense, but what exactly
> is the use case for -foffload=disable? Isn't it slightly redundant with
> -fno-openacc? IMO it's not an option that alters the available devices,
> that's a question that is answered at run-time and doesn't (or shouldn't)
> really depend on compiler switches. As a user I'd expect -foffload=disable
> to just prevent generation of offloaded code for the things I'm compiling.
> As Jakub pointed out, shared libraries may still contain other pieces that
> are offloadable.
> 
> I guess I don't fully understand why you want to go to great lengths to
> disable devices at run-time based on a compile-time switch. What's the
> reasoning here?

At least for OpenMP, I'm also happy with what we do now (except for the
ability to configure offloading targets as optional, i.e. dynamically
configure the default based on what packages user install rather than
just on how it has been configured, so that e.g. just because it has been
configured for PTX offloading the host GCC itself doesn't have to have a
dependency on the proprietary CUDA stuff in any way).
I believe in OpenMP nobody says that if the device HW is available, but user
chose to not compile offloading code/variables for that particular device
that it can't show up among omp_get_num_devices ().  And I think it is
entirely fine if say target data map succeeds to that device, but then
target is offloaded, if that is caused by users configure or command line
choice.  Maybe OpenACC has different requirements, is it required to
terminate the program if it can't fulfill the requested offloading?

In any case, I'm fine with something I've noted in the last mail, or with
the status quo, but not with running constructors in TUs or even shared
libraries just because they have been compiled with -fopenmp (and either
haven't used any OpenMP code at all, or just the non-*target* directives).

Jakub


[Ada] Friendlier behavior for "=" and ":="

2015-10-20 Thread Arnaud Charlet
This patch avoids race conditions in certain cases. This is not necessary,
because these cases are technically erroneous, but it seems friendlier to avoid
races. Furthermore, previous versions of the containers avoided some of these
races.

No test available; no change in behavior for correct programs.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Bob Duff  

* a-cbdlli.adb, a-cdlili.adb, a-chtgop.adb, a-cidlli.adb,
* a-cobove.adb, a-coinve.adb, a-convec.adb, a-crbtgo.adb ("="): Avoid
modifying the tampering counts unnecessarily.
(Adjust): Zero tampering counts unconditionally.

Index: a-cdlili.adb
===
--- a-cdlili.adb(revision 229049)
+++ a-cdlili.adb(working copy)
@@ -73,31 +73,35 @@
-
 
function "=" (Left, Right : List) return Boolean is
-  --  Per AI05-0022, the container implementation is required to detect
-  --  element tampering by a generic actual subprogram.
-
-  Lock_Left : With_Lock (Left.TC'Unrestricted_Access);
-  Lock_Right : With_Lock (Right.TC'Unrestricted_Access);
-
-  L  : Node_Access;
-  R  : Node_Access;
-
begin
   if Left.Length /= Right.Length then
  return False;
   end if;
 
-  L := Left.First;
-  R := Right.First;
-  for J in 1 .. Left.Length loop
- if L.Element /= R.Element then
-return False;
- end if;
+  if Left.Length = 0 then
+ return True;
+  end if;
 
- L := L.Next;
- R := R.Next;
-  end loop;
+  declare
+ --  Per AI05-0022, the container implementation is required to detect
+ --  element tampering by a generic actual subprogram.
 
+ Lock_Left : With_Lock (Left.TC'Unrestricted_Access);
+ Lock_Right : With_Lock (Right.TC'Unrestricted_Access);
+
+ L : Node_Access := Left.First;
+ R : Node_Access := Right.First;
+  begin
+ for J in 1 .. Left.Length loop
+if L.Element /= R.Element then
+   return False;
+end if;
+
+L := L.Next;
+R := R.Next;
+ end loop;
+  end;
+
   return True;
end "=";
 
@@ -109,10 +113,15 @@
   Src : Node_Access := Container.First;
 
begin
+  --  If the counts are nonzero, execution is technically erroneous, but
+  --  it seems friendly to allow things like concurrent "=" on shared
+  --  constants.
+
+  Zero_Counts (Container.TC);
+
   if Src = null then
  pragma Assert (Container.Last = null);
  pragma Assert (Container.Length = 0);
- pragma Assert (Container.TC = (Busy => 0, Lock => 0));
  return;
   end if;
 
Index: a-coinve.adb
===
--- a-coinve.adb(revision 229049)
+++ a-coinve.adb(working copy)
@@ -103,30 +103,37 @@
-
 
overriding function "=" (Left, Right : Vector) return Boolean is
-  --  Per AI05-0022, the container implementation is required to detect
-  --  element tampering by a generic actual subprogram.
-
-  Lock_Left : With_Lock (Left.TC'Unrestricted_Access);
-  Lock_Right : With_Lock (Right.TC'Unrestricted_Access);
begin
   if Left.Last /= Right.Last then
  return False;
   end if;
 
-  for J in Index_Type range Index_Type'First .. Left.Last loop
- if Left.Elements.EA (J) = null then
-if Right.Elements.EA (J) /= null then
+  if Left.Length = 0 then
+ return True;
+  end if;
+
+  declare
+ --  Per AI05-0022, the container implementation is required to detect
+ --  element tampering by a generic actual subprogram.
+
+ Lock_Left : With_Lock (Left.TC'Unrestricted_Access);
+ Lock_Right : With_Lock (Right.TC'Unrestricted_Access);
+  begin
+ for J in Index_Type range Index_Type'First .. Left.Last loop
+if Left.Elements.EA (J) = null then
+   if Right.Elements.EA (J) /= null then
+  return False;
+   end if;
+
+elsif Right.Elements.EA (J) = null then
return False;
+
+elsif Left.Elements.EA (J).all /= Right.Elements.EA (J).all then
+   return False;
 end if;
+ end loop;
+  end;
 
- elsif Right.Elements.EA (J) = null then
-return False;
-
- elsif Left.Elements.EA (J).all /= Right.Elements.EA (J).all then
-return False;
- end if;
-  end loop;
-
   return True;
end "=";
 
@@ -136,6 +143,12 @@
 
procedure Adjust (Container : in out Vector) is
begin
+  --  If the counts are nonzero, execution is technically erroneous, but
+  --  it seems friendly to allow things like concurrent "=" on shared
+  --  constants.
+
+  Zero_Counts (Container.TC);
+
   if Contai

[Ada] Remove UET_Address attribute

2015-10-20 Thread Arnaud Charlet
This was an unused and internal-use only attribute.
No testcase.

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Tristan Gingold  

* sem_util.adb (Is_Protected_Self_Reference): Remove reference to
UET_Address in comment.
* sem_attr.adb (Check_Unit_Name): Adjust comment.
(Analyze_Attribute): Remove handling of UET_Address.
* sem_attr.ads (Attribute_Impl_Def): Remove Attribute_UET_Address.
* snames.ads-tmpl Remove Name_UET_Address, Attribute_UET_Address.
* exp_attr.adb (Expand_N_Attribute_Reference): Remove
Attribute_UET_Address.

Index: exp_attr.adb
===
--- exp_attr.adb(revision 229057)
+++ exp_attr.adb(working copy)
@@ -6152,49 +6152,6 @@
 Expand_Fpt_Attribute_R (N);
  end if;
 
-  -
-  -- UET_Address --
-  -
-
-  when Attribute_UET_Address => UET_Address : declare
- Ent : constant Entity_Id := Make_Temporary (Loc, 'T');
-
-  begin
- Insert_Action (N,
-   Make_Object_Declaration (Loc,
- Defining_Identifier => Ent,
- Aliased_Present => True,
- Object_Definition   =>
-   New_Occurrence_Of (RTE (RE_Address), Loc)));
-
- --  Construct name __gnat_xxx__SDP, where xxx is the unit name
- --  in normal external form.
-
- Get_External_Unit_Name_String (Get_Unit_Name (Pref));
- Name_Buffer (1 + 7 .. Name_Len + 7) := Name_Buffer (1 .. Name_Len);
- Name_Len := Name_Len + 7;
- Name_Buffer (1 .. 7) := "__gnat_";
- Name_Buffer (Name_Len + 1 .. Name_Len + 5) := "__SDP";
- Name_Len := Name_Len + 5;
-
- Set_Is_Imported (Ent);
- Set_Interface_Name (Ent,
-   Make_String_Literal (Loc,
- Strval => String_From_Name_Buffer));
-
- --  Set entity as internal to ensure proper Sprint output of its
- --  implicit importation.
-
- Set_Is_Internal (Ent);
-
- Rewrite (N,
-   Make_Attribute_Reference (Loc,
- Prefix => New_Occurrence_Of (Ent, Loc),
- Attribute_Name => Name_Address));
-
- Analyze_And_Resolve (N, Typ);
-  end UET_Address;
-
   
   -- Update --
   
Index: sem_util.adb
===
--- sem_util.adb(revision 229059)
+++ sem_util.adb(working copy)
@@ -12730,9 +12730,9 @@
 
begin
   --  Verify that prefix is analyzed and has the proper form. Note that
-  --  the attributes Elab_Spec, Elab_Body, Elab_Subp_Body and UET_Address,
-  --  which also produce the address of an entity, do not analyze their
-  --  prefix because they denote entities that are not necessarily visible.
+  --  the attributes Elab_Spec, Elab_Body and Elab_Subp_Body which also
+  --  produce the address of an entity, do not analyze their prefix
+  --  because they denote entities that are not necessarily visible.
   --  Neither of them can apply to a protected type.
 
   return Ada_Version >= Ada_2005
Index: sem_attr.adb
===
--- sem_attr.adb(revision 229057)
+++ sem_attr.adb(working copy)
@@ -388,8 +388,8 @@
   --  itself of the form of a library unit name. Note that this is
   --  quite different from Check_Program_Unit, since it only checks
   --  the syntactic form of the name, not the semantic identity. This
-  --  is because it is used with attributes (Elab_Body, Elab_Spec,
-  --  UET_Address and Elaborated) which can refer to non-visible unit.
+  --  is because it is used with attributes (Elab_Body, Elab_Spec and
+  --  Elaborated) which can refer to non-visible unit.
 
   procedure Error_Attr (Msg : String; Error_Node : Node_Id);
   pragma No_Return (Error_Attr);
@@ -2675,7 +2675,6 @@
   if Aname /= Name_Elab_Body   and then
  Aname /= Name_Elab_Spec   and then
  Aname /= Name_Elab_Subp_Body  and then
- Aname /= Name_UET_Address and then
  Aname /= Name_Enabled and then
  Aname /= Name_Old
   then
@@ -6026,15 +6025,6 @@
 
  Analyze_And_Resolve (N, Standard_String);
 
-  -
-  -- UET_Address --
-  -
-
-  when Attribute_UET_Address =>
- Check_E0;
- Check_Unit_Name (P);
- Set_Etype (N, RTE (RE_Address));
-
   ---
   -- Unbiased_Rounding --
   ---
@@ -9710,7 +9700,6 @@
Attribute_Terminated   |
Attribute_To_Address   |
Attribute_Type_Key |
-   Attribute_UET_Address  |
Attribute_Unchecked_Access |
 

[Ada] Debug information for limited class-wide objects

2015-10-20 Thread Arnaud Charlet
This patch modifies the expansion of a build-in-place function call that
initializes a class-wide limited object to generate the necessary debug
information for the object.


-- Source --


--  types.ads

package Types is
   type Root_Type is tagged limited record
  I : Integer;
   end record;
   type Root_Access is access all Root_Type'Class;

   function Get return Root_Type'Class;
   procedure Ignore (O_Acc : Root_Access);
end Types;

--  types.adb

package body Types is
   function Get return Root_Type'Class is
   begin
  return Root_Type'(I => 0);
   end Get;

   procedure Ignore (O_Acc : Root_Access) is
   begin
  null;
   end Ignore;
end Types;

--  cw_debug_info.adb

with Types; use Types;

procedure CW_Debug_Info is
   Obj : aliased Root_Type'Class := Get;
   Obj_Ptr : constant Root_Access:= Obj'Unchecked_Access;

begin
   Ignore (Obj_Ptr);
end CW_Debug_Info;


-- Compilation and output --


$ gnatmake -q -g cw_debug_info.adb
$ gdb ./cw_debug_info
$ b cw_debug_info.adb:5
$ r
$ print (obj)
$1 = (i => 0)

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Hristian Kirtchev  

* exp_ch6.adb (Expand_N_Extended_Return_Statement): Code cleanup.
(Make_Build_In_Place_Call_In_Object_Declaration): Update the
parameter profile.  Code cleanup. Request debug info for the
object renaming declaration.
(Move_Activation_Chain): Add new formal parameter and update the
comment on usage.
* exp_ch6.ads (Make_Build_In_Place_Call_In_Object_Declaration):
Update the parameter profile and comment on usage.
* sem_util.ads, sem_util.adb (Remove_Overloaded_Entity): New routine,
currently unused.

Index: sem_util.adb
===
--- sem_util.adb(revision 229060)
+++ sem_util.adb(working copy)
@@ -16961,6 +16961,106 @@
   end if;
end Remove_Homonym;
 
+   --
+   -- Remove_Overloaded_Entity --
+   --
+
+   procedure Remove_Overloaded_Entity (Id : Entity_Id) is
+  procedure Remove_Primitive_Of (Typ : Entity_Id);
+  --  Remove primitive subprogram Id from the list of primitives that
+  --  belong to type Typ.
+
+  -
+  -- Remove_Primitive_Of --
+  -
+
+  procedure Remove_Primitive_Of (Typ : Entity_Id) is
+ Prims : Elist_Id;
+
+  begin
+ if Is_Tagged_Type (Typ) then
+Prims := Direct_Primitive_Operations (Typ);
+
+if Present (Prims) then
+   Remove (Prims, Id);
+end if;
+ end if;
+  end Remove_Primitive_Of;
+
+  --  Local variables
+
+  Scop: constant Entity_Id := Scope (Id);
+  Formal  : Entity_Id;
+  Prev_Id : Entity_Id;
+
+   --  Start of processing for Remove_Overloaded_Entity
+
+   begin
+  --  Remove the entity from the homonym chain. When the entity is the
+  --  head of the chain, associate the entry in the name table with its
+  --  homonym effectively making it the new head of the chain.
+
+  if Current_Entity (Id) = Id then
+ Set_Name_Entity_Id (Chars (Id), Homonym (Id));
+
+  --  Otherwise link the previous and next homonyms
+
+  else
+ Prev_Id := Current_Entity (Id);
+ while Present (Prev_Id) and then Homonym (Prev_Id) /= Id loop
+Prev_Id := Homonym (Prev_Id);
+ end loop;
+
+ Set_Homonym (Prev_Id, Homonym (Id));
+  end if;
+
+  --  Remove the entity from the scope entity chain. When the entity is
+  --  the head of the chain, set the next entity as the new head of the
+  --  chain.
+
+  if First_Entity (Scop) = Id then
+ Prev_Id := Empty;
+ Set_First_Entity (Scop, Next_Entity (Id));
+
+  --  Otherwise the entity is either in the middle of the chain or it acts
+  --  as its tail. Traverse and link the previous and next entities.
+
+  else
+ Prev_Id := First_Entity (Scop);
+ while Present (Prev_Id) and then Next_Entity (Prev_Id) /= Id loop
+Next_Entity (Prev_Id);
+ end loop;
+
+ Set_Next_Entity (Prev_Id, Next_Entity (Id));
+  end if;
+
+  --  Handle the case where the entity acts as the tail of the scope entity
+  --  chain.
+
+  if Last_Entity (Scop) = Id then
+ Set_Last_Entity (Scop, Prev_Id);
+  end if;
+
+  --  The entity denotes a primitive subprogram. Remove it from the list of
+  --  primitives of the associated controlling type.
+
+  if Ekind_In (Id, E_Function, E_Procedure) and then Is_Primitive (Id) then
+ Formal := First_Formal (Id);
+ while Present (Formal) loop
+if Is_Controlling_Formal (Formal) then
+   Remove_Primitive_Of (Etype (Formal));
+   exit;
+   

[Ada] Crash on address clause involving controlled objects.

2015-10-20 Thread Arnaud Charlet
An address clause that overlays an object with a controlled object or a
component of a controlled object is erroneous, and the compiler replaces the
address clause with the corresponding raise statement. However, the analysis
of the address clause must not terminate prematurely, so that the back-end 
can complete code generation.

Compiling and executing main.adb must yield:

   main.adb:4:03: warning: in instantiation at erase_on_finalize.adb:22
   main.adb:4:03: warning: cannot use overlays with controlled objects
   main.adb:4:03: warning: Program_Error will be raised at run time

   raised PROGRAM_ERROR : main.adb:2 finalize/adjust raised exception

---
with Erase_On_Finalize;
procedure Main is

  package Sensitive_Integer is new Erase_On_Finalize (Integer);

  I1: Sensitive_Integer.Object := (Sensitive_Integer.Root with 1);
  I2: Sensitive_Integer.Object := (Sensitive_Integer.Root with 2);
  I3: Sensitive_Integer.Object := (Sensitive_Integer.Root with 3);

begin
  null;
end Main;
---
with Ada.Finalization;

generic
   type Data_Type is limited private;
package Erase_On_Finalize is

   subtype Root is Ada.Finalization.Limited_Controlled;

   type Object is new Root with
  record
 Contents : Data_Type;
  end record;

   overriding procedure Finalize (Obj: in out Object);

   -- Disallow dynamic memory allocation.
   type Object_Ref is access Object with Storage_Size => 0;

end Erase_On_Finalize;
---
with System;
with Interfaces;
with GNAT.Memory_Dump;
package body Erase_On_Finalize is

   subtype Byte_Type is Interfaces.Unsigned_8;
   use type Byte_Type;

   Erase_Pattern : constant Byte_Type := 0;

   procedure Erase_Bytes (Addr: System.Address; Bytes: Natural) is
  type Byte_View_Type is array (1 .. Bytes) of Byte_Type;
  Byte_View : Byte_View_Type with Address => Addr, Volatile;
   begin
  GNAT.Memory_Dump.Dump (Addr, Bytes);
  Byte_View := (others => Erase_Pattern);
   end Erase_Bytes;

   overriding procedure Finalize (Obj: in out Object) is
  type Byte_View_Type is array (1 .. (Obj.Contents'Size + 7) / 8)
of Byte_Type;
  Byte_View : Byte_View_Type  with address => Obj.Contents'address,
   Volatile;
   begin
  GNAT.Memory_Dump.Dump (Byte_View'Address, Byte_View'Length);
  Byte_View := (others => Erase_Pattern);
   end Finalize;
end Erase_On_Finalize;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Ed Schonberg  

* sem_ch13.adb (Analyze_Attribute_Definition_Clause, case
'Address): If either object is controlled, the overlay is
erroneous, but analysis must be completed so that back-end sees
address clause and completes code generation.  Improve text
of warning.

Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 229071)
+++ sem_ch13.adb(working copy)
@@ -4711,20 +4711,22 @@
 
   Find_Overlaid_Entity (N, O_Ent, Off);
 
-  --  Overlaying controlled objects is erroneous
+  --  Overlaying controlled objects is erroneous.
+  --  Emit warning but continue analysis because program is
+  --  itself legal, and back-end must see address clause.
 
   if Present (O_Ent)
 and then (Has_Controlled_Component (Etype (O_Ent))
or else Is_Controlled (Etype (O_Ent)))
+and then not Inside_A_Generic
   then
  Error_Msg_N
-   ("??cannot overlay with controlled object", Expr);
+   ("??cannot use overlays with controlled objects", Expr);
  Error_Msg_N
("\??Program_Error will be raised at run time", Expr);
  Insert_Action (Declaration_Node (U_Ent),
Make_Raise_Program_Error (Loc,
  Reason => PE_Overlaid_Controlled_Object));
- return;
 
   elsif Present (O_Ent)
 and then Ekind (U_Ent) = E_Constant


[PATCH] Fix PR68017

2015-10-20 Thread Richard Biener

The usual stmt removing vs. debug stmts thing.

Bootstrapped and tested on x86_64-linux-gnu, applied to trunk.

Richard.

2015-10-20  Richard Biener  

PR tree-optimization/68017
* tree-tailcall.c (eliminate_tail_call): Remove stmts backwards.

* gcc.dg/torture/pr68017.c: New testcase.

Index: gcc/tree-tailcall.c
===
--- gcc/tree-tailcall.c (revision 228971)
+++ gcc/tree-tailcall.c (working copy)
@@ -847,17 +847,21 @@ eliminate_tail_call (struct tailcall *t)
  possibly unreachable code in other blocks is removed later in
  cfg cleanup.  */
   gsi = t->call_gsi;
-  gsi_next (&gsi);
-  while (!gsi_end_p (gsi))
+  gimple_stmt_iterator gsi2 = gsi_last_bb (gimple_bb (gsi_stmt (gsi)));
+  while (gsi_stmt (gsi2) != gsi_stmt (gsi))
 {
-  gimple *t = gsi_stmt (gsi);
+  gimple *t = gsi_stmt (gsi2);
   /* Do not remove the return statement, so that redirect_edge_and_branch
 sees how the block ends.  */
-  if (gimple_code (t) == GIMPLE_RETURN)
-   break;
-
-  gsi_remove (&gsi, true);
-  release_defs (t);
+  if (gimple_code (t) != GIMPLE_RETURN)
+   {
+ gimple_stmt_iterator gsi3 = gsi2;
+ gsi_prev (&gsi2);
+ gsi_remove (&gsi3, true);
+ release_defs (t);
+   }
+  else
+   gsi_prev (&gsi2);
 }
 
   /* Number of executions of function has reduced by the tailcall.  */
Index: gcc/testsuite/gcc.dg/torture/pr68017.c
===
--- gcc/testsuite/gcc.dg/torture/pr68017.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68017.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-g" } */
+
+long long a;
+
+short
+fn1 (short p1, unsigned short p2)
+{
+  return p1 + p2;
+}
+
+short
+fn2 ()
+{
+  int b = a ? fn1 (fn2 (), a) : 0;
+  return b;
+}


[PATCH][AArch64 Testsuite][Trivial?] Remove divisions-to-produce-NaN from vdiv_f.c

2015-10-20 Thread Alan Lawrence
The test vdiv_f.c #define's NAN to (0.0 / 0.0). This produces extra scalar 
fdiv's, which complicate the scan-assembler testing. We can remove these by 
using __builtin_nan instead.

Tested on AArch64 Linux.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vdiv_f.c: Use __builtin_nan.
---
 gcc/testsuite/gcc.target/aarch64/vdiv_f.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/vdiv_f.c 
b/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
index 45c72a9..a505e39 100644
--- a/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
+++ b/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
@@ -7,7 +7,7 @@
 #define FLT_INFINITY (__builtin_inff ())
 #define DBL_INFINITY (__builtin_inf ())
 
-#define NAN (0.0 / 0.0)
+#define NAN (__builtin_nan (""))
 
 #define PI 3.141592653589793
 #define PI_4 0.7853981633974483
@@ -228,9 +228,7 @@ test_vdiv_f64 ()
   return 0;
 }
 
-/* The following assembly should match 2 more times,
-   in 64bit NAN generation.  */
-/* { dg-final { scan-assembler-times "fdiv\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 
3 } } */
+/* { dg-final { scan-assembler-times "fdiv\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 
1 } } */
 
 #undef TESTA8
 #undef ANSW8
-- 
1.9.1



[PATCH] Pass manager: add support for termination of pass list

2015-10-20 Thread Martin Liška
Hello.

As part of the upcoming merge of the HSA branch, we would like to have the
possibility to terminate the pass manager after execution of the HSA generation
pass. The HSA back-end is implemented as a tree pass that directly emits HSAIL
from the gimple representation. The pass operates on clones created by the HSA
IPA pass, and the pass manager should stop execution of further RTL passes.
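
A rough sketch of how a pass would request this is below; the pass and helper
names are made up purely for illustration and are not part of the patch:

  /* Hypothetical GIMPLE pass using the proposed TODO_stop_pass_execution.  */
  unsigned int
  pass_gen_hsail::execute (function *fun)
  {
    if (!hsa_candidate_p (fun))       /* made-up helper */
      return 0;

    emit_hsail_for_function (fun);    /* made-up helper */

    /* Tell the pass manager not to run the remaining (RTL) passes
       on this function.  */
    return TODO_stop_pass_execution;
  }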

Suggested patch survives bootstrap and regression tests on x86_64-linux-pc.

What do you think about it?
Thanks,
Martin
>From fc60561c3ac09188fb61dac61b1cc2422061fc1d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 19 Oct 2015 23:26:54 +0200
Subject: [PATCH] Add support for termination of pass manager.

gcc/ChangeLog:

2015-10-19  Martin Liska  

	* passes.c (execute_one_pass): Add new argument called exit.
	(execute_pass_list_1): Terminate pass manager if a pass
	requests termination.
	(execute_ipa_pass_list): Likewise.
	* tree-pass.h: Introduce new TODO_stop_pass_execution.
---
 gcc/passes.c| 24 
 gcc/tree-pass.h |  3 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.c b/gcc/passes.c
index 6ef6d2e..1199ae3 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2279,10 +2279,11 @@ override_gate_status (opt_pass *pass, tree func, bool gate_status)
 }
 
 
-/* Execute PASS. */
+/* Execute PASS.  If the PASS requests to stop after its execution, EXIT
+   value is set to true.  */
 
 bool
-execute_one_pass (opt_pass *pass)
+execute_one_pass (opt_pass *pass, bool *exit)
 {
   unsigned int todo_after = 0;
 
@@ -2387,18 +2388,28 @@ execute_one_pass (opt_pass *pass)
   if (!((todo_after | pass->todo_flags_finish) & TODO_do_not_ggc_collect))
 ggc_collect ();
 
+  /* If finish TODO flags contain TODO_stop_pass_execution, set exit = true.  */
+  if (todo_after & TODO_stop_pass_execution)
+*exit = true;
+
   return true;
 }
 
 static void
 execute_pass_list_1 (opt_pass *pass)
 {
+  bool stop_pass_execution = false;
+
   do
 {
   gcc_assert (pass->type == GIMPLE_PASS
 		  || pass->type == RTL_PASS);
-  if (execute_one_pass (pass) && pass->sub)
+  if (execute_one_pass (pass, &stop_pass_execution) && pass->sub)
 execute_pass_list_1 (pass->sub);
+
+  if (stop_pass_execution)
+	return;
+
   pass = pass->next;
 }
   while (pass);
@@ -2739,12 +2750,14 @@ ipa_read_optimization_summaries (void)
 void
 execute_ipa_pass_list (opt_pass *pass)
 {
+  bool stop_pass_execution = false;
+
   do
 {
   gcc_assert (!current_function_decl);
   gcc_assert (!cfun);
   gcc_assert (pass->type == SIMPLE_IPA_PASS || pass->type == IPA_PASS);
-  if (execute_one_pass (pass) && pass->sub)
+  if (execute_one_pass (pass, &stop_pass_execution) && pass->sub)
 	{
 	  if (pass->sub->type == GIMPLE_PASS)
 	{
@@ -2763,6 +2776,9 @@ execute_ipa_pass_list (opt_pass *pass)
   gcc_assert (!current_function_decl);
   symtab->process_new_functions ();
   pass = pass->next;
+
+  if (stop_pass_execution)
+	return;
 }
   while (pass);
 }
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index c37e4b2..a481bac 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -300,6 +300,9 @@ protected:
 /* Rebuild the callgraph edges.  */
 #define TODO_rebuild_cgraph_edges   (1 << 22)
 
+/* Stop pass manager after execution of a pass.  */
+#define TODO_stop_pass_execution	(1 << 23)
+
 /* Internally used in execute_function_todo().  */
 #define TODO_update_ssa_any		\
 (TODO_update_ssa			\
-- 
2.6.0



Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof

2015-10-20 Thread Bernd Schmidt

On 10/16/2015 09:34 PM, Martin Sebor wrote:

Thank you for the review. Attached is an updated patch that hopefully
addresses all your comments. I ran the check_GNU_style.sh script on
it to make sure I didn't miss something.  I've also added replies to
a few of your comments below.


Ok, thanks. However - the logic around the context struct in the patch 
still seems fairly impenetrable to me, and I've been wondering whether a 
simpler approach wouldn't work just as well. I came up with the patch 
below, which just passes a tree_code context to recursive invocations 
and increases the upper bound by one only if not in an ARRAY_REF or 
COMPONENT_REF context.


As far as I can tell, this produces the same warnings on the testcase 
(modulo differences in type printout), except for the final few FA5_7 
testcases, which I had doubts about anyway:


typedef struct FA5_7 {
  int i;
  char a5_7 [5][7];
} FA5_7;

__builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" }
__builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" }
__builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" }
__builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" }


Why wouldn't at least the first two be covered by the 
one-past-the-end-is-ok rule?



Bernd

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 229049)
+++ gcc/c-family/c-common.c	(working copy)
@@ -10589,11 +10589,11 @@ c_common_to_target_charset (HOST_WIDE_IN
traditional rendering of offsetof as a macro.  Return the folded result.  */
 
 tree
-fold_offsetof_1 (tree expr)
+fold_offsetof_1 (tree expr, enum tree_code ctx)
 {
   tree base, off, t;
-
-  switch (TREE_CODE (expr))
+  tree_code code = TREE_CODE (expr);
+  switch (code)
 {
 case ERROR_MARK:
   return expr;
@@ -10617,7 +10617,7 @@ fold_offsetof_1 (tree expr)
   return TREE_OPERAND (expr, 0);
 
 case COMPONENT_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0));
+  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
   if (base == error_mark_node)
 	return base;
 
@@ -10634,7 +10634,7 @@ fold_offsetof_1 (tree expr)
   break;
 
 case ARRAY_REF:
-  base = fold_offsetof_1 (TREE_OPERAND (expr, 0));
+  base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
   if (base == error_mark_node)
 	return base;
 
@@ -10649,8 +10649,9 @@ fold_offsetof_1 (tree expr)
 	  && !tree_int_cst_equal (upbound,
   TYPE_MAX_VALUE (TREE_TYPE (upbound
 	{
-	  upbound = size_binop (PLUS_EXPR, upbound,
-build_int_cst (TREE_TYPE (upbound), 1));
+	  if (ctx != ARRAY_REF && ctx != COMPONENT_REF)
+		upbound = size_binop (PLUS_EXPR, upbound,
+  build_int_cst (TREE_TYPE (upbound), 1));
 	  if (tree_int_cst_lt (upbound, t))
 		{
 		  tree v;
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h	(revision 229049)
+++ gcc/c-family/c-common.h	(working copy)
@@ -1029,7 +1029,7 @@ extern bool c_dump_tree (void *, tree);
 
 extern void verify_sequence_points (tree);
 
-extern tree fold_offsetof_1 (tree);
+extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK);
 extern tree fold_offsetof (tree);
 
 /* Places where an lvalue, or modifiable lvalue, may be required.


Re: [PATCH] x86 interrupt attribute

2015-10-20 Thread Yulia Koval
The debug_info section for the interrupt function looks ok.

I tried to call it from assembler code to check it in gdb.

pushl   $0x333 ;eflags
pushl   $0x111 ;cs
pushl $0x222   ;eip
jmp  foo   ;interrupt function

#define uword_t unsigned int
struct interrupt_frame
{
 uword_t ip;
 uword_t cs;
 uword_t flags;
};

I get this inside the interrupt function:

Breakpoint 1, foo (frame=0xd560) at interrupt-1.c:7
7   a = (struct interrupt_frame*) frame;
(gdb) p/x ((struct interrupt_frame*)frame)->ip
$1 = 0x222
(gdb) p/x ((struct interrupt_frame*)frame)->cs
$3 = 0x111
(gdb) p/x ((struct interrupt_frame*)frame)->flags
$4 = 0x333

Frame pointer info looks ok.
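
For context, a handler using the attribute under discussion is declared
roughly as follows.  This is only a sketch based on this thread; treat the
exact attribute spelling and the error-code variant as assumptions, since the
syntax was still under review here.

  typedef unsigned int uword_t;   /* 32-bit case, as in the test above */

  struct interrupt_frame
  {
    uword_t ip;
    uword_t cs;
    uword_t flags;
  };

  /* Plain interrupt: only the frame argument.  */
  __attribute__ ((interrupt))
  void int_handler (struct interrupt_frame *frame)
  {
    /* frame->ip, frame->cs and frame->flags hold the values pushed before
       entry (by the CPU, or by the hand-written assembly stub above).  */
  }

  /* Exception with an error code: the code arrives as a second argument.  */
  __attribute__ ((interrupt))
  void exc_handler (struct interrupt_frame *frame, uword_t error_code)
  {
    (void) error_code;
  }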


On Tue, Oct 13, 2015 at 3:18 PM, Yulia Koval  wrote:
>
> Here is the current version of the patch with all the fixes.
> Regtested/bootstrapped it on 64 bit.
>
> We need a pointer since the interrupt handler will update data pointed
> to by frame.  Since error_code isn't at the normal location where the
> parameter is passed on stack and frame isn't in a hard register, we
> changed ix86_function_arg:
>
> +  if (!cum->caller && cfun->machine->func_type != TYPE_NORMAL)
> +{
> +  /* The first argument of interrupt handler is a pointer and
> +points to the return address slot on stack.  The optional
> +second argument is an integer for error code on stack.  */
> +  gcc_assert (type != NULL_TREE);
> +  if (POINTER_TYPE_P (type))
> +   {
> + if (cfun->machine->func_type == TYPE_EXCEPTION)
> +   /* (AP) in the current frame in exception handler.  */
> +   arg = arg_pointer_rtx;
> + else
> +   /* -WORD(AP) in the current frame in interrupt handler.  */
> +   arg = force_reg (Pmode,
> +plus_constant (Pmode, arg_pointer_rtx,
> +   -UNITS_PER_WORD));
> + if (mode != Pmode)
> +   arg = convert_to_mode (mode, arg, 1);
> +   }
> +  else
> +   {
> + gcc_assert (TREE_CODE (type) == INTEGER_TYPE
> + && cfun->machine->func_type == TYPE_EXCEPTION
> + && mode == word_mode);
> + /* The error code is at -WORD(AP) in the current frame in
> +exception handler.  */
> + arg = gen_rtx_MEM (word_mode,
> +plus_constant (Pmode, arg_pointer_rtx,
> +   -UNITS_PER_WORD));
> +   }
> +
> +  return arg;
> +}
> +
>
> to return a pseudo register.  It violates
>
>Return where to put the arguments to a function.
>Return zero to push the argument on the stack, or a hard register in
>which to store the argument.
>
> Register allocator has no problem with parameters in pseudo registers.
> But GCC crashes when it tries to access DECL_INCOMING_RTL as a hard
> register when generating debug information.  We worked around it by
> doing
>
> +
> +  if (cfun->machine->func_type != TYPE_NORMAL)
> +{
> +  /* Since the pointer argument of interrupt handler isn't a real
> + argument, adjust DECL_INCOMING_RTL for debug output.  */
> +  tree arg = DECL_ARGUMENTS (current_function_decl);
> +  gcc_assert (arg != NULL_TREE
> + && POINTER_TYPE_P (TREE_TYPE (arg)));
> +  if (cfun->machine->func_type == TYPE_EXCEPTION)
> +   /* (AP) in the current frame in exception handler.  */
> +   DECL_INCOMING_RTL (arg) = arg_pointer_rtx;
> +  else
> +   /* -WORD(AP) in the current frame in interrupt handler.  */
> +   DECL_INCOMING_RTL (arg) = plus_constant (Pmode,
> +arg_pointer_rtx,
> +-UNITS_PER_WORD);
> +}
>
>
> On Mon, Oct 5, 2015 at 12:29 PM, Uros Bizjak  wrote:
> > On Mon, Oct 5, 2015 at 1:17 AM, H.J. Lu  wrote:
> >
> >>> Looking a bit deeper into the code, it looks that we want to realign
> >>> the stack in the interrupt handler. Let's assume that interrupt
> >>> handler is calling some other function that saves SSE vector regs to
> >>> the stack. According to the x86 ABI, incoming stack of the called
> >>> function is assumed to be aligned to 16 bytes. But, interrupt handler
> >>> violates this assumption, since the stack could be aligned to only 4
> >>> bytes for 32bit and 8 bytes for 64bit targets. Entering the called
> >>> function with stack, aligned to less than 16 bytes will certainly
> >>> violate ABI.
> >>>
> >>> So, it looks to me that we need to realign the stack in the interrupt
> >>> handler unconditionally to 16bytes. In this case, we also won't need
> >>> the following changes:
> >>>
> >>
> >> Current stack alignment implementation requires at least
> >> one, maybe two, scratch registers:
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67841
> >>
> >> Ext

Re: [PATCH][AArch64 Testsuite][Trivial?] Remove divisions-to-produce-NaN from vdiv_f.c

2015-10-20 Thread Marcus Shawcroft
On 20 October 2015 at 13:40, Alan Lawrence  wrote:
> The test vdiv_f.c #define's NAN to (0.0 / 0.0). This produces extra scalar 
> fdiv's, which complicate the scan-assembler testing. We can remove these by 
> using __builtin_nan instead.
>
> Tested on AArch64 Linux.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vdiv_f.c: Use __builtin_nan.

OK
/Marcus


Re: [PATCH] 2015-10-19 Benedikt Huber Philipp Tomsich

2015-10-20 Thread Marcus Shawcroft
On 4 January 1970 at 00:02, Benedikt Huber
 wrote:
> * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
> * config/aarch64/aarch64-protos.h: Declare.
> * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and
> frsqrts.
> * config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt.
> * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code 
> when
> applicable.
> * config/aarch64/aarch64.md: Added enum entries.
> * config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt.
> * testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common 
> macros for
> assembly checks.
> * testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure
> frsqrts and frsqrte are not emitted.
> * testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts 
> and
> frsqrte are emitted.
> * testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt.

OK /Marcus


Re: [PATCH] Pass manager: add support for termination of pass list

2015-10-20 Thread Richard Biener
On Tue, Oct 20, 2015 at 3:00 PM, Martin Liška  wrote:
> Hello.
>
> As part of the upcoming merge of the HSA branch, we would like to have the
> possibility to terminate the pass manager after execution of the HSA
> generation pass. The HSA back-end is implemented as a tree pass that directly
> emits HSAIL from the gimple representation. The pass operates on clones
> created by the HSA IPA pass, and the pass manager should stop execution of
> further RTL passes.
>
> Suggested patch survives bootstrap and regression tests on x86_64-linux-pc.
>
> What do you think about it?

Are you sure it works this way?

Btw, you will miss executing all the cleanup passes that will
eventually free memory
associated with the function.  So I'd rather support a
TODO_discard_function which
should basically release the body from the cgraph.

Richard.

> Thanks,
> Martin


Re: [vec-cmp, patch 2/6] Vectorization factor computation

2015-10-20 Thread Richard Biener
On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich  wrote:
> 2015-10-13 16:37 GMT+03:00 Richard Biener :
>> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich  wrote:
>>> Hi,
>>>
>>> This patch handles statements with boolean result in vectorization factor 
>>> computation.  For a comparison, its operand type is used instead of the result 
>>> type to compute VF.  Other boolean statements are ignored for VF.
>>>
>>> Vectype for comparison is computed using type of compared values.  Computed 
>>> type is propagated into other boolean operations.
>>
>> This feels rather ad-hoc, mixing up the existing way of computing
>> vector type and VF.  I'd rather have turned the whole
>> vector type computation around to the scheme working on the operands
>> rather than on the lhs and then searching
>> for smaller/larger types on the rhs'.
>>
>> I know this is a tricky function (heh, but you make it even worse...).
>> And it needs a helper with knowledge about operations
>> so one can compute the result vector type for an operation on its
>> operands.  The seeds should be PHIs (handled like now)
>> and loads, and yes, externals need special handling.
>>
>> Ideally we'd do things in two stages, first compute vector types in a
>> less constrained manner (not forcing a single vector size)
>> and then in a 2nd run promote to a common size also computing the VF to do 
>> that.
>
> This sounds like a refactoring, not a functional change, right? Also I
> don't see a reason to analyze DF to compute vectypes if we promote it
> to a single vector size anyway. For booleans we have to do it because
> boolean vectors of the same size may have different number of
> elements. What is the reason to do it for other types?

For conversions and operators which support different sized operands

> Shouldn't it be a patch independent from comparison vectorization series?

As you like.

>>
>> Btw, I think you "mishandle" bool b = boolvar != 0;
>
> This should be handled fine. Statement will inherit a vectype of
> 'boolvar' definition. If it's invariant - then yes, invariant boolean
> statement case is not handled. But this is only because I supposed we
> just shouldn't have such statements in a loop. If we may have them,
> then using 'vector _Bool (VF)' type for that should be OK.
>
> Ilya
>
>>
>> Richard.
>>


Re: [Patch AArch64 63304] Fix issue with global state.

2015-10-20 Thread Marcus Shawcroft
On 16 October 2015 at 12:05, Ramana Radhakrishnan
 wrote:

> 2015-10-15  Ramana Radhakrishnan  
>
> PR target/63304
> * config/aarch64/aarch64.c (aarch64_nopcrelative_literal_loads): New.
> (aarch64_expand_mov_immediate): Use 
> aarch64_nopcrelative_literal_loads.
> (aarch64_classify_address): Likewise.
> (aarch64_secondary_reload): Likewise.
> (aarch64_override_options_after_change_1): Adjust.
> * config/aarch64/aarch64.md 
> (aarch64_reload_movcp):
> Use aarch64_nopcrelative_literal_loads.
> (aarch64_reload_movcp): Likewise.
> * config/aarch64/aarch64-protos.h 
> (aarch64_nopcrelative_literal_loads): Declare
>
> 2015-10-15  Jiong Wang  
> Ramana Radhakrishnan  
>
> PR target/63304
> * gcc.target/aarch64/pr63304-1.c: New test.

> +/* Global flag for PC relative loads.  */
> +bool aarch64_nopcrelative_literal_loads;

Yuk... but I don't have a better suggestion to hand.

> +++ b/gcc/testsuite/gcc.target/aarch64/pr63304-1.c

In that directory I've been pestering folks to follow the name
convention on the wiki for new additions, so _1 please (or change the
wiki ;-).

Otherwise OK with me.

Cheers
/Marcus


[hsa] Allow simple local assignments in between OMP statements

2015-10-20 Thread Martin Jambor
Hello,

the following changes to the gridification part of the HSA branch are
necessary to gridify loops that are bounded by Fortran parameters
passed by reference, and that therefore have a dereference of the
actual parameter in front of the loop but within the target construct.
Committed to the branch.
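
The motivating cases are Fortran, but the shape is easy to picture in C terms.
The snippet below is a rough illustration only (function name made up, mapping
clauses elided): the loop bound is a dereference of a by-reference parameter,
so the load of *n becomes a local register assignment inside the target
construct, just in front of the loop that is to be gridified.

  void
  scale (int *n, double *a)
  {
  #pragma omp target teams distribute parallel for
    for (int i = 0; i < *n; i++)
      a[i] *= 2.0;
  }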

Thanks,

Martin


2015-10-20  Martin Jambor  

* omp-low.c (single_stmt_in_seq_skip_bind): Removed.
(reg_assignment_to_local_var_p): New function.
(seq_only_contains_local_assignments): Use it.
(find_single_omp_among_assignments_1): New function.
(find_single_omp_among_assignments): Likewise.
(target_follows_gridifiable_pattern): Use it.
(copy_leading_local_assignments): New function.
(find_mark_kernel_components): Renamed to process_kernel_body_copy,
use copy_leading_local_assignments to copy assignments leading to OMP
statements.
(attempt_target_gridification): Use the above new functions.
---
 gcc/omp-low.c | 303 --
 1 file changed, 191 insertions(+), 112 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5234a11..56cc813 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -84,7 +84,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "symbol-summary.h"
 #include "hsa.h"
 
-
 /* Lowering of OMP parallel and workshare constructs proceeds in two
phases.  The first phase scans the function looking for OMP statements
and then for variables that must be replaced to satisfy data sharing
@@ -12557,76 +12556,122 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
-/* If SEQ is a sequence containing only one statement or a bind statement which
-   itself contains only one statement, return that statement.  Otherwise return
-   NULL.  TARGET_LOC must be location of the target statement and NAME the name
-   of the currently processed statement, both are used for dumping.  */
+/* Return true if STMT is an assignment of a register-type value into a local
+   VAR_DECL.  */
 
-static gimple *
-single_stmt_in_seq_skip_bind (gimple_seq seq, location_t target_loc,
- const char *name)
+static bool
+reg_assignment_to_local_var_p (gimple *stmt)
 {
-  gimple *stmt;
-  bool loop;
-  do
+  gassign *assign = dyn_cast  (stmt);
+  if (!assign)
+return false;
+  tree lhs = gimple_assign_lhs (assign);
+  if (TREE_CODE (lhs) != VAR_DECL
+  || !is_gimple_reg_type (TREE_TYPE (lhs))
+  || is_global_var (lhs))
+return false;
+  return true;
+}
+
+/* Return true if all statements in SEQ are assignments to local register-type
+   variables.  */
+
+static bool
+seq_only_contains_local_assignments (gimple_seq seq)
+{
+  if (!seq)
+return true;
+
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
+if (!reg_assignment_to_local_var_p (gsi_stmt (gsi)))
+  return false;
+  return true;
+}
+
+
+/* Scan statements in SEQ and call itself recursively on any bind.  If during
+   whole search only assignments to register-type local variables and one
+   single OMP statement is encountered, return true, otherwise return false.
+   RET is where we store any OMP statement encountered.  TARGET_LOC and NAME
+   are used for dumping a note about a failure.  */
+
+static bool
+find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc,
+const char *name, gimple **ret)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
 {
-  if (!seq)
+  gimple *stmt = gsi_stmt (gsi);
+
+  if (reg_assignment_to_local_var_p (stmt))
+   continue;
+  if (gbind *bind = dyn_cast  (stmt))
{
- gcc_assert (name);
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, target_loc,
-"Will not turn target construct into a simple "
-"GPGPU kernel because %s construct has empty "
-"body\n",
-name);
- return NULL;
+ if (!find_single_omp_among_assignments_1 (gimple_bind_body (bind),
+   target_loc, name, ret))
+ return false;
}
-
-  if (!gimple_seq_singleton_p (seq))
+  else if (is_gimple_omp (stmt))
+   {
+ if (*ret)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, target_loc,
+"Will not turn target construct into a simple "
+"GPGPU kernel because %s construct contains "
+"multiple OpenMP constructs\n", name);
+ return false;
+   }
+ *ret = stmt;
+   }
+  else
{
- gcc_assert (name);
  if (dump_enabled_p ())
dump_pri

[hsa] Do not output an extraneous kernel when gridifying

2015-10-20 Thread Martin Jambor
Hi,

a small bug caused HSAIL output of a kernel for the parallel
construct, even when it has been made a part of a gridified kernel and
the standalone construct is therefore not necessary for HSA.  This
small patch fixes that.  Committed to the branch.

On a related note, I have merged trunk revision 228776 (just before
OpenMP 4.5 landed) to the branch and am in the process of
forward-porting the relevant bits to the new OpenMP.  Hopefully it
will not take too long.

Thanks,

Martin


2015-10-20  Martin Jambor  

* omp-low.c (region_part_of_target_p): Renamed to
region_needs_kernel_p, return false for parts of gridified kernels.
(expand_parallel_call): Adjusted.
---
 gcc/omp-low.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 56cc813..900ed19 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4872,19 +4872,32 @@ gimple_build_cond_empty (tree cond)
   return gimple_build_cond (pred_code, lhs, rhs, NULL_TREE, NULL_TREE);
 }
 
-/* Return true if the REGION is within a declare target function or within a
-   target region.  */
+/* Return true if a parallel REGION is within a declare target function or
+   within a target region and is not a part of a gridified kernel.  */
 
 static bool
-region_part_of_target_p (struct omp_region *region)
+region_needs_kernel_p (struct omp_region *region)
 {
+  bool indirect = false;
+  for (region = region->outer; region; region = region->outer)
+{
+  if (region->type == GIMPLE_OMP_PARALLEL)
+   indirect = true;
+  else if (region->type == GIMPLE_OMP_TARGET)
+   {
+ gomp_target *tgt_stmt;
+ tgt_stmt = as_a  (last_stmt (region->entry));
+ if (tgt_stmt->kernel_iter)
+   return indirect;
+ else
+   return true;
+   }
+}
+
   if (lookup_attribute ("omp declare target",
DECL_ATTRIBUTES (current_function_decl)))
 return true;
 
-  for (region = region->outer; region; region = region->outer)
-if (region->type == GIMPLE_OMP_TARGET)
-  return true;
   return false;
 }
 
@@ -5058,7 +5071,7 @@ expand_parallel_call (struct omp_region *region, 
basic_block bb,
false, GSI_CONTINUE_LINKING);
 
   if (hsa_gen_requested_p ()
-  && region_part_of_target_p (region))
+  && region_needs_kernel_p (region))
 {
   cgraph_node *child_cnode = cgraph_node::get (child_fndecl);
   hsa_register_kernel (child_cnode);
-- 
2.6.0



Re: [PATCH v2 1/6] Libsanitizer merge from upstream r250806 (was r249633).

2015-10-20 Thread Maxim Ostapenko
Great, thanks! I'm going to commit the whole patch set tomorrow morning 
if no objections.


On 20/10/15 14:52, Jakub Jelinek wrote:

On Tue, Oct 20, 2015 at 02:18:25PM +0300, Maxim Ostapenko wrote:

This is the merge itself. Added DCAN_SANITIZE_UB={0, 1} and nuked
DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0 (not used now) in corresponding
Makefiles.

Ok.

Jakub





Re: [PATCH] Move cproj simplification to match.pd

2015-10-20 Thread Kyrill Tkachov


On 20/10/15 08:37, Richard Biener wrote:

On Mon, 19 Oct 2015, Christophe Lyon wrote:


On 19 October 2015 at 15:54, Richard Biener  wrote:

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.


Hi Richard,

This patch caused arm and aarch64 builds of newlib to cause ICEs:
In file included from
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdlib.h:11:0,
  from
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:13:
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:
In function '_mktm_r':
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:28:9:
internal compiler error: Segmentation fault
  _DEFUN (_mktm_r, (tim_p, res, is_gmtime),
0xa90205 crash_signal
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:353
0x7b3b0c tree_class_check
 /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3055
0x7b3b0c tree_single_nonnegative_warnv_p(tree_node*, bool*, int)
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13025
0x814053 gimple_phi_nonnegative_warnv_p
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6239
0x814053 gimple_stmt_nonnegative_warnv_p(gimple*, bool*, int)
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6264
0x7b5c94 tree_expr_nonnegative_p(tree_node*)
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13325
0xe2f657 gimple_simplify_108
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:5116
0xe3060d gimple_simplify_TRUNC_MOD_EXPR
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:24762
0xe0809b gimple_simplify
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:34389
0xe08c2b gimple_resimplify2(gimple**, code_helper*, tree_node*,
tree_node**, tree_node* (*)(tree_node*))
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:193
0xe17600 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**,
tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:762
0x81c694 fold_stmt_1
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:3605
0xad0f6c replace_uses_by(tree_node*, tree_node*)
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1835
0xad1a2f gimple_merge_blocks
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1921
0x67d325 merge_blocks(basic_block_def*, basic_block_def*)
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:776
0xae06da cleanup_tree_cfg_bb
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:654
0xae1118 cleanup_tree_cfg_1
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:686
0xae1118 cleanup_tree_cfg_noloop
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:738
0xae1118 cleanup_tree_cfg()
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:793
0x9c5c94 execute_function_todo
 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:1920
Please submit a full bug report,

This happens for instance with GCC configured
--target arm-none-eabi
--with-cpu cortex-a9

You can download logs of a failed build from
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-build/trunk/228970/build.html

Sorry, I'm out of office for one week, I can't produce further details.

Ok, without preprocessed source it's hard to track down but from the
backtrace I figure it's an issue similar to that of PR67815.  So a
testcase would be really nice.


Seems it has been filed PR 68031.

Kyrill


Richard.


Christophe



Richard.

2015-10-19  Richard Biener  

 * gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function.
 (gimple_stmt_nonnegative_warnv_p): Use it.
 * match.pd (CPROJ): New operator list.
 (cproj (complex ...)): Move simplifications from ...
 * builtins.c (fold_builtin_cproj): ... here.

 * gcc.dg/torture/builtin-cproj-1.c: Skip for -O0.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 228877)
+++ gcc/gimple-fold.c   (working copy)
@@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple
 strict_overflow_p, depth);
  }

+/* Return true if return value of call STMT is known to be non-negative.
+   If the return value is based on the assumption that signed overflow is
+   undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
+   *STRIC

Re: PING^2: [PATCH] Limit alignment on error_mark_node variable

2015-10-20 Thread Jan Hubicka
> >>
> >> This patch avoids calling varpool_node::finalize_decl on error_mark_node
> >> type decls.  Does it make sense?

Yep, I guess this is fine.  I wonder why error_mark_node needs to be in the
local decl list? :)

honza
> >>
> >>
> >> H.J.
> >> --
> >> gcc/
> >>
> >> PR target/66810
> >> * cgraphbuild.c (pass_build_cgraph_edges::execute): Skip
> >> error_mark_node type decls.
> >>
> >> gcc/testsuite/
> >>
> >> PR target/66810
> >> * gcc.target/i386/pr66810.c: New test.
> >> ---
> >>  gcc/cgraphbuild.c   |  3 ++-
> >>  gcc/testsuite/gcc.target/i386/pr66810.c | 10 ++
> >>  2 files changed, 12 insertions(+), 1 deletion(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66810.c
> >>
> >> diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c
> >> index 7d2d096..4d679d9 100644
> >> --- a/gcc/cgraphbuild.c
> >> +++ b/gcc/cgraphbuild.c
> >> @@ -381,7 +381,8 @@ pass_build_cgraph_edges::execute (function *fun)
> >>FOR_EACH_LOCAL_DECL (fun, ix, decl)
> >>  if (TREE_CODE (decl) == VAR_DECL
> >> && (TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
> >> -   && !DECL_HAS_VALUE_EXPR_P (decl))
> >> +   && !DECL_HAS_VALUE_EXPR_P (decl)
> >> +   && TREE_TYPE (decl) != error_mark_node)
> >>varpool_node::finalize_decl (decl);
> >>record_eh_tables (node, fun);
> >>
> >> diff --git a/gcc/testsuite/gcc.target/i386/pr66810.c 
> >> b/gcc/testsuite/gcc.target/i386/pr66810.c
> >> new file mode 100644
> >> index 000..4778b37
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/i386/pr66810.c
> >> @@ -0,0 +1,10 @@
> >> +/* { dg-do compile { target ia32 } } */
> >> +/* { dg-options "-mno-sse -mno-mmx -miamcu" } */
> >> +
> >> +int vv;
> >> +
> >> +void
> >> +i (void)
> >> +{
> >> +  static int a[vv]; /* { dg-error "storage size" } */
> >> +}
> >> --
> >> 2.4.3
> >>
> >
> >
> >
> > --
> > H.J.
> 
> 
> 
> -- 
> H.J.


Re: [PATCH][AArch64] Replace insn to zero up DF register

2015-10-20 Thread Andrew Pinski
On Tue, Oct 20, 2015 at 7:59 AM, Andrew Pinski  wrote:
> On Tue, Oct 20, 2015 at 7:51 AM, Andrew Pinski  wrote:
>> On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes  
>> wrote:
>>> In the existing targets, it seems that it's always faster to zero up a DF
>>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>>
>> I think for ThunderX 1, this change will not make a difference.  So I
>> am neutral on this change.
>
> Actually, depending on how fmov is decoded in our pipeline, this change
> might actually be worse.  Currently fmov with an immediate is 1 cycle
> while movi is two cycles.  Let me double check how internally on how
> it is decoded and if it is 1 cycle or two.

Ok, my objections are removed as I talked with the architects here
at Cavium and using movi is better in this case.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> This patch modifies the respective pattern.
>>>
>>> Please, commit if it's alright.
>>>
>>> Thank you,
>>>
>>> --
>>> Evandro Menezes
>>>


[C PATCH] Don't accept invalid attribute-list (PR c/67964)

2015-10-20 Thread Marek Polacek
Joseph noticed that we were wrongly accepting multiple attributes without
commas.  Thus fixed by breaking out of the loop when parsing the attributes --
if we don't see a comma after an attribute, then the next tokens must be )); if
not, then the attribute is invalid.  I've also added a few more comments.

(I think it would be nicer to model the code after C++ and introduce
c_parser_attributes_list, but the following is enough to fix the bug.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-20  Marek Polacek  

PR c/67964
* c-parser.c (c_parser_attributes): Break out of the loop if the
token after an attribute isn't a comma.

* gcc.dg/pr67964.c: New test.

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 704ebc6..e7b8440 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -3965,7 +3965,9 @@ c_parser_attributes (c_parser *parser)
   /* ??? Follow the C++ parser rather than using the
 lex_untranslated_string kludge.  */
   parser->lex_untranslated_string = true;
+  /* Consume the `__attribute__' keyword.  */
   c_parser_consume_token (parser);
+  /* Look for the two `(' tokens.  */
   if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
{
  parser->lex_untranslated_string = false;
@@ -3993,17 +3995,24 @@ c_parser_attributes (c_parser *parser)
  attr_name = c_parser_attribute_any_word (parser);
  if (attr_name == NULL)
break;
- if (is_cilkplus_vector_p (attr_name))   
+ if (is_cilkplus_vector_p (attr_name))
{
  c_token *v_token = c_parser_peek_token (parser);
  c_parser_cilk_simd_fn_vector_attrs (parser, *v_token);
+ /* If the next token isn't a comma, we're done.  */
+ if (!c_parser_next_token_is (parser, CPP_COMMA))
+   break;
  continue;
}
  c_parser_consume_token (parser);
  if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN))
{
  attr = build_tree_list (attr_name, NULL_TREE);
+ /* Add this attribute to the list.  */
  attrs = chainon (attrs, attr);
+ /* If the next token isn't a comma, we're done.  */
+ if (!c_parser_next_token_is (parser, CPP_COMMA))
+   break;
  continue;
}
  c_parser_consume_token (parser);
@@ -4062,8 +4071,13 @@ c_parser_attributes (c_parser *parser)
 "expected %<)%>");
  return attrs;
}
+ /* Add this attribute to the list.  */
  attrs = chainon (attrs, attr);
+ /* If the next token isn't a comma, we're done.  */
+ if (!c_parser_next_token_is (parser, CPP_COMMA))
+   break;
}
+  /* Look for the two `)' tokens.  */
   if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
c_parser_consume_token (parser);
   else
diff --git gcc/testsuite/gcc.dg/pr67964.c gcc/testsuite/gcc.dg/pr67964.c
index e69de29..095b50f 100644
--- gcc/testsuite/gcc.dg/pr67964.c
+++ gcc/testsuite/gcc.dg/pr67964.c
@@ -0,0 +1,21 @@
+/* PR c/67964 */
+/* { dg-do compile } */
+
+extern int fn0 (void) __attribute__ ((const const)); /* { dg-error "expected" 
} */
+extern int fn1 (void) __attribute__ ((const, const));
+extern int fn2 (void) __attribute__ ((optimize (0) const)); /* { dg-error 
"expected" } */
+extern int fn3 (void) __attribute__ ((optimize (0), const));
+/* We allow starting/trailing comma.  */
+extern int fn4 (void) __attribute__ ((, const));
+extern int fn5 (void) __attribute__ ((const, ));
+extern int fn6 (void) __attribute__ (( const, ));
+extern int fn7 (void) __attribute__ ((,));
+extern int fn8 (void) __attribute__ ((__noreturn__ __noreturn__)); /* { 
dg-error "expected" } */
+extern int fn9 (void) __attribute__ ((__noreturn__, __noreturn__));
+extern int fn10 (void) __attribute__ ((__cold__ __pure__ __noclone__)); /* { 
dg-error "expected" } */
+extern int fn11 (void) __attribute__ ((__cold__, __pure__ __noclone__)); /* { 
dg-error "expected" } */
+int i;
+int ii;
+extern int a __attribute__ ((alias ("i") unused)); /* { dg-error "expected" } 
*/
+extern int a2 __attribute__ ((alias ("i" "i")));
+struct A { char p[6]; } __attribute__((__packed__ packed)); /* { dg-error 
"expected" } */

Marek


Re: [PATCH] Properly handle -miamcu and -march=lakemont

2015-10-20 Thread Uros Bizjak
On Sat, Oct 17, 2015 at 4:30 AM, H.J. Lu  wrote:
> -miamcu specifies how parameters are passed to functions and how a value
> is returned from a function.  Inside a function, we can use instructions
> supported by -march=XXX.  -miamcu -march=haswell can generate x87, SSE
> and AVX instructions as long as the IA MCU psABI is followed.  But since
> the Lakemont processor doesn't support x87 instructions, we shouldn't
> generate x87 instructions with -march=lakemont.  This patch separates
> code generation from -miamcu and makes -march=lakemont not generate
> x87 instructions.
>
> Tested on x86-64.  OK for trunk?
>
> Thanks.
>
> H.J.
> 
> gcc/
>
> PR target/67963
> PR target/67985
> * common/config/i386/i386-common.c (ix86_handle_option): Remove
> OPT_miamcu handling.
> * config/i386/i386.c (PTA_NO_80387): New macro.
> (processor_alias_table): Add PTA_NO_80387 to lakemont.
> (ix86_option_override_internal): Update MASK_80387 from
> PTA_NO_80387.  Don't warn x87/MMX/SSE/AVX for -miamcu.  Warn
> SSE math only if 80387 is supported.  Don't change
> MASK_FLOAT_RETURNS.
> (ix86_valid_target_attribute_tree): Enable FPMATH_387 only if
> 80387 is supported.
> * config/i386/i386.h (TARGET_FLOAT_RETURNS_IN_80387): True only
> if TARGET_80387 is true and TARGET_IAMCU is false.
> (TARGET_FLOAT_RETURNS_IN_80387_P): True only if TARGET_80387_P
> is true and TARGET_IAMCU_P is false.
>
> gcc/testsuite/
>
> PR target/67963
> PR target/67985
> * gcc.target/i386/pr67963-1.c: New test.
> * gcc.target/i386/pr67963-2.c: Likewise.
> * gcc.target/i386/pr67963-3.c: Likewise.
> * gcc.target/i386/pr67985-1.c: Likewise.
> * gcc.target/i386/pr67985-2.c: Likewise.
> * gcc.target/i386/pr67985-3.c: Likewise.

OK.

Thanks,
Uros.

>  gcc/common/config/i386/i386-common.c  | 16 +---
>  gcc/config/i386/i386.c| 41 
> ++-
>  gcc/config/i386/i386.h|  8 --
>  gcc/testsuite/gcc.target/i386/pr67963-1.c |  9 +++
>  gcc/testsuite/gcc.target/i386/pr67963-2.c | 11 +
>  gcc/testsuite/gcc.target/i386/pr67963-3.c | 11 +
>  gcc/testsuite/gcc.target/i386/pr67985-1.c | 11 +
>  gcc/testsuite/gcc.target/i386/pr67985-2.c | 13 ++
>  gcc/testsuite/gcc.target/i386/pr67985-3.c | 12 +
>  9 files changed, 93 insertions(+), 39 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67963-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67963-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67963-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67985-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67985-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67985-3.c
>
> diff --git a/gcc/common/config/i386/i386-common.c 
> b/gcc/common/config/i386/i386-common.c
> index 79b2472..0f8c3e1 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -223,7 +223,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  bool
>  ix86_handle_option (struct gcc_options *opts,
> -   struct gcc_options *opts_set,
> +   struct gcc_options *opts_set ATTRIBUTE_UNUSED,
> const struct cl_decoded_option *decoded,
> location_t loc)
>  {
> @@ -232,20 +232,6 @@ ix86_handle_option (struct gcc_options *opts,
>
>switch (code)
>  {
> -case OPT_miamcu:
> -  if (value)
> -   {
> - /* Turn off x87/MMX/SSE/AVX codegen for -miamcu.  */
> - opts->x_target_flags &= ~MASK_80387;
> - opts_set->x_target_flags |= MASK_80387;
> - opts->x_ix86_isa_flags &= ~(OPTION_MASK_ISA_MMX_UNSET
> - | OPTION_MASK_ISA_SSE_UNSET);
> - opts->x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET
> - | OPTION_MASK_ISA_SSE_UNSET);
> -
> -   }
> -  return true;
> -
>  case OPT_mmmx:
>if (value)
> {
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 33e0cff..270199c 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -4295,6 +4295,7 @@ ix86_option_override_internal (bool main_args_p,
>  #define PTA_PCOMMIT(HOST_WIDE_INT_1 << 56)
>  #define PTA_MWAITX (HOST_WIDE_INT_1 << 57)
>  #define PTA_CLZERO (HOST_WIDE_INT_1 << 58)
> +#define PTA_NO_80387   (HOST_WIDE_INT_1 << 59)
>
>  #define PTA_CORE2 \
>(PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \
> @@ -4339,7 +4340,7 @@ ix86_option_override_internal (bool main_args_p,
>{"i486", PROCESSOR_I486, CPU_NONE, 0},
>{"i586", PROCESSOR_PENTIUM, CPU_PENTIUM, 0},
>{"pentium", PROCESSOR_PENTIUM, CPU_PENTIUM, 0},
> - 

[PATCH] Refactor graphite-sese-to-poly, sese.h, graphite-poly.h

2015-10-20 Thread Aditya Kumar
Now that a scop contains a list of all the basic blocks inside it, it makes
sense to iterate over only those basic blocks in graphite-sese-to-poly.c's
rewrite_reductions_out_of_ssa and rewrite_cross_bb_scalar_deps_out_of_ssa.

Passes regtest and bootstrap.

gcc/ChangeLog:

2015-10-20  Aditya Kumar  

* graphite-poly.h (struct dr_info): Added invalid_alias_set number.
(operator=): Removed.
(dr_info): Make alias_set number the last argument with default value 
of invalid_alias_set.
* graphite-sese-to-poly.c (build_scop_drs): Update constructor of 
dr_info.
(rewrite_reductions_out_of_ssa): Iterate only through the basic blocks 
which are inside region.
(rewrite_cross_bb_scalar_deps_out_of_ssa): Same.
* sese.h (struct sese_l): Removed assignment operator.
(split_region_for_bb): Removed dead code.


---
 gcc/graphite-poly.h | 26 +--
 gcc/graphite-sese-to-poly.c | 50 ++---
 gcc/sese.h  | 37 -
 3 files changed, 34 insertions(+), 79 deletions(-)

diff --git a/gcc/graphite-poly.h b/gcc/graphite-poly.h
index 721e914..5298f85 100644
--- a/gcc/graphite-poly.h
+++ b/gcc/graphite-poly.h
@@ -373,29 +373,23 @@ pbb_set_black_box (poly_bb_p pbb, gimple_poly_bb_p 
black_box)
 
 struct dr_info
 {
+  enum {
+invalid_alias_set = -1
+  };
   /* The data reference.  */
   data_reference_p dr;
 
-  /* ALIAS_SET is the SCC number assigned by a graph_dfs of the alias graph.  
-1
- is an invalid alias set.  */
-  int alias_set;
-
   /* The polyhedral BB containing this DR.  */
   poly_bb_p pbb;
 
+  /* ALIAS_SET is the SCC number assigned by a graph_dfs of the alias graph.
+ -1 is an invalid alias set.  */
+  int alias_set;
+
   /* Construct a DR_INFO from a data reference DR, an ALIAS_SET, and a PBB.  */
-  dr_info (data_reference_p dr, int alias_set, poly_bb_p pbb)
-: dr (dr), alias_set (alias_set), pbb (pbb) {}
-
-  /* Assignment operator, to be able to iterate over a vec of these objects.  
*/
-  const dr_info &
-  operator= (const dr_info &p)
-  {
-dr = p.dr;
-alias_set = p.alias_set;
-pbb = p.pbb;
-return *this;
-  }
+  dr_info (data_reference_p dr, poly_bb_p pbb,
+  int alias_set = invalid_alias_set)
+: dr (dr), pbb (pbb), alias_set (alias_set) {}
 };
 
 /* A SCOP is a Static Control Part of the program, simple enough to be
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index d75e6a2..d1eae90 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -1151,7 +1151,7 @@ build_scop_drs (scop_p scop)
   FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
 if (pbb)
   FOR_EACH_VEC_ELT (GBB_DATA_REFS (PBB_BLACK_BOX (pbb)), j, dr)
-   scop->drs.safe_push (dr_info (dr, -1, pbb));
+   scop->drs.safe_push (dr_info (dr, pbb));
 
   build_alias_set (scop);
 
@@ -1497,31 +1497,29 @@ rewrite_degenerate_phi (gphi_iterator *psi)
 static void
 rewrite_reductions_out_of_ssa (scop_p scop)
 {
+  int i;
   basic_block bb;
-  sese_l region = scop->scop_info->region;
-
-  FOR_EACH_BB_FN (bb, cfun)
-if (bb_in_sese_p (bb, region))
-  for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);)
-   {
- gphi *phi = psi.phi ();
+  FOR_EACH_VEC_ELT (scop->scop_info->bbs, i, bb)
+for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);)
+  {
+   gphi *phi = psi.phi ();
 
- if (virtual_operand_p (gimple_phi_result (phi)))
-   {
- gsi_next (&psi);
- continue;
-   }
+   if (virtual_operand_p (gimple_phi_result (phi)))
+ {
+   gsi_next (&psi);
+   continue;
+ }
 
- if (gimple_phi_num_args (phi) > 1
- && degenerate_phi_result (phi))
-   rewrite_degenerate_phi (&psi);
+   if (gimple_phi_num_args (phi) > 1
+   && degenerate_phi_result (phi))
+ rewrite_degenerate_phi (&psi);
 
- else if (scalar_close_phi_node_p (phi))
-   rewrite_close_phi_out_of_ssa (scop, &psi);
+   else if (scalar_close_phi_node_p (phi))
+ rewrite_close_phi_out_of_ssa (scop, &psi);
 
- else if (reduction_phi_p (region, &psi))
-   rewrite_phi_out_of_ssa (scop, &psi);
-   }
+   else if (reduction_phi_p (scop->scop_info->region, &psi))
+ rewrite_phi_out_of_ssa (scop, &psi);
+  }
 
   update_ssa (TODO_update_ssa);
 #ifdef ENABLE_CHECKING
@@ -1684,7 +1682,6 @@ rewrite_cross_bb_scalar_deps (scop_p scop, 
gimple_stmt_iterator *gsi)
 static void
 rewrite_cross_bb_scalar_deps_out_of_ssa (scop_p scop)
 {
-  basic_block bb;
   gimple_stmt_iterator psi;
   sese_l region = scop->scop_info->region;
   bool changed = false;
@@ -1692,10 +1689,11 @@ rewrite_cross_bb_scalar_deps_out_of_ssa (scop_p scop)
   /* Create an extra empty BB after the scop.  */
   split_edge (region.exit);
 
-  FOR_EACH_BB_FN (bb, cfun)
-if (bb_

Re: [PATCH v2] PR rtl-optimization/66790: uninitialized registers handling in REE

2015-10-20 Thread David Edelsohn
Pierre,

Did this revised patch address the comments about MIR from Kenny?

- David


Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

2015-10-20 Thread Marcus Shawcroft
On 19 October 2015 at 14:57, Kyrill Tkachov  wrote:

> 2015-10-19  Kyrylo Tkachov  
>
> * config/aarch64/aarch64.md
>  (*aarch64_fcvt2_mult): New pattern.
> * config/aarch64/aarch64-simd.md
>  (*aarch64_fcvt2_mult): Likewise.
> * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
> (aarch64_fpconst_pow_of_2): New function.
> (aarch64_vec_fpconst_pow_of_2): Likewise.
> * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
> prototype.
> (aarch64_vec_fpconst_pow_of_2): Likewise.
> * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
> (aarch64_fp_vec_pow2): Likewise.
>
> 2015-10-19  Kyrylo Tkachov  
>
> * gcc.target/aarch64/fmul_fcvt_1.c: New test.
> * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.

+char buf[64];
+sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits);

Prefer snprintf please.

+  }
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
+

Superfluous blank line here?

+  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
+ (enum rtx_code) code, 0, speed);

My understanding is that the unnecessary use of enum is now discouraged;
(rtx_code) is sufficient in this case.

+  int count = CONST_VECTOR_NUNITS (x);
+  int i;
+  for (i = 1; i < count; i++)

Push the int into the for initializer.
Push the rhs of the count assignment into the for condition and drop
the definition of count.

+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */

I'd prefer scan-assembler-times or do you have a particular reason to
avoid it in these tests?

Cheers
/Marcus


Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof

2015-10-20 Thread Martin Sebor

On 10/20/2015 07:20 AM, Bernd Schmidt wrote:

On 10/16/2015 09:34 PM, Martin Sebor wrote:

Thank you for the review. Attached is an updated patch that hopefully
addresses all your comments. I ran the check_GNU_style.sh script on
it to make sure I didn't miss something.  I've also added replies to
a few of your comments below.


Ok, thanks. However - the logic around the context struct in the patch
still seems fairly impenetrable to me, and I've been wondering whether a
simpler approach wouldn't work just as well. I came up with the patch
below, which just passes a tree_code context to recursive invocations
and increasing the upper bound by one only if not in an ARRAY_REF or
COMPONENT_REF context.

As far as I can tell, this produces the same warnings on the testcase
(modulo differences in type printout), except for the final few FA5_7
testcases, which I had doubts about anyway:

typedef struct FA5_7 {
   int i;
   char a5_7 [5][7];
} FA5_7;

 __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning
"index" }

Why wouldn't at least the first two be covered by the
one-past-the-end-is-ok rule?


Thanks for taking the time to try to simplify the patch!

Diagnosing the cases above is at the core of the issue (and can
be seen by substituting a5_7 into the oversimplified test case
in the bug).

In a 5 by 7 array, the offset computed for the out-of-bounds
a5_7[0][7] is the same as the offset computed for the in-bounds
a5_7[1][0].  The result might be unexpected because it suggests
that two "distinct" elements of an array share the same offset.
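
To make that concrete, here is a quick standalone sketch (assuming a 4-byte
int):

  #include <stddef.h>
  #include <stdio.h>

  struct FA5_7 { int i; char a5_7 [5][7]; };

  int
  main (void)
  {
    /* Both calls print 11 (4 + 0*7 + 7 == 4 + 1*7 + 0): the out-of-bounds
       a5_7[0][7] silently aliases the in-bounds a5_7[1][0].  */
    printf ("%zu\n", offsetof (struct FA5_7, a5_7 [0][7]));
    printf ("%zu\n", offsetof (struct FA5_7, a5_7 [1][0]));
    return 0;
  }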

The "one-past-the-end rule" applies only to the offset corresponding
to the address just past the end of the whole array because like its
address, the offset of the element just beyond the end of the array
is valid and cannot be mistaken for an offset of a valid element.

My initial approach was similar to the patch you attached.  Most
of the additional complexity (i.e., the recursive use of the context)
grew out of wanting to diagnose these less than trivial cases of
surprising offsets in multi-dimensional arrays.  I don't think it
can be done without passing some state down the recursive calls but
I'd be happy to be proven wrong.

FWIW, the bug and my patch for it were precipitated by looking for
the problems behind PR 67872 - missing -Warray-bounds warning (which
was in turn prompted by developing and testing a solution for PR
67942 - diagnose placement new buffer overflow).

I think -Warray-bounds should emit consistent diagnostics for invalid
array references regardless of the contexts. I.e., given

struct S {
int A [5][7];
int x;
} s;

these should both be diagnosed:

int i = offsetof (struct S, A [0][7]);

int *p = &s.A [0][7];

because they are both undefined and both can lead to surprising
results when used.

With my patch for 67882, GCC diagnoses the first case. I hope to
work on 67872 in the near future to diagnose the second to bring
GCC on par with Clang. (Clang doesn't yet diagnose any offsetof
errors).

Martin



[PATCH, alpha]: Use CEIL macro

2015-10-20 Thread Uros Bizjak
... plus some cleanups due to an unused macro argument.

No functional changes.
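
(For reference, CEIL comes from system.h; from memory it is roughly

  /* Rough sketch -- see system.h for the authoritative definition.  */
  #define CEIL(x, y)  (((x) + (y) - 1) / (y))

so CEIL (GET_MODE_SIZE (MODE), UNITS_PER_WORD) is just a shorter spelling
of the old open-coded form.)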

2015-10-20  Uros Bizjak  

* config/alpha/alpha.h (HARD_REGNO_NREGS): Use CEIL macro.
(ALPHA_ARG_SIZE): Ditto.  Remove unused NAMED argument.
* config/alpha/alpha.c (alpha_function_arg_advance): Update
ALPHA_ARG_SIZE usage.
(alpha_arg_partial_bytes): Ditto.

Bootstrapped and regression tested on alpha-linux-gnu, committed to
mainline SVN.

Uros.
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 229082)
+++ config/alpha/alpha.c(working copy)
@@ -5613,7 +5613,7 @@ alpha_function_arg_advance (cumulative_args_t cum_
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
   bool onstack = targetm.calls.must_pass_in_stack (mode, type);
-  int increment = onstack ? 6 : ALPHA_ARG_SIZE (mode, type, named);
+  int increment = onstack ? 6 : ALPHA_ARG_SIZE (mode, type);
 
 #if TARGET_ABI_OSF
   *cum += increment;
@@ -5635,10 +5635,10 @@ alpha_arg_partial_bytes (cumulative_args_t cum_v,
 
 #if TARGET_ABI_OPEN_VMS
   if (cum->num_args < 6
-  && 6 < cum->num_args + ALPHA_ARG_SIZE (mode, type, named))
+  && 6 < cum->num_args + ALPHA_ARG_SIZE (mode, type))
 words = 6 - cum->num_args;
 #elif TARGET_ABI_OSF
-  if (*cum < 6 && 6 < *cum + ALPHA_ARG_SIZE (mode, type, named))
+  if (*cum < 6 && 6 < *cum + ALPHA_ARG_SIZE (mode, type))
 words = 6 - *cum;
 #else
 #error Unhandled ABI
Index: config/alpha/alpha.h
===
--- config/alpha/alpha.h(revision 229082)
+++ config/alpha/alpha.h(working copy)
@@ -383,7 +383,7 @@ extern enum alpha_fp_trap_mode alpha_fptm;
but can be less for certain modes in special long registers.  */
 
 #define HARD_REGNO_NREGS(REGNO, MODE)   \
-  ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
+  CEIL (GET_MODE_SIZE (MODE), UNITS_PER_WORD)
 
 /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE.
On Alpha, the integer registers can hold any mode.  The floating-point
@@ -674,13 +674,15 @@ extern int alpha_memory_latency;
 #define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, INDIRECT, N_NAMED_ARGS) \
   (CUM) = 0
 
-/* Define intermediate macro to compute the size (in registers) of an argument
-   for the Alpha.  */
+/* Define intermediate macro to compute
+   the size (in registers) of an argument.  */
 
-#define ALPHA_ARG_SIZE(MODE, TYPE, NAMED)  \
+#define ALPHA_ARG_SIZE(MODE, TYPE) \
   ((MODE) == TFmode || (MODE) == TCmode ? 1\
-   : (((MODE) == BLKmode ? int_size_in_bytes (TYPE) : GET_MODE_SIZE (MODE)) \
-  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
+   : CEIL (((MODE) == BLKmode  \
+   ? int_size_in_bytes (TYPE)  \
+   : GET_MODE_SIZE (MODE)),\
+  UNITS_PER_WORD))
 
 /* Make (or fake) .linkage entry for function call.
IS_LOCAL is 0 if name is used in call, 1 if name is used in definition.  */


Re: GCC 6 Status Report (2015-10-16)

2015-10-20 Thread Pierre-Marie de Rodat

Hi Richard,

On 10/16/2015 05:09 AM, Richard Biener wrote:

This means it is time to get things you want to have in GCC 6 finalized
and reviewed.  As usual there may be exceptions to late reviewed features
but don't count on that.  Likewise target specific features can sneak in
during Stage 3 if maintainers ok them.


This is just to point out that I have a series of DWARF-related patches 
that have been awaiting review for several months. I plan to rebase against 
trunk, retest and resubmit them shortly but here’s an overview:


  * The patch series for transition to standard DWARF for Ada 
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01857.html). There are 8 
patches, each one depending on the previous one, except the 6/8 one 
(https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01361.html) which could 
go on its own.


  * Fix DW_AT_static_link generation in DWARF 
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01855.html). The 
corresponding support in GDB is in the development branch.


  * Track indirect calls as call site information in DWARF 
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01856.html).


  * Materialize subprogram renamings in Ada as imported declarations in 
DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01854.html). The 
corresponding support in GDB is in the development branch.


If there is anything I should do to ease the review process, please let 
me know. :-) Thank you in advance!


--
Pierre-Marie de Rodat


Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

2015-10-20 Thread Kyrill Tkachov


On 20/10/15 16:26, Marcus Shawcroft wrote:

On 19 October 2015 at 14:57, Kyrill Tkachov  wrote:


2015-10-19  Kyrylo Tkachov  

 * config/aarch64/aarch64.md
  (*aarch64_fcvt2_mult): New pattern.
 * config/aarch64/aarch64-simd.md
  (*aarch64_fcvt2_mult): Likewise.
 * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
 (aarch64_fpconst_pow_of_2): New function.
 (aarch64_vec_fpconst_pow_of_2): Likewise.
 * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
 prototype.
 (aarch64_vec_fpconst_pow_of_2): Likewise.
 * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
 (aarch64_fp_vec_pow2): Likewise.

2015-10-19  Kyrylo Tkachov  

 * gcc.target/aarch64/fmul_fcvt_1.c: New test.
 * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.

+char buf[64];
+sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits);

Prefer snprintf please.

+  }
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
+

Superfluous blank line here?

+  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
+ (enum rtx_code) code, 0, speed);

My understanding is the unnecessary use of enum  is now discouraged,
(rtx_code) is sufficient in this case.

+  int count = CONST_VECTOR_NUNITS (x);
+  int i;
+  for (i = 1; i < count; i++)

Push the int into the for initializer.
Push the rhs of the count assignment into the for condition and drop
the definition of count.


Ok, done.



+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */

I'd prefer scan-assembler-times or do you have a particular reason to
avoid it in these tests?


No reason.
Here's the patch updated as per your feedback.

How's this?

Thanks,
Kyrill

2015-10-20  Kyrylo Tkachov  

* config/aarch64/aarch64.md
(*aarch64_fcvt2_mult): New pattern.
* config/aarch64/aarch64-simd.md
(*aarch64_fcvt2_mult): Likewise.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
(aarch64_fpconst_pow_of_2): New function.
(aarch64_vec_fpconst_pow_of_2): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
prototype.
(aarch64_vec_fpconst_pow_of_2): Likewise.
* config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
(aarch64_fp_vec_pow2): Likewise.

2015-10-20  Kyrylo Tkachov  

* gcc.target/aarch64/fmul_fcvt_1.c: New test.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.

commit dc55192dd5f54f69d659ea6cc703c5fc2dc9b88b
Author: Kyrylo Tkachov 
Date:   Thu Oct 8 15:17:47 2015 +0100

[AArch64] Add fmul+fcvt optimisation

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a8ac8d3..309dcfb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -294,12 +294,14 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+int aarch64_fpconst_pow_of_2 (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
 		   machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
+int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 const char *aarch64_output_move_struct (rtx *operands);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 167277e..cf1ff6d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1654,6 +1654,26 @@ (define_insn "l2"
   [(set_attr "type" "neon_fp_to_int_")]
 )
 
+(define_insn "*aarch64_fcvt2_mult"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(FIXUORS: (unspec:
+			   [(mult:VDQF
+	 (match_operand:VDQF 1 "register_operand" "w")
+	 (match_operand:VDQF 2 "aarch64_fp_vec_pow2" ""))]
+			   UNSPEC_FRINTZ)))]
+  "TARGET_SIMD
+   && IN_RANGE (aarch64_vec_fpconst_pow_of_2 (operands[2]), 1,
+		GET_MODE_BITSIZE (GET_MODE_INNER (mode)))"
+  {
+int fbits = aarch64_vec_fpconst_pow_of_2 (operands[2]);
+char buf[64];
+snprintf (buf, 64, "fcvtz\\t%%0., %%1., #%d", fbits);
+output_asm_insn (buf, operands);
+return "";
+  }
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
 (define_expand "2"
   [(set (match_operand: 0 "register_operand")
 	(FIXUORS: (unspec:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e304d40..17e59e1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6791,6 +6791,19 @@ cost_plus:
 	  else
 	*cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
 	}
+
+  /* We can combine fmul by a power of 2 followed by a fcvt into a single
+	 fixed-point fcvt.  */
+  if (GET_CODE (x) == MULT
+	  && (

Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof

2015-10-20 Thread Bernd Schmidt



On 10/20/2015 05:31 PM, Martin Sebor wrote:

On 10/20/2015 07:20 AM, Bernd Schmidt wrote:

On 10/16/2015 09:34 PM, Martin Sebor wrote:

Thank you for the review. Attached is an updated patch that hopefully
addresses all your comments. I ran the check_GNU_style.sh script on
it to make sure I didn't miss something.  I've also added replies to
a few of your comments below.


Ok, thanks. However - the logic around the context struct in the patch
still seems fairly impenetrable to me, and I've been wondering whether a
simpler approach wouldn't work just as well. I came up with the patch
below, which just passes a tree_code context to recursive invocations
and increasing the upper bound by one only if not in an ARRAY_REF or
COMPONENT_REF context.

As far as I can tell, this produces the same warnings on the testcase
(modulo differences in type printout), except for the final few FA5_7
testcases, which I had doubts about anyway:

typedef struct FA5_7 {
   int i;
   char a5_7 [5][7];
} FA5_7;

 __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning
"index" }

Why wouldn't at least the first two be covered by the
one-past-the-end-is-ok rule?


Thanks for taking the time to try to simplify the patch!

Diagnosing the cases above is at the core of the issue (and can
be seen by substituting a5_7 into the oversimplified test case
in the bug).

In a 5 by 7 array, the offset computed for the out-of-bounds
a_5_7[0][7] is the same as the offset computed for the in-bounds
a_5_7[1][0].  The result might be unexpected because it suggests
that two "distinct" elements of an array share the same offset.


Yeah, but Joseph wrote:

  Given such a flexible array member, [anything][0] and
  [anything][1] are logically valid if [anything] is
  nonnegative (of course, the array might or might not
  have enough element at runtime).  [anything][2] is a
  reference to just past the end of an element of the
  flexible array member; [anything][3] is clearly invalid.

  Regarding the actual testcase, accepting a1_1[0][1] seems
  fine (it's referencing the just-past-end element of the
  valid array a1_1[0]).

So how's that different from a5_7[0][7]? In case it matters, a1_1 in the 
earlier test that was discussed could not be a flexible array because it 
was followed by another field in the same struct.


Handing over review duties to Joseph because he'll have to decide what 
to warn about and what not to.



struct S {
int A [5][7];
int x;
} s;

these should both be diagnosed:

int i = offsetof (struct S, A [0][7]);

int *p = &s.A [0][7];

because they are both undefined and both can lead to surprising
results when used.


Is that really undefined? I thought it was explicitly allowed. 
Dereferencing the pointer is what's undefined.



Bernd


[PATCH][ARM] Fix for testcase after r228661

2015-10-20 Thread Andre Vieira

Hi,

This patch addresses PR 67948 by changing the xor-and.c test, initially 
written for a simplify-rtx pattern, to make it pass post r228661 (see 
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00676.html). Even prior to 
r228661 this test no longer triggered the simplify-rtx pattern it was 
written for, though other optimizations did lead to the same assembly the 
test checked for. The optimization added with r228661 matches the 
pattern used in the test and optimizes it to a better and still valid 
sequence. Being unable to easily change the test to trigger the original 
simplify-rtx pattern, I chose to change it to pass with the newly produced 
assembly sequence.


This is correct because the transformation is valid and it yields a more 
efficient pattern. However, as I pointed out before, this test doesn't 
test the optimization it was originally intended for.


Tested by running regression tests for armv6.

Is this OK to commit?

Thanks,
Andre
From 89922547118e716b41ddf6edefb274322193f25c Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Thu, 15 Oct 2015 12:48:26 +0100
Subject: [PATCH] Fix for xor-and.c test

---
 gcc/testsuite/gcc.target/arm/xor-and.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/xor-and.c b/gcc/testsuite/gcc.target/arm/xor-and.c
index 53dff85f8f780fb99a93bbcc24180a3d0d5d3be9..3715530cd7bf9ad8abb24cb21cd51ae3802079e8 100644
--- a/gcc/testsuite/gcc.target/arm/xor-and.c
+++ b/gcc/testsuite/gcc.target/arm/xor-and.c
@@ -10,6 +10,6 @@ unsigned short foo (unsigned short x)
   return x;
 }
 
-/* { dg-final { scan-assembler "orr" } } */
+/* { dg-final { scan-assembler "eor" } } */
 /* { dg-final { scan-assembler-not "mvn" } } */
 /* { dg-final { scan-assembler-not "uxth" } } */
-- 
1.9.1



Re: [PATCH][AArch64] Enable fusion of AES instructions

2015-10-20 Thread Marcus Shawcroft
On 14 October 2015 at 13:30, Wilco Dijkstra  wrote:
> Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. 
> This can give up to 2x
> speedup on many AArch64 implementations. Also model the crypto instructions 
> on Cortex-A57 according
> to the Optimization Guide.
>
> Passes regression tests.
>
> ChangeLog:
> 2015-10-14  Wilco Dijkstra  
>
> * gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion.
> (cortexa57_tunings): Likewise.
> (cortexa72_tunings): Likewise.
> (arch_macro_fusion_pair_p): Add support for AES fusion.
> * gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry.

These AArch64 changes are OK

> * gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue):
> Allow virtual registers before reload so early scheduling works.
> * gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use
> correct latency and pipeline.
> (cortex_a57_crypto_complex): Likewise.
> (cortex_a57_crypto_xor): Likewise.
> (define_bypass): Add AES bypass.

... but wait for Ramana or Kyrill to respond on these changes before you commit.

Cheers
/Marcus


Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

2015-10-20 Thread Marcus Shawcroft
On 20 October 2015 at 16:47, Kyrill Tkachov  wrote:

> Here's the patch updated as per your feedback.
>
> How's this?
>
> Thanks,
> Kyrill
>
> 2015-10-20  Kyrylo Tkachov  
>
> * config/aarch64/aarch64.md
> (*aarch64_fcvt2_mult): New pattern.
> * config/aarch64/aarch64-simd.md
> (*aarch64_fcvt2_mult): Likewise.
> * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
> (aarch64_fpconst_pow_of_2): New function.
> (aarch64_vec_fpconst_pow_of_2): Likewise.
> * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
> prototype.
> (aarch64_vec_fpconst_pow_of_2): Likewise.
> * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
> (aarch64_fp_vec_pow2): Likewise.
>
> 2015-10-20  Kyrylo Tkachov  
>
>
> * gcc.target/aarch64/fmul_fcvt_1.c: New test.
> * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
>

OK /Marcus


Re: [PATCH][Testsuite] Turn on 64-bit-vector tests for AArch64

2015-10-20 Thread Marcus Shawcroft
On 16 October 2015 at 12:26, Alan Lawrence  wrote:
> This enables tests bb-slp-11.c and bb-slp-26.c for AArch64. Both of these are
> currently passing on little- and big-endian.
>
> (Tested on aarch64-none-linux-gnu and aarch64_be-none-elf).
>
> OK for trunk?
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp (check_effective_target_vect64): Add 
> AArch64.
> ---
>  gcc/testsuite/lib/target-supports.exp | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 3088369..bd03108 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -4762,6 +4762,7 @@ proc check_effective_target_vect64 { } {
>  if { ([istarget arm*-*-*]
>   && [check_effective_target_arm_neon_ok]
>   && [check_effective_target_arm_little_endian])
> +|| [istarget aarch64*-*-*]
>   || [istarget sparc*-*-*] } {
> set et_vect64_saved 1
>  }
> --
> 1.9.1
>

OK /Marcus


RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-10-20 Thread Ajit Kumar Agarwal
Hello Jeff:

Did you get a chance to look at the response below?  Please let me know your 
opinion on it.

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Saturday, September 12, 2015 4:09 PM
To: Jeff Law; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation



-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Thursday, September 10, 2015 3:10 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 08/26/2015 11:29 PM, Ajit Kumar Agarwal wrote:
>
> Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c
>
> void dct_unquantize_h263_inter_c (short *block, int n, int qscale, int
> nCoeffs) { int i, level, qmul, qadd;
>
> qadd = (qscale - 1) | 1; qmul = qscale << 1;
>
> for (i = 0; i <= nCoeffs; i++) { level = block[i]; if (level < 0) 
> level = level * qmul - qadd; else level = level * qmul + qadd; 
> block[i] = level; } }
>
> The above Loop is a candidate of path splitting as the IF block merges 
> at the latch of the Loop and the path splitting duplicates The latch 
> of the loop which is the statement block[i] = level into the 
> predecessors THEN and ELSE block.
>
> Due to above path splitting,  the IF conversion is disabled and the 
> above IF-THEN-ELSE is not IF-converted and the test case fails.
>>So I think the question then becomes which of the two styles generally 
>>results in better code?  The path-split version or the older if-converted 
>>version.

>>If the latter, then this may suggest that we've got the path splitting code 
>>at the wrong stage in the optimizer pipeline or that we need better 
>>heuristics for >>when to avoid applying path splitting.

The code generated by path splitting is useful when it exposes DCE, PRE and 
CCP candidates, whereas if-conversion is useful when it exposes vectorization 
candidates. If if-conversion doesn't expose vectorization and path splitting 
doesn't expose DCE/PRE redundancy candidates, it's hard to predict which is 
better. If if-conversion does not expose vectorization but path splitting does 
expose DCE, PRE and CCP redundancy candidates, then path splitting is useful. 
Path splitting also increases the granularity of the THEN and ELSE paths, 
which makes for better register allocation and code scheduling.
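
For reference, on the dct_unquantize_h263_inter_c loop quoted above, path
splitting roughly amounts to duplicating the latch statement block[i] = level
into both arms of the if (a hand-written sketch, not actual compiler output):

void dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
{
  int i, level, qmul, qadd;

  qadd = (qscale - 1) | 1;
  qmul = qscale << 1;

  for (i = 0; i <= nCoeffs; i++)
    {
      level = block[i];
      if (level < 0)
        {
          level = level * qmul - qadd;
          block[i] = level;   /* latch duplicated into this arm */
        }
      else
        {
          level = level * qmul + qadd;
          block[i] = level;   /* latch duplicated into this arm */
        }
    }
}

Each arm then becomes a larger single block, which is what exposes the extra
DCE/PRE/CCP opportunities and the register allocation and scheduling freedom
mentioned above.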

The suggestion to keep path splitting later in the pipeline, after 
if-conversion and vectorization, is useful as it doesn't break the existing 
DejaGnu tests. Another argument for keeping path splitting later in the 
pipeline is that it can always duplicate the merge node into its predecessors 
after the if-conversion and vectorization passes, if if-conversion and 
vectorization are not applicable to the loops. But this suppresses the CCP and 
PRE candidates, which come earlier in the optimization pipeline.


>
> There were following review comments from the above patch.
>
> +/* This function performs the feasibility tests for path splitting
>> +   to perform. Return false if the feasibility for path splitting
>> +   is not done and returns true if the feasibility for path
>> splitting +   is done. Following feasibility tests are performed.
>> + +   1. Return false if the join block has rhs casting for assign
>> +  gimple statements.
>
> Comments from Jeff:
>
>>> These seem totally arbitrary.  What's the reason behind each of 
>>> these restrictions?  None should be a correctness requirement 
>>> AFAICT.
>
> In the above patch I have made a check given in point 1. in the loop 
> latch and the Path splitting is disabled and the IF-conversion happens 
> and the test case passes.
>>That sounds more like a work-around/hack.  There's nothing inherent with a 
>>type conversion that should disable path splitting.

I have sent the patch with this change and I will remove the above check from 
the patch.

>>What happens if we delay path splitting to a point after if-conversion is 
>>complete?

This is a better suggestion, as explained above, but the gains achieved by 
keeping path splitting earlier in the pipeline, before if-conversion, 
tree-vectorization and tree-vrp, are suppressed if the optimizations that 
follow path splitting are not applicable to the above loops.

I have made the above changes and the existing setup doesn't break, but the 
gains achieved in benchmarks like rgbcmy_lite (EEMBC) are suppressed. Path 
splitting for the above EEMBC benchmark gives gains of 9%, and for such loops 
if-conversion and vectorization is not applicable  expo

Re: GCC 6 Status Report (2015-10-16)

2015-10-20 Thread Jeff Law

On 10/20/2015 09:42 AM, Pierre-Marie de Rodat wrote:

Hi Richard,

On 10/16/2015 05:09 AM, Richard Biener wrote:

This means it is time to get things you want to have in GCC 6 finalized
and reviewed.  As usual there may be exceptions to late reviewed features
but don't count on that.  Likewise target specific features can sneak in
during Stage 3 if maintainers ok them.


This is just to point out I have series of DWARF-related patches
awaiting reviews for several months. I plan to rebase against trunk,
retest and resubmit them shortly but here’s an overview:

   * The patch series for transition to standard DWARF for Ada
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01857.html). There are 8
patches, each one depending on the previous one, except the 6/8 one
(https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01361.html) which could
go on its own.

   * Fix DW_AT_static_link generation in DWARF
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01855.html). The
corresponding support in GDB is in the development branch.

   * Track indirect calls as call site information in DWARF
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01856.html).

   * Materialize subprogram renamings in Ada as imported declarations in
DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01854.html). The
corresponding support in GDB is in the development branch.
I've been leaving all that stuff to Jason & Jakub.  My knowledge of 
dwarf, both the specs and our implementation is so limited that my 
review would be worthless.


jeff


Re: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp

2015-10-20 Thread Marcus Shawcroft
On 16 October 2015 at 13:58, Kyrill Tkachov  wrote:
> Hi all,
>
> We already support load/store-pair operations on the D-registers when they
> contain an FP value, but the peepholes/sched-fusion machinery that
> do all the hard work currently ignore 64-bit vector modes.
>
> This patch adds support for fusing loads/stores of 64-bit vector operands
> into ldp and stp instructions.
> I've seen this trigger a few times in SPEC2006. Not too many times, but the
> times it did trigger the code seemed objectively better
> i.e. long sequences of ldr and str instructions essentially halved in size.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-10-16  Kyrylo Tkachov  
>
> * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p):

We have several different flavours of fusion in the backend; this one
is specifically for loads/stores. Perhaps making that clear in the name of
this predicate will avoid confusion further down the line?

> New function.
> (fusion_load_store): Use it.
> * config/aarch64/aarch64-ldpstp.md: Add new peephole2s for
> ldp and stp in VD modes.
> * config/aarch64/aarch64-simd.md (load_pair, VD): New pattern.
> (store_pair, VD): Likewise.
>
> 2015-10-16  Kyrylo Tkachov  
>
> * gcc.target/aarch64/stp_vec_64_1.c: New test.
> * gcc.target/aarch64/ldp_vec_64_1.c: New test.

Otherwise OK /Marcus


Re: [PATCH][AArch64] Fix insn types

2015-10-20 Thread Evandro Menezes

Kyrill,

Indeed, the correct log would be:

The type assigned to some insn definitions was not correct.

gcc/
* config/aarch64/aarch64.md
(*movhf_aarch64): Change the type of "mov %0.h[0], %1.h[0]" to 
"neon_move".

(*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr".
(*cmov_insn): Change the types of "mov %0, {-1,1}" to 
"mov_imm".

(*cmovsi_insn_uxtw): Likewise.

Thank you,

--
Evandro Menezes

On 10/20/2015 05:59 AM, Kyrill Tkachov wrote:

Hi Evandro,

On 19/10/15 22:05, Evandro Menezes wrote:

The type assigned to some insn definitions was seemingly not correct:

 * "movi %d0, %1" was of type "fmov"
 * "fmov %s0, wzr" was of type "fconstd"
 * "mov %0, {-1,1}" were of type "csel"

This patch changes their types to:

 * "movi %d0, %1" to type "neon_move"
 * "fmov %s0, wzr" to type "f_mcr"
 * "mov %0, {-1,1}" to type "mov_imm"

Please, commit if it's alright.

Thank you,



Looking at your ChangeLog...

gcc/
* config/aarch64/aarch64.md
(*movdi_aarch64): Change the type of "movi %d0, %1" to "neon_move".
(*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr".
(*cmov_insn): Change the types of "mov %0, {-1,1}" to 
"mov_imm".

(*cmovsi_insn_uxtw): Idem

The preferred form is "Likewise" rather than "Idem" AFAIK. Also, full 
stop at the end.


--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1130,7 +1130,7 @@
ldrh\\t%w0, %1
strh\\t%w1, %0
mov\\t%w0, %w1"
-  [(set_attr "type" "neon_from_gp,neon_to_gp,fmov,\
+  [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\
  f_loads,f_stores,load1,store1,mov_reg")
(set_attr "simd" "yes,yes,yes,*,*,*,*,*")
(set_attr "fp"   "*,*,*,yes,yes,*,*,*")]


I don't think this matches up with your changelog entry.
This isn't the *movdi_aarch64 pattern. From what I can see the 
*movdi_aarch64 pattern
already has the type neon_move on the movi\\t%d0, %1 alternative (the 
last one).


In fact, if I apply your patch using "patch -p1" I see it being 
applied to the *movhf_aarch64 pattern.

Is that what you intended?

Thanks,
Kyrill






Re: [PATCH 1/9] ENABLE_CHECKING refactoring

2015-10-20 Thread Jeff Law

On 10/18/2015 12:17 AM, Mikhail Maltsev wrote:

On 10/12/2015 11:57 PM, Jeff Law wrote:

-#ifdef ENABLE_CHECKING
+#if CHECKING_P


I fail to see the point of this change.

I'm guessing (and Mikhail, please correct me if I'm wrong), but I think he's
trying to get away from ENABLE_CHECKING and instead use a macro which is
always defined to a value.

Yes, exactly. Such a macro is better because it can be used both for conditional
compilation (if needed) and in normal if-s (unlike ENABLE_CHECKING).
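
For example (illustrative only; verify_foo is a made-up stand-in for any
checking routine):

  #ifdef ENABLE_CHECKING        /* only usable via conditional compilation */
    verify_foo ();
  #endif

    if (CHECKING_P)             /* usable as an ordinary condition; the call
                                   is still parsed and type-checked, and
                                   compiles away when CHECKING_P is 0 */
      verify_foo ();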

On 10/14/2015 12:33 AM, Jeff Law wrote:

   gcc/common.opt  | 5 +
   gcc/system.h| 3 +++
   libcpp/system.h | 8 
   3 files changed, 16 insertions(+)

I committed this prerequisite patch to the trunk.

jeff



There is a typo here:


#ifdef ENABLE_CHECKING
#define gcc_checking_assert(EXPR) gcc_assert (EXPR)
+#define CHECKING_P 1
#else
+/* N.B.: in release build EXPR is not evaluated.  */
#define gcc_checking_assert(EXPR) ((void)(0 && (EXPR)))
+#define CHECKING_P 1
#endif


In my original patch the second definition actually was
'+#define CHECKING_P 0'

Not sure how that happened.  I'll take care of it.


I'd actually planned to start extracting the non-controversial stuff out 
of the later patches, but just ran out of time before going onto PTO.




Also, gcc_checking_assert in libcpp requires gcc_assert to be defined. That was
missing in my original patch (and was added in patch 2/9), but I think it would
be better to fix it here, as Bernd noticed (in the last paragraph of [1]).

Sounds wise.  Agreed.



Besides, I like Richard's proposal [2] about moving CHECKING_P macros into
configure.ac.

[1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00550.html
[2] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00555.html

It required minor tweaking in order to silence autotools' warnings. I attached
the modified patch.
When I tried that, I couldn't get it to work.  My autotools-fu is quite 
limited.




OK for trunk (after bootstrap/regtest)?

I'll take a closer look later today, after my morning meetings.



P.S. I am planning to post at least some of the other updated parts today, and I
also hope to get in time with the whole series (before stage1 ends).

Excellent.

jeff


Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops

2015-10-20 Thread Kyrill Tkachov


On 19/10/15 15:37, Kyrill Tkachov wrote:

Hi Bernd,

On 19/10/15 15:31, Bernd Schmidt wrote:

On 10/19/2015 03:57 PM, Kyrill Tkachov wrote:

This second patch teaches simplify_binary_operation to return the
dereferenced
constants from the constant pool in the binary expression if other
simplifications failed.

This, combined with the 1/2 patch for aarch64
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allow for:

int
foo (float a)
{
   return a * 32.0f;
}

to generate the code:
foo:
 fcvtzs  w0, s0, #5
 ret

because combine now successfully tries to match:
(set (reg/i:SI 0 x0)
 (fix:SI (mult:SF (reg:SF 32 v0 [ a ])
 (const_double:SF 3.2e+1 [0x0.8p+6]

whereas before it would not try the to use the const_double directly
but rather its constant pool reference.


The only way I could see a problem with that is if there are circumstances where 
the memory variant would simplify further. That doesn't seem highly likely, 
so...



If that were the case, I'd expect the earlier call to 
simplify_binary_operation_1 to have returned a non-NULL rtx,
and the code in this patch would not come into play.


 * simplify-rtx.c (simplify_binary_operation): If either operand was
 a constant pool reference use them if all other simplifications
failed.


Ok.


Thanks,
I'll commit it when the first (aarch64-specific) patch is approved.



This patch has already been ok'd, but I'd like to update the aarch64 test to 
use scan-assembler-times
rather than just scan-assembler in light of Marcus' feedback to another patch.

I committed this version to trunk with r229086.

Thanks,
Kyrill

2015-10-20  Kyrylo Tkachov  

* simplify-rtx.c (simplify_binary_operation): If either operand was
a constant pool reference use them if all other simplifications failed.

2015-10-20  Kyrylo Tkachov  

* gcc.target/aarch64/fmul_fcvt_1.c: Add multiply-by-32 cases.


Kyrill




Bernd





Index: gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
===
--- gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c	(revision 229085)
+++ gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c	(working copy)
@@ -83,6 +83,16 @@
 /* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#4" 1 } } */
 /* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#4" 1 } } */
 
+FUNC_DEFS (32)
+FUNC_DEFD (32)
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#5" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#5" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#5" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#5" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#5" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#5" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#5" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#5" 1 } } */
 
 #define FUNC_TESTS(__a, __b)	\
 do\
@@ -120,10 +130,12 @@
   FUNC_TESTS (4, i);
   FUNC_TESTS (8, i);
   FUNC_TESTS (16, i);
+  FUNC_TESTS (32, i);
 
   FUNC_TESTD (4, i);
   FUNC_TESTD (8, i);
   FUNC_TESTD (16, i);
+  FUNC_TESTD (32, i);
 }
   return 0;
 }
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c	(revision 229084)
+++ gcc/simplify-rtx.c	(working copy)
@@ -2001,7 +2001,17 @@
   tem = simplify_const_binary_operation (code, mode, trueop0, trueop1);
   if (tem)
 return tem;
-  return simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1);
+  tem = simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1);
+
+  if (tem)
+return tem;
+
+  /* If the above steps did not result in a simplification and op0 or op1
+ were constant pool references, use the referenced constants directly.  */
+  if (trueop0 != op0 || trueop1 != op1)
+return simplify_gen_binary (code, mode, trueop0, trueop1);
+
+  return NULL_RTX;
 }
 
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation


[PATCH] Move some Cilk+ code around

2015-10-20 Thread Marek Polacek
The function is_cilkplus_vector_p is defined both in c-parser.c and parser.c
and is exactly the same so it seems that it should rather be defined in the
common Cilk+ code (even though for such a small static inline fn it probably
doesn't matter much).  While at it, fix some typos and simplify the code a bit.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-10-20  Marek Polacek  

* array-notation-common.c (is_cilkplus_vector_p): Define.
* c-common.h (is_cilkplus_vector_p): Declare.

* c-parser.c (is_cilkplus_vector_p): Don't define here.

* parser.c (is_cilkplus_vector_p): Don't define here.

diff --git gcc/c-family/array-notation-common.c 
gcc/c-family/array-notation-common.c
index 85ded8d..6b55747 100644
--- gcc/c-family/array-notation-common.c
+++ gcc/c-family/array-notation-common.c
@@ -676,3 +676,12 @@ fix_sec_implicit_args (location_t loc, vec  
*list,
   vec_safe_push (array_operand, (*list)[ii]);
   return array_operand;
 }
+
+/* Returns true if NAME is an IDENTIFIER_NODE with identifier "vector",
+   "__vector", or "__vector__".  */
+
+bool
+is_cilkplus_vector_p (tree name)
+{
+  return flag_cilkplus && is_attribute_p ("vector", name);
+}
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index cf44482..4b5cac8 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -1435,6 +1435,7 @@ extern void cilkplus_extract_an_triplets (vec *, size_t, size_t,
  vec > *);
 extern vec  *fix_sec_implicit_args
   (location_t, vec  *, vec, size_t, tree);
+extern bool is_cilkplus_vector_p (tree);
 
 /* In cilk.c.  */
 extern tree insert_cilk_frame (tree);
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 704ebc6..a5c1d84 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -3848,17 +3848,6 @@ c_parser_attribute_any_word (c_parser *parser)
   return attr_name;
 }
 
-/* Returns true of NAME is an IDENTIFIER_NODE with identiifer "vector,"
-   "__vector" or "__vector__."  */
-
-static inline bool
-is_cilkplus_vector_p (tree name)
-{ 
-  if (flag_cilkplus && is_attribute_p ("vector", name)) 
-return true;
-  return false;
-}
-
 #define CILK_SIMD_FN_CLAUSE_MASK \
((OMP_CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_VECTORLENGTH)   \
 | (OMP_CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_LINEAR)   \
diff --git gcc/cp/parser.c gcc/cp/parser.c
index f07a5e4..cfe5468 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -23225,17 +23225,6 @@ cp_parser_gnu_attributes_opt (cp_parser* parser)
   return attributes;
 }
 
-/* Returns true of NAME is an IDENTIFIER_NODE with identiifer "vector,"
-   "__vector" or "__vector__."  */
-
-static inline bool
-is_cilkplus_vector_p (tree name)
-{ 
-  if (flag_cilkplus && is_attribute_p ("vector", name)) 
-return true;
-  return false;
-}
-
 /* Parse a GNU attribute-list.
 
attribute-list:

Marek


Re: [PATCH][ARM] Fix for testcase after r228661

2015-10-20 Thread Ramana Radhakrishnan
On Tue, Oct 20, 2015 at 4:52 PM, Andre Vieira
 wrote:
> Hi,
>
> This patch addresses PR-67948 by changing the xor-and.c test, initially
> written for a simplify-rtx pattern, to make it pass post r228661 (see
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00676.html). This test no
> longer triggered the simplify-rtx pattern it was written for prior to
> r228661, though other optimizations did lead to the same assembly the test
> checked for. The optimization added with r228661 matches the pattern used in
> the test and optimizes it to a better and still valid sequence. Being unable
> to easily change the test to trigger the original simplify-rtx pattern, I
> chose to change it to pass with the new produced assembly sequence.
>
> This is correct because the transformation is valid and it yields a more
> efficient pattern. However, as I pointed out before this test doesn't test
> the optimization it originally was intended for.
>
> Tested by running regression tests for armv6.
>
> Is this OK to commit?
>

Missing ChangeLog - please remember to put the PR number in the
ChangeLog in the correct format, i.e. PR testsuite/67948. Ok with that.

I suspect that the simplify-rtx.c pattern is much less likely to trigger given
your match.pd change, but it will be fun proving that.

regards
ramana


> Thanks,
> Andre


Re: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp

2015-10-20 Thread Kyrill Tkachov

Hi Marcus,

On 20/10/15 17:05, Marcus Shawcroft wrote:

On 16 October 2015 at 13:58, Kyrill Tkachov  wrote:

Hi all,

We already support load/store-pair operations on the D-registers when they
contain an FP value, but the peepholes/sched-fusion machinery that
do all the hard work currently ignore 64-bit vector modes.

This patch adds support for fusing loads/stores of 64-bit vector operands
into ldp and stp instructions.
I've seen this trigger a few times in SPEC2006. Not too many times, but the
times it did trigger the code seemed objectively better
i.e. long sequences of ldr and str instructions essentially halved in size.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-10-16  Kyrylo Tkachov  

 * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p):

We have several different flavours of fusion in the backend, this one
is specifically load/stores, perhaps making that clear in the name of
this predicate will avoid confusion further down the line?

Thanks for the review,

This particular type of fusion is called sched_fusion in various
places in the compiler and its implementation in aarch64 is only for
load/store merging (indeed, the only usage of sched_fusion currently
is to merge loads/stores in arm and aarch64).

So, I think that sched_fusion in the name already conveys the information
that it's the ldp/stp one rather than macro fusion. In fact, there is a
macro fusion of ADRP and an LDR instruction,
so having sched_fusion in the name is actually a better differentiator than
mentioning loads/stores as both types of fusion deal with loads in some way.

Is it ok to keep the name as is?

Thanks,
Kyrill




 New function.
 (fusion_load_store): Use it.
 * config/aarch64/aarch64-ldpstp.md: Add new peephole2s for
 ldp and stp in VD modes.
 * config/aarch64/aarch64-simd.md (load_pair, VD): New pattern.
 (store_pair, VD): Likewise.

2015-10-16  Kyrylo Tkachov  

 * gcc.target/aarch64/stp_vec_64_1.c: New test.
 * gcc.target/aarch64/ldp_vec_64_1.c: New test.

Otherwise OK /Marcus





Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

2015-10-20 Thread Ramana Radhakrishnan
On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft
 wrote:
> On 19 October 2015 at 14:57, Kyrill Tkachov  wrote:
>
>> 2015-10-19  Kyrylo Tkachov  
>>
>> * config/aarch64/aarch64.md
>>  (*aarch64_fcvt2_mult): New pattern.
>> * config/aarch64/aarch64-simd.md
>>  (*aarch64_fcvt2_mult): Likewise.
>> * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
>> (aarch64_fpconst_pow_of_2): New function.
>> (aarch64_vec_fpconst_pow_of_2): Likewise.
>> * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
>> prototype.
>> (aarch64_vec_fpconst_pow_of_2): Likewise.
>> * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>> (aarch64_fp_vec_pow2): Likewise.
>>
>> 2015-10-19  Kyrylo Tkachov  
>>
>> * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>> * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
>
> +char buf[64];
> +sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits);
>
> Prefer snprintf please

Should we also update our coding standards for this ? i.e. use the non
`n' versions of the string functions only if you have a very good
reason. Even more so in the run time support libraries !

Maybe we can go to the extreme of poisoning sprintf too, but that's
probably an extreme...

regards
Ramana


>
> +  }
> +  [(set_attr "type" "neon_fp_to_int_")]
> +)
> +
> +
>
> Superfluous blank line here ?
>
> +  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
> + (enum rtx_code) code, 0, speed);
>
> My understanding is the unnecessary use of enum  is now discouraged,
> (rtx_code) is sufficient in this case.
>
> +  int count = CONST_VECTOR_NUNITS (x);
> +  int i;
> +  for (i = 1; i < count; i++)
>
> Push the int into the for initializer.
> Push the rhs of the count assignment into the for condition and drop
> the definition of count.
>
> +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */
>
> I'd prefer scan-assembler-times or do you have a particular reason to
> avoid it in these tests?
>
> Cheers
> /Marcus


[committed, PATCH] Add --enable-compressed-debug-sections={all,gas,gold,ld}

2015-10-20 Thread H.J. Lu
Add --enable-compressed-debug-sections={all,gas,gold,ld}

This patch removes the gas configure option:

--enable-compressed-debug-sections

and adds a toplevel configure option:

--enable-compressed-debug-sections={all,gas,gold,ld}

to enable compressed debug sections for gas, gold or ld by default.  At
the moment, this configure option is ignored by gold and ld.  For x86
Linux targets, default to compressing debug sections in gas.

Sync with binutils-gdb:
PR gas/19109
* configure.ac: Add
--enable-compressed-debug-sections={all,gas,gold,ld}.
* configure: Regenerated.

diff --git a/ChangeLog b/ChangeLog
index 1d9600a..2c19e94 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2015-10-20  H.J. Lu  
+
+   Sync with binutils-gdb:
+   2015-10-20  H.J. Lu  
+
+   PR gas/19109
+   * configure.ac: Add
+   --enable-compressed-debug-sections={all,gas,gold,ld}.
+   * configure: Regenerated.
+
 2015-10-16  Arnaud Charlet  
 
* MAINTAINERS: Update list of Ada maintainers and email addresses.
diff --git a/configure b/configure
index eca5e6f..f66f424 100755
--- a/configure
+++ b/configure
@@ -753,6 +753,7 @@ enable_as_accelerator_for
 enable_offload_targets
 enable_gold
 enable_ld
+enable_compressed_debug_sections
 enable_libquadmath
 enable_libquadmath_support
 enable_libada
@@ -1476,6 +1477,9 @@ Optional Features:
   offload target compiler during the build
   --enable-gold[=ARG] build gold [ARG={default,yes,no}]
   --enable-ld[=ARG]   build ld [ARG={default,yes,no}]
+  --enable-compressed-debug-sections={all,gas,gold,ld}
+  Enable compressed debug sections for gas, gold or ld
+  by default
   --disable-libquadmath   do not build libquadmath directory
   --disable-libquadmath-support
   disable libquadmath support for Fortran
@@ -3013,6 +3017,21 @@ $as_echo "$as_me: WARNING: neither ld nor gold are 
enabled" >&2;}
 ;;
 esac
 
+# PR gas/19109
+# Decide the default method for compressing debug sections.
+# Provide a configure time option to override our default.
+# Check whether --enable-compressed_debug_sections was given.
+if test "${enable_compressed_debug_sections+set}" = set; then :
+  enableval=$enable_compressed_debug_sections;
+  if test x"$enable_compressed_debug_sections" = xyes; then
+as_fn_error "no program with compressed debug sections specified" 
"$LINENO" 5
+  fi
+
+else
+  enable_compressed_debug_sections=
+fi
+
+
 # Configure extra directories which are host specific
 
 case "${host}" in
diff --git a/configure.ac b/configure.ac
index 9241261..cb6ca24 100644
--- a/configure.ac
+++ b/configure.ac
@@ -393,6 +393,19 @@ case "${ENABLE_LD}" in
 ;;
 esac
 
+# PR gas/19109
+# Decide the default method for compressing debug sections.
+# Provide a configure time option to override our default.
+AC_ARG_ENABLE(compressed_debug_sections,
+[AS_HELP_STRING([--enable-compressed-debug-sections={all,gas,gold,ld}],
+   [Enable compressed debug sections for gas, gold or ld by
+default])],
+[
+  if test x"$enable_compressed_debug_sections" = xyes; then
+AC_MSG_ERROR([no program with compressed debug sections specified])
+  fi
+], [enable_compressed_debug_sections=])
+
 # Configure extra directories which are host specific
 
 case "${host}" in


patch to fix PR67609

2015-10-20 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609

  The patch was bootstrapped and tested on x86/x86-64, arm, aarch64.  
It was also tested on ppc64 (unfortunately bootstrap with LRA is broken 
now).


Committed as rev. 229087.

Index: ChangeLog
===
--- ChangeLog	(revision 228844)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2015-10-20  Vladimir Makarov  
+
+	PR rtl-optimization/67609
+	* lra-spills.c (lra_final_code_change): Don't remove all
+	sub-registers.
+
 2015-10-15  Marek Polacek  
 
 	* tree-ssa-reassoc.c (attempt_builtin_copysign): Call
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 228844)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2015-10-20  Vladimir Makarov  
+
+	PR rtl-optimization/67609
+	* gcc.target/i386/pr67609.c: New.
+
 2015-10-15  Marek Polacek  
 
 	* gcc.dg/tree-ssa/reassoc-42.c: New test.
Index: lra-spills.c
===
--- lra-spills.c	(revision 228844)
+++ lra-spills.c	(working copy)
@@ -727,14 +727,44 @@ lra_final_code_change (void)
 	  lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
 	  struct lra_static_insn_data *static_id = id->insn_static_data;
 	  bool insn_change_p = false;
-
-	  for (i = id->insn_static_data->n_operands - 1; i >= 0; i--)
-	if ((DEBUG_INSN_P (insn) || ! static_id->operand[i].is_operator)
-		&& alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
-	  {
-		lra_update_dup (id, i);
-		insn_change_p = true;
-	  }
+	  
+  for (i = id->insn_static_data->n_operands - 1; i >= 0; i--)
+	{
+	  if (! DEBUG_INSN_P (insn) && static_id->operand[i].is_operator)
+		continue;
+	  
+	  rtx op = *id->operand_loc[i];
+	  
+	  if (static_id->operand[i].type == OP_OUT
+		  && GET_CODE (op) == SUBREG && REG_P (SUBREG_REG (op))
+		  && ! LRA_SUBREG_P (op))
+		{
+		  hard_regno = REGNO (SUBREG_REG (op));
+		  /* We can not always remove sub-registers of
+		 hard-registers as we may lose information that
+		 only a part of registers is changed and
+		 subsequent optimizations may do wrong
+		 transformations (e.g. dead code eliminations).
+		 We can not also keep all sub-registers as the
+		 subsequent optimizations can not handle all such
+		 cases.  Here is a compromise which works.  */
+		  if ((GET_MODE_SIZE (GET_MODE (op))
+		   < GET_MODE_SIZE (GET_MODE (SUBREG_REG (op
+		  && (hard_regno_nregs[hard_regno][GET_MODE (SUBREG_REG (op))]
+			  == hard_regno_nregs[hard_regno][GET_MODE (op)])
+#ifdef STACK_REGS
+		  && (hard_regno < FIRST_STACK_REG
+			  || hard_regno > LAST_STACK_REG)
+#endif
+		  )
+		continue;
+		}
+	  if (alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
+		{
+		  lra_update_dup (id, i);
+		  insn_change_p = true;
+		}
+	}
 	  if (insn_change_p)
 	lra_update_operator_dups (id);
 	}
Index: testsuite/gcc.target/i386/pr67609.c
===
--- testsuite/gcc.target/i386/pr67609.c	(revision 0)
+++ testsuite/gcc.target/i386/pr67609.c	(working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-final { scan-assembler "movdqa" } } */
+
+#include <emmintrin.h>
+__m128d reg;
+void set_lower(double b)
+{
+  double v[2];
+  _mm_store_pd(v, reg);
+  v[0] = b;
+  reg = _mm_load_pd(v);
+}


Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation

2015-10-20 Thread Kyrill Tkachov


On 20/10/15 17:28, Ramana Radhakrishnan wrote:

On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft
 wrote:

On 19 October 2015 at 14:57, Kyrill Tkachov  wrote:


2015-10-19  Kyrylo Tkachov  

 * config/aarch64/aarch64.md
  (*aarch64_fcvt2_mult): New pattern.
 * config/aarch64/aarch64-simd.md
  (*aarch64_fcvt2_mult): Likewise.
 * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
 (aarch64_fpconst_pow_of_2): New function.
 (aarch64_vec_fpconst_pow_of_2): Likewise.
 * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
 prototype.
 (aarch64_vec_fpconst_pow_of_2): Likewise.
 * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
 (aarch64_fp_vec_pow2): Likewise.

2015-10-19  Kyrylo Tkachov  

 * gcc.target/aarch64/fmul_fcvt_1.c: New test.
 * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.

+char buf[64];
+sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits);

Prefer snprintf please

Should we also update our coding standards for this ? i.e. use the non
`n' versions of the string functions only if you have a very good
reason. Even more so in the run time support libraries !


My understanding is one should *really* use snprintf if any part of the
string comes from the user. So in this case it probably doesn't make a
difference. However, it's not hard to use the 'n' version and thinking
about the maximum length is probably a good practice to get into.
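
For illustration, here is a minimal sketch of the bounded form (the function
name and format string are placeholders, not the actual output template from
the patch):

  #include <stdio.h>

  static void
  emit_fcvt (char *buf, size_t size, int fbits)
  {
    /* snprintf never writes more than SIZE bytes, including the
       terminating NUL, so an unexpectedly long expansion is truncated
       instead of overflowing the buffer.  */
    snprintf (buf, size, "fcvtzs\t%%w0, %%s0, #%d", fbits);
  }

Called as emit_fcvt (buf, sizeof (buf), fbits), the bound always tracks the
buffer it protects.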

Kyrill



Maybe we can go to the extreme of poisoning sprintf too, but that's
probably an extreme...

regards
Ramana



+  }
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
+

Superfluous blank line here ?

+  *cost += rtx_cost (XEXP (x, 0), VOIDmode,
+ (enum rtx_code) code, 0, speed);

My understanding is the unnecessary use of enum  is now discouraged,
(rtx_code) is sufficient in this case.

+  int count = CONST_VECTOR_NUNITS (x);
+  int i;
+  for (i = 1; i < count; i++)

Push the int into the for initializer.
Push the rhs of the count assignment into the for condition and drop
the definition of count.

+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */

I'd prefer scan-assembler-times or do you have a particular reason to
avoid it in these tests?

Cheers
/Marcus




Re: [patch 4/6] scalar-storage-order merge: bulk

2015-10-20 Thread Eric Botcazou
> I suspect there are many comments that we should update as a result of
> this work.  Essentially there's many throughout GCC of the form "On a
> big-endian target" and the like.  With these changes it's more a
> property of the data -- the vast majority of the time the data's
> property is the same as the target, but with this patch it can vary.
> 
> I just happened to spot one in varasm.c::output_constructor_bitfield, as
> the comment started shortly after some code this patch changes.  No
> doubt there's others.  I'm torn whether or not to ask you to scan and
> update them.  It's a lot of work and I'm not terribly sure how valuable
> it'll generally be.

I have adjusted the output_constructor_bitfield case to "For big-endian data".
The cases in expmed.c have naked "big-endian" or "big-endian case" so are OK.
I have changed 2 sibling cases in expr.c but left the others unchanged since 
they are about calling conventions.  I think that's pretty much it for the 
files that are touched by the patch.

> Thanks for not trying too hard to optimize this stuff, it makes the
> expmed.c patches (for example) a lot easier to work through :-)
> 
> I didn't even try to verify you've got all the paths covered.  I mostly
> tried to make sure the patch didn't break existing code.  I'm going to
> assume that the patch essentially works and that if there's paths
> missing where reversal is needed that you'll take care of them as
> they're exposed.

Yes, the main issue in expr.c & expmed.c is getting all the paths covered.
I have run the compile and execute C torture testsuites with -fsso-struct set 
to big-endian on x86-64 so I'm relatively confident, but that's not a proof.
And one of the design constraints of the implementation was to change nothing 
to the default endianness support, so I'm more confident about the absence of 
breakages (IIRC we didn't get a single PR for a breakage with the AdaCore 
compilers while we did get a few for cases where the data was not reversed).

> I'm a bit dismayed at the number of changes to fold-const.c, but
> presumably there's no good way around them.  I probably would have
> looked for a way to punt earlier (that may not be trivial though).
> Given you've done the work, no need to undo it now.

Yes, that's the bitfield optimization stuff, so it depends on the endianness.
Initially the changes were even more pervasive because I tried to optimize 
e.g. equality comparisons of 2 bitfields with reverse endianness in there.
But this broke the "all accesses to a scalar in memory are done with the same 
storage order" invariant and was thus causing annoying issues later on.

> I think this patch is fine.

OK, thanks for the thorough review.  All the parts have been approved, modulo 
the C++ FE part.  As I already explained, this part is very likely incomplete 
(based on the changes that were made in the Ada FE proper) and only makes sure 
that C code compiled by the C++ compiler works as intended, but nothing more.
So I can propose to install only the generic, C and Ada changes for now and to 
explicitly disable the feature for C++ until a more complete implementation of 
the C++ FE part is written.  I'm OK to work on it but I'll probably need help.

-- 
Eric Botcazou


[patch] Fix failure of ACATS c45503c at -O2

2015-10-20 Thread Eric Botcazou
Hi,

this test started to fail recently as the result of the work of Richard S., 
but the underlying issue had been latent for a long time.  It boils down to 
this excerpt from the VRP1 dump file:

Found new range for _9: [0, 12]
marking stmt to be not simulated again

Visiting statement:
_3 = _9 %[fl] _11;
Found new range for _3: [0, +INF]

which is plain wrong since the sign of FLOOR_MOD_EXPR is the sign of the 
divisor and not that of the dividend (the latter is for TRUNC_MOD_EXPR).
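
To make the difference concrete, here is a tiny illustration (not part of the
testcase; C's % already has TRUNC_MOD_EXPR semantics and the second helper
emulates FLOOR_MOD_EXPR, assuming b != 0 and no overflow):

  static int trunc_mod (int a, int b) { return a % b; }
  static int floor_mod (int a, int b) { return ((a % b) + b) % b; }

  /* trunc_mod (9, -4) ==  1   (sign of the dividend)
     floor_mod (9, -4) == -3   (sign of the divisor)   */

So even a dividend known to be nonnegative, as _9 is here, says nothing about
the sign of the result once the divisor can be negative.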

I have attached a fix for mainline and another one for the 4.9/5 branches.

Tested on x86_64-suse-linux, OK for all active branches?


2015-10-20  Eric Botcazou  

* fold-const.c (tree_binary_nonnegative_warnv_p) :
Recurse on operand #1 instead of operand #0.
: Do not recurse.
: Likewise.

-- 
Eric BotcazouIndex: fold-const.c
===
--- fold-const.c	(revision 229022)
+++ fold-const.c	(working copy)
@@ -12982,11 +12982,15 @@ tree_binary_nonnegative_warnv_p (enum tr
   return RECURSE (op0) && RECURSE (op1);
 
 case TRUNC_MOD_EXPR:
-case CEIL_MOD_EXPR:
-case FLOOR_MOD_EXPR:
-case ROUND_MOD_EXPR:
+  /* The sign of the remainder is that of the dividend.  */
   return RECURSE (op0);
 
+case FLOOR_MOD_EXPR:
+  /* The sign of the remainder is that of the divisor.  */
+  return RECURSE (op1);
+
+case CEIL_MOD_EXPR:
+case ROUND_MOD_EXPR:
 default:
   return tree_simple_nonnegative_warnv_p (code, type);
 }
Index: fold-const.c
===
--- fold-const.c	(revision 228932)
+++ fold-const.c	(working copy)
@@ -14853,11 +14853,17 @@ tree_binary_nonnegative_warnv_p (enum tr
 		strict_overflow_p));
 
 case TRUNC_MOD_EXPR:
-case CEIL_MOD_EXPR:
-case FLOOR_MOD_EXPR:
-case ROUND_MOD_EXPR:
+  /* The sign of the remainder is that of the dividend.  */
   return tree_expr_nonnegative_warnv_p (op0,
 	strict_overflow_p);
+
+case FLOOR_MOD_EXPR:
+  /* The sign of the remainder is that of the divisor.  */
+  return tree_expr_nonnegative_warnv_p (op1,
+	strict_overflow_p);
+
+case CEIL_MOD_EXPR:
+case ROUND_MOD_EXPR:
 default:
   return tree_simple_nonnegative_warnv_p (code, type);
 }


Re: [PATCH][ARM] Fix for testcase after r228661

2015-10-20 Thread Andre Vieira



On 20/10/15 17:25, Ramana Radhakrishnan wrote:

On Tue, Oct 20, 2015 at 4:52 PM, Andre Vieira
 wrote:

Hi,

This patch addresses PR-67948 by changing the xor-and.c test, initially
written for a simplify-rtx pattern, to make it pass post r228661 (see
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00676.html). This test no
longer triggered the simplify-rtx pattern it was written for prior to
r228661, though other optimizations did lead to the same assembly the test
checked for. The optimization added with r228661 matches the pattern used in
the test and optimizes it to a better and still valid sequence. Being unable
to easily change the test to trigger the original simplify-rtx pattern, I
chose to change it to pass with the new produced assembly sequence.

This is correct because the transformation is valid and it yields a more
efficient pattern. However, as I pointed out before this test doesn't test
the optimization it originally was intended for.

Tested by running regression tests for armv6.

Is this OK to commit?



Missing ChangeLog - please remember to put the PR number in the
ChangeLog in the correct format, i.e. PR testsuite/67948. Ok with that.

I suspect that the simplify-rtx.c is much less likely to trigger given
your match.pd change, but it will be fun proving that.

Ideally, I'd like to prove the simplify-rtx.c pattern couldn't be triggered
anymore, though I wouldn't know where to begin with that. I did spend a
tiny effort trying to trigger it with a version before the match.pd 
change, but with no success.


regards
ramana



Thanks,
Andre



Here is the ChangeLog (with the PR), sorry for that!

gcc/testsuite/ChangeLog
2015-10-15  Andre Vieira  

PR testsuite/67948
* gcc.target/arm/xor-and.c: check for eor instead of orr.
From 89922547118e716b41ddf6edefb274322193f25c Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Thu, 15 Oct 2015 12:48:26 +0100
Subject: [PATCH] Fix for xor-and.c test

---
 gcc/testsuite/gcc.target/arm/xor-and.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/xor-and.c b/gcc/testsuite/gcc.target/arm/xor-and.c
index 53dff85f8f780fb99a93bbcc24180a3d0d5d3be9..3715530cd7bf9ad8abb24cb21cd51ae3802079e8 100644
--- a/gcc/testsuite/gcc.target/arm/xor-and.c
+++ b/gcc/testsuite/gcc.target/arm/xor-and.c
@@ -10,6 +10,6 @@ unsigned short foo (unsigned short x)
   return x;
 }
 
-/* { dg-final { scan-assembler "orr" } } */
+/* { dg-final { scan-assembler "eor" } } */
 /* { dg-final { scan-assembler-not "mvn" } } */
 /* { dg-final { scan-assembler-not "uxth" } } */
-- 
1.9.1



[HSA] Fix-up of kernel-from-kernel dispatch mechanism

2015-10-20 Thread Martin Liška
Hello.

The following patch fixes up the HSA kernel-from-kernel dispatch mechanism,
where we forgot to wait in a loop for the dispatched child kernel. Apart from
that, there's a small follow-up which changes the naming scheme for HSA modules.

Martin
>From eca686b6495bf7faa32ecb292f94558c6bfdbdce Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 20 Oct 2015 11:32:58 +0200
Subject: [PATCH 1/2] HSA: fix kernel-from-kernel dispatch mechanism

libgomp/ChangeLog:

2015-10-20  Martin Liska  

	* plugin/plugin-hsa.c (struct agent_info): Add new
	kernel_dispatch_command_q member.
	(GOMP_OFFLOAD_init_device): Initialize the queue.
	(init_single_kernel): Verify that kernel dispatching
	is not run with a big depth.
	(create_kernel_dispatch): Rename from
	create_kernel_dispatch_recursive.
	(GOMP_OFFLOAD_run): Add signal waiting workaround.
	(GOMP_OFFLOAD_fini_device): Destroy the newly added queue.

gcc/ChangeLog:

2015-10-20  Martin Liska  

	* hsa-gen.c (gen_hsa_insns_for_kernel_call): Add waiting
	for the signal in a loop.
	(gen_body_from_gimple): Skip HSA basic blocks with already
	created HBB.
---
 gcc/hsa-gen.c   | 38 --
 libgomp/plugin/plugin-hsa.c | 56 +
 2 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index ab27eb1..c36c9e0 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3943,22 +3943,41 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
   signal->m_memoryscope = BRIG_MEMORY_SCOPE_SYSTEM;
   hbb->append_insn (signal);
 
+  /* Prepare CFG for waiting loop.  */
+  edge e = split_block (hbb->m_bb, call);
+
+  basic_block dest = split_edge (e);
+  edge false_e = EDGE_SUCC (dest, 0);
+
+  false_e->flags &= ~EDGE_FALLTHRU;
+  false_e->flags |= EDGE_FALSE_VALUE;
+
+  make_edge (e->dest, dest, EDGE_TRUE_VALUE);
+
   /* Emit blocking signal waiting instruction.  */
+  hsa_bb *new_hbb = hsa_init_new_bb (dest);
+
   hbb->append_insn (new hsa_insn_comment ("wait for the signal"));
 
-  hsa_op_reg *signal_result_reg = new hsa_op_reg (BRIG_TYPE_U64);
+  hsa_op_reg *signal_result_reg = new hsa_op_reg (BRIG_TYPE_S64);
   c = new hsa_op_immed (1, BRIG_TYPE_S64);
-  hsa_op_immed *c2 = new hsa_op_immed (UINT64_MAX, BRIG_TYPE_U64);
 
-  signal = new hsa_insn_signal (4, BRIG_OPCODE_SIGNAL,
-BRIG_ATOMIC_WAITTIMEOUT_LT, BRIG_TYPE_S64);
+  signal = new hsa_insn_signal (3, BRIG_OPCODE_SIGNAL,
+BRIG_ATOMIC_WAIT_LT, BRIG_TYPE_S64);
   signal->m_memoryorder = BRIG_MEMORY_ORDER_SC_ACQUIRE;
   signal->m_memoryscope = BRIG_MEMORY_SCOPE_SYSTEM;
   signal->set_op (0, signal_result_reg);
   signal->set_op (1, signal_reg);
   signal->set_op (2, c);
-  signal->set_op (3, c2);
-  hbb->append_insn (signal);
+  new_hbb->append_insn (signal);
+
+  hsa_op_reg *ctrl = new hsa_op_reg (BRIG_TYPE_B1);
+  hsa_insn_cmp *cmp = new hsa_insn_cmp
+(BRIG_COMPARE_EQ, ctrl->m_type, ctrl, signal_result_reg,
+ new hsa_op_immed (1, signal_result_reg->m_type));
+
+  new_hbb->append_insn (cmp);
+  new_hbb->append_insn (new hsa_insn_br (ctrl));
 
   hsa_cfun->m_kernel_dispatch_count++;
 }
@@ -4763,7 +4782,11 @@ gen_body_from_gimple (vec  *ssa_map)
   FOR_EACH_BB_FN (bb, cfun)
 {
   gimple_stmt_iterator gsi;
-  hsa_bb *hbb = hsa_init_new_bb (bb);
+  hsa_bb *hbb = hsa_bb_for_bb (bb);
+  if (hbb)
+	continue;
+
+  hbb = hsa_init_new_bb (bb);
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	{
@@ -4777,6 +4800,7 @@ gen_body_from_gimple (vec  *ssa_map)
 {
   gimple_stmt_iterator gsi;
   hsa_bb *hbb = hsa_bb_for_bb (bb);
+  gcc_assert (hbb != NULL);
 
   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!virtual_operand_p (gimple_phi_result (gsi_stmt (gsi
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index ed3573f..45dfe9b 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -220,6 +220,8 @@ struct agent_info
   hsa_isa_t isa;
   /* Command queue of the agent.  */
   hsa_queue_t* command_q;
+  /* Kernel from kernel dispatch command queue.  */
+  hsa_queue_t* kernel_dispatch_command_q;
   /* The HSA memory region from which to allocate kernel arguments.  */
   hsa_region_t kernarg_region;
 
@@ -447,6 +449,12 @@ GOMP_OFFLOAD_init_device (int n)
   if (status != HSA_STATUS_SUCCESS)
 hsa_fatal ("Error creating command queue", status);
 
+  status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI,
+			 queue_callback, NULL, UINT32_MAX, UINT32_MAX,
+			 &agent->kernel_dispatch_command_q);
+  if (status != HSA_STATUS_SUCCESS)
+hsa_fatal ("Error creating kernel dispatch command queue", status);
+
   agent->kernarg_region.handle = (uint64_t) -1;
   status = hsa_agent_iterate_regions (agent->id, get_kernarg_memory_region,
   &agent->kernarg_region);
@@ -455,6 +463,8 @@ GOMP_OFFLOAD_init_device (int n)
 		   "arguments");
   HSA_DEBUG ("HSA agent init

Re: [PATCH][AArch64] Enable fusion of AES instructions

2015-10-20 Thread Kyrill Tkachov


On 14/10/15 13:30, Wilco Dijkstra wrote:

Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs.
This can give up to 2x speedup on many AArch64 implementations.  Also model
the crypto instructions on Cortex-A57 according to the Optimization Guide.

Passes regression tests.


arm-wise this is ok, but I'd like a follow up patch to enable this fusion
for the arm port as well. It should be fairly simple.
Just add a new enum value to fuse_ops inside tune_params in arm-protos.h
and update the arm implementation in aarch_macro_fusion_pair_p similar
to your aarch64 implementation.

Thanks,
Kyrill



ChangeLog:
2015-10-14  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(arch_macro_fusion_pair_p): Add support for AES fusion.
* gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry.
* gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue):
Allow virtual registers before reload so early scheduling works.
* gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use
correct latency and pipeline.
(cortex_a57_crypto_complex): Likewise.
(cortex_a57_crypto_xor): Likewise.
(define_bypass): Add AES bypass.


---
  gcc/config/aarch64/aarch64-fusion-pairs.def |  1 +
  gcc/config/aarch64/aarch64.c| 10 +++---
  gcc/config/arm/aarch-common.c   |  7 +--
  gcc/config/arm/cortex-a57.md| 17 +++--
  4 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index 53bbef4..fea79fc 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -33,4 +33,5 @@ AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD)
  AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK)
  AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
  AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
+AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
  
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

index 230902d..96368c6 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -376,7 +376,7 @@ static const struct tune_params cortexa53_tunings =
&generic_branch_cost,
4, /* memmov_cost  */
2, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
 | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
8,  /* function_align.  */
8,  /* jump_align.  */
@@ -398,7 +398,7 @@ static const struct tune_params cortexa57_tunings =
&generic_branch_cost,
4, /* memmov_cost  */
3, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
 | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
16, /* function_align.  */
8,  /* jump_align.  */
@@ -420,7 +420,7 @@ static const struct tune_params cortexa72_tunings =
&generic_branch_cost,
4, /* memmov_cost  */
3, /* issue_rate  */
-  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
 | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
16, /* function_align.  */
8,  /* jump_align.  */
@@ -12843,6 +12843,10 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
  }
  }
  
+  if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_AES_AESMC)

+   && aarch_crypto_can_dual_issue (prev, curr))
+return true;
+
if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_CMP_BRANCH)
&& any_condjump_p (curr))
  {
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 5dd8222..e191ab6 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -63,8 +63,11 @@ aarch_crypto_can_dual_issue (rtx_insn *producer_insn, 
rtx_insn *consumer_insn)
{
  unsigned int regno = REGNO (SET_DEST (producer_set));
  
-return REGNO (SET_DEST (consumer_set)) == regno

-   && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
+/* Before reload the registers are virtual, so the destination of
+   consumer_set doesn't need to match.  */
+
+return (REGNO (SET_DEST (consumer_set)) == regno || !reload_completed)
+   && REGNO (XVECEXP (consumer_src, 0, 0)) == regno;
}
  
return 0;

diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index a32c848..eab9d99 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -745,20 +745,20 @@
 neon_fp_sqrt_s_q, neon_fp_sqrt_d_q"))
"ca57_cx2_block*3")
  
-(define_insn_reservation "cortex_a57_crypto_simple" 4

+(define_insn_reservation "cortex_a57_crypto_simple" 3
(and (eq_attr "tune" "

Re: [C PATCH] Don't accept invalid attribute-list (PR c/67964)

2015-10-20 Thread Joseph Myers
On Tue, 20 Oct 2015, Marek Polacek wrote:

> Joseph noticed that we were wrongly accepting multiple attributes without
> commas.  Thus fixed by breaking out of the loop when parsing the attributes --
> if we don't see a comma after an attribute, then the next tokens must be )); 
> if
> not, then the attribute is invalid.  I've also added a few more comments.
> 
> (I think it would be nicer to model the code after C++ and introduce
> c_parser_attributes_list, but the following is enough to fix the bug.)
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof

2015-10-20 Thread Joseph Myers
On Tue, 20 Oct 2015, Martin Sebor wrote:

> I think -Warray-bounds should emit consistent diagnostics for invalid
> array references regardless of the contexts. I.e., given
> 
> struct S {
> int A [5][7];
> int x;
> } s;
> 
> these should both be diagnosed:
> 
> int i = offsetof (struct S, A [0][7]);
> 
> int *p = &s.A [0][7];
> 
> because they are both undefined and both can lead to surprising
> results when used.

But both are valid.  &s.A [0][7] means s.A[0] + 7 (as explicitly specified 
in C11, neither the & nor the [] is evaluated in this case, but the [] 
turns into a +), and s.A[0] is an object of type int[7], which decays to a 
pointer to the first element of that array, so adding 7 produces a 
just-past-end pointer.  It's not valid to dereference that pointer, but 
the pointer itself is valid (and subtracting 1 from it produces a pointer 
you can dereference).
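
Spelled out as a small sketch (illustration only, not the testcase from the
bug):

  struct S { int A [5][7]; int x; } s;

  void
  f (void)
  {
    int *p = &s.A[0][7];   /* s.A[0] + 7: a valid just-past-end pointer  */
    int  a = p[-1];        /* OK: reads s.A[0][6]                        */
    int  b = *p;           /* undefined: dereferences past the end       */
  }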

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof

2015-10-20 Thread Martin Sebor

On 10/20/2015 09:48 AM, Bernd Schmidt wrote:



On 10/20/2015 05:31 PM, Martin Sebor wrote:

On 10/20/2015 07:20 AM, Bernd Schmidt wrote:

On 10/16/2015 09:34 PM, Martin Sebor wrote:

Thank you for the review. Attached is an updated patch that hopefully
addresses all your comments. I ran the check_GNU_style.sh script on
it to make sure I didn't miss something.  I've also added replies to
a few of your comments below.


Ok, thanks. However - the logic around the context struct in the patch
still seems fairly impenetrable to me, and I've been wondering whether a
simpler approach wouldn't work just as well. I came up with the patch
below, which just passes a tree_code context to recursive invocations
and increases the upper bound by one only if not in an ARRAY_REF or
COMPONENT_REF context.

As far as I can tell, this produces the same warnings on the testcase
(modulo differences in type printout), except for the final few FA5_7
testcases, which I had doubts about anyway:

typedef struct FA5_7 {
   int i;
   char a5_7 [5][7];
} FA5_7;

 __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning
"index" }
 __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning
"index" }

Why wouldn't at least the first two be convered by the
one-past-the-end-is-ok rule?


Thanks for taking the time to try to simplify the patch!

Diagnosing the cases above is at the core of the issue (and can
be seen by substituting a5_7 into the oversimplified test case
in the bug).

In a 5 by 7 array, the offset computed for the out-of-bounds
a5_7[0][7] is the same as the offset computed for the in-bounds
a5_7[1][0].  The result might be unexpected because it suggests
that two "distinct" elements of an array share the same offset.


Yeah, but Joseph wrote:

   Given such a flexible array member, [anything][0] and
   [anything][1] are logically valid if [anything] is
   nonnegative (of course, the array might or might not
   have enough element at runtime).  [anything][2] is a
   reference to just past the end of an element of the
   flexible array member; [anything][3] is clearly invalid.

   Regarding the actual testcase, accepting a1_1[0][1] seems
   fine (it's referencing the just-past-end element of the
   valid array a1_1[0]).

So how's that different from a5_7[0][7]? In case it matters, a1_1 in the
earlier test that was discussed could not be a flexible array because it
was followed by another field in the same struct.


Let me try to explain using a couple of examples. Given

  struct S {
  a1_1 [1][1];
  int x;
  } s;

where a1_1 is an ordinary array of a single element

  offseof (struct S, a1_1[0][1])

is accepted as a GCC extension (and, IMO, should be made valid
in C) because a) there is no way to mistake the offset for the
offset of an element of the array at a valid index, and because
b) it yields to the same value as the following valid expression:

  (char*)&s.a1_1[0][1] - (char*)&s.a1_1;

However, given

  struct S {
  a5_7 [5][7];
  int x;
  } s;

the invalid offsetof expression

  offseof (struct S, a5_7[0][7])

should be diagnosed because a) it yields the same offset as the
valid

  offseof (struct S, a5_7[1][0])

and b) because s.a5_7[0][7] is not a valid reference in any context
(i.e., (char*)&s.a5_7[0][7] - (char*)&s.a5_7 is not a valid expression).
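
Concretely, assuming a 4-byte int and no padding in FA5_7 (illustration only):

  offsetof (FA5_7, a5_7 [0][7]) == 4 + 0*7 + 7 == 11
  offsetof (FA5_7, a5_7 [1][0]) == 4 + 1*7 + 0 == 11

so the out-of-bounds designator silently aliases a different, perfectly
valid element, which is what makes it worth diagnosing.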



Handing over review duties to Joseph because he'll have to decide what
to warn about and what not to.


struct S {
int A [5][7];
int x;
} s;

these should both be diagnosed:

int i = offsetof (struct S, A [0][7]);

int *p = &s.A [0][7];

because they are both undefined and both can lead to surprising
results when used.


Is that really undefined? I thought it was explicitly allowed.
Dereferencing the pointer is what's undefined.


Both are undefined. The most succinct statements to this effect are
in the Undefined Behavior section of Annex J:

  An array subscript is out of range, even if an object is apparently
  accessible with the given subscript (as in the lvalue expression
  a[1][7] given the declaration int a[4][5]) (6.5.6).

  The member designator parameter of an offsetof macro is an
  invalid right operand of the . operator for the type parameter,
  or designates a bit-field (7.19).

Martin

