Re: Fix prototype for print_insn in rtl.h
On 10/15/2015 09:42 PM, Trevor Saunders wrote:
> > > Sorry, a little late to the party.. but why is print_insn even in
> > > rtl.h?  It seems that sched-vis.c is the only thing that uses it...
> > >
> > > Andrew
> >
> > I'm going to use it in the scheduler...
>
> but then wouldn't something like sched-int.h make more sense?  On the
> other hand, it seems like printing insns is generally useful
> functionality, so I'm curious why the scheduler needs its own way of
> doing it.
>
> Trev

As for me, I believe sched-int.h is an inappropriate place for the
print_insn prototype, because the function has nothing
scheduler-specific about it.  And I like Jeff's idea of removing
sched-vis.c and moving everything from it into print-rtl.[hc].  It
would be nice if such code motion also resulted in some interface and
implementation unification for the regular and slim dumpers.

Thanks,
Nikolai
Re: OpenACC async clause regressions (was: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data)
On Mon, Oct 19, 2015 at 07:43:59PM +0300, Ilya Verbin wrote:
> On Mon, Oct 19, 2015 at 18:24:35 +0200, Thomas Schwinge wrote:
> > Chung-Lin, would you please have a look at the following (on
> > gomp-4_0-branch)?  Also, anyone else got any ideas off-hand?
> >
> > PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> > [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-2.c
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> > PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
> > [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/data-3.c
> > -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
>
> Maybe it was caused by this change in gomp_unmap_vars?
> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01376.html
>
> Looking at the code, I don't see any difference in async_refcount
> handling, but I was unable to test it without having hardware :(

I think that is the only patch that could have affected it.  The
copy_from change departs from the old behavior, where basically all
concurrent mappings ORed into the copy_from flag, and when the
refcount went to 0, the data was copied back if any of those mappings
had "from" or "tofrom".  The OpenMP 4.5 behavior is that whether data
is copied from the device is determined solely by the mapping kind of
the mapping that performs the refcount decrease to 0.  Plus there is
the "always" flag, which requests the data copying operation
unconditionally, no matter what the refcount is (either on the
mapping/refcount-increase side or the unmapping/refcount-decrease
side).

	Jakub
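Jakub's description of the old vs. OpenMP 4.5 copy-back rules can be sketched as a toy refcount model in plain C. This is an illustration only, not libgomp's actual data structures or functions (struct mapping, unmap_old and unmap_45 are invented names):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a device mapping refcount; not libgomp's real structures.  */
struct mapping
{
  int refcount;
  bool any_from;   /* old semantics: OR of every mapping's from/tofrom bit */
};

/* Old behavior: when the refcount reaches 0, copy back if *any*
   concurrent mapping had "from" or "tofrom".  Returns true if the
   unmap copies data back from the device.  */
static bool
unmap_old (struct mapping *m, bool kind_from)
{
  m->any_from |= kind_from;
  return --m->refcount == 0 && m->any_from;
}

/* OpenMP 4.5 behavior: only the kind of the mapping that performs the
   decrease to 0 matters; "always" copies regardless of the refcount.  */
static bool
unmap_45 (struct mapping *m, bool kind_from, bool always)
{
  --m->refcount;
  if (always)
    return kind_from;
  return m->refcount == 0 && kind_from;
}
```

Under the old rule an inner map(from:...) forces a copy-back even when the final unmap has no "from" kind; under the 4.5 rule it does not, which is the kind of behavioral change that could show up as the execution failures above.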
Re: [PATCH] Move cproj simplification to match.pd
On Mon, 19 Oct 2015, Christophe Lyon wrote:
> On 19 October 2015 at 15:54, Richard Biener wrote:
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
>
> Hi Richard,
>
> This patch caused arm and aarch64 builds of newlib to ICE:
>
> In file included from
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdlib.h:11:0,
>                  from
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:13:
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:
> In function '_mktm_r':
> /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:28:9:
> internal compiler error: Segmentation fault
>  _DEFUN (_mktm_r, (tim_p, res, is_gmtime),
> 0xa90205 crash_signal
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:353
> 0x7b3b0c tree_class_check
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3055
> 0x7b3b0c tree_single_nonnegative_warnv_p(tree_node*, bool*, int)
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13025
> 0x814053 gimple_phi_nonnegative_warnv_p
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6239
> 0x814053 gimple_stmt_nonnegative_warnv_p(gimple*, bool*, int)
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6264
> 0x7b5c94 tree_expr_nonnegative_p(tree_node*)
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13325
> 0xe2f657 gimple_simplify_108
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:5116
> 0xe3060d gimple_simplify_TRUNC_MOD_EXPR
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:24762
> 0xe0809b gimple_simplify
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:34389
> 0xe08c2b gimple_resimplify2(gimple**, code_helper*, tree_node*, tree_node**, tree_node* (*)(tree_node*))
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:193
> 0xe17600 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**, tree_node* (*)(tree_node*), tree_node* (*)(tree_node*))
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:762
> 0x81c694 fold_stmt_1
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:3605
> 0xad0f6c replace_uses_by(tree_node*, tree_node*)
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1835
> 0xad1a2f gimple_merge_blocks
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1921
> 0x67d325 merge_blocks(basic_block_def*, basic_block_def*)
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:776
> 0xae06da cleanup_tree_cfg_bb
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:654
> 0xae1118 cleanup_tree_cfg_1
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:686
> 0xae1118 cleanup_tree_cfg_noloop
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:738
> 0xae1118 cleanup_tree_cfg()
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:793
> 0x9c5c94 execute_function_todo
> 	/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:1920
> Please submit a full bug report,
>
> This happens for instance with GCC configured
> --target arm-none-eabi
> --with-cpu cortex-a9
>
> You can download logs of a failed build from
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-build/trunk/228970/build.html
>
> Sorry, I'm out of office for one week, I can't produce further details.

OK, without preprocessed source it's hard to track down, but from the
backtrace I figure it's an issue similar to that of PR67815.  So a
testcase would be really nice.

Richard.

> Christophe
>
> > Richard.
> >
> > 2015-10-19  Richard Biener
> >
> > 	* gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function.
> > 	(gimple_stmt_nonnegative_warnv_p): Use it.
> > 	* match.pd (CPROJ): New operator list.
> > 	(cproj (complex ...)): Move simplifications from ...
> > 	* builtins.c (fold_builtin_cproj): ... here.
> >
> > 	* gcc.dg/torture/builtin-cproj-1.c: Skip for -O0.
> >
> > Index: gcc/gimple-fold.c
> > ===
> > --- gcc/gimple-fold.c	(revision 228877)
> > +++ gcc/gimple-fold.c	(working copy)
> > @@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple
> >  				    strict_overflow_p, depth);
> >  }
> >
> > +/* Return true if return value of call STMT is known to be non-negative.
> > +   If
Re: [ARM] Use vector wide add for mixed-mode adds
Hi Kyrill,

Since your email I have done the following:

1. Added the ENDIAN_LANE_N to the define_expand patterns for big-endian
targets.  The big-endian patches produced no change in the test
results.  I still have several execution failures when targeting
big-endian with lto enabled.

2. I diff'd the rtl dumps from a big-endian compiler with lto enabled
and disabled.  I also examined the assembly language, and there were no
differences except for the .ascii directives.

I want to ask a question about existing patterns in neon.md that
utilize vec_select and all the lanes, as my example does: why are the
following patterns not matched if the target is big-endian?

(define_insn "neon_vec_unpack_lo_"
  [(set (match_operand: 0 "register_operand" "=w")
        (SE: (vec_select:
               (match_operand:VU 1 "register_operand" "w")
               (match_operand:VU 2 "vect_par_constant_low" ""]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl. %q0, %e1"
  [(set_attr "type" "neon_shift_imm_long")]
)

(define_insn "neon_vec_unpack_hi_"
  [(set (match_operand: 0 "register_operand" "=w")
        (SE: (vec_select:
               (match_operand:VU 1 "register_operand" "w")
               (match_operand:VU 2 "vect_par_constant_high" ""]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl. %q0, %f1"
  [(set_attr "type" "neon_shift_imm_long")]
)

These patterns are similar to the new patterns I am adding, and I am
wondering if my patterns should also exclude BYTES_BIG_ENDIAN.

On 10/08/2015 04:02 AM, Kyrill Tkachov wrote:

Hi Michael,

On 01/10/15 11:05, Michael Collison wrote:

Kyrill,

I have modified the patch to address your comments.  I also modified
check_effective_target_vect_widen_sum_hi_to_si_pattern in
target-supports.exp to indicate that arm neon supports vector widen
sum of HImode to SImode.  This resolved several test suite failures.

Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf.  I
have four related execution test failures on armeb-none-linux-gnueabihf
with -flto only:

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test

We'd want to get to the bottom of these before committing.  Does
codegen before and after the patch show anything?  When it comes to
big-endian and NEON, the fiddly parts are usually lane numbers.  Do you
need to select the proper lanes with ENDIAN_LANE_N like Charles does in
his patch at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00656.html?

Thanks,
Kyrill

I am debugging but have not tracked down the root cause yet.

Feedback?

2015-07-22  Michael Collison

	* config/arm/neon.md (widen_sum): New patterns where
	mode is VQI to improve mixed-mode vectorization.
	* config/arm/neon.md (vec_sel_widen_ssum_lo3): New define_insn
	to match low half of signed vaddw.
	* config/arm/neon.md (vec_sel_widen_ssum_hi3): New define_insn
	to match high half of signed vaddw.
	* config/arm/neon.md (vec_sel_widen_usum_lo3): New define_insn
	to match low half of unsigned vaddw.
	* config/arm/neon.md (vec_sel_widen_usum_hi3): New define_insn
	to match high half of unsigned vaddw.
	* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
	* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
	* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
	* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
	* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
	* testsuite/lib/target-supports.exp
	(check_effective_target_vect_widen_sum_hi_to_si_pattern):
	Indicate that arm neon supports vector widen sum of HImode
	to SImode.

Note that the testsuite changes should have their own ChangeLog entry,
with the paths there starting relative to gcc/testsuite/.

On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:

Hi Michael,

On 23/09/15 00:52, Michael Collison wrote:

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes.  These patches have been submitted
separately and approved.

This patch is designed to address code that was not being vectorized
due to missing widening patterns in the ARM backend.  Code such as:

int t6 (int len, void *dummy, short *__restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-09-22  Michael Collison

	* config/arm/neon.md (widen_sum): New patterns where
	mode is VQI to improve mixed-mode add vectorization.

Please list all the new define_expands and define_insns in the
changelog.  Also, please add a ChangeLog entry for the testsuite
additions.  The approach looks ok to me, with a few comments on some p
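The mixed-mode add these patterns target can be modeled element-wise in scalar C. This is a sketch of what a NEON widening add such as vaddw.s16 does (16-bit elements sign-extended and added to 32-bit accumulators), not an intrinsic or part of the patch itself; widen_add_s16 is an invented name:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a signed widening add: add eight 16-bit elements to
   eight 32-bit accumulators, widening each element first.  This is the
   mixed-mode add the new define_insns let the vectorizer form directly,
   instead of unpacking to 32 bits and adding in two separate steps.  */
static void
widen_add_s16 (int32_t acc[8], const int16_t x[8])
{
  for (int i = 0; i < 8; i++)
    acc[i] += (int32_t) x[i];   /* sign-extend, then add */
}
```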
Re: [PATCH] Move cproj simplification to match.pd
On October 19, 2015 3:54:05 PM GMT+02:00, Richard Biener wrote:
>
> +static bool
> +gimple_phi_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
> +				int depth)
> +{

Shouldn't all such depth parms be unsigned short, or at least
unsigned int?

Thanks,
Re: [PATCH, wwwdocs] Add -march=skylake-avx512 to gcc-6/changes.html.
On 17 Oct 02:37, Gerald Pfeifer wrote:
> On Fri, 16 Oct 2015, Kirill Yukhin wrote:
> > Is it ok to install?
>
> Yes, just add a "the" before "following".
>
> > This switch was backported to gcc-5.
> > Is it ok to create a new section `GCC 5.3' and put it there,
> > or do I need to wait for the actual release?
>
> How about if you add it now, and add "(pending)" to that title?

Thanks a lot, Gerald.  I've implemented both suggestions.  Checked in.

> Gerald

--
Thanks, K
Move more cproj simplifications to match.pd
Richi and I both had patches for cproj.  I thought I might as well post
a rebased version of what I had, since for the other patches I'd been
moving the constant handling into fold_builtin_1 (where functions
without combinatorial folds also handled constants).  I'm hoping the
switch statement there will eventually become the switch statement for
folding a function with constant arguments.

The patch also makes build_complex_cproj available globally and uses it
for the existing match.pd rules.

Certainly not a big deal or a big improvement, but... tested on
x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.  OK to
install?

Thanks,
Richard

gcc/
	* builtins.c (fold_builtin_cproj): Delete.
	(fold_builtin_1): Handle constant arguments here.
	(build_complex_cproj): Move to...
	* tree.c: ...here.
	* tree.h (build_complex_cproj): Declare.
	* match.pd: Fold cproj(x)->x if x has no infinity.
	Use build_complex_cproj for existing cproj rules.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index a9872c4..16f3bfd 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7539,50 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
   return NULL_TREE;
 }
 
-/* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
-   complex tree type of the result.  If NEG is true, the imaginary
-   zero is negative.  */
-
-static tree
-build_complex_cproj (tree type, bool neg)
-{
-  REAL_VALUE_TYPE rinf, rzero = dconst0;
-
-  real_inf (&rinf);
-  rzero.sign = neg;
-  return build_complex (type, build_real (TREE_TYPE (type), rinf),
-			build_real (TREE_TYPE (type), rzero));
-}
-
-/* Fold call to builtin cproj, cprojf or cprojl with argument ARG.  TYPE is the
-   return type.  Return NULL_TREE if no simplification can be made.  */
-
-static tree
-fold_builtin_cproj (location_t loc, tree arg, tree type)
-{
-  if (!validate_arg (arg, COMPLEX_TYPE)
-      || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
-    return NULL_TREE;
-
-  /* If there are no infinities, return arg.  */
-  if (! HONOR_INFINITIES (type))
-    return non_lvalue_loc (loc, arg);
-
-  /* Calculate the result when the argument is a constant.  */
-  if (TREE_CODE (arg) == COMPLEX_CST)
-    {
-      const REAL_VALUE_TYPE *real = TREE_REAL_CST_PTR (TREE_REALPART (arg));
-      const REAL_VALUE_TYPE *imag = TREE_REAL_CST_PTR (TREE_IMAGPART (arg));
-
-      if (real_isinf (real) || real_isinf (imag))
-	return build_complex_cproj (type, imag->sign);
-      else
-	return arg;
-    }
-
-  return NULL_TREE;
-}
-
 /* Fold function call to builtin tan, tanf, or tanl with argument ARG.
    Return NULL_TREE if no simplification can be made.  */
 
@@ -9505,7 +9461,20 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       break;
 
     CASE_FLT_FN (BUILT_IN_CPROJ):
-      return fold_builtin_cproj (loc, arg0, type);
+      if (TREE_CODE (arg0) == COMPLEX_CST
+	  && TREE_CODE (TREE_TYPE (TREE_TYPE (arg0))) == REAL_TYPE)
+	{
+	  const REAL_VALUE_TYPE *real
+	    = TREE_REAL_CST_PTR (TREE_REALPART (arg0));
+	  const REAL_VALUE_TYPE *imag
+	    = TREE_REAL_CST_PTR (TREE_IMAGPART (arg0));
+
+	  if (real_isinf (real) || real_isinf (imag))
+	    return build_complex_cproj (type, imag->sign);
+	  else
+	    return arg0;
+	}
+      break;
 
     CASE_FLT_FN (BUILT_IN_CSIN):
       if (validate_arg (arg0, COMPLEX_TYPE)
diff --git a/gcc/match.pd b/gcc/match.pd
index aaca3a0..7d16c52 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2448,30 +2448,24 @@ along with GCC; see the file COPYING3.  If not see
  (CABS (complex @0 @0))
  (mult (abs @0) { build_real_truncate (type, dconst_sqrt2 ()); })))
 
+/* cproj(x) -> x if we're ignoring infinities.  */
+(simplify
+ (CPROJ @0)
+ (if (!HONOR_INFINITIES (type))
+  @0))
+
 /* If the real part is inf and the imag part is known to be
    nonnegative, return (inf + 0i).  */
 (simplify
  (CPROJ (complex REAL_CST@0 tree_expr_nonnegative_p@1))
  (if (real_isinf (TREE_REAL_CST_PTR (@0)))
-  (with
-    {
-      REAL_VALUE_TYPE rinf;
-      real_inf (&rinf);
-    }
-    { build_complex (type, build_real (TREE_TYPE (type), rinf),
-		     build_zero_cst (TREE_TYPE (type))); })))
+   { build_complex_cproj (type, false); }))
+
 /* If the imag part is inf, return (inf+I*copysign(0,imag)).  */
 (simplify
  (CPROJ (complex @0 REAL_CST@1))
  (if (real_isinf (TREE_REAL_CST_PTR (@1)))
-  (with
-    {
-      REAL_VALUE_TYPE rinf, rzero = dconst0;
-      real_inf (&rinf);
-      rzero.sign = TREE_REAL_CST_PTR (@1)->sign;
-    }
-    { build_complex (type, build_real (TREE_TYPE (type), rinf),
-		     build_real (TREE_TYPE (type), rzero)); })))
+   { build_complex_cproj (type, TREE_REAL_CST_PTR (@1)->sign); }))
 
 /* Narrowing of arithmetic and logical operations.
 
diff --git a/gcc/tree.c b/gcc/tree.c
index 9d0e9de..1405328 100644
--- a/gcc/tree.c
+++ b
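The folds being moved here implement C99 cproj semantics: finite inputs pass through, and any input with an infinite part projects to (inf + I*copysign(0, imag)). For reference, a standalone C version of the same three cases (my_cproj is an illustrative name; CMPLX requires C11, and the __real__/__imag__ GNU extensions are used to avoid needing libm):

```c
#include <assert.h>
#include <complex.h>
#include <math.h>

/* Reference model of the match.pd cproj rules above: finite values are
   returned unchanged; an infinite real or imaginary part projects onto
   the point at infinity, (inf + I*copysign (0.0, imag)).  */
static double complex
my_cproj (double complex z)
{
  if (isinf (__real__ z) || isinf (__imag__ z))
    return CMPLX (INFINITY, signbit (__imag__ z) ? -0.0 : 0.0);
  return z;
}
```

The second rule in the patch handles exactly the "real part inf, imag known nonnegative" case (result inf + 0i); the third handles "imag part inf", where only the sign of the imaginary zero survives.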
Move tan simplifications to match.pd
Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard

gcc/
	* builtins.c (fold_builtin_tan): Delete.
	(fold_builtin_1): Handle constant tan arguments here.
	* match.pd: Simplify (tan (atan x)) to x.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 16f3bfd..3244538 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -160,7 +160,6 @@ static rtx expand_builtin_fabs (tree, rtx, rtx);
 static rtx expand_builtin_signbit (tree, rtx);
 static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
 static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
-static tree fold_builtin_tan (tree, tree);
 static tree fold_builtin_trunc (location_t, tree, tree);
 static tree fold_builtin_floor (location_t, tree, tree);
 static tree fold_builtin_ceil (location_t, tree, tree);
@@ -7539,33 +7538,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
   return NULL_TREE;
 }
 
-/* Fold function call to builtin tan, tanf, or tanl with argument ARG.
-   Return NULL_TREE if no simplification can be made.  */
-
-static tree
-fold_builtin_tan (tree arg, tree type)
-{
-  enum built_in_function fcode;
-  tree res;
-
-  if (!validate_arg (arg, REAL_TYPE))
-    return NULL_TREE;
-
-  /* Calculate the result when the argument is a constant.  */
-  if ((res = do_mpfr_arg1 (arg, type, mpfr_tan, NULL, NULL, 0)))
-    return res;
-
-  /* Optimize tan(atan(x)) = x.  */
-  fcode = builtin_mathfn_code (arg);
-  if (flag_unsafe_math_optimizations
-      && (fcode == BUILT_IN_ATAN
-	  || fcode == BUILT_IN_ATANF
-	  || fcode == BUILT_IN_ATANL))
-    return CALL_EXPR_ARG (arg, 0);
-
-  return NULL_TREE;
-}
-
 /* Fold function call to builtin sincos, sincosf, or sincosl.  Return
    NULL_TREE if no simplification can be made.  */
 
@@ -9613,7 +9585,9 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       break;
 
     CASE_FLT_FN (BUILT_IN_TAN):
-      return fold_builtin_tan (arg0, type);
+      if (validate_arg (arg0, REAL_TYPE))
+	return do_mpfr_arg1 (arg0, type, mpfr_tan, NULL, NULL, 0);
+      break;
 
     CASE_FLT_FN (BUILT_IN_CEXP):
       return fold_builtin_cexp (loc, arg0, type);
diff --git a/gcc/match.pd b/gcc/match.pd
index 7d16c52..cd02b04 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 (define_operator_list SIN BUILT_IN_SINF BUILT_IN_SIN BUILT_IN_SINL)
 (define_operator_list COS BUILT_IN_COSF BUILT_IN_COS BUILT_IN_COSL)
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
+(define_operator_list ATAN BUILT_IN_ATANF BUILT_IN_ATAN BUILT_IN_ATANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
 (define_operator_list CPROJ BUILT_IN_CPROJF BUILT_IN_CPROJ BUILT_IN_CPROJL)
@@ -2343,7 +2344,14 @@ along with GCC; see the file COPYING3.  If not see
  /* cbrt(expN(x)) -> expN(x/3).  */
  (simplify
   (cbrts (exps @0))
-  (exps (mult @0 { build_real_truncate (type, dconst_third ()); })
+  (exps (mult @0 { build_real_truncate (type, dconst_third ()); }
+
+ /* tan(atan(x)) -> x.  */
+ (for tans (TAN)
+      atans (ATAN)
+  (simplify
+   (tans (atans @0))
+   @0)))
 
 /* cabs(x+0i) or cabs(0+xi) -> abs(x).  */
 (simplify
Re: using scratchpads to enhance RTL-level if-conversion: revised patch
On Tue, Oct 20, 2015 at 7:43 AM, Jeff Law wrote:
> On 10/14/2015 01:15 PM, Bernd Schmidt wrote:
>> On 10/14/2015 07:43 PM, Jeff Law wrote:
>>> Obviously some pessimization relative to current code is necessary
>>> to fix some of the problems WRT thread safety and avoiding things
>>> like introducing faults in code which did not previously fault.
>>
>> Huh?  This patch is purely an (attempt at) optimization, not
>> something that fixes any problems.
>
> Then I must be mentally merging two things Abe has been working on.
> He's certainly had an if-converter patch that was designed to avoid
> introducing races in code that didn't previously have races.
>
> Looking back through the archives, that appears to be the case.  His
> patches to avoid racing are for the tree-level if-converter, not the
> RTL if-converter.

Even for the tree level this wasn't the case; he just ran into a bug of
the existing converter that I've fixed meanwhile.

> Sigh, sorry for the confusion.  It's totally my fault.  Assuming Abe
> doesn't have a correctness case at all here, then I don't see any way
> for the code to go forward as-is, since it's likely making things
> significantly worse.
>
>> I can't test valgrind right now, it fails to run on my machine, but
>> I guess it could adapt to allow stores slightly below the stack
>> (maybe warning once)?  It seems like a bit of an edge case to worry
>> about, but if supporting it is critical and it can't be changed to
>> adapt to new optimizations, then I think we're probably better off
>> entirely without this scratchpad transformation.
>>
>> Alternatively I can think of a few other possible approaches which
>> wouldn't require this kind of bloat:
>> * add support for allocating space in the stack redzone.  That could
>>   be interesting for the register allocator as well.  Would help
>>   only x86_64, but that's a large fraction of gcc's userbase.
>> * add support for opportunistically finding unused alignment padding
>>   in the existing stack frame.  Less likely to work, but would
>>   produce better results when it does.
>> * on embedded targets we probably don't have to worry about
>>   valgrind, so do the optimal (sp - x) thing there.
>> * allocate a single global as the dummy target.  Might be more
>>   expensive to load the address on some targets, though.
>> * at least find a way to express costs for this transformation.
>>   Difficult, since you don't yet necessarily know if the function is
>>   going to have a stack frame.  Hence, IMO this approach is flawed.
>>   (You'll still want cost estimates even when not allocating stuff
>>   in the normal stack frame, because generated code will still
>>   execute between two and four extra instructions.)
>
> One could argue these should all be on the table.  However, I tend to
> really dislike using area beyond the current stack.  I realize it's
> throw-away data, but it just seems like a bad idea to me -- even on
> embedded targets that don't support valgrind.
Re: [PR fortran/63858] Fix mix of OpenACC and OpenMP sentinels in continuations
On Fri, Oct 09, 2015 at 12:15:24PM +0200, Thomas Schwinge wrote:
> diff --git gcc/fortran/scanner.c gcc/fortran/scanner.c
> index bfb7d45..1e1ea84 100644
> --- gcc/fortran/scanner.c
> +++ gcc/fortran/scanner.c
> @@ -935,6 +935,63 @@ skip_free_comments (void)
>    return false;
>  }
>
> +/* Return true if MP was matched in fixed form.  */
> +static bool
> +skip_omp_attribute_fixed (locus *start)

Technically, this isn't an attribute, but a sentinel.  So,
skip_fixed_omp_sentinel?  I know the free-form functions are called
"attribute"; perhaps we should rename them too, patch to do so
preapproved.

> +{
> +  gfc_char_t c;
> +  if (((c = next_char ()) == 'm' || c == 'M')
> +      && ((c = next_char ()) == 'p' || c == 'P'))
> +    {
> +      c = next_char ();
> +      if (c != '\n'
> +	  && (continue_flag

The original code checked here (openmp_flag && continue_flag) instead.
Is that change intentional?

Looking around, we probably don't have testcase coverage for, say,
fixed form:

C*OMP+PARALLEL DO
      do ...

(i.e. where it starts with an OpenMP (or OpenACC) continuation,
without a non-continued line first), or for free form where:

something &
!$omp & parallel

(ditto for OpenACC).

> +      while (gfc_is_whitespace (c));
> +      if (c != '\n' && c != '!')
> +	{
> +	  /* Canonicalize to *$omp.  */

The comment has a pasto: by storing * you canonicalize to *$acc.

> -  if (flag_openacc)
> +  if (flag_openacc || (flag_openmp || flag_openmp_simd))

I'd just write if (flag_openacc || flag_openmp || flag_openmp_simd);
the ()s around are just misleading.

Anyway, if the removal of "openmp_flag &&" is intentional, then the
patch is ok with the above mentioned changes.  We can deal with the
cases I've mentioned up above in a follow-up.

	Jakub
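The fixed-form sentinel rules under discussion (columns 1-5 hold "!$omp"/"c$omp"/"*$omp" case-insensitively, and a non-blank, non-zero character in column 6 marks a continuation line such as "C$OMP+") can be sketched as a toy matcher in C. This is a simplified model, not gfortran's actual scanner; the real code works on gfc_char_t, tracks locations, and handles the OpenACC "$acc" sentinel as well:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Toy recognizer for a fixed-form OpenMP sentinel.  Returns true if
   LINE begins with c$omp, *$omp or !$omp (any case); *CONTINUATION is
   set when column 6 holds a continuation marker (anything other than
   blank or '0').  */
static bool
fixed_omp_sentinel (const char *line, bool *continuation)
{
  if (!line[0] || !strchr ("cC*!", line[0]) || line[1] != '$')
    return false;
  if (tolower ((unsigned char) line[2]) != 'o'
      || tolower ((unsigned char) line[3]) != 'm'
      || tolower ((unsigned char) line[4]) != 'p')
    return false;
  char col6 = line[5];
  *continuation = (col6 != '\0' && col6 != ' ' && col6 != '0');
  return true;
}
```

The continuation-only case Jakub flags corresponds here to a line whose very first sentinel already sets *continuation, with no initial (column-6-blank) sentinel line seen before it.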
Re: Move more cproj simplifications to match.pd
On Tue, Oct 20, 2015 at 11:03 AM, Richard Sandiford wrote:
> Richi and I both had patches for cproj.  I thought I might as well
> post a rebased version of what I had, since for the other patches I'd
> been moving the constant handling into fold_builtin_1 (where functions
> without combinatorial folds also handled constants).  I'm hoping the
> switch statement there will eventually become the switch statement
> for folding a function with constant arguments.
>
> The patch also makes build_complex_cproj available globally and uses it
> for the existing match.pd rules.
>
> Certainly not a big deal or a big improvement, but... tested on
> x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.  OK to
> install?

Ok with build_complex_cproj renamed to something more descriptive -
build_complex_inf?  After all, it builds a complex Inf with an imag -0
eventually.

Richard.

> Thanks,
> Richard
>
>
> gcc/
> 	* builtins.c (fold_builtin_cproj): Delete.
> 	(fold_builtin_1): Handle constant arguments here.
> 	(build_complex_cproj): Move to...
> 	* tree.c: ...here.
> 	* tree.h (build_complex_cproj): Declare.
> 	* match.pd: Fold cproj(x)->x if x has no infinity.
> 	Use build_complex_cproj for existing cproj rules.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index a9872c4..16f3bfd 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -7539,50 +7539,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
>    return NULL_TREE;
>  }
>
> -/* Build a complex (inf +- 0i) for the result of cproj.  TYPE is the
> -   complex tree type of the result.  If NEG is true, the imaginary
> -   zero is negative.  */
> -
> -static tree
> -build_complex_cproj (tree type, bool neg)
> -{
> -  REAL_VALUE_TYPE rinf, rzero = dconst0;
> -
> -  real_inf (&rinf);
> -  rzero.sign = neg;
> -  return build_complex (type, build_real (TREE_TYPE (type), rinf),
> -			build_real (TREE_TYPE (type), rzero));
> -}
> -
> -/* Fold call to builtin cproj, cprojf or cprojl with argument ARG.  TYPE is the
> -   return type.  Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_cproj (location_t loc, tree arg, tree type)
> -{
> -  if (!validate_arg (arg, COMPLEX_TYPE)
> -      || TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) != REAL_TYPE)
> -    return NULL_TREE;
> -
> -  /* If there are no infinities, return arg.  */
> -  if (! HONOR_INFINITIES (type))
> -    return non_lvalue_loc (loc, arg);
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if (TREE_CODE (arg) == COMPLEX_CST)
> -    {
> -      const REAL_VALUE_TYPE *real = TREE_REAL_CST_PTR (TREE_REALPART (arg));
> -      const REAL_VALUE_TYPE *imag = TREE_REAL_CST_PTR (TREE_IMAGPART (arg));
> -
> -      if (real_isinf (real) || real_isinf (imag))
> -	return build_complex_cproj (type, imag->sign);
> -      else
> -	return arg;
> -    }
> -
> -  return NULL_TREE;
> -}
> -
> /* Fold function call to builtin tan, tanf, or tanl with argument ARG.
>    Return NULL_TREE if no simplification can be made.  */
>
> @@ -9505,7 +9461,20 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
>       break;
>
>     CASE_FLT_FN (BUILT_IN_CPROJ):
> -      return fold_builtin_cproj (loc, arg0, type);
> +      if (TREE_CODE (arg0) == COMPLEX_CST
> +	  && TREE_CODE (TREE_TYPE (TREE_TYPE (arg0))) == REAL_TYPE)
> +	{
> +	  const REAL_VALUE_TYPE *real
> +	    = TREE_REAL_CST_PTR (TREE_REALPART (arg0));
> +	  const REAL_VALUE_TYPE *imag
> +	    = TREE_REAL_CST_PTR (TREE_IMAGPART (arg0));
> +
> +	  if (real_isinf (real) || real_isinf (imag))
> +	    return build_complex_cproj (type, imag->sign);
> +	  else
> +	    return arg0;
> +	}
> +      break;
>
>     CASE_FLT_FN (BUILT_IN_CSIN):
>       if (validate_arg (arg0, COMPLEX_TYPE)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index aaca3a0..7d16c52 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2448,30 +2448,24 @@ along with GCC; see the file COPYING3.  If not see
>   (CABS (complex @0 @0))
>   (mult (abs @0) { build_real_truncate (type, dconst_sqrt2 ()); })))
>
> +/* cproj(x) -> x if we're ignoring infinities.  */
> +(simplify
> + (CPROJ @0)
> + (if (!HONOR_INFINITIES (type))
> +  @0))
> +
> /* If the real part is inf and the imag part is known to be
>    nonnegative, return (inf + 0i).  */
> (simplify
>  (CPROJ (complex REAL_CST@0 tree_expr_nonnegative_p@1))
>  (if (real_isinf (TREE_REAL_CST_PTR (@0)))
> -  (with
> -    {
Re: Move tan simplifications to match.pd
On Tue, Oct 20, 2015 at 11:14 AM, Richard Sandiford wrote:
> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
> OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
> 	* builtins.c (fold_builtin_tan): Delete.
> 	(fold_builtin_1): Handle constant tan arguments here.
> 	* match.pd: Simplify (tan (atan x)) to x.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 16f3bfd..3244538 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -160,7 +160,6 @@ static rtx expand_builtin_fabs (tree, rtx, rtx);
>  static rtx expand_builtin_signbit (tree, rtx);
>  static tree fold_builtin_pow (location_t, tree, tree, tree, tree);
>  static tree fold_builtin_powi (location_t, tree, tree, tree, tree);
> -static tree fold_builtin_tan (tree, tree);
>  static tree fold_builtin_trunc (location_t, tree, tree);
>  static tree fold_builtin_floor (location_t, tree, tree);
>  static tree fold_builtin_ceil (location_t, tree, tree);
> @@ -7539,33 +7538,6 @@ fold_fixed_mathfn (location_t loc, tree fndecl, tree arg)
>    return NULL_TREE;
>  }
>
> -/* Fold function call to builtin tan, tanf, or tanl with argument ARG.
> -   Return NULL_TREE if no simplification can be made.  */
> -
> -static tree
> -fold_builtin_tan (tree arg, tree type)
> -{
> -  enum built_in_function fcode;
> -  tree res;
> -
> -  if (!validate_arg (arg, REAL_TYPE))
> -    return NULL_TREE;
> -
> -  /* Calculate the result when the argument is a constant.  */
> -  if ((res = do_mpfr_arg1 (arg, type, mpfr_tan, NULL, NULL, 0)))
> -    return res;
> -
> -  /* Optimize tan(atan(x)) = x.  */
> -  fcode = builtin_mathfn_code (arg);
> -  if (flag_unsafe_math_optimizations
> -      && (fcode == BUILT_IN_ATAN
> -	  || fcode == BUILT_IN_ATANF
> -	  || fcode == BUILT_IN_ATANL))
> -    return CALL_EXPR_ARG (arg, 0);
> -
> -  return NULL_TREE;
> -}
> -
> /* Fold function call to builtin sincos, sincosf, or sincosl.  Return
>    NULL_TREE if no simplification can be made.  */
>
> @@ -9613,7 +9585,9 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
>       break;
>
>     CASE_FLT_FN (BUILT_IN_TAN):
> -      return fold_builtin_tan (arg0, type);
> +      if (validate_arg (arg0, REAL_TYPE))
> +	return do_mpfr_arg1 (arg0, type, mpfr_tan, NULL, NULL, 0);
> +      break;
>
>     CASE_FLT_FN (BUILT_IN_CEXP):
>       return fold_builtin_cexp (loc, arg0, type);
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7d16c52..cd02b04 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
> (define_operator_list SIN BUILT_IN_SINF BUILT_IN_SIN BUILT_IN_SINL)
> (define_operator_list COS BUILT_IN_COSF BUILT_IN_COS BUILT_IN_COSL)
> (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
> +(define_operator_list ATAN BUILT_IN_ATANF BUILT_IN_ATAN BUILT_IN_ATANL)
> (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
> (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
> (define_operator_list CPROJ BUILT_IN_CPROJF BUILT_IN_CPROJ BUILT_IN_CPROJL)
> @@ -2343,7 +2344,14 @@ along with GCC; see the file COPYING3.  If not see
>   /* cbrt(expN(x)) -> expN(x/3).  */
>   (simplify
>    (cbrts (exps @0))
> -   (exps (mult @0 { build_real_truncate (type, dconst_third ()); })
> +   (exps (mult @0 { build_real_truncate (type, dconst_third ()); }
> +
> + /* tan(atan(x)) -> x.  */
> + (for tans (TAN)
> +      atans (ATAN)
> +  (simplify
> +   (tans (atans @0))
> +   @0)))
>
> /* cabs(x+0i) or cabs(0+xi) -> abs(x).  */
> (simplify
>
Re: [mask-vec_cond, patch 3/2] SLP support
2015-10-19 19:05 GMT+03:00 Jeff Law : > On 10/19/2015 05:21 AM, Ilya Enkovich wrote: >> >> Hi, >> >> This patch adds missing support for cond_expr with no embedded comparison >> in SLP. No new test added because vec cmp SLP test becomes (due to changes >> in bool patterns by the first patch) a regression test for this patch. Does >> it look OK? >> >> Thanks, >> Ilya >> -- >> gcc/ >> >> 2015-10-19 Ilya Enkovich >> >> * tree-vect-slp.c (vect_get_and_check_slp_defs): Allow >> cond_exp with no embedded comparison. >> (vect_build_slp_tree_1): Likewise. > > Is it even valid gimple to have a COND_EXPR that is anything other than a > conditional? > > From looking at gimplify_cond_expr, it looks like we could have a SSA_NAME > that's a bool as the conditional. Presumably we're allowing a vector of > bools as the conditional once we hit the vectorizer, which seems fairly > natural. Currently vectorizer doesn't handle such COND_EXPR and never produces VEC_COND_EXPR with SSA_NAME as a condition. Expand treats such VEC_COND_EXPR as implicit (OP < 0) case (we just have no optab to expand it with no comparison). But the first patch in this series[1] allows such conditions to enable re-using vector comparison result by multiple VEC_COND_EXPRs. > > OK. Please install when the prerequisites are installed. > > Thanks, > jeff > Thanks! Ilya [1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00862.html
[Ada] Pragma Constant_After_Elaboration
This patch allows external tools that utilize the front end abstract syntax tree to query for pragma Constant_After_Elaboration in the N_Contract node of a variable. No change in behavior, no need for a test. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Hristian Kirtchev * einfo.adb (Get_Pragma): Minor reformatting. Rename local constant Is_CDG to Is_CLS. Add pragma Constant_After_Elaboration to the list of classification pragmas. Index: einfo.adb === --- einfo.adb (revision 229023) +++ einfo.adb (working copy) @@ -6980,30 +6980,40 @@ function Get_Pragma (E : Entity_Id; Id : Pragma_Id) return Node_Id is - Is_CDG : constant Boolean := - Id = Pragma_Abstract_State or else - Id = Pragma_Async_Readers or else - Id = Pragma_Async_Writers or else - Id = Pragma_Depends or else - Id = Pragma_Effective_Reads or else - Id = Pragma_Effective_Writes or else - Id = Pragma_Extensions_Visible or else - Id = Pragma_Global or else - Id = Pragma_Initial_Condition or else - Id = Pragma_Initializes or else - Id = Pragma_Part_Of or else - Id = Pragma_Refined_Depends or else - Id = Pragma_Refined_Global or else - Id = Pragma_Refined_State; + + -- Classification pragmas + + Is_CLS : constant Boolean := + Id = Pragma_Abstract_State or else + Id = Pragma_Async_Readers or else + Id = Pragma_Async_Writers or else + Id = Pragma_Constant_After_Elaboration or else + Id = Pragma_Depends or else + Id = Pragma_Effective_Reads or else + Id = Pragma_Effective_Writes or else + Id = Pragma_Extensions_Visible or else + Id = Pragma_Global or else + Id = Pragma_Initial_Condition or else + Id = Pragma_Initializes or else + Id = Pragma_Part_Of or else + Id = Pragma_Refined_Depends or else + Id = Pragma_Refined_Global or else + Id = Pragma_Refined_State; + + -- Contract / test case pragmas + Is_CTC : constant Boolean := - Id = Pragma_Contract_Cases or else + Id = Pragma_Contract_Cases or else Id = Pragma_Test_Case; + + -- Pre / postcondition pragmas + Is_PPC : constant Boolean := - Id = 
Pragma_Precondition or else - Id = Pragma_Postcondition or else + Id = Pragma_Precondition or else + Id = Pragma_Postcondition or else Id = Pragma_Refined_Post; - In_Contract : constant Boolean := Is_CDG or Is_CTC or Is_PPC; + In_Contract : constant Boolean := Is_CLS or Is_CTC or Is_PPC; Item : Node_Id; Items : Node_Id; @@ -7018,7 +7028,7 @@ if No (Items) then return Empty; - elsif Is_CDG then + elsif Is_CLS then Item := Classifications (Items); elsif Is_CTC then
[Ada] Avoid overflow in Write_Int
This patch avoids an overflow on the most-negative number in Output.Write_Int. No simple test available. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Bob Duff * output.adb (Write_Int): Work with negative numbers in order to avoid negating Int'First and thereby causing overflow. Index: output.adb === --- output.adb (revision 229023) +++ output.adb (working copy) @@ -6,7 +6,7 @@ -- -- -- B o d y -- -- -- --- Copyright (C) 1992-2013, Free Software Foundation, Inc. -- +-- Copyright (C) 1992-2015, Free Software Foundation, Inc. -- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -350,6 +350,7 @@ procedure Write_Char (C : Character) is begin + pragma Assert (Next_Col in Buffer'Range); if Next_Col = Buffer'Length then Write_Eol; end if; @@ -406,17 +407,29 @@ --- procedure Write_Int (Val : Int) is + -- Type Int has one extra negative number (i.e. two's complement), so we + -- work with negative numbers here. Otherwise, negating Int'First will + -- overflow. + + subtype Nonpositive is Int range Int'First .. 0; + procedure Write_Abs (Val : Nonpositive); + -- Write out the absolute value of Val + + procedure Write_Abs (Val : Nonpositive) is + begin + if Val < -9 then +Write_Abs (Val / 10); -- Recursively write higher digits + end if; + + Write_Char (Character'Val (-(Val rem 10) + Character'Pos ('0'))); + end Write_Abs; + begin if Val < 0 then Write_Char ('-'); - Write_Int (-Val); - + Write_Abs (Val); else - if Val > 9 then -Write_Int (Val / 10); - end if; - - Write_Char (Character'Val ((Val mod 10) + Character'Pos ('0'))); + Write_Abs (-Val); end if; end Write_Int;
[Ada] Do not inline No_Return procedures in GNATprove mode
Inlining in GNATprove mode is made to improve results of formal verification in GNATprove. It should not be done on procedures marked No_Return, which are handled specially in GNATprove. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Yannick Moy * inline.adb (Can_Be_Inlined_In_GNATprove_Mode): Return False for procedures marked No_Return. * sem_util.ads (Enclosing_Declaration): Improve comment. Index: inline.adb === --- inline.adb (revision 229023) +++ inline.adb (working copy) @@ -1534,6 +1534,12 @@ elsif In_Package_Visible_Spec (Id) then return False; + -- Do not inline subprograms marked No_Return, possibly used for + -- signaling errors, which GNATprove handles specially. + + elsif No_Return (Id) then + return False; + -- Do not inline subprograms that have a contract on the spec or the -- body. Use the contract(s) instead in GNATprove. Index: sem_util.ads === --- sem_util.ads(revision 229023) +++ sem_util.ads(working copy) @@ -532,7 +532,8 @@ -- Returns the closest ancestor of Typ that is a CPP type. function Enclosing_Declaration (N : Node_Id) return Node_Id; - -- Returns the declaration node enclosing N, if any, or Empty otherwise + -- Returns the declaration node enclosing N (including possibly N itself), + -- if any, or Empty otherwise function Enclosing_Generic_Body (N : Node_Id) return Node_Id;
[Ada] Support for recording bind time environment info
This change adds support for recording a set of key=value pairs at the time an application is built (or more precisely at bind time), and making this information available at run time. Typical use case is to record a build timestamp: $ gnatmake record_build_time -bargs -VBUILD_TIME="`LANG=C date`" gcc -c record_build_time.adb gnatbind -VBUILD_TIME=Tue Oct 20 05:51:21 EDT 2015 -x record_build_time.ali gnatlink record_build_time.ali $ ./record_build_time BT=<> Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Thomas Quinot * Makefile.rtl: add the following... * g-binenv.ads, g-binenv.adb: New unit providing runtime access to bind time captured values ("bind environment") * init.c, s-init.ads: declare new global variable __gl_bind_env_addr. * bindgen.ads, bindgen.adb (Set_Bind_Env): record a bind environment key=value pair. (Gen_Bind_Env_String): helper to produce the bind environment data called in the binder generated file. (Gen_Output_File_Ada): Call the above (Gen_Adainit): Set __gl_bind_env_addr accordingly. * switch-b.adb: Support for command line switch -V (user interface to set a build environment key=value pair) * bindusg.adb: Document the above Index: impunit.adb === --- impunit.adb (revision 229023) +++ impunit.adb (working copy) @@ -238,6 +238,7 @@ ("g-alvevi", F), -- GNAT.Altivec.Vector_Views ("g-arrspl", F), -- GNAT.Array_Split ("g-awk ", F), -- GNAT.AWK +("g-binenv", F), -- GNAT.Bind_Environment ("g-boubuf", F), -- GNAT.Bounded_Buffers ("g-boumai", F), -- GNAT.Bounded_Mailboxes ("g-bubsor", F), -- GNAT.Bubble_Sort Index: bindusg.adb === --- bindusg.adb (revision 229023) +++ bindusg.adb (working copy) @@ -4,9 +4,9 @@ -- -- -- B I N D U S G-- -- -- ---B o d y -- +-- B o d y -- -- -- --- Copyright (C) 1992-2014, Free Software Foundation, Inc. -- +-- Copyright (C) 1992-2015, Free Software Foundation, Inc. 
-- -- -- -- GNAT is free software; you can redistribute it and/or modify it under -- -- terms of the GNU General Public License as published by the Free Soft- -- @@ -228,6 +228,10 @@ Write_Line (" -vVerbose mode. Error messages, " & "header, summary output to stdout"); + -- Line for -V switch + + Write_Line (" -Vkey=val Record bind-time variable key " & + "with value val"); -- Line for -w switch Write_Line (" -wx Warning mode. (x=s/e for " & Index: bindgen.adb === --- bindgen.adb (revision 229023) +++ bindgen.adb (working copy) @@ -35,6 +35,7 @@ with Osint.B; use Osint.B; with Output; use Output; with Rident; use Rident; +with Stringt; use Stringt; with Table;use Table; with Targparm; use Targparm; with Types;use Types; @@ -43,6 +44,7 @@ with System.WCh_Con; use System.WCh_Con; with GNAT.Heap_Sort_A; use GNAT.Heap_Sort_A; +with GNAT.HTable; package body Bindgen is @@ -89,6 +91,9 @@ Lib_Final_Built : Boolean := False; -- Flag indicating whether the finalize_library rountine has been built + Bind_Env_String_Built : Boolean := False; + -- Flag indicating whether a bind environment string has been built + CodePeer_Wrapper_Name : constant String := "call_main_subprogram"; -- For CodePeer, introduce a wrapper subprogram which calls the -- user-defined main subprogram. @@ -124,6 +129,22 @@ Table_Increment => 200, Table_Name => "PSD_Pragma_Settings"); + + -- Bind_Environment Table -- + + + subtype Header_Num is Int range 0 .. 36; + + function Hash (Nam : Name_Id) return Header_Num; + + package Bind_Environment is new GNAT.HTable.Simple_HTable + (Header_Num => Header_Num, + Element=> Name_Id, + No_Element => No_Name, + Key=> Name_Id, + Hash => Hash, + Equal => "="); + -- -- Run-Time Globals -- -- @@ -246,6 +267,9 @@ procedure Gen_Adafinal; -- Generate the Adafinal procedure + procedure Gen_Bind_Env_String; + -- Generate the bind environment buffer + procedure G
[Ada] Improve error message for missing dependency item
This patch modifies the analysis of pragma Depends to emit a clearer message concerning a missing dependency item. -- Source -- -- message.ads package Message with Abstract_State => State, Initializes=> State, SPARK_Mode is procedure Proc (X : in Integer; Y : out Integer) with Global => (In_Out => State), Depends => (State => State, Y => X); end Message; -- message.adb package body Message with Refined_State => (State => N), SPARK_Mode is N : Natural := 0; procedure Proc(X : in Integer; Y : out Integer) with Refined_Global => (In_Out => N), Refined_Depends => (N => N) is begin N := N + 1; Y := X; end Proc; end Message; -- Compilation and output -- $ gcc -c message.adb message.adb:9:11: parameter "X" is missing from input dependence list Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Hristian Kirtchev * sem_prag.adb (Check_Usage): Update the calls to Usage_Error. (Usage_Error): Remove formal parameter Item. Emit a clearer message concerning a missing dependency item and place it on the related pragma. 
Index: sem_prag.adb === --- sem_prag.adb(revision 229031) +++ sem_prag.adb(working copy) @@ -1220,14 +1220,14 @@ Used_Items : Elist_Id; Is_Input : Boolean) is - procedure Usage_Error (Item : Node_Id; Item_Id : Entity_Id); + procedure Usage_Error (Item_Id : Entity_Id); -- Emit an error concerning the illegal usage of an item - -- Usage_Error -- - - procedure Usage_Error (Item : Node_Id; Item_Id : Entity_Id) is + procedure Usage_Error (Item_Id : Entity_Id) is Error_Msg : Name_Id; begin @@ -1245,10 +1245,10 @@ Add_Item_To_Name_Buffer (Item_Id); Add_Str_To_Name_Buffer -(" & must appear in at least one input dependence list"); +(" & is missing from input dependence list"); Error_Msg := Name_Find; - SPARK_Msg_NE (Get_Name_String (Error_Msg), Item, Item_Id); + SPARK_Msg_NE (Get_Name_String (Error_Msg), N, Item_Id); end if; -- Output case (SPARK RM 6.1.5(10)) @@ -1258,10 +1258,10 @@ Add_Item_To_Name_Buffer (Item_Id); Add_Str_To_Name_Buffer - (" & must appear in exactly one output dependence list"); + (" & is missing from output dependence list"); Error_Msg := Name_Find; - SPARK_Msg_NE (Get_Name_String (Error_Msg), Item, Item_Id); + SPARK_Msg_NE (Get_Name_String (Error_Msg), N, Item_Id); end if; end Usage_Error; @@ -1297,13 +1297,13 @@ and then not Contains (Used_Items, Item_Id) then if Is_Formal (Item_Id) then - Usage_Error (Item, Item_Id); + Usage_Error (Item_Id); -- States and global objects are not used properly only when -- the subprogram is subject to pragma Global. elsif Global_Seen then - Usage_Error (Item, Item_Id); + Usage_Error (Item_Id); end if; end if;
[Ada] Spurious error in instantiation of formal package with attribute
When verifying that a function that is an actual of a formal package matches the corresponding function in the corresponding actual package, functions given by attributes must be handled specially because each of them ends up renaming a different generated body, and we must check that the attribute references themselves match. No short example available. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Ed Schonberg * sem_ch12.adb (Same_Instantiated_Function): New predicate in Check_Formal_Package_Instance, used to verify that the formal and the actual of an actual package match when both are functions given as attribute references. Index: sem_ch12.adb === --- sem_ch12.adb(revision 229023) +++ sem_ch12.adb(working copy) @@ -5759,6 +5759,11 @@ -- same entity we may have to traverse several definitions to recover -- the ultimate entity that they refer to. + function Same_Instantiated_Function (E1, E2 : Entity_Id) return Boolean; + -- The formal and the actual must be identical, but if both are + -- given by attributes they end up renaming different generated bodies, + -- and we must verify that the attributes themselves match. 
+ function Same_Instantiated_Variable (E1, E2 : Entity_Id) return Boolean; -- Similarly, if the formal comes from a nested formal package, the -- actual may designate the formal through multiple renamings, which @@ -5834,6 +5839,35 @@ end Same_Instantiated_Constant; + -- Same_Instantiated_Function -- + + + function Same_Instantiated_Function +(E1, E2 : Entity_Id) return Boolean + is + U1, U2 : Node_Id; + begin + if Alias (E1) = Alias (E2) then +return True; + + elsif Present (Alias (E2)) then +U1 := Original_Node (Unit_Declaration_Node (E1)); +U2 := Original_Node (Unit_Declaration_Node (Alias (E2))); + +return Nkind (U1) = N_Subprogram_Renaming_Declaration + and then Nkind (Name (U1)) = N_Attribute_Reference + + and then Nkind (U2) = N_Subprogram_Renaming_Declaration + and then Nkind (Name (U2)) = N_Attribute_Reference + + and then +Attribute_Name (Name (U1)) = Attribute_Name (Name (U2)); + else +return False; + end if; + end Same_Instantiated_Function; + + -- Same_Instantiated_Variable -- @@ -6050,7 +6084,8 @@ else Check_Mismatch - (Ekind (E2) /= Ekind (E1) or else (Alias (E1)) /= Alias (E2)); + (Ekind (E2) /= Ekind (E1) +or else not Same_Instantiated_Function (E1, E2)); end if; else
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote: > > How's the following (complete patch instead of incremental patch; the > > driver changes are still the same as before)? The changes are: > > > > * libgomp/target.c:gomp_target_init again loads all the plugins. > > * libgomp/target.c:resolve_device and > > libgomp/oacc-init.c:resolve_device verify that a default device > > (OpenMP device-var ICV, and acc_device_default, respectively) is > > actually enabled, or resort to host fallback if not. > > * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used > > to enable devices specified by -foffload. Can be called multiple > > times (executable, any shared libraries); the set of enabled devices > > is the union of all those ever requested. > > * GOMP_offload_register (but not the new GOMP_offload_register_ver) > > changed to enable all devices. This is to maintain compatibility > > with old executables and shared libraries built without the -foffload > > constructor support. Any reason not to pass the bitmask of the enabled targets to GOMP_offload_register_ver instead, to decrease the amount of ctors and the times you lock the various locks during initialization, or just enable automatically the devices you load data for during GOMP_offload_register_ver? I mean, GOMP_offload_register would enable for compatibility all devices, GOMP_offload_register_ver would enable the device it is registered for. For -foffload=disable on all shared libraries/binaries, naturally you would not register anything, thus would not enable any devices (only host fallback would work). 
Or are you worried about the case where one shared library is compiled with say -foffload=intelmic,ptx but doesn't actually contain any #pragma omp target/#pragma omp declare target (or OpenACC similar #directives), but only contains #pragma omp target data and/or the device query/copying routines, then dlopens some other shared library that actually has the offloading device code? That could be solved by adding the call you are talking about, but if we really should care about that unlikely case, it would be better to only arrange for it if really needed by the shared library (i.e. if it calls one of the OpenMP or OpenACC library routines that talk to the devices, or has #pragma omp target data or similar constructs; I'd strongly prefer not to have constructors in code that just got compiled with -fopenmp, even in configuration where some offloading is configured by default, when nothing in the code really cares about offloading. > --- a/gcc/gcc.c > +++ b/gcc/gcc.c > @@ -401,6 +401,8 @@ static const char > *compare_debug_auxbase_opt_spec_function (int, const char **); > static const char *pass_through_libs_spec_func (int, const char **); > static const char *replace_extension_spec_func (int, const char **); > static const char *greater_than_spec_func (int, const char **); > +static const char *add_omp_infile_spec_func (int, const char **); > + > static char *convert_white_space (char *); > > /* The Specs Language I'd like to defer review of the driver bits, can Joseph or Bernd please have a look at those? > diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h > index 24fbb94..5da4fa7 100644 > --- a/libgomp/libgomp-plugin.h > +++ b/libgomp/libgomp-plugin.h > @@ -48,7 +48,8 @@ enum offload_target_type >OFFLOAD_TARGET_TYPE_HOST = 2, >/* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed. */ >OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5, > - OFFLOAD_TARGET_TYPE_INTEL_MIC = 6 > + OFFLOAD_TARGET_TYPE_INTEL_MIC = 6, > + OFFLOAD_TARGET_TYPE_HWM What is HWM? 
Is that OFFLOAD_TARGET_TYPE_LAST what you mean? > diff --git a/libgomp/target.c b/libgomp/target.c > index b767410..df51bfb 100644 > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -72,6 +72,9 @@ static int num_offload_images; > /* Array of descriptors for all available devices. */ > static struct gomp_device_descr *devices; > > +/* Set of enabled devices. */ > +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM]; I must say I don't like the locking for this. If all you ever change on this is that you change it from 0 to 1, then supposedly just storing it with __atomic_store, perhaps with rel semantics, and reading it as __atomic_load, with acquire semantics, would be good enough? And perhaps change it into int array, so that it is actually atomic even on the old Alphas (if there are any around). Jakub
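The lock-free scheme Jakub describes — a flag array whose entries only ever change from 0 to 1, written with release semantics and read with acquire semantics — can be sketched with the GCC `__atomic` builtins (illustrative names, not the actual libgomp code):

```c
/* One entry per offload target type; each entry only ever goes from
   0 to 1, so no lock is needed around updates.  */
static int devices_enabled[8];

/* Writer side: the release store guarantees that any initialization
   performed before the flag was set is visible to a reader that
   subsequently observes the flag as 1.  */
void
enable_device (int type)
{
  __atomic_store_n (&devices_enabled[type], 1, __ATOMIC_RELEASE);
}

/* Reader side: the acquire load pairs with the release store above;
   the fast path takes no lock.  */
int
device_enabled_p (int type)
{
  return __atomic_load_n (&devices_enabled[type], __ATOMIC_ACQUIRE);
}
```

Because the flag is monotonic, a reader that races with the writer sees either the old value (host fallback) or the new one (device enabled) — both are correct outcomes, which is why no mutual exclusion is required.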
Re: [gomp4.1] map clause parsing improvements
On Mon, Oct 19, 2015 at 05:00:33PM +0200, Thomas Schwinge wrote: > n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl); > if ((ctx->region_type & ORT_TARGET) != 0 > && !(n->value & GOVD_SEEN) > && ((OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS) == 0 > || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_STRUCT)) > { > remove = true; > > I'd suggest turning GOMP_MAP_FLAG_ALWAYS into GOMP_MAP_FLAG_SPECIAL_2, > and then provide a GOMP_MAP_ALWAYS_P that evaluates to true just for the > three "always,to", "always,from", and "always,tofrom" cases. Yeah, that can be done, I'll add it to my todo list. Jakub
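The `GOMP_MAP_ALWAYS_P` predicate Thomas proposes could reduce to an equality test against the three `always` kinds, which is what excludes `GOMP_MAP_STRUCT` even if that kind happens to share flag bits. A rough sketch with invented constants (the real values live in `include/gomp-constants.h`):

```c
/* Illustrative encoding only: bit 2 marks the "always" variants of
   the to/from/tofrom map kinds.  These numeric values are made up
   for the sketch; the real ones are in gomp-constants.h.  */
#define GOMP_MAP_FLAG_SPECIAL_2  (1 << 2)
#define GOMP_MAP_TO              1
#define GOMP_MAP_FROM            2
#define GOMP_MAP_TOFROM          3
#define GOMP_MAP_ALWAYS_TO      (GOMP_MAP_FLAG_SPECIAL_2 | GOMP_MAP_TO)
#define GOMP_MAP_ALWAYS_FROM    (GOMP_MAP_FLAG_SPECIAL_2 | GOMP_MAP_FROM)
#define GOMP_MAP_ALWAYS_TOFROM  (GOMP_MAP_FLAG_SPECIAL_2 | GOMP_MAP_TOFROM)

/* True only for the three "always" data-movement kinds, never for
   other kinds that merely have the flag bit set.  */
#define GOMP_MAP_ALWAYS_P(K) \
  ((K) == GOMP_MAP_ALWAYS_TO \
   || (K) == GOMP_MAP_ALWAYS_FROM \
   || (K) == GOMP_MAP_ALWAYS_TOFROM)
```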
[Ada] Crash in compile-only mode with dynamic predicates.
This patch fixes a compiler abort on a program with a type that has defined dynamic predicate and a call to a procedure with a parameter of that type, when the program is compiled with -gnatc. The following must compile quietly: gcc -c -gnatc p-q.adb --- with Unchecked_Conversion; package P is function Is_Valid_Name (Item : String) return Boolean; subtype Name_T is String with Dynamic_Predicate => Is_Valid_Name (Name_T); generic type T is private; package G is function Conv is new Unchecked_Conversion (Source => T, Target => Integer); end G; end P; --- package P.Q is type Rec is record B : Integer; end record; package My_G is new G (Rec); procedure Proc (Name : Name_T); end P.Q; --- package body P.Q is procedure Proc (Name : Name_T) is begin null; end; end P.Q; Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Ed Schonberg * sem_ch6.adb (Analyze_Subprogram_Body_Helper): Generate freeze node for subprogram in Compile_Only mode. Index: sem_ch6.adb === --- sem_ch6.adb (revision 229023) +++ sem_ch6.adb (working copy) @@ -3215,18 +3215,17 @@ -- the freeze actions that include the bodies. In particular, extra -- formals for accessibility or for return-in-place may need to be -- generated. Freeze nodes, if any, are inserted before the current - -- body. These freeze actions are also needed in ASIS mode to enable - -- the proper back-annotations. + -- body. These freeze actions are also needed in ASIS mode and in + -- Compile_Only mode to enable the proper back-end type annotations. + -- They are necessary in any case to insure order of elaboration + -- in gigi. if not Is_Frozen (Spec_Id) - and then (Expander_Active or ASIS_Mode) + and then (Expander_Active + or else ASIS_Mode + or else (Operating_Mode = Check_Semantics + and then Serious_Errors_Detected = 0)) then --- Force the generation of its freezing node to ensure proper --- management of access types in the backend. 
- --- This is definitely needed for some cases, but it is not clear --- why, to be investigated further??? - Set_Has_Delayed_Freeze (Spec_Id); Freeze_Before (N, Spec_Id); end if;
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On 10/20/2015 12:02 PM, Jakub Jelinek wrote: I'd like to defer review of the driver bits, can Joseph or Bernd please have a look at those? Last time around I think I asked for some minor changes, like updated documentation for give_switch. Other than that, I'm ok with the patch iff you are happy with the overall approach. Bernd
[Ada] Volatile functions
This patch implements SPARK volatile functions described by the following rules: A type is said to be effectively volatile if it is either a volatile type, an array type whose Volatile_Component aspect is True, or an array type whose component type is effectively volatile, a protected type, or a descendant of Ada.Synchronous_Task_Control.Suspension_Object. An effectively volatile object is a volatile object or an object of an effectively volatile type. The Boolean aspect Volatile_Function may be specified as part of the (explicit) initial declaration of a function. A function whose Volatile_Function aspect is True is said to be a volatile function. A protected function is also defined to be a volatile function. A subprogram whose Volatile_Function aspect is True shall not override an inherited primitive operation of a tagged type whose Volatile_Function aspect is False. A global_item of a nonvolatile function shall not denote either an effectively volatile object or an external state abstraction. A nonvolatile function shall not have a formal parameter (or result) of an effectively volatile type. Contrary to the general SPARK 2014 rule that expression evaluation cannot have side effects, a read of an effectively volatile object with the properties Async_Writers or Effective_Reads set to True is considered to have an effect when read. To reconcile this discrepancy, a name denoting such an object shall only occur in a non-interfering context. 
A name occurs in a non-interfering context if it is: the (protected) prefix of a name denoting a protected operation; or the return expression of a simple_return_statement which applies to a volatile function; or the initial value expression of the extended_return_object_declaration of an extended_return_statement which applies to a volatile function; or The patch also changes the implementation of aspects/pragmas Async_Readers, Async_Writers, Effective_Reads and Effective_Writes to remove their forward referencing capabilities and require a static Boolean expression (if any). -- Source -- -- volatile types.ads with Ada.Synchronous_Task_Control; use Ada.Synchronous_Task_Control; package Volatile_Types is type Vol_Typ_1 is null record with Volatile; type Vol_Typ_1_Ptr is access all Vol_Typ_1; type Vol_Typ_2 is array (1 .. 5) of Vol_Typ_1; type Vol_Typ_3 is array (1 .. 5) of Boolean with Volatile_Components; protected type Vol_Typ_4 is entry E; procedure P; end Vol_Typ_4; type Vol_Typ_5 is new Suspension_Object; end Volatile_Types; -- volatile types.adb package body Volatile_Types is protected body Vol_Typ_4 is entry E when True is begin null; end E; procedure P is begin null; end P; end Vol_Typ_4; end Volatile_Types; -- lr712.ads package LR712 is -- 7.1.2 - The Boolean aspect Volatile_Function may be specified as part of -- the initial declaration of a function. Function whose Volatile_Function -- aspect is True is said to be a volatile function. A protected function -- is also defined to be a volatile function. 
function OK_1 return Boolean with Volatile_Function; -- OK function OK_2 return Boolean is (True) with Volatile_Function;-- OK function OK_3 return Boolean; pragma Volatile_Function; -- OK function Error_1 return Boolean; function Error_2 return Boolean; function Error_3 return Boolean; function Error_4 return Boolean; function Error_4 return Boolean is (True) with Volatile_Function; -- Error function Error_5 return Boolean with Volatile_Function => Junk; -- Error function Error_6 return Boolean with Volatile_Function => "Junk"; -- Error function Error_7 return Boolean; Obj : Integer := 1 with Volatile_Function;-- Error end LR712; -- lr712.adb package body LR712 is function OK_1 return Boolean is begin return True; end OK_1; function OK_3 return Boolean is begin return True; end OK_3; function Error_1 return Boolean with Volatile_Function -- Error is begin return True; end Error_1; function Error_2 return Boolean is (True) with Volatile_Function; -- Error function Error_3 return Boolean is separate; function Error_5 return Boolean is begin return True; end Error_5; function Error_6 return Boolean is begin return True; end Error_6; function Error_7 return Boolean is pragma Volatile_Function; -- Error begin return True; end Error_7; end LR712; -- lr712-error_3.adb separate (LR712) function Error_3 return Boolean with Volatile_Function --
[Ada] Uninitialized variable on illegal placement of aspect specification
This patch improves the handling of programs with illegal placement of aspect specifications, ensuring that the flag that controls the parsing of a list of declarations is properly set in the presence of errors. No simple test available. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Ed Schonberg * par-ch3.adb (P_Declarative_Items): In case of misplaced aspect specifications, ensure that flag Done is properly set to continue parse. Index: par-ch3.adb === --- par-ch3.adb (revision 229023) +++ par-ch3.adb (working copy) @@ -4425,6 +4425,12 @@ else Error_Msg_SC ("aspect specifications not allowed here"); + + -- Assume that this is a misplaced aspect specification + -- within a declarative list. After discarding the + -- misplaced aspects we can continue the scan. + + Done := False; end if; declare
[Ada] Abort in front end client due to locked scope table
This patch reimplements the way the front end detects whether a type is a descendant of Ada.Synchronous_Task_Control.Suspension_Object to avoid using the RTSfind mechanism. This ensures that external clients of the front end will not fail due to a locked scope table accessed during analysis performed by RTSfind. No reproducer possible as this requires an external front end client. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Hristian Kirtchev * rtsfind.ads Remove the entries for Ada.Synchronous_Task_Control and Suspension_Object from tables RE_Id, RE_Unit_Table and RTU_Id. * sem_util.adb (Is_Descendant_Of_Suspension_Object): Update the comment on usage. Use routine Is_Suspension_Object to detect whether a type matches Suspension_Object. (Is_Suspension_Object): New routine. * snames.ads-tmpl: Add predefined names for Suspension_Object and Synchronous_Task_Control. Index: rtsfind.ads === --- rtsfind.ads (revision 229047) +++ rtsfind.ads (working copy) @@ -131,7 +131,6 @@ Ada_Real_Time, Ada_Streams, Ada_Strings, - Ada_Synchronous_Task_Control, Ada_Tags, Ada_Task_Identification, Ada_Task_Termination, @@ -607,8 +606,6 @@ RE_Unbounded_String,-- Ada.Strings.Unbounded - RE_Suspension_Object, -- Ada.Synchronous_Task_Control - RE_Access_Level,-- Ada.Tags RE_Alignment, -- Ada.Tags RE_Address_Array, -- Ada.Tags @@ -1840,8 +1837,6 @@ RE_Unbounded_String => Ada_Strings_Unbounded, - RE_Suspension_Object=> Ada_Synchronous_Task_Control, - RE_Access_Level => Ada_Tags, RE_Alignment=> Ada_Tags, RE_Address_Array=> Ada_Tags, Index: sem_util.adb === --- sem_util.adb(revision 229047) +++ sem_util.adb(working copy) @@ -11397,9 +11397,7 @@ function Is_Descendant_Of_Suspension_Object (Typ : Entity_Id) return Boolean; -- Determine whether type Typ is a descendant of type Suspension_Object - -- defined in Ada.Synchronous_Task_Control. This routine is similar to - -- Sem_Util.Is_Descendent_Of, however this version does not load unit - -- Ada.Synchronous_Task_Control. 
+ -- defined in Ada.Synchronous_Task_Control. -- Is_Descendant_Of_Suspension_Object -- @@ -11408,24 +11406,39 @@ function Is_Descendant_Of_Suspension_Object (Typ : Entity_Id) return Boolean is - Cur_Typ : Entity_Id; - Par_Typ : Entity_Id; + function Is_Suspension_Object (Id : Entity_Id) return Boolean; + -- Determine whether arbitrary entity Id denotes Suspension_Object + -- defined in Ada.Synchronous_Task_Control. - begin - -- Do not attempt to load Ada.Synchronous_Task_Control in No_Run_Time - -- mode. The unit contains tagged types and those are not allowed in - -- this mode. + -- + -- Is_Suspension_Object -- + -- - if No_Run_Time_Mode then -return False; + function Is_Suspension_Object (Id : Entity_Id) return Boolean is + begin +-- This approach does an exact name match rather than to rely on +-- RTSfind. Routine Is_Effectively_Volatile is used by clients of +-- the front end at point where all auxiliary tables are locked +-- and any modifications to them are treated as violations. Do not +-- tamper with the tables, instead examine the Chars fields of all +-- the scopes of Id. - -- Unit Ada.Synchronous_Task_Control is not available, the type - -- cannot possibly be a descendant of Suspension_Object. +return + Chars (Id) = Name_Suspension_Object +and then Present (Scope (Id)) +and then Chars (Scope (Id)) = Name_Synchronous_Task_Control +and then Present (Scope (Scope (Id))) +and then Chars (Scope (Scope (Id))) = Name_Ada; + end Is_Suspension_Object; - elsif not RTE_Available (RE_Suspension_Object) then -return False; - end if; + -- Local variables + Cur_Typ : Entity_Id; + Par_Typ : Entity_Id; + + -- Start of processing for Is_Descendant_Of_Suspension_Object + + begin -- Climb the type derivation chain checking each parent type against -- Suspension_Object. @@ -11435,7 +11448,7 @@ -- The current type is a match -if Is_RTE (Cur_Typ, R
Re: [PATCH][AArch64] Fix insn types
Hi Evandro, On 19/10/15 22:05, Evandro Menezes wrote: The type assigned to some insn definitions was seemingly not correct: * "movi %d0, %1" was of type "fmov" * "fmov %s0, wzr" was of type "fconstd" * "mov %0, {-1,1}" were of type "csel" This patch changes their types to: * "movi %d0, %1" to type "neon_move" * "fmov %s0, wzr" to type "f_mcr" * "mov %0, {-1,1}" to type "mov_imm" Please, commit if it's alright. Thank you, Looking at your ChangeLog... gcc/ * config/aarch64/aarch64.md (*movdi_aarch64): Change the type of "movi %d0, %1" to "neon_move". (*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr". (*cmov_insn): Change the types of "mov %0, {-1,1}" to "mov_imm". (*cmovsi_insn_uxtw): Idem The preferred form is "Likewise" rather than "Idem" AFAIK. Also, full stop at the end. --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1130,7 +1130,7 @@ ldrh\\t%w0, %1 strh\\t%w1, %0 mov\\t%w0, %w1" - [(set_attr "type" "neon_from_gp,neon_to_gp,fmov,\ + [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\ f_loads,f_stores,load1,store1,mov_reg") (set_attr "simd" "yes,yes,yes,*,*,*,*,*") (set_attr "fp" "*,*,*,yes,yes,*,*,*")] I don't think this matches up with your changelog entry. This isn't the *movdi_aarch64 pattern. From what I can see the *movdi_aarch64 pattern already has the type neon_move on the movi\\t%d0, %1 alternative (the last one). In fact, if I apply your patch using "patch -p1" I see it being applied to the *movhf_aarch64 pattern. Is that what you intended? Thanks, Kyrill
[PATCH v2 0/6] Libsanitizer merge from upstream r250806 (was r249633).
Hi, this is the second attempt to perform the libsanitizer merge from upstream. In the previous patch set (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01212.html) we found an issue with the heuristic for old/new style ubsan_data that needed to be fixed upstream, and some errors in the compiler changes were found as well. I thought that it would be too messy to continue the review in the previous thread, so I'm creating a new one. The first patch is the merge itself. Since the heuristic fix for old/new style ubsan_data selection was applied upstream, I'm bumping the revision to r250806. Since there aren't significant changes from r249633, I think this should be fine. The second one combines all compiler-related changes and addresses Jakub's nits from the previous review. Patches 3, 4 and 5 apply to the library and were preapproved in the previous review, but I'm attaching them here to form the full patch set. In patch 6, I'm trying to add brief instructions on how to perform the merge. This is just a documentation patch. Tested and {A, UB}San bootstrapped on x86-linux-gnu, x86_64-linux-gnu and aarch64-linux-gnu targets. Thanks, -Maxim
Re: [PATCH, sh][v3] musl support for sh
On Mon, 2015-10-19 at 20:01 +0100, Szabolcs Nagy wrote: > On 17/10/15 02:14, Oleg Endo wrote: > > On Fri, 2015-10-16 at 17:06 +0100, Szabolcs Nagy wrote: > >> Revision of > >> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01636.html > >> > >> The musl dynamic linker name is /lib/ld-musl-sh{-nofpu}{-fdpic}.so.1 > >> > >> New in this revision: > >> > >> Add -fdpic to the name, will be useful with the pending sh2 FDPIC support. > >> > >> 2015-10-16 Gregor Richards > >>Szabolcs Nagy > >> > >>* config/sh/linux.h (MUSL_DYNAMIC_LINKER): Define. > >>(MUSL_DYNAMIC_LINKER_E, MUSL_DYNAMIC_LINKER_FP): Define. > > > > > >> +#if TARGET_CPU_DEFAULT & (MASK_HARD_SH2A_DOUBLE | MASK_SH4) > >> +/* "-nofpu" if any nofpu option is specified. */ > >> +#define MUSL_DYNAMIC_LINKER_FP \ > >> + "%{m1|m2|m2a-nofpu|m3|m4-nofpu|m4-100-nofpu|m4-200-nofpu|m4-300-nofpu|" > >> \ > >> + "m4-340|m4-400|m4-500|m4al|m5-32media-nofpu|m5-64media-nofpu|" \ > >> + "m5-compact-nofpu:-nofpu}" > >> +#else > >> +/* "-nofpu" if none of the hard fpu options are specified. */ > >> +#define MUSL_DYNAMIC_LINKER_FP \ > >> + > >> "%{m2a|m4|m4-100|m4-200|m4-300|m4a|m5-32media|m5-64media|m5-compact:;:-nofpu}" > >> +#endif > > > > SH5 has been declared obsolete. Please do not add any new SH5 related > > things. In this case, drop the m5-* thingies. > > > > removed m5*. > OK to commit with this? OK for trunk. Cheers, Oleg
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
Hi Jakub! Thanks for the review. On Tue, 20 Oct 2015 12:02:45 +0200, Jakub Jelinek wrote: > On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote: > > > How's the following (complete patch instead of incremental patch; the > > > driver changes are still the same as before)? The changes are: > > > > > > * libgomp/target.c:gomp_target_init again loads all the plugins. > > > * libgomp/target.c:resolve_device and > > > libgomp/oacc-init.c:resolve_device verify that a default device > > > (OpenMP device-var ICV, and acc_device_default, respectively) is > > > actually enabled, or resort to host fallback if not. > > > * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used > > > to enable devices specified by -foffload. Can be called multiple > > > times (executable, any shared libraries); the set of enabled devices > > > is the union of all those ever requested. > > > * GOMP_offload_register (but not the new GOMP_offload_register_ver) > > > changed to enable all devices. This is to maintain compatibility > > > with old executables and shared libraries built without the -foffload > > > constructor support. > > Any reason not to pass the bitmask of the enabled targets to > GOMP_offload_register_ver instead, to decrease the amount of ctors and > the times you lock the various locks during initialization, or just enable > automatically the devices you load data for during GOMP_offload_register_ver? > I mean, GOMP_offload_register would enable for compatibility all devices, > GOMP_offload_register_ver would enable the device it is registered for. > For -foffload=disable on all shared libraries/binaries, naturally you would > not register anything, thus would not enable any devices (only host fallback > would work). 
As explained a few times already: GOMP_offload_register_ver constructors will only be generated if there actually are offloaded code regions, but for example:

#include <openacc.h>

int main()
{
  __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
  return 0;
}

... is a valid OpenACC program (untested), which doesn't contain any offloaded code regions. As a user I'd expect it to return different answers if compiled with -foffload=nvptx-none in contrast to -foffload=disable. Actually, I can foresee exactly such code being used to probe for offloading being available, for example in testsuites. And, I guess we agree that under -foffload=disable we'd like the compilation/runtime system to be configured in a way that no offloading will happen? Always creating (dummy) GOMP_offload_register_ver constructors has been another suggestion that I had voiced much earlier in this thread (months ago), but everyone (including me) taking part in the discussion agreed that it'd cause even higher compile-time overhead. > Or are you worried about the case where one shared library is compiled > with say -foffload=intelmic,ptx but doesn't actually contain any > #pragma omp target/#pragma omp declare target (or OpenACC similar > #directives), but only contains #pragma omp target data and/or the device > query/copying routines, then dlopens some other shared library that actually > has the offloading device code? That's another example, yes. > That could be solved by adding the call you are talking about, but > if we really should care about that unlikely case, it would be better to > only arrange for it if really needed by the shared library (i.e.
if it calls > one of the OpenMP or OpenACC library routines that talk to the devices, or > has #pragma omp target data or similar constructs; > I'd strongly prefer not to have constructors in code that just got compiled > with -fopenmp, even in configuration where some offloading is configured by > default, when nothing in the code really cares about offloading. So, how to resolve our different opinions? I mean, for any serious program code, there will be constructor calls into libgomp already; are you expecting that adding one more really will cause any noticeable overhead? I agree that enabling devices for GOMP_offload_register_ver calls makes sense. (I indeed had considered this earlier, but it didn't lead to solving the problem completely -- see above.) Can we come up with a scheme to do it this way, and only generate the GOMP_enable_offload_targets constructor if no GOMP_offload_register_ver constructors have been generated? But I have no idea how to implement that in a non-convoluted way. (And, it sounds excessive to me in terms of implementation overhead on our side, in contrast to execution overhead of one libgomp constructor call.) > > --- a/gcc/gcc.c > > +++ b/gcc/gcc.c > > @@ -401,6 +401,8 @@ static const char > > *compare_debug_auxbase_opt_spec_function (int, const char **); > > static const char *pass_through_libs_spec_func (int, const char **); > > static const char *replace_extension_spec_func (int, const char **); > > static const char *greater_th
[PATCH v2 2/6] Libsanitizer merge from upstream r250806 (was r249633).
This patch introduces the required compiler changes. Now we don't version asan_init; instead, we have a special __asan_version_mismatch_check_v[n] symbol for this. asan_stack_malloc_[n] doesn't take a local stack as a second parameter anymore, so don't pass it. Also, ubsan_instrument_float_cast was adjusted to pass the source location to libubsan if possible. gcc/ChangeLog: 2015-10-20 Maxim Ostapenko gcc/ * asan.c (asan_emit_stack_protection): Don't pass local stack to asan_stack_malloc_[n] anymore. Check if asan_stack_malloc_[n] returned NULL and use the local stack then. (asan_finish_file): Insert __asan_version_mismatch_check_v[n] call in addition to __asan_init. * sanitizer.def (BUILT_IN_ASAN_INIT): Rename to __asan_init. (BUILT_IN_ASAN_VERSION_MISMATCH_CHECK): Add new builtin call. * asan.h (asan_intercepted_p): Handle new string builtins. * ubsan.c (ubsan_use_new_style_p): New function. (ubsan_instrument_float_cast): If location is unknown, assign input_location to loc. Propagate loc to ubsan_create_data if ubsan_use_new_style_p returned true. config/ * bootstrap-asan.mk: Replace ASAN_OPTIONS=detect_leaks with LSAN_OPTIONS=detect_leaks. gcc/testsuite/ChangeLog: 2015-10-20 Maxim Ostapenko * c-c++-common/ubsan/float-cast-overflow-10.c: Adjust test. * c-c++-common/ubsan/float-cast-overflow-8.c: Likewise. * c-c++-common/ubsan/float-cast-overflow-9.c: Likewise. * g++.dg/asan/default-options-1.C: Likewise.
Index: gcc/asan.c === --- gcc/asan.c (revision 228817) +++ gcc/asan.c (working copy) @@ -1132,12 +1132,16 @@ snprintf (buf, sizeof buf, "__asan_stack_malloc_%d", use_after_return_class); ret = init_one_libfunc (buf); - rtx addr = convert_memory_address (ptr_mode, base); - ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode, 2, + ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode, 1, GEN_INT (asan_frame_size + base_align_bias), - TYPE_MODE (pointer_sized_int_node), - addr, ptr_mode); + TYPE_MODE (pointer_sized_int_node)); + /* __asan_stack_malloc_[n] returns a pointer to fake stack if succeeded + and NULL otherwise. Check RET value is NULL here and jump over the + BASE reassignment in this case. Otherwise, reassign BASE to RET. */ + int very_unlikely = REG_BR_PROB_BASE / 2000 - 1; + emit_cmp_and_jump_insns (ret, const0_rtx, EQ, NULL_RTX, + VOIDmode, 0, lab, very_unlikely); ret = convert_memory_address (Pmode, ret); emit_move_insn (base, ret); emit_label (lab); @@ -2470,6 +2474,8 @@ { tree fn = builtin_decl_implicit (BUILT_IN_ASAN_INIT); append_to_statement_list (build_call_expr (fn, 0), &asan_ctor_statements); + fn = builtin_decl_implicit (BUILT_IN_ASAN_VERSION_MISMATCH_CHECK); + append_to_statement_list (build_call_expr (fn, 0), &asan_ctor_statements); } FOR_EACH_DEFINED_VARIABLE (vnode) if (TREE_ASM_WRITTEN (vnode->decl) Index: gcc/sanitizer.def === --- gcc/sanitizer.def (revision 228817) +++ gcc/sanitizer.def (working copy) @@ -27,8 +27,11 @@ for other FEs by asan.c. */ /* Address Sanitizer */ -DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT, "__asan_init_v4", +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT, "__asan_init", BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_VERSION_MISMATCH_CHECK, + "__asan_version_mismatch_check_v6", + BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) /* Do not reorder the BUILT_IN_ASAN_{REPORT,CHECK}* builtins, e.g. cfgcleanup.c relies on this order. 
*/ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_REPORT_LOAD1, "__asan_report_load1", Index: gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c === --- gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c (revision 228817) +++ gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-10.c (working copy) @@ -10,70 +10,37 @@ /* _Decimal32 */ /* { dg-output "value is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "\[^\n\r]*value is outside the range of representable values of type 'signed char'\[^\n\r]*(\n|\r\n|\r)" } */ /* { dg-output "\[^\n\r]*value is outside the range of representable values of type 'char'\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "\[^\n\r]*value is outside the range of representable values of type 'char'\[^\n\r]*(\n|\r\n|\r)" } */ /* { dg-output "\[^\n\r]*value is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */ -/* { dg-output "\[^\n\r]*value is outside the range of representable values of type 'unsigned char'\[^\n\r]*(\n|\r\n|\r)" } */ /* { dg-output "\[^\n\r]*value is outside the range of representable values of type 'short int'\[^\n\r]*(\n|\r\n
Re: [PATCH v2 0/6] Libsanitizer merge from upstream r250806 (was r249633).
This is just the reapplied patch for SPARC by David S. Miller. The patch was preapproved here: (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01214.html). 2015-10-20 Maxim Ostapenko PR sanitizer/63958 Reapply: 2014-10-14 David S. Miller * sanitizer_common/sanitizer_platform_limits_linux.cc (time_t): Define as __kernel_time_t, as needed for sparc. (struct __old_kernel_stat): Don't check if __sparc__ is defined. * libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h (__sanitizer): Define struct___old_kernel_stat_sz, struct_kernel_stat_sz, and struct_kernel_stat64_sz for sparc. (__sanitizer_ipc_perm): Adjust for sparc targets. (__sanitizer_shmid_ds): Likewise. (__sanitizer_sigaction): Likewise. (IOC_SIZE): Likewise. Index: libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc === --- libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc (revision 250059) +++ libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc (working copy) @@ -38,6 +38,7 @@ #define uid_t __kernel_uid_t #define gid_t __kernel_gid_t #define off_t __kernel_off_t +#define time_t __kernel_time_t // This header seems to contain the definitions of _kernel_ stat* structs.
#include #undef ino_t @@ -62,7 +63,7 @@ } // namespace __sanitizer #if !defined(__powerpc64__) && !defined(__x86_64__) && !defined(__aarch64__)\ -&& !defined(__mips__) +&& !defined(__mips__) && !defined(__sparc__) COMPILER_CHECK(struct___old_kernel_stat_sz == sizeof(struct __old_kernel_stat)); #endif Index: libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h === --- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h (revision 250059) +++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h (working copy) @@ -83,6 +83,14 @@ const unsigned struct_kernel_stat_sz = 144; #endif const unsigned struct_kernel_stat64_sz = 104; +#elif defined(__sparc__) && defined(__arch64__) + const unsigned struct___old_kernel_stat_sz = 0; + const unsigned struct_kernel_stat_sz = 104; + const unsigned struct_kernel_stat64_sz = 144; +#elif defined(__sparc__) && !defined(__arch64__) + const unsigned struct___old_kernel_stat_sz = 0; + const unsigned struct_kernel_stat_sz = 64; + const unsigned struct_kernel_stat64_sz = 104; #endif struct __sanitizer_perf_event_attr { unsigned type; @@ -105,7 +113,7 @@ #if defined(__powerpc64__) const unsigned struct___old_kernel_stat_sz = 0; -#else +#elif !defined(__sparc__) const unsigned struct___old_kernel_stat_sz = 32; #endif @@ -184,6 +192,18 @@ unsigned short __pad1; unsigned long __unused1; unsigned long __unused2; +#elif defined(__sparc__) +# if defined(__arch64__) +unsigned mode; +unsigned short __pad1; +# else +unsigned short __pad1; +unsigned short mode; +unsigned short __pad2; +# endif +unsigned short __seq; +unsigned long long __unused1; +unsigned long long __unused2; #else unsigned short mode; unsigned short __pad1; @@ -201,6 +221,26 @@ struct __sanitizer_shmid_ds { __sanitizer_ipc_perm shm_perm; + #if defined(__sparc__) + # if !defined(__arch64__) +u32 __pad1; + # endif +long shm_atime; + # if !defined(__arch64__) +u32 __pad2; + # endif +long shm_dtime; + # if !defined(__arch64__) +u32 __pad3; + # endif +long 
shm_ctime; +uptr shm_segsz; +int shm_cpid; +int shm_lpid; +unsigned long shm_nattch; +unsigned long __glibc_reserved1; +unsigned long __glibc_reserved2; + #else #ifndef __powerpc__ uptr shm_segsz; #elif !defined(__powerpc64__) @@ -238,6 +278,7 @@ uptr __unused4; uptr __unused5; #endif +#endif }; #elif SANITIZER_FREEBSD struct __sanitizer_ipc_perm { @@ -555,9 +596,13 @@ #else __sanitizer_sigset_t sa_mask; #ifndef __mips__ +#if defined(__sparc__) +unsigned long sa_flags; +#else int sa_flags; #endif #endif +#endif #if SANITIZER_LINUX void (*sa_restorer)(); #endif @@ -799,7 +844,7 @@ #define IOC_NRBITS 8 #define IOC_TYPEBITS 8 -#if defined(__powerpc__) || defined(__powerpc64__) || defined(__mips__) +#if defined(__powerpc__) || defined(__powerpc64__) || defined(__mips__) || defined(__sparc__) #define IOC_SIZEBITS 13 #define IOC_DIRBITS 3 #define IOC_NONE 1U @@ -829,7 +874,17 @@ #define IOC_DIR(nr) (((nr) >> IOC_DIRSHIFT) & IOC_DIRMASK) #define IOC_TYPE(nr) (((nr) >> IOC_TYPESHIFT) & IOC_TYPEMASK) #define IOC_NR(nr) (((nr) >> IOC_NRSHIFT) & IOC_NRMASK) + +#if defined(__sparc__) +// In sparc the 14 bits SIZE field overlaps with the +// least significant bit of DIR, so either IOC_READ or +// IOC_WRITE shall be 1 in order to get a non-zero SIZE. +# define IOC_SIZE(nr) \ + ((nr) >> 29) & 0x7) & (4U|2U)) == 0)? \ + 0
[PATCH v2 4/6] Libsanitizer merge from upstream r250806 (was r249633).
This is a reapplied Jakub's patch for disabling ODR violation detection. The patch was preapproved here: (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01215.html). 2015-10-20 Maxim Ostapenko PR bootstrap/63888 Reapply: 2015-02-20 Jakub Jelinek * asan/asan_globals.cc (RegisterGlobal): Disable detect_odr_violation support until it is rewritten upstream. * c-c++-common/asan/pr63888.c: New test. Index: libsanitizer/asan/asan_globals.cc === --- libsanitizer/asan/asan_globals.cc (revision 250059) +++ libsanitizer/asan/asan_globals.cc (working copy) @@ -146,7 +146,9 @@ CHECK(AddrIsInMem(g->beg)); CHECK(AddrIsAlignedByGranularity(g->beg)); CHECK(AddrIsAlignedByGranularity(g->size_with_redzone)); - if (flags()->detect_odr_violation) { + // This "ODR violation" detection is fundamentally incompatible with + // how GCC registers globals. Disable as useless until rewritten upstream. + if (0 && flags()->detect_odr_violation) { // Try detecting ODR (One Definition Rule) violation, i.e. the situation // where two globals with the same name are defined in different modules. if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {
[PATCH v2 5/6] Libsanitizer merge from upstream r250806 (was r249633).
This patch adjusts the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771 to extract the last PC from the stack frame if no valid FP is available for ARM. The patch was preapproved here: (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01217.html). 2015-10-20 Maxim Ostapenko * sanitizer_common/sanitizer_stacktrace.cc (GetCanonicFrame): Assume we compiled code with GCC when extracting the caller PC for ARM if no valid frame pointer is available. Index: libsanitizer/sanitizer_common/sanitizer_stacktrace.cc === --- libsanitizer/sanitizer_common/sanitizer_stacktrace.cc (revision 250059) +++ libsanitizer/sanitizer_common/sanitizer_stacktrace.cc (working copy) @@ -62,8 +62,8 @@ // Nope, this does not look right either. This means the frame after next does // not have a valid frame pointer, but we can still extract the caller PC. // Unfortunately, there is no way to decide between GCC and LLVM frame - // layouts. Assume LLVM. - return bp_prev; + // layouts. Assume GCC. + return bp_prev - 1; #else return (uhwptr*)bp; #endif
[PATCH v2 6/6] Libsanitizer merge from upstream r250806 (was r249633).
In this patch, I'm trying to add general instructions on how to perform the merge. This is just a documentation patch; any suggestions and opinions are welcome. Index: libsanitizer/HOWTO_MERGE === --- libsanitizer/HOWTO_MERGE (revision 0) +++ libsanitizer/HOWTO_MERGE (working copy) @@ -0,0 +1,26 @@ +In general, the merging process should not be very difficult, but we need to +track various ABI changes and GCC-specific patches carefully. Here is a +general list of actions required to perform the merge: + +- Check out a recent GCC tree. +- Run the merge.sh script from the libsanitizer directory. +- Modify Makefile.am files in the asan/tsan/lsan/ubsan/sanitizer_common/interception + directories if needed. In particular, you may need to add new source files + and remove old ones from the source files list, add new flags to {C, CXX}FLAGS if + needed, and update DEFS with newly defined variables. +- Apply all needed GCC-specific patches to libsanitizer (note that some of + them may already be included upstream). +- Apply all necessary compiler changes. Be especially careful here: you must + not break the ABI between the compiler and the library. +- Modify the configure.ac file if needed (e.g. if you need to link against a new + library for the sanitizer libs). +- Remove unused (deleted by the merge) files from all source and include + directories. Be especially careful with headers, because they aren't listed + in Makefiles explicitly. +- Regenerate the configure script and all Makefiles with autoreconf. You should use + exactly the same autotools version as for other GCC directories (the current + version is 2.64; see https://www.gnu.org/software/automake/faq/autotools-faq.html + for details on how to install/use it). +- Run regression testing on at least three platforms (e.g. x86-linux-gnu, + x86_64-linux-gnu, aarch64-linux-gnu). +- Run {A, UB}San bootstrap on at least three platforms.
[Ada] Better recovery for misplaced keyword in discriminant specification
This patch improves the handling of an error in a discriminant specification for an access discriminant. Compiling b.ads must yield: b.ads:3:43: "constant" must appear after "access" --- package B is type Constant_Reference (D : constant access Integer) is null record with Implicit_Dereference => D; end B; Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Ed Schonberg * par-ch3.adb (P_Known_Discriminant_Part_Opt): Handle properly a misplaced "constant" keyword in a discriminant specification. Index: par-ch3.adb === --- par-ch3.adb (revision 229049) +++ par-ch3.adb (working copy) @@ -3030,8 +3030,23 @@ Set_Discriminant_Type (Specification_Node, P_Access_Definition (Not_Null_Present)); + + -- Catch out-of-order keywords + + elsif Token = Tok_Constant then + Scan; + + if Token = Tok_Access then + Error_Msg_SC ("CONSTANT must appear after ACCESS"); + Set_Discriminant_Type + (Specification_Node, +P_Access_Definition (Not_Null_Present)); + + else + Error_Msg_SC ("misplaced CONSTANT"); + end if; + else - Set_Discriminant_Type (Specification_Node, P_Subtype_Mark); No_Constraint;
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On 10/20/2015 01:17 PM, Thomas Schwinge wrote: As explained a few times already: GOMP_offload_register_ver constructors will only be generated if there actually are offloaded code regions, but for example: #include int main() { __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia)); return 0; } ... is a valid OpenACC program (untested), which doesn't contain any offloaded code regions. As a user I'd expect it to return different answers if compiled with -foffload=nvptx-none in contrast to -foffload=disable. Actually, I can foresee exactly such code to be used to probe for offloading being available, for example in testsuites. And, I guess we agree that under -foffload=disable we'd like the compilation/runtime system to be configured in a way that no offloading will happen? Both of you can ignore me if you feel I'm not making sense, but what exactly is the use case for -foffload=disable? Isn't it slightly redundant with -fno-openacc? IMO it's not an option that alters the available devices, that's a question that is answered at run-time and doesn't (or shouldn't) really depend on compiler switches. As a user I'd expect -foffload=disable to just prevent generation of offloaded code for the things I'm compiling. As Jakub pointed out, shared libraries may still contain other pieces that are offloadable. I guess I don't fully understand why you want to go to great lengths to disable devices at run-time based on a compile-time switch. What's the reasoning here? Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he told me this means "High Water Mark". I have no strong opinion on the name to use, just want to mention that "*_LAST" sounds to me like that one still is part of the accepted set, whereas in this case it'd be the first enumerator outside of the accepted ones. (And I guess, we agree that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is ugly?) 
Nah, just rename HWM to LAST, that's fairly common usage I think. Bernd
[Ada] Abort while processing "of" container loop
This patch classifies an "of" loop parameter as never needing finalization actions. -- Source -- -- vectors.ads with Ada.Unchecked_Deallocation; with Ada.Unchecked_Conversion; with System; generic type Element_Type is private; Small_Vector_Capacity : Natural := 0; package Vectors is type Elements_Array is array (Natural) of Element_Type; type Elements_Array_Access is access all Elements_Array; function To_Pointer is new Ada.Unchecked_Conversion (System.Address, Elements_Array_Access); procedure Free is new Ada.Unchecked_Deallocation (Elements_Array, Elements_Array_Access); type Small_Array_Type is array (0 .. Small_Vector_Capacity - 1) of Element_Type; type Vector is private with Iterable => (First => First_Index, Next=> Next, Has_Element => Has_Element, Element => Get); procedure Append (Self : in out Vector; Element : Element_Type); pragma Inline_Always (Append); function Get (Self : Vector; Index : Natural) return Element_Type; pragma Inline_Always (Get); procedure Destroy (Self : in out Vector); pragma Inline_Always (Destroy); procedure Clear (Self : in out Vector); pragma Inline_Always (Clear); function First_Element (Self : Vector) return Element_Type; function Last_Element (Self : Vector) return Element_Type; function Length (Self : Vector) return Natural; pragma Inline_Always (Length); function First_Index (Self : Vector) return Natural is (0); pragma Inline_Always (First_Index); function Last_Index (Self : Vector) return Integer is (Length (Self) - 1); pragma Inline_Always (Last_Index); function Next (Self : Vector; N : Natural) return Natural is (N + 1); pragma Inline_Always (Next); function Has_Element (Self : Vector; N : Natural) return Boolean; pragma Inline_Always (Has_Element); private type Vector is record E: Elements_Array_Access := null; Size : Natural := 0; Capacity : Natural := Small_Vector_Capacity; SV : Small_Array_Type; end record; procedure Reserve (Self : in out Vector; Capacity : Positive); pragma Inline_Always (Reserve); function 
Has_Element (Self : Vector; N : Natural) return Boolean is (N < Self.Size); end Vectors; -- vectors.adb with System; use type System.Address; use System; with System.Memory; use System.Memory; package body Vectors is El_Size : constant size_t := Elements_Array'Component_Size / Storage_Unit; procedure Reserve (Self : in out Vector; Capacity : Positive) is Siz : constant size_t := size_t (Capacity) * El_Size; begin if Small_Vector_Capacity > 0 then if Self.Capacity = Small_Vector_Capacity then Self.E := To_Pointer (Alloc (Siz)); for I in 0 .. Self.Size - 1 loop Self.E.all (I) := Self.SV (I); end loop; else Self.E := To_Pointer (Realloc (Self.E.all'Address, Siz)); end if; else if Self.E /= null then Self.E := To_Pointer (Realloc (Self.E.all'Address, Siz)); else Self.E := To_Pointer (Alloc (Siz)); end if; end if; Self.Capacity := Capacity; end Reserve; procedure Append (Self : in out Vector; Element : Element_Type) is begin if Self.Capacity = Self.Size then Reserve (Self, (Self.Capacity * 2) + 1); end if; if Small_Vector_Capacity = 0 then Self.E.all (Self.Size) := Element; else if Self.Capacity = Small_Vector_Capacity then Self.SV (Self.Size) := Element; else Self.E.all (Self.Size) := Element; end if; end if; Self.Size := Self.Size + 1; end Append; function Get (Self : Vector; Index : Natural) return Element_Type is begin if Small_Vector_Capacity = 0 then return Self.E (Index); else if Self.Capacity = Small_Vector_Capacity then return Self.SV (Index); else return Self.E (Index); end if; end if; end Get; procedure Destroy (Self : in out Vector) is begin Free (Self.E); end Destroy; procedure Clear (Self : in out Vector) is begin Self.Size := 0; end Clear; function Last_Element (Self : Vector) return Element_Type is (Get (Self, Self.Size - 1)); function First_Element (Self : Vector) return Element_Type is (Get (Self, 0)); function Length (Self : Vector) return Natural is (Self.Size); end Vectors; -- ast.ads with Vectors; package AST is type AST_Node_Type is abstract 
tagged null record; type AST_Node is access all AST_Node_Type'Class; function Image (Node : access AST_Node_Type) return String is abstract; generic type Node_Type is abstract new AST_Node_Type with private; type Node_Access is access all Node_Type'Class; p
[Ada] Discriminants and protected units
This patch implements the following SPARK RM rule: 7.1.3(5) - An effectively volatile type other than a protected type shall not have a discriminated part. -- Source -- -- discrims.ads package Discrims with SPARK_Mode is type Vol_1 (Discr : Natural) is null record with Volatile; type Vol_2 (Discr : Natural) is null record; protected type Prot (Discr : Natural) is entry E; end Prot; end Discrims; -- discrims.adb package body Discrims with SPARK_Mode is protected body Prot is entry E when True is begin null; end E; end Prot; Obj_1 : Vol_1 (1); Obj_2 : Vol_2 (2) with Volatile; Obj_3 : Prot (3); Obj_4 : Prot (4) with Volatile; end Discrims; -- Compilation and output -- $ gcc -c discrims.adb discrims.adb:7:04: discriminated object "Obj_2" cannot be volatile discrims.ads:2:09: discriminated type "Vol_1" cannot be volatile Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Hristian Kirtchev * sem_ch3.adb (Analyze_Object_Contract): A protected type or a protected object is allowed to have a discriminated part. Index: sem_ch3.adb === --- sem_ch3.adb (revision 229049) +++ sem_ch3.adb (working copy) @@ -3347,9 +3347,11 @@ Obj_Id); -- An object of a discriminated type cannot be effectively - -- volatile (SPARK RM C.6(4)). + -- volatile except for protected objects (SPARK RM 7.1.3(5)). - elsif Has_Discriminants (Obj_Typ) then + elsif Has_Discriminants (Obj_Typ) + and then not Is_Protected_Type (Obj_Typ) + then Error_Msg_N ("discriminated object & cannot be volatile", Obj_Id);
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Tue, Oct 20, 2015 at 01:17:45PM +0200, Thomas Schwinge wrote:
> Always creating (dummy) GOMP_offload_register_ver constructors has been
> another suggestion that I had voiced much earlier in this thread (months
> ago), but everyone (including me) taking part in the discussion agreed
> that it'd cause even higher compile-time overhead.

I'd prefer to just set a flag like "force creation of the GOMP offloading
sections" whenever you see one of the APIs or constructs used in the TU, and
if that flag is set, even when there are no offloaded vars or
functions/kernels, force creation of the corresponding data sections. Either
it can be standard offloading LTO sections, just not containing anything, or,
if you want to improve compile-time, it could be special too, so that the
linker plugin can quickly identify those that only need offloading support,
but don't have any offloaded vars or code. But that can certainly be done as
an incremental optimization.

For OpenMP that would be whenever a #pragma omp target{, data, enter data,
exit data} construct is seen (e.g. during gimplification, or even better
during OMP region nesting checking), or for calls to:

  omp_set_default_device
  omp_get_default_device
  omp_get_num_devices
  omp_is_initial_device
  omp_get_initial_device
  omp_target_alloc
  omp_target_free
  omp_target_is_present
  omp_target_memcpy
  omp_target_memcpy_rect
  omp_target_associate_ptr
  omp_target_disassociate_ptr

I guess for OpenACC you have a similar set of calls. The thing is, while the
OpenACC standard is pretty much solely about offloading, OpenMP is not, and
in many cases programs just use host OpenMP parallelization (at least right
now, I bet such programs are a significantly larger set than programs that
use OpenACC or OpenMP offloading together).
Distributions and others will eventually configure the compilers they are shipping to enable the offloading, and if that forces a constructor to every TU or even every shared library just because it has been compiled with -fopenmp, it is unacceptable overhead. For the vendor shipped binary compilers, I'm envisioning ideal would be to be able to configure gcc for many offloading targets, then build such main compiler and offloading target compilers, but package them separately (one package (or set of packages) the base compiler, and then another package (or set of them) for each offloading target. What the -foffload= actually will be in the end from the linked shared library or binary POV would depend both on the configured offloading target, but also on whether the mkoffload binaries are found (or whatever else is needed first from the offloading target). That would mean that we'd not issue hard error or any kind of diagnostics if mkoffload is missing. Is that acceptable, or should that e.g. be limited just to the compiled in configure default (i.e. explicit -foffload= would error if the requested mkoffload is missing, default -foffload= would silently skip unavailable ones; I guess this would be my preference), or should we have two ways of configuring the offloading targets, as hard requirements and as optional support? > So, how to resolve our different opinions? I mean, for any serious > program code, there will be constructor calls into libgomp already; are > you expecting that adding one more really will cause any noticeable > overhead? See above, that is really not the case. Most of OpenMP code doesn't have any constructor calls into libgomp at all, the only exception is GOMP_offload_register{,_ver} at this point. > > What is HWM? Is that OFFLOAD_TARGET_TYPE_LAST what you mean? > > Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he > told me this means "High Water Mark". 
I have no strong opinion on the > name to use, just want to mention that "*_LAST" sounds to me like that > one still is part of the accepted set, whereas in this case it'd be the > first enumerator outside of the accepted ones. (And I guess, we agree > that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by > "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is > ugly?) *_LAST or *_last is actually what we use pretty much everywhere, see e.g. lots of places in tree-core.h. > Are you worried about the performance issues of a very short locking > cycle that in the majority of all cases should happen without blocking, > in comparison to performance issues related to host/device memory > transfers or kernel launches that will follow after the call to > gomp_offload_target_enabled_p? I don't really think that is reasonable > to worry about. Yes, I'm worried about that. The lock could be contended, and if you take the lock many times for each construct, it can show up, I'm worried about cache effects etc. It is already bad enough that we take/release the locks for the same device e.g. in each of: void *fn_addr = gomp_get_target_fn_addr (devicep, fn); struct target_mem_desc *tgt_vars = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, fals
Re: [PATCH v2 1/6] Libsanitizer merge from upstream r250806 (was r249633).
On Tue, Oct 20, 2015 at 02:18:25PM +0300, Maxim Ostapenko wrote: > This is the merge itself. Added DCAN_SANITIZE_UB={0, 1} and nuked > DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0 (not used now) in corresponding > Makefiles. Ok. Jakub
Re: [PATCH v2 2/6] Libsanitizer merge from upstream r250806 (was r249633).
On Tue, Oct 20, 2015 at 02:19:51PM +0300, Maxim Ostapenko wrote:
> + a) Location's filename must not be NULL.
> + b) Location's filename must not be equal to "".
> + c) Location's filename must not be equal to "\1".
> + d) First two bytes of filename msut not contain '\xff' symbol. */

must

> +
> +static bool
> +ubsan_use_new_style_p (location_t loc)
> +{
> + if (loc == UNKNOWN_LOCATION)
> +return false;
> +
> + expanded_location xloc = expand_location (loc);
> + if (xloc.file == NULL || strncmp (xloc.file, "\1", 2) == 0
> + || strncmp (xloc.file, "", 1) == 0 || xloc.file[0] == '\xff'

I'd use
  || xloc.file[0] == '\0' || xloc.file[0] == '\xff'
instead, but not a big deal for me.

> + || xloc.file[1] == '\xff')
> +return false;
> +
> + return true;
> +}
> +

Ok with the typo fix.

Jakub
Re: [PATCH v2 6/6] Libsanitizer merge from upstream r250806 (was r249633).
On Tue, Oct 20, 2015 at 02:29:44PM +0300, Maxim Ostapenko wrote: > In this patch, I'm trying to add a general instruction how to perform the > merge. This is just a documentation patch, any suggestions and opinions are > welcome. I'd add a line that the diff in lib/asan/tests tests since the last merge to those tests (looking at gcc/testsuite/ChangeLog*, the 2014-05-30 entry mentions r209283) should be incorporated into the gcc testsuite too (if possible). Otherwise, LGTM. > Index: libsanitizer/HOWTO_MERGE > === > --- libsanitizer/HOWTO_MERGE (revision 0) > +++ libsanitizer/HOWTO_MERGE (working copy) > @@ -0,0 +1,26 @@ > +In general, merging process should not be very difficult, but we need to > +track various ABI changes and GCC-specific patches carefully. Here is a > +general list of actions required to perform the merge: > + > +- Checkout recent GCC tree. > +- Run merge.sh script from libsanitizer directory. > +- Modify Makefile.am files into > asan/tsan/lsan/ubsan/sanitizer_common/interception > + directories if needed. In particular, you may need to add new source files > + and remove old ones in source files list, add new flags to {C, CXX}FLAGS if > + needed and update DEFS with new defined variables. > +- Apply all needed GCC-specific patches to libsanitizer (note that some of > + them might be already included to upstream). > +- Apply all necessary compiler changes. Be especially careful here, you must > + not break ABI between compiler and library. > +- Modify configure.ac file if needed (e.g. if you need to add link against > new > + library for sanitizer lilbs). > +- Remove unused (deleted by merge) files from all source and include > + directories. Be especially careful with headers, because they aren't > listed > + in Makefiles explicitly. > +- Regenerate configure script and all Makefiles by autoreconf. 
You should > use > + exactly the same autotools version as for other GCC directories (current > + version is 2.64, > https://www.gnu.org/software/automake/faq/autotools-faq.html > + for details how to install/use it). > +- Run regression testing on at least three platforms (e.g. x86-linux-gnu, > + x86_64-linux-gnu, aarch64-linux-gnu). > +- Run {A, UB}San bootstrap on at least three platforms. Jakub
Re: Move some bit and binary optimizations in simplify and match
On Tue, Oct 20, 2015 at 8:46 AM, Hurugalawadi, Naveen wrote:
> Hi,
>
>>> +/* Fold X + (X / CST) * -CST to X % CST. */
>>> This one is still wrong
> Removed.
>
>>> I don't understand the point of the FLOAT_TYPE_P check.
> The check was there in fold-const. So, just had the same check.
>
>>> Will we also simplify (A & B) - (A & ~B) into B - (A ^ B) ?
> Done.
>
>>> or maybe integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1))
> Done.
>
>>> :c on bit_ior? It should also allow you to merge the 2 CST versions into
>>> one.
> Had it. But due to the error on "logical_inverted_value", confused it with
> this pattern and had duplicated it.
>
>>> I am not really in favor of the indentation change (or the comment).
> Done.
>
>>> This patch is ok when bootstrapped / tested and with a proper changelog
>>> entry.
> Regression tested on X86_64 with no extra failures.

+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1))
+ (if (integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1)))
+  (minus (bit_xor @0 @1) @1)))

use if (wi::bit_and (@2, @1) == 0) and instead of the 2nd group

+/* Fold (A & B) - (A & ~B) into B - (A ^ B). */
+(simplify
+ (minus (bit_and:s @0 @1) (bit_and:s @0 (bit_not @1)))
+ (minus @1 (bit_xor @0 @1)))
+(simplify
+ (minus (bit_and:s @0 INTEGER_CST@1) (bit_and:s @0 INTEGER_CST@2))
+ (if (integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1)))
+  (minus @1 (bit_xor @0 @1

place a :c on the minus of the one not matching INTEGER_CSTs.

+(simplify
+ (bit_ior:c (bit_and @0 INTEGER_CST@2) (bit_and (bit_not @0) INTEGER_CST@1))
+ (if (integer_all_onesp (const_binop (BIT_XOR_EXPR, type, @2, @1)))
+  (bit_xor @0 @1)))

see above, use wi::bit_and instead

Otherwise looks ok to me.

Richard.

>
> Move ~X & X, (X == 0) & X, and !X & X are zero to match.pd. > Move X & ~X , X & (X == 0), and X & !X are zero to match.pd. > > * match.pd (mult:c @0 (convert? (lshift integer_onep@1 @2))): > New simplifier. > (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2): New simplifier. > (bit_and:c (convert? @0) (convert? (bit_not @0))): New simplifier. > (bit_ior (bit_and:s @0 (bit_not:s @1)) (bit_and:s (bit_not:s @0) @1)) > : New simplifier. > (mult:c (convert1? (negate @0)) (convert2? negate_expr_p@1)): > New simplifier. > (match (logical_inverted_value @0) (truth_not @0)) : New Predicate. > > > > 2015-10-20 Richard Biener > Naveen H.S > > * fold-const.c (fold_binary_loc) : Move Fold (A & ~B) - (A & B) > into (A ^ B) - B to match.pd > Move (X & ~Y) | (~X & Y) is X ^ Y to match.pd. > > * match.pd (minus (bit_and:s @0 (bit_not @1)) (bit_and:s @0 @1)): > New simplifier. > (minus (bit_and:s @0 INTEGER_CST@2) (bit_and:s @0 INTEGER_CST@1)): > New simplifier. > (minus (bit_and:s @0 @1) (bit_and:s @0 (bit_not @1))): New simplifier. > (minus (bit_and:s @0 INTEGER_CST@1) (bit_and:s @0 INTEGER_CST@2)): > New simplifier. > (bit_ior:c (bit_and:c @0 (bit_not @1)) (bit_and:c (bit_not @0) @1)): > New simplifier. > (bit_ior:c (bit_and @0 INTEGER_CST@2) (bit_and (bit_not @0) > INTEGER_CST@1)): > New simplifier.
Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
On Tue, Oct 20, 2015 at 01:45:37PM +0200, Bernd Schmidt wrote: > Both of you can ignore me if you feel I'm not making sense, but what exactly > is the use case for -foffload=disable? Isn't it slightly redundant with > -fno-openacc? IMO it's not an option that alters the available devices, > that's a question that is answered at run-time and doesn't (or shouldn't) > really depend on compiler switches. As a user I'd expect -foffload=disable > to just prevent generation of offloaded code for the things I'm compiling. > As Jakub pointed out, shared libraries may still contain other pieces that > are offloadable. > > I guess I don't fully understand why you want to go to great lengths to > disable devices at run-time based on a compile-time switch. What's the > reasoning here? At least for OpenMP, I'm also happy with what we do now (except for the ability to configure offloading targets as optional, i.e. dynamically configure the default based on what packages user install rather than just on how it has been configured, so that e.g. just because it has been configured for PTX offloading the host GCC itself doesn't have to have a dependency on the proprietary CUDA stuff in any way). I believe in OpenMP nobody says that if the device HW is available, but user chose to not compile offloading code/variables for that particular device that it can't show up among omp_get_num_devices (). And I think it is entirely fine if say target data map succeeds to that device, but then target is offloaded, if that is caused by users configure or command line choice. Maybe OpenACC has different requirements, is it required to terminate the program if it can't fulfill the requested offloading? In any case, I'm fine with something I've noted in the last mail, or with the status quo, but not with running constructors in TUs or even shared libraries just because they have been compiled with -fopenmp (and either haven't used any OpenMP code at all, or just the non-*target* directives). Jakub
[Ada] Friendlier behavior for "=" and ":="
This patch avoids race conditions in certain cases. This is not necessary, because these cases are technically erroneous, but it seems friendlier to avoid races. Furthermore, previous versions of the containers avoided some of these races. No test available; no change in behavior for correct programs. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Bob Duff * a-cbdlli.adb, a-cdlili.adb, a-chtgop.adb, a-cidlli.adb, * a-cobove.adb, a-coinve.adb, a-convec.adb, a-crbtgo.adb ("="): Avoid modifying the tampering counts unnecessarily. (Adjust): Zero tampering counts unconditionally. Index: a-cdlili.adb === --- a-cdlili.adb(revision 229049) +++ a-cdlili.adb(working copy) @@ -73,31 +73,35 @@ - function "=" (Left, Right : List) return Boolean is - -- Per AI05-0022, the container implementation is required to detect - -- element tampering by a generic actual subprogram. - - Lock_Left : With_Lock (Left.TC'Unrestricted_Access); - Lock_Right : With_Lock (Right.TC'Unrestricted_Access); - - L : Node_Access; - R : Node_Access; - begin if Left.Length /= Right.Length then return False; end if; - L := Left.First; - R := Right.First; - for J in 1 .. Left.Length loop - if L.Element /= R.Element then -return False; - end if; + if Left.Length = 0 then + return True; + end if; - L := L.Next; - R := R.Next; - end loop; + declare + -- Per AI05-0022, the container implementation is required to detect + -- element tampering by a generic actual subprogram. + Lock_Left : With_Lock (Left.TC'Unrestricted_Access); + Lock_Right : With_Lock (Right.TC'Unrestricted_Access); + + L : Node_Access := Left.First; + R : Node_Access := Right.First; + begin + for J in 1 .. 
Left.Length loop +if L.Element /= R.Element then + return False; +end if; + +L := L.Next; +R := R.Next; + end loop; + end; + return True; end "="; @@ -109,10 +113,15 @@ Src : Node_Access := Container.First; begin + -- If the counts are nonzero, execution is technically erroneous, but + -- it seems friendly to allow things like concurrent "=" on shared + -- constants. + + Zero_Counts (Container.TC); + if Src = null then pragma Assert (Container.Last = null); pragma Assert (Container.Length = 0); - pragma Assert (Container.TC = (Busy => 0, Lock => 0)); return; end if; Index: a-coinve.adb === --- a-coinve.adb(revision 229049) +++ a-coinve.adb(working copy) @@ -103,30 +103,37 @@ - overriding function "=" (Left, Right : Vector) return Boolean is - -- Per AI05-0022, the container implementation is required to detect - -- element tampering by a generic actual subprogram. - - Lock_Left : With_Lock (Left.TC'Unrestricted_Access); - Lock_Right : With_Lock (Right.TC'Unrestricted_Access); begin if Left.Last /= Right.Last then return False; end if; - for J in Index_Type range Index_Type'First .. Left.Last loop - if Left.Elements.EA (J) = null then -if Right.Elements.EA (J) /= null then + if Left.Length = 0 then + return True; + end if; + + declare + -- Per AI05-0022, the container implementation is required to detect + -- element tampering by a generic actual subprogram. + + Lock_Left : With_Lock (Left.TC'Unrestricted_Access); + Lock_Right : With_Lock (Right.TC'Unrestricted_Access); + begin + for J in Index_Type range Index_Type'First .. 
Left.Last loop +if Left.Elements.EA (J) = null then + if Right.Elements.EA (J) /= null then + return False; + end if; + +elsif Right.Elements.EA (J) = null then return False; + +elsif Left.Elements.EA (J).all /= Right.Elements.EA (J).all then + return False; end if; + end loop; + end; - elsif Right.Elements.EA (J) = null then -return False; - - elsif Left.Elements.EA (J).all /= Right.Elements.EA (J).all then -return False; - end if; - end loop; - return True; end "="; @@ -136,6 +143,12 @@ procedure Adjust (Container : in out Vector) is begin + -- If the counts are nonzero, execution is technically erroneous, but + -- it seems friendly to allow things like concurrent "=" on shared + -- constants. + + Zero_Counts (Container.TC); + if Contai
[Ada] Remove UET_Address attribute
This was an unused and internal-use only attribute. No testcase. Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Tristan Gingold * sem_util.adb (Is_Protected_Self_Reference): Remove reference to UET_Address in comment. * sem_attr.adb (Check_Unit_Name): Adjust comment. (Analyze_Attribute): Remove handling of UET_Address. * sem_attr.ads (Attribute_Impl_Def): Remove Attribute_UET_Address. * snames.ads-tmpl Remove Name_UET_Address, Attribute_UET_Address. * exp_attr.adb (Expand_N_Attribute_Reference): Remove Attribute_UET_Address. Index: exp_attr.adb === --- exp_attr.adb(revision 229057) +++ exp_attr.adb(working copy) @@ -6152,49 +6152,6 @@ Expand_Fpt_Attribute_R (N); end if; - - - -- UET_Address -- - - - - when Attribute_UET_Address => UET_Address : declare - Ent : constant Entity_Id := Make_Temporary (Loc, 'T'); - - begin - Insert_Action (N, - Make_Object_Declaration (Loc, - Defining_Identifier => Ent, - Aliased_Present => True, - Object_Definition => - New_Occurrence_Of (RTE (RE_Address), Loc))); - - -- Construct name __gnat_xxx__SDP, where xxx is the unit name - -- in normal external form. - - Get_External_Unit_Name_String (Get_Unit_Name (Pref)); - Name_Buffer (1 + 7 .. Name_Len + 7) := Name_Buffer (1 .. Name_Len); - Name_Len := Name_Len + 7; - Name_Buffer (1 .. 7) := "__gnat_"; - Name_Buffer (Name_Len + 1 .. Name_Len + 5) := "__SDP"; - Name_Len := Name_Len + 5; - - Set_Is_Imported (Ent); - Set_Interface_Name (Ent, - Make_String_Literal (Loc, - Strval => String_From_Name_Buffer)); - - -- Set entity as internal to ensure proper Sprint output of its - -- implicit importation. 
- - Set_Is_Internal (Ent); - - Rewrite (N, - Make_Attribute_Reference (Loc, - Prefix => New_Occurrence_Of (Ent, Loc), - Attribute_Name => Name_Address)); - - Analyze_And_Resolve (N, Typ); - end UET_Address; - -- Update -- Index: sem_util.adb === --- sem_util.adb(revision 229059) +++ sem_util.adb(working copy) @@ -12730,9 +12730,9 @@ begin -- Verify that prefix is analyzed and has the proper form. Note that - -- the attributes Elab_Spec, Elab_Body, Elab_Subp_Body and UET_Address, - -- which also produce the address of an entity, do not analyze their - -- prefix because they denote entities that are not necessarily visible. + -- the attributes Elab_Spec, Elab_Body and Elab_Subp_Body which also + -- produce the address of an entity, do not analyze their prefix + -- because they denote entities that are not necessarily visible. -- Neither of them can apply to a protected type. return Ada_Version >= Ada_2005 Index: sem_attr.adb === --- sem_attr.adb(revision 229057) +++ sem_attr.adb(working copy) @@ -388,8 +388,8 @@ -- itself of the form of a library unit name. Note that this is -- quite different from Check_Program_Unit, since it only checks -- the syntactic form of the name, not the semantic identity. This - -- is because it is used with attributes (Elab_Body, Elab_Spec, - -- UET_Address and Elaborated) which can refer to non-visible unit. + -- is because it is used with attributes (Elab_Body, Elab_Spec and + -- Elaborated) which can refer to non-visible unit. 
procedure Error_Attr (Msg : String; Error_Node : Node_Id); pragma No_Return (Error_Attr); @@ -2675,7 +2675,6 @@ if Aname /= Name_Elab_Body and then Aname /= Name_Elab_Spec and then Aname /= Name_Elab_Subp_Body and then - Aname /= Name_UET_Address and then Aname /= Name_Enabled and then Aname /= Name_Old then @@ -6026,15 +6025,6 @@ Analyze_And_Resolve (N, Standard_String); - - - -- UET_Address -- - - - - when Attribute_UET_Address => - Check_E0; - Check_Unit_Name (P); - Set_Etype (N, RTE (RE_Address)); - --- -- Unbiased_Rounding -- --- @@ -9710,7 +9700,6 @@ Attribute_Terminated | Attribute_To_Address | Attribute_Type_Key | - Attribute_UET_Address | Attribute_Unchecked_Access |
[Ada] Debug information for limited class-wide objects
This patch modifies the expansion of a build-in-place function call that initializes a class-wide limited object to generate the necessary debug information for the object. -- Source -- -- types.ads package Types is type Root_Type is tagged limited record I : Integer; end record; type Root_Access is access all Root_Type'Class; function Get return Root_Type'Class; procedure Ignore (O_Acc : Root_Access); end Types; -- types.adb package body Types is function Get return Root_Type'Class is begin return Root_Type'(I => 0); end Get; procedure Ignore (O_Acc : Root_Access) is begin null; end Ignore; end Types; -- cw_debug_info.adb with Types; use Types; procedure CW_Debug_Info is Obj : aliased Root_Type'Class := Get; Obj_Ptr : constant Root_Access:= Obj'Unchecked_Access; begin Ignore (Obj_Ptr); end CW_Debug_Info; -- Compilation and output -- $ gnatmake -q -g cw_debug_info.adb $ gdb ./cw_debug_info $ b cw_debug_info.adb:5 $ r $ print (obj) $1 = (i => 0) Tested on x86_64-pc-linux-gnu, committed on trunk 2015-10-20 Hristian Kirtchev * exp_ch6.adb (Expand_N_Extended_Return_Statement): Code cleanup. (Make_Build_In_Place_Call_In_Object_Declaration): Update the parameter profile. Code cleanup. Request debug info for the object renaming declaration. (Move_Activation_Chain): Add new formal parameter and update the comment on usage. * exp_ch6.ads (Make_Build_In_Place_Call_In_Object_Declaration): Update the parameter profile and comment on usage. * sem_util.ads, sem_util.adb (Remove_Overloaded_Entity): New routine, currently unused. Index: sem_util.adb === --- sem_util.adb(revision 229060) +++ sem_util.adb(working copy) @@ -16961,6 +16961,106 @@ end if; end Remove_Homonym; + -- + -- Remove_Overloaded_Entity -- + -- + + procedure Remove_Overloaded_Entity (Id : Entity_Id) is + procedure Remove_Primitive_Of (Typ : Entity_Id); + -- Remove primitive subprogram Id from the list of primitives that + -- belong to type Typ. 
+ + - + -- Remove_Primitive_Of -- + - + + procedure Remove_Primitive_Of (Typ : Entity_Id) is + Prims : Elist_Id; + + begin + if Is_Tagged_Type (Typ) then +Prims := Direct_Primitive_Operations (Typ); + +if Present (Prims) then + Remove (Prims, Id); +end if; + end if; + end Remove_Primitive_Of; + + -- Local variables + + Scop: constant Entity_Id := Scope (Id); + Formal : Entity_Id; + Prev_Id : Entity_Id; + + -- Start of processing for Remove_Overloaded_Entity + + begin + -- Remove the entity from the homonym chain. When the entity is the + -- head of the chain, associate the entry in the name table with its + -- homonym effectively making it the new head of the chain. + + if Current_Entity (Id) = Id then + Set_Name_Entity_Id (Chars (Id), Homonym (Id)); + + -- Otherwise link the previous and next homonyms + + else + Prev_Id := Current_Entity (Id); + while Present (Prev_Id) and then Homonym (Prev_Id) /= Id loop +Prev_Id := Homonym (Prev_Id); + end loop; + + Set_Homonym (Prev_Id, Homonym (Id)); + end if; + + -- Remove the entity from the scope entity chain. When the entity is + -- the head of the chain, set the next entity as the new head of the + -- chain. + + if First_Entity (Scop) = Id then + Prev_Id := Empty; + Set_First_Entity (Scop, Next_Entity (Id)); + + -- Otherwise the entity is either in the middle of the chain or it acts + -- as its tail. Traverse and link the previous and next entities. + + else + Prev_Id := First_Entity (Scop); + while Present (Prev_Id) and then Next_Entity (Prev_Id) /= Id loop +Next_Entity (Prev_Id); + end loop; + + Set_Next_Entity (Prev_Id, Next_Entity (Id)); + end if; + + -- Handle the case where the entity acts as the tail of the scope entity + -- chain. + + if Last_Entity (Scop) = Id then + Set_Last_Entity (Scop, Prev_Id); + end if; + + -- The entity denotes a primitive subprogram. Remove it from the list of + -- primitives of the associated controlling type. 
+ + if Ekind_In (Id, E_Function, E_Procedure) and then Is_Primitive (Id) then + Formal := First_Formal (Id); + while Present (Formal) loop +if Is_Controlling_Formal (Formal) then + Remove_Primitive_Of (Etype (Formal)); + exit; +
[Ada] Crash on address clause involving controlled objects.
An address clause that overlays an object with a controlled object or a
component of a controlled object is erroneous, and the compiler replaces the
address clause with the corresponding raise statement. However, the analysis
of the address clause must not terminate prematurely, so that the back-end
can complete code generation.

Compiling and executing main.adb must yield:

main.adb:4:03: warning: in instantiation at erase_on_finalize.adb:22
main.adb:4:03: warning: cannot use overlays with controlled objects
main.adb:4:03: warning: Program_Error will be raised at run time

raised PROGRAM_ERROR : main.adb:2 finalize/adjust raised exception

---
with Erase_On_Finalize;
procedure Main is
   package Sensitive_Integer is new Erase_On_Finalize (Integer);
   I1: Sensitive_Integer.Object := (Sensitive_Integer.Root with 1);
   I2: Sensitive_Integer.Object := (Sensitive_Integer.Root with 2);
   I3: Sensitive_Integer.Object := (Sensitive_Integer.Root with 3);
begin
   null;
end Main;
---
with Ada.Finalization;
generic
   type Data_Type is limited private;
package Erase_On_Finalize is
   subtype Root is Ada.Finalization.Limited_Controlled;
   type Object is new Root with record
      Contents : Data_Type;
   end record;
   overriding procedure Finalize (Obj: in out Object);
   -- Disallow dynamic memory allocation.
   type Object_Ref is access Object with Storage_Size => 0;
end Erase_On_Finalize;
---
with System;
with Interfaces;
with GNAT.Memory_Dump;
package body Erase_On_Finalize is
   subtype Byte_Type is Interfaces.Unsigned_8;
   use type Byte_Type;
   Erase_Pattern : constant Byte_Type := 0;

   procedure Erase_Bytes (Addr: System.Address; Bytes: Natural) is
      type Byte_View_Type is array (1 .. Bytes) of Byte_Type;
      Byte_View : Byte_View_Type with Address => Addr, Volatile;
   begin
      GNAT.Memory_Dump.Dump (Addr, Bytes);
      Byte_View := (others => Erase_Pattern);
   end Erase_Bytes;

   overriding procedure Finalize (Obj: in out Object) is
      type Byte_View_Type is
        array (1 .. (Obj.Contents'Size + 7) / 8) of Byte_Type;
      Byte_View : Byte_View_Type
        with address => Obj.Contents'address, Volatile;
   begin
      GNAT.Memory_Dump.Dump (Byte_View'Address, Byte_View'Length);
      Byte_View := (others => Erase_Pattern);
   end Finalize;
end Erase_On_Finalize;

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-10-20  Ed Schonberg

	* sem_ch13.adb (Analyze_Attribute_Definition_Clause, case 'Address):
	If either object is controlled the overlay is erroneous, but analysis
	must be completed so that back-end sees address clause and completes
	code generation. Improve text of warning.

Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 229071)
+++ sem_ch13.adb(working copy)
@@ -4711,20 +4711,22 @@
         Find_Overlaid_Entity (N, O_Ent, Off);
 
-        -- Overlaying controlled objects is erroneous
+        -- Overlaying controlled objects is erroneous.
+        -- Emit warning but continue analysis because program is
+        -- itself legal, and back-end must see address clause.
 
         if Present (O_Ent)
           and then (Has_Controlled_Component (Etype (O_Ent))
                      or else Is_Controlled (Etype (O_Ent)))
+          and then not Inside_A_Generic
         then
            Error_Msg_N
-             ("??cannot overlay with controlled object", Expr);
+             ("??cannot use overlays with controlled objects", Expr);
            Error_Msg_N
              ("\??Program_Error will be raised at run time", Expr);
            Insert_Action (Declaration_Node (U_Ent),
              Make_Raise_Program_Error (Loc,
                Reason => PE_Overlaid_Controlled_Object));
-           return;
 
         elsif Present (O_Ent)
           and then Ekind (U_Ent) = E_Constant
[PATCH] Fix PR68017
The usual stmt removing vs. debug stmts thing. Bootstrapped and tested on x86_64-linux-gnu, applied to trunk. Richard. 2015-10-20 Richard Biener PR tree-optimization/68017 * tree-tailcall.c (eliminate_tail_call): Remove stmts backwards. * gcc.dg/torture/pr68017.c: New testcase. Index: gcc/tree-tailcall.c === --- gcc/tree-tailcall.c (revision 228971) +++ gcc/tree-tailcall.c (working copy) @@ -847,17 +847,21 @@ eliminate_tail_call (struct tailcall *t) possibly unreachable code in other blocks is removed later in cfg cleanup. */ gsi = t->call_gsi; - gsi_next (&gsi); - while (!gsi_end_p (gsi)) + gimple_stmt_iterator gsi2 = gsi_last_bb (gimple_bb (gsi_stmt (gsi))); + while (gsi_stmt (gsi2) != gsi_stmt (gsi)) { - gimple *t = gsi_stmt (gsi); + gimple *t = gsi_stmt (gsi2); /* Do not remove the return statement, so that redirect_edge_and_branch sees how the block ends. */ - if (gimple_code (t) == GIMPLE_RETURN) - break; - - gsi_remove (&gsi, true); - release_defs (t); + if (gimple_code (t) != GIMPLE_RETURN) + { + gimple_stmt_iterator gsi3 = gsi2; + gsi_prev (&gsi2); + gsi_remove (&gsi3, true); + release_defs (t); + } + else + gsi_prev (&gsi2); } /* Number of executions of function has reduced by the tailcall. */ Index: gcc/testsuite/gcc.dg/torture/pr68017.c === --- gcc/testsuite/gcc.dg/torture/pr68017.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr68017.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-g" } */ + +long long a; + +short +fn1 (short p1, unsigned short p2) +{ + return p1 + p2; +} + +short +fn2 () +{ + int b = a ? fn1 (fn2 (), a) : 0; + return b; +}
[PATCH][AArch64 Testsuite][Trivial?] Remove divisions-to-produce-NaN from vdiv_f.c
The test vdiv_f.c #define's NAN to (0.0 / 0.0). This produces extra scalar
fdiv's, which complicate the scan-assembler testing. We can remove these by
using __builtin_nan instead.

Tested on AArch64 Linux.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vdiv_f.c: Use __builtin_nan.
---
 gcc/testsuite/gcc.target/aarch64/vdiv_f.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/vdiv_f.c b/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
index 45c72a9..a505e39 100644
--- a/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
+++ b/gcc/testsuite/gcc.target/aarch64/vdiv_f.c
@@ -7,7 +7,7 @@
 #define FLT_INFINITY (__builtin_inff ())
 #define DBL_INFINITY (__builtin_inf ())

-#define NAN (0.0 / 0.0)
+#define NAN (__builtin_nan (""))

 #define PI 3.141592653589793
 #define PI_4 0.7853981633974483
@@ -228,9 +228,7 @@ test_vdiv_f64 ()
   return 0;
 }

-/* The following assembly should match 2 more times,
-   in 64bit NAN generation.  */
-/* { dg-final { scan-assembler-times "fdiv\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 3 } } */
+/* { dg-final { scan-assembler-times "fdiv\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */

 #undef TESTA8
 #undef ANSW8
--
1.9.1
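For readers unfamiliar with the distinction being exploited here: `(0.0 / 0.0)` is a run-time computation unless the compiler folds it, so it can leave real `fdiv` instructions in the generated assembly, while `__builtin_nan ("")` is a compile-time quiet-NaN constant. A minimal sketch of the two idioms (the `nan_by_division` helper name is mine, not from the patch):

```c
#include <math.h>

/* __builtin_nan ("") yields a quiet NaN as a compile-time constant, so the
   compiler emits no division instruction for it.  */
static const double quiet_nan = __builtin_nan ("");

/* The (0.0 / 0.0) idiom produces the same value, but via an actual
   division that may survive into the assembly and perturb
   scan-assembler-times counts.  */
static double
nan_by_division (void)
{
  double zero = 0.0;
  return zero / zero;
}
```

Both produce a NaN, which compares unequal even to itself; only the instruction sequence differs.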
[PATCH] Pass manager: add support for termination of pass list
Hello.

As part of upcoming merge of HSA branch, we would like to have possibility
to terminate pass manager after execution of the HSA generation pass. The
HSA back-end is implemented as a tree pass that directly emits HSAIL from
gimple tree representation. The pass operates on clones created by HSA IPA
pass and the pass manager should stop execution of further RTL passes.

Suggested patch survives bootstrap and regression tests on x86_64-linux-pc.

What do you think about it?
Thanks,
Martin

>From fc60561c3ac09188fb61dac61b1cc2422061fc1d Mon Sep 17 00:00:00 2001
From: marxin
Date: Mon, 19 Oct 2015 23:26:54 +0200
Subject: [PATCH] Add support for termination of pass manager.

gcc/ChangeLog:

2015-10-19  Martin Liska

	* passes.c (execute_one_pass): Add new argument called exit.
	(execute_pass_list_1): Terminate pass manager if a pass requests
	termination.
	(execute_ipa_pass_list): Likewise.
	* tree-pass.h: Introduce new TODO_stop_pass_execution.
---
 gcc/passes.c    | 24 ++++++++++++++++++++----
 gcc/tree-pass.h |  3 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.c b/gcc/passes.c
index 6ef6d2e..1199ae3 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2279,10 +2279,11 @@ override_gate_status (opt_pass *pass, tree func, bool gate_status)
 }


-/* Execute PASS. */
+/* Execute PASS.  If the PASS requests to stop after its execution, EXIT
+   value is set to true.  */

 bool
-execute_one_pass (opt_pass *pass)
+execute_one_pass (opt_pass *pass, bool *exit)
 {
   unsigned int todo_after = 0;

@@ -2387,18 +2388,28 @@ execute_one_pass (opt_pass *pass)
   if (!((todo_after | pass->todo_flags_finish) & TODO_do_not_ggc_collect))
     ggc_collect ();

+  /* If finish TODO flags contain TODO_stop_pass_execution, set exit = true.  */
+  if (todo_after & TODO_stop_pass_execution)
+    *exit = true;
+
   return true;
 }

 static void
 execute_pass_list_1 (opt_pass *pass)
 {
+  bool stop_pass_execution = false;
+
   do
     {
       gcc_assert (pass->type == GIMPLE_PASS
		  || pass->type == RTL_PASS);
-      if (execute_one_pass (pass) && pass->sub)
+      if (execute_one_pass (pass, &stop_pass_execution) && pass->sub)
	execute_pass_list_1 (pass->sub);
+
+      if (stop_pass_execution)
+	return;
+
       pass = pass->next;
     }
   while (pass);
@@ -2739,12 +2750,14 @@ ipa_read_optimization_summaries (void)
 void
 execute_ipa_pass_list (opt_pass *pass)
 {
+  bool stop_pass_execution = false;
+
   do
     {
       gcc_assert (!current_function_decl);
       gcc_assert (!cfun);
       gcc_assert (pass->type == SIMPLE_IPA_PASS || pass->type == IPA_PASS);
-      if (execute_one_pass (pass) && pass->sub)
+      if (execute_one_pass (pass, &stop_pass_execution) && pass->sub)
	{
	  if (pass->sub->type == GIMPLE_PASS)
	    {
@@ -2763,6 +2776,9 @@ execute_ipa_pass_list (opt_pass *pass)
       gcc_assert (!current_function_decl);
       symtab->process_new_functions ();
       pass = pass->next;
+
+      if (stop_pass_execution)
+	return;
     }
   while (pass);
 }
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index c37e4b2..a481bac 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -300,6 +300,9 @@ protected:
 /* Rebuild the callgraph edges.  */
 #define TODO_rebuild_cgraph_edges       (1 << 22)

+/* Stop pass manager after execution of a pass.  */
+#define TODO_stop_pass_execution        (1 << 23)
+
 /* Internally used in execute_function_todo().  */
 #define TODO_update_ssa_any		\
     (TODO_update_ssa			\
-- 
2.6.0
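The control flow the patch adds can be sketched in plain C, independent of the pass-manager internals. The types and names below are illustrative stand-ins, not GCC's actual ones; only the shape (a TODO bitmask checked after each pass, with an out-parameter that unwinds the list walk) mirrors the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative TODO flag, mirroring how TODO_stop_pass_execution is one
   bit in the todo_flags_finish bitmask.  */
#define TODO_STOP (1u << 23)

struct pass
{
  unsigned todo_flags_finish;
  struct pass *next;
};

/* Walk the pass list; if a pass's finish flags request termination, set
   *STOP and stop walking, like the patched execute_pass_list_1 does.
   Returns how many passes actually ran.  */
static int
run_pass_list (struct pass *pass, bool *stop)
{
  int executed = 0;
  for (; pass; pass = pass->next)
    {
      executed++;
      if (pass->todo_flags_finish & TODO_STOP)
        {
          *stop = true;
          return executed;
        }
    }
  return executed;
}
```

The out-parameter (rather than a return value) matters because `execute_one_pass` already returns whether the pass's sub-passes should run, so the stop request needs a second channel.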
Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof
On 10/16/2015 09:34 PM, Martin Sebor wrote:
> Thank you for the review. Attached is an updated patch that hopefully
> addresses all your comments. I ran the check_GNU_style.sh script on it
> to make sure I didn't miss something. I've also added replies to a few
> of your comments below.

Ok, thanks. However - the logic around the context struct in the patch
still seems fairly impenetrable to me, and I've been wondering whether a
simpler approach wouldn't work just as well. I came up with the patch
below, which just passes a tree_code context to recursive invocations and
increases the upper bound by one only if not in an ARRAY_REF or
COMPONENT_REF context.

As far as I can tell, this produces the same warnings on the testcase
(modulo differences in type printout), except for the final few FA5_7
testcases, which I had doubts about anyway:

typedef struct FA5_7 {
  int i;
  char a5_7 [5][7];
} FA5_7;

  __builtin_offsetof (FA5_7, a5_7 [0][7]),   // { dg-warning "index" }
  __builtin_offsetof (FA5_7, a5_7 [1][7]),   // { dg-warning "index" }
  __builtin_offsetof (FA5_7, a5_7 [5][0]),   // { dg-warning "index" }
  __builtin_offsetof (FA5_7, a5_7 [5][7]),   // { dg-warning "index" }

Why wouldn't at least the first two be covered by the
one-past-the-end-is-ok rule?


Bernd

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 229049)
+++ gcc/c-family/c-common.c	(working copy)
@@ -10589,11 +10589,11 @@ c_common_to_target_charset (HOST_WIDE_IN
    traditional rendering of offsetof as a macro.  Return the folded result.  */

 tree
-fold_offsetof_1 (tree expr)
+fold_offsetof_1 (tree expr, enum tree_code ctx)
 {
   tree base, off, t;
-
-  switch (TREE_CODE (expr))
+  tree_code code = TREE_CODE (expr);
+  switch (code)
     {
     case ERROR_MARK:
       return expr;
@@ -10617,7 +10617,7 @@ fold_offsetof_1 (tree expr)
       return TREE_OPERAND (expr, 0);

     case COMPONENT_REF:
-      base = fold_offsetof_1 (TREE_OPERAND (expr, 0));
+      base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
       if (base == error_mark_node)
	return base;

@@ -10634,7 +10634,7 @@ fold_offsetof_1 (tree expr)
       break;

     case ARRAY_REF:
-      base = fold_offsetof_1 (TREE_OPERAND (expr, 0));
+      base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
       if (base == error_mark_node)
	return base;

@@ -10649,8 +10649,9 @@ fold_offsetof_1 (tree expr)
	      && !tree_int_cst_equal (upbound,
				      TYPE_MAX_VALUE (TREE_TYPE (upbound))))
	    {
-	      upbound = size_binop (PLUS_EXPR, upbound,
-				    build_int_cst (TREE_TYPE (upbound), 1));
+	      if (ctx != ARRAY_REF && ctx != COMPONENT_REF)
+		upbound = size_binop (PLUS_EXPR, upbound,
+				      build_int_cst (TREE_TYPE (upbound), 1));
	      if (tree_int_cst_lt (upbound, t))
		{
		  tree v;
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h	(revision 229049)
+++ gcc/c-family/c-common.h	(working copy)
@@ -1029,7 +1029,7 @@ extern bool c_dump_tree (void *, tree);

 extern void verify_sequence_points (tree);

-extern tree fold_offsetof_1 (tree);
+extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK);
 extern tree fold_offsetof (tree);

 /* Places where an lvalue, or modifiable lvalue, may be required.
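The borderline Bernd is asking about can be seen with plain offsetof arithmetic: for a multi-dimensional member, the "one past the end" index of an inner dimension names the same byte as the first element of the next row, which is why warning on it is debatable. A small illustration using the FA5_7 struct from the testcase (this sketch assumes a C99 `offsetof` that accepts array subscripts in the member designator, which GCC provides):

```c
#include <stddef.h>

typedef struct FA5_7
{
  int i;
  char a5_7[5][7];
} FA5_7;

/* offsetof is pure address arithmetic: rows of a5_7 are 7 bytes apart,
   so the one-past-the-end inner index [0][7] lands on exactly the same
   byte as the in-bounds [1][0].  */
static const size_t off_row0_past  = offsetof (FA5_7, a5_7[0][7]);
static const size_t off_row1_first = offsetof (FA5_7, a5_7[1][0]);
```

That equality is the argument for treating at least `a5_7 [0][7]` and `a5_7 [1][7]` as covered by the one-past-the-end rule rather than warning on them.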
Re: [PATCH] x86 interrupt attribute
The debug_info section for the interrupt function looks ok. I tried to call it from assembler code to check it in gdb. pushl $0x333 ;eflags pushl $0x111 ;cs pushl $0x222 ;eip jmp foo ;interrupt function #define uword_t unsigned int struct interrupt_frame { uword_t ip; uword_t cs; uword_t flags; }; I get this inside the interrupt function: Breakpoint 1, foo (frame=0xd560) at interrupt-1.c:7 7 a = (struct interrupt_frame*) frame; (gdb) p/x ((struct interrupt_frame*)frame)->ip $1 = 0x222 (gdb) p/x ((struct interrupt_frame*)frame)->cs $3 = 0x111 (gdb) p/x ((struct interrupt_frame*)frame)->flags $4 = 0x333 Frame pointer info looks ok. On Tue, Oct 13, 2015 at 3:18 PM, Yulia Koval wrote: > > Here is the current version of the patch with all the fixes. > Regtested\bootstraped it on 64 bit. > > We need a pointer since interrupt handler will update data pointing > to by frame. Since error_code isn't at the normal location where the > parameter is passed on stack and frame isn't in a hard register, we > changed ix86_function_arg: > > + if (!cum->caller && cfun->machine->func_type != TYPE_NORMAL) > +{ > + /* The first argument of interrupt handler is a pointer and > +points to the return address slot on stack. The optional > +second argument is an integer for error code on stack. */ > + gcc_assert (type != NULL_TREE); > + if (POINTER_TYPE_P (type)) > + { > + if (cfun->machine->func_type == TYPE_EXCEPTION) > + /* (AP) in the current frame in exception handler. */ > + arg = arg_pointer_rtx; > + else > + /* -WORD(AP) in the current frame in interrupt handler. */ > + arg = force_reg (Pmode, > +plus_constant (Pmode, arg_pointer_rtx, > + -UNITS_PER_WORD)); > + if (mode != Pmode) > + arg = convert_to_mode (mode, arg, 1); > + } > + else > + { > + gcc_assert (TREE_CODE (type) == INTEGER_TYPE > + && cfun->machine->func_type == TYPE_EXCEPTION > + && mode == word_mode); > + /* The error code is at -WORD(AP) in the current frame in > +exception handler. 
*/ > + arg = gen_rtx_MEM (word_mode, > +plus_constant (Pmode, arg_pointer_rtx, > + -UNITS_PER_WORD)); > + } > + > + return arg; > +} > + > > to return a pseudo register. It violates > >Return where to put the arguments to a function. >Return zero to push the argument on the stack, or a hard register in >which to store the argument. > > Register allocator has no problem with parameters in pseudo registers. > But GCC crashes when it tries to access DECL_INCOMING_RTL as a hard > register when generating debug information. We worked around it by > doing > > + > + if (cfun->machine->func_type != TYPE_NORMAL) > +{ > + /* Since the pointer argument of interrupt handler isn't a real > + argument, adjust DECL_INCOMING_RTL for debug output. */ > + tree arg = DECL_ARGUMENTS (current_function_decl); > + gcc_assert (arg != NULL_TREE > + && POINTER_TYPE_P (TREE_TYPE (arg))); > + if (cfun->machine->func_type == TYPE_EXCEPTION) > + /* (AP) in the current frame in exception handler. */ > + DECL_INCOMING_RTL (arg) = arg_pointer_rtx; > + else > + /* -WORD(AP) in the current frame in interrupt handler. */ > + DECL_INCOMING_RTL (arg) = plus_constant (Pmode, > +arg_pointer_rtx, > +-UNITS_PER_WORD); > +} > > > On Mon, Oct 5, 2015 at 12:29 PM, Uros Bizjak wrote: > > On Mon, Oct 5, 2015 at 1:17 AM, H.J. Lu wrote: > > > >>> Looking a bit deeper into the code, it looks that we want to realign > >>> the stack in the interrupt handler. Let's assume that interrupt > >>> handler is calling some other function that saves SSE vector regs to > >>> the stack. According to the x86 ABI, incoming stack of the called > >>> function is assumed to be aligned to 16 bytes. But, interrupt handler > >>> violates this assumption, since the stack could be aligned to only 4 > >>> bytes for 32bit and 8 bytes for 64bit targets. Entering the called > >>> function with stack, aligned to less than 16 bytes will certainly > >>> violate ABI. 
> >>> > >>> So, it looks to me that we need to realign the stack in the interrupt > >>> handler unconditionally to 16bytes. In this case, we also won't need > >>> the following changes: > >>> > >> > >> Current stack alignment implementation requires at least > >> one, maybe two, scratch registers: > >> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67841 > >> > >> Ext
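The gdb session at the top of this message follows directly from the push order: flags is pushed first, then cs, then eip, so eip ends up at the lowest address and therefore at offset 0 of the frame pointer. A host-side sketch of that layout, mirroring the struct from the message (with `uword_t` as the 32-bit word from the example):

```c
#include <stddef.h>

typedef unsigned int uword_t;  /* 32-bit mode, as in the example */

/* Matches the struct in the message: the hand-written asm stub pushes
   0x333 (flags), then 0x111 (cs), then 0x222 (eip), so on a
   downward-growing stack ip sits at the lowest address, offset 0.  */
struct interrupt_frame
{
  uword_t ip;
  uword_t cs;
  uword_t flags;
};
```

The member offsets explain why `frame->ip` read back 0x222 (the last value pushed) and `frame->flags` read back 0x333 (the first).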
Re: [PATCH][AArch64 Testsuite][Trivial?] Remove divisions-to-produce-NaN from vdiv_f.c
On 20 October 2015 at 13:40, Alan Lawrence wrote: > The test vdiv_f.c #define's NAN to (0.0 / 0.0). This produces extra scalar > fdiv's, which complicate the scan-assembler testing. We can remove these by > using __builtin_nan instead. > > Tested on AArch64 Linux. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/vdiv_f.c: Use __builtin_nan. OK /Marcus
Re: [PATCH] 2015-10-19 Benedikt Huber Philipp Tomsich
On 4 January 1970 at 00:02, Benedikt Huber wrote: > * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf. > * config/aarch64/aarch64-protos.h: Declare. > * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and > frsqrts. > * config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt. > * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code > when > applicable. > * config/aarch64/aarch64.md: Added enum entries. > * config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt. > * testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common > macros for > assembly checks. > * testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure > frsqrts and frsqrte are not emitted. > * testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts > and > frsqrte are emitted. > * testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt. OK /Marcus
Re: [PATCH] Pass manager: add support for termination of pass list
On Tue, Oct 20, 2015 at 3:00 PM, Martin Liška wrote: > Hello. > > As part of upcoming merge of HSA branch, we would like to have possibility to > terminate > pass manager after execution of the HSA generation pass. The HSA back-end is > implemented > as a tree pass that directly emits HSAIL from gimple tree representation. The > pass operates > on clones created by HSA IPA pass and the pass manager should stop execution > of further > RTL passes. > > Suggested patch survives bootstrap and regression tests on x86_64-linux-pc. > > What do you think about it? Are you sure it works this way? Btw, you will miss executing of all the cleanup passes that will eventually free memory associated with the function. So I'd rather support a TODO_discard_function which should basically release the body from the cgraph. Richard. > Thanks, > Martin
Re: [vec-cmp, patch 2/6] Vectorization factor computation
On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich wrote: > 2015-10-13 16:37 GMT+03:00 Richard Biener : >> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich wrote: >>> Hi, >>> >>> This patch handles statements with boolean result in vectorization factor >>> computation. For comparison its operands type is used instead of restult >>> type to compute VF. Other boolean statements are ignored for VF. >>> >>> Vectype for comparison is computed using type of compared values. Computed >>> type is propagated into other boolean operations. >> >> This feels rather ad-hoc, mixing up the existing way of computing >> vector type and VF. I'd rather have turned the whole >> vector type computation around to the scheme working on the operands >> rather than on the lhs and then searching >> for smaller/larger types on the rhs'. >> >> I know this is a tricky function (heh, but you make it even worse...). >> And it needs a helper with knowledge about operations >> so one can compute the result vector type for an operation on its >> operands. The seeds should be PHIs (handled like now) >> and loads, and yes, externals need special handling. >> >> Ideally we'd do things in two stages, first compute vector types in a >> less constrained manner (not forcing a single vector size) >> and then in a 2nd run promote to a common size also computing the VF to do >> that. > > This sounds like a refactoring, not a functional change, right? Also I > don't see a reason to analyze DF to compute vectypes if we promote it > to a single vector size anyway. For booleans we have to do it because > boolean vectors of the same size may have different number of > elements. What is the reason to do it for other types? For conversions and operators which support different sized operands > Shouldn't it be a patch independent from comparison vectorization series? As you like. >> >> Btw, I think you "mishandle" bool b = boolvar != 0; > > This should be handled fine. 
Statement will inherit a vectype of > 'boolvar' definition. If it's invariant - then yes, invariant boolean > statement case is not handled. But this is only because I supposed we > just shouldn't have such statements in a loop. If we may have them, > then using 'vector _Bool (VF)' type for that should be OK. > > Ilya > >> >> Richard. >>
Re: [Patch AArch64 63304] Fix issue with global state.
On 16 October 2015 at 12:05, Ramana Radhakrishnan wrote: > 2015-10-15 Ramana Radhakrishnan > > PR target/63304 > * config/aarch64/aarch64.c (aarch64_nopcrelative_literal_loads): New. > (aarch64_expand_mov_immediate): Use > aarch64_nopcrelative_literal_loads. > (aarch64_classify_address): Likewise. > (aarch64_secondary_reload): Likewise. > (aarch64_override_options_after_change_1): Adjust. > * config/aarch64/aarch64.md > (aarch64_reload_movcp): > Use aarch64_nopcrelative_literal_loads. > (aarch64_reload_movcp): Likewise. > * config/aarch64/aarch64-protos.h > (aarch64_nopcrelative_literal_loads): Declare > > 2015-10-15 Jiong Wang > Ramana Radhakrishnan > > PR target/63304 > * gcc.target/aarch64/pr63304-1.c: New test. > +/* Global flag for PC relative loads. */ > +bool aarch64_nopcrelative_literal_loads; Yuk... but I don't have a better suggestion to hand. > +++ b/gcc/testsuite/gcc.target/aarch64/pr63304-1.c In that directory I've been pestering folks to follow the name convention on the wiki for new additions, so _1 please (or change the wiki ;-). Otherwise OK with me. Cheers /Marcus
[hsa] Allow simple local assignments in between OMP statements
Hello, the following change to gridification part of the HSA branch are necessary to gridify loops which are bounded by Fortran parameters passed by reference, and therefore which have dereference of the actual parameter in front of the loop but within the target construct. Committed to the branch. Thanks, Martin 2015-10-20 Martin Jambor * omp-low.c (single_stmt_in_seq_skip_bind): Removed. (reg_assignment_to_local_var_p): New function. (seq_only_contains_local_assignments): Use it. (find_single_omp_among_assignments_1): New function. (find_single_omp_among_assignments): Likewise. (target_follows_gridifiable_pattern): Use it. (copy_leading_local_assignments): New function. (find_mark_kernel_components): Renamed to process_kernel_body_copy, use copy_leading_local_assignments to copy assignments leading to OMP statements. (attempt_target_gridification): Use the above new functions. --- gcc/omp-low.c | 303 -- 1 file changed, 191 insertions(+), 112 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 5234a11..56cc813 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -84,7 +84,6 @@ along with GCC; see the file COPYING3. If not see #include "symbol-summary.h" #include "hsa.h" - /* Lowering of OMP parallel and workshare constructs proceeds in two phases. The first phase scans the function looking for OMP statements and then for variables that must be replaced to satisfy data sharing @@ -12557,76 +12556,122 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } -/* If SEQ is a sequence containing only one statement or a bind statement which - itself contains only one statement, return that statement. Otherwise return - NULL. TARGET_LOC must be location of the target statement and NAME the name - of the currently processed statement, both are used for dumping. */ +/* Returen true if STMT is an assignment of a register-type into a local + VAR_DECL. 
*/ -static gimple * -single_stmt_in_seq_skip_bind (gimple_seq seq, location_t target_loc, - const char *name) +static bool +reg_assignment_to_local_var_p (gimple *stmt) { - gimple *stmt; - bool loop; - do + gassign *assign = dyn_cast (stmt); + if (!assign) +return false; + tree lhs = gimple_assign_lhs (assign); + if (TREE_CODE (lhs) != VAR_DECL + || !is_gimple_reg_type (TREE_TYPE (lhs)) + || is_global_var (lhs)) +return false; + return true; +} + +/* Return true if all statements in SEQ are assignments to local register-type + variables. */ + +static bool +seq_only_contains_local_assignments (gimple_seq seq) +{ + if (!seq) +return true; + + gimple_stmt_iterator gsi; + for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi)) +if (!reg_assignment_to_local_var_p (gsi_stmt (gsi))) + return false; + return true; +} + + +/* Scan statements in SEQ and call itself recursively on any bind. If during + whole search only assignments to register-type local variables and one + single OMP statement is encountered, return true, otherwise return false. + 8RET is where we store any OMP statement encountered. TARGET_LOC and NAME + are used for dumping a note about a failure. 
*/ + +static bool +find_single_omp_among_assignments_1 (gimple_seq seq, location_t target_loc, +const char *name, gimple **ret) +{ + gimple_stmt_iterator gsi; + for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi)) { - if (!seq) + gimple *stmt = gsi_stmt (gsi); + + if (reg_assignment_to_local_var_p (stmt)) + continue; + if (gbind *bind = dyn_cast (stmt)) { - gcc_assert (name); - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, target_loc, -"Will not turn target construct into a simple " -"GPGPU kernel because %s construct has empty " -"body\n", -name); - return NULL; + if (!find_single_omp_among_assignments_1 (gimple_bind_body (bind), + target_loc, name, ret)) + return false; } - - if (!gimple_seq_singleton_p (seq)) + else if (is_gimple_omp (stmt)) + { + if (*ret) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, target_loc, +"Will not turn target construct into a simple " +"GPGPU kernel because %s construct contains " +"multiple OpenMP constructs\n", name); + return false; + } + *ret = stmt; + } + else { - gcc_assert (name); if (dump_enabled_p ()) dump_pri
[hsa] Do not output an extraneous kernel when gridifying
Hi, a small bug caused HSAIL output of a kernel for the parallel construct, even when it has been made a part of a gridified kernel and the standalone construct is therefore not necessary for HSA. This small patch fixes that. Committed to the branch. On a related note, I have merged trunk revision 228776 (just before OpenMP 4.5 landed) to the branch and am in the process of forward-porting the relevant bits to the new OpenMP. Hopefully it will not take too long. Thanks, Martin 2015-10-20 Martin Jambor * omp-low.c (region_part_of_target_p): Renamed to region_needs_kernel_p, return false for parts of gridified kernels. (expand_parallel_call): Adjusted. --- gcc/omp-low.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 56cc813..900ed19 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -4872,19 +4872,32 @@ gimple_build_cond_empty (tree cond) return gimple_build_cond (pred_code, lhs, rhs, NULL_TREE, NULL_TREE); } -/* Return true if the REGION is within a declare target function or within a - target region. */ +/* Return true if a parallel REGION is within a declare target function or + within a target region and is not a part of a gridified kernel. 
*/ static bool -region_part_of_target_p (struct omp_region *region) +region_needs_kernel_p (struct omp_region *region) { + bool indirect = false; + for (region = region->outer; region; region = region->outer) +{ + if (region->type == GIMPLE_OMP_PARALLEL) + indirect = true; + else if (region->type == GIMPLE_OMP_TARGET) + { + gomp_target *tgt_stmt; + tgt_stmt = as_a (last_stmt (region->entry)); + if (tgt_stmt->kernel_iter) + return indirect; + else + return true; + } +} + if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (current_function_decl))) return true; - for (region = region->outer; region; region = region->outer) -if (region->type == GIMPLE_OMP_TARGET) - return true; return false; } @@ -5058,7 +5071,7 @@ expand_parallel_call (struct omp_region *region, basic_block bb, false, GSI_CONTINUE_LINKING); if (hsa_gen_requested_p () - && region_part_of_target_p (region)) + && region_needs_kernel_p (region)) { cgraph_node *child_cnode = cgraph_node::get (child_fndecl); hsa_register_kernel (child_cnode); -- 2.6.0
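The walk that region_needs_kernel_p performs can be sketched in isolation. The enum and struct below are illustrative stand-ins for omp_region and the GIMPLE codes (and this sketch omits the "omp declare target" attribute fallback at the end of the real function); only the outward walk with the `indirect` flag mirrors the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative region kinds, standing in for GIMPLE_OMP_PARALLEL and
   GIMPLE_OMP_TARGET (with and without a gridified kernel_iter).  */
enum region_type { R_PARALLEL, R_TARGET_PLAIN, R_TARGET_GRIDIFIED };

struct region
{
  enum region_type type;
  struct region *outer;
};

/* Mirror of the patched logic: walking outward from a parallel region,
   a directly enclosing gridified target means no standalone kernel is
   needed, but a parallel nested *inside* another parallel within that
   target still needs one (indirect == true).  */
static bool
region_needs_kernel (struct region *region)
{
  bool indirect = false;
  for (region = region->outer; region; region = region->outer)
    {
      if (region->type == R_PARALLEL)
        indirect = true;
      else if (region->type == R_TARGET_GRIDIFIED)
        return indirect;
      else if (region->type == R_TARGET_PLAIN)
        return true;
    }
  return false;
}
```

This is exactly the behavior change of the patch: before it, any enclosing target caused kernel emission; now a gridified target suppresses it for the directly contained parallel.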
Re: [PATCH v2 1/6] Libsanitizer merge from upstream r250806 (was r249633).
Great, thanks! I'm going to commit the whole patch set tomorrow morning if no objections. On 20/10/15 14:52, Jakub Jelinek wrote: On Tue, Oct 20, 2015 at 02:18:25PM +0300, Maxim Ostapenko wrote: This is the merge itself. Added DCAN_SANITIZE_UB={0, 1} and nuked DASAN_FLEXIBLE_MAPPING_AND_OFFSET=0 (not used now) in corresponding Makefiles. Ok. Jakub
Re: [PATCH] Move cproj simplification to match.pd
On 20/10/15 08:37, Richard Biener wrote: On Mon, 19 Oct 2015, Christophe Lyon wrote: On 19 October 2015 at 15:54, Richard Biener wrote: Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Hi Richard, This patch caused arm and aarch64 builds of newlib to cause ICEs: In file included from /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/include/stdlib.h:11:0, from /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:13: /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c: In function '_mktm_r': /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/time/mktm_r.c:28:9: internal compiler error: Segmentation fault _DEFUN (_mktm_r, (tim_p, res, is_gmtime), 0xa90205 crash_signal /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:353 0x7b3b0c tree_class_check /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree.h:3055 0x7b3b0c tree_single_nonnegative_warnv_p(tree_node*, bool*, int) /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13025 0x814053 gimple_phi_nonnegative_warnv_p /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6239 0x814053 gimple_stmt_nonnegative_warnv_p(gimple*, bool*, int) /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:6264 0x7b5c94 tree_expr_nonnegative_p(tree_node*) /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/fold-const.c:13325 0xe2f657 gimple_simplify_108 /tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:5116 0xe3060d gimple_simplify_TRUNC_MOD_EXPR /tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:24762 0xe0809b gimple_simplify /tmp/884316_1.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc1/gcc/gimple-match.c:34389 0xe08c2b gimple_resimplify2(gimple**, code_helper*, tree_node*, tree_node**, tree_node* (*)(tree_node*)) 
/tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:193 0xe17600 gimple_simplify(gimple*, code_helper*, tree_node**, gimple**, tree_node* (*)(tree_node*), tree_node* (*)(tree_node*)) /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-match-head.c:762 0x81c694 fold_stmt_1 /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gimple-fold.c:3605 0xad0f6c replace_uses_by(tree_node*, tree_node*) /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1835 0xad1a2f gimple_merge_blocks /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:1921 0x67d325 merge_blocks(basic_block_def*, basic_block_def*) /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfghooks.c:776 0xae06da cleanup_tree_cfg_bb /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:654 0xae1118 cleanup_tree_cfg_1 /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:686 0xae1118 cleanup_tree_cfg_noloop /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:738 0xae1118 cleanup_tree_cfg() /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfgcleanup.c:793 0x9c5c94 execute_function_todo /tmp/884316_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:1920 Please submit a full bug report, This happens for instance with GCC configured --target arm-none-eabi --with-cpu cortex-a9 You can download logs of a failed build from http://people.linaro.org/~christophe.lyon/cross-validation/gcc-build/trunk/228970/build.html Sorry, I'm out of office for one week, I can't produce further details. Ok, without preprocessed source it's hard to track down but from the backtrace I figure it's an issue similar to that of PR67815. So a testcase would be really nice. Seems it has been filed PR 68031. Kyrill Richard. Christophe Richard. 2015-10-19 Richard Biener * gimple-fold.c (gimple_phi_nonnegative_warnv_p): New function. 
(gimple_stmt_nonnegative_warnv_p): Use it. * match.pd (CPROJ): New operator list. (cproj (complex ...)): Move simplifications from ... * builtins.c (fold_builtin_cproj): ... here. * gcc.dg/torture/builtin-cproj-1.c: Skip for -O0. Index: gcc/gimple-fold.c === --- gcc/gimple-fold.c (revision 228877) +++ gcc/gimple-fold.c (working copy) @@ -6224,6 +6224,24 @@ gimple_call_nonnegative_warnv_p (gimple strict_overflow_p, depth); } +/* Return true if return value of call STMT is known to be non-negative. + If the return value is based on the assumption that signed overflow is + undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change + *STRIC
Re: PING^2: [PATCH] Limit alignment on error_mark_node variable
> >> > >> This patch avoids calling varpool_node::finalize_decl on error_mark_node > >> type decls. Does it make sense? Yep, i guess this is fine. I wonder why error_mark_node needs to be in local decl list? :) honza > >> > >> > >> H.J. > >> -- > >> gcc/ > >> > >> PR target/66810 > >> * cgraphbuild.c (pass_build_cgraph_edges::execute): Skip > >> error_mark_node type decls. > >> > >> gcc/testsuite/ > >> > >> PR target/66810 > >> * gcc.target/i386/pr66810.c: New test. > >> --- > >> gcc/cgraphbuild.c | 3 ++- > >> gcc/testsuite/gcc.target/i386/pr66810.c | 10 ++ > >> 2 files changed, 12 insertions(+), 1 deletion(-) > >> create mode 100644 gcc/testsuite/gcc.target/i386/pr66810.c > >> > >> diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c > >> index 7d2d096..4d679d9 100644 > >> --- a/gcc/cgraphbuild.c > >> +++ b/gcc/cgraphbuild.c > >> @@ -381,7 +381,8 @@ pass_build_cgraph_edges::execute (function *fun) > >>FOR_EACH_LOCAL_DECL (fun, ix, decl) > >> if (TREE_CODE (decl) == VAR_DECL > >> && (TREE_STATIC (decl) && !DECL_EXTERNAL (decl)) > >> - && !DECL_HAS_VALUE_EXPR_P (decl)) > >> + && !DECL_HAS_VALUE_EXPR_P (decl) > >> + && TREE_TYPE (decl) != error_mark_node) > >>varpool_node::finalize_decl (decl); > >>record_eh_tables (node, fun); > >> > >> diff --git a/gcc/testsuite/gcc.target/i386/pr66810.c > >> b/gcc/testsuite/gcc.target/i386/pr66810.c > >> new file mode 100644 > >> index 000..4778b37 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/i386/pr66810.c > >> @@ -0,0 +1,10 @@ > >> +/* { dg-do compile { target ia32 } } */ > >> +/* { dg-options "-mno-sse -mno-mmx -miamcu" } */ > >> + > >> +int vv; > >> + > >> +void > >> +i (void) > >> +{ > >> + static int a[vv]; /* { dg-error "storage size" } */ > >> +} > >> -- > >> 2.4.3 > >> > > > > > > > > -- > > H.J. > > > > -- > H.J.
Re: [PATCH][AArch64] Replace insn to zero up DF register
On Tue, Oct 20, 2015 at 7:59 AM, Andrew Pinski wrote: > On Tue, Oct 20, 2015 at 7:51 AM, Andrew Pinski wrote: >> On Tue, Oct 20, 2015 at 7:40 AM, Evandro Menezes >> wrote: >>> In the existing targets, it seems that it's always faster to zero up a DF >>> register with "movi %d0, #0" instead of "fmov %d0, xzr". >> >> I think for ThunderX 1, this change will not make a difference. So I >> am neutral on this change. > > Actually depending on fmov is decoded in our pipeline, this change > might actually be worse. Currently fmov with an immediate is 1 cycle > while movi is two cycles. Let me double check how internally on how > it is decoded and if it is 1 cycle or two. Ok, my objections are removed as I talked with the architectures here at Cavium and using movi is better in this case. Thanks, Andrew > > Thanks, > Andrew > >> >> Thanks, >> Andrew >> >>> >>> This patch modifies the respective pattern. >>> >>> Please, commit if it's alright. >>> >>> Thank you, >>> >>> -- >>> Evandro Menezes >>>
[C PATCH] Don't accept invalid attribute-list (PR c/67964)
Joseph noticed that we were wrongly accepting multiple attributes without commas. Thus fixed by breaking out of the loop when parsing the attributes -- if we don't see a comma after an attribute, then the next tokens must be )); if not, then the attribute is invalid. I've also added a few more comments. (I think it would be nicer to model the code after C++ and introduce c_parser_attributes_list, but the following is enough to fix the bug.) Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-10-20 Marek Polacek PR c/67964 * c-parser.c (c_parser_attributes): Break out of the loop if the token after an attribute isn't a comma. * gcc.dg/pr67964.c: New test. diff --git gcc/c/c-parser.c gcc/c/c-parser.c index 704ebc6..e7b8440 100644 --- gcc/c/c-parser.c +++ gcc/c/c-parser.c @@ -3965,7 +3965,9 @@ c_parser_attributes (c_parser *parser) /* ??? Follow the C++ parser rather than using the lex_untranslated_string kludge. */ parser->lex_untranslated_string = true; + /* Consume the `__attribute__' keyword. */ c_parser_consume_token (parser); + /* Look for the two `(' tokens. */ if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) { parser->lex_untranslated_string = false; @@ -3993,17 +3995,24 @@ c_parser_attributes (c_parser *parser) attr_name = c_parser_attribute_any_word (parser); if (attr_name == NULL) break; - if (is_cilkplus_vector_p (attr_name)) + if (is_cilkplus_vector_p (attr_name)) { c_token *v_token = c_parser_peek_token (parser); c_parser_cilk_simd_fn_vector_attrs (parser, *v_token); + /* If the next token isn't a comma, we're done. */ + if (!c_parser_next_token_is (parser, CPP_COMMA)) + break; continue; } c_parser_consume_token (parser); if (c_parser_next_token_is_not (parser, CPP_OPEN_PAREN)) { attr = build_tree_list (attr_name, NULL_TREE); + /* Add this attribute to the list. */ attrs = chainon (attrs, attr); + /* If the next token isn't a comma, we're done. 
*/ + if (!c_parser_next_token_is (parser, CPP_COMMA)) + break; continue; } c_parser_consume_token (parser); @@ -4062,8 +4071,13 @@ c_parser_attributes (c_parser *parser) "expected %<)%>"); return attrs; } + /* Add this attribute to the list. */ attrs = chainon (attrs, attr); + /* If the next token isn't a comma, we're done. */ + if (!c_parser_next_token_is (parser, CPP_COMMA)) + break; } + /* Look for the two `)' tokens. */ if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN)) c_parser_consume_token (parser); else diff --git gcc/testsuite/gcc.dg/pr67964.c gcc/testsuite/gcc.dg/pr67964.c index e69de29..095b50f 100644 --- gcc/testsuite/gcc.dg/pr67964.c +++ gcc/testsuite/gcc.dg/pr67964.c @@ -0,0 +1,21 @@ +/* PR c/67964 */ +/* { dg-do compile } */ + +extern int fn0 (void) __attribute__ ((const const)); /* { dg-error "expected" } */ +extern int fn1 (void) __attribute__ ((const, const)); +extern int fn2 (void) __attribute__ ((optimize (0) const)); /* { dg-error "expected" } */ +extern int fn3 (void) __attribute__ ((optimize (0), const)); +/* We allow starting/trailing comma. */ +extern int fn4 (void) __attribute__ ((, const)); +extern int fn5 (void) __attribute__ ((const, )); +extern int fn6 (void) __attribute__ (( const, )); +extern int fn7 (void) __attribute__ ((,)); +extern int fn8 (void) __attribute__ ((__noreturn__ __noreturn__)); /* { dg-error "expected" } */ +extern int fn9 (void) __attribute__ ((__noreturn__, __noreturn__)); +extern int fn10 (void) __attribute__ ((__cold__ __pure__ __noclone__)); /* { dg-error "expected" } */ +extern int fn11 (void) __attribute__ ((__cold__, __pure__ __noclone__)); /* { dg-error "expected" } */ +int i; +int ii; +extern int a __attribute__ ((alias ("i") unused)); /* { dg-error "expected" } */ +extern int a2 __attribute__ ((alias ("i" "i"))); +struct A { char p[6]; } __attribute__((__packed__ packed)); /* { dg-error "expected" } */ Marek
Re: [PATCH] Properly handle -miamcu and -march=lakemont
On Sat, Oct 17, 2015 at 4:30 AM, H.J. Lu wrote: > -miamcu specifies how parameters are passed to functions and how value > is returned from a function. Inside function, we can use instructions > supported by -march=XXX. -miamcu -march=haswell can generate x87, SSE > and AVX instructions as long as the IA MCU psABI is followed. But since > Lakemont processor doesn't support x87 instructions, we shouldn't > generate x87 instructions with -march=lakemont. This patch separates > code generation from -miamcu and makes -march=lakemont not to generate > x87 instructions. > > Tested on x86-64. OK for trunk? > > Thanks. > > H.J. > > gcc/ > > PR target/67963 > PR target/67985 > * common/config/i386/i386-common.c (ix86_handle_option): Remove > OPT_miamcu handling. > * config/i386/i386.c (PTA_NO_80387): New macro. > (processor_alias_table): Add PTA_NO_80387 to lakemont. > (ix86_option_override_internal): Update MASK_80387 from > PTA_NO_80387. Don't warn x87/MMX/SSE/AVX for -miamcu. Warn > SSE math only if 80387 is supported. Don't change > MASK_FLOAT_RETURNS. > (ix86_valid_target_attribute_tree): Enable FPMATH_387 only if > 80387 is supported. > * config/i386/i386.h (TARGET_FLOAT_RETURNS_IN_80387): True only > if TARGET_80387 is true and TARGET_IAMCU is false. > (TARGET_FLOAT_RETURNS_IN_80387_P): True only if TARGET_80387_P > is true and TARGET_IAMCU_P is false. > > gcc/testsuite/ > > PR target/67963 > PR target/67985 > * gcc.target/i386/pr67963-1.c: New test. > * gcc.target/i386/pr67963-2.c: Likewise. > * gcc.target/i386/pr67963-3.c: Likewise. > * gcc.target/i386/pr67985-1.c: Likewise. > * gcc.target/i386/pr67985-2.c: Likewise. > * gcc.target/i386/pr67985-3.c: Likewise. OK. Thanks, Uros. 
> gcc/common/config/i386/i386-common.c | 16 +--- > gcc/config/i386/i386.c| 41 > ++- > gcc/config/i386/i386.h| 8 -- > gcc/testsuite/gcc.target/i386/pr67963-1.c | 9 +++ > gcc/testsuite/gcc.target/i386/pr67963-2.c | 11 + > gcc/testsuite/gcc.target/i386/pr67963-3.c | 11 + > gcc/testsuite/gcc.target/i386/pr67985-1.c | 11 + > gcc/testsuite/gcc.target/i386/pr67985-2.c | 13 ++ > gcc/testsuite/gcc.target/i386/pr67985-3.c | 12 + > 9 files changed, 93 insertions(+), 39 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr67963-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr67963-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr67963-3.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr67985-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr67985-2.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr67985-3.c > > diff --git a/gcc/common/config/i386/i386-common.c > b/gcc/common/config/i386/i386-common.c > index 79b2472..0f8c3e1 100644 > --- a/gcc/common/config/i386/i386-common.c > +++ b/gcc/common/config/i386/i386-common.c > @@ -223,7 +223,7 @@ along with GCC; see the file COPYING3. If not see > > bool > ix86_handle_option (struct gcc_options *opts, > - struct gcc_options *opts_set, > + struct gcc_options *opts_set ATTRIBUTE_UNUSED, > const struct cl_decoded_option *decoded, > location_t loc) > { > @@ -232,20 +232,6 @@ ix86_handle_option (struct gcc_options *opts, > >switch (code) > { > -case OPT_miamcu: > - if (value) > - { > - /* Turn off x87/MMX/SSE/AVX codegen for -miamcu. 
*/ > - opts->x_target_flags &= ~MASK_80387; > - opts_set->x_target_flags |= MASK_80387; > - opts->x_ix86_isa_flags &= ~(OPTION_MASK_ISA_MMX_UNSET > - | OPTION_MASK_ISA_SSE_UNSET); > - opts->x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET > - | OPTION_MASK_ISA_SSE_UNSET); > - > - } > - return true; > - > case OPT_mmmx: >if (value) > { > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 33e0cff..270199c 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -4295,6 +4295,7 @@ ix86_option_override_internal (bool main_args_p, > #define PTA_PCOMMIT(HOST_WIDE_INT_1 << 56) > #define PTA_MWAITX (HOST_WIDE_INT_1 << 57) > #define PTA_CLZERO (HOST_WIDE_INT_1 << 58) > +#define PTA_NO_80387 (HOST_WIDE_INT_1 << 59) > > #define PTA_CORE2 \ >(PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSSE3 \ > @@ -4339,7 +4340,7 @@ ix86_option_override_internal (bool main_args_p, >{"i486", PROCESSOR_I486, CPU_NONE, 0}, >{"i586", PROCESSOR_PENTIUM, CPU_PENTIUM, 0}, >{"pentium", PROCESSOR_PENTIUM, CPU_PENTIUM, 0}, > -
[PATCH] Refactor graphite-sese-to-poly, sese.h, graphite-poly.h
Now that scop contains a list of all the basic blocks inside, it makes sense to iterate over only those basic blocks in graphite-sese-to-poly.c:rewrite_reductions_out_of_ssa,rewrite_cross_bb_scalar_deps_out_of_ssa Passes regtest and bootstrap. gcc/ChangeLog: 2015-10-20 Aditya Kumar * graphite-poly.h (struct dr_info): Added invalid_alias_set number. (operator=): Removed. (dr_info): Make alias_set number the last argument with default value of invalid_alias_set. * graphite-sese-to-poly.c (build_scop_drs): Update constructor of dr_info. (rewrite_reductions_out_of_ssa): Iterate only through the basic blocks which are inside region. (rewrite_cross_bb_scalar_deps_out_of_ssa): Same. * sese.h (struct sese_l): Removed assignment operator. (split_region_for_bb): Removed dead code. --- gcc/graphite-poly.h | 26 +-- gcc/graphite-sese-to-poly.c | 50 ++--- gcc/sese.h | 37 - 3 files changed, 34 insertions(+), 79 deletions(-) diff --git a/gcc/graphite-poly.h b/gcc/graphite-poly.h index 721e914..5298f85 100644 --- a/gcc/graphite-poly.h +++ b/gcc/graphite-poly.h @@ -373,29 +373,23 @@ pbb_set_black_box (poly_bb_p pbb, gimple_poly_bb_p black_box) struct dr_info { + enum { +invalid_alias_set = -1 + }; /* The data reference. */ data_reference_p dr; - /* ALIAS_SET is the SCC number assigned by a graph_dfs of the alias graph. -1 - is an invalid alias set. */ - int alias_set; - /* The polyhedral BB containing this DR. */ poly_bb_p pbb; + /* ALIAS_SET is the SCC number assigned by a graph_dfs of the alias graph. + -1 is an invalid alias set. */ + int alias_set; + /* Construct a DR_INFO from a data reference DR, an ALIAS_SET, and a PBB. */ - dr_info (data_reference_p dr, int alias_set, poly_bb_p pbb) -: dr (dr), alias_set (alias_set), pbb (pbb) {} - - /* Assignment operator, to be able to iterate over a vec of these objects. 
*/ - const dr_info & - operator= (const dr_info &p) - { -dr = p.dr; -alias_set = p.alias_set; -pbb = p.pbb; -return *this; - } + dr_info (data_reference_p dr, poly_bb_p pbb, + int alias_set = invalid_alias_set) +: dr (dr), pbb (pbb), alias_set (alias_set) {} }; /* A SCOP is a Static Control Part of the program, simple enough to be diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index d75e6a2..d1eae90 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -1151,7 +1151,7 @@ build_scop_drs (scop_p scop) FOR_EACH_VEC_ELT (scop->pbbs, i, pbb) if (pbb) FOR_EACH_VEC_ELT (GBB_DATA_REFS (PBB_BLACK_BOX (pbb)), j, dr) - scop->drs.safe_push (dr_info (dr, -1, pbb)); + scop->drs.safe_push (dr_info (dr, pbb)); build_alias_set (scop); @@ -1497,31 +1497,29 @@ rewrite_degenerate_phi (gphi_iterator *psi) static void rewrite_reductions_out_of_ssa (scop_p scop) { + int i; basic_block bb; - sese_l region = scop->scop_info->region; - - FOR_EACH_BB_FN (bb, cfun) -if (bb_in_sese_p (bb, region)) - for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);) - { - gphi *phi = psi.phi (); + FOR_EACH_VEC_ELT (scop->scop_info->bbs, i, bb) +for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);) + { + gphi *phi = psi.phi (); - if (virtual_operand_p (gimple_phi_result (phi))) - { - gsi_next (&psi); - continue; - } + if (virtual_operand_p (gimple_phi_result (phi))) + { + gsi_next (&psi); + continue; + } - if (gimple_phi_num_args (phi) > 1 - && degenerate_phi_result (phi)) - rewrite_degenerate_phi (&psi); + if (gimple_phi_num_args (phi) > 1 + && degenerate_phi_result (phi)) + rewrite_degenerate_phi (&psi); - else if (scalar_close_phi_node_p (phi)) - rewrite_close_phi_out_of_ssa (scop, &psi); + else if (scalar_close_phi_node_p (phi)) + rewrite_close_phi_out_of_ssa (scop, &psi); - else if (reduction_phi_p (region, &psi)) - rewrite_phi_out_of_ssa (scop, &psi); - } + else if (reduction_phi_p (scop->scop_info->region, &psi)) + 
rewrite_phi_out_of_ssa (scop, &psi); + } update_ssa (TODO_update_ssa); #ifdef ENABLE_CHECKING @@ -1684,7 +1682,6 @@ rewrite_cross_bb_scalar_deps (scop_p scop, gimple_stmt_iterator *gsi) static void rewrite_cross_bb_scalar_deps_out_of_ssa (scop_p scop) { - basic_block bb; gimple_stmt_iterator psi; sese_l region = scop->scop_info->region; bool changed = false; @@ -1692,10 +1689,11 @@ rewrite_cross_bb_scalar_deps_out_of_ssa (scop_p scop) /* Create an extra empty BB after the scop. */ split_edge (region.exit); - FOR_EACH_BB_FN (bb, cfun) -if (bb_
Re: [PATCH v2] PR rtl-optimization/66790: uninitialized registers handling in REE
Pierre, Did this revised patch address the comments about MIR from Kenny? - David
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On 19 October 2015 at 14:57, Kyrill Tkachov wrote: > 2015-10-19 Kyrylo Tkachov > > * config/aarch64/aarch64.md > (*aarch64_fcvt2_mult): New pattern. > * config/aarch64/aarch64-simd.md > (*aarch64_fcvt2_mult): Likewise. > * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. > (aarch64_fpconst_pow_of_2): New function. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare > prototype. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. > (aarch64_fp_vec_pow2): Likewise. > > 2015-10-19 Kyrylo Tkachov > > * gcc.target/aarch64/fmul_fcvt_1.c: New test. > * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. +char buf[64]; +sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits); Prefer snprintf please. + } + [(set_attr "type" "neon_fp_to_int_")] +) + + Superfluous blank line here? + *cost += rtx_cost (XEXP (x, 0), VOIDmode, + (enum rtx_code) code, 0, speed); My understanding is the unnecessary use of enum is now discouraged, (rtx_code) is sufficient in this case. + int count = CONST_VECTOR_NUNITS (x); + int i; + for (i = 1; i < count; i++) Push the int into the for initializer. Push the rhs of the count assignment into the for condition and drop the definition of count. +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */ I'd prefer scan-assembler-times or do you have a particular reason to avoid it in these tests? Cheers /Marcus
Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof
On 10/20/2015 07:20 AM, Bernd Schmidt wrote: On 10/16/2015 09:34 PM, Martin Sebor wrote: Thank you for the review. Attached is an updated patch that hopefully addresses all your comments. I ran the check_GNU_style.sh script on it to make sure I didn't miss something. I've also added replies to a few of your comments below. Ok, thanks. However - the logic around the context struct in the patch still seems fairly impenetrable to me, and I've been wondering whether a simpler approach wouldn't work just as well. I came up with the patch below, which just passes a tree_code context to recursive invocations and increases the upper bound by one only if not in an ARRAY_REF or COMPONENT_REF context. As far as I can tell, this produces the same warnings on the testcase (modulo differences in type printout), except for the final few FA5_7 testcases, which I had doubts about anyway: typedef struct FA5_7 { int i; char a5_7 [5][7]; } FA5_7; __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } Why wouldn't at least the first two be covered by the one-past-the-end-is-ok rule? Thanks for taking the time to try to simplify the patch! Diagnosing the cases above is at the core of the issue (and can be seen by substituting a5_7 into the oversimplified test case in the bug). In a 5 by 7 array, the offset computed for the out-of-bounds a_5_7[0][7] is the same as the offset computed for the in-bounds a_5_7[1][0]. The result might be unexpected because it suggests that two "distinct" elements of an array share the same offset.
The "one-past-the-end rule" applies only to the offset corresponding to the address just past the end of the whole array because, like its address, the offset of the element just beyond the end of the array is valid and cannot be mistaken for an offset of a valid element. My initial approach was similar to the patch you attached. Most of the additional complexity (i.e., the recursive use of the context) grew out of wanting to diagnose these less than trivial cases of surprising offsets in multi-dimensional arrays. I don't think it can be done without passing some state down the recursive calls but I'd be happy to be proven wrong. FWIW, the bug and my patch for it were precipitated by looking for the problems behind PR 67872 - missing -Warray-bounds warning (which was in turn prompted by developing and testing a solution for PR 67942 - diagnose placement new buffer overflow). I think -Warray-bounds should emit consistent diagnostics for invalid array references regardless of context. I.e., given struct S { int A [5][7]; int x; } s; these should both be diagnosed: int i = offsetof (struct S, A [0][7]); int *p = &s.A [0][7]; because they are both undefined and both can lead to surprising results when used. With my patch for 67882, GCC diagnoses the first case. I hope to work on 67872 in the near future to diagnose the second to bring GCC on par with Clang. (Clang doesn't yet diagnose any offsetof errors). Martin
[PATCH, alpha]: Use CEIL macro
... plus some cleanups due to unused macro argument. No functional changes. 2015-10-20 Uros Bizjak * config/alpha/alpha.h (HARD_REGNO_NREGS): Use CEIL macro. (ALPHA_ARG_SIZE): Ditto. Remove unused NAMED argument. * config/alpha/alpha.c (alpha_function_arg_advance): Update ALPHA_ARG_SIZE usage. (alpha_arg_partial_bytes): Ditto. Bootstrapped and regression tested on alpha-linux-gnu, committed to mainline SVN. Uros. Index: config/alpha/alpha.c === --- config/alpha/alpha.c(revision 229082) +++ config/alpha/alpha.c(working copy) @@ -5613,7 +5613,7 @@ alpha_function_arg_advance (cumulative_args_t cum_ { CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); bool onstack = targetm.calls.must_pass_in_stack (mode, type); - int increment = onstack ? 6 : ALPHA_ARG_SIZE (mode, type, named); + int increment = onstack ? 6 : ALPHA_ARG_SIZE (mode, type); #if TARGET_ABI_OSF *cum += increment; @@ -5635,10 +5635,10 @@ alpha_arg_partial_bytes (cumulative_args_t cum_v, #if TARGET_ABI_OPEN_VMS if (cum->num_args < 6 - && 6 < cum->num_args + ALPHA_ARG_SIZE (mode, type, named)) + && 6 < cum->num_args + ALPHA_ARG_SIZE (mode, type)) words = 6 - cum->num_args; #elif TARGET_ABI_OSF - if (*cum < 6 && 6 < *cum + ALPHA_ARG_SIZE (mode, type, named)) + if (*cum < 6 && 6 < *cum + ALPHA_ARG_SIZE (mode, type)) words = 6 - *cum; #else #error Unhandled ABI Index: config/alpha/alpha.h === --- config/alpha/alpha.h(revision 229082) +++ config/alpha/alpha.h(working copy) @@ -383,7 +383,7 @@ extern enum alpha_fp_trap_mode alpha_fptm; but can be less for certain modes in special long registers. */ #define HARD_REGNO_NREGS(REGNO, MODE) \ - ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD) + CEIL (GET_MODE_SIZE (MODE), UNITS_PER_WORD) /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE. On Alpha, the integer registers can hold any mode. 
The floating-point @@ -674,13 +674,15 @@ extern int alpha_memory_latency; #define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, INDIRECT, N_NAMED_ARGS) \ (CUM) = 0 -/* Define intermediate macro to compute the size (in registers) of an argument - for the Alpha. */ +/* Define intermediate macro to compute + the size (in registers) of an argument. */ -#define ALPHA_ARG_SIZE(MODE, TYPE, NAMED) \ +#define ALPHA_ARG_SIZE(MODE, TYPE) \ ((MODE) == TFmode || (MODE) == TCmode ? 1\ - : (((MODE) == BLKmode ? int_size_in_bytes (TYPE) : GET_MODE_SIZE (MODE)) \ - + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD) + : CEIL (((MODE) == BLKmode \ + ? int_size_in_bytes (TYPE) \ + : GET_MODE_SIZE (MODE)),\ + UNITS_PER_WORD)) /* Make (or fake) .linkage entry for function call. IS_LOCAL is 0 if name is used in call, 1 if name is used in definition. */
Re: GCC 6 Status Report (2015-10-16)
Hi Richard, On 10/16/2015 05:09 AM, Richard Biener wrote: This means it is time to get things you want to have in GCC 6 finalized and reviewed. As usual there may be exceptions to late reviewed features but don't count on that. Likewise target specific features can sneak in during Stage 3 if maintainers ok them. This is just to point out I have a series of DWARF-related patches awaiting reviews for several months. I plan to rebase against trunk, retest and resubmit them shortly but here’s an overview: * The patch series for transition to standard DWARF for Ada (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01857.html). There are 8 patches, each one depending on the previous one, except the 6/8 one (https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01361.html) which could go on its own. * Fix DW_AT_static_link generation in DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01855.html). The corresponding support in GDB is in the development branch. * Track indirect calls as call site information in DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01856.html). * Materialize subprogram renamings in Ada as imported declarations in DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01854.html). The corresponding support in GDB is in the development branch. If there is anything I should do to ease the review process, please let me know. :-) Thank you in advance! -- Pierre-Marie de Rodat
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On 20/10/15 16:26, Marcus Shawcroft wrote: On 19 October 2015 at 14:57, Kyrill Tkachov wrote: 2015-10-19 Kyrylo Tkachov * config/aarch64/aarch64.md (*aarch64_fcvt2_mult): New pattern. * config/aarch64/aarch64-simd.md (*aarch64_fcvt2_mult): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. (aarch64_fpconst_pow_of_2): New function. (aarch64_vec_fpconst_pow_of_2): Likewise. * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare prototype. (aarch64_vec_fpconst_pow_of_2): Likewise. * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. (aarch64_fp_vec_pow2): Likewise. 2015-10-19 Kyrylo Tkachov * gcc.target/aarch64/fmul_fcvt_1.c: New test. * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. +char buf[64]; +sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits); Prefer snprintf please. + } + [(set_attr "type" "neon_fp_to_int_")] +) + + Superflous blank line here ? + *cost += rtx_cost (XEXP (x, 0), VOIDmode, + (enum rtx_code) code, 0, speed); My understanding is the unnecessary use of enum is now discouraged, (rtx_code) is sufficient in this case. + int count = CONST_VECTOR_NUNITS (x); + int i; + for (i = 1; i < count; i++) Push the int into the for initializer. Push the rhs of the count assignment into the for condition and drop the definition of count. Ok, done. +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */ I'd prefer scan-assembler-times or do you have a particular reason to avoid it in these tests? No reason. Here's the patch updated as per your feedback. How's this? Thanks, Kyrill 2015-10-20 Kyrylo Tkachov * config/aarch64/aarch64.md (*aarch64_fcvt2_mult): New pattern. * config/aarch64/aarch64-simd.md (*aarch64_fcvt2_mult): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. (aarch64_fpconst_pow_of_2): New function. (aarch64_vec_fpconst_pow_of_2): Likewise. * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare prototype. 
(aarch64_vec_fpconst_pow_of_2): Likewise. * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. (aarch64_fp_vec_pow2): Likewise. 2015-10-20 Kyrylo Tkachov * gcc.target/aarch64/fmul_fcvt_1.c: New test. * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. commit dc55192dd5f54f69d659ea6cc703c5fc2dc9b88b Author: Kyrylo Tkachov Date: Thu Oct 8 15:17:47 2015 +0100 [AArch64] Add fmul+fcvt optimisation diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index a8ac8d3..309dcfb 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -294,12 +294,14 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx); enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx); enum reg_class aarch64_regno_regclass (unsigned); int aarch64_asm_preferred_eh_data_format (int, int); +int aarch64_fpconst_pow_of_2 (rtx); machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned, machine_mode); int aarch64_hard_regno_mode_ok (unsigned, machine_mode); int aarch64_hard_regno_nregs (unsigned, machine_mode); int aarch64_simd_attr_length_move (rtx_insn *); int aarch64_uxt_size (int, HOST_WIDE_INT); +int aarch64_vec_fpconst_pow_of_2 (rtx); rtx aarch64_final_eh_return_addr (void); rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int); const char *aarch64_output_move_struct (rtx *operands); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 167277e..cf1ff6d 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1654,6 +1654,26 @@ (define_insn "l2" [(set_attr "type" "neon_fp_to_int_")] ) +(define_insn "*aarch64_fcvt2_mult" + [(set (match_operand: 0 "register_operand" "=w") + (FIXUORS: (unspec: + [(mult:VDQF + (match_operand:VDQF 1 "register_operand" "w") + (match_operand:VDQF 2 "aarch64_fp_vec_pow2" ""))] + UNSPEC_FRINTZ)))] + "TARGET_SIMD + && IN_RANGE (aarch64_vec_fpconst_pow_of_2 (operands[2]), 1, + 
GET_MODE_BITSIZE (GET_MODE_INNER (mode)))" + { +int fbits = aarch64_vec_fpconst_pow_of_2 (operands[2]); +char buf[64]; +snprintf (buf, 64, "fcvtz\\t%%0., %%1., #%d", fbits); +output_asm_insn (buf, operands); +return ""; + } + [(set_attr "type" "neon_fp_to_int_")] +) + (define_expand "2" [(set (match_operand: 0 "register_operand") (FIXUORS: (unspec: diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e304d40..17e59e1 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6791,6 +6791,19 @@ cost_plus: else *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint; } + + /* We can combine fmul by a power of 2 followed by a fcvt into a single + fixed-point fcvt. */ + if (GET_CODE (x) == MULT + && (
Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof
On 10/20/2015 05:31 PM, Martin Sebor wrote: On 10/20/2015 07:20 AM, Bernd Schmidt wrote: On 10/16/2015 09:34 PM, Martin Sebor wrote: Thank you for the review. Attached is an updated patch that hopefully addresses all your comments. I ran the check_GNU_style.sh script on it to make sure I didn't miss something. I've also added replies to a few of your comments below. Ok, thanks. However - the logic around the context struct in the patch still seems fairly impenetrable to me, and I've been wondering whether a simpler approach wouldn't work just as well. I came up with the patch below, which just passes a tree_code context to recursive invocations and increases the upper bound by one only if not in an ARRAY_REF or COMPONENT_REF context. As far as I can tell, this produces the same warnings on the testcase (modulo differences in type printout), except for the final few FA5_7 testcases, which I had doubts about anyway: typedef struct FA5_7 { int i; char a5_7 [5][7]; } FA5_7; __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } Why wouldn't at least the first two be covered by the one-past-the-end-is-ok rule? Thanks for taking the time to try to simplify the patch! Diagnosing the cases above is at the core of the issue (and can be seen by substituting a5_7 into the oversimplified test case in the bug). In a 5 by 7 array, the offset computed for the out-of-bounds a_5_7[0][7] is the same as the offset computed for the in-bounds a_5_7[1][0]. The result might be unexpected because it suggests that two "distinct" elements of an array share the same offset.
Yeah, but Joseph wrote: Given such a flexible array member, [anything][0] and [anything][1] are logically valid if [anything] is nonnegative (of course, the array might or might not have enough elements at runtime). [anything][2] is a reference to just past the end of an element of the flexible array member; [anything][3] is clearly invalid. Regarding the actual testcase, accepting a1_1[0][1] seems fine (it's referencing the just-past-end element of the valid array a1_1[0]). So how's that different from a5_7[0][7]? In case it matters, a1_1 in the earlier test that was discussed could not be a flexible array because it was followed by another field in the same struct. Handing over review duties to Joseph because he'll have to decide what to warn about and what not to. struct S { int A [5][7]; int x; } s; these should both be diagnosed: int i = offsetof (struct S, A [0][7]); int *p = &s.A [0][7]; because they are both undefined and both can lead to surprising results when used. Is that really undefined? I thought it was explicitly allowed. Dereferencing the pointer is what's undefined. Bernd
[PATCH][ARM] Fix for testcase after r228661
Hi, This patch addresses PR-67948 by changing the xor-and.c test, initially written for a simplify-rtx pattern, to make it pass post r228661 (see https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00676.html). This test no longer triggered the simplify-rtx pattern it was written for prior to r228661, though other optimizations did lead to the same assembly the test checked for. The optimization added with r228661 matches the pattern used in the test and optimizes it to a better and still valid sequence. Being unable to easily change the test to trigger the original simplify-rtx pattern, I chose to change it to pass with the new produced assembly sequence. This is correct because the transformation is valid and it yields a more efficient pattern. However, as I pointed out before this test doesn't test the optimization it originally was intended for. Tested by running regression tests for armv6. Is this OK to commit? Thanks, Andre From 89922547118e716b41ddf6edefb274322193f25c Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Thu, 15 Oct 2015 12:48:26 +0100 Subject: [PATCH] Fix for xor-and.c test --- gcc/testsuite/gcc.target/arm/xor-and.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/arm/xor-and.c b/gcc/testsuite/gcc.target/arm/xor-and.c index 53dff85f8f780fb99a93bbcc24180a3d0d5d3be9..3715530cd7bf9ad8abb24cb21cd51ae3802079e8 100644 --- a/gcc/testsuite/gcc.target/arm/xor-and.c +++ b/gcc/testsuite/gcc.target/arm/xor-and.c @@ -10,6 +10,6 @@ unsigned short foo (unsigned short x) return x; } -/* { dg-final { scan-assembler "orr" } } */ +/* { dg-final { scan-assembler "eor" } } */ /* { dg-final { scan-assembler-not "mvn" } } */ /* { dg-final { scan-assembler-not "uxth" } } */ -- 1.9.1
Re: [PATCH][AArch64] Enable fusion of AES instructions
On 14 October 2015 at 13:30, Wilco Dijkstra wrote: > Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. > This can give up to 2x > speedup on many AArch64 implementations. Also model the crypto instructions > on Cortex-A57 according > to the Optimization Guide. > > Passes regression tests. > > ChangeLog: > 2015-10-14 Wilco Dijkstra > > * gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion. > (cortexa57_tunings): Likewise. > (cortexa72_tunings): Likewise. > (arch_macro_fusion_pair_p): Add support for AES fusion. > * gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry. These AArch64 changes are OK > * gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue): > Allow virtual registers before reload so early scheduling works. > * gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use > correct latency and pipeline. > (cortex_a57_crypto_complex): Likewise. > (cortex_a57_crypto_xor): Likewise. > (define_bypass): Add AES bypass. ... but wait for Ramana or Kyrill to respond on these changes before you commit. Cheers /Marcus
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On 20 October 2015 at 16:47, Kyrill Tkachov wrote: > Here's the patch updated as per your feedback. > > How's this? > > Thanks, > Kyrill > > 2015-10-20 Kyrylo Tkachov > > * config/aarch64/aarch64.md > (*aarch64_fcvt2_mult): New pattern. > * config/aarch64/aarch64-simd.md > (*aarch64_fcvt2_mult): Likewise. > * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. > (aarch64_fpconst_pow_of_2): New function. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare > prototype. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. > (aarch64_fp_vec_pow2): Likewise. > > 2015-10-20 Kyrylo Tkachov > > > * gcc.target/aarch64/fmul_fcvt_1.c: New test. > * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. > OK /Marcus
Re: [PATCH][Testsuite] Turn on 64-bit-vector tests for AArch64
On 16 October 2015 at 12:26, Alan Lawrence wrote: > This enables tests bb-slp-11.c and bb-slp-26.c for AArch64. Both of these are > currently passing on little- and big-endian. > > (Tested on aarch64-none-linux-gnu and aarch64_be-none-elf). > > OK for trunk? > > gcc/testsuite/ChangeLog: > > * lib/target-supports.exp (check_effective_target_vect64): Add > AArch64. > --- > gcc/testsuite/lib/target-supports.exp | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 3088369..bd03108 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -4762,6 +4762,7 @@ proc check_effective_target_vect64 { } { > if { ([istarget arm*-*-*] > && [check_effective_target_arm_neon_ok] > && [check_effective_target_arm_little_endian]) > +|| [istarget aarch64*-*-*] > || [istarget sparc*-*-*] } { > set et_vect64_saved 1 > } > -- > 1.9.1 > OK /Marcus
RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
Hello Jeff: Did you get a chance to look at the response below? Please let me know your opinion on it. Thanks & Regards Ajit -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Ajit Kumar Agarwal Sent: Saturday, September 12, 2015 4:09 PM To: Jeff Law; Richard Biener Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Thursday, September 10, 2015 3:10 AM To: Ajit Kumar Agarwal; Richard Biener Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation On 08/26/2015 11:29 PM, Ajit Kumar Agarwal wrote: > > Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c > > void dct_unquantize_h263_inter_c (short *block, int n, int qscale, int > nCoeffs) { int i, level, qmul, qadd; > > qadd = (qscale - 1) | 1; qmul = qscale << 1; > > for (i = 0; i <= nCoeffs; i++) { level = block[i]; if (level < 0) > level = level * qmul - qadd; else level = level * qmul + qadd; > block[i] = level; } } > > The above loop is a candidate for path splitting, as the IF block merges > at the latch of the loop, and path splitting duplicates the latch > of the loop (the statement block[i] = level) into its > THEN and ELSE predecessor blocks. > > Due to this path splitting, if-conversion is disabled, the > above IF-THEN-ELSE is not if-converted, and the test case fails. >>So I think the question then becomes which of the two styles generally >>results in better code? The path-split version or the older if-converted >>version. 
>>If the latter, then this may suggest that we've got the path splitting code >>at the wrong stage in the optimizer pipeline or that we need better >>heuristics for >>when to avoid applying path splitting. The code generated by path splitting is useful when it exposes DCE, PRE and CCP candidates, whereas if-conversion is useful when it exposes vectorization candidates. If if-conversion does not expose vectorization and path splitting does not expose DCE or PRE redundancy candidates, it is hard to predict which is better. If if-conversion does not expose vectorization but path splitting does expose DCE, PRE and CCP redundancy candidates, then path splitting is useful. Path splitting also increases the granularity of the THEN and ELSE paths, which enables better register allocation and code scheduling. Keeping path splitting later in the pipeline, after if-conversion and vectorization, is useful because it doesn't break the existing DejaGnu tests; it also means path splitting can still duplicate the merge node into its predecessors after the if-conversion and vectorization passes when those are not applicable to the loop. But this suppresses the CCP and PRE candidates, whose passes run earlier in the optimization pipeline. > > There were the following review comments on the above patch. > > +/* This function performs the feasibility tests for path splitting >> + to perform. Return false if the feasibility for path splitting >> + is not done and returns true if the feasibility for path >> splitting + is done. Following feasibility tests are performed. >> + + 1. Return false if the join block has rhs casting for assign >> + gimple statements. > > Comments from Jeff: > >>> These seem totally arbitrary. What's the reason behind each of >>> these restrictions? 
None should be a correctness requirement >>> AFAICT. > > In the above patch I have made the check given in point 1 in the loop > latch, so path splitting is disabled, if-conversion happens > and the test case passes. >>That sounds more like a work-around/hack. There's nothing inherent with a >>type conversion that should disable path splitting. I have sent the patch with this change and I will remove the above check from the patch. >>What happens if we delay path splitting to a point after if-conversion is >>complete? This is a better suggestion, as explained above, but the gains achieved by running path splitting earlier in the pipeline, before if-conversion, tree-vectorization and tree-vrp, are suppressed if the optimizations that follow path splitting are not applicable to the loops above. I have made the above changes and the existing setup doesn't break, but the gains achieved in benchmarks like rgbcmy_lite (EEMBC) are suppressed. Path splitting gives gains of 9% on the above EEMBC benchmarks, and for such loops if-conversion and vectorization is not applicable expo
Re: GCC 6 Status Report (2015-10-16)
On 10/20/2015 09:42 AM, Pierre-Marie de Rodat wrote: Hi Richard, On 10/16/2015 05:09 AM, Richard Biener wrote: This means it is time to get things you want to have in GCC 6 finalized and reviewed. As usual there may be exceptions to late reviewed features but don't count on that. Likewise target specific features can sneak in during Stage 3 if maintainers ok them. This is just to point out that I have a series of DWARF-related patches that have been awaiting review for several months. I plan to rebase them against trunk, retest and resubmit them shortly, but here’s an overview: * The patch series for transition to standard DWARF for Ada (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01857.html). There are 8 patches, each one depending on the previous one, except the 6/8 one (https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01361.html) which could go on its own. * Fix DW_AT_static_link generation in DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01855.html). The corresponding support in GDB is in the development branch. * Track indirect calls as call site information in DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01856.html). * Materialize subprogram renamings in Ada as imported declarations in DWARF (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01854.html). The corresponding support in GDB is in the development branch. I've been leaving all that stuff to Jason & Jakub. My knowledge of DWARF, both the spec and our implementation, is so limited that my review would be worthless. jeff
Re: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp
On 16 October 2015 at 13:58, Kyrill Tkachov wrote: > Hi all, > > We already support load/store-pair operations on the D-registers when they > contain an FP value, but the peepholes/sched-fusion machinery that > do all the hard work currently ignore 64-bit vector modes. > > This patch adds support for fusing loads/stores of 64-bit vector operands > into ldp and stp instructions. > I've seen this trigger a few times in SPEC2006. Not too many times, but the > times it did trigger the code seemed objectively better > i.e. long sequences of ldr and str instructions essentially halved in size. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-10-16 Kyrylo Tkachov > > * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p): We have several different flavours of fusion in the backend, this one is specifically load/stores, perhaps making that clear in the name of this predicate will avoid confusion further down the line? > New function. > (fusion_load_store): Use it. > * config/aarch64/aarch64-ldpstp.md: Add new peephole2s for > ldp and stp in VD modes. > * config/aarch64/aarch64-simd.md (load_pair, VD): New pattern. > (store_pair, VD): Likewise. > > 2015-10-16 Kyrylo Tkachov > > * gcc.target/aarch64/stp_vec_64_1.c: New test. > * gcc.target/aarch64/ldp_vec_64_1.c: New test. Otherwise OK /Marcus
Re: [PATCH][AArch64] Fix insn types
Kyrill, Indeed, the correct log would be: The type assigned to some insn definitions was not correct. gcc/ * config/aarch64/aarch64.md (*movhf_aarch64): Change the type of "mov %0.h[0], %1.h[0] to "neon_move". (*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr". (*cmov_insn): Change the types of "mov %0, {-1,1}" to "mov_imm". (*cmovsi_insn_uxtw): Likewise. Thank you, -- Evandro Menezes On 10/20/2015 05:59 AM, Kyrill Tkachov wrote: Hi Evandro, On 19/10/15 22:05, Evandro Menezes wrote: The type assigned to some insn definitions was seemingly not correct: * "movi %d0, %1" was of type "fmov" * "fmov %s0, wzr" was of type "fconstd" * "mov %0, {-1,1}" were of type "csel" This patch changes their types to: * "movi %d0, %1" to type "neon_move" * "fmov %s0, wzr" to type "f_mcr" * "mov %0, {-1,1}" to type "mov_imm" Please, commit if it's alright. Thank you, Looking at your ChangeLog... gcc/ * config/aarch64/aarch64.md (*movdi_aarch64): Change the type of "movi %d0, %1" to "neon_move". (*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr". (*cmov_insn): Change the types of "mov %0, {-1,1}" to "mov_imm". (*cmovsi_insn_uxtw): Idem The preferred form is "Likewise" rather than "Idem" AFAIK. Also, full stop at the end. --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1130,7 +1130,7 @@ ldrh\\t%w0, %1 strh\\t%w1, %0 mov\\t%w0, %w1" - [(set_attr "type" "neon_from_gp,neon_to_gp,fmov,\ + [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\ f_loads,f_stores,load1,store1,mov_reg") (set_attr "simd" "yes,yes,yes,*,*,*,*,*") (set_attr "fp" "*,*,*,yes,yes,*,*,*")] I don't think this matches up with your changelog entry. This isn't the *movdi_aarch64 pattern. From what I can see the *movdi_aarch64 pattern already has the type neon_move on the movi\\t%d0, %1 alternative (the last one). In fact, if I apply your patch using "patch -p1" I see it being applied to the *movhf_aarch64 pattern. Is that what you intended? Thanks, Kyrill
Re: [PATCH 1/9] ENABLE_CHECKING refactoring
On 10/18/2015 12:17 AM, Mikhail Maltsev wrote: On 10/12/2015 11:57 PM, Jeff Law wrote: -#ifdef ENABLE_CHECKING +#if CHECKING_P I fail to see the point of this change. I'm guessing (and Mikhail, please correct me if I'm wrong), but I think he's trying to get away from ENABLE_CHECKING and instead use a macro which is always defined to a value. Yes, exactly. Such a macro is better because it can be used both for conditional compilation (if needed) and in normal if statements (unlike ENABLE_CHECKING). On 10/14/2015 12:33 AM, Jeff Law wrote: gcc/common.opt | 5 + gcc/system.h| 3 +++ libcpp/system.h | 8 3 files changed, 16 insertions(+) I committed this prerequisite patch to the trunk. jeff There is a typo here: #ifdef ENABLE_CHECKING #define gcc_checking_assert(EXPR) gcc_assert (EXPR) +#define CHECKING_P 1 #else +/* N.B.: in release build EXPR is not evaluated. */ #define gcc_checking_assert(EXPR) ((void)(0 && (EXPR))) +#define CHECKING_P 1 #endif In my original patch the second definition actually was '+#define CHECKING_P 0'. Not sure how that happened. I'll take care of it. I'd actually planned to start extracting the non-controversial stuff out of the later patches, but just ran out of time before going on PTO. Also, gcc_checking_assert in libcpp requires gcc_assert to be defined. That was missing in my original patch (and was added in patch 2/9), but I think it would be better to fix it here, as Bernd noticed (in the last paragraph of [1]). Sounds wise. Agreed. Besides, I like Richard's proposal [2] about moving the CHECKING_P macros into configure.ac. [1] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00550.html [2] https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00555.html It required minor tweaking in order to silence autotools' warnings. I attached the modified patch. When I tried that, I couldn't get it to work. My autotools-fu is quite limited. OK for trunk (after bootstrap/regtest)? I'll take a closer look later today, after my morning meetings. P.S. 
I am planning to post at least some of the other updated parts today, and I also hope to get in time with the whole series (before stage1 ends). Excellent. jeff
Re: [PATCH][simplify-rtx][2/2] Use constants from pool when simplifying binops
On 19/10/15 15:37, Kyrill Tkachov wrote: Hi Bernd, On 19/10/15 15:31, Bernd Schmidt wrote: On 10/19/2015 03:57 PM, Kyrill Tkachov wrote: This second patch teaches simplify_binary_operation to return the dereferenced constants from the constant pool in the binary expression if other simplifications failed. This, combined with the 1/2 patch for aarch64 (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01744.html) allows for: int foo (float a) { return a * 32.0f; } to generate the code: foo: fcvtzs w0, s0, #5 ret because combine now successfully tries to match: (set (reg/i:SI 0 x0) (fix:SI (mult:SF (reg:SF 32 v0 [ a ]) (const_double:SF 3.2e+1 [0x0.8p+6] whereas before it would not try to use the const_double directly but rather its constant pool reference. The only way I could see a problem with that is if there are circumstances where the memory variant would simplify further. That doesn't seem highly likely, so... If that were the case, I'd expect the earlier call to simplify_binary_operation_1 to have returned a non-NULL rtx, and the code in this patch would not come into play. * simplify-rtx.c (simplify_binary_operation): If either operand was a constant pool reference use them if all other simplifications failed. Ok. Thanks, I'll commit it when the first (aarch64-specific) patch is approved. This patch has already been ok'd, but I'd like to update the aarch64 test to use scan-assembler-times rather than just scan-assembler in light of Marcus' feedback to another patch. I committed this version to trunk with r229086. Thanks, Kyrill 2015-10-20 Kyrylo Tkachov * simplify-rtx.c (simplify_binary_operation): If either operand was a constant pool reference use them if all other simplifications failed. 2015-10-20 Kyrylo Tkachov * gcc.target/aarch64/fmul_fcvt_1.c: Add multiply-by-32 cases. 
Kyrill Bernd Index: gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c === --- gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c (revision 229085) +++ gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c (working copy) @@ -83,6 +83,16 @@ /* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#4" 1 } } */ /* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#4" 1 } } */ +FUNC_DEFS (32) +FUNC_DEFD (32) +/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#5" 1 } } */ +/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#5" 1 } } */ +/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#5" 1 } } */ +/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#5" 1 } } */ +/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#5" 1 } } */ +/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#5" 1 } } */ +/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#5" 1 } } */ +/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#5" 1 } } */ #define FUNC_TESTS(__a, __b) \ do\ @@ -120,10 +130,12 @@ FUNC_TESTS (4, i); FUNC_TESTS (8, i); FUNC_TESTS (16, i); + FUNC_TESTS (32, i); FUNC_TESTD (4, i); FUNC_TESTD (8, i); FUNC_TESTD (16, i); + FUNC_TESTD (32, i); } return 0; } Index: gcc/simplify-rtx.c === --- gcc/simplify-rtx.c (revision 229084) +++ gcc/simplify-rtx.c (working copy) @@ -2001,7 +2001,17 @@ tem = simplify_const_binary_operation (code, mode, trueop0, trueop1); if (tem) return tem; - return simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1); + tem = simplify_binary_operation_1 (code, mode, op0, op1, trueop0, trueop1); + + if (tem) +return tem; + + /* If the above steps did not result in a simplification and op0 or op1 + were constant pool references, use the referenced constants directly. 
*/ + if (trueop0 != op0 || trueop1 != op1) +return simplify_gen_binary (code, mode, trueop0, trueop1); + + return NULL_RTX; } /* Subroutine of simplify_binary_operation. Simplify a binary operation
[PATCH] Move some Cilk+ code around
The function is_cilkplus_vector_p is defined both in c-parser.c and parser.c and is exactly the same so it seems that it should rather be defined in the common Cilk+ code (even though for such a small static inline fn it probably doesn't matter much). While at it, fix some typos and simplify the code a bit. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-10-20 Marek Polacek * array-notation-common.c (is_cilkplus_vector_p): Define. * c-common.h (is_cilkplus_vector_p): Declare. * c-parser.c (is_cilkplus_vector_p): Don't define here. * parser.c (is_cilkplus_vector_p): Don't define here. diff --git gcc/c-family/array-notation-common.c gcc/c-family/array-notation-common.c index 85ded8d..6b55747 100644 --- gcc/c-family/array-notation-common.c +++ gcc/c-family/array-notation-common.c @@ -676,3 +676,12 @@ fix_sec_implicit_args (location_t loc, vec *list, vec_safe_push (array_operand, (*list)[ii]); return array_operand; } + +/* Returns true if NAME is an IDENTIFIER_NODE with identifier "vector", + "__vector", or "__vector__". */ + +bool +is_cilkplus_vector_p (tree name) +{ + return flag_cilkplus && is_attribute_p ("vector", name); +} diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h index cf44482..4b5cac8 100644 --- gcc/c-family/c-common.h +++ gcc/c-family/c-common.h @@ -1435,6 +1435,7 @@ extern void cilkplus_extract_an_triplets (vec *, size_t, size_t, vec > *); extern vec *fix_sec_implicit_args (location_t, vec *, vec, size_t, tree); +extern bool is_cilkplus_vector_p (tree); /* In cilk.c. */ extern tree insert_cilk_frame (tree); diff --git gcc/c/c-parser.c gcc/c/c-parser.c index 704ebc6..a5c1d84 100644 --- gcc/c/c-parser.c +++ gcc/c/c-parser.c @@ -3848,17 +3848,6 @@ c_parser_attribute_any_word (c_parser *parser) return attr_name; } -/* Returns true of NAME is an IDENTIFIER_NODE with identiifer "vector," - "__vector" or "__vector__." 
*/ - -static inline bool -is_cilkplus_vector_p (tree name) -{ - if (flag_cilkplus && is_attribute_p ("vector", name)) -return true; - return false; -} - #define CILK_SIMD_FN_CLAUSE_MASK \ ((OMP_CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_VECTORLENGTH) \ | (OMP_CLAUSE_MASK_1 << PRAGMA_CILK_CLAUSE_LINEAR) \ diff --git gcc/cp/parser.c gcc/cp/parser.c index f07a5e4..cfe5468 100644 --- gcc/cp/parser.c +++ gcc/cp/parser.c @@ -23225,17 +23225,6 @@ cp_parser_gnu_attributes_opt (cp_parser* parser) return attributes; } -/* Returns true of NAME is an IDENTIFIER_NODE with identiifer "vector," - "__vector" or "__vector__." */ - -static inline bool -is_cilkplus_vector_p (tree name) -{ - if (flag_cilkplus && is_attribute_p ("vector", name)) -return true; - return false; -} - /* Parse a GNU attribute-list. attribute-list: Marek
Re: [PATCH][ARM] Fix for testcase after r228661
On Tue, Oct 20, 2015 at 4:52 PM, Andre Vieira wrote: > Hi, > > This patch addresses PR-67948 by changing the xor-and.c test, initially > written for a simplify-rtx pattern, to make it pass post r228661 (see > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00676.html). This test no > longer triggered the simplify-rtx pattern it was written for prior to > r228661, though other optimizations did lead to the same assembly the test > checked for. The optimization added with r228661 matches the pattern used in > the test and optimizes it to a better and still valid sequence. Being unable > to easily change the test to trigger the original simplify-rtx pattern, I > chose to change it to pass with the new produced assembly sequence. > > This is correct because the transformation is valid and it yields a more > efficient pattern. However, as I pointed out before this test doesn't test > the optimization it originally was intended for. > > Tested by running regression tests for armv6. > > Is this OK to commit? > Missing ChangeLog - please remember to put the PR number in the ChangeLog in the correct format, i.e. PR testsuite/67948. Ok with that. I suspect that the simplify-rtx.c pattern is much less likely to trigger given your match.pd change, but it will be fun proving that. regards ramana > Thanks, > Andre
Re: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp
Hi Marcus, On 20/10/15 17:05, Marcus Shawcroft wrote: On 16 October 2015 at 13:58, Kyrill Tkachov wrote: Hi all, We already support load/store-pair operations on the D-registers when they contain an FP value, but the peepholes/sched-fusion machinery that do all the hard work currently ignore 64-bit vector modes. This patch adds support for fusing loads/stores of 64-bit vector operands into ldp and stp instructions. I've seen this trigger a few times in SPEC2006. Not too many times, but the times it did trigger the code seemed objectively better i.e. long sequences of ldr and str instructions essentially halved in size. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill 2015-10-16 Kyrylo Tkachov * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p): We have several different flavours of fusion in the backend, this one is specifically load/stores, perhaps making that clear in the name of this predicate will avoid confusion further down the line? Thanks for the review, This particular type of fusion is called sched_fusion in various places in the compiler and its implementation in aarch64 is only for load/store merging (indeed, the only usage of sched_fusion currently is to merge loads/stores in arm and aarch64). So, I think that sched_fusion in the name already conveys the information that it's the ldp/stp one rather than macro fusion. In fact, there is a macro fusion of ADRP and an LDR instruction, so having sched_fusion in the name is actually a better differentiator than mentioning loads/stores as both types of fusion deal with loads in some way. Is it ok to keep the name as is? Thanks, Kyrill New function. (fusion_load_store): Use it. * config/aarch64/aarch64-ldpstp.md: Add new peephole2s for ldp and stp in VD modes. * config/aarch64/aarch64-simd.md (load_pair, VD): New pattern. (store_pair, VD): Likewise. 2015-10-16 Kyrylo Tkachov * gcc.target/aarch64/stp_vec_64_1.c: New test. 
* gcc.target/aarch64/ldp_vec_64_1.c: New test. Otherwise OK /Marcus
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft wrote: > On 19 October 2015 at 14:57, Kyrill Tkachov wrote: > >> 2015-10-19 Kyrylo Tkachov >> >> * config/aarch64/aarch64.md >> (*aarch64_fcvt2_mult): New pattern. >> * config/aarch64/aarch64-simd.md >> (*aarch64_fcvt2_mult): Likewise. >> * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. >> (aarch64_fpconst_pow_of_2): New function. >> (aarch64_vec_fpconst_pow_of_2): Likewise. >> * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare >> prototype. >> (aarch64_vec_fpconst_pow_of_2): Likewise. >> * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. >> (aarch64_fp_vec_pow2): Likewise. >> >> 2015-10-19 Kyrylo Tkachov >> >> * gcc.target/aarch64/fmul_fcvt_1.c: New test. >> * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. > > +char buf[64]; > +sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits); > > Prefer snprintf please Should we also update our coding standards for this ? i.e. use the non `n' versions of the string functions only if you have a very good reason. Even more so in the run time support libraries ! Maybe we can go to the extreme of poisoning sprintf too, but that's probably an extreme... regards Ramana > > + } > + [(set_attr "type" "neon_fp_to_int_")] > +) > + > + > > Superflous blank line here ? > > + *cost += rtx_cost (XEXP (x, 0), VOIDmode, > + (enum rtx_code) code, 0, speed); > > My understanding is the unnecessary use of enum is now discouraged, > (rtx_code) is sufficient in this case. > > + int count = CONST_VECTOR_NUNITS (x); > + int i; > + for (i = 1; i < count; i++) > > Push the int into the for initializer. > Push the rhs of the count assignment into the for condition and drop > the definition of count. > > +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */ > > I'd prefer scan-assembler-times or do you have a particular reason to > avoid it in these tests? > > Cheers > /Marcus
[committed, PATCH] Add --enable-compressed-debug-sections={all,gas,gold,ld}
Add --enable-compressed-debug-sections={all,gas,gold,ld} This patch removes the gas configure option: --enable-compressed-debug-sections and adds a toplevel configure option: --enable-compressed-debug-sections={all,gas,gold,ld} to enable compressed debug sections for gas, gold or ld by default. At the moment, this configure option is ignored by gold and ld. For x86 Linux targets, default to compressing debug sections in gas. Sync with binutils-gdb: PR gas/19109 * configure.ac: Add --enable-compressed-debug-sections={all,gas,gold,ld}. * configure: Regenerated. diff --git a/ChangeLog b/ChangeLog index 1d9600a..2c19e94 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,13 @@ +2015-10-20 H.J. Lu + + Sync with binutils-gdb: + 2015-10-20 H.J. Lu + + PR gas/19109 + * configure.ac: Add + --enable-compressed-debug-sections={all,gas,gold,ld}. + * configure: Regenerated. + 2015-10-16 Arnaud Charlet * MAINTAINERS: Update list of Ada maintainers and email addresses. diff --git a/configure b/configure index eca5e6f..f66f424 100755 --- a/configure +++ b/configure @@ -753,6 +753,7 @@ enable_as_accelerator_for enable_offload_targets enable_gold enable_ld +enable_compressed_debug_sections enable_libquadmath enable_libquadmath_support enable_libada @@ -1476,6 +1477,9 @@ Optional Features: offload target compiler during the build --enable-gold[=ARG] build gold [ARG={default,yes,no}] --enable-ld[=ARG] build ld [ARG={default,yes,no}] + --enable-compressed-debug-sections={all,gas,gold,ld} + Enable compressed debug sections for gas, gold or ld + by default --disable-libquadmath do not build libquadmath directory --disable-libquadmath-support disable libquadmath support for Fortran @@ -3013,6 +3017,21 @@ $as_echo "$as_me: WARNING: neither ld nor gold are enabled" >&2;} ;; esac +# PR gas/19109 +# Decide the default method for compressing debug sections. +# Provide a configure time option to override our default. +# Check whether --enable-compressed_debug_sections was given. 
+if test "${enable_compressed_debug_sections+set}" = set; then : + enableval=$enable_compressed_debug_sections; + if test x"$enable_compressed_debug_sections" = xyes; then +as_fn_error "no program with compressed debug sections specified" "$LINENO" 5 + fi + +else + enable_compressed_debug_sections= +fi + + # Configure extra directories which are host specific case "${host}" in diff --git a/configure.ac b/configure.ac index 9241261..cb6ca24 100644 --- a/configure.ac +++ b/configure.ac @@ -393,6 +393,19 @@ case "${ENABLE_LD}" in ;; esac +# PR gas/19109 +# Decide the default method for compressing debug sections. +# Provide a configure time option to override our default. +AC_ARG_ENABLE(compressed_debug_sections, +[AS_HELP_STRING([--enable-compressed-debug-sections={all,gas,gold,ld}], + [Enable compressed debug sections for gas, gold or ld by +default])], +[ + if test x"$enable_compressed_debug_sections" = xyes; then +AC_MSG_ERROR([no program with compressed debug sections specified]) + fi +], [enable_compressed_debug_sections=]) + # Configure extra directories which are host specific case "${host}" in
patch to fix PR67609
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67609 The patch was bootstrapped and tested on x86/x86-64, arm, aarch64. It was also tested on ppc64 (unfortunately bootstrap with LRA is broken now). Committed as rev. 229087. Index: ChangeLog === --- ChangeLog (revision 228844) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2015-10-20 Vladimir Makarov + + PR rtl-optimization/67609 + * lra-spills.c (lra_final_code_change): Don't remove all + sub-registers. + 2015-10-15 Marek Polacek * tree-ssa-reassoc.c (attempt_builtin_copysign): Call Index: testsuite/ChangeLog === --- testsuite/ChangeLog (revision 228844) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2015-10-20 Vladimir Makarov + + PR rtl-optimization/67609 + * gcc.target/i386/pr67609.c: New. + 2015-10-15 Marek Polacek * gcc.dg/tree-ssa/reassoc-42.c: New test. Index: lra-spills.c === --- lra-spills.c (revision 228844) +++ lra-spills.c (working copy) @@ -727,14 +727,44 @@ lra_final_code_change (void) lra_insn_recog_data_t id = lra_get_insn_recog_data (insn); struct lra_static_insn_data *static_id = id->insn_static_data; bool insn_change_p = false; - - for (i = id->insn_static_data->n_operands - 1; i >= 0; i--) - if ((DEBUG_INSN_P (insn) || ! static_id->operand[i].is_operator) - && alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn))) - { - lra_update_dup (id, i); - insn_change_p = true; - } + + for (i = id->insn_static_data->n_operands - 1; i >= 0; i--) + { + if (! DEBUG_INSN_P (insn) && static_id->operand[i].is_operator) + continue; + + rtx op = *id->operand_loc[i]; + + if (static_id->operand[i].type == OP_OUT + && GET_CODE (op) == SUBREG && REG_P (SUBREG_REG (op)) + && ! LRA_SUBREG_P (op)) + { + hard_regno = REGNO (SUBREG_REG (op)); + /* We can not always remove sub-registers of + hard-registers as we may lose information that + only a part of registers is changed and + subsequent optimizations may do wrong + transformations (e.g. dead code eliminations). 
+ We can not also keep all sub-registers as the + subsequent optimizations can not handle all such + cases. Here is a compromise which works. */ + if ((GET_MODE_SIZE (GET_MODE (op)) + < GET_MODE_SIZE (GET_MODE (SUBREG_REG (op + && (hard_regno_nregs[hard_regno][GET_MODE (SUBREG_REG (op))] + == hard_regno_nregs[hard_regno][GET_MODE (op)]) +#ifdef STACK_REGS + && (hard_regno < FIRST_STACK_REG + || hard_regno > LAST_STACK_REG) +#endif + ) + continue; + } + if (alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn))) + { + lra_update_dup (id, i); + insn_change_p = true; + } + } if (insn_change_p) lra_update_operator_dups (id); } Index: testsuite/gcc.target/i386/pr67609.c === --- testsuite/gcc.target/i386/pr67609.c (revision 0) +++ testsuite/gcc.target/i386/pr67609.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-final { scan-assembler "movdqa" } } */ + +#include +__m128d reg; +void set_lower(double b) +{ + double v[2]; + _mm_store_pd(v, reg); + v[0] = b; + reg = _mm_load_pd(v); +}
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On 20/10/15 17:28, Ramana Radhakrishnan wrote: On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft wrote: On 19 October 2015 at 14:57, Kyrill Tkachov wrote: 2015-10-19 Kyrylo Tkachov * config/aarch64/aarch64.md (*aarch64_fcvt2_mult): New pattern. * config/aarch64/aarch64-simd.md (*aarch64_fcvt2_mult): Likewise. * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. (aarch64_fpconst_pow_of_2): New function. (aarch64_vec_fpconst_pow_of_2): Likewise. * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare prototype. (aarch64_vec_fpconst_pow_of_2): Likewise. * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. (aarch64_fp_vec_pow2): Likewise. 2015-10-19 Kyrylo Tkachov * gcc.target/aarch64/fmul_fcvt_1.c: New test. * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. +char buf[64]; +sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits); Prefer snprintf please Should we also update our coding standards for this? i.e. use the non `n' versions of the string functions only if you have a very good reason. Even more so in the run time support libraries! My understanding is one should *really* use snprintf if any part of the string comes from the user. So in this case it probably doesn't make a difference. However, it's not hard to use the 'n' version and thinking about the maximum length is probably a good practice to get into. Kyrill Maybe we can go to the extreme of poisoning sprintf too, but that's probably an extreme... regards Ramana + } + [(set_attr "type" "neon_fp_to_int_")] +) + + Superfluous blank line here? + *cost += rtx_cost (XEXP (x, 0), VOIDmode, + (enum rtx_code) code, 0, speed); My understanding is the unnecessary use of enum is now discouraged, (rtx_code) is sufficient in this case. + int count = CONST_VECTOR_NUNITS (x); + int i; + for (i = 1; i < count; i++) Push the int into the for initializer. Push the rhs of the count assignment into the for condition and drop the definition of count.
+/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */ I'd prefer scan-assembler-times or do you have a particular reason to avoid it in these tests? Cheers /Marcus
Re: [patch 4/6] scalar-storage-order merge: bulk
> I suspect there are many comments that we should update as a result of > this work. Essentially there's many throughout GCC of the form "On a > big-endian target" and the like. With these changes it's more a > property of the data -- the vast majority of the time the data's > property is the same as the target, but with this patch it can vary. > > I just happened to spot one in varasm.c::output_constructor_bitfield, as > the comment started shortly after some code this patch changes. No > doubt there's others. I'm torn whether or not to ask you to scan and > update them. It's a lot of work and I'm not terribly sure how valuable > it'll generally be. I have adjusted the output_constructor_bitfield case to "For big-endian data". The cases in expmed.c have naked "big-endian" or "big-endian case" so are OK. I have changed 2 sibling cases in expr.c but left the others unchanged since they are about calling conventions. I think that's pretty much it for the files that are touched by the patch. > Thanks for not trying too hard to optimize this stuff, it makes the > expmed.c patches (for example) a lot easier to work through :-) > > I didn't even try to verify you've got all the paths covered. I mostly > tried to make sure the patch didn't break existing code. I'm going to > assume that the patch essentially works and that if there's paths > missing where reversal is needed that you'll take care of them as > they're exposed. Yes, the main issue in expr.c & expmed.c is getting all the paths covered. I have run the compile and execute C torture testsuites with -fsso-struct set to big-endian on x86-64 so I'm relatively confident, but that's not a proof. And one of the design constraints of the implementation was to change nothing to the default endianness support, so I'm more confident about the absence of breakages (IIRC we didn't get a single PR for a breakage with the AdaCore compilers while we did get a few for cases where the data was not reversed). 
> I'm a bit dismayed at the number of changes to fold-const.c, but > presumably there's no good way around them. I probably would have > looked for a way to punt earlier (that may not be trivial though). > Given you've done the work, no need to undo it now. Yes, that's the bitfield optimization stuff, so it depends on the endianness. Initially the changes were even more pervasive because I tried to optimize e.g. equality comparisons of 2 bitfields with reverse endianness in there. But this broke the "all accesses to a scalar in memory are done with the same storage order" invariant and was thus causing annoying issues later on. > I think this patch is fine. OK, thanks for the thorough review. All the parts have been approved, modulo the C++ FE part. As I already explained, this part is very likely incomplete (based on the changes that were made in the Ada FE proper) and only makes sure that C code compiled by the C++ compiler works as intended, but nothing more. So I can propose to install only the generic, C and Ada changes for now and to explicitly disable the feature for C++ until a more complete implementation of the C++ FE part is written. I'm OK to work on it but I'll probably need help. -- Eric Botcazou
[patch] Fix failure of ACATS c45503c at -O2
Hi, this test started to fail recently as the result of the work of Richard S., but the underlying issue had been latent for a long time.  It boils down to this excerpt from the VRP1 dump file:

Found new range for _9: [0, 12]
marking stmt to be not simulated again

Visiting statement:
_3 = _9 %[fl] _11;
Found new range for _3: [0, +INF]

which is plain wrong since the sign of FLOOR_MOD_EXPR is the sign of the divisor and not that of the dividend (the latter is for TRUNC_MOD_EXPR).

I have attached a fix for mainline and another one for the 4.9/5 branches.  Tested on x86_64-suse-linux, OK for all active branches?

2015-10-20  Eric Botcazou

	* fold-const.c (tree_binary_nonnegative_warnv_p) <FLOOR_MOD_EXPR>:
	Recurse on operand #1 instead of operand #0.
	<CEIL_MOD_EXPR>: Do not recurse.
	<ROUND_MOD_EXPR>: Likewise.

--
Eric Botcazou

Index: fold-const.c
===
--- fold-const.c	(revision 229022)
+++ fold-const.c	(working copy)
@@ -12982,11 +12982,15 @@ tree_binary_nonnegative_warnv_p (enum tr
       return RECURSE (op0) && RECURSE (op1);
 
     case TRUNC_MOD_EXPR:
-    case CEIL_MOD_EXPR:
-    case FLOOR_MOD_EXPR:
-    case ROUND_MOD_EXPR:
+      /* The sign of the remainder is that of the dividend.  */
       return RECURSE (op0);
 
+    case FLOOR_MOD_EXPR:
+      /* The sign of the remainder is that of the divisor.  */
+      return RECURSE (op1);
+
+    case CEIL_MOD_EXPR:
+    case ROUND_MOD_EXPR:
     default:
       return tree_simple_nonnegative_warnv_p (code, type);
     }

Index: fold-const.c
===
--- fold-const.c	(revision 228932)
+++ fold-const.c	(working copy)
@@ -14853,11 +14853,17 @@ tree_binary_nonnegative_warnv_p (enum tr
					      strict_overflow_p));
 
     case TRUNC_MOD_EXPR:
-    case CEIL_MOD_EXPR:
-    case FLOOR_MOD_EXPR:
-    case ROUND_MOD_EXPR:
+      /* The sign of the remainder is that of the dividend.  */
       return tree_expr_nonnegative_warnv_p (op0,
					    strict_overflow_p);
+
+    case FLOOR_MOD_EXPR:
+      /* The sign of the remainder is that of the divisor.  */
+      return tree_expr_nonnegative_warnv_p (op1,
+					    strict_overflow_p);
+
+    case CEIL_MOD_EXPR:
+    case ROUND_MOD_EXPR:
     default:
       return tree_simple_nonnegative_warnv_p (code, type);
     }
Re: [PATCH][ARM] Fix for testcase after r228661
On 20/10/15 17:25, Ramana Radhakrishnan wrote: On Tue, Oct 20, 2015 at 4:52 PM, Andre Vieira wrote: Hi, This patch addresses PR-67948 by changing the xor-and.c test, initially written for a simplify-rtx pattern, to make it pass post r228661 (see https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00676.html). This test no longer triggered the simplify-rtx pattern it was written for prior to r228661, though other optimizations did lead to the same assembly the test checked for. The optimization added with r228661 matches the pattern used in the test and optimizes it to a better and still valid sequence. Being unable to easily change the test to trigger the original simplify-rtx pattern, I chose to change it to pass with the new produced assembly sequence. This is correct because the transformation is valid and it yields a more efficient pattern. However, as I pointed out before this test doesn't test the optimization it originally was intended for. Tested by running regression tests for armv6. Is this OK to commit? Missing Changelog - please remember to put the PR number in the Changelog in the correct format, i.e. PR testsuite/67948. Ok with that. I suspect that the simplify-rtx.c is much less likely to trigger given your match.pd change, but it will be fun proving that. Ideally, I'd like to prove the simplify-rtx.c couldn't be triggered anymore. Though I wouldn't know where to begin with that. I did spend a tiny effort trying to trigger it with a version before the match.pd change, but with no success. regards ramana Thanks, Andre Here is the ChangeLog (with the PR), sorry for that! gcc/testsuite/ChangeLog 2015-10-15 Andre Vieira PR testsuite/67948 * gcc.target/arm/xor-and.c: Check for eor instead of orr.
From 89922547118e716b41ddf6edefb274322193f25c Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Thu, 15 Oct 2015 12:48:26 +0100 Subject: [PATCH] Fix for xor-and.c test --- gcc/testsuite/gcc.target/arm/xor-and.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/arm/xor-and.c b/gcc/testsuite/gcc.target/arm/xor-and.c index 53dff85f8f780fb99a93bbcc24180a3d0d5d3be9..3715530cd7bf9ad8abb24cb21cd51ae3802079e8 100644 --- a/gcc/testsuite/gcc.target/arm/xor-and.c +++ b/gcc/testsuite/gcc.target/arm/xor-and.c @@ -10,6 +10,6 @@ unsigned short foo (unsigned short x) return x; } -/* { dg-final { scan-assembler "orr" } } */ +/* { dg-final { scan-assembler "eor" } } */ /* { dg-final { scan-assembler-not "mvn" } } */ /* { dg-final { scan-assembler-not "uxth" } } */ -- 1.9.1
[HSA] Fix-up of kernel-from-kernel dispatch mechanism
Hello. Following patch fixes up HSA kernel from kernel dispatching mechanism, where we forgot to wait in a loop for the dispatched child kernel. Apart from that, there's a small follow-up which changes naming scheme for HSA modules. Martin >From eca686b6495bf7faa32ecb292f94558c6bfdbdce Mon Sep 17 00:00:00 2001 From: marxin Date: Tue, 20 Oct 2015 11:32:58 +0200 Subject: [PATCH 1/2] HSA: fix kernel-from-kernel dispatch mechanism libgomp/ChangeLog: 2015-10-20 Martin Liska * plugin/plugin-hsa.c (struct agent_info): Add new kernel_dispatch_command_q member. (GOMP_OFFLOAD_init_device): Initialize the queue. (init_single_kernel): Verify that kernel dispatching is not run with a big depth. (create_kernel_dispatch): Rename from create_kernel_dispatch_recursive. (GOMP_OFFLOAD_run): Add signal waiting workaround. (GOMP_OFFLOAD_fini_device): Destroy the newly added queue. gcc/ChangeLog: 2015-10-20 Martin Liska * hsa-gen.c (gen_hsa_insns_for_kernel_call): Add waiting for the signal in a loop. (gen_body_from_gimple): Skip HSA basic blocks with already created HBB. --- gcc/hsa-gen.c | 38 -- libgomp/plugin/plugin-hsa.c | 56 + 2 files changed, 78 insertions(+), 16 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index ab27eb1..c36c9e0 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -3943,22 +3943,41 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call) signal->m_memoryscope = BRIG_MEMORY_SCOPE_SYSTEM; hbb->append_insn (signal); + /* Prepare CFG for waiting loop. */ + edge e = split_block (hbb->m_bb, call); + + basic_block dest = split_edge (e); + edge false_e = EDGE_SUCC (dest, 0); + + false_e->flags &= ~EDGE_FALLTHRU; + false_e->flags |= EDGE_FALSE_VALUE; + + make_edge (e->dest, dest, EDGE_TRUE_VALUE); + /* Emit blocking signal waiting instruction. 
*/ + hsa_bb *new_hbb = hsa_init_new_bb (dest); + hbb->append_insn (new hsa_insn_comment ("wait for the signal")); - hsa_op_reg *signal_result_reg = new hsa_op_reg (BRIG_TYPE_U64); + hsa_op_reg *signal_result_reg = new hsa_op_reg (BRIG_TYPE_S64); c = new hsa_op_immed (1, BRIG_TYPE_S64); - hsa_op_immed *c2 = new hsa_op_immed (UINT64_MAX, BRIG_TYPE_U64); - signal = new hsa_insn_signal (4, BRIG_OPCODE_SIGNAL, -BRIG_ATOMIC_WAITTIMEOUT_LT, BRIG_TYPE_S64); + signal = new hsa_insn_signal (3, BRIG_OPCODE_SIGNAL, +BRIG_ATOMIC_WAIT_LT, BRIG_TYPE_S64); signal->m_memoryorder = BRIG_MEMORY_ORDER_SC_ACQUIRE; signal->m_memoryscope = BRIG_MEMORY_SCOPE_SYSTEM; signal->set_op (0, signal_result_reg); signal->set_op (1, signal_reg); signal->set_op (2, c); - signal->set_op (3, c2); - hbb->append_insn (signal); + new_hbb->append_insn (signal); + + hsa_op_reg *ctrl = new hsa_op_reg (BRIG_TYPE_B1); + hsa_insn_cmp *cmp = new hsa_insn_cmp +(BRIG_COMPARE_EQ, ctrl->m_type, ctrl, signal_result_reg, + new hsa_op_immed (1, signal_result_reg->m_type)); + + new_hbb->append_insn (cmp); + new_hbb->append_insn (new hsa_insn_br (ctrl)); hsa_cfun->m_kernel_dispatch_count++; } @@ -4763,7 +4782,11 @@ gen_body_from_gimple (vec *ssa_map) FOR_EACH_BB_FN (bb, cfun) { gimple_stmt_iterator gsi; - hsa_bb *hbb = hsa_init_new_bb (bb); + hsa_bb *hbb = hsa_bb_for_bb (bb); + if (hbb) + continue; + + hbb = hsa_init_new_bb (bb); for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { @@ -4777,6 +4800,7 @@ gen_body_from_gimple (vec *ssa_map) { gimple_stmt_iterator gsi; hsa_bb *hbb = hsa_bb_for_bb (bb); + gcc_assert (hbb != NULL); for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) if (!virtual_operand_p (gimple_phi_result (gsi_stmt (gsi diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c index ed3573f..45dfe9b 100644 --- a/libgomp/plugin/plugin-hsa.c +++ b/libgomp/plugin/plugin-hsa.c @@ -220,6 +220,8 @@ struct agent_info hsa_isa_t isa; /* Command queue of the agent. 
*/ hsa_queue_t* command_q; + /* Kernel from kernel dispatch command queue. */ + hsa_queue_t* kernel_dispatch_command_q; /* The HSA memory region from which to allocate kernel arguments. */ hsa_region_t kernarg_region; @@ -447,6 +449,12 @@ GOMP_OFFLOAD_init_device (int n) if (status != HSA_STATUS_SUCCESS) hsa_fatal ("Error creating command queue", status); + status = hsa_queue_create (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI, + queue_callback, NULL, UINT32_MAX, UINT32_MAX, + &agent->kernel_dispatch_command_q); + if (status != HSA_STATUS_SUCCESS) +hsa_fatal ("Error creating kernel dispatch command queue", status); + agent->kernarg_region.handle = (uint64_t) -1; status = hsa_agent_iterate_regions (agent->id, get_kernarg_memory_region, &agent->kernarg_region); @@ -455,6 +463,8 @@ GOMP_OFFLOAD_init_device (int n) "arguments"); HSA_DEBUG ("HSA agent init
Re: [PATCH][AArch64] Enable fusion of AES instructions
On 14/10/15 13:30, Wilco Dijkstra wrote: Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. This can give up to 2x speedup on many AArch64 implementations. Also model the crypto instructions on Cortex-A57 according to the Optimization Guide. Passes regression tests. arm-wise this is ok, but I'd like a follow-up patch to enable this fusion for the arm port as well. It should be fairly simple. Just add a new enum value to fuse_ops inside tune_params in arm-protos.h and update the arm implementation in aarch_macro_fusion_pair_p similar to your aarch64 implementation. Thanks, Kyrill ChangeLog: 2015-10-14 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion. (cortexa57_tunings): Likewise. (cortexa72_tunings): Likewise. (aarch_macro_fusion_pair_p): Add support for AES fusion. * gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry. * gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue): Allow virtual registers before reload so early scheduling works. * gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use correct latency and pipeline. (cortex_a57_crypto_complex): Likewise. (cortex_a57_crypto_xor): Likewise. (define_bypass): Add AES bypass.
--- gcc/config/aarch64/aarch64-fusion-pairs.def | 1 + gcc/config/aarch64/aarch64.c| 10 +++--- gcc/config/arm/aarch-common.c | 7 +-- gcc/config/arm/cortex-a57.md| 17 +++-- 4 files changed, 24 insertions(+), 11 deletions(-) diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def index 53bbef4..fea79fc 100644 --- a/gcc/config/aarch64/aarch64-fusion-pairs.def +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def @@ -33,4 +33,5 @@ AARCH64_FUSION_PAIR ("adrp+add", ADRP_ADD) AARCH64_FUSION_PAIR ("movk+movk", MOVK_MOVK) AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR) AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH) +AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 230902d..96368c6 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -376,7 +376,7 @@ static const struct tune_params cortexa53_tunings = &generic_branch_cost, 4, /* memmov_cost */ 2, /* issue_rate */ - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ 8, /* function_align. */ 8, /* jump_align. */ @@ -398,7 +398,7 @@ static const struct tune_params cortexa57_tunings = &generic_branch_cost, 4, /* memmov_cost */ 3, /* issue_rate */ - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ 16, /* function_align. */ 8, /* jump_align. */ @@ -420,7 +420,7 @@ static const struct tune_params cortexa72_tunings = &generic_branch_cost, 4, /* memmov_cost */ 3, /* issue_rate */ - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops */ 16, /* function_align. */ 8, /* jump_align. 
*/ @@ -12843,6 +12843,10 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) } } + if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_AES_AESMC) + && aarch_crypto_can_dual_issue (prev, curr)) +return true; + if ((aarch64_tune_params.fusible_ops & AARCH64_FUSE_CMP_BRANCH) && any_condjump_p (curr)) { diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c index 5dd8222..e191ab6 100644 --- a/gcc/config/arm/aarch-common.c +++ b/gcc/config/arm/aarch-common.c @@ -63,8 +63,11 @@ aarch_crypto_can_dual_issue (rtx_insn *producer_insn, rtx_insn *consumer_insn) { unsigned int regno = REGNO (SET_DEST (producer_set)); -return REGNO (SET_DEST (consumer_set)) == regno - && REGNO (XVECEXP (consumer_src, 0, 0)) == regno; +/* Before reload the registers are virtual, so the destination of + consumer_set doesn't need to match. */ + +return (REGNO (SET_DEST (consumer_set)) == regno || !reload_completed) + && REGNO (XVECEXP (consumer_src, 0, 0)) == regno; } return 0; diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md index a32c848..eab9d99 100644 --- a/gcc/config/arm/cortex-a57.md +++ b/gcc/config/arm/cortex-a57.md @@ -745,20 +745,20 @@ neon_fp_sqrt_s_q, neon_fp_sqrt_d_q")) "ca57_cx2_block*3") -(define_insn_reservation "cortex_a57_crypto_simple" 4 +(define_insn_reservation "cortex_a57_crypto_simple" 3 (and (eq_attr "tune" "
Re: [C PATCH] Don't accept invalid attribute-list (PR c/67964)
On Tue, 20 Oct 2015, Marek Polacek wrote: > Joseph noticed that we were wrongly accepting multiple attributes without > commas. Thus fixed by breaking out of the loop when parsing the attributes -- > if we don't see a comma after an attribute, then the next tokens must be )); > if > not, then the attribute is invalid. I've also added a few more comments. > > (I think it would be nicer to model the code after C++ and introduce > c_parser_attributes_list, but the following is enough to fix the bug.) > > Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof
On Tue, 20 Oct 2015, Martin Sebor wrote: > I think -Warray-bounds should emit consistent diagnostics for invalid > array references regardless of the contexts. I.e., given > > struct S { > int A [5][7]; > int x; > } s; > > these should both be diagnosed: > > int i = offsetof (struct S, A [0][7]); > > int *p = &s.A [0][7]; > > because they are both undefined and both can lead to surprising > results when used. But both are valid. &s.A [0][7] means s.A[0] + 7 (as explicitly specified in C11, neither the & nor the [] is evaluated in this case, but the [] turns into a +), and s.A[0] is an object of type int[7], which decays to a pointer to the first element of that array, so adding 7 produces a just-past-end pointer. It's not valid to dereference that pointer, but the pointer itself is valid (and subtracting 1 from it produces a pointer you can dereference). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof
On 10/20/2015 09:48 AM, Bernd Schmidt wrote: On 10/20/2015 05:31 PM, Martin Sebor wrote: On 10/20/2015 07:20 AM, Bernd Schmidt wrote: On 10/16/2015 09:34 PM, Martin Sebor wrote: Thank you for the review. Attached is an updated patch that hopefully addresses all your comments. I ran the check_GNU_style.sh script on it to make sure I didn't miss something. I've also added replies to a few of your comments below. Ok, thanks. However - the logic around the context struct in the patch still seems fairly impenetrable to me, and I've been wondering whether a simpler approach wouldn't work just as well. I came up with the patch below, which just passes a tree_code context to recursive invocations and increases the upper bound by one only if not in an ARRAY_REF or COMPONENT_REF context. As far as I can tell, this produces the same warnings on the testcase (modulo differences in type printout), except for the final few FA5_7 testcases, which I had doubts about anyway: typedef struct FA5_7 { int i; char a5_7 [5][7]; } FA5_7; __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } Why wouldn't at least the first two be covered by the one-past-the-end-is-ok rule? Thanks for taking the time to try to simplify the patch! Diagnosing the cases above is at the core of the issue (and can be seen by substituting a5_7 into the oversimplified test case in the bug). In a 5 by 7 array, the offset computed for the out-of-bounds a_5_7[0][7] is the same as the offset computed for the in-bounds a_5_7[1][0]. The result might be unexpected because it suggests that two "distinct" elements of an array share the same offset.
Yeah, but Joseph wrote: Given such a flexible array member, [anything][0] and [anything][1] are logically valid if [anything] is nonnegative (of course, the array might or might not have enough element at runtime). [anything][2] is a reference to just past the end of an element of the flexible array member; [anything][3] is clearly invalid. Regarding the actual testcase, accepting a1_1[0][1] seems fine (it's referencing the just-past-end element of the valid array a1_1[0]). So how's that different from a5_7[0][7]? In case it matters, a1_1 in the earlier test that was discussed could not be a flexible array because it was followed by another field in the same struct. Let me try to explain using a couple of examples. Given struct S { a1_1 [1][1]; int x; } s; where a1_1 is an ordinary array of a single element offseof (struct S, a1_1[0][1]) is accepted as a GCC extension (and, IMO, should be made valid in C) because a) there is no way to mistake the offset for the offset of an element of the array at a valid index, and because b) it yields to the same value as the following valid expression: (char*)&s.a1_1[0][1] - (char*)&s.a1_1; However, given struct S { a5_7 [5][7]; int x; } s; the invalid offsetof expression offseof (struct S, a5_7[0][7]) should be diagnosed because a) it yields the same offset of the valid offseof (struct S, a5_7[1][0]) and b) because s.a5_7[0][7] is not a valid reference in any context (i.e., (char*)&s.a5_7[0][7] - (char*)&s.a5_7 is not a valid expression). Handing over review duties to Joseph because he'll have to decide what to warn about and what not to. struct S { int A [5][7]; int x; } s; these should both be diagnosed: int i = offsetof (struct S, A [0][7]); int *p = &s.A [0][7]; because they are both undefined and both can lead to surprising results when used. Is that really undefined? I thought it was explicitly allowed. Dereferencing the pointer is what's undefined. Both are undefined. 
The most succinct statements to this effect are in the Undefined Behavior section of Annex J:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

The member designator parameter of an offsetof macro is an invalid right operand of the . operator for the type parameter, or designates a bit-field (7.19).

Martin