Re: [PATCH, testsuite]: Cleanup lib/target-supports.exp, ...
On 1 November 2016 at 23:41, Uros Bizjak wrote: > On Tue, Nov 1, 2016 at 5:05 PM, Jakub Jelinek wrote: >> On Tue, Nov 01, 2016 at 10:05:22AM +0100, Uros Bizjak wrote: >>> ... simplify some conditions and add i?86-*-* target where missing. >>> >>> 2016-11-01 Uros Bizjak >>> >>> * lib/target-supports.exp: Normalize order of i?86 and x86_64 targets. >>> Whitespace fixes. >> ... >>> (check_effective_target_divmod): Add i?86-*-* target. >> >> This part likely broke >> +FAIL: gcc.dg/divmod-1.c scan-tree-dump-times widening_mul "DIVMOD" 7 >> +FAIL: gcc.dg/divmod-2.c scan-tree-dump-times widening_mul "DIVMOD" 7 >> +FAIL: gcc.dg/divmod-3.c scan-tree-dump-times widening_mul "DIVMOD" 7 >> +FAIL: gcc.dg/divmod-4.c scan-tree-dump-times widening_mul "DIVMOD" 7 >> +FAIL: gcc.dg/divmod-6.c scan-tree-dump-times widening_mul "DIVMOD" 7 >> on i686-linux (i.e. 32-bit). > > No, this is expected (these tests already fail with x86_64 -m32 > multilib). These will be fixed by [1]. Oops, sorry for the breakage. The tests are meant to check if the divmod transform triggered, which is done by scanning DIVMOD in the widening_mul dump. Apparently I only checked for the triplet "x86_64-*-*" in check_effective_target_divmod() and it returned 1, which probably caused the divmod DImode tests to fail with -m32. In general, could I check in check_effective_target_*(), what options are passed ? So in case of -m32, I wanted to return 0 instead of 1 to make the tests on 32-bit UNSUPPORTED. Thanks for fixing the test-cases! Thanks, Prathamesh > > [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02483.html > > Uros. > >> Dunno what exactly the tests are meant to test, most likely they just >> need extra guards or something. Can be reproduced even on x86_64-linux >> with >> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=divmod*' >> >>> @@ -8110,7 +8090,7 @@ >>> #TODO: Add checks for all targets that have either hardware divmod insn >>> # or define libfunc for divmod. >>> if { [istarget arm*-*-*] >>> - || [istarget x86_64-*-*] } { >>> + || [istarget i?86-*-*] || [istarget x86_64-*-*] } { >>> return 1 >>> } >>> return 0 >> >> >> Jakub
[ping * 4] PR35503 - warn for restrict
Pinging patch: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01545.html Thanks, Prathamesh
Re: [PATCH] enhance buffer overflow warnings (and c/53562)
On Tue, Nov 01, 2016 at 08:55:03PM -0600, Martin Sebor wrote: > struct S { > int a, b, c, d; > }; > > #define bos(p, t) __builtin_object_size (p, t) > #define memset0(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 0)) > #define memset1(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 1)) > > void f0 (struct S *s) > { > memset0 (&s->d, 0, 1024); // no warning here (bos 0) > } But we do not want the warning here, there is nothing wrong on it. The caller may be void bar (void) { struct T { struct S header; char payload[1024 - offsetof (struct S, d)]; } t; initialize (&t); f0 (&t.header); } and the callee might rely on that. Using some header structure at the beginning and then conditionally on fields in that structure various payloads occurs in many projects, starting with glibc, gcc, Linux kernel, ... The warning really must not be detached from reality. If you want a warning for suspicious calls, sure, but 1) it has to be clearly worded significantly differently from how do you word it, so that users really understand you are warning about suspicious code (though, I really believe it is very common and there is really nothing the users can do about it) 2) it really shouldn't be enabled in -Wall, and I think not even in -Wextra Jakub
Re: [PATCH] Fix PR77407
On Tue, 1 Nov 2016, Marc Glisse wrote: > On Mon, 31 Oct 2016, Richard Biener wrote: > > > On Fri, 28 Oct 2016, Marc Glisse wrote: > > > > > On Wed, 28 Sep 2016, Richard Biener wrote: > > > > > > > The following patch implements patterns to catch x / abs (x) > > > > and x / -x, taking advantage of undefinedness at x == 0 as > > > > opposed to the PR having testcases with explicit != 0 checks. > > > > > > > > Bootstrap / regtest pending on x86_64-unknown-linux-gnu. > > > > > > > > Richard. > > > > > > > > 2016-09-28 Richard Biener > > > > > > > > PR middle-end/77407 > > > > * match.pd: Add X / abs (X) -> X < 0 ? -1 : 1 and > > > > X / -X -> -1 simplifications. > > > > > > I notice that we still have the following comment a few lines above: > > > > > > /* Make sure to preserve divisions by zero. This is the reason why > > >we don't simplify x / x to 1 or 0 / x to 0. */ > > > > > > Did we give up on preserving divisions by 0? Can we now do the 2 > > > simplifications listed by the comment? > > > > At some point there was at least diagnostics fallout when doing them. > > There may be also undefined sanitizer fallout depending on when we > > instrument for that. > > > > But in general yes, we do want to do the two simplifications. Maybe > > we can compromise (in case of early fallout) to do them on GIMPLE > > only. > > > > We could at least add them with a proper nonzero_p predicate. > > (for div (trunc_div ceil_div floor_div round_div exact_div) > + (simplify (div @0 @0) { build_one_cst (type); }) > + (simplify (div integer_zerop@0 @1) @0) > > causes no regression on powerpc64le-unknown-linux-gnu with > --enable-languages=all,obj-c++,go. Good. I probably tried last before the C++ early folding changes. If you'd formally post a patch adding the above (and adjusting the comment) I'll approve that. This eventually means we can remove the if (integer_zerop ()) early out in fold_binary_loc as well. Richard.
[PATCH] Infer value ranges from stmt ops in EVRP
The following makes EVRP use infer_value_range. It also adds a bit of dump verbosity to make EVRP traceable with -details. Bootstrapped and tested on x86_64-unkown-linux-gnu, applied to trunk. Richard. 2016-11-02 Richard Biener * tree-vrp.c (evrp_dom_walker::before_dom_children): Call infer_value_range on stmt ops and update value-ranges. Dump visited stmts and blocks. (evrp_dom_walker::push_value_range): Dump changes. (evrp_dom_walker::pop_value_range): Likewise. (evrp_dom_walker::try_find_new_range): Avoid noop changes. * gcc.dg/tree-ssa/vrp111.c: New testcase. * gcc.dg/tree-ssa/pr20702.c: Disable EVRP. * gcc.dg/tree-ssa/pr21086.c: Likewise. * gcc.dg/tree-ssa/pr58480.c: Likewise. * gcc.dg/tree-ssa/vrp08.c: Likewise. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 241697) +++ gcc/tree-vrp.c (working copy) @@ -10650,18 +10650,17 @@ public: } virtual edge before_dom_children (basic_block); virtual void after_dom_children (basic_block); - void push_value_range (const_tree var, value_range *vr); - value_range *pop_value_range (const_tree var); + void push_value_range (tree var, value_range *vr); + value_range *pop_value_range (tree var); value_range *try_find_new_range (tree op, tree_code code, tree limit); /* Cond_stack holds the old VR. */ - auto_vec > stack; + auto_vec > stack; bitmap need_eh_cleanup; auto_vec stmts_to_fixup; auto_vec stmts_to_remove; }; - /* Find new range for OP such that (OP CODE LIMIT) is true. */ value_range * @@ -10679,6 +10678,10 @@ evrp_dom_walker::try_find_new_range (tre PUSH old value in the stack with the old VR. */ if (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE) { + if (old_vr->type == vr.type + && vrp_operand_equal_p (old_vr->min, vr.min) + && vrp_operand_equal_p (old_vr->max, vr.max)) + return NULL; value_range *new_vr = vrp_value_range_pool.allocate (); *new_vr = vr; return new_vr; @@ -10696,7 +10699,10 @@ evrp_dom_walker::before_dom_children (ba edge_iterator ei; edge e; - push_value_range (NULL_TREE, NULL); + if (dump_file && (dump_flags & TDF_DETAILS)) +fprintf (dump_file, "Visiting BB%d\n", bb->index); + + stack.safe_push (std::make_pair (NULL_TREE, (value_range *)NULL)); edge pred_e = NULL; FOR_EACH_EDGE (e, ei, bb->preds) @@ -10723,6 +10729,11 @@ evrp_dom_walker::before_dom_children (ba && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))) || POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt) { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Visiting controlling predicate "); + print_gimple_stmt (dump_file, stmt, 0, 0); + } /* Entering a new scope. Try to see if we can find a VR here. */ tree op1 = gimple_cond_rhs (stmt); @@ -10778,6 +10789,11 @@ evrp_dom_walker::before_dom_children (ba continue; value_range vr_result = VR_INITIALIZER; bool interesting = stmt_interesting_for_vrp (phi); + if (interesting && dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Visiting PHI node "); + print_gimple_stmt (dump_file, phi, 0, 0); + } if (!has_unvisited_preds && interesting) extract_range_from_phi_node (phi, &vr_result); @@ -10814,6 +10830,12 @@ evrp_dom_walker::before_dom_children (ba bool was_noreturn = (is_gimple_call (stmt) && gimple_call_noreturn_p (stmt)); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Visiting stmt "); + print_gimple_stmt (dump_file, stmt, 0, 0); + } + if (gcond *cond = dyn_cast (stmt)) { vrp_visit_cond_stmt (cond, &taken_edge); @@ -10825,6 +10847,7 @@ evrp_dom_walker::before_dom_children (ba gimple_cond_make_false (cond); else gcc_unreachable (); + update_stmt (stmt); } } else if (stmt_interesting_for_vrp (stmt)) @@ -10873,6 +10896,55 @@ evrp_dom_walker::before_dom_children (ba else set_defs_to_varying (stmt); + /* See if we can derive a range for any of STMT's operands. */ + tree op; + ssa_op_iter i; + FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_USE) + { + tree value; + enum tree_code comp_code; + + /* If OP is used in such a way that we can infer a value +range for it, and we don't find a previous assertion for +it, create a new assertion location node for OP. */ + if (infer_value_range (stmt, op, &comp_code, &value)) + { + /* If we are able to infer a nonzero value range f
Re: [PATCH, testsuite]: Cleanup lib/target-supports.exp, ...
On Wed, Nov 02, 2016 at 12:39:08PM +0530, Prathamesh Kulkarni wrote: > On 1 November 2016 at 23:41, Uros Bizjak wrote: > > On Tue, Nov 1, 2016 at 5:05 PM, Jakub Jelinek wrote: > >> On Tue, Nov 01, 2016 at 10:05:22AM +0100, Uros Bizjak wrote: > >>> ... simplify some conditions and add i?86-*-* target where missing. > >>> > >>> 2016-11-01 Uros Bizjak > >>> > >>> * lib/target-supports.exp: Normalize order of i?86 and x86_64 targets. > >>> Whitespace fixes. > >> ... > >>> (check_effective_target_divmod): Add i?86-*-* target. > >> > >> This part likely broke > >> +FAIL: gcc.dg/divmod-1.c scan-tree-dump-times widening_mul "DIVMOD" 7 > >> +FAIL: gcc.dg/divmod-2.c scan-tree-dump-times widening_mul "DIVMOD" 7 > >> +FAIL: gcc.dg/divmod-3.c scan-tree-dump-times widening_mul "DIVMOD" 7 > >> +FAIL: gcc.dg/divmod-4.c scan-tree-dump-times widening_mul "DIVMOD" 7 > >> +FAIL: gcc.dg/divmod-6.c scan-tree-dump-times widening_mul "DIVMOD" 7 > >> on i686-linux (i.e. 32-bit). > > > > No, this is expected (these tests already fail with x86_64 -m32 > > multilib). These will be fixed by [1]. > Oops, sorry for the breakage. > The tests are meant to check if the divmod transform triggered, which > is done by scanning > DIVMOD in the widening_mul dump. > > Apparently I only checked for the triplet "x86_64-*-*" in > check_effective_target_divmod() > and it returned 1, which probably caused the divmod DImode tests to > fail with -m32. > > In general, could I check in check_effective_target_*(), what options > are passed ? On some targets, yes. E.g. on i?86-*-*/x86_64-*-*, one can additionally test lp64 or ia32 or negation thereof - there are -m32, -mx32 and -m64 modes, -m64 satisfies lp64, -m32 ia32. Though the predicate seems to be misnamed and not properly documented if it is about properties of DImode divmod rather than other modes (SImode, HImode, QImode, TImode, ...). Jakub
Re: [PATCH] Fix host_size_t_cst_p predicate
On Mon, Oct 31, 2016 at 3:56 PM, Martin Liška wrote: > On 10/31/2016 12:11 PM, Richard Biener wrote: >> On Mon, Oct 31, 2016 at 11:18 AM, Richard Sandiford >> wrote: >>> Richard Biener writes: On Mon, Oct 31, 2016 at 10:58 AM, Richard Sandiford wrote: > Richard Biener writes: >> On Mon, Oct 31, 2016 at 10:10 AM, Martin Liška wrote: >>> On 10/31/2016 01:12 AM, Richard Sandiford wrote: Richard Biener writes: > On Thu, Oct 27, 2016 at 5:06 PM, Martin Liška wrote: >> On 10/27/2016 03:35 PM, Richard Biener wrote: >>> On Thu, Oct 27, 2016 at 9:41 AM, Martin Liška >>> wrote: Running simple test-case w/o the proper header file causes ICE: strncmp ("a", "b", -1); 0xe74462 tree_to_uhwi(tree_node const*) ../../gcc/tree.c:7324 0x90a23f host_size_t_cst_p ../../gcc/fold-const-call.c:63 0x90a23f fold_const_call(combined_fn, tree_node*, tree_node*, tree_node*, tree_node*) ../../gcc/fold-const-call.c:1512 0x787b01 fold_builtin_3 ../../gcc/builtins.c:8385 0x787b01 fold_builtin_n(unsigned int, tree_node*, tree_node**, int, bool) ../../gcc/builtins.c:8465 0x9052b1 fold(tree_node*) ../../gcc/fold-const.c:11919 0x6de2bb c_fully_fold_internal ../../gcc/c/c-fold.c:185 0x6e1f6b c_fully_fold(tree_node*, bool, bool*) ../../gcc/c/c-fold.c:90 0x67cbbf c_process_expr_stmt(unsigned int, tree_node*) ../../gcc/c/c-typeck.c:10369 0x67cfbd c_finish_expr_stmt(unsigned int, tree_node*) ../../gcc/c/c-typeck.c:10414 0x6cb578 c_parser_statement_after_labels ../../gcc/c/c-parser.c:5430 0x6cd333 c_parser_compound_statement_nostart ../../gcc/c/c-parser.c:4944 0x6cdbde c_parser_compound_statement ../../gcc/c/c-parser.c:4777 0x6c93ac c_parser_declaration_or_fndef ../../gcc/c/c-parser.c:2176 0x6d51ab c_parser_external_declaration ../../gcc/c/c-parser.c:1574 0x6d5c09 c_parser_translation_unit ../../gcc/c/c-parser.c:1454 0x6d5c09 c_parse_file() ../../gcc/c/c-parser.c:18173 0x72ffd2 c_common_parse_file() ../../gcc/c-family/c-opts.c:1087 Following patch improves the host_size_t_cst_p predicate. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? >>> >>> I believe the wi::min_precision (t, UNSIGNED) <= sizeof (size_t) * >>> CHAR_BIT test is now redundant. >>> >>> OTOH it was probably desired to allow -1 here? A little looking >>> back >>> in time should tell. >> >> Ok, it started with r229922, where it was changed from: >> >> if (tree_fits_uhwi_p (len) && p1 && p2) >> { >> const int i = strncmp (p1, p2, tree_to_uhwi (len)); >> ... >> >> to current version: >> >> case CFN_BUILT_IN_STRNCMP: >> { >> bool const_size_p = host_size_t_cst_p (arg2, &s2); >> >> Thus I'm suggesting to change to back to it. >> >> Ready to be installed? > > Let's ask Richard. The idea with the: wi::min_precision (t, UNSIGNED) <= sizeof (size_t) * CHAR_BIT test was to stop us attempting 64-bit size_t operations on ILP32 hosts. I think we still want that. >>> >>> OK, so is the consensus to add tree_fits_uhwi_p predicate to the current >>> wi::min_precision check, right? >> >> Not sure. If we have host_size_t_cst_p then we should have a >> corresponding >> size_t host_size_t (const_tree) and should use those in pairs. Not sure >> why we have sth satisfying host_size_t_cst_p but not tree_fits_uhwi_p. > > It's the other way around: something can satisfy tree_fits_uhwi_p > (i.e. fit within a uint64_t) but not fit within the host's size_t. > The kind of case I'm thinking of is: > > strncmp ("fi", "fo", (1L << 32) + 1) > > for an ILP32 host and LP64 target. There's a danger that by passing > the uint64_t value (1L << 32) + 1 to the host's strncmp that we'd > truncate it to 1, giving the wrong result. Yes, but if it passes host_size_t_cst_p why does tree_to_uhwi ICE then? (unless we have a > 64bit host size_t). >>> >>> Because in Mar
Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote: > On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote: >> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote: >> > This patch solves this problem by simply running the >> > duplicate_computed_gotos >> > pass again, as long as it does any work. The patch looks much bigger than >> > it is, because I factored out two routines to simplify the control flow. >> >> It's made the patch a bit difficult to read. Condensing it a bit... > > Well, it would have a goto crossing a loop, or two gotos crossing each > other, otherwise. I should have done it as two patches I guess (first > factor, then change). > >> > + for (;;) >> > { >> > + if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1) >> > + return 0; >> >> This test should not be needed in the loop. This pass can never >> collapse the function to a single basic block. > > Yeah maybe, but that relies on quite a few assumptions. This is strictly > an optimisation anyway, will move it outside the loop. > >> > + basic_block bb; >> > + FOR_EACH_BB_FN (bb, fun) >> > + { >> > + /* Build the reorder chain for the original order of blocks. */ >> > + if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun)) >> > + bb->aux = bb->next_bb; >> > + } >> > >> > + duplicate_computed_gotos_find_candidates (fun, candidates, >> > max_size); >> > >> > + bool changed = false; >> > + if (!bitmap_empty_p (candidates)) >> > + changed = duplicate_computed_gotos_do_duplicate (fun, candidates); >> > >> > + if (changed) >> > + fixup_partitions (); >> > + >> > + cfg_layout_finalize (); >> >> I don't think you have to go into/out-of cfglayout mode for each iteration. > > Yeah probably. I was too lazy :-) It needs the cleanup_cfg every > iteration though, I was afraid that interacts. Ick. Why would it need a cfg-cleanup every iteration? I fear this is quadratic complexity in the number of edges to the compgoto block (and the size of the function). Can't the unfactoring perform the "cleanup" we rely on here? >> >/* Merge the duplicated blocks into predecessors, when possible. */ >> > + if (changed) >> > + cleanup_cfg (0); >> > + else >> > + break; >> > } >> >> Maybe a gcc_assert that the loop doesn't iterate more often than num_edges? > > Good plan (num blocks even). > > Thanks, > > > Segher
Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)
On Mon, 31 Oct 2016, Jakub Jelinek wrote: > Hi! > > Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs > expand to multiple rtls, then there is not a single one that can be used. > Using DECL_RTL on such VAR_DECLs ICEs. > > I've tried to just return 0 in nonoverlapping_memrefs_p if either > DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often > during bootstrap/regtest (3800+ times). So the following patch narrows it > down more and triggers only on the testcase below. What kind of cases did this trigger on? I would have expected we have DECL_RTL_SET_P on (almost?) all decls that can have it. And for those that don't it should be uninteresting to have? > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Hmm. Can you try splitting out a decl_can_have_rtl () predicate from make_decl_rtl and use that? Thanks, Richard. > 2016-10-31 Jakub Jelinek > > PR target/77834 > * alias.c (nonoverlapping_memrefs_p): Return 0 if exprx or expry > doesn't have rtl set and it can't be safely created. > > * gcc.dg/pr77834.c: New test. > > --- gcc/alias.c.jj2016-10-21 17:06:27.0 +0200 > +++ gcc/alias.c 2016-10-31 11:38:29.448031590 +0100 > @@ -2755,6 +2755,27 @@ nonoverlapping_memrefs_p (const_rtx x, c >|| TREE_CODE (expry) == CONST_DECL) > return 1; > > + /* Don't try to create RTL for decls that intentionally don't have > + DECL_RTL set (e.g. marked as living in multiple places). */ > + if (!DECL_RTL_SET_P (exprx)) > +{ > + if (TREE_CODE (exprx) == PARM_DECL > + || TREE_CODE (exprx) == RESULT_DECL > + || (VAR_P (exprx) > + && !TREE_STATIC (exprx) > + && !DECL_EXTERNAL (exprx))) > + return 0; > +} > + if (!DECL_RTL_SET_P (expry)) > +{ > + if (TREE_CODE (expry) == PARM_DECL > + || TREE_CODE (expry) == RESULT_DECL > + || (VAR_P (expry) > + && !TREE_STATIC (expry) > + && !DECL_EXTERNAL (expry))) > + return 0; > +} > + >rtlx = DECL_RTL (exprx); >rtly = DECL_RTL (expry); > > --- gcc/testsuite/gcc.dg/pr77834.c.jj 2016-10-31 11:41:46.290521464 +0100 > +++ gcc/testsuite/gcc.dg/pr77834.c2016-10-31 11:41:24.0 +0100 > @@ -0,0 +1,18 @@ > +/* PR target/77834 */ > +/* { dg-do compile } */ > +/* { dg-options "-O -ftree-pre -Wno-psabi" } */ > +/* { dg-additional-options "-mstringop-strategy=libcall" { target i?86-*-* > x86_64-*-* } } */ > + > +typedef int V __attribute__ ((vector_size (64))); > + > +V > +foo (V u, V v, int w) > +{ > + do > +{ > + if (u[0]) v ^= u[w]; > +} > + while ((V) { 0, u[w] }[1]); > + u = (V) { v[v[0]], u[u[0]] }; > + return v + u; > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Five patches for std::experimental::filesystem
On 27 October 2016 at 15:34, Jonathan Wakely wrote: > On 26/10/16 09:24 +0200, Christophe Lyon wrote: >> >> Hi Jonathan, >> >> On 25 October 2016 at 17:32, Jonathan Wakely wrote: >>> >>> Two more fixes for the filesystem TS, and improved tests. >>> >>> Handle negative times in filesystem::last_write_time >>> * src/filesystem/ops.cc >>>(last_write_time(const path&, file_time_type, error_code&)): >>> Handle >>>negative times correctly. >>>* testsuite/experimental/filesystem/operations/last_write_time.cc: >>>Test writing file times. >>> >>>Fix error handling in copy_file and equivalent >>> * src/filesystem/ops.cc (do_copy_file): Report an error if >>> source >>> or >>>destination is not a regular file (LWG 2712). >>>(equivalent): Fix error handling and result when only one file >>> exists. >>>* testsuite/experimental/filesystem/operations/copy.cc: Remove >>> files >>>created by tests. Test copying directories. >>>* testsuite/experimental/filesystem/operations/copy_file.cc: >>> Remove >>>files created by tests. >>>* testsuite/experimental/filesystem/operations/equivalent.cc: New. >>>* testsuite/experimental/filesystem/operations/is_empty.cc: New. >>>* testsuite/experimental/filesystem/operations/read_symlink.cc: >>> Remove >>>file created by test. >>>* testsuite/experimental/filesystem/operations/remove_all.cc: New. >>>* testsuite/util/testsuite_fs.h (~scoped_file): Only try to remove >>>file if path is non-empty, to support removal by other means. >>> >>> Tested x86_64-linux, committed to trunk. >>> >>> >> I can see failures in >> experimental/filesystem/operations/last_write_time.cc after your >> committed this patch: >> >> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/testsuite/experimental/filesystem/operations/last_write_time.cc:127: >> void test02(): Assertion 'last_write_time(f.path) == time' failed. >> on arm*linux* and aarch64*linux* targets. > > > That test will fail for targets where _GLIBCXX_USE_UTIMENSAT is not > defined, as they use utime() instead which only supports second > granularity. > > This should solve it, by only checking that the file times are within > one second of the expected value. > Hi Jonathan, Indeed your patch fixes the problem I reported. Sorry for the delay, I was on holidays. Thanks, Christophe > > Tested x86_64-linux, committed to trunk.
Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add
On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek wrote: > On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote: >> The PowerPC back end loses performance on vector intrinsics, because >> currently >> all of them are treated as calls throughout the middle-end phases and only >> expanded when they reach RTL. Our version of altivec.h currently defines the >> public names of overloaded functions (like vec_add) to be #defines for hidden >> functions (like __builtin_vec_add), which are recognized in the parser as >> requiring special back-end support. Tables in rs6000-c.c handle dispatch of >> the overloaded functions to specific function calls appropriate to the >> argument >> types. > > This doesn't look very nice. If all you care is that the builtins like > __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold > into generic vector operations under certain conditions, just fold those > into whatever you want in targetm.gimple_fold_builtin (gsi). Note that traditionally "overloading" with GCC "builtins" is done by using varargs and the "type generic" attribute. That doesn't scale to return type overloading though for which we usually added direct support to the parser (for example for __builtin_shuffle). The folding trick of course should work just fine. Richard. > Jakub
Re: [PATCH] PR tree-optimization/78170: Truncate sign-extended padding when encoding bitfields
On Tue, Nov 1, 2016 at 12:54 PM, Kyrill Tkachov wrote: > Hi all, > > In this PR the code writes a -1 to a bitfield of size 17 bits and ends up > overwriting another bitfields. > The problem is that the intermediate buffer in encode_tree_to_bitpos holding > the value to merge holds > a 24-bit temporary with -1 written to it i.e. sign-extended to all ones. > That is how native_encode_expr works.This gets then written to > the final buffer (well, a shifted version of it). > > We should instead be truncating the intermediate value to contain zeros in > all the bits that we don't want. > This is already performed for big-endian, this patch just wires it up for > little-endian. > > Bootstrapped and tested on x86_64. > Ok for trunk? Ok. Richard. > Thanks, > Kyrill > > 2016-11-01 Kyrylo Tkachov > > PR tree-optimization/78170 > * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Truncate padding > introduced by native_encode_expr on little-endian as well. > > 2016-11-01 Kyrylo Tkachov > > PR tree-optimization/78170 > * gcc.c-torture/execute/pr78170.c: New test.
Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)
On Wed, Nov 02, 2016 at 10:08:25AM +0100, Richard Biener wrote: > On Mon, 31 Oct 2016, Jakub Jelinek wrote: > > > Hi! > > > > Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs > > expand to multiple rtls, then there is not a single one that can be used. > > Using DECL_RTL on such VAR_DECLs ICEs. > > > > I've tried to just return 0 in nonoverlapping_memrefs_p if either > > DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often > > during bootstrap/regtest (3800+ times). So the following patch narrows it > > down more and triggers only on the testcase below. > > What kind of cases did this trigger on? I would have expected we > have DECL_RTL_SET_P on (almost?) all decls that can have it. And for > those that don't it should be uninteresting to have? I admit I havne't studied it in detail yet. Attaching list of BITS_PER_WORD, main_input_filename, current_function_name () where nonoverlapping_memrefs_p returned 0 early because either exprx or expry didn't have DECL_RTL_SET_P. As except for the new testcase that didn't result into ICE, all of those must have been something where make_decl_rtl created RTL in that case. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > Hmm. Can you try splitting out a decl_can_have_rtl () predicate > from make_decl_rtl and use that? Will do. Jakub alias.bz2 Description: BZip2 compressed data
Re: [PATCH] PR tree-optimization/78162: Reject negative offsets in store merging early
On Tue, Nov 1, 2016 at 12:54 PM, Kyrill Tkachov wrote: > Hi all, > > Store merging ICEs on this invalid testcase because it trips up on the > negative bitposition to store to. > It doesn't really expect to handle negative offsets and I believe they won't > occur very often in valid code anyway. > Filling out structs/bitfields/class members involves positive offsets. > I can look into removing all the assumptions about positive offsets if folks > want me to (should involve removing > some 'unsigned' modifiers from HOST_WIDE_INTs and double-checking some range > checks), but for now > this patch just fixes the ICE by rejecting negative offsets early on. > > Bootstrapped and tested on aarch64-none-linux-gnu and x86_64. > > Ok for trunk? Ok (an improvement would be to only reject it after eventually processing a MEM_REF base_addr). Richard. > Thanks, > Kyrill > > 2016-11-01 Kyrylo Tkachov > > PR tree-optimization/78162 > * gimple-ssa-store-merging.c (execute): Mark stores with bitpos < 0 > as invalid. > > 2016-11-01 Kyrylo Tkachov > > PR tree-optimization/78162 > * gcc.c-torture/compile/pr78162.c: New test.
Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add
On Wed, Nov 02, 2016 at 10:19:26AM +0100, Richard Biener wrote: > On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek wrote: > > On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote: > >> The PowerPC back end loses performance on vector intrinsics, because > >> currently > >> all of them are treated as calls throughout the middle-end phases and only > >> expanded when they reach RTL. Our version of altivec.h currently defines > >> the > >> public names of overloaded functions (like vec_add) to be #defines for > >> hidden > >> functions (like __builtin_vec_add), which are recognized in the parser as > >> requiring special back-end support. Tables in rs6000-c.c handle dispatch > >> of > >> the overloaded functions to specific function calls appropriate to the > >> argument > >> types. > > > > This doesn't look very nice. If all you care is that the builtins like > > __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold > > into generic vector operations under certain conditions, just fold those > > into whatever you want in targetm.gimple_fold_builtin (gsi). > > Note that traditionally "overloading" with GCC "builtins" is done by > using varargs > and the "type generic" attribute. That doesn't scale to return type > overloading > though for which we usually added direct support to the parser (for example > for __builtin_shuffle). My understanding is that rs6000 already does that, it hooks into resolve_overloaded_builtin which already handles the fully type generic builtins where not just the arguments, but also the return type can be picked up. But it resolves the overloaded builtins into calls to other builtins that are not type-generic. So, either that function instead of returning the specific md builtin calls in some cases already returns trees with the generic behavior of the builtin, or it returns what it does now and then in the gimple fold builtin target hook (note, the normal fold builtin target hook is not right for that, because it is mostly used for folding builtins into constant - callers will usually throw away other results) fold those specific md builtins into whatever GIMPLE you want. If we want to decrease amount of folding in the FEs, the gimple fold builtin hook is probably better. Jakub
Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)
On Wed, 2 Nov 2016, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 10:08:25AM +0100, Richard Biener wrote: > > On Mon, 31 Oct 2016, Jakub Jelinek wrote: > > > > > Hi! > > > > > > Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs > > > expand to multiple rtls, then there is not a single one that can be used. > > > Using DECL_RTL on such VAR_DECLs ICEs. > > > > > > I've tried to just return 0 in nonoverlapping_memrefs_p if either > > > DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often > > > during bootstrap/regtest (3800+ times). So the following patch narrows it > > > down more and triggers only on the testcase below. > > > > What kind of cases did this trigger on? I would have expected we > > have DECL_RTL_SET_P on (almost?) all decls that can have it. And for > > those that don't it should be uninteresting to have? > > I admit I havne't studied it in detail yet. Attaching list of > BITS_PER_WORD, main_input_filename, current_function_name () > where nonoverlapping_memrefs_p returned 0 early because > either exprx or expry didn't have DECL_RTL_SET_P. As except for the new > testcase that didn't result into ICE, all of those must have been something > where make_decl_rtl created RTL in that case. Ok, just looking at regex.c (I happen to have regex.i around) shows the first hit as #2 0x008d9914 in true_dependence_1 (mem=0x761ef360, mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, x_addr=0x7662ffd8, mem_canonicalized=true) at /space/rguenther/src/svn/trunk/gcc/alias.c:2928 2928 if (nonoverlapping_memrefs_p (mem, x, false)) (gdb) p debug_tree (expry) ) [0 __builtin_malloc S1 A8]) $5 = void called from DSE check_mem_read_rtx. I believe that's a bogus query. (I see only a single hit on regex.c, your log seems to have multiple ones...) I wouldn't worry about the lost cases, and if I'd be ambitious I'd try to investigate the above and see what remains... > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > Hmm. Can you try splitting out a decl_can_have_rtl () predicate > > from make_decl_rtl and use that? > > Will do. Thanks, Richard.
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 11/01/2016 03:53 PM, Jakub Jelinek wrote: > What kind of false positives it is for each case? Is it with normal > asan-bootstrap (without your -fsanitize-use-after-scope changes), or > only with those changes, or only with those changes and > -fsanitize-use-after-scope used during bootstrap? Ok, the situation is simpler than I thought: #include int main(int argc, char **argv) { int *ptr; switch (argc) { int a; case 1: break; default: ptr = &a; break; } fprintf (stderr, "v: %d\n", *ptr); return 0; } Which is gimplified as: int * ptr; switch (argc) , case 1: > { int a; try { ASAN_MARK (2, &a, 4); : goto ; : ptr = &a; goto ; } finally { ASAN_MARK (1, &a, 4); } } : _1 = *ptr; stderr.0_2 = stderr; fprintf (stderr.0_2, "v: %d\n", _1); D.2577 = 0; return D.2577; } D.2577 = 0; return D.2577; and thus we get: /tmp/switch-case.c:9:11: warning: statement will never be executed [-Wswitch-unreachable] int a; I'm wondering where properly fix that, we can either find all these ASAN_MARKs in gimplify_switch_expr and distribute it to all labels (which are gimplified). Or we can put such variables to asan_poisoned_variables if we have information that we're gimplifing statements before a first label. Do we know that from gimple context? If so, these variables will be unpoisoned at the very beginning of each label and the ASAN_MARK call in between switch statement and a first label can be removed. Thoughts? Thanks, Martin
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek wrote: > On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote: >> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs) >> >> static void >> maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs, >> - bitmap suitable_for_renaming) >> + bitmap suitable_for_renaming, bitmap marked_nonaddressable) >> { >>/* Global Variables, result decls cannot be changed. */ >>if (is_global_var (var) >> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, >> bitmap not_reg_needs, >> || !bitmap_bit_p (not_reg_needs, DECL_UID (var >> { >>TREE_ADDRESSABLE (var) = 0; >> + bitmap_set_bit (marked_nonaddressable, DECL_UID (var)); > > Why do you need the marked_nonaddressable bitmap? > >>if (is_gimple_reg (var)) >> bitmap_set_bit (suitable_for_renaming, DECL_UID (var)); >>if (dump_file) >> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap >> addresses_taken, bitmap not_reg_needs, >> } >> } >> >> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables. */ >> +/* Return true when STMT is ASAN mark where second argument is an address >> + of a local variable. */ >> >> -void >> -execute_update_addresses_taken (void) >> +static bool >> +is_asan_mark_p (gimple *stmt) >> +{ >> + if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK)) >> +return false; >> + >> + tree addr = get_base_address (gimple_call_arg (stmt, 1)); >> + if (TREE_CODE (addr) == ADDR_EXPR >> + && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL) > > Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw) > would turn it into is_gimple_reg), and don't return true if not. > >> +return true; >> + >> + return false; >> +} >> + >> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables. >> + If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. >> */ >> + >> + >> +static void >> +execute_update_addresses_taken (bool sanitize_asan_mark = false) > > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property > set during the asan pass and kept on until end of compilation of that > function. That means even if a var only addressable because of ASAN_MARK is > discovered after this pass we'd still be able to rewrite it into SSA. Note that being TREE_ADDRESSABLE also has effects on alias analysis (didn't follow the patches to see whether you handle ASAN_MARK specially in points-to analysis and/or alias analysis). Generally in update-address-taken you can handle ASAN_MARK similar to MEM_REF (and drop it in the rewrite phase?). As said, I didnt look at the patches and just came by here seeing tree-ssa.c pieces... Richard. > Jakub
Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)
On Wed, Nov 02, 2016 at 10:34:15AM +0100, Richard Biener wrote: > Ok, just looking at regex.c (I happen to have regex.i around) shows > the first hit as > > #2 0x008d9914 in true_dependence_1 (mem=0x761ef360, > mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, > x_addr=0x7662ffd8, mem_canonicalized=true) > at /space/rguenther/src/svn/trunk/gcc/alias.c:2928 > 2928 if (nonoverlapping_memrefs_p (mem, x, false)) > (gdb) p debug_tree (expry) > type > 2928 if (nonoverlapping_memrefs_p (mem, x, false)) > (gdb) p debug_rtx (mem) > (mem/f/c:DI (plus:DI (reg/f:DI 7 sp) > (const_int 200 [0xc8])) [5 p+0 S8 A64]) > $4 = void > (gdb) p debug_rtx (x) > (mem:QI (symbol_ref:DI ("malloc") [flags 0x41] 0x76943000 malloc>) [0 __builtin_malloc S1 A8]) > $5 = void > > called from DSE check_mem_read_rtx. I believe that's a bogus query. I guess I should also log debug_rtx (DECL_RTL (expr{x,y})) in those cases then and perhaps also if we ever return non-zero in those cases. > (I see only a single hit on regex.c, your log seems to have multiple > ones...) That was statistics gathered across x86_64-linux and i686-linux simultaneous bootstraps + regtests (and most likely I've mistyped the first value from (int) BITS_PER_WORD to (int) BITS_PER_UNIT :( ). So regex.c is built there many times (3 times each bootstrap at least). Jakub
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 11/02/2016 10:40 AM, Richard Biener wrote: > On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek wrote: >> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote: >>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs) >>> >>> static void >>> maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs, >>> - bitmap suitable_for_renaming) >>> + bitmap suitable_for_renaming, bitmap >>> marked_nonaddressable) >>> { >>>/* Global Variables, result decls cannot be changed. */ >>>if (is_global_var (var) >>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, >>> bitmap not_reg_needs, >>> || !bitmap_bit_p (not_reg_needs, DECL_UID (var >>> { >>>TREE_ADDRESSABLE (var) = 0; >>> + bitmap_set_bit (marked_nonaddressable, DECL_UID (var)); >> >> Why do you need the marked_nonaddressable bitmap? >> >>>if (is_gimple_reg (var)) >>> bitmap_set_bit (suitable_for_renaming, DECL_UID (var)); >>>if (dump_file) >>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap >>> addresses_taken, bitmap not_reg_needs, >>> } >>> } >>> >>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables. */ >>> +/* Return true when STMT is ASAN mark where second argument is an address >>> + of a local variable. */ >>> >>> -void >>> -execute_update_addresses_taken (void) >>> +static bool >>> +is_asan_mark_p (gimple *stmt) >>> +{ >>> + if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK)) >>> +return false; >>> + >>> + tree addr = get_base_address (gimple_call_arg (stmt, 1)); >>> + if (TREE_CODE (addr) == ADDR_EXPR >>> + && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL) >> >> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw) >> would turn it into is_gimple_reg), and don't return true if not. >> >>> +return true; >>> + >>> + return false; >>> +} >>> + >>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables. >>> + If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK >>> built-ins. */ >>> + >>> + >>> +static void >>> +execute_update_addresses_taken (bool sanitize_asan_mark = false) >> >> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property >> set during the asan pass and kept on until end of compilation of that >> function. That means even if a var only addressable because of ASAN_MARK is >> discovered after this pass we'd still be able to rewrite it into SSA. > > Note that being TREE_ADDRESSABLE also has effects on alias analysis > (didn't follow the patches to see whether you handle ASAN_MARK specially > in points-to analysis and/or alias analysis). Currently all manipulation with shadow memory is done via a pointer type which has created a separate aliasing set: static void asan_init_shadow_ptr_types (void) { asan_shadow_set = new_alias_set (); tree types[3] = { signed_char_type_node, short_integer_type_node, integer_type_node }; for (unsigned i = 0; i < 3; i++) { shadow_ptr_types[i] = build_distinct_type_copy (types[i]); TYPE_ALIAS_SET (shadow_ptr_types[i]) = asan_shadow_set; shadow_ptr_types[i] = build_pointer_type (shadow_ptr_types[i]); } ... Martin > > Generally in update-address-taken you can handle ASAN_MARK similar to > MEM_REF (and drop it in the rewrite phase?). > > As said, I didnt look at the patches and just came by here seeing > tree-ssa.c pieces... > > Richard. > >> Jakub
Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)
On Wed, 2 Nov 2016, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 10:34:15AM +0100, Richard Biener wrote: > > Ok, just looking at regex.c (I happen to have regex.i around) shows > > the first hit as > > > > #2 0x008d9914 in true_dependence_1 (mem=0x761ef360, > > mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, > > x_addr=0x7662ffd8, mem_canonicalized=true) > > at /space/rguenther/src/svn/trunk/gcc/alias.c:2928 > > 2928 if (nonoverlapping_memrefs_p (mem, x, false)) > > (gdb) p debug_tree (expry) > > > type > > > 2928 if (nonoverlapping_memrefs_p (mem, x, false)) > > (gdb) p debug_rtx (mem) > > (mem/f/c:DI (plus:DI (reg/f:DI 7 sp) > > (const_int 200 [0xc8])) [5 p+0 S8 A64]) > > $4 = void > > (gdb) p debug_rtx (x) > > (mem:QI (symbol_ref:DI ("malloc") [flags 0x41] > 0x76943000 malloc>) [0 __builtin_malloc S1 A8]) > > $5 = void > > > > called from DSE check_mem_read_rtx. I believe that's a bogus query. > > I guess I should also log debug_rtx (DECL_RTL (expr{x,y})) in those cases > then and perhaps also if we ever return non-zero in those cases. Yeah, plus if a followup test would have disambiguated things (the dispatch to the tree oracle for example). > > (I see only a single hit on regex.c, your log seems to have multiple > > ones...) > > That was statistics gathered across x86_64-linux and i686-linux simultaneous > bootstraps + regtests (and most likely I've mistyped the first value > from (int) BITS_PER_WORD to (int) BITS_PER_UNIT :( ). So regex.c is built > there many times (3 times each bootstrap at least). Ah, ok. At least 12 times for x86_64 with multilibs (we build PIC and non-PIC variants IIRC). Richard.
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 11/01/2016 04:12 PM, Jakub Jelinek wrote: > On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote: >> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs) >> >> static void >> maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs, >> -bitmap suitable_for_renaming) >> +bitmap suitable_for_renaming, bitmap marked_nonaddressable) >> { >>/* Global Variables, result decls cannot be changed. */ >>if (is_global_var (var) >> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, >> bitmap not_reg_needs, >>|| !bitmap_bit_p (not_reg_needs, DECL_UID (var >> { >>TREE_ADDRESSABLE (var) = 0; >> + bitmap_set_bit (marked_nonaddressable, DECL_UID (var)); > > Why do you need the marked_nonaddressable bitmap? Because the later loop (which visits every gimple statement) iterates only if there's an entry in suitable_for_renaming. > >>if (is_gimple_reg (var)) >> bitmap_set_bit (suitable_for_renaming, DECL_UID (var)); >>if (dump_file) >> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap >> addresses_taken, bitmap not_reg_needs, >> } >> } >> >> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables. */ >> +/* Return true when STMT is ASAN mark where second argument is an address >> + of a local variable. */ >> >> -void >> -execute_update_addresses_taken (void) >> +static bool >> +is_asan_mark_p (gimple *stmt) >> +{ >> + if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK)) >> +return false; >> + >> + tree addr = get_base_address (gimple_call_arg (stmt, 1)); >> + if (TREE_CODE (addr) == ADDR_EXPR >> + && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL) > > Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw) > would turn it into is_gimple_reg), and don't return true if not. Well, the predicate is called once before maybe_optimize_var, thus I need to have it conservative and not consider TREE_ADDRESSABLE flag. Having argument would work for that? > >> +return true; >> + >> + return false; >> +} >> + >> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables. >> + If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. >> */ >> + >> + >> +static void >> +execute_update_addresses_taken (bool sanitize_asan_mark = false) > > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property > set during the asan pass and kept on until end of compilation of that > function. That means even if a var only addressable because of ASAN_MARK is > discovered after this pass we'd still be able to rewrite it into SSA. It's doable (please see attached patch) and also nicer. However, one would need to extend curr_properties to long type as we already use 16 existing values. Martin > > Jakub > >From ad5f68a010674118fac7ca8b6953f7b99fd3c2a8 Mon Sep 17 00:00:00 2001 From: marxin Date: Tue, 1 Nov 2016 11:21:20 +0100 Subject: [PATCH] Use-after-scope: do not mark variables that are no longer addressable gcc/ChangeLog: 2016-11-02 Martin Liska * asan.c: Update properties_provided and todo_flags_finish. * function.h (struct GTY): Change int to long as there's not enough space for a new value. * tree-pass.h: Define PROP_asan_check_done. * tree-ssa.c (maybe_optimize_var): Add new argument. (is_asan_mark_p): New function. (execute_update_addresses_taken): Handle ASAN_MARK internal fns. --- gcc/asan.c | 5 +++-- gcc/function.h | 2 +- gcc/tree-pass.h | 1 + gcc/tree-ssa.c | 69 + 4 files changed, 60 insertions(+), 17 deletions(-) diff --git a/gcc/asan.c b/gcc/asan.c index 95495d2..94ee877 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3. If not see #include "params.h" #include "builtins.h" #include "fnmatch.h" +#include "tree-ssa.h" /* AddressSanitizer finds out-of-bounds and use-after-free bugs with <2x slowdown on average. @@ -2993,10 +2994,10 @@ const pass_data pass_data_asan = OPTGROUP_NONE, /* optinfo_flags */ TV_NONE, /* tv_id */ ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */ - 0, /* properties_provided */ + PROP_asan_check_done, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ - TODO_update_ssa, /* todo_flags_finish */ + TODO_update_ssa | TODO_update_address_taken, /* todo_flags_finish */ }; class pass_asan : public gimple_opt_pass diff --git a/gcc/function.h b/gcc/function.h index e854c7f..5600488 100644 --- a/gcc/function.h +++ b/gcc/function.h @@ -289,7 +289,7 @@ struct GTY(()) function { location_t function_end_locus; /* Properties used by the pass manager. */ - unsigned int curr_properties; + unsigned long curr_properties; unsigned int last_verified; /* Non-null if the function does something that would prevent it from d
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 10:40:35AM +0100, Richard Biener wrote: > > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property > > set during the asan pass and kept on until end of compilation of that > > function. That means even if a var only addressable because of ASAN_MARK is > > discovered after this pass we'd still be able to rewrite it into SSA. > > Note that being TREE_ADDRESSABLE also has effects on alias analysis > (didn't follow the patches to see whether you handle ASAN_MARK specially > in points-to analysis and/or alias analysis). > > Generally in update-address-taken you can handle ASAN_MARK similar to > MEM_REF (and drop it in the rewrite phase?). That is the intent, but we can't do that before the asan pass, because otherwise as Martin explained we don't diagnose at runtime bugs where a variable is used outside of its scope only through a MEM_REF. So we need to wait for asan pass to actually add a real builtin call that takes the address in that case. Except now I really don't see how that can work for the case where the var is used only properly when it is in the scope, because the asan pass will still see those being addressable. Unless I'm missing something we either need to perform further analysis during the addressable subpass - this variable could be made non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA form there be any SSA uses of the poisoning ASAN_MARK? If yes, keep it addressable, otherwise rewrite into SSA. Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any uses of those, rewrite it back into addressable immediately or later or something. Jakub
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote: > On 11/01/2016 03:53 PM, Jakub Jelinek wrote: > > What kind of false positives it is for each case? Is it with normal > > asan-bootstrap (without your -fsanitize-use-after-scope changes), or > > only with those changes, or only with those changes and > > -fsanitize-use-after-scope used during bootstrap? > > Ok, the situation is simpler than I thought: CCing also Marek. > > #include > > int main(int argc, char **argv) > { > int *ptr; > > switch (argc) > { > int a; > > case 1: > break; > > default: > ptr = &a; > break; > } > > fprintf (stderr, "v: %d\n", *ptr); > return 0; > } > > Which is gimplified as: > > int * ptr; > > switch (argc) , case 1: > > { > int a; > > try > { > ASAN_MARK (2, &a, 4); > : > goto ; > : > ptr = &a; > goto ; > } > finally > { > ASAN_MARK (1, &a, 4); > } > } > : > _1 = *ptr; > stderr.0_2 = stderr; > fprintf (stderr.0_2, "v: %d\n", _1); > D.2577 = 0; > return D.2577; > } > D.2577 = 0; > return D.2577; > > and thus we get: > /tmp/switch-case.c:9:11: warning: statement will never be executed > [-Wswitch-unreachable] >int a; > > I'm wondering where properly fix that, we can either find all these > ASAN_MARKs in gimplify_switch_expr > and distribute it to all labels (which are gimplified). Or we can put such > variables to asan_poisoned_variables > if we have information that we're gimplifing statements before a first label. > Do we know that from gimple context? > If so, these variables will be unpoisoned at the very beginning of each label > and the ASAN_MARK call in between > switch statement and a first label can be removed. Wouldn't it be easiest if -Wswitch-unreachable warning just ignored the ASAN_MARK internal calls altogether? Do you emit there any other statements, or just ASAN_MARK and nothing else? Shouldn't there be also ASAN_MARK on the implicit gotos from the switch statement? Otherwise, consider this being done in a loop, after the first iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with args 1 and have in case 1: a = 0;, won't that trigger runtime error? Jakub
Re: [PATCH, ARM/testsuite 6/7, ping] Force soft float in ARMv6-M and ARMv8-M Baseline options
Ping? Best regards, Thomas On 28/10/16 10:49, Thomas Preudhomme wrote: On 22/09/16 16:47, Richard Earnshaw (lists) wrote: On 22/09/16 15:51, Thomas Preudhomme wrote: Sorry, noticed an error in the patch. It was not caught during testing because GCC was built with --with-mode=thumb. Correct patch attached. Best regards, Thomas On 22/09/16 14:49, Thomas Preudhomme wrote: Hi, ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like -mthumb is passed for architectures that only support Thumb instruction set. This patch adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add options. Patch is in attachment. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2016-07-15 Thomas Preud'homme * lib/target-supports.exp (add_options_for_arm_arch_v6m): Add -mfloat-abi=soft option. (add_options_for_arm_arch_v8m_base): Likewise. Is this ok for trunk? Best regards, Thomas 6_softfloat_testing_v6m_v8m_baseline.patch diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } { # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */ #/* { dg-add-options arm_arch_v5 } */ # /* { dg-require-effective-target arm_arch_v5_multilib } */ -foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__ - v4t "-march=armv4t" __ARM_ARCH_4T__ - v5 "-march=armv5 -marm" __ARM_ARCH_5__ - v5t "-march=armv5t" __ARM_ARCH_5T__ - v5te "-march=armv5te" __ARM_ARCH_5TE__ - v6 "-march=armv6" __ARM_ARCH_6__ - v6k "-march=armv6k" __ARM_ARCH_6K__ - v6t2 "-march=armv6t2" __ARM_ARCH_6T2__ - v6z "-march=armv6z" __ARM_ARCH_6Z__ - v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__ - v7a "-march=armv7-a" __ARM_ARCH_7A__ - v7r "-march=armv7-r" __ARM_ARCH_7R__ - v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__ - v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__ - v8a "-march=armv8-a" __ARM_ARCH_8A__ - v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ - v8m_base "-march=armv8-m.base -mthumb" __ARM_ARCH_8M_BASE__ - v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } { +foreach { armfunc armflag armdef } { +v4 "-march=armv4 -marm" __ARM_ARCH_4__ +v4t "-march=armv4t" __ARM_ARCH_4T__ +v5 "-march=armv5 -marm" __ARM_ARCH_5__ +v5t "-march=armv5t" __ARM_ARCH_5T__ +v5te "-march=armv5te" __ARM_ARCH_5TE__ +v6 "-march=armv6" __ARM_ARCH_6__ +v6k "-march=armv6k" __ARM_ARCH_6K__ +v6t2 "-march=armv6t2" __ARM_ARCH_6T2__ +v6z "-march=armv6z" __ARM_ARCH_6Z__ +v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__ +v7a "-march=armv7-a" __ARM_ARCH_7A__ +v7r "-march=armv7-r" __ARM_ARCH_7R__ +v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__ +v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__ +v8a "-march=armv8-a" __ARM_ARCH_8A__ +v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ +v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft" __ARM_ARCH_8M_BASE__ +v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } { eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] { proc check_effective_target_arm_arch_FUNC_ok { } { if { [ string match "*-marm*" "FLAG" ] && I think if you're going to do this you need to also check that changing the ABI in this way isn't incompatible with other aspects of how the user has invoked dejagnu. The reason this patch was made is that without it dg-require-effective-target arm_arch_v8m_base_ok evaluates to true for an arm-none-linux-gnueabihf toolchain but then any testcase containing a function for such a target (such as the atomic-op-* in gcc.target/arm) will error out because ARMv8-M Baseline does not support hard float ABI. I see 2 ways to fix this: 1) the approach taken in this patch, ie saying that to select ARMv8-M baseline architecture you need the right -march, -mthumb but also the right float ABI. Note that the comment at the top of that procedure says: # Creates a series of routines that return 1 if the given architecture # can be selected and a routine to give the flags to select that architecture 2) Add a function to the assembly that is used to test support for the architecture. The reason I favor the first one is that it enables more test while the second test would just skip ARMv6-M and ARMv8-M Baseline tests for arm-none-linux-gnueabihf toolchai
Re: [PATCH, GCC/ARM 1/2, ping] Add multilib support for embedded bare-metal targets
Ping? Best regards, Thomas On 27/10/16 15:26, Thomas Preudhomme wrote: Hi Kyrill, On 27/10/16 10:45, Kyrill Tkachov wrote: Hi Thomas, On 24/10/16 09:06, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 13/10/16 16:35, Thomas Preudhomme wrote: Hi ARM maintainers, This patchset aims at adding multilib support for R and M profile ARM architectures and allowing it to be built alongside multilib for A profile ARM architectures. This specific patch adds the t-rmprofile multilib Makefile fragment for the former objective. Multilib are built for all M profile architecture involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as ARMv7. ARMv7 multilib is used for R profile architectures but also A profile architectures. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2016-10-03 Thomas Preud'homme * config.gcc: Allow new rmprofile value for configure option --with-multilib-list. * config/arm/t-rmprofile: New file. * doc/install.texi (--with-multilib-list): Document new rmprofile value for ARM. Testing: == aprofile == * "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the patchset for both aprofile and rmprofile * default spec (gcc -dumpspecs) is the same before and after the patchset for aprofile * No difference in --print-multi-directory between before and after the patchset for aprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and float ABI == rmprofile == * aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float ABI) and directory naming * Difference in --print-multi-directory between before [1] and after the patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and float ABI modulo the name and directory structure changes [1] as per patch applied in ARM embedded branches https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup == aprofile + rmprofile == * aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is not supported Is this ok for master branch? Best regards, Thomas +# Arch Matches +MULTILIB_MATCHES += march?armv6s-m=march?armv6-m +MULTILIB_MATCHES += march?armv8-m.main=march?armv8-m.main+dsp +MULTILIB_MATCHES += march?armv7=march?armv7-r +ifeq (,$(HAS_APROFILE)) +MULTILIB_MATCHES += march?armv7=march?armv7-a +MULTILIB_MATCHES += march?armv7=march?armv7ve +MULTILIB_MATCHES += march?armv7=march?armv8-a +MULTILIB_MATCHES += march?armv7=march?armv8-a+crc +MULTILIB_MATCHES += march?armv7=march?armv8.1-a +MULTILIB_MATCHES += march?armv7=march?armv8.1-a+crc +endif I think you want to update the patch to handle -march=armv8.2-a and armv8.2-a+fp16 Thanks, Kyrill Indeed. Please find updated ChangeLog and patch (attached): *** gcc/ChangeLog *** 2016-10-03 Thomas Preud'homme * config.gcc: Allow new rmprofile value for configure option --with-multilib-list. * config/arm/t-rmprofile: New file. * doc/install.texi (--with-multilib-list): Document new rmprofile value for ARM. Ok for trunk? Best regards, Thomas diff --git a/gcc/config.gcc b/gcc/config.gcc index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -3739,6 +3739,16 @@ case "${target}" in # pragmatic. tmake_profile_file="arm/t-aprofile" ;; + rmprofile) +# Note that arm/t-rmprofile is a +# stand-alone make file fragment to be +# used only with itself. We do not +# specifically use the +# TM_MULTILIB_OPTION framework because +# this shorthand is more +# pragmatic. +tmake_profile_file="arm/t-rmprofile" +;; default) ;; *) @@ -3748,9 +3758,10 @@ case "${target}" in esac if test "x${tmake_profile_file}" != x ; then -# arm/t-aprofile is only designed to work -# without any with-cpu, with-arch, with-mode, -# with-fpu or with-float options. +# arm/t-aprofile and arm/t-rmprofile are only +# designed to work without any with-cpu, +# with-arch, with-mode, with-fpu or with-float +# options. if test "x$with_arch" != x \ || test "x$with_cpu" != x \ || test "x$with_float" != x \ diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile new file mode 100644 index ..c8b5c9cbd03694eea69855e20372afa3e97d6b4c --- /dev/null +++ b/gcc/config/arm/t-rmprofile @@ -0,0 +1,174 @@ +# Copyright (C) 2016 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without e
Re: [PATCH, GCC/ARM 2/2, ping2] Allow combination of aprofile and rmprofile multilibs
Ping? Best regards, Thomas On 24/10/16 09:07, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 13/10/16 16:35, Thomas Preudhomme wrote: Hi ARM maintainers, This patchset aims at adding multilib support for R and M profile ARM architectures and allowing it to be built alongside multilib for A profile ARM architectures. This specific patch is concerned with the latter. The patch works by moving the bits shared by both aprofile and rmprofile multilib build (variable initilization as well as ISA and float ABI to build multilib for) to a new t-multilib file. Then, based on which profile was requested in --with-multilib-list option, that files includes t-aprofile and/or t-rmprofile where the architecture and FPU to build the multilib for are specified. Unfortunately the duplication of CPU to A profile architectures could not be avoided because substitution due to MULTILIB_MATCHES are not transitive. Therefore, mapping armv7-a to armv7 for rmprofile multilib build does not have the expected effect. Two patches were written to allow this using 2 different approaches but I decided against it because this is not the right solution IMO. See caveats below for what I believe is the correct approach. *** combined build caveats *** As the documentation in this patch warns, there is a few caveats to using a combined multilib build due to the way the multilib framework works. 1) For instance, when using only rmprofile the combination of options -mthumb -march=armv7 -mfpu=neon the thumb/-march=armv7 multilib but in a combined multilib build the default multilib would be used. This is because in the rmprofile build -mfpu=neon is not specified in MULTILIB_OPTION and thus the option is ignored when considering MULTILIB_REQUIRED entries. 2) Another issue is the fact that aprofile and rmprofile multilib build have some conflicting requirements in terms of how to map options for which no multilib is built to another option. (i) A first example of this is the difference of CPU to architecture mapping mentionned above: rmprofile multilib build needs A profile CPUs and architectures to be mapped down to ARMv7 so that one of the v7-ar multilib gets chosen in such a case but aprofile needs A profile architectures to stand on their own because multilibs are built for several architectures. (ii) Another example of this is that in aprofile multilib build no multilib is built with -mfpu=fpv5-d16 but some multilibs are built with -mfpu=fpv4-d16. Therefore, aprofile defines a match rule to map fpv5-d16 onto fpv4-d16. However, rmprofile multilib profile *does* build some multilibs with -mfpu=fpv5-d16. This has the consequence that when building for -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard the default multilib is chosen because this is rewritten into -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard and there is no multilib for that. Both of these issues could be handled by using MULTILIB_REUSE instead of MULTILIB_MATCHES but this would require a large set of rules. I believe instead the right approach is to create a new mechanism to inform GCC on how options can be down mapped _when no multilib can be found_ which would require a smaller set of rules and would make it explicit that the options are not equivalent. A patch will be posted to this effect at a later time. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2016-10-03 Thomas Preud'homme * config.gcc: Allow combinations of aprofile and rmprofile values for --with-multilib-list. * config/arm/t-multilib: New file. * config/arm/t-aprofile: Remove initialization of MULTILIB_* variables. Remove setting of ISA and floating-point ABI in MULTILIB_OPTIONS and MULTILIB_DIRNAMES. Set architecture and FPU in MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS and MULTILIB_DIRNAMES respectively. Add comment to introduce all matches. Add architecture matches for marvel-pj4 and generic-armv7-a CPU options. * config/arm/t-rmprofile: Likewise except for the matches changes. * doc/install.texi (--with-multilib-list): Document the combination of aprofile and rmprofile values and warn about pitfalls in doing that. Testing: * "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the patchset for both aprofile and rmprofile * "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same for aprofile,rmprofile and rmprofile,aprofile * default spec (gcc -dumpspecs) is the same for aprofile,rmprofile and rmprofile,aprofile * Difference in --print-multi-directory between aprofile or rmprofile and aprofile,rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and float ABI is as expected (best multilib for the combination is chosen), modulo the caveat mentionned above and the new marvel-pj4 and generic-armv7-a CPU to architecture mapping. Is this ok for master? Best regards, Thomas
Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M33
Ping? Best regards, Thomas On 26/10/16 17:42, Thomas Preudhomme wrote: Hi, This patch adds support for the Cortex-M33 processor launched by ARM [1]. The patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP extensions architecture and arm_v7m_tune tuning parameters for the time being. It also updates documentation to mention this new processor. [1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php ChangeLog entry is as follows: *** gcc/Changelog *** 2016-10-26 Thomas Preud'homme * config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m33): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M33 processor. Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world compiled for it. Is this ok for master? Best regards, Thomas diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def index 9293429b3f9a026bcdacc1651c534bdf14d4df1e..cd79bc505853d4dda6cf2e58bdc2d129032befef 100644 --- a/gcc/config/arm/arm-arches.def +++ b/gcc/config/arm/arm-arches.def @@ -73,7 +73,7 @@ ARM_ARCH("armv8-m.base", cortexm23, 8M_BASE, ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE)) ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_FOR_ARCH8M_MAIN)) -ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN, +ARM_ARCH("armv8-m.main+dsp", cortexm33, 8M_MAIN, ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN)) ARM_ARCH("iwmmxt", iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)) ARM_ARCH("iwmmxt2", iwmmxt2,5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)) diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 940b5de82f0340fc0c26be80d47729bc1f193db0..ec63ee4abe54af06cd5531486f294f9a8dae71a1 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -168,6 +168,7 @@ ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7, 7A, ARM_FSET_MAKE_ /* V8 Architecture Processors */ ARM_CORE("cortex-m23", cortexm23, cortexm23, 8M_BASE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8M_BASE), v6m) ARM_CORE("cortex-a32", cortexa32, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35) +ARM_CORE("cortex-m33", cortexm33, cortexm33, 8M_MAIN, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN), v7m) ARM_CORE("cortex-a35", cortexa35, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35) ARM_CORE("cortex-a53", cortexa53, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53) ARM_CORE("cortex-a57", cortexa57, cortexa57, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57) diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index de712924afd33ba1e6e65cb56a5b260858d0cc4f..f7886b94be779fcba91506e77574662fe7188876 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -313,6 +313,9 @@ EnumValue Enum(processor_type) String(cortex-a32) Value(cortexa32) EnumValue +Enum(processor_type) String(cortex-m33) Value(cortexm33) + +EnumValue Enum(processor_type) String(cortex-a35) Value(cortexa35) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index 46c2c9258bcad43618a50f6201414fa084cb5b56..e782baccf424e51ac19ef5f02d25ed4f4eb0541d 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -33,8 +33,9 @@ cortexr8,cortexm7,cortexm4, cortexm3,marvell_pj4,cortexa15cortexa7, cortexa17cortexa7,cortexm23,cortexa32, - cortexa35,cortexa53,cortexa57, - cortexa72,cortexa73,exynosm1, - qdf24xx,xgene1,cortexa57cortexa53, - cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53" + cortexm33,cortexa35,cortexa53, + cortexa57,cortexa72,cortexa73, + exynosm1,qdf24xx,xgene1, + cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35, + cortexa73cortexa53" (const (symbol_ref "((enum attr_tune) arm_tune)"))) diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h index 302302f0d2d522fe282bb1d12687b53de72cae25..d45a1ca421901da25e16d965a9474438ea10f349 100644 --- a/gcc/config/arm/bpabi.h +++ b/gcc/config/arm/bpabi.h @@ -97,7 +97,7 @@ |march=armv8.2-a+fp16\ |march=armv8-m.base|mcpu=cortex-m23 \ |march=armv8-m.main \ - |march=armv8-m.main+dsp\ + |march=armv8-m.main+dsp|mcpu=cortex-m33 \ :%{!r:--be8}}}" #else #define BE8_LINK_SPEC \ @@ -136,7 +136,7 @@ |march=armv8.2-a+fp16\ |march=armv8-m.base|mcpu=cortex-m23 \ |march=armv8-m.main \ -
Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M23
Ping? Best regards, Thomas On 26/10/16 17:42, Thomas Preudhomme wrote: Hi, This patch adds support for the Cortex-M23 processor launched by ARM [1]. The patch adds support for the name and wires it up to the ARMv8-M Baseline architecture and arm_v6m_tune tuning parameters for the time being. It also updates documentation to mention this new processor. [1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php ChangeLog entry is as follows: *** gcc/Changelog *** 2016-10-26 Thomas Preud'homme * config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m23): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores this tuning parameters apply to in the comment. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M23 processor. Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world compiled for it. Is this ok for master? Best regards, Thomas diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def index 4b196a7d1188de5eca028e5c2597bbc20835201f..9293429b3f9a026bcdacc1651c534bdf14d4df1e 100644 --- a/gcc/config/arm/arm-arches.def +++ b/gcc/config/arm/arm-arches.def @@ -69,7 +69,7 @@ ARM_ARCH ("armv8.2-a", cortexa53, 8A, ARM_ARCH ("armv8.2-a+fp16", cortexa53, 8A, ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A, FL2_FOR_ARCH8_2A | FL2_FP16INST)) -ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE, +ARM_ARCH("armv8-m.base", cortexm23, 8M_BASE, ARM_FSET_MAKE_CPU1 ( FL_FOR_ARCH8M_BASE)) ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_FOR_ARCH8M_MAIN)) diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 2072e1e6f8d84533deead24e6fb0b6aff7603f24..940b5de82f0340fc0c26be80d47729bc1f193db0 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -166,6 +166,7 @@ ARM_CORE("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7, 7A, ARM_FSET_MAKE_ ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7, 7A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV | FL_FOR_ARCH7A), cortex_a12) /* V8 Architecture Processors */ +ARM_CORE("cortex-m23", cortexm23, cortexm23, 8M_BASE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8M_BASE), v6m) ARM_CORE("cortex-a32", cortexa32, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35) ARM_CORE("cortex-a35", cortexa35, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35) ARM_CORE("cortex-a53", cortexa53, cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53) diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt index ee9e3bb7ec57e0e8f2f15b83442711b9faf82d20..de712924afd33ba1e6e65cb56a5b260858d0cc4f 100644 --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -307,6 +307,9 @@ EnumValue Enum(processor_type) String(cortex-a17.cortex-a7) Value(cortexa17cortexa7) EnumValue +Enum(processor_type) String(cortex-m23) Value(cortexm23) + +EnumValue Enum(processor_type) String(cortex-a32) Value(cortexa32) EnumValue diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md index 594ce9d1734451f89812200191cb35f1f579289e..46c2c9258bcad43618a50f6201414fa084cb5b56 100644 --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -32,9 +32,9 @@ cortexr4f,cortexr5,cortexr7, cortexr8,cortexm7,cortexm4, cortexm3,marvell_pj4,cortexa15cortexa7, - cortexa17cortexa7,cortexa32,cortexa35, - cortexa53,cortexa57,cortexa72, - cortexa73,exynosm1,qdf24xx, - xgene1,cortexa57cortexa53,cortexa72cortexa53, - cortexa73cortexa35,cortexa73cortexa53" + cortexa17cortexa7,cortexm23,cortexa32, + cortexa35,cortexa53,cortexa57, + cortexa72,cortexa73,exynosm1, + qdf24xx,xgene1,cortexa57cortexa53, + cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53" (const (symbol_ref "((enum attr_tune) arm_tune)"))) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 022c1d72a1272e56397dc7e2018483e77f18b90d..39b2da05d2135c68032231bb7780104061355786 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2243,7 +2243,8 @@ const struct tune_params arm_cortex_m7_tune = }; /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than - arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus. */ + arm_v6t2_tune. It is used for cortex-m0, cortex-m1, cortex-m0plus and + cortex-m23. */ const struct tune_params arm_v6m_tune = { arm_9e_rtx_costs, diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h index 0da98fb711bdcaf5add6e392060f4edaddf6cf05..302302f0d2d522f
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 11/02/2016 10:59 AM, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote: >> On 11/01/2016 03:53 PM, Jakub Jelinek wrote: >>> What kind of false positives it is for each case? Is it with normal >>> asan-bootstrap (without your -fsanitize-use-after-scope changes), or >>> only with those changes, or only with those changes and >>> -fsanitize-use-after-scope used during bootstrap? >> >> Ok, the situation is simpler than I thought: > > CCing also Marek. >> >> #include >> >> int main(int argc, char **argv) >> { >> int *ptr; >> >> switch (argc) >> { >> int a; >> >> case 1: >> break; >> >> default: >> ptr = &a; >> break; >> } >> >> fprintf (stderr, "v: %d\n", *ptr); >> return 0; >> } >> >> Which is gimplified as: >> >> int * ptr; >> >> switch (argc) , case 1: > >> { >> int a; >> >> try >> { >> ASAN_MARK (2, &a, 4); >> : >> goto ; >> : >> ptr = &a; >> goto ; >> } >> finally >> { >> ASAN_MARK (1, &a, 4); >> } >> } >> : >> _1 = *ptr; >> stderr.0_2 = stderr; >> fprintf (stderr.0_2, "v: %d\n", _1); >> D.2577 = 0; >> return D.2577; >> } >> D.2577 = 0; >> return D.2577; >> >> and thus we get: >> /tmp/switch-case.c:9:11: warning: statement will never be executed >> [-Wswitch-unreachable] >>int a; >> >> I'm wondering where properly fix that, we can either find all these >> ASAN_MARKs in gimplify_switch_expr >> and distribute it to all labels (which are gimplified). Or we can put such >> variables to asan_poisoned_variables >> if we have information that we're gimplifing statements before a first >> label. Do we know that from gimple context? >> If so, these variables will be unpoisoned at the very beginning of each >> label and the ASAN_MARK call in between >> switch statement and a first label can be removed. > > Wouldn't it be easiest if -Wswitch-unreachable warning just ignored > the ASAN_MARK internal calls altogether? > Do you emit there any other statements, or just ASAN_MARK and nothing else? Yep, skipping warning can be done easily, however gimplified code is wrong as un-poisoning is not done for variable (even for a valid program). > > Shouldn't there be also ASAN_MARK on the implicit gotos from the switch > statement? Otherwise, consider this being done in a loop, after the first > iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with > args 1 and have in case 1: a = 0;, won't that trigger runtime error? Hopefully having the un-poisoning call be encapsulated in finally block would properly clean up the variable. Or am I wrong? Martin > > Jakub >
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote: > > Which is gimplified as: > > > > int * ptr; > > > > switch (argc) , case 1: > > > { > > int a; > > > > try > > { > > ASAN_MARK (2, &a, 4); > > : > > goto ; > > : > > ptr = &a; > > goto ; > > } > > finally > > { > > ASAN_MARK (1, &a, 4); > > } > Shouldn't there be also ASAN_MARK on the implicit gotos from the switch > statement? Otherwise, consider this being done in a loop, after the first > iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with > args 1 and have in case 1: a = 0;, won't that trigger runtime error? Wonder if for the variables declared inside of switch body, because we don't care about uses before scope, it wouldn't be more efficient to arrange for int *p, *q, *r; switch (x) { int a; case 1: p = &a; a = 5; break; int b; case 2: int c; q = &b; r = &c; b = 3; c = 4; break; } to effectively ASAN_MARK (2, &a, 4); ASAN_MARK (2, &b, 4); ASAN_MARK (2, &c, 4); before the GIMPLE_SWITCH, instead of unpoisoning them on every case label where they might be in scope. Though, of course, at least until lower pass that is quite ugly, because it would refer to "not yet declared" variables. Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not the expression evaluation of the switch control expression) inside of the switches' GIMPLE_BIND. Jakub
Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M23
On 02/11/16 10:07, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 26/10/16 17:42, Thomas Preudhomme wrote: Hi, This patch adds support for the Cortex-M23 processor launched by ARM [1]. The patch adds support for the name and wires it up to the ARMv8-M Baseline architecture and arm_v6m_tune tuning parameters for the time being. It also updates documentation to mention this new processor. [1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php ChangeLog entry is as follows: *** gcc/Changelog *** 2016-10-26 Thomas Preud'homme * config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m23): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores this tuning parameters apply to in the comment. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M23 processor. Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world compiled for it. Is this ok for master? Ok. Thanks, Kyrill Best regards, Thomas
Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M33
On 02/11/16 10:07, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 26/10/16 17:42, Thomas Preudhomme wrote: Hi, This patch adds support for the Cortex-M33 processor launched by ARM [1]. The patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP extensions architecture and arm_v7m_tune tuning parameters for the time being. It also updates documentation to mention this new processor. [1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php ChangeLog entry is as follows: *** gcc/Changelog *** 2016-10-26 Thomas Preud'homme * config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m33): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M33 processor. Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world compiled for it. Is this ok for master? Ok. Thanks, Kyrill Best regards, Thomas
Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener wrote: > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote: >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote: >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote: >>> > + cfg_layout_finalize (); >>> >>> I don't think you have to go into/out-of cfglayout mode for each iteration. >> >> Yeah probably. I was too lazy :-) It needs the cleanup_cfg every >> iteration though, I was afraid that interacts. > > Ick. Why would it need a cfg-cleanup every iteration? I don't believe it needs a cleanup on every iteration. One cleanup at the end should work fine. Ciao! Steven
Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS
> This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a > runtime one in rtlanal.c. > > Since this one was in combination with an "#if defined" and used to guard an > if-statement I'd appreciate it if someone gave it a double-check that I > dind't screw up the intended behaviour. Unfortunately I think you did, as the old version was: #if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP) /* If this is a typical RISC machine, we only have to worry about the way loads are extended. */ if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND ? val_signbit_known_set_p (inner_mode, nonzero) : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND) || !MEM_P (SUBREG_REG (x))) #endif and the new version is: #ifdef LOAD_EXTEND_OP /* If this is a typical RISC machine, we only have to worry about the way loads are extended. */ if (WORD_REGISTER_OPERATIONS && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND ? val_signbit_known_set_p (inner_mode, nonzero) : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND) || !MEM_P (SUBREG_REG (x #endif So if WORD_REGISTER_OPERATIONS is zero and LOAD_EXTEND_OP is defined, for example on PowerPC, the block guarded by the condition is always executed in the former case but never in the latter case. -- Eric Botcazou
[PATCH] Fix store-merging alias check, apply some TLC
This fixes the alias check in terminate_all_aliasing_chains -- the base we use for the hash table indexing does not constitute an object that aliases all stores in the chain (consider for example the MEM_REF handling which strips the offset completely). I've refactored data structures a bit in the process of making (as a followup) 'base_addr' a true address (and thus avoid building that new MEM_REF for example). A followup will then try to support (some) bases with offset != NULL_TREE. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Richard. 2016-11-02 Richard Biener * gimple-ssa-store-merging.c (struct store_immediate_info): Remove redundant val and dest members. (store_immediate_info::store_immediate_info): Adjust. (merged_store_group::merged_store_group): Adjust. (merged_store_group::apply_stores): Likewise. (struct imm_store_chain_info): Add base_addr field. (imm_store_chain_info::imm_store_chain_info): New constructor. (imm_store_chain_info::terminate_and_process_chain): Do not pass base. (imm_store_chain_info::output_merged_store): Likewise. (imm_store_chain_info::output_merged_stores): Likewise. (pass_tree_store_merging::terminate_all_aliasing_chains): Take imm_store_chain_info instead of base. Fix alias check. (pass_tree_store_merging::terminate_and_release_chain): Likewise. (imm_store_chain_info::coalesce_immediate_stores): Adjust. Index: gcc/gimple-ssa-store-merging.c === --- gcc/gimple-ssa-store-merging.c (revision 241779) +++ gcc/gimple-ssa-store-merging.c (working copy) @@ -140,19 +140,17 @@ struct store_immediate_info { unsigned HOST_WIDE_INT bitsize; unsigned HOST_WIDE_INT bitpos; - tree val; - tree dest; gimple *stmt; unsigned int order; - store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree, - tree, gimple *, unsigned int); + store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, + gimple *, unsigned int); }; store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs, - unsigned HOST_WIDE_INT bp, tree v, - tree d, gimple *st, + unsigned HOST_WIDE_INT bp, + gimple *st, unsigned int ord) - : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord) + : bitsize (bs), bitpos (bp), stmt (st), order (ord) { } @@ -557,7 +555,7 @@ merged_store_group::merged_store_group ( /* VAL has memory allocated for it in apply_stores once the group width has been finalized. */ val = NULL; - align = get_object_alignment (info->dest); + align = get_object_alignment (gimple_assign_lhs (info->stmt)); stores.create (1); stores.safe_push (info); last_stmt = info->stmt; @@ -654,14 +652,16 @@ merged_store_group::apply_stores () FOR_EACH_VEC_ELT (stores, i, info) { unsigned int pos_in_buffer = info->bitpos - start; - bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize, -pos_in_buffer, buf_size); + bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt), + val, info->bitsize, + pos_in_buffer, buf_size); if (dump_file && (dump_flags & TDF_DETAILS)) { if (ret) { fprintf (dump_file, "After writing "); - print_generic_expr (dump_file, info->val, 0); + print_generic_expr (dump_file, + gimple_assign_rhs1 (info->stmt), 0); fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC " at position %d the merged region contains:\n", info->bitsize, pos_in_buffer); @@ -680,13 +680,15 @@ merged_store_group::apply_stores () struct imm_store_chain_info { + tree base_addr; auto_vec m_store_info; auto_vec m_merged_store_groups; - bool terminate_and_process_chain (tree); + imm_store_chain_info (tree b_a) : base_addr (b_a) {} + bool terminate_and_process_chain (); bool coalesce_immediate_stores (); - bool output_merged_store (tree, merged_store_group *); - bool output_merged_stores (tree); + bool output_merged_store (merged_store_group *); + bool output_merged_stores (); }; const pass_data pass_data_tree_store_merging = { @@ -722,8 +724,9 @@ private: hash_map m_stores; bool terminate_and_process_all_chains (); - bool terminate_all_aliasing_chains (tree, tree, bool, gimple *); - bool terminate_and_release_chain (tree); + bool terminate_all_aliasing_chains (tree, imm_store_chain_info **, +
Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.
On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote: > On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski wrote: > > Since ThunderX T88 pass 1 (variant 0) is a ARMv8 part while pass 2 (variant > > 1) > > is an ARMv8.1 part, I needed to add detecting of the variant also for this > > difference. Also I simplify a little bit and combined the single core and > > arch detecting cases so it would be easier to add variant. > > Actually it is a bit more complex than what I said here, see below for > the full table of options and what are enabled/disabled now. > > > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. > > Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is > > deecting the two seperately. > > > Here is the final patch in this series updated; I changed the cpu name > slightly and made sure I updated invoke.texi too. > > The names are going to match the names in LLVM (worked with our LLVM > engineer here at Cavium about the names). > Here are the names recorded and > -mpcu=thunderx: > *Matches part num 0xA0 (reserved for ThunderX 8x series) > *T88 Pass 2 scheduling > *Hardware prefetching (software prefetching disabled) > *LSE enabled > *no v8.1 This doesn't match the current LLVM proposal ( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support for -mcpu=thunderx. > -mcpu=thunderxt88: > *Matches part num 0xA1 > *T88 Pass 2 scheduling > *software prefetching enabled > *LSE enabled > *no v8.1 > > -mcpu=thunderxt88p1 (only for GCC): > *Matches part num 0xA1, variant 0 > *T88 Pass 1 scheduling > *software prefetching enabled > *no LSE enabled > *no v8.1 > > -mcpu=thunderxt81 and -mcpu=thunderxt83: > *Matches part num 0xA2/0xA3 > *T88 Pass 2 scheduling > *Hardware prefetching (software prefetching disabled) > *LSE enabled > *v8.1 This looks like what has been added to LLVM as -mcpu=thunderx. > I have not hooked up software vs hardware prefetching and the > scheduler parts (the next patch will do part of that); both ARMv8.1-a > and LSE parts are hooked up as those parts are only in > aarch64-cores.def. > > OK? Bootstrapped and tested on ThunderX T88 and ThunderX T81 > (aarch64-linux-gnu). > > Index: common/config/aarch64/aarch64-common.c > === > --- common/config/aarch64/aarch64-common.c(revision 241727) > +++ common/config/aarch64/aarch64-common.c(working copy) > @@ -145,7 +145,7 @@ struct arch_to_arch_name > the default set of architectural feature flags they support. */ > static const struct processor_name_to_arch all_cores[] = > { > -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \ > +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, > VARIANT) \ >{NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS}, > #include "config/aarch64/aarch64-cores.def" >{"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8}, > Index: config/aarch64/aarch64-cores.def > === > --- config/aarch64/aarch64-cores.def (revision 241727) > +++ config/aarch64/aarch64-cores.def (working copy) > @@ -21,7 +21,7 @@ > > Before using #include to read this file, define a macro: > > - AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, > FLAGS, COSTS, IMP, PART) > + AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, > FLAGS, COSTS, IMP, PART, VARIANT) > > The CORE_NAME is the name of the core, represented as a string constant. > The CORE_IDENT is the name of the core, represented as an identifier. > @@ -39,39 +39,45 @@ > PART is the part number of the CPU. On a GNU/Linux system it can be > found in /proc/cpuinfo. For big.LITTLE systems this should use the > macro AARCH64_BIG_LITTLE where the big part number comes as the first > - argument to the macro and little is the second. */ > + argument to the macro and little is the second. > + VARIANT is the variant of the CPU. In a GNU/Linux system it can found > + in /proc/cpuinfo. If this is -1, this means it can match any variant. */ > > /* V8 Architecture Processors. */ > > /* ARM ('A') cores. */ > -AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04) > -AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03) > -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07) > -AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08) > -AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC, cortexa73
Re: [PATCH testsuite]Require vect_cond_mixed for test case gcc.dg/vect/pr56541.c
On Tue, Nov 1, 2016 at 4:24 PM, Bin.Cheng wrote: > On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener > wrote: >> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng wrote: >>> On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener >>> wrote: On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng wrote: > Hi, > Due to some reasons, tree-if-conv.c now factors floating point comparison > out of cond_expr, > resulting in mixed types in it. This does help CSE on common comparison > operations. > Only problem is that test gcc.dg/vect/pr56541.c now requires > vect_cond_mixed to be > vectorized. This patch changes the test in that way. > Test result checked. Is it OK? Hmm, I think the fix is to fix if-conversion not doing that. Can you track down why this happens? >>> Hmm, but there are several common floating comparison operations in >>> the case, by doing this, we could do CSE on GIMPLE, otherwise we >>> depends on RTL optimizers. >> >> I see. >> >>> I thought we prefer GIMPLE level >>> transforms? >> >> Yes, but the vectorizer is happier with the conditions present in the >> COND_EXPR >> and thus we concluded we always want to have them there. forwprop will >> also aggressively put them back. Note that we cannot put back >> tree_could_throw_p >> conditions (FP compares with signalling nans for example) to properly model >> EH >> (though for VEC_COND_EXPRs we don't really care here). >> >> Note that nothing between if-conversion and vectorization will perform >> the desired >> CSE anyway. > Hi Richard, > I looked into this one and found it was not if-conv factors cond_expr > out. For test case: > > for (i=0; i!=1024; ++i) > { > float rR = a*z[i]; > float rL = b*z[i]; > float rMin = (rR float rMax = (rR rMin = (rMax>0) ? rMin : rBig; > rMin = (rMin>0) ? rMin : rMax; > ok[i] = rMin-c } > Dump before jump threading is like: > > : > # iftmp.3_12 = PHI > if (iftmp.3_12 > 0.0) > goto ; > else > goto ; > > : > > : > # iftmp.4_13 = PHI > if (iftmp.4_13 > 0.0) > goto ; > else > goto ; > > : > > : > # iftmp.5_14 = PHI > > Jump thread in dom pass threads edges (bb7 -> bb8 -> ... bb11) to (bb6 > -> bb12 -> bb9) as below: > > : > # iftmp.3_12 = PHI > # iftmp.2_23 = PHI > if (iftmp.3_12 > 0.0) > goto ; > else > goto ; > > : > # iftmp.4_13 = PHI > if (iftmp.4_13 > 0.0) > goto ; > else > goto ; > > : > > : > # iftmp.5_14 = PHI > > //... > : > # iftmp.4_22 = PHI <1.5e+2(6)> > goto ; > > This transform saves one comparison on the path, but creates multi-arg > phi, resulting in cond_expr being factored out. > > Looks like threading corrupts vectorization opportunity for target > doesn't support vect_cond_mixed, but I guess it's hard to tell in > threading itself. Any ideas? First of all I wonder why MIN/MAX are not pattern matched earlier... (I'd generally avoid threading through them). But yes, I see that if-conversion can cause factored out conditons. Also for the simple case of ! condition: _49 = iftmp.3_12 > 0.0; _50 = ~_49; iftmp.5_14 = _50 ? 1.5e+2 : _ifc__48; but those should have been eliminated by swapping operands (looks like that doesn't reliably work). In the end we have the bool patterns in the vectorizer to recover from this, so I wonder why that doesn't work then. It should re-write the above to _111 = iftmp.3_12 > 0.0 ? 1 : 0; // of proper type _112 = ~ _111; iftmp.5_14 = _112 != 0 ? ; Richard. > Thanks, > bin
Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS
Hi Eric, On 02/11/16 10:47, Eric Botcazou wrote: This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a runtime one in rtlanal.c. Since this one was in combination with an "#if defined" and used to guard an if-statement I'd appreciate it if someone gave it a double-check that I dind't screw up the intended behaviour. Unfortunately I think you did, as the old version was: #if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP) /* If this is a typical RISC machine, we only have to worry about the way loads are extended. */ if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND ? val_signbit_known_set_p (inner_mode, nonzero) : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND) || !MEM_P (SUBREG_REG (x))) #endif and the new version is: #ifdef LOAD_EXTEND_OP /* If this is a typical RISC machine, we only have to worry about the way loads are extended. */ if (WORD_REGISTER_OPERATIONS && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND ? val_signbit_known_set_p (inner_mode, nonzero) : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND) || !MEM_P (SUBREG_REG (x #endif So if WORD_REGISTER_OPERATIONS is zero and LOAD_EXTEND_OP is defined, for example on PowerPC, the block guarded by the condition is always executed in the former case but never in the latter case. I think you're right. I suppose the new condition should be: #ifdef LOAD_EXTEND_OP /* If this is a typical RISC machine, we only have to worry about the way loads are extended. */ if (!WORD_REGISTER_OPERATIONS || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND ? val_signbit_known_set_p (inner_mode, nonzero) : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND) || !MEM_P (SUBREG_REG (x #endif Would you prefer me to make this change or just revert the patch? Thanks, Kyrill
Re: [PATCH testsuite]Require vect_cond_mixed for test case gcc.dg/vect/pr56541.c
On Wed, Nov 2, 2016 at 11:08 AM, Richard Biener wrote: > On Tue, Nov 1, 2016 at 4:24 PM, Bin.Cheng wrote: >> On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener >> wrote: >>> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng wrote: On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener wrote: > On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng wrote: >> Hi, >> Due to some reasons, tree-if-conv.c now factors floating point >> comparison out of cond_expr, >> resulting in mixed types in it. This does help CSE on common comparison >> operations. >> Only problem is that test gcc.dg/vect/pr56541.c now requires >> vect_cond_mixed to be >> vectorized. This patch changes the test in that way. >> Test result checked. Is it OK? > > Hmm, I think the fix is to fix if-conversion not doing that. Can you > track down why this happens? Hmm, but there are several common floating comparison operations in the case, by doing this, we could do CSE on GIMPLE, otherwise we depends on RTL optimizers. >>> >>> I see. >>> I thought we prefer GIMPLE level transforms? >>> >>> Yes, but the vectorizer is happier with the conditions present in the >>> COND_EXPR >>> and thus we concluded we always want to have them there. forwprop will >>> also aggressively put them back. Note that we cannot put back >>> tree_could_throw_p >>> conditions (FP compares with signalling nans for example) to properly model >>> EH >>> (though for VEC_COND_EXPRs we don't really care here). >>> >>> Note that nothing between if-conversion and vectorization will perform >>> the desired >>> CSE anyway. >> Hi Richard, >> I looked into this one and found it was not if-conv factors cond_expr >> out. For test case: >> >> for (i=0; i!=1024; ++i) >> { >> float rR = a*z[i]; >> float rL = b*z[i]; >> float rMin = (rR> float rMax = (rR> rMin = (rMax>0) ? rMin : rBig; >> rMin = (rMin>0) ? rMin : rMax; >> ok[i] = rMin-c> } >> Dump before jump threading is like: >> >> : >> # iftmp.3_12 = PHI >> if (iftmp.3_12 > 0.0) >> goto ; >> else >> goto ; >> >> : >> >> : >> # iftmp.4_13 = PHI >> if (iftmp.4_13 > 0.0) >> goto ; >> else >> goto ; >> >> : >> >> : >> # iftmp.5_14 = PHI >> >> Jump thread in dom pass threads edges (bb7 -> bb8 -> ... bb11) to (bb6 >> -> bb12 -> bb9) as below: >> >> : >> # iftmp.3_12 = PHI >> # iftmp.2_23 = PHI >> if (iftmp.3_12 > 0.0) >> goto ; >> else >> goto ; >> >> : >> # iftmp.4_13 = PHI >> if (iftmp.4_13 > 0.0) >> goto ; >> else >> goto ; >> >> : >> >> : >> # iftmp.5_14 = PHI >> >> //... >> : >> # iftmp.4_22 = PHI <1.5e+2(6)> >> goto ; >> >> This transform saves one comparison on the path, but creates multi-arg >> phi, resulting in cond_expr being factored out. >> >> Looks like threading corrupts vectorization opportunity for target >> doesn't support vect_cond_mixed, but I guess it's hard to tell in >> threading itself. Any ideas? > > First of all I wonder why MIN/MAX are not pattern matched earlier... > (I'd generally avoid threading through them). Hmm, this case is not MIN/MAX cases, though it uses rMin/rMax as variable names. So it cannot be pattern matched. As for below cases you mentioned. Since I am introducing more/advanced pattern in match.pd and adding call to fold_stmt in fold_stmt, I think they can be addressed in this way too, I will keep this in mind in my cond_expr pattern work. So we don't need to go that far to vector pattern matches. Thanks, bin > > But yes, I see that if-conversion can cause factored out conditons. > Also for the simple case of ! condition: > > _49 = iftmp.3_12 > 0.0; > _50 = ~_49; > iftmp.5_14 = _50 ? 1.5e+2 : _ifc__48; > > but those should have been eliminated by swapping operands (looks like > that doesn't reliably work). > > In the end we have the bool patterns in the vectorizer to recover from this, > so I wonder why that doesn't work then. It should re-write the above to > > _111 = iftmp.3_12 > 0.0 ? 1 : 0; // of proper type > _112 = ~ _111; > iftmp.5_14 = _112 != 0 ? ; > > Richard. > >> Thanks, >> bin
Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsincs.
On Tue, Nov 1, 2016 at 12:21 PM, Kyrill Tkachov wrote: > Hi Tamar, > > > On 26/10/16 16:01, Tamar Christina wrote: >> >> Hi Christophe, >> >> Here's the updated patch. >> >> Cheers, >> Tamar >> >> From: Christophe Lyon >> Sent: Wednesday, October 19, 2016 11:23:56 AM >> To: Tamar Christina >> Cc: GCC Patches; Kyrylo Tkachov; nd >> Subject: Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and >> vminnmQ_ST intrinsincs. >> >> On 19 October 2016 at 11:36, Tamar Christina >> wrote: >>> >>> Hi All, >>> >>> This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. The >>> current builtin registration code is deficient since it can't access >>> standard pattern names, to which vmaxnmQ_ST and vminnmQ_ST map >>> directly. Thus, to enable the vectoriser to have access to these >>> intrinsics, we implement them using builtin functions, which we >>> expand to the proper standard pattern using a define_expand. >>> >>> This patch also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro, >>> which is defined when __ARM_ARCH >= 8, and which enables the >>> intrinsics. >>> >>> Regression tested on arm-none-eabi and no regressions. >>> >>> This patch is a rework of a previous patch: >>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01971.html >>> >>> OK for trunk? These cases failed on arm-none-linux-gnueabihf as below: FAIL: gcc.target/arm/simd/vmaxnm_f32_1.c execution test FAIL: gcc.target/arm/simd/vmaxnmq_f32_1.c execution test FAIL: gcc.target/arm/simd/vminnm_f32_1.c execution test FAIL: gcc.target/arm/simd/vminnmq_f32_1.c execution test For such changes, I would suggest reg test for both bare-metal and linux toolchains, plus a bootstrap for linux toolchain. Thanks, bin > > > Ok. > Thanks, > Kyrill > > >>> Thanks, >>> Tamar >>> >>> --- >>> >>> gcc/ >>> >>> 2016-10-19 Bilyan Borisov >>> Tamar Christina >>> >>> * config/arm/arm-c.c (arm_cpu_builtins): New macro definition. >>> * config/arm/arm_neon.h (vmaxnm_f32): New intrinsinc. >>> (vmaxnmq_f32): Likewise. >>> (vminnm_f32): Likewise. >>> (vminnmq_f32): Likewise. >>> * config/arm/arm_neon_builtins.def (vmaxnm): New builtin. >>> (vminnm): Likewise. >>> * config/arm/neon.md (neon_, VCVTF): New >>> expander. >>> >>> gcc/testsuite/ >>> >>> 2016-10-19 Bilyan Borisov >>> >>> * gcc.target/arm/simd/vmaxnm_f32_1.c: New. >>> * gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise. >>> * gcc.target/arm/simd/vminnm_f32_1.c: Likewise. >>> * gcc.target/arm/simd/vminnmq_f32_1.c: Likewise. >>> >> I think you forgot to attach the new tests. >> >> Christophe >> >
Re: [PATCH] Fix store-merging alias check, apply some TLC
On Wed, 2 Nov 2016, Richard Biener wrote: > > This fixes the alias check in terminate_all_aliasing_chains -- the > base we use for the hash table indexing does not constitute an object > that aliases all stores in the chain (consider for example the MEM_REF > handling which strips the offset completely). I fat-fingered this thinking one step ahead (forgot to take the address). > I've refactored data structures a bit in the process of making > (as a followup) 'base_addr' a true address (and thus avoid building > that new MEM_REF for example). A followup will then try to support > (some) bases with offset != NULL_TREE. > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. So I had to re-start the testing when I already finished the very minor change to make addr_base actually the address. Thus the 2nd part is now included. This also has the advantage that the hash table comparison will now apply the lax rules valid for comparing addresses (thus the TREE_THIS_VOLATILE check should be no longer needed for example and also different TBAA stores will now be merged). I took the liberty to explicitely rule out TARGET_MEM_REFs from the merging (they'll have failed to form groups anyway as they are not subsetted and we don't handle the const offset stripping yet to eventually merge byte to word stores). Richard. 2016-11-02 Richard Biener * gimple-ssa-store-merging.c (struct store_immediate_info): Remove redundant val and dest members. (store_immediate_info::store_immediate_info): Adjust. (merged_store_group::merged_store_group): Adjust. (merged_store_group::apply_stores): Likewise. (struct imm_store_chain_info): Add base_addr field. (imm_store_chain_info::imm_store_chain_info): New constructor. (imm_store_chain_info::terminate_and_process_chain): Do not pass base. (imm_store_chain_info::output_merged_store): Likewise. Use addr_base which is already the address. (imm_store_chain_info::output_merged_stores): Likewise. (pass_tree_store_merging::terminate_all_aliasing_chains): Take imm_store_chain_info instead of base. Fix alias check. (pass_tree_store_merging::terminate_and_release_chain): Likewise. (imm_store_chain_info::coalesce_immediate_stores): Adjust. (pass_store_merging::execute): Refuse to operate on TARGET_MEM_REF. use the address of the base and adjust for other changes. Index: gcc/gimple-ssa-store-merging.c === --- gcc/gimple-ssa-store-merging.c (revision 241779) +++ gcc/gimple-ssa-store-merging.c (working copy) @@ -140,19 +140,17 @@ struct store_immediate_info { unsigned HOST_WIDE_INT bitsize; unsigned HOST_WIDE_INT bitpos; - tree val; - tree dest; gimple *stmt; unsigned int order; - store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree, - tree, gimple *, unsigned int); + store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, + gimple *, unsigned int); }; store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs, - unsigned HOST_WIDE_INT bp, tree v, - tree d, gimple *st, + unsigned HOST_WIDE_INT bp, + gimple *st, unsigned int ord) - : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord) + : bitsize (bs), bitpos (bp), stmt (st), order (ord) { } @@ -557,7 +555,7 @@ merged_store_group::merged_store_group ( /* VAL has memory allocated for it in apply_stores once the group width has been finalized. */ val = NULL; - align = get_object_alignment (info->dest); + align = get_object_alignment (gimple_assign_lhs (info->stmt)); stores.create (1); stores.safe_push (info); last_stmt = info->stmt; @@ -654,14 +652,16 @@ merged_store_group::apply_stores () FOR_EACH_VEC_ELT (stores, i, info) { unsigned int pos_in_buffer = info->bitpos - start; - bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize, -pos_in_buffer, buf_size); + bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt), + val, info->bitsize, + pos_in_buffer, buf_size); if (dump_file && (dump_flags & TDF_DETAILS)) { if (ret) { fprintf (dump_file, "After writing "); - print_generic_expr (dump_file, info->val, 0); + print_generic_expr (dump_file, + gimple_assign_rhs1 (info->stmt), 0); fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC " at position %d the merged r
Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS
> I think you're right. I suppose the new condition should be: > > #ifdef LOAD_EXTEND_OP > /* If this is a typical RISC machine, we only have to worry >about the way loads are extended. */ > if (!WORD_REGISTER_OPERATIONS > > || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND > >? val_signbit_known_set_p (inner_mode, nonzero) > >: LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND) > || > || !MEM_P (SUBREG_REG (x > > #endif Agreed. > Would you prefer me to make this change or just revert the patch? Go ahead and make the change, but please do a bit of comment massaging in the process, for example: #ifdef LOAD_EXTEND_OP /* On many CISC machines, accessing an object in a wider mode causes the high-order bits to become undefined. So they are not known to be zero. */ if (!WORD_REGISTER_OPERATIONS /* If this is a typical RISC machine, we only have to worry about the way loads are extended. */ || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND ? val_signbit_known_set_p (inner_mode, nonzero) : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND) || !MEM_P (SUBREG_REG (x #endif { if (GET_MODE_PRECISION (GET_MODE (x)) > GET_MODE_PRECISION (inner_mode)) nonzero |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK (inner_mode)); } -- Eric Botcazou
Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.
What is currently submitted for LLVM review was submitted before we determined this naming scheme. I will mark the current submittal as abandoned, as the scheduling model needs to be split out and revised. Joel Jones Sent from my AArch64 powered iPhone > On Nov 2, 2016, at 3:55 AM, James Greenhalgh wrote: > >> On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote: >>> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski wrote: >>> Since ThunderX T88 pass 1 (variant 0) is a ARMv8 part while pass 2 (variant >>> 1) >>> is an ARMv8.1 part, I needed to add detecting of the variant also for this >>> difference. Also I simplify a little bit and combined the single core and >>> arch detecting cases so it would be easier to add variant. >> >> Actually it is a bit more complex than what I said here, see below for >> the full table of options and what are enabled/disabled now. >> >>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. >>> Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is >>> deecting the two seperately. >> >> >> Here is the final patch in this series updated; I changed the cpu name >> slightly and made sure I updated invoke.texi too. >> >> The names are going to match the names in LLVM (worked with our LLVM >> engineer here at Cavium about the names). >> Here are the names recorded and >> -mpcu=thunderx: >> *Matches part num 0xA0 (reserved for ThunderX 8x series) >> *T88 Pass 2 scheduling >> *Hardware prefetching (software prefetching disabled) >> *LSE enabled >> *no v8.1 > > This doesn't match the current LLVM proposal > ( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support > for -mcpu=thunderx. > >> -mcpu=thunderxt88: >> *Matches part num 0xA1 >> *T88 Pass 2 scheduling >> *software prefetching enabled >> *LSE enabled >> *no v8.1 >> >> -mcpu=thunderxt88p1 (only for GCC): >> *Matches part num 0xA1, variant 0 >> *T88 Pass 1 scheduling >> *software prefetching enabled >> *no LSE enabled >> *no v8.1 >> >> -mcpu=thunderxt81 and -mcpu=thunderxt83: >> *Matches part num 0xA2/0xA3 >> *T88 Pass 2 scheduling >> *Hardware prefetching (software prefetching disabled) >> *LSE enabled >> *v8.1 > > This looks like what has been added to LLVM as -mcpu=thunderx. > >> I have not hooked up software vs hardware prefetching and the >> scheduler parts (the next patch will do part of that); both ARMv8.1-a >> and LSE parts are hooked up as those parts are only in >> aarch64-cores.def. >> >> OK? Bootstrapped and tested on ThunderX T88 and ThunderX T81 >> (aarch64-linux-gnu). >> >> Index: common/config/aarch64/aarch64-common.c >> === >> --- common/config/aarch64/aarch64-common.c(revision 241727) >> +++ common/config/aarch64/aarch64-common.c(working copy) >> @@ -145,7 +145,7 @@ struct arch_to_arch_name >>the default set of architectural feature flags they support. */ >> static const struct processor_name_to_arch all_cores[] = >> { >> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \ >> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, >> VARIANT) \ >> {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS}, >> #include "config/aarch64/aarch64-cores.def" >> {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8}, >> Index: config/aarch64/aarch64-cores.def >> === >> --- config/aarch64/aarch64-cores.def(revision 241727) >> +++ config/aarch64/aarch64-cores.def(working copy) >> @@ -21,7 +21,7 @@ >> >>Before using #include to read this file, define a macro: >> >> - AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, >> FLAGS, COSTS, IMP, PART) >> + AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, >> FLAGS, COSTS, IMP, PART, VARIANT) >> >>The CORE_NAME is the name of the core, represented as a string constant. >>The CORE_IDENT is the name of the core, represented as an identifier. >> @@ -39,39 +39,45 @@ >>PART is the part number of the CPU. On a GNU/Linux system it can be >>found in /proc/cpuinfo. For big.LITTLE systems this should use the >>macro AARCH64_BIG_LITTLE where the big part number comes as the first >> - argument to the macro and little is the second. */ >> + argument to the macro and little is the second. >> + VARIANT is the variant of the CPU. In a GNU/Linux system it can found >> + in /proc/cpuinfo. If this is -1, this means it can match any variant. >> */ >> >> /* V8 Architecture Processors. */ >> >> /* ARM ('A') cores. */ >> -AARCH64_CORE("cortex-a35", cortexa35, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 >> | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04) >> -AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH6
Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add
On Nov 2, 2016, at 4:28 AM, Jakub Jelinek wrote: > > On Wed, Nov 02, 2016 at 10:19:26AM +0100, Richard Biener wrote: >> On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek wrote: >>> On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote: The PowerPC back end loses performance on vector intrinsics, because currently all of them are treated as calls throughout the middle-end phases and only expanded when they reach RTL. Our version of altivec.h currently defines the public names of overloaded functions (like vec_add) to be #defines for hidden functions (like __builtin_vec_add), which are recognized in the parser as requiring special back-end support. Tables in rs6000-c.c handle dispatch of the overloaded functions to specific function calls appropriate to the argument types. >>> >>> This doesn't look very nice. If all you care is that the builtins like >>> __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold >>> into generic vector operations under certain conditions, just fold those >>> into whatever you want in targetm.gimple_fold_builtin (gsi). >> >> Note that traditionally "overloading" with GCC "builtins" is done by >> using varargs >> and the "type generic" attribute. That doesn't scale to return type >> overloading >> though for which we usually added direct support to the parser (for example >> for __builtin_shuffle). > > My understanding is that rs6000 already does that, it hooks into > resolve_overloaded_builtin which already handles the fully type generic > builtins where not just the arguments, but also the return type can be > picked up. But it resolves the overloaded builtins into calls to other > builtins that are not type-generic. > > So, either that function instead of returning the specific md builtin calls > in some cases already returns trees with the generic behavior of the > builtin, or it returns what it does now and then in the gimple fold builtin > target hook (note, the normal fold builtin target hook is not right for > that, because it is mostly used for folding builtins into constant - callers > will usually throw away other results) fold those specific md builtins > into whatever GIMPLE you want. If we want to decrease amount of folding in > the FEs, the gimple fold builtin hook is probably better. > > Jakub Thanks, all. Using the gimple_fold_builtin target hook works very well and is exactly what I'm looking for. I've reworked the patch to the much simpler https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00104.html. Much obliged for the help! Bill
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Tue, 1 Nov 2016, Yuri Rumyantsev wrote: > Hi All, > > I re-send all patches sent by Ilya earlier for review which support > vectorization of loop epilogues and loops with low trip count. We > assume that the only patch - vec-tails-07-combine-tail.patch - was not > approved by Jeff. > > I did re-base of all patches and performed bootstrapping and > regression testing that did not show any new failures. Also all > changes related to new vect_do_peeling algorithm have been changed > accordingly. > > Is it OK for trunk? I would have prefered that the series up to -03-nomask-tails would _only_ contain epilogue loop vectorization changes but unfortunately the patchset is oddly separated. I have a comment on that part nevertheless: @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) /* Check if we can possibly peel the loop. */ if (!vect_can_advance_ivs_p (loop_vinfo) || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) - || loop->inner) + || loop->inner + /* Required peeling was performed in prologue and +is not required for epilogue. */ + || LOOP_VINFO_EPILOGUE_P (loop_vinfo)) do_peeling = false; if (do_peeling @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) do_versioning = optimize_loop_nest_for_speed_p (loop) - && (!loop->inner); /* FORNOW */ + && (!loop->inner) /* FORNOW */ +/* Required versioning was performed for the + original loop and is not required for epilogue. */ + && !LOOP_VINFO_EPILOGUE_P (loop_vinfo); if (do_versioning) { please do that check in the single caller of this function. Otherwise I still dislike the new ->aux use and I believe that simply passing down info from the processed parent would be _much_ cleaner. That is, here (and avoid the FOR_EACH_LOOP change): @@ -580,12 +586,21 @@ vectorize_loops (void) && dump_enabled_p ()) dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location, "loop vectorized\n"); - vect_transform_loop (loop_vinfo); + new_loop = vect_transform_loop (loop_vinfo); num_vectorized_loops++; /* Now that the loop has been vectorized, allow it to be unrolled etc. */ loop->force_vectorize = false; + /* Add new loop to a processing queue. To make it easier + to match loop and its epilogue vectorization in dumps + put new loop as the next loop to process. */ + if (new_loop) + { + loops.safe_insert (i + 1, new_loop->num); + vect_loops_num = number_of_loops (cfun); + } simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop) function which will set up stuff properly (and also perform the if-conversion of the epilogue there). That said, if we can get in non-masked epilogue vectorization separately that would be great. I'm still torn about all the rest of the stuff and question its usability (esp. merging the epilogue with the main vector loop). But it has already been approved ... oh well. Thanks, Richard.
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 2, 2016 at 10:52 AM, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 10:40:35AM +0100, Richard Biener wrote: >> > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property >> > set during the asan pass and kept on until end of compilation of that >> > function. That means even if a var only addressable because of ASAN_MARK >> > is >> > discovered after this pass we'd still be able to rewrite it into SSA. >> >> Note that being TREE_ADDRESSABLE also has effects on alias analysis >> (didn't follow the patches to see whether you handle ASAN_MARK specially >> in points-to analysis and/or alias analysis). >> >> Generally in update-address-taken you can handle ASAN_MARK similar to >> MEM_REF (and drop it in the rewrite phase?). > > That is the intent, but we can't do that before the asan pass, because > otherwise as Martin explained we don't diagnose at runtime bugs where > a variable is used outside of its scope only through a MEM_REF. > So we need to wait for asan pass to actually add a real builtin call that > takes the address in that case. Except now I really don't see how that > can work for the case where the var is used only properly when it is in the > scope, because the asan pass will still see those being addressable. > > Unless I'm missing something we either need to perform further analysis > during the addressable subpass - this variable could be made > non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA > form there be any SSA uses of the poisoning ASAN_MARK? If yes, keep it > addressable, otherwise rewrite into SSA. > Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into > some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any > uses of those, rewrite it back into addressable immediately or later or > something. Or just give up optimizing (asan has a penalty anyway)? Or re-structure ASAN_POISON () similar to clobbers: var = ASAN_POISION (); that avoids the address-taking and thus should be less heavy-weight on optimizations. Richard. > > Jakub
Re: [RFC PATCH] expand_strn_compare should attempt expansion even if neither string is constant
On 10/27/2016 03:14 AM, Aaron Sawdey wrote: I'm currently working on a builtin expansion of strncmp for powerpc similar to the one for memcmp I checked recently. One thing I encountered is that the code in expand_strn_compare will not attempt to expand the cmpstrnsi pattern at all if neither string parameter is a constant string. This doesn't make a lot of sense in light of the fact that expand_str_compare starts with an attempt to expand cmpstrsi and then if that does not work, looks at the string args to see if one is constant so it can use cmpstrnsi with the known length. The attached patch is my attempt to add this fallback path to expand_strn_compare, i.e. if neither length is known, just attempt expansion of cmpstrnsi using the given 3 arguments. It looks like in addition to rs6000, there are 3 other targets that have cmpstrnsi patterns: i386, sh, and rx. Is this a reasonable approach to take with this? If so I'll bootstrap/regtest on i386 as rs6000 does not as yet have an expansion for cmpstrsi or cmpstrnsi. Just to be sure, this is superseded by your later patch series, correct? (After I saw this one I had been trying to remember what exactly the i386 issue was but it looks like you found it yourself.) Bernd
Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS
On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra wrote: > Jeff Law wrote: > >> I think you'll need to look at bz61320 before this could go in. > > I had a look, but there is nothing there that is related - eventually > a latent alignment bug was fixed in IVOpt. Note that the bswap phase > currently inserts unaligned accesses irrespectively of STRICT_ALIGNMENT > or SLOW_UNALIGNED_ACCESS: > > - if (bswap > - && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type)) > - && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align)) > - return false; > > If bswap is false no byte swap is needed, so we found a native endian load > and it will always perform the optimization by inserting an unaligned load. Yes, the general agreement is that the expander can do best and thus we should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS. The expander will generate the canonical best code (hopefully...). > This apparently works on all targets, and doesn't cause alignment traps or > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS. > So I'm at a loss what these macros are supposed to mean and how I can query > whether a backend supports fast unaligned access for a particular mode. > > What I actually want to write is something like: > > if (!FAST_UNALIGNED_LOAD (mode, align)) return false; > > And know that it only accepts unaligned accesses that are efficient on the > target. > Maybe we need a new hook like this and get rid of the old one? No, we don't need to other hook. Note there is another similar user in gimple-fold.c when folding small memcpy/memmove to single load/store pairs (patch posted but not applied by me -- I've asked for strict-align target maintainer feedback but got none). Now - for bswap I'm only 99% sure that unaligned load + bswap is better than piecewise loads plus manual swap. But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS / STRICT_ALIGNMENT checks from the GIMPLE side of the compiler. Richard. > Wilco >
Re: [ping * 4] PR35503 - warn for restrict
The format-checking parts of the patch are OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 01:36:31PM +0100, Richard Biener wrote: > > Unless I'm missing something we either need to perform further analysis > > during the addressable subpass - this variable could be made > > non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA > > form there be any SSA uses of the poisoning ASAN_MARK? If yes, keep it > > addressable, otherwise rewrite into SSA. > > Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into > > some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any > > uses of those, rewrite it back into addressable immediately or later or > > something. > > Or just give up optimizing (asan has a penalty anyway)? Or Well, asan has a penalty and -fsanitize-use-after-scope even bigger penalty, but the point is to make that penalty bearable. > re-structure ASAN_POISON () > similar to clobbers: > > var = ASAN_POISION (); > > that avoids the address-taking and thus should be less heavy-weight on > optimizations. Yeah, that is what I meant. The issue is how to report uses of such SSA_NAME when there is no memory. So, either we'd need a special runtime library entrypoint that would report uses after scope even when there is no underlying memory, or we'd need to force it at asan pass time into memory again. Jakub
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 2, 2016 at 1:56 PM, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 01:36:31PM +0100, Richard Biener wrote: >> > Unless I'm missing something we either need to perform further analysis >> > during the addressable subpass - this variable could be made >> > non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA >> > form there be any SSA uses of the poisoning ASAN_MARK? If yes, keep it >> > addressable, otherwise rewrite into SSA. >> > Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into >> > some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any >> > uses of those, rewrite it back into addressable immediately or later or >> > something. >> >> Or just give up optimizing (asan has a penalty anyway)? Or > > Well, asan has a penalty and -fsanitize-use-after-scope even bigger penalty, > but the point is to make that penalty bearable. > >> re-structure ASAN_POISON () >> similar to clobbers: >> >> var = ASAN_POISION (); >> >> that avoids the address-taking and thus should be less heavy-weight on >> optimizations. > > Yeah, that is what I meant. The issue is how to report uses of such > SSA_NAME when there is no memory. So, either we'd need a special runtime > library entrypoint that would report uses after scope even when there is no > underlying memory, or we'd need to force it at asan pass time into memory > again. Well, there can't be any uses outside the scope -- there are no (memory) uses left if we rewrite the thing into SSA. That is, the address can no longer "escape". Of course there could have been invalid uses before the rewrite into SSA. But those can be diagnosed either immediately before or after re-writing into SSA at compile-time (may be in dead code regions of course). Richard. > Jakub
Re: [ping * 4] PR35503 - warn for restrict
Then I'll approve the whole patch. On Wed, Nov 2, 2016 at 8:42 AM, Joseph Myers wrote: > The format-checking parts of the patch are OK. > > -- > Joseph S. Myers > jos...@codesourcery.com
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote: > > Yeah, that is what I meant. The issue is how to report uses of such > > SSA_NAME when there is no memory. So, either we'd need a special runtime > > library entrypoint that would report uses after scope even when there is no > > underlying memory, or we'd need to force it at asan pass time into memory > > again. > > Well, there can't be any uses outside the scope -- there are no (memory) uses > left if we rewrite the thing into SSA. That is, the address can no > longer "escape". > > Of course there could have been invalid uses before the rewrite into SSA. But > those can be diagnosed either immediately before or after re-writing into SSA > at compile-time (may be in dead code regions of course). Sure, we can warn on those at compile time, but we really should arrange to error on those at runtime if they are ever executed, the UB happens only at runtime, so in dead code isn't fatal. Jakub
[PATCH] Allow non-NULL offset for store-merging bases
The following teaches store-merging to handle non-NULL offset if the base is already addressable (otherwise introducing new pointers to a non-addressable base invalidates points-to information, see a comment in the patch how we could avoid this in theory). Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2016-11-02 Richard Biener * gimple-ssa-store-merging.c: Include gimplify-me.h. (imm_store_chain_info::output_merged_stores): Force base_addr to be proper GIMPLE for a MEM_REF address. (pass_store_merging::execute): Restrict negative bitpos handling to non-MEM_REF bases. Remove TREE_THIS_VOLATILE check. Take into account non-NULL_TREE offset if the base is already addressable. * gcc.dg/store_merging_8.c: New testcase. Index: gcc/gimple-ssa-store-merging.c === --- gcc/gimple-ssa-store-merging.c (revision 241789) +++ gcc/gimple-ssa-store-merging.c (working copy) @@ -125,6 +125,7 @@ #include "tree-cfg.h" #include "tree-eh.h" #include "target.h" +#include "gimplify-me.h" /* The maximum size (in bits) of the stores this pass should generate. */ #define MAX_STORE_BITSIZE (BITS_PER_WORD) @@ -1127,6 +1128,8 @@ imm_store_chain_info::output_merged_stor unsigned int i; bool fail = false; + tree addr = force_gimple_operand_1 (unshare_expr (base_addr), &seq, + is_gimple_mem_ref_addr, NULL_TREE); FOR_EACH_VEC_ELT (split_stores, i, split_store) { unsigned HOST_WIDE_INT try_size = split_store->size; @@ -1137,7 +1140,7 @@ imm_store_chain_info::output_merged_stor tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED); int_type = build_aligned_type (int_type, align); - tree dest = fold_build2 (MEM_REF, int_type, base_addr, + tree dest = fold_build2 (MEM_REF, int_type, addr, build_int_cst (offset_type, try_pos)); tree src = native_interpret_expr (int_type, @@ -1366,15 +1369,10 @@ pass_store_merging::execute (function *f &unsignedp, &reversep, &volatilep); /* As a future enhancement we could handle stores with the same base and offset. */ - bool invalid = offset || reversep || bitpos < 0 + bool invalid = reversep || ((bitsize > MAX_BITSIZE_MODE_ANY_INT) && (TREE_CODE (rhs) != INTEGER_CST)) -|| !rhs_valid_for_store_merging_p (rhs) - /* An access may not be volatile itself but base_addr may be - a volatile decl i.e. MEM[&volatile-decl]. The hashing for - tree_operand_hash won't consider such stores equal to each - other so we can't track chains on them. */ -|| TREE_THIS_VOLATILE (base_addr); +|| !rhs_valid_for_store_merging_p (rhs); /* We do not want to rewrite TARGET_MEM_REFs. */ if (TREE_CODE (base_addr) == TARGET_MEM_REF) @@ -1398,7 +1396,32 @@ pass_store_merging::execute (function *f /* get_inner_reference returns the base object, get at its address now. */ else - base_addr = build_fold_addr_expr (base_addr); + { + if (bitpos < 0) + invalid = true; + base_addr = build_fold_addr_expr (base_addr); + } + + if (! invalid + && offset != NULL_TREE) + { + /* If the access is variable offset then a base +decl has to be address-taken to be able to +emit pointer-based stores to it. +??? We might be able to get away with +re-using the original base up to the first +variable part and then wrapping that inside +a BIT_FIELD_REF. */ + tree base = get_base_address (base_addr); + if (! base + || (DECL_P (base) + && ! TREE_ADDRESSABLE (base))) + invalid = true; + else + base_addr = build2 (POINTER_PLUS_EXPR, + TREE_TYPE (base_addr), + base_addr, offset); + } struct imm_store_chain_info **chain_info = m_stores.get (base_addr); Index: gcc/testsuite/gcc.dg/store_merging_8.c === --- gcc/testsuite/gcc.dg/store_merging_8.c (revision 0) +++ gcc/testsuite/gcc.dg/store_merging_8.c (working copy) @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target non_stric
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 2, 2016 at 2:06 PM, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote: >> > Yeah, that is what I meant. The issue is how to report uses of such >> > SSA_NAME when there is no memory. So, either we'd need a special runtime >> > library entrypoint that would report uses after scope even when there is no >> > underlying memory, or we'd need to force it at asan pass time into memory >> > again. >> >> Well, there can't be any uses outside the scope -- there are no (memory) uses >> left if we rewrite the thing into SSA. That is, the address can no >> longer "escape". >> >> Of course there could have been invalid uses before the rewrite into SSA. >> But >> those can be diagnosed either immediately before or after re-writing into SSA >> at compile-time (may be in dead code regions of course). > > Sure, we can warn on those at compile time, but we really should arrange to > error on those at runtime if they are ever executed, the UB happens only at > runtime, so in dead code isn't fatal. Then we can replace those uses with a call into the asan runtime diagnosing the issue instead? Richard. > Jakub
[PATCH] libiberty: Fix -Wimplicit-fallthrough warnings.
Adjust some comments, add some explicit fall through comments or explicit returns where necessary to not get implicit-fallthrough warnings. All fall throughs were deliberate. In one case I added an explicit return false for clarity instead of falling through a default case (that also would return false). libiberty/ChangeLog: * cplus-dem.c (demangle_signature): Move fall through comment. (demangle_fund_type): Add fall through comment between 'G' and 'I'. * hashtab.c (iterative_hash): Add fall through comments. * regex.c (regex_compile): Add Fall through comment after '+'/'?'. (byte_re_match_2_internal): Add Fall through comment after jump_n. Change "Note fall through" to "Fall through". (common_op_match_null_string_p): Return false after set_number_at instead of fall through. --- diff --git a/libiberty/cplus-dem.c b/libiberty/cplus-dem.c index 7f63397..810e971 100644 --- a/libiberty/cplus-dem.c +++ b/libiberty/cplus-dem.c @@ -1658,8 +1658,8 @@ demangle_signature (struct work_stuff *work, break; } else - /* fall through */ {;} + /* fall through */ default: if (AUTO_DEMANGLING || GNU_DEMANGLING) @@ -4024,6 +4024,7 @@ demangle_fund_type (struct work_stuff *work, success = 0; break; } + /* fall through */ case 'I': (*mangled)++; if (**mangled == '_') diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c index 04607ea..99381b1 100644 --- a/libiberty/hashtab.c +++ b/libiberty/hashtab.c @@ -962,17 +962,17 @@ iterative_hash (const PTR k_in /* the key */, c += length; switch(len) /* all the case statements fall through */ { -case 11: c+=((hashval_t)k[10]<<24); -case 10: c+=((hashval_t)k[9]<<16); -case 9 : c+=((hashval_t)k[8]<<8); +case 11: c+=((hashval_t)k[10]<<24);/* fall through */ +case 10: c+=((hashval_t)k[9]<<16); /* fall through */ +case 9 : c+=((hashval_t)k[8]<<8); /* fall through */ /* the first byte of c is reserved for the length */ -case 8 : b+=((hashval_t)k[7]<<24); -case 7 : b+=((hashval_t)k[6]<<16); -case 6 : b+=((hashval_t)k[5]<<8); -case 5 : b+=k[4]; -case 4 : a+=((hashval_t)k[3]<<24); -case 3 : a+=((hashval_t)k[2]<<16); -case 2 : a+=((hashval_t)k[1]<<8); +case 8 : b+=((hashval_t)k[7]<<24); /* fall through */ +case 7 : b+=((hashval_t)k[6]<<16); /* fall through */ +case 6 : b+=((hashval_t)k[5]<<8); /* fall through */ +case 5 : b+=k[4]; /* fall through */ +case 4 : a+=((hashval_t)k[3]<<24); /* fall through */ +case 3 : a+=((hashval_t)k[2]<<16); /* fall through */ +case 2 : a+=((hashval_t)k[1]<<8); /* fall through */ case 1 : a+=k[0]; /* case 0: nothing left to add */ } diff --git a/libiberty/regex.c b/libiberty/regex.c index 9ffc3f4..6854e3b 100644 --- a/libiberty/regex.c +++ b/libiberty/regex.c @@ -2493,6 +2493,7 @@ PREFIX(regex_compile) (const char *ARG_PREFIX(pattern), if ((syntax & RE_BK_PLUS_QM) || (syntax & RE_LIMITED_OPS)) goto normal_char; + /* Fall through. */ handle_plus: case '*': /* If there is no previous pattern... */ @@ -6697,6 +6698,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp, { case jump_n: is_a_jump_n = true; + /* Fall through. */ case pop_failure_jump: case maybe_pop_jump: case jump: @@ -7125,7 +7127,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp, DEBUG_PRINT1 (" Match => jump.\n"); goto unconditional_jump; } -/* Note fall through. */ +/* Fall through. */ /* The end of a simple repeat has a pop_failure_jump back to @@ -7150,7 +7152,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp, dummy_low_reg, dummy_high_reg, reg_dummy, reg_dummy, reg_info_dummy); } - /* Note fall through. */ + /* Fall through. */ unconditional_jump: #ifdef _LIBC @@ -7453,6 +7455,7 @@ byte_re_match_2_internal (struct re_pattern_buffer *bufp, { case jump_n: is_a_jump_n = true; + /* Fall through. */ case maybe_pop_jump: case pop_failure_jump: case jump: @@ -7718,6 +7721,7 @@ PREFIX(common_op_match_null_string_p) (UCHAR_T **p, UCHAR_T *end, case set_number_at: p1 += 2 * OFFSET_ADDRESS_SIZE; + return false; default: /* All other opcodes mean we cannot match the empty string. */ -- 1.8.3.1
Re: [PATCH] libiberty: Fix -Wimplicit-fallthrough warnings.
On Wed, Nov 02, 2016 at 02:19:33PM +0100, Mark Wielaard wrote: > Adjust some comments, add some explicit fall through comments or explicit > returns where necessary to not get implicit-fallthrough warnings. > > All fall throughs were deliberate. In one case I added an explicit return > false for clarity instead of falling through a default case (that also > would return false). > > libiberty/ChangeLog: > >* cplus-dem.c (demangle_signature): Move fall through comment. >(demangle_fund_type): Add fall through comment between 'G' and 'I'. >* hashtab.c (iterative_hash): Add fall through comments. >* regex.c (regex_compile): Add Fall through comment after '+'/'?'. >(byte_re_match_2_internal): Add Fall through comment after jump_n. >Change "Note fall through" to "Fall through". >(common_op_match_null_string_p): Return false after set_number_at >instead of fall through. LGTM, except for: > --- a/libiberty/cplus-dem.c > +++ b/libiberty/cplus-dem.c > @@ -1658,8 +1658,8 @@ demangle_signature (struct work_stuff *work, > break; > } > else > - /* fall through */ > {;} > + /* fall through */ I think you should just remove the else and {;} and just have fallthrough comment indented where else used to be. Jakub
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On 01/11/16 16:48, Jason Merrill wrote: > It seems to me that a CFA_*expression note would never use target > UNSPEC codes, and a DWARF UNSPEC would never appear outside of such a > note, so we don't need to worry about conflicts. Indeed. DWARF UNSPEC is deeper inside DW_CFA_*expression note. My worry of conflict makes no sense. I updated the patch to put DWARF operation in to UNSPEC number field. x86-64 bootstrap OK, no regression on gcc/g++. Please review. Thanks. gcc/ 2016-11-02 Jiong Wang * reg-notes.def (CFA_VAL_EXPRESSION): New entry. * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function. (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION. (output_cfa_loc): Support DW_CFA_val_expression. (output_cfa_loc_raw): Likewise. (output_cfi): Likewise. (output_cfi_directive): Likewise. * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression. (dw_cfi_oprnd2_desc): Likewise. (mem_loc_descriptor): Recognize new pattern generated for value expression. diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 6491d5aaf4c4a21241cc718bfff1016f6d149951..b8c88fbae1df80a2664a414d8ae016a5343bf435 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set) reg_save (sregno, dregno, 0); } -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ static void dwarf2out_frame_debug_cfa_expression (rtx set) @@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set) update_row_reg_save (cur_row, regno, cfi); } +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION + note. */ + +static void +dwarf2out_frame_debug_cfa_val_expression (rtx set) +{ + rtx dest = SET_DEST (set); + gcc_assert (REG_P (dest)); + + rtx span = targetm.dwarf_register_span (dest); + gcc_assert (!span); + + rtx src = SET_SRC (set); + dw_cfi_ref cfi = new_cfi (); + cfi->dw_cfi_opc = DW_CFA_val_expression; + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest); + cfi->dw_cfi_oprnd2.dw_cfi_loc += mem_loc_descriptor (src, GET_MODE (src), + GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); + add_cfi (cfi); + update_row_reg_save (cur_row, dwf_regno (dest), cfi); +} + /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. */ static void @@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn) break; case REG_CFA_EXPRESSION: + case REG_CFA_VAL_EXPRESSION: n = XEXP (note, 0); if (n == NULL) n = single_set (insn); - dwarf2out_frame_debug_cfa_expression (n); + + if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION) + dwarf2out_frame_debug_cfa_expression (n); + else + dwarf2out_frame_debug_cfa_val_expression (n); + handled_one = true; break; @@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh); @@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1); @@ -3188,6 +3219,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref fde, int for_eh) case DW_CFA_def_cfa_expression: case DW_CFA_expression: + case DW_CFA_val_expression: output_cfa_loc (cfi, for_eh); break; @@ -3302,16 +3334,13 @@ output_cfi_directive (FILE *f, dw_cfi_ref cfi) break; case DW_CFA_def_cfa_expression: - if (f != asm_out_file) - { - fprintf (f, "\t.cfi_def_cfa_expression ...\n"); - break; - } - /* FALLTHRU */ case DW_CFA_expression: +case DW_CFA_val_expression: if (f != asm_out_file) { - fprintf (f, "\t.cfi_cfa_expression ...\n"); + fprintf (f, "\t.cfi_%scfa_%sexpression ...\n", + cfi->dw_cfi_opc == DW_CFA_def_cfa_expression ? "def_" : "", + cfi->dw_cfi_opc == DW_CFA_val_expression ? "val_" : ""); break; } fprintf (f, "\t.cfi_escape %#x,", cfi->dw_cfi_opc); diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 4a3df339df2c6a6816ac8b8dbdb2466a7492c592..7dac70d7392f2c457ffd3f677e07decb1ba488a1 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -518,6 +518,7 @@ dw_cfi_oprnd1_desc (enum dwarf_call_frame_info cfi) case DW_CFA_def_cfa_register: case DW_CFA_register: case DW_CFA_expression: +case DW_CFA_val_expression: return dw_cfi_oprnd_reg_num; case DW_CFA_def_cfa_offset: @@ -551,6 +552,7 @@ dw_cfi_oprnd2_desc (enum dwarf_
Re: [PATCH 1/2, libgcc]: Implement _divmoddi4
On Mon, Oct 31, 2016 at 7:46 AM, Uros Bizjak wrote: > This function will be used in a follow-up patch to implement > TARGET_EXPAND_DIVMOD_LIBFUNC for x86 targets. Other targets can call > this function, so IMO it should be part of a generic library. > > 2016-10-31 Uros Bizjak > > * Makefile.in (LIB2_DIVMOD_FUNCS): Add _divmoddi4. > * libgcc2.c (__divmoddi4): New function. > * libgcc2.h (__divmoddi4): Declare. > * libgcc-std.ver.in (GCC_7.0.0): New. Add __PFX_divmoddi4 > and __PFX_divmodti4. You aren't defining divmodti4 anywhere, so it seems premature to add it to libgcc-std.ver.in. Other than that the patch is OK. Thanks. Ian
Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsincs.
On 2 November 2016 at 12:22, Bin.Cheng wrote: > On Tue, Nov 1, 2016 at 12:21 PM, Kyrill Tkachov > wrote: >> Hi Tamar, >> >> >> On 26/10/16 16:01, Tamar Christina wrote: >>> >>> Hi Christophe, >>> >>> Here's the updated patch. >>> >>> Cheers, >>> Tamar >>> >>> From: Christophe Lyon >>> Sent: Wednesday, October 19, 2016 11:23:56 AM >>> To: Tamar Christina >>> Cc: GCC Patches; Kyrylo Tkachov; nd >>> Subject: Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and >>> vminnmQ_ST intrinsincs. >>> >>> On 19 October 2016 at 11:36, Tamar Christina >>> wrote: Hi All, This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. The current builtin registration code is deficient since it can't access standard pattern names, to which vmaxnmQ_ST and vminnmQ_ST map directly. Thus, to enable the vectoriser to have access to these intrinsics, we implement them using builtin functions, which we expand to the proper standard pattern using a define_expand. This patch also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro, which is defined when __ARM_ARCH >= 8, and which enables the intrinsics. Regression tested on arm-none-eabi and no regressions. This patch is a rework of a previous patch: https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01971.html OK for trunk? > These cases failed on arm-none-linux-gnueabihf as below: > FAIL: gcc.target/arm/simd/vmaxnm_f32_1.c execution test > FAIL: gcc.target/arm/simd/vmaxnmq_f32_1.c execution test > FAIL: gcc.target/arm/simd/vminnm_f32_1.c execution test > FAIL: gcc.target/arm/simd/vminnmq_f32_1.c execution test > > For such changes, I would suggest reg test for both bare-metal and > linux toolchains, plus a bootstrap for linux toolchain. > Hi, I confirm some tests are failing: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241736/report-build-info.html Sorry I couldn't answer/test before you committed, I was on holidays. Christophe > Thanks, > bin > >> >> >> Ok. >> Thanks, >> Kyrill >> >> Thanks, Tamar --- gcc/ 2016-10-19 Bilyan Borisov Tamar Christina * config/arm/arm-c.c (arm_cpu_builtins): New macro definition. * config/arm/arm_neon.h (vmaxnm_f32): New intrinsinc. (vmaxnmq_f32): Likewise. (vminnm_f32): Likewise. (vminnmq_f32): Likewise. * config/arm/arm_neon_builtins.def (vmaxnm): New builtin. (vminnm): Likewise. * config/arm/neon.md (neon_, VCVTF): New expander. gcc/testsuite/ 2016-10-19 Bilyan Borisov * gcc.target/arm/simd/vmaxnm_f32_1.c: New. * gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise. * gcc.target/arm/simd/vminnm_f32_1.c: Likewise. * gcc.target/arm/simd/vminnmq_f32_1.c: Likewise. >>> I think you forgot to attach the new tests. >>> >>> Christophe >>> >>
Re: [RFC PATCH] expand_strn_compare should attempt expansion even if neither string is constant
On Wed, 2016-11-02 at 13:41 +0100, Bernd Schmidt wrote: > On 10/27/2016 03:14 AM, Aaron Sawdey wrote: > > > > I'm currently working on a builtin expansion of strncmp for powerpc > > similar to the one for memcmp I checked recently. One thing I > > encountered is that the code in expand_strn_compare will not > > attempt to > > expand the cmpstrnsi pattern at all if neither string parameter is > > a > > constant string. This doesn't make a lot of sense in light of the > > fact > > that expand_str_compare starts with an attempt to expand cmpstrsi > > and > > then if that does not work, looks at the string args to see if one > > is > > constant so it can use cmpstrnsi with the known length. > > > > The attached patch is my attempt to add this fallback path to > > expand_strn_compare, i.e. if neither length is known, just attempt > > expansion of cmpstrnsi using the given 3 arguments. > > > > It looks like in addition to rs6000, there are 3 other targets that > > have cmpstrnsi patterns: i386, sh, and rx. > > > > Is this a reasonable approach to take with this? If so I'll > > bootstrap/regtest on i386 as rs6000 does not as yet have an > > expansion > > for cmpstrsi or cmpstrnsi. > > Just to be sure, this is superseded by your later patch series, > correct? > > (After I saw this one I had been trying to remember what exactly the > i386 issue was but it looks like you found it yourself.) > > > Bernd Yes, the later patch series replaces this preliminary patch. And yes, the i386 issue took some headscratching and careful reading of the arch manual on what repz cmpsb actually does. Thanks, Aaron -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain
Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
On Wed, Nov 02, 2016 at 11:39:20AM +0100, Steven Bosscher wrote: > On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener > wrote: > > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote: > >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote: > >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote: > >>> > + cfg_layout_finalize (); > >>> > >>> I don't think you have to go into/out-of cfglayout mode for each > >>> iteration. > >> > >> Yeah probably. I was too lazy :-) It needs the cleanup_cfg every > >> iteration though, I was afraid that interacts. > > > > Ick. Why would it need a cfg-cleanup every iteration? > > I don't believe it needs a cleanup on every iteration. One cleanup at > the end should work fine. But as the comment there says: /* Merge the duplicated blocks into predecessors, when possible. */ cleanup_cfg (0); (this is not a new comment), and without merging blocks this whole patch does zilch. There is no need to create new basic blocks at all (can replace the final branch in a block with a copy of the whole block it jumps to, instead); and then it is painfully obvious that switching to layout mode here is pointless, too. Do you want me to do that? Btw, this isn't quadratic anyway; it is a constant number (the maximal length allowed of the block to copy) linear. Unless there are unboundedly many forwarder blocks, which there shouldn't be, but I cannot prove that. And on a testcase with 2000 cases (instead of the 4 in the testcase in the PR) this pass takes less than 1% of the compiler runtime; and in the normal cases (no computed gotos to unfactor) it does the same as before. Segher
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote: > -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. > */ > +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. > */ Too long line. > +/* RTL sequences inside PARALLEL are raw expression representation. > + > + mem_loc_descriptor can be used to build generic DWARF expressions for > + DW_CFA_expression and DW_CFA_val_expression where the expression may > can > + not be represented using normal RTL sequences. In this case, group > all > + expression operations (DW_OP_*) inside a PARALLEL. For those DW_OP > which > + doesn't have RTL mapping, wrap it using UNSPEC. The logic for parsing > + PARALLEL sequences is: > + > + foreach elem inside PARALLEL > + if (elem is UNSPEC) > + dw_op = XINT (elem, 1) (DWARF operation is kept as UNSPEC number) > + oprnd1 = XVECEXP (elem, 0, 0) > + oprnd2 = XVECEXP (elem, 0, 1) > + else > + call mem_loc_descriptor */ Not sure if it is a good idea to document in weirdly formatted pseudo-language what the code actually does a few lines below. IMHO either express it in words, or don't express it at all. > + exp_result = > + new_loc_descr ((enum dwarf_location_atom) dw_op, oprnd1, > + oprnd2); Wrong formatting, = should be on the next line. > + } > + else > + exp_result = > + mem_loc_descriptor (elem, mode, mem_mode, > + VAR_INIT_STATUS_INITIALIZED); Likewise. Jakub
Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS
Richard Biener wrote: On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra wrote: > > If bswap is false no byte swap is needed, so we found a native endian load > > and it will always perform the optimization by inserting an unaligned load. > > Yes, the general agreement is that the expander can do best and thus we > should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS. > The expander will generate the canonical best code (hopefully...). Right, but there are cases where you have to choose between unaligned or aligned accesses and you need to know whether the unaligned access is fast. A good example is memcpy expansion, if you have fast unaligned accesses then you should use them to deal with the last few bytes, but if they get expanded, using several aligned accesses is much faster than a single unaligned access. > > This apparently works on all targets, and doesn't cause alignment traps or > > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS. > > So I'm at a loss what these macros are supposed to mean and how I can query > > whether a backend supports fast unaligned access for a particular mode. > > > > What I actually want to write is something like: > > > > if (!FAST_UNALIGNED_LOAD (mode, align)) return false; > > > > And know that it only accepts unaligned accesses that are efficient on the > > target. > > Maybe we need a new hook like this and get rid of the old one? > > No, we don't need to other hook. > > Note there is another similar user in gimple-fold.c when folding small > memcpy/memmove > to single load/store pairs (patch posted but not applied by me -- I've > asked for strict-align > target maintainer feedback but got none). I didn't find it, do you have a link? > Now - for bswap I'm only 99% sure that unaligned load + bswap is > better than piecewise loads plus manual swap. It depends on whether unaligned loads and bswap are expanded or not. Even if we assume the expansion is at least as efficient as doing it explicitly (definitely true for modes larger than the native integer size - as we found out in PR77308!), if both the unaligned load and bswap are expanded it seems better not to make the transformation for modes up to the word size. But there is no way to find out as SLOW_UNALIGNED_ACCESS must be true whenever STRICT_ALIGN is true. > But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS / > STRICT_ALIGNMENT checks from the GIMPLE side of the compiler. I sort of agree because the purpose of these macros is unclear - the documentation is insufficient and out of date. I do believe however we need an accurate way to find out whether a target supports fast unaligned accesses as that is required to generate good target code. Wilco
Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
On Wed, Nov 2, 2016 at 2:40 PM, Segher Boessenkool wrote: > On Wed, Nov 02, 2016 at 11:39:20AM +0100, Steven Bosscher wrote: >> On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener >> wrote: >> > On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote: >> >> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote: >> >>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote: >> >>> > + cfg_layout_finalize (); >> >>> >> >>> I don't think you have to go into/out-of cfglayout mode for each >> >>> iteration. >> >> >> >> Yeah probably. I was too lazy :-) It needs the cleanup_cfg every >> >> iteration though, I was afraid that interacts. >> > >> > Ick. Why would it need a cfg-cleanup every iteration? >> >> I don't believe it needs a cleanup on every iteration. One cleanup at >> the end should work fine. > > But as the comment there says: > > /* Merge the duplicated blocks into predecessors, when possible. */ > cleanup_cfg (0); > > (this is not a new comment), and without merging blocks this whole > patch does zilch. > > There is no need to create new basic blocks at all (can replace the > final branch in a block with a copy of the whole block it jumps to, > instead); and then it is painfully obvious that switching to layout > mode here is pointless, too. > > Do you want me to do that? > > Btw, this isn't quadratic anyway; it is a constant number (the maximal > length allowed of the block to copy) linear. Unless there are unboundedly > many forwarder blocks, which there shouldn't be, but I cannot prove that. Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that walk the whole function. So unless the number of iterations is bound with a constant I call this quadratic (in function size). > And on a testcase with 2000 cases (instead of the 4 in the testcase in > the PR) this pass takes less than 1% of the compiler runtime; and in > the normal cases (no computed gotos to unfactor) it does the same as > before. > > > Segher
Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS
On Wed, Nov 2, 2016 at 2:43 PM, Wilco Dijkstra wrote: > Richard Biener wrote: > On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra > wrote: > >> > If bswap is false no byte swap is needed, so we found a native endian load >> > and it will always perform the optimization by inserting an unaligned load. >> >> Yes, the general agreement is that the expander can do best and thus we >> should canonicalize accesses to larger ones even for SLOW_UNALIGNED_ACCESS. >> The expander will generate the canonical best code (hopefully...). > > Right, but there are cases where you have to choose between unaligned or > aligned > accesses and you need to know whether the unaligned access is fast. > > A good example is memcpy expansion, if you have fast unaligned accesses then > you > should use them to deal with the last few bytes, but if they get expanded, > using several > aligned accesses is much faster than a single unaligned access. Yes. That's RTL expansion at which point you of course have to look at SLOW_UNALIGNED_ACCESS. >> > This apparently works on all targets, and doesn't cause alignment traps or >> > huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS. >> > So I'm at a loss what these macros are supposed to mean and how I can query >> > whether a backend supports fast unaligned access for a particular mode. >> > >> > What I actually want to write is something like: >> > >> > if (!FAST_UNALIGNED_LOAD (mode, align)) return false; >> > >> > And know that it only accepts unaligned accesses that are efficient on the >> > target. >> > Maybe we need a new hook like this and get rid of the old one? >> >> No, we don't need to other hook. >> >> Note there is another similar user in gimple-fold.c when folding small >> memcpy/memmove >> to single load/store pairs (patch posted but not applied by me -- I've >> asked for strict-align >> target maintainer feedback but got none). > > I didn't find it, do you have a link? https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00598.html >> Now - for bswap I'm only 99% sure that unaligned load + bswap is >> better than piecewise loads plus manual swap. > > It depends on whether unaligned loads and bswap are expanded or not. Even if > we > assume the expansion is at least as efficient as doing it explicitly > (definitely true > for modes larger than the native integer size - as we found out in PR77308!), > if both the unaligned load and bswap are expanded it seems better not to make > the > transformation for modes up to the word size. But there is no way to find out > as > SLOW_UNALIGNED_ACCESS must be true whenever STRICT_ALIGN is true. The case I was thinking about is availability of a bswap load operating only on aligned memory and "regular" register bswap being "fake" provided by first spilling to an aligned stack slot and then loading from that. Maybe a bit far-fetched. >> But generally I'm always in favor of removing SLOW_UNALIGNED_ACCESS / >> STRICT_ALIGNMENT checks from the GIMPLE side of the compiler. > > I sort of agree because the purpose of these macros is unclear - the > documentation > is insufficient and out of date. I do believe however we need an accurate way > to find out > whether a target supports fast unaligned accesses as that is required to > generate good > target code. I believe the target macros are solely for RTL expansion and say that it has to avoid unaligned ops as those would trap. Richard. > Wilco
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
On 02/11/16 13:42, Jakub Jelinek wrote: On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote: -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ Too long line. Hmm, it shows 80 columns under my editor. I guess '+' is counted in? +/* RTL sequences inside PARALLEL are raw expression representation. + + mem_loc_descriptor can be used to build generic DWARF expressions for + DW_CFA_expression and DW_CFA_val_expression where the expression may can + not be represented using normal RTL sequences. In this case, group all + expression operations (DW_OP_*) inside a PARALLEL. For those DW_OP which + doesn't have RTL mapping, wrap it using UNSPEC. The logic for parsing + PARALLEL sequences is: + + foreach elem inside PARALLEL + if (elem is UNSPEC) + dw_op = XINT (elem, 1) (DWARF operation is kept as UNSPEC number) + oprnd1 = XVECEXP (elem, 0, 0) + oprnd2 = XVECEXP (elem, 0, 1) + else + call mem_loc_descriptor */ Not sure if it is a good idea to document in weirdly formatted pseudo-language what the code actually does a few lines below. IMHO either express it in words, or don't express it at all. OK, fixed. I replaced these comments as some brief words. + exp_result = + new_loc_descr ((enum dwarf_location_atom) dw_op, oprnd1, +oprnd2); Wrong formatting, = should be on the next line. + } + else + exp_result = + mem_loc_descriptor (elem, mode, mem_mode, + VAR_INIT_STATUS_INITIALIZED); Likewise. Both fixed. Patch updated, please review. Thanks. gcc/ 2016-11-02 Jiong Wang * reg-notes.def (CFA_VAL_EXPRESSION): New entry. * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New function. (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION. (output_cfa_loc): Support DW_CFA_val_expression. (output_cfa_loc_raw): Likewise. (output_cfi): Likewise. (output_cfi_directive): Likewise. * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression. (dw_cfi_oprnd2_desc): Likewise. (mem_loc_descriptor): Recognize new pattern generated for value expression. commit 36de0173c17efcc30c56ef10304377e71313e8bc Author: Jiong Wang Date: Wed Oct 19 15:42:04 2016 +0100 dwarf val expression diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 6491d5a..b8c88fb 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1235,7 +1235,7 @@ dwarf2out_frame_debug_cfa_register (rtx set) reg_save (sregno, dregno, 0); } -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */ static void dwarf2out_frame_debug_cfa_expression (rtx set) @@ -1267,6 +1267,29 @@ dwarf2out_frame_debug_cfa_expression (rtx set) update_row_reg_save (cur_row, regno, cfi); } +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_VAL_EXPRESSION + note. */ + +static void +dwarf2out_frame_debug_cfa_val_expression (rtx set) +{ + rtx dest = SET_DEST (set); + gcc_assert (REG_P (dest)); + + rtx span = targetm.dwarf_register_span (dest); + gcc_assert (!span); + + rtx src = SET_SRC (set); + dw_cfi_ref cfi = new_cfi (); + cfi->dw_cfi_opc = DW_CFA_val_expression; + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (dest); + cfi->dw_cfi_oprnd2.dw_cfi_loc += mem_loc_descriptor (src, GET_MODE (src), + GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); + add_cfi (cfi); + update_row_reg_save (cur_row, dwf_regno (dest), cfi); +} + /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. */ static void @@ -2033,10 +2056,16 @@ dwarf2out_frame_debug (rtx_insn *insn) break; case REG_CFA_EXPRESSION: + case REG_CFA_VAL_EXPRESSION: n = XEXP (note, 0); if (n == NULL) n = single_set (insn); - dwarf2out_frame_debug_cfa_expression (n); + + if (REG_NOTE_KIND (note) == REG_CFA_EXPRESSION) + dwarf2out_frame_debug_cfa_expression (n); + else + dwarf2out_frame_debug_cfa_val_expression (n); + handled_one = true; break; @@ -3015,7 +3044,8 @@ output_cfa_loc (dw_cfi_ref cfi, int for_eh) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { unsigned r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, for_eh); @@ -3041,7 +3071,8 @@ output_cfa_loc_raw (dw_cfi_ref cfi) dw_loc_descr_ref loc; unsigned long size; - if (cfi->dw_cfi_opc == DW_CFA_expression) + if (cfi->dw_cfi_opc == DW_CFA_expression + || cfi->dw_cfi_opc == DW_CFA_val_expression) { u
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 11:10:53AM +0100, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote: > > > Which is gimplified as: > > > > > > int * ptr; > > > > > > switch (argc) , case 1: > > > > { > > > int a; > > > > > > try > > > { > > > ASAN_MARK (2, &a, 4); > > > : > > > goto ; > > > : > > > ptr = &a; > > > goto ; > > > } > > > finally > > > { > > > ASAN_MARK (1, &a, 4); > > > } > > > Shouldn't there be also ASAN_MARK on the implicit gotos from the switch > > statement? Otherwise, consider this being done in a loop, after the first > > iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with > > args 1 and have in case 1: a = 0;, won't that trigger runtime error? > > Wonder if for the variables declared inside of switch body, because we don't > care about uses before scope, it wouldn't be more efficient to arrange for > int *p, *q, *r; > switch (x) > { > int a; > case 1: > p = &a; > a = 5; > break; > int b; > case 2: > int c; > q = &b; > r = &c; > b = 3; > c = 4; > break; > } > to effectively ASAN_MARK (2, &a, 4); ASAN_MARK (2, &b, 4); ASAN_MARK (2, &c, > 4); > before the GIMPLE_SWITCH, instead of unpoisoning them on every case label > where they might be in scope. Though, of course, at least until lower pass > that is quite ugly, because it would refer to "not yet declared" variables. > Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not > the expression evaluation of the switch control expression) inside of the > switches' GIMPLE_BIND. So is there anything I should do wrt -Wswitch-unreachable? Marek
[PATCH 2/4] Split prefix-map values on the last '=' character, not the first
We are planning to ask other tools to support SOURCE_PREFIX_MAP, in the same way that we have already done this for SOURCE_DATE_EPOCH. So, this will last for some time and have quite wide reach. Consequently, we believe it is better to split on the final '=', since it is much less likely to result in problems. For example, with the previous behaviour (splitting on the first '=') it would not be possible to map paths like the following: C:\Documents and Settings\Betrand Russell\Proofs of 1+1=2\Automated Proofs\Source Code\ /srv/net/distributed-hash-table/address/VaIWP8YIWDChR2O76yiufXBsbw5g2skB_kp9VP-qVLvydovdGw==/projects/gcc-6/ These are contrived examples, but hopefully you can agree that they're not *so* unrealistic - future software or users might plausibly wish to run reproducible build processes underneath paths similar to these. On the other hand, I can think of much fewer cases where the new-prefix *must* include a '=' character. I can't think of any software project that includes it, and I'd imagine that any such (or future) projects that might exist would already have standardised "ASCII-only" versions of its name. ChangeLogs -- gcc/ChangeLog: 2016-11-01 Ximin Luo * final.c: (add_debug_prefix_map): Split on the last and not first '='. * doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to describe new parsing. Index: gcc-7-20161030/gcc/final.c === --- gcc-7-20161030.orig/gcc/final.c +++ gcc-7-20161030/gcc/final.c @@ -1528,7 +1528,7 @@ add_debug_prefix_map (const char *arg) debug_prefix_map *map; const char *p; - p = strchr (arg, '='); + p = strrchr (arg, '='); if (!p) { error ("invalid value %qs for debug-prefix-map", arg); Index: gcc-7-20161030/gcc/doc/invoke.texi === --- gcc-7-20161030.orig/gcc/doc/invoke.texi +++ gcc-7-20161030/gcc/doc/invoke.texi @@ -26233,8 +26233,8 @@ output by higher-level build processes. The form and behaviour is similar to @option{-fdebug-prefix-map}. That is, the value of @env{SOURCE_PREFIX_MAP} must be of the form -@samp{@var{old}=@var{new}}. The split occurs on the first @code{=} -character, so that @var{old} cannot itself contain a @code{=}. +@samp{@var{old}=@var{new}}. The split occurs on the last @code{=} +character, so that @var{new} cannot itself contain a @code{=}. Whenever an absolute source- or build-related filepath is to be emitted in a final end-result output, GCC will replace @var{old} with @var{new}
[PATCH 4/4] Use SOURCE_DATE_EPOCH in place of __TIMESTAMP__ if the latter is newer.
This brings the behaviour in line with the __DATE__ and __TIME__ macros. Note though the minor difference: __TIMESTAMP__ is defined as the file-modification date and not the "current" date or time like the other two macros. Therefore, we do a clamping behaviour (similar to tar --clamp-mtime). Acknowledgements Reiner Herrmann suggested the clamping behaviour. ChangeLogs -- libcpp/ChangeLog: 2016-11-01 Ximin Luo * macro.c (_cpp_builtin_macro_text): Use SOURCE_DATE_EPOCH in place of __TIMESTAMP__ if the latter is newer than the former. gcc/ChangeLog: 2016-11-01 Ximin Luo * doc/cppenv.texi (Environment Variables): Update SOURCE_DATE_EPOCH to describe the new effect on __TIMESTAMP__. gcc/testsuite/ChangeLog: 2016-11-01 Ximin Luo * gcc.dg/cpp/source_date_epoch-4.c: New test. * gcc.dg/cpp/source_date_epoch-5.c: New test. Index: gcc-7-20161030/libcpp/macro.c === --- gcc-7-20161030.orig/libcpp/macro.c +++ gcc-7-20161030/libcpp/macro.c @@ -264,7 +264,30 @@ _cpp_builtin_macro_text (cpp_reader *pfi struct tm *tb = NULL; struct stat *st = _cpp_get_file_stat (file); if (st) - tb = localtime (&st->st_mtime); + { + /* Set a reproducible timestamp for __DATE__ and __TIME__ macro + if SOURCE_DATE_EPOCH is defined. */ + if (pfile->source_date_epoch == (time_t) -2 + && pfile->cb.get_source_date_epoch != NULL) + pfile->source_date_epoch = pfile->cb.get_source_date_epoch (pfile); + + if (pfile->source_date_epoch >= (time_t) 0) + { + /* If SOURCE_DATE_EPOCH is set, we want reproducible + timestamps so use gmtime not localtime. */ + if (st->st_mtime >= pfile->source_date_epoch) + tb = gmtime (&pfile->source_date_epoch); + else + /* Don't use SOURCE_DATE_EPOCH if the timestamp is +older, since that means it was probably already +set in a reproducible way (e.g. via source tarball +extraction or some other method). */ + tb = gmtime (&st->st_mtime); + } + else + tb = localtime (&st->st_mtime); + } + if (tb) { char *str = asctime (tb); Index: gcc-7-20161030/gcc/doc/cppenv.texi === --- gcc-7-20161030.orig/gcc/doc/cppenv.texi +++ gcc-7-20161030/gcc/doc/cppenv.texi @@ -83,8 +83,9 @@ main input file is omitted. @item SOURCE_DATE_EPOCH If this variable is set, its value specifies a UNIX timestamp to be used in replacement of the current date and time in the @code{__DATE__} -and @code{__TIME__} macros, so that the embedded timestamps become -reproducible. +and @code{__TIME__} macros, and in replacement of the file modification +date in the @code{__TIMESTAMP__} macro if the latter is newer than the +former, so that the embedded timestamps become reproducible. The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp, defined as the number of seconds (excluding leap seconds) since Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-4.c @@ -0,0 +1,13 @@ +/* __TIMESTAMP__ should use SOURCE_DATE_EPOCH if the latter is older. */ +/* { dg-do run } */ +/* { dg-set-compiler-env-var TZ "UTC" } */ +/* { dg-set-compiler-env-var LC_ALL "C" } */ +/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "0" } */ + +int +main () +{ + if (__builtin_strcmp (__TIMESTAMP__, "Thu Jan 1 00:00:00 1970") != 0) +__builtin_abort (); + return 0; +} Index: gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/cpp/source_date_epoch-5.c @@ -0,0 +1,13 @@ +/* __TIMESTAMP__ should not use SOURCE_DATE_EPOCH if the latter is newer. */ +/* { dg-do run } */ +/* { dg-set-compiler-env-var TZ "UTC" } */ +/* { dg-set-compiler-env-var LC_ALL "C" } */ +/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "253402300799" } */ + +int +main () +{ + if (__builtin_strcmp (__TIMESTAMP__, "Fri Dec 31 23:59:59 UTC ") == 0) +__builtin_abort (); + return 0; +}
[PATCH 1/4] Use SOURCE_PREFIX_MAP envvar as an implicit debug-prefix-map
Define the SOURCE_PREFIX_MAP environment variable, and treat it as an implicit CLI -fdebug-prefix-map option specified before any explicit such options. Acknowledgements Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of time figuring out what to edit. HW42 for discussion on the details of the proposal, and for suggesting that we retain the ability to map the prefix to something other than ".". ChangeLogs -- gcc/ChangeLog: 2016-11-01 Ximin Luo * opts-global.c (add_debug_prefix_map_from_envvar): Add the value of SOURCE_PREFIX_MAP as a debug_prefix_map if the former is set. (handle_common_deferred_options): Call add_debug_prefix_map_from_envvar before processing options. * final.c: (add_debug_prefix_map): Be less specific in the error message, since it can also be triggered by the SOURCE_PREFIX_MAP variable. * doc/invoke.texi (Environment Variables): Document form and behaviour of SOURCE_PREFIX_MAP. gcc/testsuite/ChangeLog: 2016-11-01 Ximin Luo * gcc.dg/debug/dwarf2/source_prefix_map-1.c: New test. * gcc.dg/debug/dwarf2/source_prefix_map-2.c: New test. Index: gcc-7-20161030/gcc/opts-global.c === --- gcc-7-20161030.orig/gcc/opts-global.c +++ gcc-7-20161030/gcc/opts-global.c @@ -310,6 +310,21 @@ decode_options (struct gcc_options *opts finish_options (opts, opts_set, loc); } +/* Add a debug-prefix-map using the SOURCE_PREFIX_MAP environment variable if + it is set. */ + +static void +add_debug_prefix_map_from_envvar () +{ + char *prefix_map; + + prefix_map = getenv ("SOURCE_PREFIX_MAP"); + if (!prefix_map) +return; + + add_debug_prefix_map (prefix_map); +} + /* Hold command-line options associated with stack limitation. */ const char *opt_fstack_limit_symbol_arg = NULL; int opt_fstack_limit_register_no = -1; @@ -335,6 +350,8 @@ handle_common_deferred_options (void) if (flag_opt_info) opt_info_switch_p (NULL); + add_debug_prefix_map_from_envvar (); + FOR_EACH_VEC_ELT (v, i, opt) { switch (opt->opt_index) Index: gcc-7-20161030/gcc/final.c === --- gcc-7-20161030.orig/gcc/final.c +++ gcc-7-20161030/gcc/final.c @@ -1531,7 +1531,7 @@ add_debug_prefix_map (const char *arg) p = strchr (arg, '='); if (!p) { - error ("invalid argument %qs to -fdebug-prefix-map", arg); + error ("invalid value %qs for debug-prefix-map", arg); return; } map = XNEW (debug_prefix_map); Index: gcc-7-20161030/gcc/doc/invoke.texi === --- gcc-7-20161030.orig/gcc/doc/invoke.texi +++ gcc-7-20161030/gcc/doc/invoke.texi @@ -26222,6 +26222,23 @@ Recognize EUCJP characters. If @env{LANG} is not defined, or if it has some other value, then the compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale to recognize and translate multibyte characters. + +@item SOURCE_PREFIX_MAP If this variable is set, it specifies a mapping +that is used to transform filepaths that are output in debug symbols. +This helps the embedded paths become reproducible, without having the +unreproducible value be visible in other input sources - such as GCC +command-line flags or standardised build-time environment variables like +@code{CFLAGS} - that are commonly legitimately-embedded in the build +output by higher-level build processes. + +The form and behaviour is similar to @option{-fdebug-prefix-map}. That +is, the value of @env{SOURCE_PREFIX_MAP} must be of the form +@samp{@var{old}=@var{new}}. The split occurs on the first @code{=} +character, so that @var{old} cannot itself contain a @code{=}. + +Whenever an absolute source- or build-related filepath is to be emitted +in a final end-result output, GCC will replace @var{old} with @var{new} +if that filepath starts with @var{old}. @end table @noindent Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-1.c @@ -0,0 +1,9 @@ +/* DW_AT_comp_dir should be relative if SOURCE_PREFIX_MAP is a prefix of it. */ +/* { dg-do compile } */ +/* { dg-options "-gdwarf -dA" } */ +/* { dg-set-compiler-env-var SOURCE_PREFIX_MAP "[file dirname [pwd]]=DWARF2TEST" } */ +/* { dg-final { scan-assembler-dem "DW_AT_comp_dir: \"DWARF2TEST/gcc" } } */ + +void func (void) +{ +} Index: gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c === --- /dev/null +++ gcc-7-20161030/gcc/testsuite/gcc.dg/debug/dwarf2/source_prefix_map-2.c @@ -0,0 +1,8 @@ +/* DW_AT_comp_dir should be absolute if SOURCE_PREFIX_MAP is not set. */ +/* { dg-do compile } */ +/* { dg-op
[PATCH 3/4] Use SOURCE_PREFIX_MAP envvar to transform __FILE__
Honour the SOURCE_PREFIX_MAP environment variable when expanding the __FILE__ macro, in the same way that debug-prefix-map works for debugging symbol paths. This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH. Specifically, we read the environment variable not in libcpp but via a hook which has an implementation defined in gcc/c-family. However, to achieve this is more complex than the earlier patch: we need to share the prefix_map data structure and associated functions between libcpp and c-family. Therefore, we need to move these to libiberty. (For comparison, the SOURCE_DATE_EPOCH patch did not need this because time_t et. al. are in the standard C library.) Acknowledgements Dhole who wrote the earlier patch for SOURCE_DATE_EPOCH which saved me a lot of time on figuring out what to edit. ChangeLogs -- include/ChangeLog: 2016-11-01 Ximin Luo * prefix-map.h: New file, mostly derived from /gcc/final.c. libiberty/ChangeLog: 2016-11-01 Ximin Luo * prefix-map.c: New file, mostly derived from /gcc/final.c. * Makefile.in: Update for new files. gcc/ChangeLog: 2016-11-01 Ximin Luo * final.c: Generalise and refactor code related to debug_prefix_map. Move some of it to /libiberty/prefix-map.c, /include/prefix-map.h and refactor the remaining code to use the moved-out things. * doc/invoke.texi (Environment Variables): Update SOURCE_PREFIX_MAP to describe how it affects __FILE__ expansion. gcc/c-family/ChangeLog: 2016-11-01 Ximin Luo * c-common.c (cb_get_source_prefix_map): Define new call target. * c-common.h (cb_get_source_prefix_map): Declare call target. * c-lex.c (init_c_lex): Set the get_source_prefix_map callback. libcpp/ChangeLog: 2016-11-01 Ximin Luo * include/cpplib.h (cpp_callbacks): Add get_source_prefix_map callback. * init.c (cpp_create_reader): Initialise source_prefix_map field. * internal.h (cpp_reader): Add new field source_prefix_map. * macro.c (_cpp_builtin_macro_text): Set the source_prefix_map field if unset and apply it to the __FILE__ macro. gcc/testsuite/ChangeLog: 2016-11-01 Ximin Luo * gcc.dg/cpp/source_prefix_map-1.c: New test. * gcc.dg/cpp/source_prefix_map-2.c: New test. Index: gcc-7-20161030/include/prefix-map.h === --- /dev/null +++ gcc-7-20161030/include/prefix-map.h @@ -0,0 +1,71 @@ +/* Declarations for manipulating filename prefixes. + + Copyright (C) 2016 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software Foundation, + Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA. */ + +#ifndef _PREFIX_MAP_H +#define _PREFIX_MAP_H + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef HAVE_STDLIB_H +#include +#endif + +/* Linked-list of mappings from old prefixes to new prefixes. */ + +struct prefix_map +{ + const char *old_prefix; + const char *new_prefix; + size_t old_len; + size_t new_len; + struct prefix_map *next; +}; + +/* Parse a single prefix-map. + + The string `arg' is split at the final '=' character. The part before + it is used to set `map->old_prefix' and `map->old_len', and the part + after it is used to set `map->new_prefix' and `map->new_len'. + + If `arg' does not contain a '=' then 0 is returned. Otherwise, a + non-zero value is returned. + */ + +extern int parse_prefix_map (const char *arg, struct prefix_map *map); + +/* Perform mapping of filename prefixes. + + Return the filename corresponding to `old_name'. The return value is + equal to `old_name' if no transformation occurred, else it is equal + to `new_name' where the new filename is stored. + + On entry into this function, `new_name' must be able to hold at least + `(old_name - map->old_len + map->old_len + 1)' characters, where + `map' is the mapping that will be selected and performed. + */ + +extern const char *apply_prefix_map (const char *old_name, char *new_name, +struct prefix_map *map_head); + +#ifdef __cplusplus +} +#endif + +#endif /* _PREFIX_MAP_H */ Index: gcc-7-20161030/libiberty/prefix-map.c === --- /dev/null +++ gcc-7-20161030/libiberty/prefix-map.c @
[PATCH] Generate reproducible output independently of the build-path
(Please keep me on CC, I am not subscribed) Background == We are on a long journey to make build processes be able to reproduce the build outputs independently of which filesystem path the build is being executed from - e.g. if the executing user doesn't have root access to be able to execute it under a standard path such as /build. This currently is making about 2k-3k [1] packages in Debian unreproducible when build-paths are varied across builds. [1] https://tests.reproducible-builds.org/debian/issues/unstable/captures_build_path_issue.html Previous attempts have involved using -fdebug-prefix-map to strip away the prefix of an absolute path, leaving behind the part relative to the top-level directory of the source code, which is reproducible. But this flag was itself stored in DW_AT_producer, which makes the final output unreproducible. This was pointed out in bug 68848 and fixed in r231835. However, through more testing, we have found out that the fix just mentioned is not enough to achieve reproducibility in practice. The main issue is that many different packages like to store CFLAGS et. al. in arbitrary ways. So if we add an explicit -fdebug-prefix-map flag to the system-level CFLAGS etc, this will often propagate into the build result, making it again dependent on the build-path, and not reproducible. For example: Some packages embed compiler flags into their pkg-config files (or equivalent): https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/curl.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/perl.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/qt4-x11.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/fflas-ffpack.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/sip4.html Other packages embed compiler flags directly into the binary: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/singular.html https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/mutt.html etc etc. We think it's not appropriate to patch all (3k+) of these packages to strip out -fdebug-prefix-map flags. This would involve adding quite complex logic to everyone's build scripts, and we have to adapt this logic every single time to that particular package. Also, in general CFLAGS is *supposed* to affect the compiler output, and saving it unconditionally is quite a reasonable thing for packages to do. If we tried to patch all of these packages, we would be turning "reproducible builds" in to a costly opt-in feature, rather than on-by-default that everyone can easily benefit from. So, we believe it is better to patch GCC centrally. Our proposed solution is similar to (a) the SOURCE_DATE_EPOCH environment variable which was previously accepted into GCC and was used to successfully make 400+ packages reproducible, and (b) the -fdebug-prefix-map mechanism that already exists in GCC and which nearly but not quite, achieves at-scale build-path-independent reproducibility. Proposal This patch series adds a new environment variable SOURCE_PREFIX_MAP. When this is set, GCC will treat this as an implicit "-fdebug-prefix-map=$value" command-line argument. This makes the final binary output reproducible, and also hides the unreproducible value (the build path prefix) from CFLAGS et. al. which everyone is (understandably) embedding as-is into their build output. This environment variable also acts on the __FILE__ macro, mapping it in the same way that debug-prefix-map works for debug symbols. We have seen that __FILE__ is also a very large source of unreproducibility, and is represented quite heavily in the 3k+ figure given above. Finally, we tweak the __TIMESTAMP__ macro so it honours SOURCE_DATE_EPOCH, in a similar way to how __DATE__ and __TIME__ do so already. More details are given in the headers of the patch files themselves. Testing === I've tested these patches on a Debian testing/unstable x86_64-linux-gnu system. So far I've only run the new tests that this patch adds, on a disable-bootstrap build. I will do a full bootstrap and run the full testsuite over the next few days, both with and without this patch, and report back. Copyright disclaimer I dedicate these patches to the public domain by waiving all of my rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. See https://creativecommons.org/publicdomain/zero/1.0/legalcode for full text. Please let me know if the above is insufficient and I will be happy to sign any relevant forms.
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 11/02/2016 03:20 PM, Marek Polacek wrote: > On Wed, Nov 02, 2016 at 11:10:53AM +0100, Jakub Jelinek wrote: >> On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote: Which is gimplified as: int * ptr; switch (argc) , case 1: > { int a; try { ASAN_MARK (2, &a, 4); : goto ; : ptr = &a; goto ; } finally { ASAN_MARK (1, &a, 4); } >> >>> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch >>> statement? Otherwise, consider this being done in a loop, after the first >>> iteration you ASAN_MARK (1, &a, 4) (i.e. poison), then you iterate say with >>> args 1 and have in case 1: a = 0;, won't that trigger runtime error? >> >> Wonder if for the variables declared inside of switch body, because we don't >> care about uses before scope, it wouldn't be more efficient to arrange for >> int *p, *q, *r; >> switch (x) >> { >> int a; >> case 1: >> p = &a; >> a = 5; >> break; >> int b; >> case 2: >> int c; >> q = &b; >> r = &c; >> b = 3; >> c = 4; >> break; >> } >> to effectively ASAN_MARK (2, &a, 4); ASAN_MARK (2, &b, 4); ASAN_MARK (2, &c, >> 4); >> before the GIMPLE_SWITCH, instead of unpoisoning them on every case label >> where they might be in scope. Though, of course, at least until lower pass >> that is quite ugly, because it would refer to "not yet declared" variables. >> Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not >> the expression evaluation of the switch control expression) inside of the >> switches' GIMPLE_BIND. > > So is there anything I should do wrt -Wswitch-unreachable? > > Marek > Probably not. I'm having a patch puts GIMPLE_SWITCH statement to a proper place in GIMPLE_BIND. Let's see whether such patch can bootstrap and survive regression tests. Thanks, Martin
[PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those
Hi! This is a merge of my patch from yesterday, Jason's incremental patch to that and parts of Alex' patch from Oct 19. It uses Alex' LANG_HOOKS_GET_PTRMEMFN_TYPE langhook. I've tried to think about https://gcc.gnu.org/ml/gcc-patches/2011-05/msg00227.html and we even have such a langhook now, modified_type_die uses lang_hooks.types.get_debug_type, but 1) it is just called in modified_type_die and not in gen_type_die_with_usage, that looks weird 2) it is used for something Ada-ish I really don't know how to test etc. to be able to find out if it is safe to call it in gen_type_die_with_usage too 3) most importantly, if the C++ version of this langhook would create OFFSET_TYPE on the fly, I don't know how to ensure effective sharing of DW_TAG_ptr_to_member_type nodes with the same DW_AT_type and DW_AT_containing_type; unless the C++ langhook adds some extra hash table that caches already created OFFSET_TYPEs or something similar, it would create a new OFFSET_TYPE each time it is called Also, I really don't know how well does GDB (especially older releases) handle DW_TAG_ptr_to_member_type for PMF, so the patch wraps that currently with if (dwarf_version >= 5). Quick grep revealed that GDB has code to handle the __pfn/__delta fields. So, can I ask somebody from the GDB team to test this patch with that if (dwarf_version >= 5) replaced with if (1) and see if it works properly with current GDB as well as say 4-5 years old one (e.g. with -gdwarf-2 or -gdwarf-3)? If yes, we should emit it unconditionally. Bootstrapped/regtested on x86_64-linux and i686-linux, on the new ref-3.C testcase emits the number of DIEs I really expect to. Ok for trunk? 2016-11-02 Jakub Jelinek Alexandre Oliva Jason Merrill PR debug/28767 PR debug/56974 * langhooks.h (struct lang_hooks_for_types): Document type_hash_eq being also called on METHOD_TYPEs. Add type_dwarf_attribute and get_ptrmemfn_type langhooks. * langhooks.c (lhd_type_dwarf_attribute): New function. * langhooks-def.h (lhd_type_dwarf_attribute): Declare. (LANG_HOOKS_TYPE_DWARF_ATTRIBUTE, LANG_HOOKS_GET_PTRMEMFN_TYPE): Define. (LANG_HOOKS_FOR_TYPES_INITIALIZER): Add LANG_HOOKS_TYPE_DWARF_ATTRIBUTE and LANG_HOOKS_GET_PTRMEMFN_TYPE. * hooks.h (hook_tree_const_tree_int_null): Declare. * hooks.c (hook_tree_const_tree_int_null): New function. * tree.h (check_lang_type): Declare. * tree.c (check_lang_type): New function. (check_qualified_type, check_aligned_type): Call it. * dwarf2out.c (modified_type_die): Don't use type_main_variant for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with check_base_type and check_lang_type. (gen_ptr_to_mbr_type_die): Add class_type and member_type arguments, allow type to be something other than OFFSET_TYPE, if lookup_type_die is already non-NULL, return early. For OFFSET_TYPE add DW_AT_use_location attribute. (gen_subroutine_type_die): Add DW_AT_{,rvalue_}reference attribute if needed. (gen_type_die_with_usage): Don't use type_main_variant for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with check_base_type and check_lang_type. Formatting fixes. Adjust gen_ptr_to_mbr_type_die caller. Handle PMF. cp/ * tree.c (cp_check_qualified_type): Use check_base_type and TYPE_QUALS comparison instead of check_qualified_type. (cxx_type_hash_eq): Return false if type_memfn_rqual don't match. * cp-objcp-common.c (cp_get_ptrmemfn_type): New function. (cp_decl_dwarf_attribute): Don't handle types here. (cp_type_dwarf_attribute): New function. * cp-objcp-common.h (cp_get_ptrmemfn_type, cp_type_dwarf_attribute): Declare. (LANG_HOOKS_GET_PTRMEMFN_TYPE, LANG_HOOKS_TYPE_DWARF_ATTRIBUTE): Define. testsuite/ * g++.dg/debug/dwarf2/ptrdmem-1.C: New test. * g++.dg/debug/dwarf2/ref-3.C: New test. * g++.dg/debug/dwarf2/refqual-1.C: New test. * g++.dg/debug/dwarf2/refqual-2.C: New test. --- gcc/langhooks.h.jj 2016-11-01 15:18:44.994506161 +0100 +++ gcc/langhooks.h 2016-11-02 11:43:51.905046755 +0100 @@ -120,7 +120,7 @@ struct lang_hooks_for_types /* Return TRUE if TYPE1 and TYPE2 are identical for type hashing purposes. Called only after doing all language independent checks. At present, this function is only called when both TYPE1 and TYPE2 are - FUNCTION_TYPEs. */ + FUNCTION_TYPE or METHOD_TYPE. */ bool (*type_hash_eq) (const_tree, const_tree); /* Return TRUE if TYPE uses a hidden descriptor and fills in information @@ -162,6 +162,15 @@ struct lang_hooks_for_types for the debugger about scale factor, etc. */ bool (*get_fixed_point_type_info) (const_tree, struct
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 03:27:42PM +0100, Martin Liška wrote: > > So is there anything I should do wrt -Wswitch-unreachable? > > > > Marek > > > > Probably not. I'm having a patch puts GIMPLE_SWITCH statement to a proper > place > in GIMPLE_BIND. Let's see whether such patch can bootstrap and survive > regression > tests. Please do that only for -fsanitize-use-after-scope, it will likely affect at least for -O0 the debugging experience. For -O0 -fsanitize=address -fsanitize-use-after-scope perhaps we could arrange for some extra stmt to have the locus of the switch (where we still don't want the vars to appear in scope) and then have no locus on the ASAN_MARK and actual GIMPLE_SWITCH or something similar. Jakub
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 11/02/2016 02:16 PM, Richard Biener wrote: > On Wed, Nov 2, 2016 at 2:06 PM, Jakub Jelinek wrote: >> On Wed, Nov 02, 2016 at 01:59:00PM +0100, Richard Biener wrote: Yeah, that is what I meant. The issue is how to report uses of such SSA_NAME when there is no memory. So, either we'd need a special runtime library entrypoint that would report uses after scope even when there is no underlying memory, or we'd need to force it at asan pass time into memory again. >>> >>> Well, there can't be any uses outside the scope -- there are no (memory) >>> uses >>> left if we rewrite the thing into SSA. That is, the address can no >>> longer "escape". >>> >>> Of course there could have been invalid uses before the rewrite into SSA. >>> But >>> those can be diagnosed either immediately before or after re-writing into >>> SSA >>> at compile-time (may be in dead code regions of course). >> >> Sure, we can warn on those at compile time, but we really should arrange to >> error on those at runtime if they are ever executed, the UB happens only at >> runtime, so in dead code isn't fatal. > > Then we can replace those uses with a call into the asan runtime diagnosing > the > issue instead? > > Richard. > >> Jakub OK, thanks for the clarification, it's more clear to me. So we want to consider for SSA transformation of ASAN_MARK only is_gimple_reg_types. I'm having a test-case where it converts: foo () { char a; char * p; char _1; int _2; int _8; int _9; : ASAN_MARK (2, &a, 1); a = 0; p_6 = &a; ASAN_MARK (1, &a, 1); _1 = *p_6; if (_1 != 0) goto ; else goto ; : _9 = 1; goto ; : _8 = 0; : # _2 = PHI <_9(3), _8(4)> return _2; } to: foo () { char a; char * p; char _1; int _2; : a_10 = 0; a_12 = ASAN_POISON (); _1 = a_12; if (_1 != 0) goto ; else goto ; : : # _2 = PHI <1(2), 0(3)> return _2; } and probably the last goal is to convert the newly added internal fn to a runtime call. Hope sanopt pass is the right place where to it? Thanks, Martin
Re: [PATCH 0/3] use rtx_insn * more
On Wed, 2016-11-02 at 00:05 -0400, Trevor Saunders wrote: > On Mon, Oct 31, 2016 at 07:37:54AM -0600, Jeff Law wrote: > > On 10/28/2016 01:13 PM, tbsaunde+...@tbsaunde.org wrote: > > > From: Trevor Saunders > > > > > > HI, > > > > > > This series changes various variables type from rtx to rtx_insn * > > > so that the > > > remaining patches in this series > > > http://gcc.gnu.org/ml/gcc-patches/2016-10/msg01353.html can be > > > applied. > > > > > > patches bootstrapped and regtested on x86_64-linux-gnu, and run > > > through config-list.mk, ok? > > > > > > Thanks! > > > > > > Trev > > > > > > Trevor Saunders (3): > > > use rtx_insn * in various places where it is obvious > > > split up the trial variable in reorg.c:relax_delay_slots to use > > > rtx_insn * more > > > split up some variables to use rtx_insn * more > > All 3 patches in this series are fine. > > Thanks for the reviews. > > Can I trouble you to look at > http://gcc.gnu.org/ml/gcc-patches/2016-10/ > now that it is tested on aarch64? I think you're missing the full URL here; the above URL is the index for all of October. Did you mean this patch: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01358.html ?
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote: > it converts: > foo () > { > char a; > char * p; > char _1; > int _2; > int _8; > int _9; > > : > ASAN_MARK (2, &a, 1); > a = 0; > p_6 = &a; > ASAN_MARK (1, &a, 1); > _1 = *p_6; You shouldn't convert if a is addressable (when ignoring &a in ASAN_MARK calls). Only if there is &a just in ASAN_MARK and MEM_REF, you can convert. > to: > > foo () > { > char a; > char * p; > char _1; > int _2; > > : > a_10 = 0; > a_12 = ASAN_POISON (); > _1 = a_12; > if (_1 != 0) > goto ; > else > goto ; > > : > > : > # _2 = PHI <1(2), 0(3)> > return _2; > > } > > and probably the last goal is to convert the newly added internal fn to a > runtime call. > Hope sanopt pass is the right place where to it? If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best would be to add an artificial variable you give the same name as the underlying var of the SSA_NAME (and alignment, locus etc.) and poison it right away (keep unpoisoning only to the function epilogue) and then ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of (D) SSA_NAME. Jakub
[PATCH] Fix for big-endian gcc.c-torture/execute/pr55750.c
Hi all, I noticed that my patch for PR tree-optimization/78170 broke execute.exp=pr55750.c on big-endian. The problem with that patch is that we should not forget to clear the padding bits in the temporary buffer when merging values even when they are less than a byte. Tested on aarch64_be-none-elf (the test failure goes away) Bootstrap and test on x86_64 ongoing. Ok for trunk if successful? David, does this patch allow AIX bootstrap to proceed by any chance? Thanks, Kyrill 2016-11-02 Kyrylo Tkachov * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Don't forget to clear padding bits even when they're less than a byte. diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c index 081620e50f603e2de8ed962aec6c619890ce1e33..db3c3c14a5b8938024db8bed80145c77d29396ac 100644 --- a/gcc/gimple-ssa-store-merging.c +++ b/gcc/gimple-ssa-store-merging.c @@ -430,7 +430,8 @@ encode_tree_to_bitpos (tree expr, unsigned char *ptr, int bitlen, int bitpos, contain a sign bit due to sign-extension). */ unsigned int padding = byte_size - ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT - 1; - if (padding != 0) + if (padding != 0 + || bitlen % BITS_PER_UNIT != 0) { /* On big-endian the padding is at the 'front' so just skip the initial bytes. */
[PATCH] Fix PR78185
The following fixes PR78185 by properly honoring possibly infinite child loops when computing what blocks are always executed during loop invariant motion. Such loops behave as if the loop would exit at this point. Both GIMPLE and RTL level passes have that very same issue and the following fixes the GIMPLE one and simply disables hoisting of possibly trapping or faulting instructions in the RTL pass ... Eric seems to remember this might regress gzip so I'm going to put this on one of our SPEC testers for tonights run as well. Maybe somebody else wants to check the performance impact (I'm interested in both, GIMPLE and RTL change fallout for performance). Bootstrap and regtest running on x86_64-unknown-linux-gnu. If all goes well I'll followup with the obvious removal of no longer needed stuff in loop-invariant.c. Another variant for RTL would be to simply treat all edges entering a child loop (not only those entering possibly infinitely looping ones) as an exit. I think finite_loop_p has no RTL level variant (yet). Richard. 2016-11-02 Richard Biener PR middle-end/78185 * loop-invariant.c (find_invariant_insn): Never hoist trapping or faulting instructions. * tree-ssa-loop-im.c: Include tree-ssa-loop-niter.h. (fill_always_executed_in_1): Honor infinite child loops. * gcc.dg/pr78185.c: New testcase. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 551103f..deb5be6 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -1076,7 +1076,7 @@ pre_check_invariant_p (bool simple, rtx dest) unless the program ends due to a function call. */ static void -find_invariant_insn (rtx_insn *insn, bool always_reached, bool always_executed) +find_invariant_insn (rtx_insn *insn, bool, bool always_executed) { df_ref ref; struct def *def; @@ -1108,8 +1108,8 @@ find_invariant_insn (rtx_insn *insn, bool always_reached, bool always_executed) if (can_throw_internal (insn)) return; - /* We cannot make trapping insn executed, unless it was executed before. */ - if (may_trap_or_fault_p (PATTERN (insn)) && !always_reached) + /* We cannot make trapping insn executed. */ + if (may_trap_or_fault_p (PATTERN (insn))) return; depends_on = BITMAP_ALLOC (NULL); diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 463db04..0524e57 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see #include "trans-mem.h" #include "gimple-fold.h" #include "tree-scalar-evolution.h" +#include "tree-ssa-loop-niter.h" /* TODO: Support for predicated code motion. I.e. @@ -2369,8 +2370,16 @@ fill_always_executed_in_1 (struct loop *loop, sbitmap contains_call) break; FOR_EACH_EDGE (e, ei, bb->succs) - if (!flow_bb_inside_loop_p (loop, e->dest)) - break; + { + /* If there is an exit from this BB. */ + if (!flow_bb_inside_loop_p (loop, e->dest)) + break; + /* Or we enter a possibly non-finite loop. */ + if (flow_loop_nested_p (bb->loop_father, + e->dest->loop_father) + && ! finite_loop_p (e->dest->loop_father)) + break; + } if (e) break; Index: trunk/gcc/testsuite/gcc.dg/pr78185.c === --- trunk/gcc/testsuite/gcc.dg/pr78185.c(revision 0) +++ trunk/gcc/testsuite/gcc.dg/pr78185.c(revision 0) @@ -0,0 +1,28 @@ +/* { dg-do run { target *-*-linux* *-*-gnu* } } */ +/* { dg-options "-O" } */ + +#include +#include +#include + +static char var1 = 0L; +static char *var2 = &var1; + +void do_exit (int i) +{ + exit (0); +} + +int main(void) +{ + struct sigaction s; + sigemptyset (&s.sa_mask); + s.sa_handler = do_exit; + s.sa_flags = 0; + sigaction (SIGALRM, &s, NULL); + alarm (1); + /* The following loop is infinite, the division by zero should not + be hoisted out of it. */ + for (; (var1 == 0 ? 0 : (100 / var1)) == *var2; ); + return 0; +}
Re: [PATCH] Fix for big-endian gcc.c-torture/execute/pr55750.c
On Wed, 2 Nov 2016, Kyrill Tkachov wrote: > Hi all, > > I noticed that my patch for PR tree-optimization/78170 broke > execute.exp=pr55750.c on big-endian. > The problem with that patch is that we should not forget to clear the padding > bits in the temporary > buffer when merging values even when they are less than a byte. > > Tested on aarch64_be-none-elf (the test failure goes away) > Bootstrap and test on x86_64 ongoing. > Ok for trunk if successful? Ok. Thanks, Richard. > David, does this patch allow AIX bootstrap to proceed by any chance? Crossing fingers ;) > Thanks, > Kyrill > > 2016-11-02 Kyrylo Tkachov > > * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Don't forget to > clear padding bits even when they're less than a byte. >
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 11/02/2016 03:51 PM, Jakub Jelinek wrote: > On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote: >> it converts: >> foo () >> { >> char a; >> char * p; >> char _1; >> int _2; >> int _8; >> int _9; >> >> : >> ASAN_MARK (2, &a, 1); >> a = 0; >> p_6 = &a; >> ASAN_MARK (1, &a, 1); >> _1 = *p_6; > > You shouldn't convert if a is addressable (when ignoring &a in ASAN_MARK > calls). Only if there is &a just in ASAN_MARK and MEM_REF, you can convert. Sure, which should be done in execute_update_addresses_taken via gimple_ior_addresses_taken. > >> to: >> >> foo () >> { >> char a; >> char * p; >> char _1; >> int _2; >> >> : >> a_10 = 0; >> a_12 = ASAN_POISON (); >> _1 = a_12; >> if (_1 != 0) >> goto ; >> else >> goto ; >> >> : >> >> : >> # _2 = PHI <1(2), 0(3)> >> return _2; >> >> } >> >> and probably the last goal is to convert the newly added internal fn to a >> runtime call. >> Hope sanopt pass is the right place where to it? > > If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best > would be to add an artificial variable you give the same name as the > underlying var of the SSA_NAME (and alignment, locus etc.) and poison it > right away (keep unpoisoning only to the function epilogue) and then > ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of > (D) SSA_NAME. When I create an ASAN_POISON call in execute_update_addresses_taken, there would not be any ASAN_CHECK generated as it's going to be rewritten to SSA form (like the previous sample I sent). I like the idea of having a parallel variable, which can be poisoned at the very beginning of a function. Whenever we have a use of the SSA_NAME (like a_12 = ASAN_POISON ()), we can simply insert BUILT_IN_ASAN_REPORT_LOADx(¶llel_variable) statement. No change would be necessary for ASAN runtime in such case. Will it work? Thanks, Martin > > Jakub >
Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)
On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote: > >> I don't believe it needs a cleanup on every iteration. One cleanup at > >> the end should work fine. > > > > But as the comment there says: > > > > /* Merge the duplicated blocks into predecessors, when possible. */ > > cleanup_cfg (0); > > > > (this is not a new comment), and without merging blocks this whole > > patch does zilch. > > > > There is no need to create new basic blocks at all (can replace the > > final branch in a block with a copy of the whole block it jumps to, > > instead); and then it is painfully obvious that switching to layout > > mode here is pointless, too. > > > > Do you want me to do that? > > > > Btw, this isn't quadratic anyway; it is a constant number (the maximal > > length allowed of the block to copy) linear. Unless there are unboundedly > > many forwarder blocks, which there shouldn't be, but I cannot prove that. > > Well, you iterate calling functions (find candidates, merge, cleanup-cfg) that > walk the whole function. So unless the number of iterations is bound > with a constant I call this quadratic (in function size). It is bounded (with that caveat above): new candidates will be bigger than the block merged into it, so there won't be more than PARAM_MAX_GOTO_DUPLICATION_INSNS passes. But I can make it all work simpler, in non-layout mode, if you prefer. Segher
Re: [PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those
On Wed, Nov 2, 2016 at 10:31 AM, Jakub Jelinek wrote: > It uses Alex' LANG_HOOKS_GET_PTRMEMFN_TYPE langhook. I've tried > to think about https://gcc.gnu.org/ml/gcc-patches/2011-05/msg00227.html > and we even have such a langhook now, modified_type_die > uses lang_hooks.types.get_debug_type, but > 1) it is just called in modified_type_die and not in >gen_type_die_with_usage, that looks weird How much of a problem is that? modified_type_die calls gen_type_die, does that not cover the cases needed here? > 2) it is used for something Ada-ish I really don't know how to test etc. >to be able to find out if it is safe to call it in >gen_type_die_with_usage too You could find an Ada test that uses the code and verify that the output stays the same? > 3) most importantly, if the C++ version of this langhook would create >OFFSET_TYPE on the fly, I don't know how to ensure effective sharing >of DW_TAG_ptr_to_member_type nodes with the same DW_AT_type >and DW_AT_containing_type; unless the C++ langhook adds some extra >hash table that caches already created OFFSET_TYPEs or something similar, >it would create a new OFFSET_TYPE each time it is called build_offset_type already uses a hash table. > Also, I really don't know how well does GDB (especially older releases) > handle DW_TAG_ptr_to_member_type for PMF, so the patch wraps that currently > with if (dwarf_version >= 5). Quick grep revealed that GDB has code to > handle the __pfn/__delta fields. So, can I ask somebody from the GDB > team to test this patch with that if (dwarf_version >= 5) replaced > with if (1) and see if it works properly with current GDB as well as say > 4-5 years old one (e.g. with -gdwarf-2 or -gdwarf-3)? If yes, we > should emit it unconditionally. This all makes sense to me. > + if (dwarf_version >= 5) > + { > + tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0); > + if (class_type != NULL_TREE) This can be if (dwarf_version >= 5) if (tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0)) Jason
Re: [gcc] Enable DW_OP_VAL_EXPRESSION support in dwarf module
OK. On Wed, Nov 2, 2016 at 10:18 AM, Jiong Wang wrote: > On 02/11/16 13:42, Jakub Jelinek wrote: >> >> On Wed, Nov 02, 2016 at 01:26:48PM +, Jiong Wang wrote: >>> >>> -/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION >>> note. */ >>> +/* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION >>> note. */ >> >> >> Too long line. > > > Hmm, it shows 80 columns under my editor. I guess '+' is counted in? > >> >>> +/* RTL sequences inside PARALLEL are raw expression representation. >>> + >>> + mem_loc_descriptor can be used to build generic DWARF expressions >>> for >>> + DW_CFA_expression and DW_CFA_val_expression where the expression >>> may can >>> + not be represented using normal RTL sequences. In this case, >>> group all >>> + expression operations (DW_OP_*) inside a PARALLEL. For those >>> DW_OP which >>> + doesn't have RTL mapping, wrap it using UNSPEC. The logic for >>> parsing >>> + PARALLEL sequences is: >>> + >>> + foreach elem inside PARALLEL >>> + if (elem is UNSPEC) >>> + dw_op = XINT (elem, 1) (DWARF operation is kept as UNSPEC >>> number) >>> + oprnd1 = XVECEXP (elem, 0, 0) >>> + oprnd2 = XVECEXP (elem, 0, 1) >>> + else >>> + call mem_loc_descriptor */ >> >> >> Not sure if it is a good idea to document in weirdly formatted >> pseudo-language what the code actually does a few lines below. IMHO >> either >> express it in words, or don't express it at all. > > > OK, fixed. I replaced these comments as some brief words. > >> >>> + exp_result = >>> + new_loc_descr ((enum dwarf_location_atom) dw_op, >>> oprnd1, >>> +oprnd2); >> >> >> Wrong formatting, = should be on the next line. >> >>> + } >>> + else >>> + exp_result = >>> + mem_loc_descriptor (elem, mode, mem_mode, >>> + VAR_INIT_STATUS_INITIALIZED); >> >> >> Likewise. > > > Both fixed. Patch updated, please review. > > > Thanks. > > gcc/ > 2016-11-02 Jiong Wang > > * reg-notes.def (CFA_VAL_EXPRESSION): New entry. > * dwarf2cfi.c (dwarf2out_frame_debug_cfa_val_expression): New > function. > (dwarf2out_frame_debug): Support REG_CFA_VAL_EXPRESSION. > (output_cfa_loc): Support DW_CFA_val_expression. > (output_cfa_loc_raw): Likewise. > (output_cfi): Likewise. > (output_cfi_directive): Likewise. > * dwarf2out.c (dw_cfi_oprnd1_desc): Support DW_CFA_val_expression. > (dw_cfi_oprnd2_desc): Likewise. > (mem_loc_descriptor): Recognize new pattern generated for value > expression. >
[PATCH][GCC][testsuite] Fix failing vminnm/vmaxnm tests for ARM
Hi all, This fixes the ARM failures in the testsuite. Previously these tests were gated on if ARMv8-a support was available in the compiler, not if the hardware was an ARMv8-a hardware. This thus resulted in correct code, but wouldn't run on any other hardware. Ran regression tests on aarch64-none-linux-gnu, arm-none-linux-gnueabihf. Ok for trunk? Thanks, Tamar gcc/testsuite/ 2016-11-01 Tamar Christina * gcc.target/arm/simd/vmaxnm_f32_1.c (dg-require-effective-target): Check for arm_v8_neon_hw. * gcc.target/arm/simd/vmaxnmq_f32_1.c (dg-require-effective-target): Likewise. * gcc.target/arm/simd/vminnm_f32_1.c (dg-require-effective-target): Likewise. * gcc.target/arm/simd/vminnmq_f32_1.c(dg-require-effective-target): Likewise. gcc.arm-fix-vminnm-vmaxnm-tests Description: gcc.arm-fix-vminnm-vmaxnm-tests
Re: [PATCH] combine: Improve change_zero_ext (fixes PR71847)
On Tue, Oct 25, 2016 at 11:40 AM, Segher Boessenkool wrote: > This improves a few things in change_zero_ext. Firstly, it should use > the passed in pattern in recog_for_combine, not the pattern of the insn > (they are not the same if the whole pattern was replaced). Secondly, > it handled zero_ext of a subreg, but with hard registers we do not get > a subreg, instead the mode of the reg is changed. So this handles that. > Thirdly, after changing a zero_ext to an AND, the resulting RTL may become > non-canonical, like (ior (ashift ..) (and ..)); the AND should be first, > it is commutative. And lastly, zero_extract as a set_dest wasn't handled > at all, but now it is. > > This fixes the testcase in PR71847, and improves code generation in some > other edge cases too. > > Tested on powerpc64-linux {-m32,-m64}; I'll also test it on x86 and > will build lots of crosses before committing. Hi Segher, This causes failure @ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78186 It includes how to report it, could you please have a look? Thanks. Thanks, Bin
[PATCH v2] AIX visibility
This revised patch makes two changes: 1) Fix typo in configure.ac 2) Add AIX visibility support for ASM_WEAKEN_DECL, which does touch the same code as Linux. The AIX "weak" support fixes a large number of C++ visibility testcases. Bootstrapped on powerpc-ibm-aix7.2.0.0. * configure.ac (.hidden): Change to conftest_s string. Provide string for AIX assembler. (gcc_cv_ld_hidden): Yes for AIX. * configure: Regenerate. * dwarf2asm.c (USE_LINKONCE_INDIRECT): Don't set for AIX (XCOFF). * config/rs6000/rs6000-protos.h (rs6000_asm_weaken_decl): Declare (rs6000_xcoff_asm_output_aligned_decl_common): Declare. * config/rs6000/xcoff.h (TARGET_ASM_GLOBALIZE_DECL_NAME): Define. (ASM_OUTPUT_ALIGNED_DECL_COMMON): Define. (ASM_OUTPUT_ALIGNED_COMMON): Delete. * config/rs6000/rs6000.c (rs6000_init_builtins): Change clog rename from #if to if. (rs6000_xcoff_visibility): New. (rs6000_xcoff_declare_function_name): Add visibility support. (rs6000_xcoff_asm_globalize_decl_name): New. (rs6000_xcoff_asm_output_aligned_decl_common): New. (rs6000_asm_weaken_decl): New. (rs6000_code_end): Disable HIDDEN_LINKONCE on XCOFF. config/rs6000/rs6000.h (ASM_WEAKEN_DECL): Change definition to reference function. dwarf2asm.c okay? Any comments on ASM_WEAKEN_DECL change? Thanks, David ZZ Description: Binary data
Re: [PATCH][GCC][testsuite] Fix failing vminnm/vmaxnm tests for ARM
On 02/11/16 15:38, Tamar Christina wrote: Hi all, This fixes the ARM failures in the testsuite. Previously these tests were gated on if ARMv8-a support was available in the compiler, not if the hardware was an ARMv8-a hardware. This thus resulted in correct code, but wouldn't run on any other hardware. Ran regression tests on aarch64-none-linux-gnu, arm-none-linux-gnueabihf. Ok for trunk? Looks ok to me too. Kyrill Thanks, Tamar gcc/testsuite/ 2016-11-01 Tamar Christina * gcc.target/arm/simd/vmaxnm_f32_1.c (dg-require-effective-target): Check for arm_v8_neon_hw. * gcc.target/arm/simd/vmaxnmq_f32_1.c (dg-require-effective-target): Likewise. * gcc.target/arm/simd/vminnm_f32_1.c (dg-require-effective-target): Likewise. * gcc.target/arm/simd/vminnmq_f32_1.c(dg-require-effective-target): Likewise.
Re: [PATCH] enhance buffer overflow warnings (and c/53562)
On 11/02/2016 01:37 AM, Jakub Jelinek wrote: On Tue, Nov 01, 2016 at 08:55:03PM -0600, Martin Sebor wrote: struct S { int a, b, c, d; }; #define bos(p, t) __builtin_object_size (p, t) #define memset0(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 0)) #define memset1(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 1)) void f0 (struct S *s) { memset0 (&s->d, 0, 1024); // no warning here (bos 0) } But we do not want the warning here, there is nothing wrong on it. The caller may be void bar (void) { struct T { struct S header; char payload[1024 - offsetof (struct S, d)]; } t; initialize (&t); f0 (&t.header); } and the callee might rely on that. Sure, they might and in that case the warning would be a false positive. It wouldn't be the first such warning that wasn't 100% free of them. But my testing with Binutils, GCC, and the Linux kernel has exposed only 10 instances of new warnings and I don't think I saw this idiom among them. But even if some were, or if all of them were false positives I think that would be well within the acceptable rates. Here are the numbers of warnings for Binutils and the kernel: 113 -Wimplicit-fallthrough 38 -Wformat-length= 12 -Wunused-const-variable= 10 -Wstringop-overflow 2 -Wdangling-else 2 -Wframe-address 2 -Wint-in-bool-context 1 -Wbool-operation Using some header structure at the beginning and then conditionally on fields in that structure various payloads occurs in many projects, starting with glibc, gcc, Linux kernel, ... The warning really must not be detached from reality. That's an unfair assertion in light of the numbers above. If you want a warning for suspicious calls, sure, but 1) it has to be clearly worded significantly differently from how do you word it, so that users really understand you are warning about suspicious code (though, I really believe it is very common and there is really nothing the users can do about it) 2) it really shouldn't be enabled in -Wall, and I think not even in -Wextra As you have argued yourself recently in our discussion of -Wimplicit-fallthrough, warnings that aren't enabled by either of these options don't generally benefit users because very few turn them on explicitly. I agree with that argument although I would be in favor of rolling out a new warning gradually over two or more releases if it were known to be prone to high rates of false positive. The -Wstringop-overflow warning clearly isn't in that category so there's no need for it. My offer to do additional testing is still good if you'd like to see more data. As for the wording, I welcome suggestions for improvements. Martin
[PATCH, Fortran] Extension: Support legacy PARAMETER statements with -std=legacy (or -fdec?)
All, Another quirk of legacy compilers is their syntax for PARAMETER statements. Such statements are similar to standard PARAMETER statements but lack parentheses following the PARAMETER keyword. There is a good reason the standard doesn't support this - because the statement becomes ambiguous with assignment statements in fixed form. Consider the following: parameter pi = 3.14d0 In fixed form, spaces are ignored so in standard Fortran the above looks like an assignment to the variable "parameterpi". In legacy compilers, the above is instead interpreted as parameter (pi = 3.14d0) which of course declares the variable 'pi' to be a parameter with the value 3.14. Attached is a patch for the GNU Fortran front-end which allows the compiler to interpret these legacy PARAMETER statements. The patch in its current form does this without warning through -std=legacy (warning for -std=gnu, error for -std=f*). Bootstraps and regtests on x86_64-redhat-linux. However, note that this would change by default the compiler's interpretation of fixed-form variables starting with the string "parameter", if any such cases existed in real code. IMO fixed form code is isomorphic to legacy code, so I imagine most users writing fixed-form/legacy code would intend for a legacy PARAMETER statement, rather than assignment to variable PARAMETERPI, when writing such a statement. Beyond compatibility, one motivation for acceptance/recognizance of these statements in GNU Fortran is the counterintuitive compile errors currently seen when compiling legacy code which uses this type of statement. Because the legacy PARAMETER statement is currently interpreted as an assignment in fixed-form, and parameter statements often precede other specification statements in real code, GNU Fortran complains not about the parameter statement itself but often the following line with "Unexpected data declaration" or similar. This serves to confuse the user. An example of this can be seen by compiling the attached dec_parameter_1.f in fixed form with the current build of GNU Fortran. I am amenable to only enabling support for legacy PARAMETER statements through -fdec, so that the default (-std=gnu) behavior is unchanged for such cases in fixed form. But this would leave the esoteric "Unexpected data declaration statement" errors for legacy code without -fdec. The case would be difficult to detect and work around specifically since the entire parameter "assignment" statement is eaten by the time the error comes about. What do you think? --- Fritz Reese Author: Fritz O. Reese Date: Tue Nov 1 12:26:57 2016 -0400 Support legacy PARAMETER statements with -std=legacy. gcc/fortran/ * decl.c (gfc_match_parameter): Allow omitted '()' with -std=legacy. * parse.c (decode_statement): Match "parmeter" before assignments. * gfortran.texi: Document. gcc/testsuite/gfortran.dg/ * dec_parameter_1.f: New test. * dec_parameter_2.f90: Likewise. * dec_parameter_3.f90: Likewise. * dec_parameter_4.f90: Likewise. gcc/fortran/decl.c| 10 +++- gcc/fortran/gfortran.texi | 16 ++ gcc/fortran/parse.c |4 +- gcc/testsuite/gfortran.dg/dec_parameter_1.f | 64 + gcc/testsuite/gfortran.dg/dec_parameter_2.f90 | 63 gcc/testsuite/gfortran.dg/dec_parameter_3.f90 | 13 + gcc/testsuite/gfortran.dg/dec_parameter_4.f90 | 13 + 7 files changed, 180 insertions(+), 3 deletions(-) diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index f18eb41..0120ceb 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -7821,10 +7821,16 @@ cleanup: match gfc_match_parameter (void) { + const char *term = " )%t"; match m; if (gfc_match_char ('(') == MATCH_NO) -return MATCH_NO; +{ + /* With legacy PARAMETER statements, don't expect a terminating ')'. */ + if (!gfc_notify_std (GFC_STD_LEGACY, "PARAMETER without '()' at %C")) + return MATCH_NO; + term = " %t"; +} for (;;) { @@ -7832,7 +7838,7 @@ gfc_match_parameter (void) if (m != MATCH_YES) break; - if (gfc_match (" )%t") == MATCH_YES) + if (gfc_match (term) == MATCH_YES) break; if (gfc_match_char (',') != MATCH_YES) diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 85ab31b..1d60551 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -1472,6 +1472,7 @@ compatibility extensions along with those enabled by @option{-std=legacy}. * Bitwise logical operators:: * Extended I/O specifiers:: * Default exponents:: +* Legacy PARAMETER statements:: @end menu @node Old-style kind specifications @@ -2705,6 +2706,21 @@ For compatibility, GNU Fortran supports a default exponent of zero in real constants with @option{-fdec}. For example, @code{9e} would be interpret
Re: [PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those
On Wed, Nov 02, 2016 at 11:31:25AM -0400, Jason Merrill wrote: > On Wed, Nov 2, 2016 at 10:31 AM, Jakub Jelinek wrote: > > It uses Alex' LANG_HOOKS_GET_PTRMEMFN_TYPE langhook. I've tried > > to think about https://gcc.gnu.org/ml/gcc-patches/2011-05/msg00227.html > > and we even have such a langhook now, modified_type_die > > uses lang_hooks.types.get_debug_type, but > > 1) it is just called in modified_type_die and not in > >gen_type_die_with_usage, that looks weird > > How much of a problem is that? modified_type_die calls gen_type_die, > does that not cover the cases needed here? If e.g. on the ref-3.C testcase from the patch I put breakpoint on both gen_ptr_to_mbr_type_die and modified_type_die (the latter only for type->base.code == RECORD_TYPE), then I see first: #0 gen_ptr_to_mbr_type_die (type=, context_die=>, class_type=, member_type=) at ../../gcc/dwarf2out.c:23128 #1 0x00c3a7e5 in gen_type_die_with_usage (type=, context_die=>, usage=DINFO_USAGE_DIR_USE) at ../../gcc/dwarf2out.c:24428 #2 0x00c3ab92 in gen_type_die (type=, context_die=>) at ../../gcc/dwarf2out.c:24491 #3 0x00c3caed in gen_decl_die (decl=, origin=, ctx=0x0, context_die=>) at ../../gcc/dwarf2out.c:25117 #4 0x00c3b13e in process_scope_var (stmt=, decl=, origin=, context_die=>) at ../../gcc/dwarf2out.c:24620 #5 0x00c3b1bb in decls_for_scope (stmt=, context_die=>) at ../../gcc/dwarf2out.c:24645 and only afterwards: #0 modified_type_die (type=, cv_quals=0, reverse=false, context_die=>) at ../../gcc/dwarf2out.c:12328 #1 0x00c2f03b in add_type_attribute (object_die=>, type=, cv_quals=0, reverse=false, context_die=>) at ../../gcc/dwarf2out.c:20346 #2 0x00c354f4 in gen_variable_die (decl=, origin=, context_die=>) at ../../gcc/dwarf2out.c:22688 #3 0x00c3cb8e in gen_decl_die (decl=, origin=, ctx=0x0, context_die=>) at ../../gcc/dwarf2out.c:25138 #4 0x00c3b13e in process_scope_var (stmt=, decl=, origin=, context_die=>) at ../../gcc/dwarf2out.c:24620 #5 0x00c3b1bb in decls_for_scope (stmt=, context_die=>) at ../../gcc/dwarf2out.c:24645 which means if gen_type_die or gen_type_die_with_usage doesn't use the langhook, then we'd emit a completely useless { __pfn; __delta } struct into debug info first, and then in modified_type_die used the langhook, get OFFSET_TYPE and probably create the DW_TAG_ptr_to_member_type. So I think we really need that. > > 2) it is used for something Ada-ish I really don't know how to test etc. > >to be able to find out if it is safe to call it in > >gen_type_die_with_usage too > > You could find an Ada test that uses the code and verify that the > output stays the same? I can try to find the patch that introduced it and if it contained any testcases. > > 3) most importantly, if the C++ version of this langhook would create > >OFFSET_TYPE on the fly, I don't know how to ensure effective sharing > >of DW_TAG_ptr_to_member_type nodes with the same DW_AT_type > >and DW_AT_containing_type; unless the C++ langhook adds some extra > >hash table that caches already created OFFSET_TYPEs or something similar, > >it would create a new OFFSET_TYPE each time it is called > > build_offset_type already uses a hash table. Ah, ok. > > Also, I really don't know how well does GDB (especially older releases) > > handle DW_TAG_ptr_to_member_type for PMF, so the patch wraps that currently > > with if (dwarf_version >= 5). Quick grep revealed that GDB has code to > > handle the __pfn/__delta fields. So, can I ask somebody from the GDB > > team to test this patch with that if (dwarf_version >= 5) replaced > > with if (1) and see if it works properly with current GDB as well as say > > 4-5 years old one (e.g. with -gdwarf-2 or -gdwarf-3)? If yes, we > > should emit it unconditionally. > > This all makes sense to me. > > > + if (dwarf_version >= 5) > > + { > > + tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0); > > + if (class_type != NULL_TREE) > > This can be > > if (dwarf_version >= 5) > if (tree class_type = lang_hooks.types.get_ptrmemfn_type (type, 0)) Ok. Jakub
RFC: Warnings silenced when macros from system headers are used (PR c/78000, c/71613)
One of the pending issues that we should address before we release GCC 7 is that sometimes we don't issue a warning if the location points to a macro defined in a system header, unless -Wsystem-headers. Consider e.g. enum { e1 = LLONG_MIN }; or float_var = LLONG_MIN; Here, LLONG_MIN comes from a system header and so the compiler doesn't print any warnings even though it should--the problem is not in the macro itself, but in how it's used. This has happened before, e.g. with NULL, and the fix was to call expansion_point_location_if_in_system_header, but this strategy is not tenable; there are too many such issues. After having spent days on this it seems that we should always use the expansion location except for certain pedwarns/warnings--the challenge is of course how to distinguish which ones. This patch introduces expand_macros_sentinel that can be used to suppress expanding macros, removes various expansion_point_location_if_in_system_header calls and fixes testsuite fallout. I have *not* gone over all the warnings/pedwarns yet, because this is touch-and-go whether this'll be considered a reasonable approach. The general principle should be: is it the macro or its user that is responsible for the warning?, but in practice it was often less clear to me what to do. So e.g. imagine #define C : #define F i ?: 3 in a system header and then i = i ? C 5; // 1 return F; // 2 in user code -- I believe we should NOT warn for 2 (so don't expand the location), but that also means we won't warn for 1. Thoughts? Bootstrapped/regtested on x86_64-linux and ppc64-linux. 2016-11-02 Marek Polacek PR c/78000 PR c/71613 gcc/c-family/ * c-common.c (unsafe_conversion_p): Don't call expansion_point_location_if_in_system_header. (c_cpp_error): Add a paramater and expand locations depending on that. * c-common.h (c_cpp_error): Update declaration. * c-warn.c (warnings_for_convert_and_check): Don't call expansion_point_location_if_in_system_header. gcc/c/ * c-decl.c (declspecs_add_type): Use expand_macros_sentinel to suppress expanding macro locations. * c-parser.c (c_parser_postfix_expression) [RID_FUNCTION_NAME, RID_PRETTY_FUNCTION_NAME, RID_C99_FUNCTION_NAME]: Likewise. * c-typeck.c (convert_arguments): Don't call expansion_point_location_if_in_system_header. (pedwarn_init): Likewise. (warning_init): Likewise. (convert_for_assignment): Don't call expansion_point_location_if_in_system_header. (c_finish_return): Likewise. gcc/cp/ * call.c (conversion_null_warnings): Don't call expansion_point_location_if_in_system_header. * cvt.c (build_expr_type_conversion): Likewise. * parser.c (set_and_check_decl_spec_loc): Use expand_macros_sentinel to suppress expanding macro locations. * typeck.c (cp_build_binary_op): Don't call expansion_point_location_if_in_system_header. gcc/ * diagnostic.c (diagnostic_initialize): Initialize dc_expand_locations. (diagnostic_report_warnings_p): New function. * diagnostic.h (struct diagnostic_context): Add dc_expand_locations. (diagnostic_report_warnings_p): Remove. (struct expand_macros_sentinel): New sentinel. (diagnostic_report_warnings_p): Declare. * genmatch.c (error_cb): Add bool parameter. (fatal_at): Adjust error_cb call. (warning_at): Likewise. * input.c (on_error): Add bool parameter. gcc/fortran/ * cpp.c (cb_cpp_error): Add a paramater and expand locations depending on that. gcc/testsuite/ * gcc.dg/pr71613.c: New. * gcc.dg/pr78000.c: New. * gcc.dg/pr78000.h: New. libcpp/ * charset.c (noop_error_cb): Add bool parameter. (saved_error_handler): Likewise. (cpp_interpret_string_ranges): * errors.c (cpp_diagnostic_at_richloc): Adjust cb.error call. (cpp_diagnostic_at): Likewise. (cpp_diagnostic_with_line): Add bool parameter and pass it down to cb.error. (cpp_error_with_line_noexpand): New. (cpp_warning_with_line_noexpand): New. (cpp_pedwarning_with_line_noexpand): New. (cpp_warning_with_line_syshdr): Pass true to cpp_diagnostic_with_line. * expr.c (cpp_classify_number): Call *_noexpand variants. * include/cpplib.h (error): Adjust declaration. (cpp_error_with_line_noexpand): New. (cpp_warning_with_line_noexpand): New. (cpp_pedwarning_with_line_noexpand): New. diff --git gcc/gcc/c-family/c-common.c gcc/gcc/c-family/c-common.c index 307862b..b0a1b5c 100644 --- gcc/gcc/c-family/c-common.c +++ gcc/gcc/c-family/c-common.c @@ -1230,7 +1230,6 @@ unsafe_conversion_p (location_t loc, tree type, tree expr, bool produce_warns) { enum conversion_safety give_warning = SAFE_CONVERSION; /* is 0 or false */ tree expr_
Re: [PATCH][GCC][testsuite] Fix failing vminnm/vmaxnm tests for ARM
On 2 November 2016 at 16:38, Tamar Christina wrote: > Hi all, > > This fixes the ARM failures in the testsuite. > Previously these tests were gated on if ARMv8-a > support was available in the compiler, not if the hardware > was an ARMv8-a hardware. > > This thus resulted in correct code, but wouldn't run on > any other hardware. > > Ran regression tests on aarch64-none-linux-gnu, arm-none-linux-gnueabihf. > > Ok for trunk? > It seems OK to me (without actually testing, though) Thanks Christophe > Thanks, > Tamar > > gcc/testsuite/ > > 2016-11-01 Tamar Christina > > * gcc.target/arm/simd/vmaxnm_f32_1.c (dg-require-effective-target): > Check for arm_v8_neon_hw. > * gcc.target/arm/simd/vmaxnmq_f32_1.c (dg-require-effective-target): > Likewise. > * gcc.target/arm/simd/vminnm_f32_1.c (dg-require-effective-target): > Likewise. > * gcc.target/arm/simd/vminnmq_f32_1.c(dg-require-effective-target): > Likewise.
Re: RFC: Warnings silenced when macros from system headers are used (PR c/78000, c/71613)
On Wed, Nov 2, 2016 at 11:51 AM, Marek Polacek wrote: > One of the pending issues that we should address before we release GCC 7 is > that sometimes we don't issue a warning if the location points to a macro > defined in a system header, unless -Wsystem-headers. Consider e.g. > > enum { e1 = LLONG_MIN }; > > or > > float_var = LLONG_MIN; > > Here, LLONG_MIN comes from a system header and so the compiler doesn't print > any warnings even though it should--the problem is not in the macro itself, > but in how it's used. This has happened before, e.g. with NULL, and the fix > was to call expansion_point_location_if_in_system_header, but this strategy > is not tenable; there are too many such issues. After having spent days on > this it seems that we should always use the expansion location except for > certain pedwarns/warnings--the challenge is of course how to distinguish > which ones. This patch introduces expand_macros_sentinel that can be used > to suppress expanding macros, removes various > expansion_point_location_if_in_system_header calls and fixes testsuite > fallout. I have *not* gone over all the warnings/pedwarns yet, because this > is touch-and-go whether this'll be considered a reasonable approach. > > The general principle should be: is it the macro or its user that is > responsible for the warning?, but in practice it was often less clear to > me what to do. So e.g. imagine > > #define C : > #define F i ?: 3 > > in a system header and then > > i = i ? C 5; // 1 > return F; // 2 > > in user code -- I believe we should NOT warn for 2 (so don't expand the > location), > but that also means we won't warn for 1. > > Thoughts? It seems to me that the general principle is that we should consider the location where the thing we're warning about is happening. In float_var = LLONG_MIN; The relevant location is that of the assignment, not the constant on the RHS. In your ?: example, a simple answer would be to warn based on the location of the ?. The ?C example seems not worth worrying about either way. Jason
Re: [PATCH 1/4] Use SOURCE_PREFIX_MAP envvar as an implicit debug-prefix-map
On Wed, 2 Nov 2016, Ximin Luo wrote: > +@item SOURCE_PREFIX_MAP If this variable is set, it specifies a mapping The text should start on a separate line from the @item. > +that is used to transform filepaths that are output in debug symbols. > +This helps the embedded paths become reproducible, without having the > +unreproducible value be visible in other input sources - such as GCC Use a TeX em dash, ---, instead of " - ". -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH v2] AIX visibility
On Wed, Nov 02, 2016 at 11:41:32AM -0400, David Edelsohn wrote: > Any comments on ASM_WEAKEN_DECL change? It no longer checks RS6000_WEAK, is that always on now? Otherwise looks fine to me. Segher
Re: [PATCH 0/3] use rtx_insn * more
On 11/02/2016 03:55 PM, David Malcolm wrote: Did you mean this patch: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01358.html That one is ok after the test, sorry for not being more clear. Bernd