[testsuite, PR81662, committed] Skip fpatchable-function-entry tests for nvptx
Hi,

fpatchable-function-entry requires nop, which nvptx does not have.  I've
disabled the corresponding test for nvptx.

I suppose I could add an effective target has_nop, but I'm not sure if
it's worth the trouble.

Committed.

Thanks,
- Tom

Skip fpatchable-function-entry tests for nvptx

2017-08-02  Tom de Vries

	PR target/81662
	* c-c++-common/patchable_function_entry-decl.c: Skip for nvptx.
	* c-c++-common/patchable_function_entry-default.c: Same.
	* c-c++-common/patchable_function_entry-definition.c: Same.
---
 gcc/testsuite/c-c++-common/patchable_function_entry-decl.c       | 2 +-
 gcc/testsuite/c-c++-common/patchable_function_entry-default.c    | 2 +-
 gcc/testsuite/c-c++-common/patchable_function_entry-definition.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c b/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
index 8514b10..5c39a35 100644
--- a/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
+++ b/gcc/testsuite/c-c++-common/patchable_function_entry-decl.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
 /* { dg-final { scan-assembler-times "nop" 2 } } */
 
diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-default.c b/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
index 0dcf118..48094f7 100644
--- a/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
+++ b/gcc/testsuite/c-c++-common/patchable_function_entry-default.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
 /* { dg-final { scan-assembler-times "nop" 3 } } */
 
diff --git a/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c b/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c
index a007867..af8202f 100644
--- a/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c
+++ b/gcc/testsuite/c-c++-common/patchable_function_entry-definition.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { ! nvptx*-*-* } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
 /* { dg-final { scan-assembler-times "nop" 1 } } */
[testsuite, committed] Require label_values for gcc.dg/torture/pr80163.c
Hi,

this patch requires effective target label_values for
gcc.dg/torture/pr80163.c.

Committed.

Thanks,
- Tom

Require label_values for gcc.dg/torture/pr80163.c

2017-08-02  Tom de Vries

	* gcc.dg/torture/pr80163.c: Require label_values.
---
 gcc/testsuite/gcc.dg/torture/pr80163.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr80163.c b/gcc/testsuite/gcc.dg/torture/pr80163.c
index 80cc68d..a9a4438 100644
--- a/gcc/testsuite/gcc.dg/torture/pr80163.c
+++ b/gcc/testsuite/gcc.dg/torture/pr80163.c
@@ -1,5 +1,6 @@
 /* PR c/80163 */
 /* { dg-do compile { target int128 } } */
+/* { dg-require-effective-target label_values } */
 
 volatile int v;
[testsuite, committed] Require alloca for gcc.dg/attr-noipa.c
Hi,

this patch requires effective target alloca for gcc.dg/attr-noipa.c.

Committed.

Thanks,
- Tom

Require alloca for gcc.dg/attr-noipa.c

2017-08-02  Tom de Vries

	* gcc.dg/attr-noipa.c: Require alloca.
---
 gcc/testsuite/gcc.dg/attr-noipa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/attr-noipa.c b/gcc/testsuite/gcc.dg/attr-noipa.c
index 1d2b868..e2349b6 100644
--- a/gcc/testsuite/gcc.dg/attr-noipa.c
+++ b/gcc/testsuite/gcc.dg/attr-noipa.c
@@ -1,6 +1,7 @@
 /* Test the noipa attribute. */
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-require-effective-target alloca } */
 
 static inline int __attribute__((noipa))
 fn1 (void) /* { dg-warning "inline function \[^\n\]* given attribute noinline" "" } */
[testsuite, committed] Require alias for gcc.dg/pr56727-2.c
Hi,

this patch requires alias support for gcc.dg/pr56727-2.c.

Committed.

Thanks,
- Tom

Require alias for gcc.dg/pr56727-2.c

2017-08-02  Tom de Vries

	* gcc.dg/pr56727-2.c: Require alias.
---
 gcc/testsuite/gcc.dg/pr56727-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c b/gcc/testsuite/gcc.dg/pr56727-2.c
index e47ee3c..62a74d1 100644
--- a/gcc/testsuite/gcc.dg/pr56727-2.c
+++ b/gcc/testsuite/gcc.dg/pr56727-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target fpic } } */
 /* { dg-options "-O2 -fPIC" } */
+/* { dg-require-alias "" } */
 /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* powerpc*-*-linux* } } } */
 
 __attribute__((noinline, noclone))
PR78888 - add value range info for tolower/toupper
Hi, The attached patch adds value-range info for __builtin_tolower and __builtin_toupper. In the patch, I have just settled for anti-range ~['a', 'z'] for return value of toupper. Would that be correct albeit imprecise ? A more precise range would be: [0, UCHAR_MAX] intersect ~['a', 'z'] union EOF as mentioned in PR. I am not sure though if it's possible for a SSA_NAME to have multiple disjoint ranges ? Bootstrap+tested on x86_64-unknown-linux-gnu. Thanks, Prathamesh diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr7.c new file mode 100644 index 000..2bcddf1f2c3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr7.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-evrp-slim" } */ + +void g (int x) +{ + if (__builtin_toupper ((unsigned char)x) == 'a') +__builtin_abort (); +} + +void h (int x) +{ + if (__builtin_tolower ((unsigned char)x) == 'A') +__builtin_abort (); +} + +/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */ diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 79a29bf0efb..7137a4c52ec 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) return; } break; + case CFN_BUILT_IN_TOUPPER: + case CFN_BUILT_IN_TOLOWER: + if (tree lhs = gimple_call_lhs (stmt)) + { + tree type = TREE_TYPE (lhs); + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; + tree range_min = build_int_cstu (type, min); + tree range_max = build_int_cstu (type, max); + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); + return; + } + break; default: break; }
Re: PR78888 - add value range info for tolower/toupper
On Thu, Aug 03, 2017 at 12:58:06PM +0530, Prathamesh Kulkarni wrote: > --- a/gcc/tree-vrp.c > +++ b/gcc/tree-vrp.c > @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) > return; > } > break; > + case CFN_BUILT_IN_TOUPPER: > + case CFN_BUILT_IN_TOLOWER: > + if (tree lhs = gimple_call_lhs (stmt)) > + { > + tree type = TREE_TYPE (lhs); > + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; > + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; > + tree range_min = build_int_cstu (type, min); > + tree range_max = build_int_cstu (type, max); > + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); > + return; You are hardcoding here host characters and using it for target. I think you need to use lang_hooks.to_target_charset (really no idea how it works or doesn't in LTO, but gimple-fold.c is already using it among others). Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps, which isn't the case for e.g. EBCDIC. So I think you'd need to verify (once?) that the target charset has this property. Jakub
[PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
Hi! For the PR81588 testcase, on targets with branch cost 1, we end up with: b.0_1 = b; _2 = (long long int) b.0_1; a.1_3 = a; _4 = _2 > a.1_3; _5 = (int) _4; if (a.1_3 < 0) goto ; [36.00%] [count: INV] else goto ; [64.00%] [count: INV] [64.00%] [count: INV]: if (_4 != 1) goto ; [46.88%] [count: INV] else goto ; [53.13%] [count: INV] [66.00%] [count: INV]: c = 0; [100.00%] [count: INV]: The reason why we punt is the unexpected _4 != 1 condition, the code is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent to _4 != 1 for boolean type. The following patch handles even comparison with 1 if the type is unsigned 1-bit precision. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-08-03 Jakub Jelinek PR tree-optimization/81655 PR tree-optimization/81588 * tree-ssa-reassoc.c (optimize_range_tests_var_bound): Handle also the case when ranges[i].low and high are 1 for unsigned type with precision 1. --- gcc/tree-ssa-reassoc.c.jj 2017-08-01 10:28:50.0 +0200 +++ gcc/tree-ssa-reassoc.c 2017-08-02 11:28:44.789134681 +0200 @@ -2918,11 +2918,22 @@ optimize_range_tests_var_bound (enum tre for (i = 0; i < length; i++) { + bool in_p = ranges[i].in_p; if (ranges[i].low == NULL_TREE - || ranges[i].high == NULL_TREE - || !integer_zerop (ranges[i].low) - || !integer_zerop (ranges[i].high)) + || ranges[i].high == NULL_TREE) continue; + if (!integer_zerop (ranges[i].low) + || !integer_zerop (ranges[i].high)) + { + if (ranges[i].exp + && TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) == 1 + && TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) + && integer_onep (ranges[i].low) + && integer_onep (ranges[i].high)) + in_p = !in_p; + else + continue; + } gimple *stmt; tree_code ccode; @@ -2964,7 +2975,7 @@ optimize_range_tests_var_bound (enum tre default: continue; } - if (ranges[i].in_p) + if (in_p) ccode = invert_tree_comparison (ccode, false); switch (ccode) { Jakub
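[Illustrative note: C source of roughly the following shape produces the
GIMPLE quoted above on targets with branch cost 1.  This is a hedged
sketch reconstructed from the dump, not the actual PR81588/PR81655
testcase; the types and the exact condition may differ.]

long long a;
int b, c;

void
foo (void)
{
  /* With branch cost 1 the second comparison is evaluated
     unconditionally into a boolean temporary (_4 above), and the branch
     then tests _4 != 1.  The patch teaches optimize_range_tests_var_bound
     that for an unsigned 1-bit value this is the same as _4 == 0.  */
  if (a < 0 || (long long) b <= a)
    c = 0;
}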
Re: [PATCH 0/2] Python testcases to check DWARF output
On 08/02/2017 05:43 PM, Jeff Law wrote:
> I hate to throw in a wrench at this point, but has anyone looked at
> dwgrep from Petr Machata?  He's not doing much with it anymore, but it
> might provide enough of a dwarf scanning framework to be useful for
> testing purposes.

Sure, no problem: I first started talking publicly about this one week
ago, so it’s definitely not too late to mention alternatives. ;-)

I learned about dwgrep two years ago and forgot about it, so thank you
for the idea. I started to have a look at it, and for now I don’t think
it’s a good match in this context:

 1. it’s an ELF only tool;

 2. it must be built, requiring external dependencies: cmake and
    elfutils;

 3. in order to use it, one must learn a dedicated post-fix language
    (Zwerg)

For 1. I think this is a true problem, as it means for instance that we
could not test DWARF on Windows and Darwin setups. Unless we add PE and
Mach-O handling in dwgrep of course, but that does not sound easy and
will bring other external dependencies.

For 3. I feel that, for someone who is comfortable with Python, it will
be easier to deal with a Python library (the dwarfutils in my patch)
than having to learn yet another DSL. I think that’s precisely why some
people would like to have a Python test framework rather than a TCL one.
Working with a “usual” imperative language looks easier than with
postfix expressions. Smaller cognitive load.

Actually I see another problem: patterns will have to vary depending on
the target platform (for instance 32/64bit or depending on the DWARF
version). Of course we could duplicate whole patterns in testcases to
take this into account, but that’s like code duplication: I think we
should be able to include small “X if 32bit else Y” in patterns, and I
don’t think we can do that with Zwerg (no way to pass something like
environment variables).

Of course, I have written a “competitor” tool: I guess my judgment is
biased. :-) So other opinions are welcome!

-- 
Pierre-Marie de Rodat
Re: GCC 7.2 Status Report (2017-08-02)
On Wed, 2 Aug 2017, Ian Lance Taylor wrote:

> On Wed, Aug 2, 2017 at 4:17 AM, Richard Biener wrote:
> >
> > Status
> > ==
> >
> > The GCC 7 branch is now frozen for the upcoming release candidate
> > and release.  All changes require release manager approval.
>
> Whoops, I committed a patch to the GCC 7 branch this morning (my time)
> before I saw this e-mail message (SVN revision 250833).  Sorry about
> that.  Let me know if you want me to revert it, but I think it's
> pretty safe.

Yes, looks harmless to me (go isn't release critical anyway).

Thanks,
Richard.
Re: [PATCH 1/2] Introduce testsuite support to run Python tests
On 08/02/2017 08:43 PM, Jeff Law wrote:
> If it was trivial, then let's support 2.6.  But if it's nontrivial, I'd
> support stepping to something more modern.

It is trivial. I’ve done it locally. :-)

-- 
Pierre-Marie de Rodat
[committed] Diagnose invalid #pragma omp simd even for -fopenmp-simd (PR middle-end/81052)
Hi! The (mini)pass to diagnose invalid OpenMP/OpenACC/Cilk+ constructs wasn't mistakenly run for -fopenmp-simd, so non-conforming simd constructs weren't diagnosed, but could easily ICE later on, because we assume the body of the loop is a structured block (single entry, single exit, no branches into it or out of it, no exceptions thrown out of it etc.). Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk so far, queued for backporting. 2017-08-03 Jakub Jelinek PR middle-end/81052 * omp-low.c (diagnose_sb_0): Handle flag_openmp_simd like flag_openmp. (pass_diagnose_omp_blocks::gate): Enable also for flag_openmp_simd. * c-c++-common/pr81052.c: New test. --- gcc/omp-low.c.jj2017-07-06 20:31:32.0 +0200 +++ gcc/omp-low.c 2017-08-02 12:52:47.608735787 +0200 @@ -9083,7 +9083,7 @@ diagnose_sb_0 (gimple_stmt_iterator *gsi } if (kind == NULL) { - gcc_checking_assert (flag_openmp); + gcc_checking_assert (flag_openmp || flag_openmp_simd); kind = "OpenMP"; } @@ -9343,7 +9343,7 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { -return flag_cilkplus || flag_openacc || flag_openmp; +return flag_cilkplus || flag_openacc || flag_openmp || flag_openmp_simd; } virtual unsigned int execute (function *) { --- gcc/testsuite/c-c++-common/pr81052.c.jj 2017-08-02 12:59:28.092223248 +0200 +++ gcc/testsuite/c-c++-common/pr81052.c2017-08-02 13:00:30.603510773 +0200 @@ -0,0 +1,28 @@ +/* PR middle-end/81052 */ +/* { dg-do compile } */ +/* { dg-options "-fopenmp-simd -O2" } */ + +int +foo (int x, int y) +{ + int i; +#pragma omp simd + for (i = x; i < y; ++i) +return 0; /* { dg-error "invalid branch to/from OpenMP structured block" } */ + return 1; +} + +#ifdef __cplusplus +template +T +bar (T x, T y) +{ + T i; +#pragma omp simd + for (i = x; i < y; ++i) +return 0; /* { dg-error "invalid branch to/from OpenMP structured block" "" { target c++ } } */ + return 1; +} + +int x = bar (1, 7); +#endif Jakub
[PATCH] Some -Walloc-size-larger-than= warning fixes (PR driver/81650)
Hi! This patch attempts to fix some overflows during computation of the size and then effectively uses MIN (SSIZE_MAX, user_specified_value) as the bound. Not really sure what exact behavior we want, whether that, or default to SSIZE_MAX (note the documentation mistakenly talks about SIZE_MAX / 2 instead) and use MIN (SIZE_MAX, user_specified_value) for user specified value, or error out if user_specified_value is larger than SSIZE_MAX or larger than SIZE_MAX. Also the else unit = 0; case probably should get diagnostics. Another issue is if we want to diagnose, that right now it will be diagnosed only if some allocation routine is seen, diagnosing stuff during option processing is unfortunately too early because we don't have sizetype/ssizetype built yet. Anyway, the following has been bootstrapped/regtested on x86_64-linux and i686-linux. 2017-08-03 Jakub Jelinek PR driver/81650 * calls.c (alloc_max_size): Use HOST_WIDE_INT_UC (10??) instead of 10??LU, perform unit multiplication in wide_int, don't change alloc_object_size_limit if the limit is larger than SSIZE_MAX. * gcc.dg/pr81650.c: New test. --- gcc/calls.c.jj 2017-06-12 12:42:01.0 +0200 +++ gcc/calls.c 2017-08-02 13:41:00.887324420 +0200 @@ -1222,32 +1222,38 @@ alloc_max_size (void) else if (!strcasecmp (end, "KiB") || strcmp (end, "KB")) unit = 1024; else if (!strcmp (end, "MB")) - unit = 1000LU * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000; else if (!strcasecmp (end, "MiB")) - unit = 1024LU * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024; else if (!strcasecmp (end, "GB")) - unit = 1000LU * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000; else if (!strcasecmp (end, "GiB")) - unit = 1024LU * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024; else if (!strcasecmp (end, "TB")) - unit = 1000LU * 1000 * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000; else if (!strcasecmp (end, "TiB")) - unit = 1024LU * 1024 * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024; else if (!strcasecmp (end, "PB")) - unit = 1000LU * 1000 * 1000 * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000; else if (!strcasecmp (end, "PiB")) - unit = 1024LU * 1024 * 1024 * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024; else if (!strcasecmp (end, "EB")) - unit = 1000LU * 1000 * 1000 * 1000 * 1000 * 1000; + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000 + * 1000; else if (!strcasecmp (end, "EiB")) - unit = 1024LU * 1024 * 1024 * 1024 * 1024 * 1024; + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024 + * 1024; else unit = 0; } if (unit) - alloc_object_size_limit - = build_int_cst (ssizetype, limit * unit); + { + wide_int w = wi::uhwi (limit, HOST_BITS_PER_WIDE_INT + 64); + w *= unit; + if (wi::ltu_p (w, alloc_object_size_limit)) + alloc_object_size_limit = wide_int_to_tree (ssizetype, w); + } } } } --- gcc/testsuite/gcc.dg/pr81650.c.jj 2017-08-02 14:52:22.864787221 +0200 +++ gcc/testsuite/gcc.dg/pr81650.c 2017-08-02 14:21:11.0 +0200 @@ -0,0 +1,9 @@ +/* PR driver/81650 */ +/* { dg-do compile } */ +/* { dg-options "-Walloc-size-larger-than=9223372036854775807" } */ + +void * +foo (void) +{ + return __builtin_malloc (5); +} Jakub
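[Illustrative note, not part of the patch: the reason the multiplication
has to be widened is that with limit and unit both held in 64-bit host
integers, limit * unit can wrap silently for large user-specified limits
combined with the bigger unit suffixes.  A minimal standalone sketch:]

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t limit = UINT64_C (9223372036854775807);		/* SSIZE_MAX */
  uint64_t unit = UINT64_C (1024) * 1024 * 1024 * 1024 * 1024;	/* "PiB" */
  uint64_t product;

  /* The product is roughly 2^113, far beyond 64 bits.  */
  if (__builtin_mul_overflow (limit, unit, &product))
    puts ("limit * unit overflows 64 bits; a wider computation is needed");
  return 0;
}

[This overflow is what the wide_int of HOST_BITS_PER_WIDE_INT + 64 bits in
the patch avoids.]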
[PATCH] Fix df-related ICE due to bbpart pass bug (PR target/81621)
Hi! The following testcase ICEs on s390x. The problem is that the bbpart pass calls df_set_flags (DF_DEFER_INSN_RESCAN); because it wants to defer rescanning, but doesn't actually df_finish_pass (it does in one case, but then calls df_set_flags with another changeable flag, so it has the same issue), and if the IRA pass is invoked soon after it without any df_finish_pass calls in between, we end up with deferred insn rescanning during IRA which heavily relies on immediate insn rescanning. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2017-08-03 Jakub Jelinek PR target/81621 * bb-reorder.c (pass_partition_blocks::execute): Return TODO_df_finish after setting changeable df flags. * gcc.dg/pr81621.c: New test. --- gcc/bb-reorder.c.jj 2017-07-21 10:28:13.0 +0200 +++ gcc/bb-reorder.c2017-08-02 19:43:58.797243254 +0200 @@ -2904,7 +2904,7 @@ pass_partition_blocks::execute (function crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges (); if (!crossing_edges.exists ()) -return 0; +return TODO_df_finish; crtl->has_bb_partition = true; @@ -2970,7 +2970,7 @@ pass_partition_blocks::execute (function df_analyze (); } - return 0; + return TODO_df_finish; } } // anon namespace --- gcc/testsuite/gcc.dg/pr81621.c.jj 2017-08-02 19:52:08.435831121 +0200 +++ gcc/testsuite/gcc.dg/pr81621.c 2017-08-02 19:52:00.026924067 +0200 @@ -0,0 +1,5 @@ +/* PR target/81621 */ +/* { dg-do compile { target freorder } } */ +/* { dg-options "-Og -fno-split-wide-types -freorder-blocks-and-partition" } */ + +#include "graphite/scop-10.c" Jakub
Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)
On Wed, Aug 2, 2017 at 7:10 PM, Jeff Law wrote: > On 08/01/2017 03:25 AM, Richard Biener wrote: >> On Tue, Aug 1, 2017 at 11:23 AM, Richard Biener >> wrote: >>> On Tue, Aug 1, 2017 at 4:27 AM, Martin Sebor wrote: Richard, in discussing this work Jeff mentioned that your comments on the tree-ssa-alias.c parts would be helpful. When you have a chance could you please give it a once over and let me know if you have any suggestions or concerns? There are no visible changes to existing clients of the pass, just extensions that are relied on only by the new diagnostics. https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01264.html I expect to revisit the sprintf %s patch mentioned below and see how to simplify it by taking advantage of the changes implemented here. >>> >>> Your ptr_deref_alias_decl_p returns true when must_alias is true >>> but there is no must-alias relationship. >>> >>> The ao_ref_init_from_ptr_and_size doesn't belong there. >>> >>> I dislike all of the changes related to returning an alias "size". >>> >>> Please keep away from the alias machinery. >>> >>> Warning during folding is similarly bad design. Please don't add >>> more such things. >>> >>> Thanks for putting this on my radar though. >>> Richard. >> >> Oh, for a constructive comment this all feels like sth for either >> sanitizers or a proper static analysis tool. As I outlined repeatedly >> I belive GCC could host a static analysis tool but it surely should >> not be interwinded into it's optimization passes or tools. > I disagree strongly here. > > Sanitiers catch problems after the fact and only if you've compiled your > code with sanitization and manage to find a way to trigger the problem path. > > Other static analysis tools are useful, but they aren't an integral part > of the edit, compile, debug cycle and due to their cost are often run > long after code was committed. > > Finding useful warnings that can be issued as part of the compile, edit, > debug cycle is, IMHO, far more useful than sanitizers or independent > static checkers. > > -- > > > I think finding a way to exploit information that our various analyzers > can provide to give precise warnings is a good thing. So it seemed > natural that if we wanted to ask a must-alias question that we should be > querying the alias oracle rather than implementing that kind of query > within the sprintf warnings. I'm not sure why you'd say "Please keep > away from the alias machinery". > > I'm a little confused here... Well, simply because the way as implemented isn't a must-alias query but maybe one that's good enough for warnings (reduces false positives but surely doesn't eliminate them). There's a must alias query already, in stmt_kills_ref_p. It's a matter of refactoring to make a refs_must_alias_p. Then propose that "overlap range" stuff separately. But I'm against lumping this all in as an innocent change suggesting the machinery can do sth (must alias) when it really can't. I also do not like adding a "must alias" bool to the may-alias APIs as the implementation is fundamentally may-alias, must-alias really is very different. And to repeat, no, I do not want a "good-enough-for-warnings" must-alias in an API that's supposed to be used by optimizations where "good enough" is not good enough. Richard. > > > Jeff
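[Illustrative note, independent of the patch under discussion, to keep the
two notions in this exchange apart: a may-alias oracle only has to answer
"these accesses possibly overlap", while a must-alias query of the
stmt_kills_ref_p kind asserts that they certainly do, which is the
stronger property warning code wants established before it fires.]

void
f (int *p, int *q)
{
  *p = 1;
  *q = 2;  /* May overlap *p: a may-alias "yes", but not a must-alias one.  */
}

void
g (int *p)
{
  *p = 1;
  *p = 2;  /* Certainly overwrites the first store: a genuine must-alias.  */
}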
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Hi! > > For the PR81588 testcase, on targets with branch cost 1, we end up with: > b.0_1 = b; > _2 = (long long int) b.0_1; > a.1_3 = a; > _4 = _2 > a.1_3; > _5 = (int) _4; > if (a.1_3 < 0) > goto ; [36.00%] [count: INV] > else > goto ; [64.00%] [count: INV] > >[64.00%] [count: INV]: > if (_4 != 1) > goto ; [46.88%] [count: INV] > else > goto ; [53.13%] [count: INV] > >[66.00%] [count: INV]: > c = 0; > >[100.00%] [count: INV]: > The reason why we punt is the unexpected _4 != 1 condition, the code > is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent > to _4 != 1 for boolean type. Hmm, I thought we had code to canonicalize boolean compares (but I can't find that right now). Some is in forwprop:forward_propagate_into_gimple_cond where it canonicalizes != 1 to == 0. > The following patch handles even comparison with 1 if the type is > unsigned 1-bit precision. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. > 2017-08-03 Jakub Jelinek > > PR tree-optimization/81655 > PR tree-optimization/81588 > * tree-ssa-reassoc.c (optimize_range_tests_var_bound): Handle also > the case when ranges[i].low and high are 1 for unsigned type with > precision 1. > > --- gcc/tree-ssa-reassoc.c.jj 2017-08-01 10:28:50.0 +0200 > +++ gcc/tree-ssa-reassoc.c2017-08-02 11:28:44.789134681 +0200 > @@ -2918,11 +2918,22 @@ optimize_range_tests_var_bound (enum tre > >for (i = 0; i < length; i++) > { > + bool in_p = ranges[i].in_p; >if (ranges[i].low == NULL_TREE > - || ranges[i].high == NULL_TREE > - || !integer_zerop (ranges[i].low) > - || !integer_zerop (ranges[i].high)) > + || ranges[i].high == NULL_TREE) > continue; > + if (!integer_zerop (ranges[i].low) > + || !integer_zerop (ranges[i].high)) > + { > + if (ranges[i].exp > + && TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) == 1 > + && TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) > + && integer_onep (ranges[i].low) > + && integer_onep (ranges[i].high)) > + in_p = !in_p; > + else > + continue; > + } > >gimple *stmt; >tree_code ccode; > @@ -2964,7 +2975,7 @@ optimize_range_tests_var_bound (enum tre > default: > continue; > } > - if (ranges[i].in_p) > + if (in_p) > ccode = invert_tree_comparison (ccode, false); >switch (ccode) > { > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Some -Walloc-size-larger-than= warning fixes (PR driver/81650)
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Hi! > > This patch attempts to fix some overflows during computation of the size > and then effectively uses MIN (SSIZE_MAX, user_specified_value) as the > bound. > > Not really sure what exact behavior we want, whether that, or default > to SSIZE_MAX (note the documentation mistakenly talks about SIZE_MAX / 2 > instead) and use MIN (SIZE_MAX, user_specified_value) for user specified > value, or error out if user_specified_value is larger than > SSIZE_MAX or larger than SIZE_MAX. Also the else unit = 0; case > probably should get diagnostics. Another issue is if we want to diagnose, > that right now it will be diagnosed only if some allocation routine is seen, > diagnosing stuff during option processing is unfortunately too early because > we don't have sizetype/ssizetype built yet. > > Anyway, the following has been bootstrapped/regtested on x86_64-linux and > i686-linux. Ok. Richard. > 2017-08-03 Jakub Jelinek > > PR driver/81650 > * calls.c (alloc_max_size): Use HOST_WIDE_INT_UC (10??) > instead of 10??LU, perform unit multiplication in wide_int, > don't change alloc_object_size_limit if the limit is larger > than SSIZE_MAX. > > * gcc.dg/pr81650.c: New test. > > --- gcc/calls.c.jj2017-06-12 12:42:01.0 +0200 > +++ gcc/calls.c 2017-08-02 13:41:00.887324420 +0200 > @@ -1222,32 +1222,38 @@ alloc_max_size (void) > else if (!strcasecmp (end, "KiB") || strcmp (end, "KB")) > unit = 1024; > else if (!strcmp (end, "MB")) > - unit = 1000LU * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000; > else if (!strcasecmp (end, "MiB")) > - unit = 1024LU * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024; > else if (!strcasecmp (end, "GB")) > - unit = 1000LU * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000; > else if (!strcasecmp (end, "GiB")) > - unit = 1024LU * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024; > else if (!strcasecmp (end, "TB")) > - unit = 1000LU * 1000 * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000; > else if (!strcasecmp (end, "TiB")) > - unit = 1024LU * 1024 * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024; > else if (!strcasecmp (end, "PB")) > - unit = 1000LU * 1000 * 1000 * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000; > else if (!strcasecmp (end, "PiB")) > - unit = 1024LU * 1024 * 1024 * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024; > else if (!strcasecmp (end, "EB")) > - unit = 1000LU * 1000 * 1000 * 1000 * 1000 * 1000; > + unit = HOST_WIDE_INT_UC (1000) * 1000 * 1000 * 1000 * 1000 > +* 1000; > else if (!strcasecmp (end, "EiB")) > - unit = 1024LU * 1024 * 1024 * 1024 * 1024 * 1024; > + unit = HOST_WIDE_INT_UC (1024) * 1024 * 1024 * 1024 * 1024 > +* 1024; > else > unit = 0; > } > > if (unit) > - alloc_object_size_limit > - = build_int_cst (ssizetype, limit * unit); > + { > + wide_int w = wi::uhwi (limit, HOST_BITS_PER_WIDE_INT + 64); > + w *= unit; > + if (wi::ltu_p (w, alloc_object_size_limit)) > + alloc_object_size_limit = wide_int_to_tree (ssizetype, w); > + } > } > } > } > --- gcc/testsuite/gcc.dg/pr81650.c.jj 2017-08-02 14:52:22.864787221 +0200 > +++ gcc/testsuite/gcc.dg/pr81650.c2017-08-02 14:21:11.0 +0200 > @@ -0,0 +1,9 @@ > +/* PR driver/81650 */ > +/* { dg-do compile } */ > +/* { dg-options "-Walloc-size-larger-than=9223372036854775807" } */ > + > +void * > +foo (void) > +{ > + return __builtin_malloc (5); > +} > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, 
Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
> On 08/02/2017 01:51 PM, Richard Biener wrote: > > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: > >> Hello. > >> > >> After some discussions with Honza, I've decided to convert current code in > >> stmt.c that > >> is responsible for switch expansion. More precisely, I would like to > >> convert the code > >> to expand gswitch statements on tree level. Currently the newly created > >> pass is executed > >> at the end of tree optimizations. > >> > >> My plan for future is to inspire in [1] and come up with some more > >> sophisticated switch > >> expansions. For that I've been working on a paper where I'll summarize > >> statistics based > >> on what I've collected in openSUSE distribution with specially > >> instrumented GCC. If I'll be > >> happy I can also fit in to schedule of this year's Cauldron with a talk. > >> > >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > >> > >> Thoughts? > > > > First of all thanks. > > > > I think part of switch expansion moved to switch-conversion some time ago > > (emit_case_bit_tests). So maybe the full lowering should be in at least > > the same source file and it should maybe applied earlier for a subset of > > cases (very low number of cases for example). > > Yep, good idea. I'll take a look. > > > > > Did you base the code on the RTL expansion code or did you re-write it from > > scratch? > > It's based, I've just changed the function that create CFG. I have talked Martin to do this in first step. Switch expansion is infinitely difficult problem and I think changing representation first and keeping the basic algorithm is easiest way to get something done. Algorithm will be improved next as far as I know :) Honza > > Martin
[PATCH, ARM] fix .cfi inconsistency out of builtin_eh_return
Hello, Activating dwarf2 based eh for real on VxWorks6 (patch to come) triggers a libgcc build failure, where most functions resorting to __builtin_eh_return fail this check from maybe_record_trace_start in dwarf2cfi.c: /* We ought to have the same state incoming to a given trace no matter how we arrive at the trace. Anything else means we've got some kind of optimization error. */ #if CHECKING_P if (!cfi_row_equal_p (cur_row, ti->beg_row)) ... The inconsistency is introduced by a store inserted in the middle of the insn stream by arm_set_return_address for __builtin_eh_return, marked FRAME_RELATED with: /* The store needs to be marked as frame related in order to prevent DSE from deleting it as dead if it is based on fp. */ rtx insn = emit_move_insn (gen_frame_mem (Pmode, addr), source); RTX_FRAME_RELATED_P (insn) = 1; add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (Pmode, LR_REGNUM)); The FRAME_RELATED ness was setup to fixed DWARF2 unwinding at the time already, indeed broken by DSE removing the store - see https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01669.html. The attached patch is a proposal to fix this by setting MEM_VOLATILE_P on the store destination mem instead of setting RTX_FRAME_RELATED_P on the insn, which alleviates the .cfi complications and is as effective in preventing the store removal by DSE, from: /* Assuming that there are sets in these insns, we cannot delete them. */ if ((GET_CODE (PATTERN (insn)) == CLOBBER) || volatile_refs_p (PATTERN (insn)) || (!cfun->can_delete_dead_exceptions && !insn_nothrow_p (insn)) || (RTX_FRAME_RELATED_P (insn)) || find_reg_note (insn, REG_FRAME_RELATED_EXPR, NULL_RTX)) insn_info->cannot_delete = true; For testing, I verified that - an arm-wrs-vxworks build with the extra patch to activate DWARF2 eh proceeds to completion on mainline, - the same symptoms were visible and cured on our in-house gcc-7 based toolchain, and that - ACATS are clean for Ada in this configuration after the patch, confirming proper behavior of the DWARF2 exception propagation circuitry. OK to commit ? Thanks much in advance for your feedback, With Kind Regards, Olivier arm-set-return-address.patch Description: Binary data
Re: [PATCH] Fix df-related ICE due to bbpart pass bug (PR target/81621)
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs on s390x. The problem is that the bbpart pass > calls > df_set_flags (DF_DEFER_INSN_RESCAN); > because it wants to defer rescanning, but doesn't actually df_finish_pass > (it does in one case, but then calls df_set_flags with another changeable > flag, > so it has the same issue), and if the IRA pass is invoked soon after it > without any df_finish_pass calls in between, we end up with deferred insn > rescanning during IRA which heavily relies on immediate insn rescanning. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Maybe add a comment in case somebody wonders later? > 2017-08-03 Jakub Jelinek > > PR target/81621 > * bb-reorder.c (pass_partition_blocks::execute): Return TODO_df_finish > after setting changeable df flags. > > * gcc.dg/pr81621.c: New test. > > --- gcc/bb-reorder.c.jj 2017-07-21 10:28:13.0 +0200 > +++ gcc/bb-reorder.c 2017-08-02 19:43:58.797243254 +0200 > @@ -2904,7 +2904,7 @@ pass_partition_blocks::execute (function > >crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges (); >if (!crossing_edges.exists ()) > -return 0; > +return TODO_df_finish; I suppose we can avoid this if we move the df_set_flags after this? I doubt find_rarely_executed_basic_blocks_and_crossing_edges modifies anything. Ok with those changes (if the latter is possible). Thanks, Richard. >crtl->has_bb_partition = true; > > @@ -2970,7 +2970,7 @@ pass_partition_blocks::execute (function >df_analyze (); > } > > - return 0; > + return TODO_df_finish; > } > > } // anon namespace > --- gcc/testsuite/gcc.dg/pr81621.c.jj 2017-08-02 19:52:08.435831121 +0200 > +++ gcc/testsuite/gcc.dg/pr81621.c2017-08-02 19:52:00.026924067 +0200 > @@ -0,0 +1,5 @@ > +/* PR target/81621 */ > +/* { dg-do compile { target freorder } } */ > +/* { dg-options "-Og -fno-split-wide-types -freorder-blocks-and-partition" } > */ > + > +#include "graphite/scop-10.c" > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH, ARM] fix .cfi inconsistency out of builtin_eh_return
> On 03 Aug 2017, at 11:02, Olivier Hainque wrote:
> The attached patch is a proposal to fix this by setting MEM_VOLATILE_P on
> the store destination mem instead of setting RTX_FRAME_RELATED_P on the insn,
> which alleviates the .cfi complications and is as effective in preventing
> the store removal by DSE

Forgot the ChangeLog, here it is:

2017-08-03  Olivier Hainque

	* config/arm/arm.c (arm_set_return_address): Use MEM_VOLATILE_P
	on the target mem instead of RTX_FRAME_RELATED_P on the insn to
	prevent DSE.
	(thumb_set_return_address): Likewise.

> OK to commit ?
>
> Thanks much in advance for your feedback,
>
> With Kind Regards,
>
> Olivier
Re: [PATCH] Fix df-related ICE due to bbpart pass bug (PR target/81621)
On Thu, Aug 03, 2017 at 11:02:51AM +0200, Richard Biener wrote: > On Thu, 3 Aug 2017, Jakub Jelinek wrote: > > > Hi! > > > > The following testcase ICEs on s390x. The problem is that the bbpart pass > > calls > > df_set_flags (DF_DEFER_INSN_RESCAN); > > because it wants to defer rescanning, but doesn't actually df_finish_pass > > (it does in one case, but then calls df_set_flags with another changeable > > flag, > > so it has the same issue), and if the IRA pass is invoked soon after it > > without any df_finish_pass calls in between, we end up with deferred insn > > rescanning during IRA which heavily relies on immediate insn rescanning. > > > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > > trunk? > > Maybe add a comment in case somebody wonders later? Ok. > > --- gcc/bb-reorder.c.jj 2017-07-21 10:28:13.0 +0200 > > +++ gcc/bb-reorder.c2017-08-02 19:43:58.797243254 +0200 > > @@ -2904,7 +2904,7 @@ pass_partition_blocks::execute (function > > > >crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges (); > >if (!crossing_edges.exists ()) > > -return 0; > > +return TODO_df_finish; > > I suppose we can avoid this if we move the df_set_flags after this? > I doubt find_rarely_executed_basic_blocks_and_crossing_edges modifies > anything. > > Ok with those changes (if the latter is possible). I was looking through find_rarely_executed_basic_blocks_and_crossing_edges before writing the patch and while I could prove for some functions that it doesn't modify anything, but e.g. for fix_up_crossing_landing_pad I'm pretty sure it can modify instructions in several ways. So I think we can't do that. Jakub
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On Thu, Aug 03, 2017 at 10:58:07AM +0200, Richard Biener wrote:
> > The reason why we punt is the unexpected _4 != 1 condition, the code
> > is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent
> > to _4 != 1 for boolean type.
>
> Hmm, I thought we had code to canonicalize boolean compares (but I can't
> find that right now).  Some is in

I was looking for that too, but didn't find anything that would be done
always.

> forwprop:forward_propagate_into_gimple_cond where it canonicalizes
> != 1 to == 0.

Yes, but from
  Canonicalize _Bool == 0 and _Bool != 1 to _Bool != 0 by swapping edges.
it seems that it can keep _Bool == 1 around, which is something
optimize_range_tests_var_bound didn't handle before either and now does.

	Jakub
[GCC][PATCH][mid-end][Version 3] Optimize x * copysign (1.0, y) [Patch (1/2)]
Hi All, this patch implements a optimization rewriting x * copysign (1.0, y) to: x ^ (y & (1 << sign_bit_position)) This is only done when not honoring signaling NaNs. This transormation is done at ssa mult widening time and is gated on the a check for the optab "xorsign". If the optab is not available then copysign is expanded as normal. If the optab exists then the expression is replaced with a call to the internal function XORSIGN. This patch is a revival of a previous patches https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00749.html Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues. Regression done on aarch64-none-linux-gnu and no regressions. Ok for trunk? gcc/ 2017-08-03 Tamar Christina Andrew Pinski PR middle-end/19706 * internal-fn.def (XORSIGN): New. * optabs.def (xorsign_optab): New. * tree-ssa-math-opts.c (is_copysign_call_with_1): New. (convert_expand_mult_copysign): New. (pass_optimize_widening_mul::execute): Call convert_expand_mult_copysign. -- diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index a9a3f7606eb2a79f64dab1b7fdeef0d308e3061d..58e5f4a322a92ccb842ab05cc4213933ffa59679 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -129,6 +129,8 @@ DEF_INTERNAL_FLT_FN (REMAINDER, ECF_CONST, remainder, binary) DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary) DEF_INTERNAL_FLT_FN (FMIN, ECF_CONST, fmin, binary) DEF_INTERNAL_FLT_FN (FMAX, ECF_CONST, fmax, binary) +DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) + /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git a/gcc/optabs.def b/gcc/optabs.def index f21f2267ec2118d5cd0e74b18721525a564d16f2..54afe2d796ee9af3bd7b25d93eb0789a70e47c7b 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -255,6 +255,7 @@ OPTAB_D (asin_optab, "asin$a2") OPTAB_D (atan2_optab, "atan2$a3") OPTAB_D (atan_optab, "atan$a2") OPTAB_D (copysign_optab, "copysign$F$a3") +OPTAB_D (xorsign_optab, "xorsign$F$a3") OPTAB_D (cos_optab, "cos$a2") OPTAB_D (exp10_optab, "exp10$a2") OPTAB_D (exp2_optab, "exp2$a2") diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 7ac1659fa0670b7080685f3f9513939807073a63..780a7f76ce756cfe025e80845208b00568eda56c 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -3145,6 +3145,96 @@ is_widening_mult_p (gimple *stmt, return true; } +/* Check to see if the CALL statement is an invocation of copysign + with 1. being the first argument. */ +static bool +is_copysign_call_with_1 (gimple *call) +{ + if (!is_gimple_call (call)) +return false; + + enum combined_fn code = gimple_call_combined_fn (call); + + if (code == CFN_LAST) +return false; + + gcall *c = as_a (call); + + if (builtin_fn_p (code)) +{ + switch (as_builtin_fn (code)) + { + CASE_FLT_FN (BUILT_IN_COPYSIGN): + CASE_FLT_FN_FLOATN_NX (BUILT_IN_COPYSIGN): + return real_onep (gimple_call_arg (c, 0)); + default: + return false; + } +} + + if (internal_fn_p (code)) +{ + switch (as_internal_fn (code)) + { + case IFN_COPYSIGN: + return real_onep (gimple_call_arg (c, 0)); + default: + return false; + } +} + + return false; +} + +/* Try to expand the pattern x * copysign (1, y) into xorsign (x, y). + This only happens when the the xorsign optab is defined, if the + pattern is not a xorsign pattern or if expansion fails FALSE is + returned, otherwise TRUE is returned. 
*/ +static bool +convert_expand_mult_copysign (gimple *stmt, gimple_stmt_iterator *gsi) +{ + tree treeop0, treeop1, lhs, type; + location_t loc = gimple_location (stmt); + lhs = gimple_assign_lhs (stmt); + treeop0 = gimple_assign_rhs1 (stmt); + treeop1 = gimple_assign_rhs2 (stmt); + type = TREE_TYPE (lhs); + machine_mode mode = TYPE_MODE (type); + + if (HONOR_SNANS (type)) +return false; + + if (TREE_CODE (treeop0) == SSA_NAME && TREE_CODE (treeop1) == SSA_NAME) +{ + gimple *call0 = SSA_NAME_DEF_STMT (treeop0); + if (!is_copysign_call_with_1 (call0)) + { + /* IPA sometimes inlines and then extracts the function again, + resulting in an incorrect order, so check both ways. */ + call0 = SSA_NAME_DEF_STMT (treeop1); + if (!is_copysign_call_with_1 (call0)) + return false; + + treeop1 = treeop0; + } + + if (optab_handler (xorsign_optab, mode) == CODE_FOR_nothing) + return false; + + gcall *c = as_a (call0); + treeop0 = gimple_call_arg (c, 1); + + gcall *call_stmt + = gimple_build_call_internal (IFN_XORSIGN, 2, treeop1,treeop0); + gimple_set_lhs (call_stmt, lhs); + gimple_set_location (call_stmt, loc); + gsi_insert_after (gsi, call_stmt, GSI_SAME_STMT); + return true; +} + + return false; +} + /* Process a single gimple statement STMT, which has a MULT_EXPR as its rhs, and try to convert it into a WIDEN_MULT_EXPR. The return value is true iff we converted th
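[Illustrative note for readers, not part of the patch: the identity the
pass relies on is easy to check with plain C.  For ordinary (non-NaN)
doubles, multiplying x by copysign (1.0, y) flips the sign of x exactly
when y is negative, i.e. it XORs y's sign bit into x; that is the
rewrite the pass performs when signalling NaNs need not be honored (the
HONOR_SNANS check above).]

#include <stdint.h>
#include <string.h>
#include <math.h>

/* Bit-level reference: copy x and XOR in the sign bit of y.  */
static double
xorsign_ref (double x, double y)
{
  uint64_t xb, yb;
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  xb ^= yb & (UINT64_C (1) << 63);
  memcpy (&x, &xb, sizeof x);
  return x;
}

/* Multiplication form that the pass rewrites; for non-NaN inputs the
   two functions return bit-identical results.  */
static double
mult_copysign (double x, double y)
{
  return x * copysign (1.0, y);
}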
[PATCH, ARM] Handle DWARF2_UNWIND_INFO in arm_except_unwind_info
Hello, On top of the correction for fallouts proposed in https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00271.html the attached patch is needed to support targets still resorting to DWARF2 unwinding on ARM, such as VxWorks6 (VxWorks7 has moved to ARM EH). Tested by verifying that a build with mainline sources for arm-wrs-vxworks proceeds to completion, that a similar gcc-7 based compiler builds fine as well and passes Ada ACATS, which confirms proper operation of the DWARF2 based exceptions. OK to commit ? Thanks in advance for your feedback, Olivier 2017-08-03 Olivier Hainque * common/config/arm/arm-common.c (arm_except_unwind_info): Handle DWARF2_UNWIND_INFO. arm-handle-dwarf2.diff Description: Binary data
Re: PR78888 - add value range info for tolower/toupper
On 3 August 2017 at 13:21, Jakub Jelinek wrote: > On Thu, Aug 03, 2017 at 12:58:06PM +0530, Prathamesh Kulkarni wrote: >> --- a/gcc/tree-vrp.c >> +++ b/gcc/tree-vrp.c >> @@ -3778,6 +3778,19 @@ extract_range_basic (value_range *vr, gimple *stmt) >> return; >> } >> break; >> + case CFN_BUILT_IN_TOUPPER: >> + case CFN_BUILT_IN_TOLOWER: >> + if (tree lhs = gimple_call_lhs (stmt)) >> + { >> + tree type = TREE_TYPE (lhs); >> + unsigned char min = (cfn == CFN_BUILT_IN_TOUPPER) ? 'a' : 'A'; >> + unsigned char max = (cfn == CFN_BUILT_IN_TOUPPER) ? 'z' : 'Z'; >> + tree range_min = build_int_cstu (type, min); >> + tree range_max = build_int_cstu (type, max); >> + set_value_range (vr, VR_ANTI_RANGE, range_min, range_max, NULL); >> + return; > > You are hardcoding here host characters and using it for target. > I think you need to use > lang_hooks.to_target_charset > (really no idea how it works or doesn't in LTO, but gimple-fold.c is already > using it among others). > Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps, > which isn't the case for e.g. EBCDIC. So I think you'd need to verify > (once?) that the target charset has this property. Hi Jakub, Thanks for the suggestions, I wasn't aware about lang_hooks.to_target_charset. Would following be a correct check that target has ascii charset and only then set range to ~[a, z] ? target_a = lang_hooks.to_target_charset ('a'); target_z = lang_hooks.to_target_charset('z'); if (target_a == 'a' && target_z == 'z') { // set vr to ~['a', 'z'] } Thanks, Prathamesh > > Jakub
Re: [PATCH][2/2] early LTO debug, main part
On Wed, 2 Aug 2017, Jason Merrill wrote: > On Wed, Aug 2, 2017 at 6:35 AM, Richard Biener wrote: > > On Wed, 2 Aug 2017, Jason Merrill wrote: > > > >> On 05/19/2017 06:42 AM, Richard Biener wrote: > >> > + /* ??? In some cases the C++ FE (at least) fails to > >> > +set DECL_CONTEXT properly. Simply globalize stuff > >> > +in this case. For example > >> > +__dso_handle created via iostream line 74 col 25. */ > >> > + parent = comp_unit_die (); > >> > >> I've now fixed __dso_handle, so that can be removed from the comment, but > >> it > >> still makes sense to have this fall-back for the (permitted) case of null > >> DECL_CONTEXT. > > > > Adjusted the comment. > > > >> > + /* ??? LANG issue - DW_TAG_module for fortran. Either look > >> > +at the input language (if we have enough DECL_CONTEXT to follow) > >> > +or use a bit in tree_decl_with_vis to record the distinction. */ > >> > >> Sure, you should be able to look at TRANSLATION_UNIT_LANGUAGE. > > > > Yeah, the comment says we might be able to walk DECL_CONTEXT up to > > a TRANSLATION_UNIT_DECL. I've amended is_fortran similar to as I > > amended is_cxx, providing an overload for a decl, factoring out common > > code. So this is now if (is_fortran (decl)) ... = new_die > > (DW_TAG_module,...). > > > >> > ! /* ??? We cannot unconditionally output die_offset if > >> > !non-zero - at least -feliminate-dwarf2-dups will > >> > !create references to those DIEs via symbols. And we > >> > !do not clear its DIE offset after outputting it > >> > !(and the label refers to the actual DIEs, not the > >> > !DWARF CU unit header which is when using label + offset > >> > !would be the correct thing to do). > >> > >> As in our previous discussion, I think -feliminate-dwarf2-dups can go away > >> now. But this is a more general issue: die_offset has meant the offset > >> from > >> the beginning of the CU, but if with_offset is set it means an offset from > >> die_symbol. Since with_offset changes the meaning of die_symbol and > >> die_offset, having different code here depending on that flag makes sense. > >> > >> It seems likely that when -fEDD goes away, we will never again want to do > >> direct symbolic references to DIEs, in which case we could drop the current > >> meaning of die_symbol, and so we wouldn't need the with_offset flag. > > > > Yeah, I've been playing with a patch to remove -fEDD but it has conflicts > > with the early LTO debug work and thus I wanted to postpone it until > > after that goes in to avoid further churn. I hope that's fine, it's > > sth I'll tackle soon after this patch lands on trunk. > > Sure. > > >> > ! /* Don't output the symbol twice. For LTO we want the label > >> > ! on the section beginning, not on the actual DIE. */ > >> > ! && (!flag_generate_lto > >> > ! || die->die_tag != DW_TAG_compile_unit)) > >> > >> I think this check should just be !with_offset; if that flag is set the DIE > >> doesn't actually have its own symbol. > > > > with_offset is set only during LTRANS phase for the DIEs refering to > > the early DIEs via the CU label. But the above is guarding the > > early phase when we do not want to output that CU label twice. > > > > Can we revisit this check when -fEDD has gone away? > > Yes. > > >> > ! if (old_die > >> > ! && (c = get_AT_ref (old_die, DW_AT_abstract_origin)) > >> > ! /* ??? In LTO all origin DIEs still refer to the early > >> > !debug copy. Detect that. */ > >> > ! && get_AT (c, DW_AT_inline)) > >> ... > >> > ! /* "Unwrap" the decls DIE which we put in the imported unit > >> > context. 
> >> > ! ??? If we finish dwarf2out_function_decl refactoring we can > >> > ! do this in a better way from the start and only lazily emit > >> > ! the early DIE references. */ > >> > >> It seems like in gen_subprogram_die you deliberately avoid reusing the DIE > >> from dwarf2out_register_external_die (since it doesn't have DW_AT_inline), > >> and > >> then in add_abstract_origin_attribute you need to look through that > >> redundant > >> die. Why not reuse it? > > > > What we're doing here is dealing with the case of an inlined > > instance which is adjusted to point back to the early debug copy > > directly than to the wrapping DIE (supposed to eventually contain > > the concrete instance). > > But we enter this block when we're emitting the concrete out-of-line > instance, and the DW_AT_inline check prevents us from using the > wrapping DIE for the out-of-line instance. > > The comment should really change "inlined instance" to "concrete > instance"; inlined instances are handled in > gen_inlined_subroutine_die. You are right. I suspect I got confused by the comment when looking for a way to paper over the check_die ICE removing the check causes. We end up add
Re: PR78888 - add value range info for tolower/toupper
On Thu, Aug 03, 2017 at 04:13:38PM +0530, Prathamesh Kulkarni wrote:
> > You are hardcoding here host characters and using it for target.
> > I think you need to use
> > lang_hooks.to_target_charset
> > (really no idea how it works or doesn't in LTO, but gimple-fold.c is already
> > using it among others).
> > Also, you're assuming that the 'a'-'z' and 'A'-'Z' ranges are without gaps,
> > which isn't the case for e.g. EBCDIC.  So I think you'd need to verify
> > (once?) that the target charset has this property.
> Hi Jakub,
> Thanks for the suggestions, I wasn't aware about lang_hooks.to_target_charset.
> Would following be a correct check that target has ascii charset and only then
> set range to ~[a, z] ?
>
> target_a = lang_hooks.to_target_charset ('a');
> target_z = lang_hooks.to_target_charset('z');
> if (target_a == 'a' && target_z == 'z')
> {
> // set vr to ~['a', 'z']
> }

No.  E.g. if host == target charset is EBCDIC, then the condition is
true, but it isn't a consecutive range.  A rough check that would maybe
work (at least would be false for EBCDIC and true for ASCII) could be
something like:
  if (target_z > target_a && target_z - target_a == 25)
(which is not exact, some strange charset could have say '0'-'9' block
in the middle of 'a'-'z' block, and say 'h'-'r' outside of the range),
but perhaps good enough for real-world charsets.

In any case, you should probably investigate all the locales say in
glibc or some other big locale repository whether tolower/toupper have
the expected properties there.

Plus of course, the set vr to ~[ ] needs to use target_a/target_z.

	Jakub
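[Illustrative sketch only, not part of the patch under review: an exact
(if slightly more expensive) variant of the rough endpoint check above
would walk the 26 letters once and verify that their target images are
consecutive.  Here to_target () stands in for
lang_hooks.to_target_charset.]

/* Return true iff the target images of 'a' .. 'z' form one consecutive
   run: true for ASCII, false for EBCDIC, whose lowercase letters come
   in three separate blocks.  The same check can be repeated for
   'A' .. 'Z'.  */
static bool
target_lower_letters_consecutive (void)
{
  static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
  unsigned int prev = to_target (letters[0]);
  for (int i = 1; i < 26; i++)
    {
      unsigned int cur = to_target (letters[i]);
      if (cur != prev + 1)
	return false;
      prev = cur;
    }
  return true;
}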
[nvptx, PR 81662, committed] Error out on nvptx for fpatchable-function-entry
[ was: Re: [testsuite, PR81662, committed] Skip fpatchable-function-entry
tests for nvptx ]

On 08/03/2017 09:17 AM, Tom de Vries wrote:
> Hi,
>
> fpatchable-function-entry requires nop, which nvptx does not have.  I've
> disabled the corresponding test for nvptx.

This patch errors out on nvptx for fpatchable-function-entry.

Committed.

Thanks,
- Tom

Error out on nvptx for fpatchable-function-entry

2017-08-03  Tom de Vries

	PR target/81662
	* config/nvptx/nvptx.c (nvptx_option_override): Emit sorry if
	function_entry_patch_area_size > 0.
	* gcc.target/nvptx/patchable_function_entry-default.c: New test.
---
 gcc/config/nvptx/nvptx.c                                 |  4
 .../gcc.target/nvptx/patchable_function_entry-default.c | 15 +++
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 208b115..9211d1a 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -180,6 +180,10 @@ nvptx_option_override (void)
   if (!global_options_set.x_flag_no_common)
     flag_no_common = 1;
 
+  /* The patch area requires nops, which we don't have. */
+  if (function_entry_patch_area_size > 0)
+    sorry ("not generating patch area, nops not supported");
+
   /* Assumes that it will see only hard registers.  */
   flag_var_tracking = 0;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/patchable_function_entry-default.c b/gcc/testsuite/gcc.target/nvptx/patchable_function_entry-default.c
new file mode 100644
index 000..4254456
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/patchable_function_entry-default.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
+
+extern int a;
+
+int f3 (void);
+
+int
+__attribute__((noinline))
+f3 (void)
+{
+  return 5*a;
+}
+
+/* { dg-excess-errors "sorry, unimplemented: not generating patch area, nops not supported" } */
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Yury Gribov:
> On 03.08.2017 3:06, Ximin Luo wrote:
>> Jeff Law:
>>> On 07/21/2017 10:15 AM, Ximin Luo wrote:
>>>> (Please keep me on CC, I am not subscribed)
>>>>
>>>> Proposal
>>>>
>>>> This patch series adds a new environment variable
>>>> BUILD_PATH_PREFIX_MAP.  When this is set, GCC will treat this as extra
>>>> implicit "-fdebug-prefix-map=$value" command-line arguments that
>>>> precede any explicit ones.  This makes the final binary output
>>>> reproducible, and also hides the unreproducible value (the source path
>>>> prefixes) from CFLAGS et. al. which many build tools (understandably)
>>>> embed as-is into their build output.
>>>
>>> I'd *really* avoid doing this with magic environment variables.  Make it
>>> a first class option to the compiler.  Yes, it means projects that want
>>> this behavior have to arrange to pass that flag to their compiler, but
>>> IMHO that's much preferred over environment variables.
>>>
>>> Jeff
>>>
>>
>> Hi Jeff,
>>
>> If by "first class option" you meant a command-line flag, GCC *already has*
>> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility
>> in many cases we tested.
>
> Shouldn't -fdebug-prefix-map be updated to use the same syntax as
> BUILD_PATH_PREFIX_MAP?

-fdebug-prefix-map is a CLI option and can be given multiple times, each
flag given is in the form of $from=$to where $from can't contain a '='.

BUILD_PATH_PREFIX_MAP is a single envvar that encodes a list-of-pairs of
the form $to=$from:$to=$from with some escaping for flexibility and to
support things like windows paths.  Since it's a new envvar, Ian Jackson
suggested $to=$from to emphasise the reproducible ($to) part.  I liked
the idea so I implemented it like that.

(We did a lot of bikeshedding over on the rb-general mailing list about
the exact format and this is what we settled on, I'd like to avoid
getting into that again but would nevertheless do it, if it's necessary
to get this patch accepted.)

Because -fdebug-prefix-map currently only encodes one $from=$to pair, it
would be a very disruptive and highly backward-incompatible change to
make it use the same syntax as B_P_P_M.  A slightly less disruptive but
still backward-incompatible change would be to make it encode a single
$to=$from pair, but I don't really see the advantage to doing so - what
were your thoughts on this?

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
[PATCH] Fix PR81148
This tries to fix all the UBSAN issues in the reassoc case of fold_binary by never generating any NEGATE_EXPRs from split_tree or association. This means we have to track minus_var/minus_con and appropriately transform. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. This FAILs gcc.dg/tree-ssa/reassoc-23.c because we now optimize this during FRE1 (after folding in a way that allows CSE to catch this). Slightly uglifying the testcase exposes a missed optimization in reassoc so I'm going to fix that separately. Richard. 2017-08-03 Richard Biener PR middle-end/81148 * fold-const.c (split_tree): Add minus_var and minus_con arguments, remove unused loc arg. Never generate NEGATE_EXPRs here but always use minus_*. (associate_trees): Assert we never associate with MINUS_EXPR and NULL first operand. Do not recurse for PLUS_EXPR operands when associating as MINUS_EXPR either. (fold_binary_loc): Track minus_var and minus_con. * c-c++-common/ubsan/pr81148.c: New testcase. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 250818) +++ gcc/fold-const.c(working copy) @@ -108,8 +108,6 @@ enum comparison_code { static bool negate_expr_p (tree); static tree negate_expr (tree); -static tree split_tree (location_t, tree, tree, enum tree_code, - tree *, tree *, tree *, int); static tree associate_trees (location_t, tree, tree, enum tree_code, tree); static enum comparison_code comparison_to_compcode (enum tree_code); static enum tree_code compcode_to_comparison (enum comparison_code); @@ -775,12 +773,14 @@ negate_expr (tree t) same type as IN, but they will have the same signedness and mode. */ static tree -split_tree (location_t loc, tree in, tree type, enum tree_code code, - tree *conp, tree *litp, tree *minus_litp, int negate_p) +split_tree (tree in, tree type, enum tree_code code, + tree *minus_varp, tree *conp, tree *minus_conp, + tree *litp, tree *minus_litp, int negate_p) { tree var = 0; - + *minus_varp = 0; *conp = 0; + *minus_conp = 0; *litp = 0; *minus_litp = 0; @@ -834,27 +834,19 @@ split_tree (location_t loc, tree in, tre if (neg_litp_p) *minus_litp = *litp, *litp = 0; if (neg_conp_p && *conp) - { - /* Convert to TYPE before negating. */ - *conp = fold_convert_loc (loc, type, *conp); - *conp = negate_expr (*conp); - } + *minus_conp = *conp, *conp = 0; if (neg_var_p && var) - { - /* Convert to TYPE before negating. */ - var = fold_convert_loc (loc, type, var); - var = negate_expr (var); - } + *minus_varp = var, var = 0; } else if (TREE_CONSTANT (in)) *conp = in; else if (TREE_CODE (in) == BIT_NOT_EXPR && code == PLUS_EXPR) { - /* -X - 1 is folded to ~X, undo that here. Do _not_ do this - when IN is constant. Convert to TYPE before negating. */ - *minus_litp = build_one_cst (type); - var = negate_expr (fold_convert_loc (loc, type, TREE_OPERAND (in, 0))); + /* -1 - X is folded to ~X, undo that here. Do _not_ do this + when IN is constant. */ + *litp = build_minus_one_cst (type); + *minus_varp = TREE_OPERAND (in, 0); } else var = in; @@ -866,17 +858,13 @@ split_tree (location_t loc, tree in, tre else if (*minus_litp) *litp = *minus_litp, *minus_litp = 0; if (*conp) - { - /* Convert to TYPE before negating. */ - *conp = fold_convert_loc (loc, type, *conp); - *conp = negate_expr (*conp); - } + *minus_conp = *conp, *conp = 0; + else if (*minus_conp) + *conp = *minus_conp, *minus_conp = 0; if (var) - { - /* Convert to TYPE before negating. 
*/ - var = fold_convert_loc (loc, type, var); - var = negate_expr (var); - } + *minus_varp = var, var = 0; + else if (*minus_varp) + var = *minus_varp, *minus_varp = 0; } if (*litp @@ -898,7 +886,10 @@ static tree associate_trees (location_t loc, tree t1, tree t2, enum tree_code code, tree type) { if (t1 == 0) -return t2; +{ + gcc_assert (t2 == 0 || code != MINUS_EXPR); + return t2; +} else if (t2 == 0) return t1; @@ -906,6 +897,7 @@ associate_trees (location_t loc, tree t1 try to fold this since we will have infinite recursion. But do deal with any NEGATE_EXPRs. */ if (TREE_CODE (t1) == code || TREE_CODE (t2) == code + || TREE_CODE (t1) == PLUS_EXPR || TREE_CODE (t2) == PLUS_EXPR || TREE_CODE (t1) == MINUS_EXPR || TREE_CODE (t2) == MINUS_EXPR) { if (code == PLUS_EXPR) @@ -9560,8 +9552,8 @@ fold_binary_loc (location_t loc, if ((! FLOAT_TYPE_P (type) || flag_associative_math)
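For readers without the PR open, a hedged sketch of the class of problem being addressed (an illustration only, not the committed c-c++-common/ubsan/pr81148.c testcase): when fold_binary reassociates an additive chain by materialising NEGATE_EXPRs, the introduced negation can overflow at run time even though the source expression as written never does, and -fsanitize=undefined then reports that compiler-introduced overflow. Recording operands on the minus_var/minus_con/minus_lit side keeps the subtraction explicit instead.

  /* Illustration only; INT_MIN has no representable negation, so any
     folding that synthesises -INT_MIN (or -b for a signed b) risks
     introducing undefined behaviour the user never wrote.  */
  #include <limits.h>

  int
  foo (int a, int b)
  {
    return a - (b + INT_MIN);
  }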
[PATCH] Fix gcc.dg/tree-ssa/reassoc-23.c
The following makes reassoc also break up subtracts for values used on the positive side of a MINUS_EXPR. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2017-08-03 Richard Biener * tree-ssa-reassoc.c (should_break_up_subtract): Also break up if the use is in USE - X. * gcc.dg/tree-ssa/reassoc-23.c: Adjust to fool early folding and CSE. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 250818) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -4718,7 +4718,9 @@ should_break_up_subtract (gimple *stmt) && (immusestmt = get_single_immediate_use (lhs)) && is_gimple_assign (immusestmt) && (gimple_assign_rhs_code (immusestmt) == PLUS_EXPR - || gimple_assign_rhs_code (immusestmt) == MULT_EXPR)) + || (gimple_assign_rhs_code (immusestmt) == MINUS_EXPR + && gimple_assign_rhs1 (immusestmt) == lhs) + || gimple_assign_rhs_code (immusestmt) == MULT_EXPR)) return true; return false; } Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c === --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c (revision 250818) +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c (working copy) @@ -6,9 +6,10 @@ foo(unsigned int a, unsigned int b, unsi unsigned int e, unsigned int f, unsigned int g, unsigned int h) { /* Should be transformed into e = 20 */ - unsigned int i = (a + 9) + (c + 8); - unsigned int j = (-c + 1) + (-a + 2); - + unsigned int i = (a + 9); + unsigned int j = (-c + 1); + i += (c + 8); + j += (-a + 2); e = i + j; return e; }
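As a hedged illustration of the new case (a hypothetical example, not part of the testsuite): the result of the subtract below has a single use, and that use is the first, positive operand of another MINUS_EXPR, which the patched should_break_up_subtract now accepts, so reassoc may rewrite t = a - b as t = a + (-b) and reassociate the whole chain.

  unsigned int
  foo (unsigned int a, unsigned int b, unsigned int c)
  {
    unsigned int t = a - b;   /* single use of t ...  */
    return t - c;             /* ... as rhs1 of a MINUS_EXPR */
  }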
[PATCH] Avoid messing up DECL_ABSTRACT_ORIGIN in lto_symtab_prevail_decl
$subject because if we put anything decl-ish there (set_decl_origin_self...) then we run into lto_fixup_prevailing_decls ICEing as the state is now no longer the same as during the time we called mentions_vars_p. Bootstrap / regtest running on x86_64-unknown-linux-gnu. Richard. 2017-08-03 Richard Biener lto/ * lto-symtab.h (lto_symtab_prevail_decl): Do not use DECL_ABSTRACT_ORIGIN as flag we can end up using that. Instead use DECL_LANG_FLAG_0. (lto_symtab_prevail_decl): Likewise. Index: gcc/lto/lto-symtab.h === --- gcc/lto/lto-symtab.h(revision 250817) +++ gcc/lto/lto-symtab.h(working copy) @@ -23,7 +23,7 @@ extern tree lto_symtab_prevailing_decl ( extern tree lto_symtab_prevailing_virtual_decl (tree decl); /* Mark DECL to be previailed by PREVAILING. - Use DECL_ABSTRACT_ORIGIN and DECL_CHAIN as special markers; those do not + Use DECL_LANG_FLAG_0 and DECL_CHAIN as special markers; those do not disturb debug_tree and diagnostics. We are safe to modify them as we wish, because the declarations disappear from the IL after the merging. */ @@ -31,10 +31,10 @@ extern tree lto_symtab_prevailing_virtua inline void lto_symtab_prevail_decl (tree prevailing, tree decl) { - gcc_checking_assert (DECL_ABSTRACT_ORIGIN (decl) != error_mark_node); + gcc_checking_assert (! DECL_LANG_FLAG_0 (decl)); gcc_assert (TREE_PUBLIC (decl) || DECL_EXTERNAL (decl)); DECL_CHAIN (decl) = prevailing; - DECL_ABSTRACT_ORIGIN (decl) = error_mark_node; + DECL_LANG_FLAG_0 (decl) = 1; } /* Given the decl DECL, return the prevailing decl with the same name. */ @@ -42,7 +42,7 @@ lto_symtab_prevail_decl (tree prevailing inline tree lto_symtab_prevailing_decl (tree decl) { - if (DECL_ABSTRACT_ORIGIN (decl) == error_mark_node) + if (DECL_LANG_FLAG_0 (decl)) return DECL_CHAIN (decl); else {
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Wed, Aug 2, 2017 at 1:51 PM, Richard Biener wrote: > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: >> Hello. >> >> After some discussions with Honza, I've decided to convert current code in >> stmt.c that >> is responsible for switch expansion. More precisely, I would like to convert >> the code >> to expand gswitch statements on tree level. Currently the newly created pass >> is executed >> at the end of tree optimizations. Hah, something I promissed myself (and others) to do years ago! Thanks thanks thanks! :-) >> My plan for future is to inspire in [1] and come up with some more >> sophisticated switch >> expansions. For that I've been working on a paper where I'll summarize >> statistics based >> on what I've collected in openSUSE distribution with specially instrumented >> GCC. If I'll be >> happy I can also fit in to schedule of this year's Cauldron with a talk. Sayle's paper is a good starting point. Also interesting: >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Thoughts? > > First of all thanks. > > I think part of switch expansion moved to switch-conversion some time ago > (emit_case_bit_tests). So maybe the full lowering should be in at least > the same source file and it should maybe applied earlier for a subset of > cases (very low number of cases for example). > > Did you base the code on the RTL expansion code or did you re-write it from > scratch? > > No further comments sofar. > > Richard. > >> Martin >> >> [1] https://www.nextmovesoftware.com/technology/SwitchOptimization.pdf >> >> gcc/ChangeLog: >> >> 2017-07-31 Martin Liska >> >> * Makefile.in: Add gimple-switch-low.o. >> * passes.def: Include pass_lower_switch. >> * stmt.c (dump_case_nodes): Remove and move to >> gimple-switch-low.c. >> (case_values_threshold): Likewise. >> (expand_switch_as_decision_tree_p): Likewise. >> (emit_case_decision_tree): Likewise. >> (expand_case): Likewise. >> (balance_case_nodes): Likewise. >> (node_has_low_bound): Likewise. >> (node_has_high_bound): Likewise. >> (node_is_bounded): Likewise. >> (emit_case_nodes): Likewise. >> * timevar.def: Add TV_TREE_SWITCH_LOWERING. >> * tree-pass.h: Add make_pass_lower_switch. >> * gimple-switch-low.c: New file. >> >> gcc/testsuite/ChangeLog: >> >> 2017-07-31 Martin Liska >> >> * gcc.dg/tree-prof/update-loopch.c: Scan patterns in >> switchlower. >> * gcc.dg/tree-ssa/vrp104.c: Likewise. >> --- >> gcc/Makefile.in|1 + >> gcc/gimple-switch-low.c| 1226 >> >> gcc/passes.def |1 + >> gcc/stmt.c | 861 - >> gcc/testsuite/gcc.dg/tree-prof/update-loopch.c | 10 +- >> gcc/testsuite/gcc.dg/tree-ssa/vrp104.c |2 +- >> gcc/timevar.def|1 + >> gcc/tree-pass.h|1 + >> 8 files changed, 1236 insertions(+), 867 deletions(-) >> create mode 100644 gcc/gimple-switch-low.c >> >>
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov wrote: > On 02.08.2017 23:04, H.J. Lu wrote: >> >> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >> wrote: >>> >>> On 02.08.2017 21:48, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov wrote: > > > On 02.08.2017 20:02, Jeff Law wrote: >> >> >> >> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>> >>> >>> >>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: On 07/17/2017 01:23 AM, Yuri Gribov wrote: > > > > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html Is his stuff used for exception handling? If so, doesn't that make the performance a significant concern (ie that msync call?) >>> >>> >>> >>> >>> For the performance issue, I wonder if it wouldn't be better to just >>> compile the unwinder twice, basically include everything needed for >>> the >>> verification backtrace in a single *.c file with special defines, and >>> not export anything from that *.o file except the single entrypoint. >>> That would also handle the ABI problems. >> >> >> >> Yea. >> >> Given that the vast majority of the time the addresses are valid, I >> wonder if we could take advantage of that property to keep the >> overhead >> lower in the most common cases. >> >>> For the concurrent calls of other threads unmapping stacks of running >>> threads (that is where most of the verification goes against), I'm >>> afraid >>> that is simply non-solveable, even if you don't cache anything, it >>> will >>> still be racy. >> >> >> >> Absolutely -- I think solving this stuff 100% reliably without races >> isn't possible. I think the question is can we make this >> significantly >> better. > > > > > All, > > First of all, thanks for the feedback. This is indeed not a 100% > robust > solution, just something to allow tools like Asan to produce more sane > backtraces on crash (currently when stack is corrupted they would crash > in > the unwinder without printing at least partial backtrace). > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is corrupted: https://sourceware.org/bugzilla/show_bug.cgi?id=21752 https://sourceware.org/bugzilla/show_bug.cgi?id=12189 There is very little point to unwind stack when stack is corrupted. >>> >>> >>> >>> I'd argue that situation with __stack_chk_fail is very special. It's >>> used >>> in production code where you'd want to halt as soon as corruption is >>> detected to prevent further damage so even printing message is an >>> additional >>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>> to >>> print as many survived frames as possible. >>> >> >> But stack unwinder in libgcc is primarily for production use, not for >> debugging. > > > I can see it used in different projects to print backtraces on error (bind9, > SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and > seems to be meant primarily for debugging (at least many projects use it for > this purpose: > https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for > libbacktrace which, as its README mentions, is meant to "print a detailed > backtrace when an error occurs or to gather detailed profiling information". But it shouldn't be performed on a corrupted stack. When a stack is corrupted, there is just no way to reliably unwind it. It isn't a problem for a debugger which is designed to analyze corrupted program. 
> Validation is also performed by libunwind (at least in some cases) which > libgcc unwinder mimics. > > >> Use it to unwind corrupted stack isn't appropriate. Santizer can try >> similar >> similar to valgrind to invoke a debugger to unwind corrupted stack. > > > -Y -- H.J.
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: > On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov > wrote: >> On 02.08.2017 23:04, H.J. Lu wrote: >>> >>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>> wrote: On 02.08.2017 21:48, H.J. Lu wrote: > > > On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov > > wrote: >> >> >> On 02.08.2017 20:02, Jeff Law wrote: >>> >>> >>> >>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: > > > > On 07/17/2017 01:23 AM, Yuri Gribov wrote: >> >> >> >> I've rebased the previous patch to trunk per Andrew's suggestion. >> Original patch description/motivation/questions are in >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html > > > > Is his stuff used for exception handling? If so, doesn't that make > the > performance a significant concern (ie that msync call?) For the performance issue, I wonder if it wouldn't be better to just compile the unwinder twice, basically include everything needed for the verification backtrace in a single *.c file with special defines, and not export anything from that *.o file except the single entrypoint. That would also handle the ABI problems. >>> >>> >>> >>> Yea. >>> >>> Given that the vast majority of the time the addresses are valid, I >>> wonder if we could take advantage of that property to keep the >>> overhead >>> lower in the most common cases. >>> For the concurrent calls of other threads unmapping stacks of running threads (that is where most of the verification goes against), I'm afraid that is simply non-solveable, even if you don't cache anything, it will still be racy. >>> >>> >>> >>> Absolutely -- I think solving this stuff 100% reliably without races >>> isn't possible. I think the question is can we make this >>> significantly >>> better. >> >> >> >> >> All, >> >> First of all, thanks for the feedback. This is indeed not a 100% >> robust >> solution, just something to allow tools like Asan to produce more sane >> backtraces on crash (currently when stack is corrupted they would crash >> in >> the unwinder without printing at least partial backtrace). >> > > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is > corrupted: > > https://sourceware.org/bugzilla/show_bug.cgi?id=21752 > https://sourceware.org/bugzilla/show_bug.cgi?id=12189 > > There is very little point to unwind stack when stack is corrupted. I'd argue that situation with __stack_chk_fail is very special. It's used in production code where you'd want to halt as soon as corruption is detected to prevent further damage so even printing message is an additional risk. Debugging tools (e.g. sanitizers) are different and would prefer to print as many survived frames as possible. >>> >>> But stack unwinder in libgcc is primarily for production use, not for >>> debugging. >> >> >> I can see it used in different projects to print backtraces on error (bind9, >> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and >> seems to be meant primarily for debugging (at least many projects use it for >> this purpose: >> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for >> libbacktrace which, as its README mentions, is meant to "print a detailed >> backtrace when an error occurs or to gather detailed profiling information". > > But it shouldn't be performed on a corrupted stack. When a stack is > corrupted, there is just no way to reliably unwind it. Why not? Attached patch shows that it's possible (up to a point of corruption of course). 
> It isn't a problem >> for a debugger which is designed to analyze corrupted program. Yes but forcing all applications that would like to print a helpful message on stack corruption to employ gdb sounds less efficient than providing fail-safe APIs in the standard library... >> Validation is also performed by libunwind (at least in some cases) which >> libgcc unwinder mimics. >> >>> Use it to unwind corrupted stack isn't appropriate. Santizer can try >>> similar >>> similar to valgrind to invoke a debugger to unwind corrupted stack. >> >> -Y > > -- > H.J.
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: > Hello. > > After some discussions with Honza, I've decided to convert current code in > stmt.c that > is responsible for switch expansion. More precisely, I would like to convert > the code > to expand gswitch statements on tree level. Currently the newly created pass > is executed > at the end of tree optimizations. Hah, something I promised myself (and others) to do years ago! Thanks thanks thanks! :-) The end of the gimple optimizations seems late for the lowering. Before there were gimple optimizations, switch lowering was part of the first compiler pass (to generate RTL from the front end) *before* all code transformation passes ("optimizations" ;-). Because the lowering of switch statements was rewritten as a gimple lowering pass, it now runs *after* optimizations. But to be honest, I can't think of any optimization opportunities exposed by lowering earlier than at the end of gimple optimizations. Years ago there was some RTL jump threading still done after lowering of small switch statements, but I can't find the related PRs anymore. > My plan for future is to inspire in [1] and come up with some more > sophisticated switch > expansions. For that I've been working on a paper where I'll summarize > statistics based > on what I've collected in openSUSE distribution with specially instrumented > GCC. If I'll be > happy I can also fit in to schedule of this year's Cauldron with a talk. Sayle's paper is a good starting point. Also interesting: http://llvm.org/devmtg/2017-02-04/Efficient-clustering-of-case-statements-for-indirect-branch-prediction.pdf About adjusting the size of jump tables to the capabilities of the CPU branch predictor. Mixed results but something to consider. Kannan & Proebsting, "CORRECTION TO ‘PRODUCING GOOD CODE FOR THE CASE STATEMENT’" About finding "clusters" of given density within the target values of the switch statement. The clustering algorithm as presented is N^2 in the number of case labels, but the idea is interesting and a simplified, cheaper approach may be possible. This is already used in LLVM, it seems. The real challenge will be to figure out how to pick from all these different approaches the right ones to lower a single switch statement. Ciao! Steven
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Thu, Aug 3, 2017 at 2:52 PM, Steven Bosscher wrote: > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: >> Hello. >> >> After some discussions with Honza, I've decided to convert current code in >> stmt.c that >> is responsible for switch expansion. More precisely, I would like to convert >> the code >> to expand gswitch statements on tree level. Currently the newly created pass >> is executed >> at the end of tree optimizations. > > Hah, something I promissed myself (and others) to do years ago! Thanks > thanks thanks! :-) > > The end of the gimple optimizations is seems late for the lowering. > Before there were gimple optimizations, switch lowering was part of > the first compiler pass (to generate RTL from the front end) *before* > all code transformation passes ("optimizations" ;-). Because the > lowering of switch statements was rewritten as a gimple lowering pass, > it now runs *after* optimizations. But to be honest, I can't think of > any optimization opportunities exposed by lowering earlier than at the > end of gimple optimizations. Years ago there was some RTL jump > threading still done after lowering of small switch statements, but I > can't find the related PRs anymore. I think the main reason for not doing it early is the benefit is small (unless it is GIMPLE optimizations triggering) and we can't get rid of switches completely because we eventually have to support casei RTL expansion. (and no, computed goto with an array of label addresses at GIMPLE is really too ugly ;)) Richard. > >> My plan for future is to inspire in [1] and come up with some more >> sophisticated switch >> expansions. For that I've been working on a paper where I'll summarize >> statistics based >> on what I've collected in openSUSE distribution with specially instrumented >> GCC. If I'll be >> happy I can also fit in to schedule of this year's Cauldron with a talk. > > Sayle's paper is a good starting point. Also interesting: > > http://llvm.org/devmtg/2017-02-04/Efficient-clustering-of-case-statements-for-indirect-branch-prediction.pdf > About adjusting the size of jump tables to the capabilities of the CPU > branch predictor. Mixed results but something to consider. > > Kannan & Proebsting, "CORRECTION TO ‘PRODUCING GOOD CODE FOR THE CASE > STATEMENT’" > About finding "clusers" of given density within the target values of > the switch statement. The clustering algorithm as presented is N^2 in > the number of case labels, but the idea is interesting and a > simplified, cheaper approach may be possible. This is already used in > LLVM, it seems. > > The real challenge will be to figure out how to pick from all these > different approaches the right ones to lower a single switch > statement. > > Ciao! > Steven
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to just > compile the unwinder twice, basically include everything needed for > the > verification backtrace in a single *.c file with special defines, and > not export anything from that *.o file except the single entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more sane >>> backtraces on crash (currently when stack is corrupted they would crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and >>> seems to be meant primarily for debugging (at least many projects use it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for >>> libbacktrace which, as its README mentions, is meant to "print a detailed >>> backtrace when an error occurs or to gather detailed profiling information". >> >> But it shouldn't be performed on a corrupted stack. When a stack is >> corrupted, there is just no way to reliably unwind it. > > Why not? 
Attached patch shows that it's possible (up to a point of > corruption of course). > >> It isn't a problem >> for a debugger which is designed to analyze corrupted program. >> >> Yes but forcing all applications that would like to print helpful >> message on stack corruption to employ gdb sounds less efficient that >> providing fail-safe APIs in standard library... ^^^ There is no such thing as fail-safe on a corrupted stack. A bad, but valid, address can be put on a corrupted stack. When you unwind it, the unwinder may not crash, but it can give you a bogus stack backtrace. -- H.J.
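For readers following the thread, a rough sketch of the kind of address validation being debated (this is not the posted patch; the helper name and structure are made up): probe whether a pointer lies in currently mapped memory before the unwinder dereferences it. As noted earlier in the thread, this is both costly (a system call per probe) and inherently racy, since another thread can unmap the page between the check and the use; it also cannot tell a bad-but-mapped address from a good one, which is H.J.'s point above.

  #include <stdbool.h>
  #include <stdint.h>
  #include <unistd.h>
  #include <sys/mman.h>

  /* Hypothetical helper: return true if ADDR falls in a mapped page.
     msync fails with ENOMEM when the given range is not mapped.  */
  static bool
  address_is_mapped (const void *addr)
  {
    uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
    void *page = (void *) ((uintptr_t) addr & ~(pagesize - 1));
    return msync (page, pagesize, MS_ASYNC) == 0;
  }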
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
> On Thu, Aug 3, 2017 at 2:52 PM, Steven Bosscher wrote: > > On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: > >> Hello. > >> > >> After some discussions with Honza, I've decided to convert current code in > >> stmt.c that > >> is responsible for switch expansion. More precisely, I would like to > >> convert the code > >> to expand gswitch statements on tree level. Currently the newly created > >> pass is executed > >> at the end of tree optimizations. > > > > Hah, something I promissed myself (and others) to do years ago! Thanks > > thanks thanks! :-) > > > > The end of the gimple optimizations is seems late for the lowering. > > Before there were gimple optimizations, switch lowering was part of > > the first compiler pass (to generate RTL from the front end) *before* > > all code transformation passes ("optimizations" ;-). Because the > > lowering of switch statements was rewritten as a gimple lowering pass, > > it now runs *after* optimizations. But to be honest, I can't think of > > any optimization opportunities exposed by lowering earlier than at the > > end of gimple optimizations. Years ago there was some RTL jump > > threading still done after lowering of small switch statements, but I > > can't find the related PRs anymore. > > I think the main reason for not doing it early is the benefit is small > (unless it is GIMPLE optimizations triggering) and we can't get rid of > switches completely because we eventually have to support casei RTL expansion. > (and no, computed goto with an array of label addresses at GIMPLE is really > too ugly ;)) Also later in optimization you have better idea of value ranges and you can shape the decision tree better for given context. In theory at least ;) One thing that would benefit from earlier expansion is code size/runtime estimation for inlining. But perhaps we could just run the algorithm to see how hard the switch is when we decide on its size (what is there right now is bit lame) Honza > > Richard.
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On Thu, Aug 3, 2017 at 2:56 PM, Richard Biener wrote: > I think the main reason for not doing it early is the benefit is small > (unless it is GIMPLE optimizations triggering) Agree. > and we can't get rid of > switches completely because we eventually have to support casei RTL expansion. > (and no, computed goto with an array of label addresses at GIMPLE is really > too ugly ;)) What I would have done, is lower all gswitch statements that are to be lowered to something other than a tablejump. So by the time you get to RTL expansion, all remaining gswitch statements would be tablejump or casesi. Ciao! Steven
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
> On Aug 2, 2017, at 6:17 PM, David Miller wrote: > > From: Qing Zhao > Date: Wed, 2 Aug 2017 14:41:51 -0500 > >> so, could you please specify what kind of side effects will have >> when set STRICT_ALIGNMENT to true on TARGET_MISALIGN? > > Why don't you read the code rather than just relying upon what > high level description is given by the documentation instead? I read the code before making the change; that's why I'm asking you to specify clearly the bad side effects that I haven't considered yet. Thanks. Qing > > Thanks.
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: > On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov > wrote: >> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >>> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >>> wrote: On 02.08.2017 23:04, H.J. Lu wrote: > > On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov > wrote: >> >> On 02.08.2017 21:48, H.J. Lu wrote: >>> >>> >>> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >>> >>> wrote: On 02.08.2017 20:02, Jeff Law wrote: > > > > On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >> >> >> >> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >>> >>> >>> >>> On 07/17/2017 01:23 AM, Yuri Gribov wrote: I've rebased the previous patch to trunk per Andrew's suggestion. Original patch description/motivation/questions are in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >>> >>> >>> >>> Is his stuff used for exception handling? If so, doesn't that make >>> the >>> performance a significant concern (ie that msync call?) >> >> >> >> >> For the performance issue, I wonder if it wouldn't be better to just >> compile the unwinder twice, basically include everything needed for >> the >> verification backtrace in a single *.c file with special defines, and >> not export anything from that *.o file except the single entrypoint. >> That would also handle the ABI problems. > > > > Yea. > > Given that the vast majority of the time the addresses are valid, I > wonder if we could take advantage of that property to keep the > overhead > lower in the most common cases. > >> For the concurrent calls of other threads unmapping stacks of running >> threads (that is where most of the verification goes against), I'm >> afraid >> that is simply non-solveable, even if you don't cache anything, it >> will >> still be racy. > > > > Absolutely -- I think solving this stuff 100% reliably without races > isn't possible. I think the question is can we make this > significantly > better. All, First of all, thanks for the feedback. This is indeed not a 100% robust solution, just something to allow tools like Asan to produce more sane backtraces on crash (currently when stack is corrupted they would crash in the unwinder without printing at least partial backtrace). >>> >>> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >>> corrupted: >>> >>> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >>> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >>> >>> There is very little point to unwind stack when stack is corrupted. >> >> >> >> I'd argue that situation with __stack_chk_fail is very special. It's >> used >> in production code where you'd want to halt as soon as corruption is >> detected to prevent further damage so even printing message is an >> additional >> risk. Debugging tools (e.g. sanitizers) are different and would prefer >> to >> print as many survived frames as possible. >> > > But stack unwinder in libgcc is primarily for production use, not for > debugging. I can see it used in different projects to print backtraces on error (bind9, SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and seems to be meant primarily for debugging (at least many projects use it for this purpose: https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for libbacktrace which, as its README mentions, is meant to "print a detailed backtrace when an error occurs or to gather detailed profiling information". >>> >>> But it shouldn't be performed on a corrupted stack. 
When a stack is >>> corrupted, there is just no way to reliably unwind it. >> >> Why not? Attached patch shows that it's possible (up to a point of >> corruption of course). >> >>> It isn't a problem >>> for a debugger which is designed to analyze corrupted program. >> >> Yes but forcing all applications that would like to print helpful >> message on stack corruption to employ gdb sounds less efficient that >> providing fail-safe APIs in standard library... > ^^^ There is no such a thing of fail-safe on > corrupted > stack. A bad, but valid address, can be put on corrupted stack. When > you unwind it, the unwinder may not cr
Re: [PATCH 6/6] qsort comparator consistency checking
On Wed, 2 Aug 2017, Jeff Law wrote: > >> Well, there's not *that* many qsort calls. My quick grep shows 94 and > >> its a very mechanical change. Then a poison in system.h to ensure raw > >> calls to qsort don't return. Note that poisoning qsort outlaws vec::qsort too; it would need to be mass-renamed as well (to vec::sort, presumably). It seems there are 83 or more calls to vec::qsort. > > Any suggestion for the non-poisoned replacement? xqsort? gcc_qsort? > qsort_chk/qsort_nochk for checked and non-checked? I believe qsort_chk isn't appropriate: checking is not explicitly a part of the interface, and we never use _chk in potentially-checking tree and RTL accessors either. Thanks. Alexander
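A minimal sketch of the scheme being discussed (the wrapper name gcc_qsort and its placement are part of the proposal here, not an existing interface): route all sorts through one wrapper, which a checking build can extend with comparator-consistency verification, then poison the raw identifier so new direct calls fail to compile.

  #include <stdlib.h>

  /* Hypothetical wrapper: a checking build could verify here that CMP
     behaves consistently (anti-symmetric, transitive) before sorting.  */
  void
  gcc_qsort (void *base, size_t nmemb, size_t size,
             int (*cmp) (const void *, const void *))
  {
    qsort (base, nmemb, size, cmp);
  }

  /* In system.h, after the last legitimate use of the raw call:
       #pragma GCC poison qsort
     (vec::qsort would have to be renamed first, e.g. to vec::sort.)  */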
Re: PR78888 - add value range info for tolower/toupper
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > In any case, you should probably investigate all the locales say in glibc or > some other big locale repository whether tolower/toupper have the expected > properties there. They don't. In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a multibyte character and toupper can only return single-byte characters. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, gcc/Makefile.in] Fix typo on ref to multi_dir variable in install-mkheaders
Hello, The attached patch fixes a typo in a Makefile rule, which we noticed while working on fixincludes for a specific target. Bootstrapped and regression tested fine on x86_64-linux. OK to commit ? Thanks much in advance for your feedback, With Kind Regards, Olivier 2017-08-04 Douglas Rupp * Makefile.in (install-mkheaders): Fix typo, where the multi_dir variable was referenced as multidir in command. fixinc-multidir.diff Description: Binary data
Re: PR78888 - add value range info for tolower/toupper
On Thu, Aug 03, 2017 at 02:38:35PM +, Joseph Myers wrote: > On Thu, 3 Aug 2017, Jakub Jelinek wrote: > > > In any case, you should probably investigate all the locales say in glibc or > > some other big locale repository whether tolower/toupper have the expected > > properties there. > > They don't. In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the > correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a > multibyte character and toupper can only return single-byte characters. Indeed,

  #include <locale.h>
  #include <ctype.h>

  int
  main ()
  {
    setlocale (LC_ALL, "");
    int i;
    for (i = -1000; i < 1000; i++)
      if (tolower (i) >= 'A' && tolower (i) <= 'Z')
        __builtin_abort ();
      else if (toupper (i) >= 'a' && toupper (i) <= 'z')
        __builtin_abort ();
    return 0;
  }

fails for LC_ALL=tr_TR.UTF-8, because tolower ('I') is 'I'. Not to mention that the result is unspecified if the functions are called with a value outside of the range of unsigned char or EOF. Therefore, this optimization is invalid. Jakub
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 7:17 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >> wrote: >>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov wrote: > On 02.08.2017 23:04, H.J. Lu wrote: >> >> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >> wrote: >>> >>> On 02.08.2017 21:48, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov wrote: > > > On 02.08.2017 20:02, Jeff Law wrote: >> >> >> >> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>> >>> >>> >>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: On 07/17/2017 01:23 AM, Yuri Gribov wrote: > > > > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html Is his stuff used for exception handling? If so, doesn't that make the performance a significant concern (ie that msync call?) >>> >>> >>> >>> >>> For the performance issue, I wonder if it wouldn't be better to just >>> compile the unwinder twice, basically include everything needed for >>> the >>> verification backtrace in a single *.c file with special defines, >>> and >>> not export anything from that *.o file except the single entrypoint. >>> That would also handle the ABI problems. >> >> >> >> Yea. >> >> Given that the vast majority of the time the addresses are valid, I >> wonder if we could take advantage of that property to keep the >> overhead >> lower in the most common cases. >> >>> For the concurrent calls of other threads unmapping stacks of >>> running >>> threads (that is where most of the verification goes against), I'm >>> afraid >>> that is simply non-solveable, even if you don't cache anything, it >>> will >>> still be racy. >> >> >> >> Absolutely -- I think solving this stuff 100% reliably without races >> isn't possible. I think the question is can we make this >> significantly >> better. > > > > > All, > > First of all, thanks for the feedback. This is indeed not a 100% > robust > solution, just something to allow tools like Asan to produce more sane > backtraces on crash (currently when stack is corrupted they would > crash > in > the unwinder without printing at least partial backtrace). > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is corrupted: https://sourceware.org/bugzilla/show_bug.cgi?id=21752 https://sourceware.org/bugzilla/show_bug.cgi?id=12189 There is very little point to unwind stack when stack is corrupted. >>> >>> >>> >>> I'd argue that situation with __stack_chk_fail is very special. It's >>> used >>> in production code where you'd want to halt as soon as corruption is >>> detected to prevent further damage so even printing message is an >>> additional >>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>> to >>> print as many survived frames as possible. >>> >> >> But stack unwinder in libgcc is primarily for production use, not for >> debugging. > > > I can see it used in different projects to print backtraces on error > (bind9, > SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder > and > seems to be meant primarily for debugging (at least many projects use it > for > this purpose: > https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same > for > libbacktrace which, as its README mentions, is meant to "print a detailed > backtrace when an error occurs or to gather detailed profiling > information". 
But it shouldn't be performed on a corrupted stack. When a stack is corrupted, there is just no way to reliably unwind it. >>> >>> Why not? Attached patch shows that it's possible (up to a point of >>> corruption of course). >>> It isn't a problem for a debugger which is designed to analyze corrupted program. >>> >>> Yes but forcing all applications that would like to print helpful >>> message on stack corruption to employ gdb sounds less efficient that >>> providing f
Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
Hi Mike, On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote: > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > I think calling this with the rtx elementN args makes this only more > > complicated (the function comment doesn't say what they are or what > > NULL means, btw). You didn't handle the first part of this as far as I see? It's the big complicating issue here. > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > + use for extracting the 64-bit double word from ARG1. > + > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > + use for extracting the 64-bit double word from ARG2. > + > + The element number is based on the user element ordering, set by the > + endianess and by the -maltivec={le,be} options. */ ("endianness", two n's). I don't like using NULL as a magic value at all; it does not simplify this interface, it complicates it instead. Can you move the "which half is high" decision to the callers? Segher
[PATCH,AIX] 2nd part of support for DWARF debug sections in XCOFF.
Description: * This patch provides the 2nd part of the support for DWARF debug sections in XCOFF. ImportedSymbols() and ImportedLibraries() functions are not implemented. Tests: * AIX: Build: SUCCESS - build made by means of gmake in trunk. - patch generated by: cd gcc-svn-trunk/ svn diff libgo ChangeLog: * libgo/Makefile.am * libgo/Makefile.in * libgo/go/cmd/cgo/gcc.go * libgo/go/cmd/cgo/out.go * libgo/go/debug/dwarf/open.go * libgo/go/debug/xcoff/file.go * libgo/go/debug/xcoff/xcoff.go * libgo/go/go/build/deps_test.go Cordialement, Tony Reix Bull - ATOS IBM Coop Architect & Technical Leader Office : +33 (0) 4 76 29 72 67 1 rue de Provence - 38432 Échirolles - France www.atos.netIndex: libgo/Makefile.am === --- ./libgo/Makefile.am (revision 250630) +++ ./libgo/Makefile.am (working copy) @@ -215,7 +215,8 @@ toolexeclibgodebug_DATA = \ debug/gosym.gox \ debug/macho.gox \ debug/pe.gox \ - debug/plan9obj.gox + debug/plan9obj.gox \ + debug/xcoff.gox toolexeclibgoencodingdir = $(toolexeclibgodir)/encoding @@ -698,6 +699,7 @@ PACKAGES = \ debug/macho \ debug/pe \ debug/plan9obj \ + debug/xcoff \ encoding \ encoding/ascii85 \ encoding/asn1 \ @@ -1209,6 +1211,7 @@ TEST_PACKAGES = \ debug/macho/check \ debug/pe/check \ debug/plan9obj/check \ + debug/xcoff/check \ encoding/ascii85/check \ encoding/asn1/check \ encoding/base32/check \ Index: libgo/Makefile.in === --- ./libgo/Makefile.in (revision 250630) +++ ./libgo/Makefile.in (working copy) @@ -608,7 +608,8 @@ toolexeclibgodebug_DATA = \ debug/gosym.gox \ debug/macho.gox \ debug/pe.gox \ - debug/plan9obj.gox + debug/plan9obj.gox \ + debug/xcoff.gox toolexeclibgoencodingdir = $(toolexeclibgodir)/encoding toolexeclibgoencoding_DATA = \ @@ -861,6 +862,7 @@ PACKAGES = \ debug/macho \ debug/pe \ debug/plan9obj \ + debug/xcoff \ encoding \ encoding/ascii85 \ encoding/asn1 \ @@ -1239,6 +1241,7 @@ TEST_PACKAGES = \ debug/macho/check \ debug/pe/check \ debug/plan9obj/check \ + debug/xcoff/check \ encoding/ascii85/check \ encoding/asn1/check \ encoding/base32/check \ Index: libgo/go/cmd/cgo/gcc.go === --- ./libgo/go/cmd/cgo/gcc.go (revision 250630) +++ ./libgo/go/cmd/cgo/gcc.go (working copy) @@ -13,6 +13,7 @@ import ( "debug/elf" "debug/macho" "debug/pe" + "debug/xcoff" "encoding/binary" "errors" "flag" @@ -1230,6 +1231,10 @@ func (p *Package) gccMachine() []string { return []string{"-mabi=64"} case "mips", "mipsle": return []string{"-mabi=32"} + case "ppc64": + if goos == "aix" { + return []string{"-maix64"} + } } return nil } @@ -1360,7 +1365,29 @@ func (p *Package) gccDebug(stdin []byte) (*dwarf.D return d, binary.LittleEndian, data } - fatalf("cannot parse gcc output %s as ELF, Mach-O, PE object", gccTmp()) + if f, err := xcoff.Open(gccTmp()); err == nil { + defer f.Close() + d, err := f.DWARF() + if err != nil { + fatalf("cannot load DWARF output from %s: %v", gccTmp(), err) + } + var data []byte + for _, s := range f.Symbols { + if isDebugData(s.Name) { +if i := int(s.SectionNumber) - 1; 0 <= i && i < len(f.Sections) { + sect := f.Sections[i] + if s.Value < sect.Size { + if sdat, err := sect.Data(); err == nil { + data = sdat[s.Value:] + } + } +} + } + } + return d, binary.BigEndian, data + } + + fatalf("cannot parse gcc output %s as ELF, Mach-O, PE, XCOFF object", gccTmp()) panic("not reached") } Index: libgo/go/cmd/cgo/out.go === --- ./libgo/go/cmd/cgo/out.go (revision 250630) +++ ./libgo/go/cmd/cgo/out.go (working copy) @@ -9,6 +9,7 @@ import ( "debug/elf" "debug/macho" "debug/pe" + "debug/xcoff" "fmt" "go/ast" "go/printer" @@ -324,7 +325,28 @@ 
func dynimport(obj string) { return } - fatalf("cannot parse %s as ELF, Mach-O or PE", obj) + if f, err := xcoff.Open(obj); err == nil { + sym, err := f.ImportedSymbols() + if err != nil { + fatalf("cannot load imported symbols from XCOFF file %s: %v", obj, err) + } + for _, s := range sym { + if len(s) > 0 && s[0] == '.' { +s = s[1:] + } + fmt.Fprintf(stdout, "//go:cgo_import_dynamic %s %s %q\n", s, s, "") + } + lib, err := f.ImportedLibraries() + if err != nil { + fatalf("cannot load imported libraries from XCOFF file %s: %v", obj, err) + } + for _, l := range lib { + fmt.Fprintf(stdout, "//go:cgo_import_dynamic _ _ %q\n", l) + } + return + } + + fatalf("cannot parse %s as ELF, Mach-O, PE or XCOFF", obj) } // Construct a gcc struct matching the gc argument frame. Index: libgo/go/debug/dwarf/open.go === --- ./libgo/go/debug/dwarf/open.go (revision 2506
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 3:55 PM, H.J. Lu wrote: > On Thu, Aug 3, 2017 at 7:17 AM, Yury Gribov > wrote: >> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >>> wrote: On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: > On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov > wrote: >> On 02.08.2017 23:04, H.J. Lu wrote: >>> >>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>> >>> wrote: On 02.08.2017 21:48, H.J. Lu wrote: > > > On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov > > wrote: >> >> >> On 02.08.2017 20:02, Jeff Law wrote: >>> >>> >>> >>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: > > > > On 07/17/2017 01:23 AM, Yuri Gribov wrote: >> >> >> >> I've rebased the previous patch to trunk per Andrew's suggestion. >> Original patch description/motivation/questions are in >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html > > > > Is his stuff used for exception handling? If so, doesn't that > make > the > performance a significant concern (ie that msync call?) For the performance issue, I wonder if it wouldn't be better to just compile the unwinder twice, basically include everything needed for the verification backtrace in a single *.c file with special defines, and not export anything from that *.o file except the single entrypoint. That would also handle the ABI problems. >>> >>> >>> >>> Yea. >>> >>> Given that the vast majority of the time the addresses are valid, I >>> wonder if we could take advantage of that property to keep the >>> overhead >>> lower in the most common cases. >>> For the concurrent calls of other threads unmapping stacks of running threads (that is where most of the verification goes against), I'm afraid that is simply non-solveable, even if you don't cache anything, it will still be racy. >>> >>> >>> >>> Absolutely -- I think solving this stuff 100% reliably without races >>> isn't possible. I think the question is can we make this >>> significantly >>> better. >> >> >> >> >> All, >> >> First of all, thanks for the feedback. This is indeed not a 100% >> robust >> solution, just something to allow tools like Asan to produce more >> sane >> backtraces on crash (currently when stack is corrupted they would >> crash >> in >> the unwinder without printing at least partial backtrace). >> > > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is > corrupted: > > https://sourceware.org/bugzilla/show_bug.cgi?id=21752 > https://sourceware.org/bugzilla/show_bug.cgi?id=12189 > > There is very little point to unwind stack when stack is corrupted. I'd argue that situation with __stack_chk_fail is very special. It's used in production code where you'd want to halt as soon as corruption is detected to prevent further damage so even printing message is an additional risk. Debugging tools (e.g. sanitizers) are different and would prefer to print as many survived frames as possible. >>> >>> But stack unwinder in libgcc is primarily for production use, not for >>> debugging. >> >> >> I can see it used in different projects to print backtraces on error >> (bind9, >> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >> and >> seems to be meant primarily for debugging (at least many projects use it >> for >> this purpose: >> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). 
Same >> for >> libbacktrace which, as its README mentions, is meant to "print a detailed >> backtrace when an error occurs or to gather detailed profiling >> information". > > But it shouldn't be performed on a corrupted stack. When a stack is > corrupted, there is just no way to reliably unwind it. Why not? Attached patch shows that it's possible (up to a point of corruption of course).
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 8:07 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 3:55 PM, H.J. Lu wrote: >> On Thu, Aug 3, 2017 at 7:17 AM, Yury Gribov >> wrote: >>> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's >>> suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that >> make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to > just > compile the unwinder twice, basically include everything needed > for > the > verification backtrace in a single *.c file with special defines, > and > not export anything from that *.o file except the single > entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of > running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more >>> sane >>> backtraces on crash (currently when stack is corrupted they would >>> crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack >> is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would > prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error >>> (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >>> and >>> seems to be meant primarily for debugging (at least many projects use >>> it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). 
>>> Same for >>> libbacktrace which, as its README mentions, is meant to "print a >>> detailed >>> backtrace when an error occurs or to gather detailed profiling >
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/03/2017 08:17 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >> wrote: >>> On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov wrote: > On 02.08.2017 23:04, H.J. Lu wrote: >> >> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >> wrote: >>> >>> On 02.08.2017 21:48, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov wrote: > > > On 02.08.2017 20:02, Jeff Law wrote: >> >> >> >> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: >>> >>> >>> >>> On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: On 07/17/2017 01:23 AM, Yuri Gribov wrote: > > > > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html Is his stuff used for exception handling? If so, doesn't that make the performance a significant concern (ie that msync call?) >>> >>> >>> >>> >>> For the performance issue, I wonder if it wouldn't be better to just >>> compile the unwinder twice, basically include everything needed for >>> the >>> verification backtrace in a single *.c file with special defines, >>> and >>> not export anything from that *.o file except the single entrypoint. >>> That would also handle the ABI problems. >> >> >> >> Yea. >> >> Given that the vast majority of the time the addresses are valid, I >> wonder if we could take advantage of that property to keep the >> overhead >> lower in the most common cases. >> >>> For the concurrent calls of other threads unmapping stacks of >>> running >>> threads (that is where most of the verification goes against), I'm >>> afraid >>> that is simply non-solveable, even if you don't cache anything, it >>> will >>> still be racy. >> >> >> >> Absolutely -- I think solving this stuff 100% reliably without races >> isn't possible. I think the question is can we make this >> significantly >> better. > > > > > All, > > First of all, thanks for the feedback. This is indeed not a 100% > robust > solution, just something to allow tools like Asan to produce more sane > backtraces on crash (currently when stack is corrupted they would > crash > in > the unwinder without printing at least partial backtrace). > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is corrupted: https://sourceware.org/bugzilla/show_bug.cgi?id=21752 https://sourceware.org/bugzilla/show_bug.cgi?id=12189 There is very little point to unwind stack when stack is corrupted. >>> >>> >>> >>> I'd argue that situation with __stack_chk_fail is very special. It's >>> used >>> in production code where you'd want to halt as soon as corruption is >>> detected to prevent further damage so even printing message is an >>> additional >>> risk. Debugging tools (e.g. sanitizers) are different and would prefer >>> to >>> print as many survived frames as possible. >>> >> >> But stack unwinder in libgcc is primarily for production use, not for >> debugging. > > > I can see it used in different projects to print backtraces on error > (bind9, > SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder > and > seems to be meant primarily for debugging (at least many projects use it > for > this purpose: > https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same > for > libbacktrace which, as its README mentions, is meant to "print a detailed > backtrace when an error occurs or to gather detailed profiling > information". 
But it shouldn't be performed on a corrupted stack. When a stack is corrupted, there is just no way to reliably unwind it. >>> >>> Why not? Attached patch shows that it's possible (up to a point of >>> corruption of course). >>> It isn't a problem for a debugger which is designed to analyze corrupted program. >>> >>> Yes but forcing all applications that would like to print helpful >>> message on stack corruption to employ gdb sounds less efficient that >>> providing fail-safe
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law wrote: > On 08/03/2017 08:17 AM, Yury Gribov wrote: >> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: >>> On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov >>> wrote: On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: > On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov > wrote: >> On 02.08.2017 23:04, H.J. Lu wrote: >>> >>> On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov >>> >>> wrote: On 02.08.2017 21:48, H.J. Lu wrote: > > > On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov > > wrote: >> >> >> On 02.08.2017 20:02, Jeff Law wrote: >>> >>> >>> >>> On 08/02/2017 12:47 PM, Jakub Jelinek wrote: On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: > > > > On 07/17/2017 01:23 AM, Yuri Gribov wrote: >> >> >> >> I've rebased the previous patch to trunk per Andrew's suggestion. >> Original patch description/motivation/questions are in >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html > > > > Is his stuff used for exception handling? If so, doesn't that > make > the > performance a significant concern (ie that msync call?) For the performance issue, I wonder if it wouldn't be better to just compile the unwinder twice, basically include everything needed for the verification backtrace in a single *.c file with special defines, and not export anything from that *.o file except the single entrypoint. That would also handle the ABI problems. >>> >>> >>> >>> Yea. >>> >>> Given that the vast majority of the time the addresses are valid, I >>> wonder if we could take advantage of that property to keep the >>> overhead >>> lower in the most common cases. >>> For the concurrent calls of other threads unmapping stacks of running threads (that is where most of the verification goes against), I'm afraid that is simply non-solveable, even if you don't cache anything, it will still be racy. >>> >>> >>> >>> Absolutely -- I think solving this stuff 100% reliably without races >>> isn't possible. I think the question is can we make this >>> significantly >>> better. >> >> >> >> >> All, >> >> First of all, thanks for the feedback. This is indeed not a 100% >> robust >> solution, just something to allow tools like Asan to produce more >> sane >> backtraces on crash (currently when stack is corrupted they would >> crash >> in >> the unwinder without printing at least partial backtrace). >> > > FWIW, glibc 2.26 is changed to abort as soon as possible when stack is > corrupted: > > https://sourceware.org/bugzilla/show_bug.cgi?id=21752 > https://sourceware.org/bugzilla/show_bug.cgi?id=12189 > > There is very little point to unwind stack when stack is corrupted. I'd argue that situation with __stack_chk_fail is very special. It's used in production code where you'd want to halt as soon as corruption is detected to prevent further damage so even printing message is an additional risk. Debugging tools (e.g. sanitizers) are different and would prefer to print as many survived frames as possible. >>> >>> But stack unwinder in libgcc is primarily for production use, not for >>> debugging. >> >> >> I can see it used in different projects to print backtraces on error >> (bind9, >> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >> and >> seems to be meant primarily for debugging (at least many projects use it >> for >> this purpose: >> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same >> for >> libbacktrace which, as its README mentions, is meant to "print a detailed >> backtrace when an error occurs or to gather detailed profiling >> information". 
> > But it shouldn't be performed on a corrupted stack. When a stack is > corrupted, there is just no way to reliably unwind it. Why not? Attached patch shows that it's possible (up to a point of corruption of course). > It is
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/03/2017 06:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to just > compile the unwinder twice, basically include everything needed for > the > verification backtrace in a single *.c file with special defines, and > not export anything from that *.o file except the single entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more sane >>> backtraces on crash (currently when stack is corrupted they would crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder and >>> seems to be meant primarily for debugging (at least many projects use it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). Same for >>> libbacktrace which, as its README mentions, is meant to "print a detailed >>> backtrace when an error occurs or to gather detailed profiling information". >> >> But it shouldn't be performed on a corrupted stack. When a stack is >> corrupted, there is just no way to reliably unwind it. > > Why not? 
> Attached patch shows that it's possible (up to a point of corruption of course). Right. We should unwind up to the point where we detect something amiss, then stop. And that's what I'm most interested in here -- detecting the invalid frame data. The problem I see is doing it efficiently enough. jeff
Re: [PATCH, gcc/Makefile.in] Fix typo on ref to multi_dir variable in install-mkheaders
On 08/03/2017 08:41 AM, Olivier Hainque wrote: > Hello, > > The attached patch fixes a typo in a Makefile rule, which > we noticed while working on fixincludes for a specific target. > > Bootstrapped and regression tested fine on x86_64-linux. > > OK to commit ? > > Thanks much in advance for your feedback, > > With Kind Regards, > > Olivier > > 2017-08-04 Douglas Rupp > > * Makefile.in (install-mkheaders): Fix typo, where the multi_dir > variable was referenced as multidir in command. > OK. jeff
Re: [PATCH 6/6] qsort comparator consistency checking
On 08/03/2017 08:24 AM, Alexander Monakov wrote: > On Wed, 2 Aug 2017, Jeff Law wrote: Well, there's not *that* many qsort calls. My quick grep shows 94 and its a very mechanical change. Then a poison in system.h to ensure raw calls to qsort don't return. > > Note that poisoning qsort outlaws vec::qsort too; it would need to be mass- > renamed as well (to vec::sort, presumably). It seems there are 83 or more > calls to vec::qsort. Ugh :( That's an unfortunate implementation of poisoning :( Consider a patch to rename those too pre-approved. > >>> Any suggestion for the non-poisoned replacement? xqsort? gcc_qsort? >> qsort_chk/qsort_nochk for checked and non-checked? > > I believe qsort_chk isn't appropriate, checking is not explicitly a part > of interface, and we never use _chk in potentially-checking tree and RTL > accessors either. How about just "sort"? Jeff
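A rough sketch of the scheme being discussed, in C++ (GCC's implementation language); the names qsort_chk/qsort_nochk come from the thread itself, not from any committed patch:

#include <stddef.h>

/* Checked entry point: with checking enabled it would validate the
   comparator (e.g. anti-symmetry and transitivity spot checks) before
   delegating to the C library sort; qsort_nochk would skip that.  */
extern void qsort_chk (void *base, size_t n, size_t size,
                       int (*cmp) (const void *, const void *));
extern void qsort_nochk (void *base, size_t n, size_t size,
                         int (*cmp) (const void *, const void *));

/* Then, in system.h after all system headers are in, raw qsort calls
   are rejected at compile time.  Note this also rejects vec::qsort,
   which is what prompts the vec::qsort -> vec::sort rename discussed
   below.  */
#pragma GCC poison qsort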
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
To be more specific: when reading all the code corresponding to “STRICT_ALIGNMENT” and “SLOW_UNALIGNED_ACCESS” in GCC (note, SLOW_UNALIGNED_ACCESS is the same as STRICT_ALIGNMENT when it is NOT defined explicitly, which is the case for SPARC), we can draw the following summary: all the special handling of STRICT_ALIGNMENT or SLOW_UNALIGNED_ACCESS in this code shares a common logic: if a memory access is known at compile time not to be well aligned, and the targeted platform does NOT support fast unaligned memory access, the compiler will try to make the memory access well aligned. Otherwise, if the targeted platform supports fast unaligned memory access, it will leave the compile-time-known unaligned memory access as is, and the hardware support kicks in later for these unaligned memory accesses. This behavior is consistent with the high-level definition of STRICT_ALIGNMENT, and also consistent with the M8 misaligned support: if the target is NOT TARGET_MISALIGN, STRICT_ALIGNMENT is 1, and all compile-time-known misaligned memory accesses are adjusted to aligned memory accesses before RTL generation; on the other hand, if the target is TARGET_MISALIGN, STRICT_ALIGNMENT is 0, the compile-time-known misaligned memory accesses are NOT adjusted, and after RTL generation we can use the new misaligned ld/st hardware insns for these compile-time-known misaligned memory accesses. Hope this is clear. Thanks. Qing >> >> Why don't you read the code rather than just relying upon what >> high level description is given by the documentation instead? > > I read the code before making the change; that’s the reason I ask you to > specify clearly the bad side effect that I didn’t consider yet. > > thanks. > > Qing
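In other words, the target-macro relationship Qing describes boils down to something like the following hypothetical sketch (the actual sparc.h change in the submitted patch may be spelled differently):

/* sparc.h sketch: when the M8 misaligned ld/st support is enabled,
   stop forcing strict alignment, so compile-time-known misaligned
   accesses survive to RTL and can be implemented with the new
   instructions; otherwise keep the historical behavior.  */
#define STRICT_ALIGNMENT (!(TARGET_MISALIGN))

The pushback elsewhere in the thread is that clearing STRICT_ALIGNMENT also tells the rest of the compiler it no longer needs to try to align objects it could have aligned (stack objects in particular), which costs performance even though the generated code stays correct.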
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/02/2017 03:36 PM, Florian Weimer wrote: > * Jeff Law: > >> Something like setup a signal handler when we first start unwinding that >> flags the error and tear it down when we're done unwinding?Obviously >> we can't do setup/tear down each time for each address. Anyway, just >> thinking outloud here... > > Linux doesn't have per-thread signal handlers, so this doesn't work > reliably. Sigh. There's probably a multitude of reasons why this wouldn't work :-) > > If speed is not a concern, but reliability is, call fork (the system > call, not glibc's wrapper which calls fork handlers) and do the work > in a single-threaded copy of the process. There, you can set up > signal handlers as you see fit, and the VM layout won't change > unexpectedly. Speed is one of the concerns. Essentially we're pondering an efficient mechanism to know if a given code address is valid so that we know if we should even try to unwind through a particular stack frame. One could probably claim that in the EH path a call to msync shouldn't be a concern. I'm skeptical these days, but could be convinced that it's tolerable. > > To harden unwinding against corrupted tables or table locations, we'd > have to change ld.so to make all critical data read-only after loading > and remove the unwinder caches (with more help from ld.so instead). > It would make sense to move the unwinder implementation into ld.so. > With proper hardening, corrupted stacks would not be able to cause > crashes anymore, either. Yea. I'm not sure if we're ready to suggest that kind of change yet, but that may be the long term direction. jeff
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 3, 2017 at 4:28 PM, H.J. Lu wrote: > On Thu, Aug 3, 2017 at 8:25 AM, Jeff Law wrote: >> On 08/03/2017 08:17 AM, Yury Gribov wrote: >>> On Thu, Aug 3, 2017 at 2:01 PM, H.J. Lu wrote: On Thu, Aug 3, 2017 at 5:52 AM, Yury Gribov wrote: > On Thu, Aug 3, 2017 at 1:22 PM, H.J. Lu wrote: >> On Wed, Aug 2, 2017 at 9:29 PM, Yury Gribov >> wrote: >>> On 02.08.2017 23:04, H.J. Lu wrote: On Wed, Aug 2, 2017 at 1:56 PM, Yury Gribov wrote: > > On 02.08.2017 21:48, H.J. Lu wrote: >> >> >> On Wed, Aug 2, 2017 at 1:39 PM, Yury Gribov >> >> wrote: >>> >>> >>> On 02.08.2017 20:02, Jeff Law wrote: On 08/02/2017 12:47 PM, Jakub Jelinek wrote: > > > > On Wed, Aug 02, 2017 at 12:38:13PM -0600, Jeff Law wrote: >> >> >> >> On 07/17/2017 01:23 AM, Yuri Gribov wrote: >>> >>> >>> >>> I've rebased the previous patch to trunk per Andrew's >>> suggestion. >>> Original patch description/motivation/questions are in >>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html >> >> >> >> Is his stuff used for exception handling? If so, doesn't that >> make >> the >> performance a significant concern (ie that msync call?) > > > > > For the performance issue, I wonder if it wouldn't be better to > just > compile the unwinder twice, basically include everything needed > for > the > verification backtrace in a single *.c file with special defines, > and > not export anything from that *.o file except the single > entrypoint. > That would also handle the ABI problems. Yea. Given that the vast majority of the time the addresses are valid, I wonder if we could take advantage of that property to keep the overhead lower in the most common cases. > For the concurrent calls of other threads unmapping stacks of > running > threads (that is where most of the verification goes against), I'm > afraid > that is simply non-solveable, even if you don't cache anything, it > will > still be racy. Absolutely -- I think solving this stuff 100% reliably without races isn't possible. I think the question is can we make this significantly better. >>> >>> >>> >>> >>> All, >>> >>> First of all, thanks for the feedback. This is indeed not a 100% >>> robust >>> solution, just something to allow tools like Asan to produce more >>> sane >>> backtraces on crash (currently when stack is corrupted they would >>> crash >>> in >>> the unwinder without printing at least partial backtrace). >>> >> >> FWIW, glibc 2.26 is changed to abort as soon as possible when stack >> is >> corrupted: >> >> https://sourceware.org/bugzilla/show_bug.cgi?id=21752 >> https://sourceware.org/bugzilla/show_bug.cgi?id=12189 >> >> There is very little point to unwind stack when stack is corrupted. > > > > I'd argue that situation with __stack_chk_fail is very special. It's > used > in production code where you'd want to halt as soon as corruption is > detected to prevent further damage so even printing message is an > additional > risk. Debugging tools (e.g. sanitizers) are different and would > prefer > to > print as many survived frames as possible. > But stack unwinder in libgcc is primarily for production use, not for debugging. >>> >>> >>> I can see it used in different projects to print backtraces on error >>> (bind9, >>> SimGrid, sanitizers). backtrace(3) also sits on top of libgcc unwinder >>> and >>> seems to be meant primarily for debugging (at least many projects use >>> it for >>> this purpose: >>> https://codesearch.debian.net/search?q=backtrace_symbols&perpkg=1). 
>>> Same for >>> libbacktrace which, as its README mentions, is meant to "print a >>> detailed >>> backtrace when an error occurs or to gather detailed profiling >>> information".
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, Aug 03, 2017 at 09:33:13AM -0600, Jeff Law wrote: > On 08/03/2017 08:24 AM, Alexander Monakov wrote: > > On Wed, 2 Aug 2017, Jeff Law wrote: > Well, there's not *that* many qsort calls. My quick grep shows 94 and > its a very mechanical change. Then a poison in system.h to ensure raw > calls to qsort don't return. > > > > Note that poisoning qsort outlaws vec::qsort too; it would need to be mass- > > renamed as well (to vec::sort, presumably). It seems there are 83 or more > > calls to vec::qsort. > Ugh :( That's an unfortunate implementation of poisoning :( Consider a > patch to rename those too pre-approved. Do we really need to rename and poison anything? qsort () in the source is something that is most obvious to developers, so trying to force them to use something different will just mean an extra thing to learn. I mean, isn't it better to just add a static inline qsort that in the checking case calls qsort_chk, or do the redirection using GNU asm redirection: typeof (qsort) __asm ("qsort_chk"); and define extern "C" qsort_chk somewhere? configure could test whether it works on the target (it wouldn't perhaps on targets which already use some inline for qsort in their headers or use qsort as a macro (but the latter wouldn't work anyway with GCC already)). The _5th macro isn't that bad either, apart from using reserved namespace identifiers (it really should be something like qsort_5th and the arguments shouldn't start with underscores). Jakub
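A minimal sketch of the asm-redirection variant Jakub outlines (treat the exact spelling as an assumption; as he notes, whether it works depends on how the host's <stdlib.h> declares qsort):

#include <stdlib.h>

extern "C" void qsort_chk (void *, size_t, size_t,
                           int (*) (const void *, const void *));

/* Redeclare qsort with an asm label so that every ordinary
   qsort (...) call in the sources is emitted against qsort_chk,
   without renaming anything at the call sites.  */
extern "C" __typeof (qsort) qsort __asm__ ("qsort_chk");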
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On Thu, Aug 03, 2017 at 09:48:47AM -0600, Jeff Law wrote: > > If speed is not a concern, but reliability is, call fork (the system > > call, not glibc's wrapper which calls fork handlers) and do the work > > in a single-threaded copy of the process. There, you can set up > > signal handlers as you see fit, and the VM layout won't change > > unexpectedly. > Speed is one of the concerns. Essentially we're pondering an efficient > mechanism to know if a given code address is valid so that we know if we > should even try to unwind through a particular stack frame. > > One could probably claim that in the EH path a call to msync shouldn't > be a concern. I'm skeptical these days, but could be convinced that > it's tolerable. There are many projects that abuse exception handling for normal control flow and where the EH speed is very important. While the msync calls were not performed during normal EH or normal backtrace in the posted patch, just in sections guarded with __builtin_expect on some variable, it still makes RA worse because the compiler has to store and load all the state it could otherwise have kept in call-clobbered registers across that spot. Jakub
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
On 08/02/2017 10:47 PM, Yury Gribov wrote: > On 03.08.2017 3:06, Ximin Luo wrote: >> Jeff Law: >>> On 07/21/2017 10:15 AM, Ximin Luo wrote: (Please keep me on CC, I am not subscribed) Proposal This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value" command-line arguments that precede any explicit ones. This makes the final binary output reproducible, and also hides the unreproducible value (the source path prefixes) from CFLAGS et. al. which many build tools (understandably) embed as-is into their build output. >>> I'd *really* avoid doing this with magic environment variables. Make it >>> a first class option to the compiler. Yes, it means projects that want >>> this behavior have to arrange to pass that flag to their compiler, but >>> IMHO that's much preferred over environment variables. >>> >>> Jeff >>> >> >> Hi Jeff, >> >> If by "first class option" you meant a command-line flag, GCC *already >> has* that (-fdebug-prefix-map) > and it wasn't enough to achieve >> reproducibility in many cases we tested. > > Shouldn't -fdebug-prefix-map be updated to use the same syntax as > BUILD_PATH_PREFIX_MAP? Or a different option to cover what Ximin is trying to do with BUILD_PATH_PREFIX_MAP. BUILD_PATH_PREFIX_MAP has to go, I'm not going to approve it in that form. But I would seriously consider the same functionality as a command line switch. jeff
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
On 08/02/2017 08:06 PM, Ximin Luo wrote: > Jeff Law: >> On 07/21/2017 10:15 AM, Ximin Luo wrote: >>> (Please keep me on CC, I am not subscribed) >>> >>> >>> Proposal >>> >>> >>> This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. >>> When >>> this is set, GCC will treat this as extra implicit >>> "-fdebug-prefix-map=$value" >>> command-line arguments that precede any explicit ones. This makes the final >>> binary output reproducible, and also hides the unreproducible value (the >>> source >>> path prefixes) from CFLAGS et. al. which many build tools (understandably) >>> embed as-is into their build output. >> I'd *really* avoid doing this with magic environment variables. Make it >> a first class option to the compiler. Yes, it means projects that want >> this behavior have to arrange to pass that flag to their compiler, but >> IMHO that's much preferred over environment variables. >> >> Jeff >> > > Hi Jeff, > > If by "first class option" you meant a command-line flag, GCC *already has* > that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in > many cases we tested. dpkg-buildflags actually already adds these flags to > CFLAGS CXXFLAGS etc on Debian. However, with this patch using the environment > variable, we are able to reproduce 1800 more packages out of 26000.Then take > what you've done with the environment variable and instead implement it on top of a switch. An environment variable is absolutely the wrong thing to do here. > > GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which > was accepted about 2 years ago in a patch written by one of our GSoC > students. We are not planning any more environment variables like this, and > are committed to fixing other sources of non-determinism by patching the > relevant build scripts. I would have rejected that as well :-) One of the few times I would have disagreed with Bernd. Jeff
[PATCH] Fix PR81503 (SLSR invalid fold)
Hi, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81503 identifies a problem in SLSR where an invalid replacement is made because the desired value doesn't fit in the target type. This patch addresses that issue. The signed case is simple, as we require the value not to change when restricted to the precision of the target type. For the unsigned case, a negative value is expected to be converted to the congruent positive value, so we need to check for that. We have several test cases in the test suite that exercise the negative bump for an unsigned/pointer type. Per Jakub's suggestion I attempted to do all of this within the wide_int and widest_int framework. However, I was unable to find a way to generate maxval as a widest_int within that framework. I can use wide_int maxval = wi::max_value (prec, sign); to get a wide_int, but I can't find a way to convert this to a widest_int so that I can add it to directly to bump. I am probably missing something obvious here, so I welcome any suggestions. At any rate, using TYPE_MAX_VALUE works, it's just not as clean as I would like. BTW, I plan a follow-up patch to rewrite the complex condition for executing the main block of code, which is starting to get out of control. It will be simpler to read if the conditions are negated and used to do an immediate return. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this ok for trunk? (Just noticed a tab/space problem on the comment lines, which I will fix.) Thanks, Bill [gcc] 2017-08-03 Bill Schmidt PR tree-optimization/81503 * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure folded constant fits in the target type. [gcc/testsuite] 2017-08-03 Bill Schmidt PR tree-optimization/81503 * gcc.c-torture/execute/pr81503.c: New file. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 250791) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ { tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); + unsigned int prec = TYPE_PRECISION (target_type); + tree maxval = (POINTER_TYPE_P (target_type) +? TYPE_MAX_VALUE (sizetype) +: TYPE_MAX_VALUE (target_type)); /* It is highly unlikely, but possible, that the resulting bump doesn't fit in a HWI. Abandon the replacement @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ types but allows for safe negation without twisted logic. */ if (wi::fits_shwi_p (bump) && bump.to_shwi () != HOST_WIDE_INT_MIN + /* It is more likely that the bump doesn't fit in the target +type, so check whether constraining it to that type changes +the value. For a signed type, the value mustn't change. + For an unsigned type, the value may only change to a + congruent value (for negative bumps). */ + && (TYPE_UNSIGNED (target_type) + || wi::eq_p (bump, wi::ext (bump, prec, SIGNED))) + && (!TYPE_UNSIGNED (target_type) + || wi::eq_p (bump, wi::ext (bump, prec, UNSIGNED)) + || wi::eq_p (bump + wi::to_widest (maxval) + 1, + wi::ext (bump, prec, UNSIGNED))) /* It is not useful to replace casts, copies, negates, or adds of an SSA name and a constant. 
*/ && cand_code != SSA_NAME Index: gcc/testsuite/gcc.c-torture/execute/pr81503.c === --- gcc/testsuite/gcc.c-torture/execute/pr81503.c (nonexistent) +++ gcc/testsuite/gcc.c-torture/execute/pr81503.c (working copy) @@ -0,0 +1,15 @@ +unsigned short a = 41461; +unsigned short b = 3419; +int c = 0; + +void foo() { + if (a + b * ~(0 != 5)) +c = -~(b * ~(0 != 5)) + 2147483647; +} + +int main() { + foo(); + if (c != 2147476810) +return -1; + return 0; +}
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On 08/03/2017 03:32 AM, Jakub Jelinek wrote: > On Thu, Aug 03, 2017 at 10:58:07AM +0200, Richard Biener wrote: >>> The reason why we punt is the unexpected _4 != 1 condition, the code >>> is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is equivalent >>> to _4 != 1 for boolean type. >> >> Hmm, I thought we had code to canonicalize boolean compares (but I can't >> find that right now). Some is in > > I was looking for that too, but didn't find anything that would be done > always. I can recall looking for that kind of canonicalization as well, but not finding it. Furthermore instrumentation showed the different forms could show up in the IL, but forms such as != 1 were relatively rare. Jeff
Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.
On 08/03/2017 03:02 AM, Jan Hubicka wrote: >> On 08/02/2017 01:51 PM, Richard Biener wrote: >>> On Wed, Aug 2, 2017 at 1:20 PM, Martin Liška wrote: Hello. After some discussions with Honza, I've decided to convert current code in stmt.c that is responsible for switch expansion. More precisely, I would like to convert the code to expand gswitch statements on tree level. Currently the newly created pass is executed at the end of tree optimizations. My plan for future is to inspire in [1] and come up with some more sophisticated switch expansions. For that I've been working on a paper where I'll summarize statistics based on what I've collected in openSUSE distribution with specially instrumented GCC. If I'll be happy I can also fit in to schedule of this year's Cauldron with a talk. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Thoughts? >>> >>> First of all thanks. >>> >>> I think part of switch expansion moved to switch-conversion some time ago >>> (emit_case_bit_tests). So maybe the full lowering should be in at least >>> the same source file and it should maybe applied earlier for a subset of >>> cases (very low number of cases for example). >> >> Yep, good idea. I'll take a look. >> >>> >>> Did you base the code on the RTL expansion code or did you re-write it from >>> scratch? >> >> It's based, I've just changed the function that create CFG. > > I have talked Martin to do this in first step. Switch expansion is infinitely > difficult problem and I think changing representation first and keeping the > basic algorithm is easiest way to get something done. Algorithm will be > improved next as far as I know :) I can support that approach :-) jeff
Re: [PATCH 0/2] Python testcases to check DWARF output
On 08/03/2017 02:27 AM, Pierre-Marie de Rodat wrote: > On 08/02/2017 05:43 PM, Jeff Law wrote: >> I hate to throw in a wrench at this point, but has anyone looked at >> dwgrep from Petr Machata? He's not doing much with it anymore, but it >> might provide enough of a dwarf scanning framework to be useful for >> testing purposes. > > Sure, no problem: I first started talking publicly about this one week > ago, so it’s definitely not too late to mention alternatives. ;-) I > learned about dwgrep two years ago and forgot about it, so thank you for > the idea. I started to have a look at it, and for now I don’t think it’s > a good match in this context: > > 1. it’s an ELF only tool; > 2. it must be built, requiring external dependencies: cmake and > elfutils; > 3. in order to use it, one must learn a dedicated post-fix language > (Zwerg) > > For 1. I think this is a true problem, as it means for instance that we > could not test DWARF on Windows and Darwin setups. Unless we add PE and > Mach-O handling in dwgrep of course, but that does not sound easy and > will bring other external dependencies. > > For 3. I feel that, for someone who is comfortable with Python, it will > be easier to deal with a Python library (the dwarfutils in my patch) > than having to learn yet another DSL. I think that’s precisely why some > people would like to have a Python test framework rather than a TCL one. > Working with a “usual” imperative language looks easier than with > postfix expressions. Smaller cognitive load. > > Actually I see another problem: pattern will have to vary depending on > the target platform (for instance 32/64bit or depending on the DWARF > version). Of course we could duplicate whole patterns in testcases to > take this into account, but that’s like code duplication: I think we > should be able to include small “X if 32bit else Y” in patterns, and I > don’t think we can do that with Zwerg (no way to pass something like > environment variables). > > Of course, I have written a “competitor” tool: I guess my judgment is > biased. :-) So other opinions are welcome! Thanks. I just wanted to raise it as a possibility. It looks like it's not a good fit here, so let's not let it be a distraction. jeff
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Thu, Aug 03, 2017 at 11:05:54AM -0500, Bill Schmidt wrote: > --- gcc/gimple-ssa-strength-reduction.c (revision 250791) > +++ gcc/gimple-ssa-strength-reduction.c (working copy) > @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > { >tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); > + unsigned int prec = TYPE_PRECISION (target_type); > + tree maxval = (POINTER_TYPE_P (target_type) > + ? TYPE_MAX_VALUE (sizetype) > + : TYPE_MAX_VALUE (target_type)); > >/* It is highly unlikely, but possible, that the resulting > bump doesn't fit in a HWI. Abandon the replacement > @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > types but allows for safe negation without twisted logic. */ >if (wi::fits_shwi_p (bump) >&& bump.to_shwi () != HOST_WIDE_INT_MIN > + /* It is more likely that the bump doesn't fit in the target > + type, so check whether constraining it to that type changes > + the value. For a signed type, the value mustn't change. > + For an unsigned type, the value may only change to a > + congruent value (for negative bumps). */ > + && (TYPE_UNSIGNED (target_type) > + || wi::eq_p (bump, wi::ext (bump, prec, SIGNED))) > + && (!TYPE_UNSIGNED (target_type) > + || wi::eq_p (bump, wi::ext (bump, prec, UNSIGNED)) > + || wi::eq_p (bump + wi::to_widest (maxval) + 1, > +wi::ext (bump, prec, UNSIGNED))) wi::ext makes only sense if used with variable TYPE_SIGNED, when used with constant, it is better to use wi::sext or wi::zext. And, wouldn't it be more readable to use: && (TYPE_UNSIGNED (target_type) ? (wi::eq_p (bump, wi::zext (bump, prec)) || wi::eq_p (bump + wi::to_widest (maxval) + 1, wi::zext (bump, prec))) : wi::eq_p (bump, wi::sext (bump, prec))) ? For TYPE_UNSIGNED, do you actually need any restriction? What kind of bump values are wrong for unsigned types and why? Jakub
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, 3 Aug 2017, Jakub Jelinek wrote: > Do we really need to rename and poison anything? qsort () in the source is > something that is most obvious to developers, so trying to force them to use > something different will just mean extra thing to learn. Yep, I'd prefer to have a solution that keeps C-style qsort invocations as-is. Note that with vec::qsort -> vec::sort renaming (which should be less controversial, STL also has std::vector::sort), the argument counting trick won't be needed, the redirection will simply be: #undef qsort #define qsort(base, n, sz, cmp) qsort_chk (base, n, sz, cmp) > I mean, isn't it better to just add a static inline qsort that in the > checking case calls qsort_chk, (redefining qsort like that is formally UB, I'd like to avoid it) > or do the redirection using GNU asm redirection: > typeof (qsort) __asm ("qsort_chk"); > and define extern "C" qsort_chk somewhere? > configure could test whether it works on the target (it wouldn't perhaps > on targets which already use some inline for qsort in their headers or use > qsort as a macro (but the latter wouldn't work anyway with GCC already)). I'd rather not go this way. > The _5th macro isn't that bad either, appart from using reserved namespace > identifiers (it really should be something like qsort_5th and the arguments > shouldn't start with underscores). I didn't understand what Jeff found "ugly" about it; I wonder what epithets would be applied then to more, ahem, unusual parts of GCC. Thanks. Alexander
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, 2017-08-03 at 19:23 +0300, Alexander Monakov wrote: > > Note that with vec::qsort -> vec::sort renaming (which should be less > controversial, STL also has std::vector::sort) No it doesn't? One uses std::sort from <algorithm> on a pair of random access iterators to sort a std::vector. Cheers, Oleg
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, Aug 03, 2017 at 07:23:31PM +0300, Alexander Monakov wrote: > On Thu, 3 Aug 2017, Jakub Jelinek wrote: > > Do we really need to rename and poison anything? qsort () in the source is > > something that is most obvious to developers, so trying to force them to use > > something different will just mean extra thing to learn. > > Yep, I'd prefer to have a solution that keeps C-style qsort invocations as-is. > > Note that with vec::qsort -> vec::sort renaming (which should be less > controversial, STL also has std::vector::sort), the argument counting vec::qsort -> vec::sort renaming is fine with me. Jakub
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Aug 3, 2017, at 11:20 AM, Jakub Jelinek wrote: > > On Thu, Aug 03, 2017 at 11:05:54AM -0500, Bill Schmidt wrote: >> --- gcc/gimple-ssa-strength-reduction.c (revision 250791) >> +++ gcc/gimple-ssa-strength-reduction.c (working copy) >> @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >> { >> tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); >> enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); >> + unsigned int prec = TYPE_PRECISION (target_type); >> + tree maxval = (POINTER_TYPE_P (target_type) >> + ? TYPE_MAX_VALUE (sizetype) >> + : TYPE_MAX_VALUE (target_type)); >> >> /* It is highly unlikely, but possible, that the resulting >> bump doesn't fit in a HWI. Abandon the replacement >> @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ >> types but allows for safe negation without twisted logic. */ >> if (wi::fits_shwi_p (bump) >> && bump.to_shwi () != HOST_WIDE_INT_MIN >> + /* It is more likely that the bump doesn't fit in the target >> + type, so check whether constraining it to that type changes >> + the value. For a signed type, the value mustn't change. >> + For an unsigned type, the value may only change to a >> + congruent value (for negative bumps). */ >> + && (TYPE_UNSIGNED (target_type) >> + || wi::eq_p (bump, wi::ext (bump, prec, SIGNED))) >> + && (!TYPE_UNSIGNED (target_type) >> + || wi::eq_p (bump, wi::ext (bump, prec, UNSIGNED)) >> + || wi::eq_p (bump + wi::to_widest (maxval) + 1, >> + wi::ext (bump, prec, UNSIGNED))) > > wi::ext makes only sense if used with variable TYPE_SIGNED, when > used with constant, it is better to use wi::sext or wi::zext. Okay. > And, wouldn't it be more readable to use: > && (TYPE_UNSIGNED (target_type) > ? (wi::eq_p (bump, wi::zext (bump, prec)) >|| wi::eq_p (bump + wi::to_widest (maxval) + 1, > wi::zext (bump, prec))) > : wi::eq_p (bump, wi::sext (bump, prec))) > ? Probably. As noted, it's all becoming a bit unreadable with too much negative logic in a long conditional, so I want to clean that up in a follow-up. > For TYPE_UNSIGNED, do you actually need any restriction? > What kind of bump values are wrong for unsigned types and why? If the value of the bump is actually larger than the precision of the type (not likely except for quite small types), say 2 * (maxval + 1) which is congruent to 0, the replacement is wrong. Bill > > Jakub >
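To make the congruence condition concrete, here is a tiny standalone illustration (the numbers are illustrative, not from the patch) for a 16-bit unsigned target type, where maxval is 65535:

#include <cstdio>

int main ()
{
  const long long maxval = 65535;           /* TYPE_MAX_VALUE for 16 bits.  */
  const long long bumps[] = { -1, 2 * 65536 };
  for (long long bump : bumps)
    {
      long long zext = bump & 0xffff;       /* wi::zext (bump, 16)  */
      bool ok = (bump == zext || bump + maxval + 1 == zext);
      std::printf ("bump %lld -> zext %lld: %s\n", bump, zext,
                   ok ? "congruent, replacement is safe"
                      : "not congruent, abandon the replacement");
    }
}

The first bump (-1) wraps to the congruent value 65535 and is fine; the second (2 * 65536) folds to 0 in the type, changing the value, which is exactly the case the patch now rejects.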
Re: [PATCH 6/6] qsort comparator consistency checking
On Fri, 4 Aug 2017, Oleg Endo wrote: > > Note that with vec::qsort -> vec::sort renaming (which should be less > > controversial, STL also has std::vector::sort) > > No it doesn't? One uses std::sort from on a pair of random > access iterators to sort a std::vector. My mistake, but the main point remains: STL uses 'sort' without the 'q'. Thanks. Alexander
Re: [PATCH 6/6] qsort comparator consistency checking
On Thu, 2017-08-03 at 19:31 +0300, Alexander Monakov wrote: > > My mistake, but the main point remains: STL uses 'sort' without the > 'q'. I think it'd be better if GCC's custom containers somewhat tried to follow C++ standard container idioms. Chopping off the 'q' is one step towards it. Cheers, Oleg
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
On Thu, Aug 03, 2017 at 11:29:44AM -0500, Bill Schmidt wrote: > > And, wouldn't it be more readable to use: > > && (TYPE_UNSIGNED (target_type) > > ? (wi::eq_p (bump, wi::zext (bump, prec)) > > || wi::eq_p (bump + wi::to_widest (maxval) + 1, > > wi::zext (bump, prec))) > > : wi::eq_p (bump, wi::sext (bump, prec))) > > ? > > Probably. As noted, it's all becoming a bit unreadable with too > much negative logic in a long conditional, so I want to clean that > up in a follow-up. > > > For TYPE_UNSIGNED, do you actually need any restriction? > > What kind of bump values are wrong for unsigned types and why? > > If the value of the bump is actually larger than the precision of the > type (not likely except for quite small types), say 2 * (maxval + 1) > which is congruent to 0, the replacement is wrong. Ah, ok. Anyway, for unsigned type, perhaps it could be written as: && (TYPE_UNSIGNED (target_type) ? wi::eq_p (wi::neg_p (bump) ? bump + wi::to_widest (maxval) + 1 : bump, wi::zext (bump, prec)) : wi::eq_p (bump, wi::sext (bump, prec))) I mean, if bump >= 0, then the bump + wi::to_widest (maxval) + 1 value has no chance to be equal to zero extended bump, and for bump < 0 only that one has a chance. Jakub
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
From: Qing Zhao Date: Thu, 3 Aug 2017 10:37:15 -0500 > all the special handling on STRICT_ALIGNMENT or > SLOW_UNALIGNMENT_ACCESS in these codes have the following common > logic: > > if the memory access is known to be not-aligned well during > compilation time, if the targeted platform does NOT support faster > unaligned memory access, the compiler will try to make the memory > access aligned well. Otherwise, if the targeted platform supports > faster unaligned memory access, it will leave the compiler-time > known not-aligned memory access as it, later the hardware support > will kicked in for these unaligned memory access. > > this behavior is consistent with the high level definition of > STRICT_ALIGNMENT. That's exactly the problem. What you want with this M8 feature is simply to let the compiler know that if it is completely impossible to make some memory object aligned, then the cpu can handle this with special instructions. You still want the compiler to make the effort to align data when it can because the accesses will be faster than if it used the unaligned loads and stores. This is incredibly important for on-stack objects.
Re: [PATCH] Verify edge probability consistency in verify_flow_info
On 08/02/2017 10:07 AM, Tom de Vries wrote: > Hi, > > I. > > for target nvptx we recently ran into PR81442, an ICE in verify_flow_info: > ... > error: verify_flow_info: REG_BR_PROB is set but cfg probability is not > ... > > We start out with a jump instruction: > ... > (jump_insn 18 17 31 2 (set (pc) > (if_then_else (ne (reg:BI 83) > (const_int 0 [0])) > (label_ref 31) > (pc))) "reduction-5.c":52 104 {br_true} > (expr_list:REG_DEAD (reg:BI 83) > (int_list:REG_BR_PROB 1 (nil))) > -> 31) > ... > > The probability is set on the branch edge, but not on the fallthru edge. > > After fixup_reorder_chain, the condition is reversed, and the > probability reg-note update accordingly: > ... > (jump_insn 18 17 32 2 (set (pc) > (if_then_else (eq (reg:BI 83) > (const_int 0 [0])) > (label_ref:DI 33) > (pc))) "reduction-5.c":52 105 {br_false} > (expr_list:REG_DEAD (reg:BI 83) > (int_list:REG_BR_PROB 0 (nil))) > -> 33) > ... > > The branch and fallthru edge are also reversed, which means now the > probability is set on the fallthru edge, and not on the branch edge. > > This is the immediate cause for the verify_flow_info error. > > > II. > > We've fixed the ICE in the target by setting the probability on the > fallthru edge when we introduce it (r250422). > > We've noted other places where we were forgetting to set the probability > (fixed in r250823), but that did not trigger the ICE. > > > III. > > I've written this patch to check for the missing probability more > consistently. I'm not certain if we can require that the probability > should always be set, so I'm just requiring that if it is set on one > outgoing edge, it needs to be set on all outgoing edges. > > Sofar I've build a c-only x86_64 non-bootstrap compiler and ran dg.exp. > The only problem I ran into was in attr-simd{,-2,-4}.c. I've written a > tentative patch for that, and will submit it as follow-up. > > Is this check a good idea? I think the additional checking is a good idea. Ideally we'd verify that all edges have a probability. Until then I think you need some kind of rationale in a comment for why the checking is limited. > > OK for trunk if bootstrap and reg-test on x86_64 succeeds? Yea, but I'd like to see ongoing work towards full checking. Jeff
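A rough sketch of the consistency rule under discussion (not the actual patch; edge_probability_set_p is a placeholder for however "the cfg probability is set" is tested in the verifier):

/* If any outgoing edge of BB carries a probability, require that all
   of them do, reporting through the verifier's error machinery.  */
static bool
verify_outgoing_edge_probabilities (basic_block bb)
{
  edge e;
  edge_iterator ei;
  bool any_set = false, any_unset = false;

  FOR_EACH_EDGE (e, ei, bb->succs)
    {
      if (edge_probability_set_p (e))   /* placeholder predicate */
        any_set = true;
      else
        any_unset = true;
    }

  if (any_set && any_unset)
    {
      error ("verify_flow_info: probability set on some but not all "
             "outgoing edges of bb %i", bb->index);
      return true;   /* an error was found */
    }
  return false;
}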
Re: [PING^4][PATCH v2] Generate reproducible output independently of the build-path
Jeff Law: > On 08/02/2017 08:06 PM, Ximin Luo wrote: >> Jeff Law: >>> On 07/21/2017 10:15 AM, Ximin Luo wrote: (Please keep me on CC, I am not subscribed) Proposal This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value" command-line arguments that precede any explicit ones. This makes the final binary output reproducible, and also hides the unreproducible value (the source path prefixes) from CFLAGS et. al. which many build tools (understandably) embed as-is into their build output. >>> I'd *really* avoid doing this with magic environment variables. Make it >>> a first class option to the compiler. Yes, it means projects that want >>> this behavior have to arrange to pass that flag to their compiler, but >>> IMHO that's much preferred over environment variables. >>> >>> Jeff >>> >> >> Hi Jeff, >> >> If by "first class option" you meant a command-line flag, GCC *already has* >> that (-fdebug-prefix-map) and it wasn't enough to achieve reproducibility in >> many cases we tested. dpkg-buildflags actually already adds these flags to >> CFLAGS CXXFLAGS etc on Debian. However, with this patch using the >> environment variable, we are able to reproduce 1800 more packages out of >> 26000. > Then take what you've done with the environment variable and instead > implement it on top of a switch. An environment variable is absolutely the > wrong thing to do here. > >> >> GCC already supports a similar environment variable SOURCE_DATE_EPOCH, which >> was accepted about 2 years ago in a patch written by one of our GSoC >> students. We are not planning any more environment variables like this, and >> are committed to fixing other sources of non-determinism by patching the >> relevant build scripts. > I would have rejected that as well :-) One of the few times I would > have disagreed with Bernd. > > Could you go into some more detail on why you think an envvar is absolutely the wrong thing to do here? X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
* Jeff Law: >> If speed is not a concern, but reliability is, call fork (the system >> call, not glibc's wrapper which calls fork handlers) and do the work >> in a single-threaded copy of the process. There, you can set up >> signal handlers as you see fit, and the VM layout won't change >> unexpectedly. > Speed is one of the concerns. I thought this was debugging-only, optional unwinder behavior. > Essentially we're pondering an efficient > mechanism to know if a given code address is valid so that we know if we > should even try to unwind through a particular stack frame. Efficient code address validation is possible using data structures maintained by ld.so. If those data structures are mostly read-only, they cannot be corrupted, so pointers and offsets in them don't have to be validated before derefencing them. But the main problem is the frame pointer chain, and the current lack of registration for stacks. The program can just malloc some area, switch the SP, and start running from there. Some libraries do exactly that. And without knowledge of the bottom address of the stack, it is impossible to verify that the frame pointer chain goes somewhere off the stack. > One could probably claim that in the EH path a call to msync shouldn't > be a concern. I'm skeptical these days, but could be convinced that > it's tolerable. Quite a few applications are sensitive to EH performance. The current freshness check for the cache is very inefficient, which is a significant problem for some users. If we can push out the flag check the top level (essentially compiling the unwinder twice, with and without checking), I don't think this would cause significant additional performance issues.
Re: [PATCH][2/2] early LTO debug, main part
On Thu, Aug 3, 2017 at 6:51 AM, Richard Biener wrote: > On Wed, 2 Aug 2017, Jason Merrill wrote: > >> On Wed, Aug 2, 2017 at 6:35 AM, Richard Biener wrote: >> > On Wed, 2 Aug 2017, Jason Merrill wrote: >> > >> >> On 05/19/2017 06:42 AM, Richard Biener wrote: >> >> > + /* ??? In some cases the C++ FE (at least) fails to >> >> > +set DECL_CONTEXT properly. Simply globalize stuff >> >> > +in this case. For example >> >> > +__dso_handle created via iostream line 74 col 25. */ >> >> > + parent = comp_unit_die (); >> >> >> >> I've now fixed __dso_handle, so that can be removed from the comment, but >> >> it >> >> still makes sense to have this fall-back for the (permitted) case of null >> >> DECL_CONTEXT. >> > >> > Adjusted the comment. >> > >> >> > + /* ??? LANG issue - DW_TAG_module for fortran. Either look >> >> > +at the input language (if we have enough DECL_CONTEXT to follow) >> >> > +or use a bit in tree_decl_with_vis to record the distinction. */ >> >> >> >> Sure, you should be able to look at TRANSLATION_UNIT_LANGUAGE. >> > >> > Yeah, the comment says we might be able to walk DECL_CONTEXT up to >> > a TRANSLATION_UNIT_DECL. I've amended is_fortran similar to as I >> > amended is_cxx, providing an overload for a decl, factoring out common >> > code. So this is now if (is_fortran (decl)) ... = new_die >> > (DW_TAG_module,...). >> > >> >> > ! /* ??? We cannot unconditionally output die_offset if >> >> > !non-zero - at least -feliminate-dwarf2-dups will >> >> > !create references to those DIEs via symbols. And we >> >> > !do not clear its DIE offset after outputting it >> >> > !(and the label refers to the actual DIEs, not the >> >> > !DWARF CU unit header which is when using label + offset >> >> > !would be the correct thing to do). >> >> >> >> As in our previous discussion, I think -feliminate-dwarf2-dups can go away >> >> now. But this is a more general issue: die_offset has meant the offset >> >> from >> >> the beginning of the CU, but if with_offset is set it means an offset from >> >> die_symbol. Since with_offset changes the meaning of die_symbol and >> >> die_offset, having different code here depending on that flag makes sense. >> >> >> >> It seems likely that when -fEDD goes away, we will never again want to do >> >> direct symbolic references to DIEs, in which case we could drop the >> >> current >> >> meaning of die_symbol, and so we wouldn't need the with_offset flag. >> > >> > Yeah, I've been playing with a patch to remove -fEDD but it has conflicts >> > with the early LTO debug work and thus I wanted to postpone it until >> > after that goes in to avoid further churn. I hope that's fine, it's >> > sth I'll tackle soon after this patch lands on trunk. >> >> Sure. >> >> >> > ! /* Don't output the symbol twice. For LTO we want the label >> >> > ! on the section beginning, not on the actual DIE. */ >> >> > ! && (!flag_generate_lto >> >> > ! || die->die_tag != DW_TAG_compile_unit)) >> >> >> >> I think this check should just be !with_offset; if that flag is set the >> >> DIE >> >> doesn't actually have its own symbol. >> > >> > with_offset is set only during LTRANS phase for the DIEs refering to >> > the early DIEs via the CU label. But the above is guarding the >> > early phase when we do not want to output that CU label twice. >> > >> > Can we revisit this check when -fEDD has gone away? >> >> Yes. >> >> >> > ! if (old_die >> >> > ! && (c = get_AT_ref (old_die, DW_AT_abstract_origin)) >> >> > ! /* ??? In LTO all origin DIEs still refer to the early >> >> > !debug copy. 
Detect that. */ >> >> > ! && get_AT (c, DW_AT_inline)) >> >> ... >> >> > ! /* "Unwrap" the decls DIE which we put in the imported unit >> >> > context. >> >> > ! ??? If we finish dwarf2out_function_decl refactoring we can >> >> > ! do this in a better way from the start and only lazily emit >> >> > ! the early DIE references. */ >> >> >> >> It seems like in gen_subprogram_die you deliberately avoid reusing the DIE >> >> from dwarf2out_register_external_die (since it doesn't have >> >> DW_AT_inline), and >> >> then in add_abstract_origin_attribute you need to look through that >> >> redundant >> >> die. Why not reuse it? >> > >> > What we're doing here is dealing with the case of an inlined >> > instance which is adjusted to point back to the early debug copy >> > directly than to the wrapping DIE (supposed to eventually contain >> > the concrete instance). >> >> But we enter this block when we're emitting the concrete out-of-line >> instance, and the DW_AT_inline check prevents us from using the >> wrapping DIE for the out-of-line instance. >> >> The comment should really change "inlined instance" to "concrete >> instance"; inlined instances are handled in >> gen_inlined_
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
* Yuri Gribov: > I've rebased the previous patch to trunk per Andrew's suggestion. > Original patch description/motivation/questions are in > https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html What about the remaining TODOs? > + if (__builtin_expect (msync ((void *)page, len, MS_ASYNC), 0)) > +return 0; Is there any reason not to probe using mincore? It won't trigger writeback.
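For comparison, a minimal sketch of what a mincore-based probe could look like (illustrative only, not from the posted patch):

#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

/* Return nonzero if ADDR lies on a currently mapped page.  Unlike
   msync with MS_ASYNC, mincore cannot trigger writeback; it simply
   fails with ENOMEM for an unmapped range.  */
static int
addr_is_mapped (const void *addr)
{
  unsigned char vec;
  uintptr_t pagesize = (uintptr_t) sysconf (_SC_PAGESIZE);
  uintptr_t page = (uintptr_t) addr & ~(pagesize - 1);
  return mincore ((void *) page, pagesize, &vec) == 0;
}

It has the same inherent raciness as the msync probe (the mapping can disappear right after the check), a limitation already acknowledged earlier in the thread.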
Re: [PATCH, gcc/Makefile.in] Fix typo on ref to multi_dir variable in install-mkheaders
> On 03 Aug 2017, at 17:30, Jeff Law wrote: > >>* Makefile.in (install-mkheaders): Fix typo, where the multi_dir >>variable was referenced as multidir in command. >> > OK. Great, Thanks Jeff!
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 08/03/2017 11:03 AM, Florian Weimer wrote: > * Jeff Law: > >>> If speed is not a concern, but reliability is, call fork (the system >>> call, not glibc's wrapper which calls fork handlers) and do the work >>> in a single-threaded copy of the process. There, you can set up >>> signal handlers as you see fit, and the VM layout won't change >>> unexpectedly. > >> Speed is one of the concerns. > > I thought this was debugging-only, optional unwinder behavior. My understanding was this was in the EH path as well. If it's outside the EH path, then I don't care about the cost of msync and the like :-) > >> Essentially we're pondering an efficient >> mechanism to know if a given code address is valid so that we know if we >> should even try to unwind through a particular stack frame. > > Efficient code address validation is possible using data structures > maintained by ld.so. If those data structures are mostly read-only, > they cannot be corrupted, so pointers and offsets in them don't have > to be validated before derefencing them. Understood. > > But the main problem is the frame pointer chain, and the current lack > of registration for stacks. The program can just malloc some area, > switch the SP, and start running from there. Some libraries do > exactly that. And without knowledge of the bottom address of the > stack, it is impossible to verify that the frame pointer chain goes > somewhere off the stack. Yea. > >> One could probably claim that in the EH path a call to msync shouldn't >> be a concern. I'm skeptical these days, but could be convinced that >> it's tolerable. > > Quite a few applications are sensitive to EH performance. The current > freshness check for the cache is very inefficient, which is a > significant problem for some users. If we can push out the flag check > the top level (essentially compiling the unwinder twice, with and > without checking), I don't think this would cause significant > additional performance issues. Yea. I went back and reviewed the internal BZ. It's actually just the print-a-backtrace thing rather than EH unwinding. SO yea, if the print-a-backtrace path and EH path both use bits, we could just compile them twice, once with checking once without respectively. Jeff
Re: [PATCH 1/3] matching tokens: c-family parts
On 08/01/2017 02:21 PM, David Malcolm wrote: > (Unchanged since v1; already approved by Marek, assuming rest is approved) > > gcc/c-family/ChangeLog: > * c-common.c (c_parse_error): Add rich_location * param, using it > rather implicitly using input_location. > * c-common.h (c_parse_error): Add rich_location * param. > > gcc/testsuite/ChangeLog: > * c-c++-common/missing-close-symbol.c: New test case. > * c-c++-common/missing-symbol.c: New test case. LGTM. jeff
Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote: > > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > > I think calling this with the rtx elementN args makes this only more > > > complicated (the function comment doesn't say what they are or what > > > NULL means, btw). > > You didn't handle the first part of this as far as I see? It's the > big complicating issue here. I am not sure exactly what you are asking for. This is like the other output functions that take the rtx insns. > > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number > > to > > + use for extracting the 64-bit double word from ARG1. > > + > > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number > > to > > + use for extracting the 64-bit double word from ARG2. > > + > > + The element number is based on the user element ordering, set by the > > + endianess and by the -maltivec={le,be} options. */ > > ("endianness", two n's). > > I don't like using NULL as a magic value at all; it does not simplify > this interface, it complicates it instead. > > Can you move the "which half is high" decision to the callers? And then essentially there is no need for the function, since each of the 4 concat variants have to have the logic to support big endian, -maltivec=be, and -maltivec=le. Let me see what I can do about it. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH 2/3] Matching tokens: C parts (v2)
On 08/01/2017 02:21 PM, David Malcolm wrote: > Changed in v2: > > * Renamed template argument to traits_t; eliminated subclasses, just > using traits struct. > * Moved enum constants into struct bodies (string constants can't be > without constexpr, which isn't available in C++98). > * Fixed typo. > > OK for trunk? > > gcc/c/ChangeLog: > * c-parser.c (c_parser_error): Rename to... > (c_parser_error_richloc): ...this, making static, and adding > "richloc" parameter, passing it to the c_parse_error call, > rather than calling c_parser_set_source_position_from_token. > (c_parser_error): Reintroduce, reimplementing in terms of the > above, converting return type from void to bool. > (class token_pair): New class. > (struct matching_paren_traits): New struct. > (matching_parens): New typedef. > (struct matching_brace_traits): New struct. > (matching_braces): New typedef. > (get_matching_symbol): New function. > (c_parser_require): Add param MATCHING_LOCATION, using it to > highlight matching "opening" tokens for missing "closing" tokens. > (c_parser_skip_until_found): Likewise. > (c_parser_static_assert_declaration_no_semi): Convert explicit > parsing of CPP_OPEN_PAREN and CPP_CLOSE_PAREN to use of > class matching_parens, so that the pertinent open parenthesis is > highlighted when there are problems locating the close > parenthesis. > (c_parser_struct_or_union_specifier): Likewise. > (c_parser_typeof_specifier): Likewise. > (c_parser_alignas_specifier): Likewise. > (c_parser_simple_asm_expr): Likewise. > (c_parser_braced_init): Likewise, for matching_braces. > (c_parser_paren_condition): Likewise, for matching_parens. > (c_parser_switch_statement): Likewise. > (c_parser_for_statement): Likewise. > (c_parser_asm_statement): Likewise. > (c_parser_asm_operands): Likewise. > (c_parser_cast_expression): Likewise. > (c_parser_sizeof_expression): Likewise. > (c_parser_alignof_expression): Likewise. > (c_parser_generic_selection): Likewise. > (c_parser_postfix_expression): Likewise for cases RID_VA_ARG, > RID_OFFSETOF, RID_TYPES_COMPATIBLE_P, RID_AT_SELECTOR, > RID_AT_PROTOCOL, RID_AT_ENCODE, reindenting as necessary. > In case CPP_OPEN_PAREN, pass loc_open_paren to the > c_parser_skip_until_found call. > (c_parser_objc_class_definition): Use class matching_parens as > above. > (c_parser_objc_method_decl): Likewise. > (c_parser_objc_try_catch_finally_statement): Likewise. > (c_parser_objc_synchronized_statement): Likewise. > (c_parser_objc_at_property_declaration): Likewise. > (c_parser_oacc_wait_list): Likewise. > (c_parser_omp_var_list_parens): Likewise. > (c_parser_omp_clause_collapse): Likewise. > (c_parser_omp_clause_default): Likewise. > (c_parser_omp_clause_if): Likewise. > (c_parser_omp_clause_num_threads): Likewise. > (c_parser_omp_clause_num_tasks): Likewise. > (c_parser_omp_clause_grainsize): Likewise. > (c_parser_omp_clause_priority): Likewise. > (c_parser_omp_clause_hint): Likewise. > (c_parser_omp_clause_defaultmap): Likewise. > (c_parser_oacc_single_int_clause): Likewise. > (c_parser_omp_clause_ordered): Likewise. > (c_parser_omp_clause_reduction): Likewise. > (c_parser_omp_clause_schedule): Likewise. > (c_parser_omp_clause_num_teams): Likewise. > (c_parser_omp_clause_thread_limit): Likewise. > (c_parser_omp_clause_aligned): Likewise. > (c_parser_omp_clause_linear): Likewise. > (c_parser_omp_clause_safelen): Likewise. > (c_parser_omp_clause_simdlen): Likewise. > (c_parser_omp_clause_depend): Likewise. > (c_parser_omp_clause_map): Likewise. > (c_parser_omp_clause_device): Likewise. 
> (c_parser_omp_clause_dist_schedule): Likewise. > (c_parser_omp_clause_proc_bind): Likewise. > (c_parser_omp_clause_uniform): Likewise. > (c_parser_omp_for_loop): Likewise. > (c_parser_cilk_clause_vectorlength): Likewise. > (c_parser_cilk_clause_linear): Likewise. > (c_parser_transaction_expression): Likewise. > * c-parser.h (c_parser_require): Add param matching_location with > default UNKNOWN_LOCATION. > (c_parser_error): Convert return type from void to bool. > (c_parser_skip_until_found): Add param matching_location with > default UNKNOWN_LOCATION. > > gcc/testsuite/ChangeLog: > * gcc.dg/unclosed-init.c: New test case. Phew. I only spot-checked most of the changes around the new API for requiring the open/close paren/brace/bracket or consuming parens/braces/brackets. They were very mechanical :-) Jeff
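For readers skimming the ChangeLog above, the usage pattern inside a converted parser routine is roughly the following sketch. The member names (require_open/require_close) are assumed from the description, not copied from the patch, so treat this as illustrative rather than the committed code:

  matching_parens parens;
  if (!parens.require_open (parser))
    return;
  /* ... parse the parenthesized contents ... */
  parens.require_close (parser);

The point of routing both calls through one object is that the open call records the location of the '(' so that, when the ')' turns out to be missing, c_parser_require can receive that location as its new MATCHING_LOCATION parameter and the diagnostic can highlight the unmatched opening token.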
Re: [PATCH 1/3] c-family: add name_hint/deferred_diagnostic
On 07/03/2017 11:32 AM, David Malcolm wrote: > On Mon, 2017-07-03 at 10:25 -0600, Jeff Law wrote: >> On 05/05/2017 11:51 AM, David Malcolm wrote: >>> In various places we use lookup_name_fuzzy to provide a hint, >>> and can report messages of the form: >>> error: unknown foo named 'bar' >>> or: >>> error: unknown foo named 'bar'; did you mean 'SUGGESTION? >>> >>> This patch provides a way for lookup_name_fuzzy to provide >>> both the suggestion above, and (optionally) additional hints >>> that can be printed e.g. >>> >>> note: did you forget to include ? >>> >>> This patch provides the mechanism and ports existing users >>> of lookup_name_fuzzy to the new return type. >>> There are no uses of such hints in this patch, but followup >>> patches provide various front-end specific uses of this. >>> >>> gcc/c-family/ChangeLog: >>> * c-common.h (class deferred_diagnostic): New class. >>> (class name_hint): New class. >>> (lookup_name_fuzzy): Convert return type from const char * >>> to name_hint. Add location_t param. >>> >>> gcc/c/ChangeLog: >>> * c-decl.c (implicit_decl_warning): Convert "hint" from >>> const char * to name_hint. Pass location to >>> lookup_name_fuzzy. Suppress any deferred diagnostic if the >>> warning was not printed. >>> (undeclared_variable): Likewise for "guessed_id". >>> (lookup_name_fuzzy): Convert return type from const char * >>> to name_hint. Add location_t param. >>> * c-parser.c (c_parser_declaration_or_fndef): Convert "hint" >>> from >>> const char * to name_hint. Pass location to lookup_name_fuzzy. >>> (c_parser_parameter_declaration): Pass location to >>> lookup_name_fuzzy. >>> >>> gcc/cp/ChangeLog: >>> * name-lookup.c (suggest_alternatives_for): Convert >>> "fuzzy_name" from >>> const char * to name_hint, and rename to "hint". Pass location >>> to >>> lookup_name_fuzzy. >>> (lookup_name_fuzzy): Convert return type from const char * >>> to name_hint. Add location_t param. >>> * parser.c (cp_parser_diagnose_invalid_type_name): Convert >>> "suggestion" from const char * to name_hint, and rename to >>> "hint". >>> Pass location to lookup_name_fuzzy. >> >>> --- >>> gcc/c-family/c-common.h | 121 >>> +++- >>> gcc/c/c-decl.c | 35 +++--- >>> gcc/c/c-parser.c| 16 --- >>> gcc/cp/name-lookup.c| 17 +++ >>> gcc/cp/parser.c | 12 ++--- >>> 5 files changed, 163 insertions(+), 38 deletions(-) >>> >>> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h >>> index 138a0a6..83c1a68 100644 >>> --- a/gcc/c-family/c-common.h >>> +++ b/gcc/c-family/c-common.h >>> @@ -1009,7 +1009,126 @@ enum lookup_name_fuzzy_kind { >>>/* Any name. */ >>>FUZZY_LOOKUP_NAME >>> }; >>> -extern const char *lookup_name_fuzzy (tree, enum >>> lookup_name_fuzzy_kind); >>> + >>> +/* A deferred_diagnostic is a wrapper around optional extra >>> diagnostics >>> + that we may want to bundle into a name_hint. >>> + >>> + The emit method is called when no name_hint instances reference >>> + the deferred_diagnostic. In the simple case this is when the >>> name_hint >>> + goes out of scope, but a reference-counting scheme is used to >>> allow >>> + name_hint instances to be copied. 
*/ >>> + >>> +class deferred_diagnostic >>> +{ >>> + public: >>> + virtual ~deferred_diagnostic () {} >>> + virtual void emit () = 0; >>> + >>> + void incref () { m_refcnt++; } >>> + void decref () >>> + { >>> +if (--m_refcnt == 0) >>> + { >>> + if (!m_suppress) >>> + emit (); >>> + delete this; >>> + } >>> + } >>> + >>> + location_t get_location () const { return m_loc; } >>> + >>> + /* Call this if the corresponding warning was not emitted, >>> + in which case we should also not emit the >>> deferred_diagnostic. */ >>> + void suppress () >>> + { >>> +m_suppress = true; >>> + } >>> + >>> + protected: >>> + deferred_diagnostic (location_t loc) >>> + : m_refcnt (0), m_loc (loc), m_suppress (false) {} >>> + >>> + private: >>> + int m_refcnt; >>> + location_t m_loc; >>> + bool m_suppress; >>> +}; >> So what stands out here is "delete this" and the need for explicit >> reference counting. Also doesn't that imply that deferred_diagnostic >> objects must be allocated on the heap? Is there another way to get >> the >> behavior you want without resorting to something like this? >> > > Thanks for looking at this. > > Yes: deferred_diagnostic instances are heap-allocated. This is because > it's an abstract base class; each concrete subclass is an > implementation detail within the frontends, for isolating the special > -case logic for each different kind of hint, and thus these concrete > subclasses are hidden within the FE code. > > My initial implementation of the above had the name_hint class directly > "own" the deferred_diagnostic ptr, with a: > delete m_deferred; > within name
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
-- Bill Bill Schmidt, Ph.D. GCC for Linux on Power Linux on Power Toolchain IBM Linux Technology Center wschm...@linux.vnet.ibm.com > On Aug 3, 2017, at 11:39 AM, Jakub Jelinek wrote: > > On Thu, Aug 03, 2017 at 11:29:44AM -0500, Bill Schmidt wrote: >>> And, wouldn't it be more readable to use: >>> && (TYPE_UNSIGNED (target_type) >>> ? (wi::eq_p (bump, wi::zext (bump, prec)) >>> || wi::eq_p (bump + wi::to_widest (maxval) + 1, >>> wi::zext (bump, prec))) >>> : wi::eq_p (bump, wi::sext (bump, prec))) >>> ? >> >> Probably. As noted, it's all becoming a bit unreadable with too >> much negative logic in a long conditional, so I want to clean that >> up in a follow-up. >> >>> For TYPE_UNSIGNED, do you actually need any restriction? >>> What kind of bump values are wrong for unsigned types and why? >> >> If the value of the bump is actually larger than the precision of the >> type (not likely except for quite small types), say 2 * (maxval + 1) >> which is congruent to 0, the replacement is wrong. > > Ah, ok. Anyway, for unsigned type, perhaps it could be written as: > && (TYPE_UNSIGNED (target_type) > ? wi::eq_p (wi::neg_p (bump) ? bump + wi::to_widest (maxval) + 1 > : bump, wi::zext (bump, prec)) > : wi::eq_p (bump, wi::sext (bump, prec))) > I mean, if bump >= 0, then the bump + wi::to_widest (maxval) + 1 > value has no chance to be equal to zero extended bump, and > for bump < 0 only that one has a chance. Yeah, that's true. And arguably my case for the really large bump causing problems is kind of thin, because the program is probably already broken in that case anyway. But I think I will sleep better having the check in there, as somebody other than SLSR will catch the bug then. ;-) Thanks for all the help with this one. These corner cases are always tricky, and I appreciate the extra eyeballs. Bill > > Jakub >
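To make the congruence argument concrete, here is a small standalone model in plain C (not the wide-int code in the patch itself) of the fits-in-type test being discussed, for precisions up to 64 bits: a signed bump must equal its own sign-extension to the target precision, while for an unsigned target type a negative bump may instead be congruent to its zero-extension modulo 2^prec. A bump like 2 * (maxval + 1) fails because it collapses to 0.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Standalone model of the check; PREC plays the role of
   TYPE_PRECISION (target_type).  */
static bool
bump_fits_p (int64_t bump, unsigned prec, bool unsigned_p)
{
  uint64_t mask = prec >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << prec) - 1;
  uint64_t zext = (uint64_t) bump & mask;

  if (!unsigned_p)
    {
      /* Sign-extend ZEXT from PREC bits; relies on GCC's arithmetic
	 right shift of signed values.  */
      int64_t sext = (int64_t) (zext << (64 - prec)) >> (64 - prec);
      return bump == sext;
    }

  uint64_t maxval = mask;	/* TYPE_MAX_VALUE for the unsigned type.  */
  return (uint64_t) bump == zext
	 || (bump < 0 && (uint64_t) (bump + (int64_t) maxval + 1) == zext);
}

int
main (void)
{
  /* -3419 is congruent to 62117 mod 2^16, so it is safe for unsigned short.  */
  printf ("%d\n", bump_fits_p (-3419, 16, true));	/* prints 1 */
  /* 2 * 65536 is congruent to 0 mod 2^16, so folding it would be wrong.  */
  printf ("%d\n", bump_fits_p (131072, 16, true));	/* prints 0 */
  return 0;
}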
libgo patch committed: Fix signal counting for glibc 2.26
This patch to libgo changes the mksigtab script to recognize the glibc 2.26 NSIG expression. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Will commit to GCC 7 branch when it reopens. This fixes GCC PR 81617 and https://golang.org/issue/21147. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 250832) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -c1ac6bc99f988633c6bc68a5ca9ffad3487750ef +adac632f95d1cd3421c9c1df5204db10b6a92c44 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/mksigtab.sh === --- libgo/mksigtab.sh (revision 250406) +++ libgo/mksigtab.sh (working copy) @@ -107,6 +107,19 @@ if test "${GOOS}" = "aix"; then nsig=`expr $nsig + 1` else nsig=`grep 'const _*NSIG = [0-9]*$' gen-sysinfo.go | sed -e 's/.* = \([0-9]*\)/\1/'` +if test -z "$nsig"; then + if grep 'const _*NSIG = [ (]*_*SIGRTMAX + 1[ )]*' gen-sysinfo.go >/dev/null 2>&1; then + rtmax=`grep 'const _*SIGRTMAX = [0-9]*$' gen-sysinfo.go | sed -e 's/.* = \([0-9]*\)/\1/'` + if test -n "$rtmax"; then + nsig=`expr $rtmax + 1` + fi + fi +fi +fi + +if test -z "$nsig"; then +echo 1>&2 "could not determine number of signals" +exit 1 fi i=1
Re: [PATCH, rs6000] Clean up capitalized diagnostic messages
> On Aug 2, 2017, at 6:43 PM, Segher Boessenkool > wrote: > > Hi Bill, > > On Wed, Aug 02, 2017 at 10:29:20AM -0500, Bill Schmidt wrote: >> I don't anticipate backporting any of this. > > Good :-) > >> @@ -6802,7 +6802,7 @@ altivec_resolve_overloaded_builtin (location_t loc >> if (unsupported_builtin) >> { >> const char *name = rs6000_overloaded_builtin_name (fcode); >> -error ("Builtin function %s not supported in this compiler >> configuration", >> +error ("builtin function %s not supported in this compiler >> configuration", >> name); > > As Martin says, %qs for this and similar (see the documentation before > pp_format in pretty-print.c). Can be a separate patch of course, this > one is big enough already. > >> Index: gcc/config/rs6000/rs6000.c >> === >> --- gcc/config/rs6000/rs6000.c (revision 250791) >> +++ gcc/config/rs6000/rs6000.c (working copy) >> @@ -4132,7 +4132,7 @@ rs6000_option_override_internal (bool global_init_ >> || rs6000_cpu == PROCESSOR_PPCE5500) >> { >> if (TARGET_ALTIVEC) >> -error ("AltiVec not supported in this target"); >> +error ("altivec not supported in this target"); >> } >> >> /* If we are optimizing big endian systems for space, use the load/store > > Let's either keep AltiVec or say -maltivec. We only have this warning > because we allow -maltivec with CPUs that do not support it; and this > warning is only for some of the FSL CPUs. It isn't very consistent. Back to AltiVec it goes! Thanks. > >> @@ -4250,7 +4250,7 @@ rs6000_option_override_internal (bool global_init_ >>rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks); >> } >>else >> -error ("Power9 target option is incompatible with -mcpu= for " >> +error ("power9 target option is incompatible with -mcpu= for " >> " less than power9"); >> } >> else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit) > > See also PR79477. Since many of these options are going away it is > probably not worth spending too much time on this, not until stage 3 > or so anyway. Yeah, let's address that later in the year after Mike finishes his cleanups. > >> @@ -11226,7 +11226,7 @@ rs6000_return_in_memory (const_tree type, const_tr >> static bool warned_for_return_big_vectors = false; >> if (!warned_for_return_big_vectors) >> { >> - warning (OPT_Wpsabi, "GCC vector returned by reference: " >> + warning (OPT_Wpsabi, "gcc vector returned by reference: " >> "non-standard ABI extension with no compatibility >> guarantee"); >>warned_for_return_big_vectors = true; >> } > > Maybe the warning should just say "big vector"? Or "generic vector"? > > (Vectors that fit in one VR, or in GPRs in 8 bytes or less, do not have > the problem this warns for. Kind of hard to express tersely and > precisely though). I looked in the GCC manual and couldn't find a better way of expressing this than just "GCC vector," so I will return it to the way it was. "GCC vector extension vector" is accurate but hardly trips lightly off the tongue... > > Approved for trunk with whichever of the suggested changes you think > are good. Thanks, Thanks much! Bill > > > Segher >
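For reference, the %qs form Segher and Martin are suggesting for a follow-up would look like the fragment below (a sketch of the one hunk quoted above, not a complete patch); %qs lets the diagnostic machinery do the quoting instead of leaving the name bare:

  const char *name = rs6000_overloaded_builtin_name (fcode);
  error ("builtin function %qs not supported in this compiler configuration",
	 name);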
Re: [PATCH 1/1] sparc: support for -mmisalign in the SPARC M8
> On Aug 3, 2017, at 11:40 AM, David Miller wrote: > > From: Qing Zhao > Date: Thu, 3 Aug 2017 10:37:15 -0500 > >> all the special handling of STRICT_ALIGNMENT or >> SLOW_UNALIGNED_ACCESS in this code has the following common >> logic: >> >> if the memory access is known at compile time to be misaligned, >> and the targeted platform does NOT support faster >> unaligned memory access, the compiler will try to make the memory >> access aligned. Otherwise, if the targeted platform supports >> faster unaligned memory access, it will leave the compile-time >> known misaligned memory access as is, and later the hardware support >> will kick in for these unaligned memory accesses. >> >> this behavior is consistent with the high-level definition of >> STRICT_ALIGNMENT. > > That's exactly the problem. > > What you want with this M8 feature is simply to let the compiler know > that if it is completely impossible to make some memory object > aligned, then the cpu can handle this with special instructions. > > You still want the compiler to make the effort to align data when it > can because the accesses will be faster than if it used the unaligned > loads and stores. I don't think the above is true. First, a compile-time-known misaligned memory access can always be emulated by aligned memory accesses (byte-sized loads/stores), so there would be no compile-time-known misaligned accesses left for the special misaligned ld/st insns. Second, there is always an overhead cost to the compile-time effort of turning a known-unaligned memory access into aligned accesses (adding extra padding, or splitting the unaligned multi-byte access into single-byte loads/stores), and all that overhead might be even bigger than the overhead of the special misaligned load/store itself. To decide which is better (software emulation or the hardware misaligned load/store insns), experiments are needed to justify the performance impact. This set of changes provides a way to use the misaligned load/store insns to implement compile-time-known unaligned memory accesses; -mno-misalign can be used to disable that behavior very easily if our performance data shows that the misaligned load/store insns are slower than the current software emulation. Qing > > This is incredibly important for on-stack objects.
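To make the trade-off concrete, here is a small target-neutral illustration (not code from the patch) of a compile-time-known misaligned access. On a STRICT_ALIGNMENT target without hardware support the compiler must expand the load into byte loads and shifts; with misaligned load/store insns, or on a target where unaligned access is not slow, it can emit a single load:

#include <stdint.h>
#include <string.h>

struct __attribute__ ((packed)) rec
{
  uint8_t tag;
  uint32_t value;	/* at offset 1, so a 4-byte load from it is misaligned */
};

uint32_t
read_value (const struct rec *r)
{
  uint32_t v;
  /* The portable way to express the access; the compiler decides how
     to expand it for the target.  */
  memcpy (&v, &r->value, sizeof v);
  return v;
}

Whether the byte-wise expansion or the hardware misaligned load wins is exactly the question that -mmisalign/-mno-misalign leaves to measurement.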
Re: [RFC][PR 67336][PING^6] Verify pointers during stack unwind
On 03.08.2017 18:04, Florian Weimer wrote: * Yuri Gribov: I've rebased the previous patch to trunk per Andrew's suggestion. Original patch description/motivation/questions are in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01869.html What about the remaining TODOs? Yes, need to be fixed. Note that patch is an RFC so I didn't want to polish it until I get enough feedback from others. + if (__builtin_expect (msync ((void *)page, len, MS_ASYNC), 0)) +return 0; Is there any reason not to probe using mincore? It won't trigger writeback. Nice, for some reason I didn't realize it provides necessary info. -Y
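A rough sketch of what the mincore-based probe could look like (Linux-specific and illustrative only, not the updated patch); unlike msync (..., MS_ASYNC), mincore only queries residency information, so it cannot trigger writeback:

#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

/* Return nonzero if ADDR lies within a mapped page.  */
static int
addr_is_mapped (const void *addr)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  uintptr_t page = (uintptr_t) addr & ~((uintptr_t) pagesize - 1);
  unsigned char vec[1];

  /* mincore fails with ENOMEM when the range is not mapped at all.  */
  return mincore ((void *) page, (size_t) pagesize, vec) == 0;
}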
Re: [PATCH] Fix PR81503 (SLSR invalid fold)
Hi, Here's v2 of the patch with Jakub's suggestions incorporated. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this ok for trunk? Eventually this should be backported to all active releases as well. Ok for that after a week or so of burn-in? (And after 7.2, I imagine.) Thanks, Bill [gcc] 2017-08-03 Bill Schmidt Jakub Jelinek PR tree-optimization/81503 * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure folded constant fits in the target type. [gcc/testsuite] 2017-08-03 Bill Schmidt Jakub Jelinek PR tree-optimization/81503 * gcc.c-torture/execute/pr81503.c: New file. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 250791) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -2074,6 +2074,10 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ { tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt)); enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt); + unsigned int prec = TYPE_PRECISION (target_type); + tree maxval = (POINTER_TYPE_P (target_type) +? TYPE_MAX_VALUE (sizetype) +: TYPE_MAX_VALUE (target_type)); /* It is highly unlikely, but possible, that the resulting bump doesn't fit in a HWI. Abandon the replacement @@ -2082,6 +2086,17 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ types but allows for safe negation without twisted logic. */ if (wi::fits_shwi_p (bump) && bump.to_shwi () != HOST_WIDE_INT_MIN + /* It is more likely that the bump doesn't fit in the target +type, so check whether constraining it to that type changes +the value. For a signed type, the value mustn't change. +For an unsigned type, the value may only change to a +congruent value (for negative bumps). */ + && (TYPE_UNSIGNED (target_type) + ? wi::eq_p (wi::neg_p (bump) + ? bump + wi::to_widest (maxval) + 1 + : bump, + wi::zext (bump, prec)) + : wi::eq_p (bump, wi::sext (bump, prec))) /* It is not useful to replace casts, copies, negates, or adds of an SSA name and a constant. */ && cand_code != SSA_NAME Index: gcc/testsuite/gcc.c-torture/execute/pr81503.c === --- gcc/testsuite/gcc.c-torture/execute/pr81503.c (nonexistent) +++ gcc/testsuite/gcc.c-torture/execute/pr81503.c (working copy) @@ -0,0 +1,15 @@ +unsigned short a = 41461; +unsigned short b = 3419; +int c = 0; + +void foo() { + if (a + b * ~(0 != 5)) +c = -~(b * ~(0 != 5)) + 2147483647; +} + +int main() { + foo(); + if (c != 2147476810) +return -1; + return 0; +} On 8/3/17 1:02 PM, Bill Schmidt wrote: >> On Aug 3, 2017, at 11:39 AM, Jakub Jelinek wrote: >> >> On Thu, Aug 03, 2017 at 11:29:44AM -0500, Bill Schmidt wrote: And, wouldn't it be more readable to use: && (TYPE_UNSIGNED (target_type) ? (wi::eq_p (bump, wi::zext (bump, prec)) || wi::eq_p (bump + wi::to_widest (maxval) + 1, wi::zext (bump, prec))) : wi::eq_p (bump, wi::sext (bump, prec))) ? >>> Probably. As noted, it's all becoming a bit unreadable with too >>> much negative logic in a long conditional, so I want to clean that >>> up in a follow-up. >>> For TYPE_UNSIGNED, do you actually need any restriction? What kind of bump values are wrong for unsigned types and why? >>> If the value of the bump is actually larger than the precision of the >>> type (not likely except for quite small types), say 2 * (maxval + 1) >>> which is congruent to 0, the replacement is wrong. >> Ah, ok. Anyway, for unsigned type, perhaps it could be written as: >> && (TYPE_UNSIGNED (target_type) >>? wi::eq_p (wi::neg_p (bump) ? 
bump + wi::to_widest (maxval) + 1 >> : bump, wi::zext (bump, prec)) >>: wi::eq_p (bump, wi::sext (bump, prec))) >> I mean, if bump >= 0, then the bump + wi::to_widest (maxval) + 1 >> value has no chance to be equal to zero extended bump, and >> for bump < 0 only that one has a chance. > Yeah, that's true. And arguably my case for the really large bump > causing problems is kind of thin, because the program is probably > already broken in that case anyway. But I think I will sleep better > having the check in there, as somebody other than SLSR will catch > the bug then. ;-) > > Thanks for all the help with this one. These corner cases are > always tricky, and I appreciate the extra eyeballs. > > Bill > >> Jakub >>
Re: Handle data dependence relations with different bases
Ping Richard Sandiford writes: > Richard Sandiford writes: >> Eric Botcazou writes: >>> [Sorry for missing the previous messages] >>> Thanks. Just been retesting, and I think I must have forgotten to include Ada last time. It turns out that the patch causes a dg-scan regression in gnat.dg/vect17.adb, because we now think that if the array RECORD_TYPEs *do* alias in: procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is begin for I in Sarray'Range loop R(I) := X(I) + Y(I); end loop; end; then the dependence distance must be zero. Eric, does that hold true for Ada? I.e. if X and R (or Y and R) alias, must it be the case that X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? >>> >>> Yes, I'd think so (even without the artificial RECORD_TYPE around the >>> arrays). >> >> Good! >> 2017-06-07 Richard Sandiford gcc/testsuite/ * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 when X = R or Y = R. >>> >>> I think that you need to modify vect15 and vect16 the same way. >> >> Ah, yeah. And doing that shows that I'd not handled safelen for >> DDR_COULD_BE_INDEPENDENT_P. I've fixed that locally. >> >> How does this look? Tested on x86_64-linux-gnu both without the >> vectoriser changes and with the fixed vectoriser patch. > > Here's a version of the patch that handles safelen. I split the > handling out into a new function (vect_analyze_possibly_independent_ddr) > since it was getting too big to do inline. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Thanks, > Richard > > > 2017-07-27 Richard Sandiford > > gcc/ > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (access_fn_component_p): Likewise. > (access_fn_components_comparable_p): Likewise. > (dr_analyze_indices): Add a reference to access_fn_component_p. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr): > New function. > (vect_analyze_data_ref_dependence): Use it if > DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded > distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. 
> * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > === > --- gcc/tree-data-ref.h 2017-07-27 13:10:29.620045506 +0100 > +++ gcc/tree-data-ref.h 2017-07-27 13:10:33.023912613 +0100 > @@ -260,6 +260,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; > + >/* A description of the iterations for which the elements are > accessed twice. */ >conflict_function *conflicting_iterations_in_a; > @@ -278,6 +281,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -333,6 +337,33 @@ struct data_dependence_relation >/* Set to true when the dep
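The Ada question above has a simple C analogue (illustrative only, not one of the new gcc.dg tests): when the written array may alias a read array, the loop below carries a dependence of distance 1 rather than 0, which is the situation the adjusted vect15/vect16/vect17 tests are meant to exercise when X = R or Y = R:

void
add (int *x, int *y, int *r, int n)
{
  for (int i = 1; i < n; i++)
    r[i] = x[i - 1] + y[i];	/* if r == x, a distance-1 dependence */
}

Such a loop must not be treated as independent even though the bases differ syntactically, which is exactly what DDR_COULD_BE_INDEPENDENT_P is meant to capture conservatively.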
Re: [PATCH 0/2] Python testcases to check DWARF output
On Jul 27, 2017, at 2:07 AM, Pierre-Marie de Rodat wrote: > On 07/27/2017 09:52 AM, Richard Biener wrote: >>> I'm fine with the direction if a reviewer wants to go in that >>> direction. I wish python didn't have a built-in speed penalty, >>> that's the only downside I don't like about it. Aside from that, >>> even switching all of the testsuite to be python based isn't a >>> terrible idea. >> But is it worse than TCL? Python is likely 2x faster than Tcl if you have one instance per testsuite run. The problem is that, for the work required, it's cheaper to do the work once to cut over to a new language. I'd rather switch to some other language that can come closer to the speed of compiled C code, yet in the scripting family. I don't have a pointer to a better solution than Python at this time. I'd not be opposed to switching to Python: it should be faster, just as safe, a bit easier for new people to code in, and more people know it and use it. I think Python is more fun to code in than Tcl. Cutting the entire testsuite over at once might well be more than any one person can contribute, but we could cut over subtrees as people donate time, if people want to go in that direction. This can't be a one-person decision, but rather a consensus-building decision. What do others think? > Good point. Actually for Python there are ways to make it faster. If we can > somehow manage to have a limited set of Python interpreter instances (instead > of one per test), we could use pypy, which I hear is very good at making long > running instances fast. Yes. One instance would help ensure the performance is good. I don't have a firm grasp of startup time to know just how critical it is. Also, I don't have a good grasp of the memory pressure that Python would create, say, compared to how we use Tcl.
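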
Re: [PATCH] Improve var_bound range test opt (PR tree-optimization/81655)
On Thu, 3 Aug 2017, Jeff Law wrote: > On 08/03/2017 03:32 AM, Jakub Jelinek wrote: > > On Thu, Aug 03, 2017 at 10:58:07AM +0200, Richard Biener wrote: > >>> The reason why we punt is the unexpected _4 != 1 condition, the code > >>> is prepared to handle just _4 == 0 (or _4 != 0) where _4 == 0 is > >>> equivalent > >>> to _4 != 1 for boolean type. > >> > >> Hmm, I thought we had code to canonicalize boolean compares (but I can't > >> find that right now). Some is in > > > > I was looking for that too, but didn't find anything that would be done > > always. > I can recall looking for that kind of canonicalization as well, but not > finding it. Furthermore instrumentation showed the different forms > could show up in the IL, but forms such as != 1 were relatively rare. We have foldings that canonicalize to some extent but further canonicalization requires switching edges which means even if we'd do that in CFG cleanup it would be delayed until the next such run. Richard.
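For reference, the equivalence the range-test code currently fails to exploit is trivial for a boolean SSA name; in C terms (illustrative only, not the PR testcase):

_Bool
equivalent (int a, int b)
{
  _Bool _4 = a < b;
  /* For a boolean value, _4 != 1 says exactly the same thing as _4 == 0,
     so this always returns 1.  */
  return (_4 != 1) == (_4 == 0);
}

so canonicalizing _4 != 1 to _4 == 0 would let the existing _4 == 0 / _4 != 0 handling fire without teaching it about the extra form.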