Re: [PATCH] Change PRED_LOOP_EXIT from 85 to 92
On 01/02/2018 12:57 PM, Martin Liška wrote: > From what I've measured, suggested adjustment for this particular predictor > would be > increase to 89. Hello. Note that I've installed the patch as r256888. Martin
[PATCH][RFC] Pass -m -jN to gcc_build from gcc_release
Currently gcc_release builds GCC (for generating in-tree generated files) serially - that's prohibitively expensive and takes ages (>4h for me). I'm using (when I remember to edit gcc_release ...) -j6 without a problem for some years, thus the following proposal. Any recommendation for the default N? 4 might work just fine as well and no release manager should do without at least 4 cores... Thanks, Richard. 2018-01-22 Richard Biener maintainer-scripts/ * gcc_release (build_sources): Pass -m "-j1" to gcc_build. Index: maintainer-scripts/gcc_release === --- maintainer-scripts/gcc_release (revision 256939) +++ maintainer-scripts/gcc_release (working copy) @@ -210,7 +210,8 @@ EOF inform "Building compiler" OBJECT_DIRECTORY=../objdir contrib/gcc_build -d ${SOURCE_DIRECTORY} -o ${OBJECT_DIRECTORY} \ - -c "--enable-generated-files-in-srcdir --disable-multilib" build || \ + -c "--enable-generated-files-in-srcdir --disable-multilib" \ + -m "-j1" build || \ error "Could not rebuild GCC" fi
Re: [PATCH][RFC] Pass -m -jN to gcc_build from gcc_release
On Mon, Jan 22, 2018 at 09:37:05AM +0100, Richard Biener wrote: > > Currently gcc_release builds GCC (for generating in-tree generated > files) serially - that's prohibitively expensive and takes ages (>4h for > me). I'm using (when I remember to edit gcc_release ...) -j6 without > a problem for some years, thus the following proposal. > > Any recommendation for the default N? 4 might work just fine as well > and no release manager should do without at least 4 cores... Perhaps num_cpus=1 if type -p getconf 2>/dev/null; then num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null` case "$num_cpus" in '' | 0* | *[!0-9]*) num_cpus=1;; esac fi and "-j$num_cpus" ? > 2018-01-22 Richard Biener > > maintainer-scripts/ > * gcc_release (build_sources): Pass -m "-j1" to gcc_build. > > Index: maintainer-scripts/gcc_release > === > --- maintainer-scripts/gcc_release (revision 256939) > +++ maintainer-scripts/gcc_release (working copy) > @@ -210,7 +210,8 @@ EOF > inform "Building compiler" > OBJECT_DIRECTORY=../objdir > contrib/gcc_build -d ${SOURCE_DIRECTORY} -o ${OBJECT_DIRECTORY} \ > - -c "--enable-generated-files-in-srcdir --disable-multilib" build || \ > + -c "--enable-generated-files-in-srcdir --disable-multilib" \ > + -m "-j1" build || \ > error "Could not rebuild GCC" > fi > Jakub
[PATCH] Minor warning option sync with clang
Hi, a mail in the gcc-list https://gcc.gnu.org/ml/gcc/2018-01/msg00144.html reminded me about this minor stuff I have (so it can be controlled by -Werror), but never managed to write testcases. I'm posting it here in case it's good enough or someone wants to take over from here. Franz c/ChangeLog: 2018-01-22 Franz Sirl * c-decl.c (grokdeclarator): Use OPT_Wextern_initializer. * c-typeck.c (build_binary_op): Use OPT_Wcompare_distinct_pointer_types. c-family/ChangeLog: 2018-01-22 Franz Sirl * c.opt (-Wcompare-distinct-pointer-types): New option. (-Wextern-initializer): Likewise. cp/ChangeLog: 2018-01-22 Franz Sirl * decl.c (grokdeclarator): Use OPT_Wextern_initializer. * typeck.c (composite_pointer_error): Use OPT_Wcompare_distinct_pointer_types. Index: gcc-trunk/gcc/c-family/c.opt === --- gcc-trunk/gcc/c-family/c.opt (revision 256939) +++ gcc-trunk/gcc/c-family/c.opt (working copy) @@ -318,10 +318,6 @@ alloca, and on bounded uses of alloca whose bound can be larger than bytes. -Warray-bounds -LangEnabledBy(C ObjC C++ ObjC++,Wall) -; in common.opt - Warray-bounds= LangEnabledBy(C ObjC C++ ObjC++,Wall,1,0) ; in common.opt @@ -420,6 +416,10 @@ C ObjC C++ ObjC++ Warning Alias(Wcomment) Synonym for -Wcomment. +Wcompare-distinct-pointer-types +C ObjC C++ ObjC++ Var(warn_distinct_pointer_types) Init(1) Warning +Warn for comparison of distinct pointer types. + Wconditionally-supported C++ ObjC++ Var(warn_conditionally_supported) Warning Warn for conditionally-supported constructs. @@ -512,6 +512,10 @@ C ObjC RejectNegative Warning Alias(Werror=, implicit-function-declaration) This switch is deprecated; use -Werror=implicit-function-declaration instead. +Wextern-initializer +C ObjC C++ ObjC++ Var(warn_extern_initializer) Init(1) Warning +Warn about extern declarations with an initializer. 
+ Wextra C ObjC C++ ObjC++ Warning ; in common.opt @@ -694,6 +698,10 @@ C C++ Common Var(warn_misleading_indentation) Warning LangEnabledBy(C C++,Wall) Warn when the indentation of the code does not reflect the block structure. +Wmisleading-indentation-ltb +C C++ Common Var(warn_misleading_indentation_ltb) Warning +Warn when the indentation of the code does not reflect the block structure. + Wmissing-braces C ObjC C++ ObjC++ Var(warn_missing_braces) Warning LangEnabledBy(C ObjC,Wall) Warn about possibly missing braces around initializers. Index: gcc-trunk/gcc/c/c-decl.c === --- gcc-trunk/gcc/c/c-decl.c (revision 256939) +++ gcc-trunk/gcc/c/c-decl.c (working copy) @@ -5888,8 +5888,8 @@ /* It is fine to have 'extern const' when compiling at C and C++ intersection. */ if (!(warn_cxx_compat && constp)) - warning_at (loc, 0, "%qE initialized and declared %<extern%>", -name); + warning_at (loc, OPT_Wextern_initializer, +"%qE initialized and declared %<extern%>", name); } else error_at (loc, "%qE has both %<extern%> and initializer", name); Index: gcc-trunk/gcc/cp/decl.c === --- gcc-trunk/gcc/cp/decl.c (revision 256939) +++ gcc-trunk/gcc/cp/decl.c (working copy) @@ -12408,7 +12408,8 @@ /* It's common practice (and completely valid) to have a const be initialized and declared extern. */ if (!(type_quals & TYPE_QUAL_CONST)) - warning (0, "%qs initialized and declared %<extern%>", name); + warning_at (input_location, OPT_Wextern_initializer, + "%qs initialized and declared %<extern%>", name); } else { Index: gcc-trunk/gcc/c/c-typeck.c === --- gcc-trunk/gcc/c/c-typeck.c (revision 256939) +++ gcc-trunk/gcc/c/c-typeck.c (working copy) @@ -11599,8 +11599,9 @@ else /* Avoid warning about the volatile ObjC EH puts on decls. 
*/ if (!objc_ok) - pedwarn (location, 0, - "comparison of distinct pointer types lacks a cast"); + pedwarn (location, OPT_Wcompare_distinct_pointer_types, + "comparison of distinct pointer types " + "%qT and %qT lacks a cast", type0, type1); if (result_type == NULL_TREE) { @@ -11711,8 +11712,9 @@ int qual = ENCODE_QUAL_ADDR_SPACE (as_common); result_type = build_pointer_type (build_qualified_type (void_type_node, qual)); - pedwarn (location, 0, - "comparison of distinct pointer types lacks a cast"); + pedwarn (location, OPT_Wcompare_distinct_pointer_types, + "comparison of distinct pointer type
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
Hi, On 20/01/2018 02:59, Paolo Carlini wrote: Hi again, On 19/01/2018 23:55, Paolo Carlini wrote: ...Therefore it seems to me that a way to more cleanly solve the bug would be moving something like || !DECL_NONTRIVIALLY_INITIALIZED_P to the above check in check_for_uninitialized_const_var, and therefore remove completely the uninitialized case from potential_constant_expression_1, ... Of course this doesn't work. check_for_uninitialized_const would really need to know that the VAR_DECL appears in a statement expression which is initializing a constexpr variable, nothing to do with DECL_NONTRIVIALLY_INITIALIZED_P. I'll give the issue more thought over the weekend, but removing completely the check from potential_constant_expression_1 seems very tough to me, assuming of course we really care about diagnosing the uninitialized inner in stmtexpr20.C, which is my invention and isn't in the testsuite yet ;) Thus the below, carefully tested over the weekend, would be a change completely removing the problematic error_at call, plus some testcases checking that we would still do the right thing in a few cases (bug submitter added constexpr-83921-2.C). The updated stmtexpr20.C shows that we would reject anyway a statement expression using an uninitialized inner, simply because the whole initializer would still be flagged as non-constant. ICC does the same. Clang diagnoses the uninitialized inner but then actually accepts something using 'static const int inner = 1' whereas we don't anyway, we (and ICC) don't accept anything similar to stmtexpr20.C. So, I would say this kind of change may actually make sense... Thanks, Paolo. //. 
Index: cp/constexpr.c === --- cp/constexpr.c (revision 256939) +++ cp/constexpr.c (working copy) @@ -5707,13 +5707,6 @@ potential_constant_expression_1 (tree t, bool want "%<static%> in %<constexpr%> context", tmp); return false; } - else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp)) - { - if (flags & tf_error) - error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized " - "variable %qD in %<constexpr%> context", tmp); - return false; - } } return RECUR (tmp, want_rval); Index: testsuite/g++.dg/cpp1y/constexpr-83921-1.C === --- testsuite/g++.dg/cpp1y/constexpr-83921-1.C (nonexistent) +++ testsuite/g++.dg/cpp1y/constexpr-83921-1.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/83921 +// { dg-do compile { target c++14 } } + +struct Foo { }; +constexpr void test() { Foo f; } Index: testsuite/g++.dg/cpp1y/constexpr-83921-2.C === --- testsuite/g++.dg/cpp1y/constexpr-83921-2.C (nonexistent) +++ testsuite/g++.dg/cpp1y/constexpr-83921-2.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/83921 +// { dg-do compile { target c++14 } } + +struct Foo { Foo() = default; }; +constexpr void test() { Foo f; } Index: testsuite/g++.dg/cpp1y/constexpr-83921-3.C === --- testsuite/g++.dg/cpp1y/constexpr-83921-3.C (nonexistent) +++ testsuite/g++.dg/cpp1y/constexpr-83921-3.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/83921 +// { dg-do compile { target c++14 } } + +struct Foo { int m; }; +constexpr void test() { Foo f; } // { dg-error "uninitialized" } Index: testsuite/g++.dg/ext/stmtexpr20.C === --- testsuite/g++.dg/ext/stmtexpr20.C (nonexistent) +++ testsuite/g++.dg/ext/stmtexpr20.C (working copy) @@ -0,0 +1,13 @@ +// PR c++/83921 +// { dg-options "" } +// { dg-do compile { target c++11 } } + +struct test { const int *addr; }; + +const test* setup() +{ + static constexpr test atest = +{ ({ int inner; &inner; }) }; // { dg-error "not a constant" } + + return &atest; +}
[PATCH][arm] Make gcc.target/arm/copysign_softfloat_1.c more robust
Hi all, This test has needlessly restrictive requirements. It tries to force a soft-float target and tries to run. This makes it unsupportable for any non-soft-float variant. In fact, the test can be a run-time test for any target, and only the scan-assembler tests are specific to -mfloat-abi=soft. So this patch makes the test always runnable and makes the scan-assembler checks predicated on the new arm_softfloat effective target check. Committing to trunk. Thanks, Kyrill 2018-01-22 Kyrylo Tkachov * doc/sourcebuild.texi (arm_softfloat): Document. 2018-01-22 Kyrylo Tkachov * lib/target-supports.exp (check_effective_target_arm_softfloat): New procedure. * gcc.target/arm/copysign_softfloat_1.c: Allow running everywhere. Adjust scan-assembler checks for soft-float. diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index f0233c9cca4ca32248db407bf00fb4c97eb740b1..69fbf6ac76842ae90fcf4f07638755c18ecc23c9 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1650,6 +1650,10 @@ ARM target adheres to the VFP and Advanced SIMD Register Arguments variant of the ABI for the ARM Architecture (as selected with @code{-mfloat-abi=hard}). +@item arm_softfloat +ARM target uses the soft-float ABI with no floating-point instructions +used whatsoever (as selected with @code{-mfloat-abi=soft}). + @item arm_hard_vfp_ok ARM target supports @code{-mfpu=vfp -mfloat-abi=hard}. Some multilibs may be incompatible with these options. 
diff --git a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c index d79d014e27cf445cd741504c6b256a3a51ace6cd..fdbeeadc01e1c9b9a7810a8ff8b23c58f6c429a5 100644 --- a/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c +++ b/gcc/testsuite/gcc.target/arm/copysign_softfloat_1.c @@ -1,8 +1,8 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_thumb2_ok } */ -/* { dg-require-effective-target arm_soft_ok } */ -/* { dg-skip-if "skip override" { *-*-* } { "-mfloat-abi=softfp" "-mfloat-abi=hard" } { "" } } */ -/* { dg-options "-O2 -mfloat-abi=soft --save-temps" } */ +/* { dg-add-options arm_arch_v6t2 } */ +/* { dg-additional-options "-O2 --save-temps" } */ + extern void abort (void); #define N 16 @@ -42,8 +42,8 @@ main (int argc, char **argv) { int index = 0; -/* { dg-final { scan-assembler-times "bfi" 2 } } */ -/* { dg-final { scan-assembler-times "lsr" 1 } } */ +/* { dg-final { scan-assembler-times "bfi" 2 { target arm_softfloat } } } */ +/* { dg-final { scan-assembler-times "lsr" 1 { target arm_softfloat } } } */ for (index; index < N; index++) { if (__builtin_copysignf (a_f[index], b_f[index]) != c_f[index]) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 4095f6386b19d601ffd345922b14e015565a2462..50ec1adbf8158b5f29130d3919fcd86de8be5248 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4879,6 +4879,19 @@ proc check_effective_target_arm_hf_eabi { } { }] } +# Return 1 if this is an ARM target that uses the soft float ABI +# with no floating-point instructions at all (e.g. -mfloat-abi=soft). + +proc check_effective_target_arm_softfloat { } { +return [check_no_compiler_messages arm_softfloat object { + #if !defined(__SOFTFP__) + #error not soft-float EABI + #else + int dummy; + #endif +}] +} + # Return 1 if this is an ARM target supporting -mcpu=iwmmxt. # Some multilibs may be incompatible with this option.
Re: [PATCH][ARM] Fix test fail with conflicting -mfloat-abi
Hi Kyrill On 19/01/18 18:00, Kyrill Tkachov wrote: On 16/01/18 10:31, Sudakshina Das wrote: Hi Christophe On 12/01/18 18:32, Christophe Lyon wrote: Le 12 janv. 2018 15:26, "Sudakshina Das" a écrit : Hi This patch fixes my earlier test case that fails for arm-none-eabi with explicit user option for -mfloat-abi which conflict with the test case options. I have added a guard to skip the test on those cases. @Christophe: Sorry about this. I think this should fix the test case. Can you please confirm if this works for you? Yes it does thanks Thanks for checking that. I have added one more directive for armv5t as well to avoid any conflicts for mcpu options. I agree with what Sudi said in https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01422.html I'd rather keep the test in the generic torture suite as long as we get the directives right. So this is ok for trunk (as the changes are arm-specific directives) with one change below: Thanks, Kyrill Sudi Thanks Sudi gcc/testsuite/ChangeLog 2018-01-12 Sudakshina Das * gcc.c-torture/compile/pr82096.c: Add dg-skip-if directive. diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c index 9fed28c..35551f5 100644 --- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -1,3 +1,5 @@ +/* { dg-require-effective-target arm_arch_v5t_ok } */ Please also guard this on { target arm*-*-* } That way this test will be run on other targets as well so that they can benefit from it. +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ Thanks committed with the change as r256941 Sudi
RE: [patch][x86] -march=icelake
Hi, I tried omp_clause_mask and it looks ok. But it lacks a check for whether any bit is set or none. With the addition of it (as proposed, or in some other way) it should work. What do you think about this approach (patch attached)? Thanks, Julia > -----Original Message----- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Tuesday, December 19, 2017 2:50 PM > To: Koval, Julia > Cc: Richard Biener ; Uros Bizjak > ; GCC Patches ; Kirill Yukhin > > Subject: Re: [patch][x86] -march=icelake > > On Tue, Dec 19, 2017 at 12:34:03PM +, Koval, Julia wrote: > > >> Maybe [] operator could be used instead of a dynamic handling here. > > I had another solution in mind, with enums, which then addresses elements > using its index, please see the attached patch. > > You can also have a look at the omp_clause_mask class in c-common.h, that is > also something that has been added to handle the case where we run out of > 64-bits for a particular bitmask, wanted to keep using pretty much the same > interfaces and be able to handle it fast. Using 2 enums for the two halves > and treating it accordingly is also an option. > > I agree sbitmap is too heavy for this. > > Jakub Attachment: 0001-test.patch
Re: [patch][x86] -march=icelake
On Mon, Jan 22, 2018 at 11:30:10AM +, Koval, Julia wrote: > Hi, I tried omp_clause_mask and it looks ok. But it lacks a check for whether > any bit is set or none. With the addition of it (as proposed, or in some other way) > it should work. What do you think about this approach (patch attached)? Well, I certainly didn't mean to use omp_clause_mask for something completely unrelated to OpenMP, the reason I've mentioned it is that it is a class that deals with a similar problem. So, if you want to use the same class, it would need to be moved to some generic header, renamed and then c-common.h would typedef that_class omp_clause_mask. I'm surprised you need any, doesn't (mask & (...)) != 0 already handle that? Jakub
[PATCH] Fix PR83963
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2018-01-22 Richard Biener PR tree-optimization/83963 * graphite-scop-detection.c (scop_detection::get_sese): Delay including the loop exit block. (scop_detection::merge_sese): Likewise. (scop_detection::add_scop): Do it here instead. * gcc.dg/graphite/pr83963.c: New testcase. Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 256939) +++ gcc/graphite-scop-detection.c (working copy) @@ -420,13 +420,6 @@ scop_detection::get_sese (loop_p loop) edge scop_end = single_exit (loop); if (!scop_end || (scop_end->flags & (EDGE_COMPLEX|EDGE_FAKE))) return invalid_sese; - /* Include the BB with the loop-closed SSA PHI nodes. - canonicalize_loop_closed_ssa makes sure that is in proper shape. */ - if (! single_pred_p (scop_end->dest) - || ! single_succ_p (scop_end->dest) - || ! sese_trivially_empty_bb_p (scop_end->dest)) -gcc_unreachable (); - scop_end = single_succ_edge (scop_end->dest); return sese_l (scop_begin, scop_end); } @@ -507,17 +500,6 @@ scop_detection::merge_sese (sese_l first } while (! bitmap_empty_p (worklist)); - /* Include the BB with the loop-closed SSA PHI nodes. - canonicalize_loop_closed_ssa makes sure that is in proper shape. */ - if (exit->dest != EXIT_BLOCK_PTR_FOR_FN (cfun) - && loop_exit_edge_p (exit->src->loop_father, exit)) -{ - gcc_assert (single_pred_p (exit->dest) - && single_succ_p (exit->dest) - && sese_trivially_empty_bb_p (exit->dest)); - exit = single_succ_edge (exit->dest); -} - sese_l combined (entry, exit); DEBUG_PRINT (dp << "[merged-sese] s1: "; print_sese (dump_file, combined)); @@ -608,6 +590,18 @@ scop_detection::add_scop (sese_l s) { gcc_assert (s); + /* Include the BB with the loop-closed SSA PHI nodes, we need this + block in the region for code-generating out-of-SSA copies. + canonicalize_loop_closed_ssa makes sure that is in proper shape. 
*/ + if (s.exit->dest != EXIT_BLOCK_PTR_FOR_FN (cfun) + && loop_exit_edge_p (s.exit->src->loop_father, s.exit)) +{ + gcc_assert (single_pred_p (s.exit->dest) + && single_succ_p (s.exit->dest) + && sese_trivially_empty_bb_p (s.exit->dest)); + s.exit = single_succ_edge (s.exit->dest); +} + /* Do not add scops with only one loop. */ if (region_has_one_loop (s)) { Index: gcc/testsuite/gcc.dg/graphite/pr83963.c === --- gcc/testsuite/gcc.dg/graphite/pr83963.c (nonexistent) +++ gcc/testsuite/gcc.dg/graphite/pr83963.c (working copy) @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O -floop-nest-optimize -fno-tree-loop-im" } */ + +int mg, et; + +void +s5 (int is) +{ + if (is == 0) +{ +g6: + ++is; +} + + while (mg < 1) +{ + while (et < 1) + { + if (is == 0) + return; + + ++et; + } + + while (mg < 1) + ++mg; +} + + goto g6; +} +
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On 19 January 2018 at 22:45, Jakub Jelinek wrote: > Hi! > > This PR is about a certain test FAILing on arm, because it scans for > "Invalid sum ..." message in the *.ira dump, but recent GCC versions have > that message in there; not introduced by IRA though, but all the way from > expansion. We are expanding: >[local count: 1073741825]: > _1 = *x_3(D); > if (_1 u>= 0.0) > goto ; [99.95%] > else > goto ; [0.05%] > >[local count: 536864]: > sqrtf (_1); > and do_compare_rtx_and_jump decides to split the single _1 u>= 0.0 > comparison into two. The expectation is that the basic block counts stay > the same, so if bb 3's count is 0.05% times bb 2's count, the probabilities > need to be computed on both jumps so that this is preserved. > We want to expand essentially to: >[local count: 1073741825]: > ... > if (cond1) > goto ; [first_prob] > else > goto ; [first_prob.invert ()] > > : > if (cond2) > goto ; [prob] > else > goto ; [prob.invert ()] > >[local count: 536864]: > sqrtf (_1); > and compute first_prob and prob from the original prob such that the bb > counts match. The code used to hardcode first_prob to 1% or 99% depending > on condition, and leave the second probability the original one. > > That IMHO can't work and the Invalid sum message verifies that. If we want > the first jump to hit 99times more often than the second one or vice versa, > I believe first_prob should be .99 * prob or .01 * prob respectively, and > the second probability then should be (0.01 * prob) / (1 - 0.99 * prob) > or (0.99 * prob) / (1 - 0.01 * prob) respectively. > > With this change the Invalid sum message disappears. > predict-8.c testcase was apparently trying to match the hardcoded 0.99 > probability rather than .99 * 65% emitted now. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > Hi, Did you notice any performance impact? 
Thanks, Christophe > If this patch is right, I think do_jump_by_parts* are buggy similarly too, > there we emit prob or prob.invert () for all the N jumps we emit instead of > the original single conditional jump with probability prob. I think we'd > need to decide what relative probabilities we want to use for the different > jumps, e.g. all have even relative likelihood and then adjust the > probability of each branch and from what we compute the following > probabilities similarly to this patch. > > 2018-01-19 Jakub Jelinek > > PR tree-optimization/83081 > * dojump.c (do_compare_rtx_and_jump): Fix adjustment of probabilities > when splitting a single conditional jump into 2. > > * gcc.dg/predict-8.c: Adjust expected probability. > > --- gcc/dojump.c.jj 2018-01-03 10:19:55.0 +0100 > +++ gcc/dojump.c 2018-01-19 17:07:25.238927314 +0100 > @@ -1122,11 +1122,30 @@ do_compare_rtx_and_jump (rtx op0, rtx op > { > profile_probability first_prob = prob; > if (first_code == UNORDERED) > - first_prob = profile_probability::guessed_always > ().apply_scale > -(1, 100); > + { > + /* We want to split: > +if (x) goto t; // prob; > +into > +if (a) goto t; // first_prob; > +if (b) goto t; // prob; > +such that the overall probability of jumping to t > +remains the same, but the first jump jumps > +much less often than the second jump. */ > + first_prob = prob.guessed ().apply_scale (1, 100); > + prob = (prob.guessed () - first_prob) / first_prob.invert > (); > + } > else if (first_code == ORDERED) > - first_prob = profile_probability::guessed_always > ().apply_scale > -(99, 100); > + { > + /* See above, except the first jump should jump much more > +often than the second one. 
*/ > + first_prob = prob.guessed ().apply_scale (99, 100); > + prob = (prob.guessed () - first_prob) / first_prob.invert > (); > + } > + else > + { > + first_prob = prob.guessed ().apply_scale (50, 100); > + prob = first_prob; > + } > if (and_them) > { > rtx_code_label *dest_label; > --- gcc/testsuite/gcc.dg/predict-8.c.jj 2017-11-21 23:17:43.149093787 +0100 > +++ gcc/testsuite/gcc.dg/predict-8.c2018-01-19 22:24:09.949249810 +0100 > @@ -8,4 +8,4 @@ int foo(float a, float b) { > return 2; > } > > -/* { dg-final { scan-rtl-dump-times "99.0. .guessed" 2 "expand"} } */ > +/* { dg-final { scan-rtl-dump-times "64.[34]. .guessed" 2 "expand"} } */ > > Jakub
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 01:11:34PM +0100, Christophe Lyon wrote: > On 19 January 2018 at 22:45, Jakub Jelinek wrote: > > That IMHO can't work and the Invalid sum message verifies that. If we want > > the first jump to hit 99times more often than the second one or vice versa, > > I believe first_prob should be .99 * prob or .01 * prob respectively, and > > the second probability then should be (0.01 * prob) / (1 - 0.99 * prob) > > or (0.99 * prob) / (1 - 0.01 * prob) respectively. > > > > With this change the Invalid sum message disappears. > > predict-8.c testcase was apparently trying to match the hardcoded 0.99 > > probability rather than .99 * 65% emitted now. > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > Hi, > > Did you notice any performance impact? Haven't been looking for that, this is a bugfix, the current code is broken. It changes (fixes) probabilities of some edges, sure, so it could affect something both ways. Jakub
Re: [C++ PATCH] PR c++/83895
On 01/21/2018 09:21 AM, Ville Voutilainen wrote: Finishing testing the attached, OK for trunk? ok (I had thought typedefs would be covered by the decl_context == TYPENAME case, but I see that is not true.) nathan -- Nathan Sidwell
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On 2018-01-19 22:45, Jakub Jelinek wrote: Hi! This PR is about a certain test FAILing on arm, because it scans for "Invalid sum ..." message in the *.ira dump, but recent GCC versions have that message in there; not introduced by IRA though, but all the way from expansion. We are expanding: [local count: 1073741825]: _1 = *x_3(D); if (_1 u>= 0.0) goto ; [99.95%] else goto ; [0.05%] [local count: 536864]: sqrtf (_1); and do_compare_rtx_and_jump decides to split the single _1 u>= 0.0 comparison into two. The expectation is that the basic block counts stay the same, so if bb 3's count is 0.05% times bb 2's count, the probabilities need to be computed on both jumps so that this is preserved. We want to expand essentially to: [local count: 1073741825]: ... if (cond1) goto ; [first_prob] else goto ; [first_prob.invert ()] : if (cond2) goto ; [prob] else goto ; [prob.invert ()] [local count: 536864]: sqrtf (_1); and compute first_prob and prob from the original prob such that the bb counts match. The code used to hardcode first_prob to 1% or 99% depending on condition, and leave the second probability the original one. OK, so if we end up with first_prob 1% we increase the probability to dispatch to bb 4 by a little bit, which is acceptable noise, but when first_prob is 99% then we obviously get it wrong. That IMHO can't work and the Invalid sum message verifies that. If we want the first jump to hit 99 times more often than the second one or vice versa, I believe first_prob should be .99 * prob or .01 * prob respectively, and the second probability then should be (0.01 * prob) / (1 - 0.99 * prob) or (0.99 * prob) / (1 - 0.01 * prob) respectively. In new code bb4 is reached by first_prob + (1-first_prob)*second_prob which should be equal to prob. 
Say your chosen constant to obtain first_prob=c*prob is c=0.99, and you want to know c2 to obtain second_prob=c2*prob. You want the combined probability of the branch, (c*prob)+(1-c*prob)*(prob*c2), to be the same prob. This should give c2=(1-c)/(1-c*prob), so (c*prob)+(1-c*prob)*prob*(1-c)/(1-c*prob) cancels out to c*prob+(1-c)*prob, that is profile_probability cprob = profile_probability::guessed_always ().apply_scale (1, 100); first_prob = prob * cprob; prob = cprob.inverse () / (first_prob).inverse (); unless I missed something. No need to do .guessed conversion on prob. It will be propagated in as cprob which is already constructed as a guess. I would say that there are multiple cases where we want to redistribute probabilities so perhaps factor it out into a helper. Perhaps something like void profile_probability::split (profile_probability cprob, profile_probability *first_prob, profile_probability *second_prob) I would bet we already have something like that but don't see it. Other cases seem to choose c as 1/2, which simplifies the formula a bit, but it is also easy to misunderstand, such as in: profile_probability false_prob = prob.invert (); profile_probability op0_false_prob = false_prob.apply_scale (1, 2); profile_probability op1_false_prob = false_prob.apply_scale (1, 2) / op0_false_prob.invert (); So I would convert those into profile_probability::split. OK with this change. Thanks for looking into this! Honza With this change the Invalid sum message disappears. predict-8.c testcase was apparently trying to match the hardcoded 0.99 probability rather than .99 * 65% emitted now. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? If this patch is right, I think do_jump_by_parts* are buggy similarly too, there we emit prob or prob.invert () for all the N jumps we emit instead of the original single conditional jump with probability prob. I think we'd need to decide what relative probabilities we want to use for the different jumps, e.g. 
all have even relative likelihood and then adjust the probability of each branch and from what we compute the following probabilities similarly to this patch. 2018-01-19 Jakub Jelinek PR tree-optimization/83081 * dojump.c (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/dojump.c.jj 2018-01-03 10:19:55.0 +0100 +++ gcc/dojump.c 2018-01-19 17:07:25.238927314 +0100 @@ -1122,11 +1122,30 @@ do_compare_rtx_and_jump (rtx op0, rtx op { profile_probability first_prob = prob; if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(1, 100); + { + /* We want to split: +if (x) goto t; // prob; +into +if (a) go
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 01:52:11PM +0100, Jan Hubicka wrote: > In new code bb4 is reached by first_prob + (1-first_prob)*second_prob > which should be equal to prob. > > Say your chosen constant to obtain first_prob=c*prob is c=0.99 and you > want to know c2 to obtain second_prob=c2*prob > > You want the combined probability of the branch (c*prob)+(1-c*prob)*(prob*c2) to > be the same prob > > This should give c2=(1-c)/(1-c*prob) so > (c*prob)+(1-c*prob)*prob*(1-c)/(1-c*prob) cancels out to c*prob+(1-c)*prob > > that is > profile_probability cprob = > profile_probability::guessed_always ().apply_scale (1, 100); > first_prob = prob * cprob; Yes, if we want it to be done with arbitrary cprob, we can do it the above way and move the latter into the helper. > prob = cprob.inverse () / (first_prob).inverse (); But this looks incorrect, because that computes only the above c2 in prob, not second_prob. It needs to be prob = cprob.invert () * prob / first_prob.invert (); or prob *= cprob.invert () / first_prob.invert (); or that prob = (prob - first_prob) / first_prob.invert (); I had in the patch. The last one looked to me like cheapest to compute. > void profile_probability::split (profile_probability cprob, > profile_probability *first_prob, profile_probability *second_prob) This doesn't take prob as an argument, or should it be a method and take prob as *this? Couldn't it simply return first_prob and adjust itself, so profile_probability split (profile_probability cprob) { profile_probability ret = *this * cprob; *this = (*this - ret) / ret.invert (); // or *this = *this * cprob.invert () / ret.invert (); return ret; } ? > > I would bet we already have something like that but don't see it. 
> > Other cases seem to choose c as 1/2, which simplifies the formula a bit, but it > is also easy to misunderstand, such as in: > profile_probability false_prob = prob.invert (); > profile_probability op0_false_prob = false_prob.apply_scale (1, > 2); > profile_probability op1_false_prob = false_prob.apply_scale (1, > 2) > / op0_false_prob.invert (); Or profile_probability op1_false_prob = op0_false_prob / op0_false_prob.invert (); Jakub
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
Dne 2018-01-22 14:29, Jakub Jelinek napsal: On Mon, Jan 22, 2018 at 01:52:11PM +0100, Jan Hubicka wrote: In new code bb4 is reached by first_prob + (1-first_prob)*second_prob which should be equal to prob. Say your chosen constant to obtain first_prob=c*prob is c=0.99 and you want to know c2 to obtain second_prob=c2*prob You want the combined probability of branch (c*prob)+(1-c*prob)*(prob*c2) to be the same prob This should give c2=(1-c)/(1-c*prob) so (c*prob)+(1-c*prob)*prob*(1-c)/(1-c*prob) cancels out to c*prob+(1-c)*prob that is profile_probability cprob = profile_probability::guessed_always ().apply_scale (1, 100); first_prob = prob * cprob; Yes, if we want it to be done with arbitrary cprob, we can do it the above way and move the latter into the helper. prob = cprob.inverse () / (first_prob).inverse (); But this looks incorrect, because that computes only the above c2 in prob, not second_prob. It needs to be prob = cprob.invert () * prob / first_prob.invert (); or prob *= cprob.invert () / first_prob.invert (); or prob = (prob - first_prob) / first_prob.invert (); which I had in the patch. The last one looked to me like the cheapest to compute. Aha, I see. Ok, that makes sense to me now :) void profile_probability::split (profile_probability cprob, profile_probability *first_prob, profile_probability *second_prob) This doesn't take prob as an argument, or should it be a method and take prob as *this? Couldn't it simply return first_prob and adjust itself, so profile_probability split (profile_probability cprob) { profile_probability ret = *this * cprob; *this = (*this - ret) / ret.invert (); // or *this = *this * cprob.invert () / ret.invert (); return ret; } ? Yep, that is fine. I would bet we already have something like that but don't see it. 
Other cases seem to choose c as 1/2, which simplifies the formula a bit, but it is also easy to misunderstand, such as in: profile_probability false_prob = prob.invert (); profile_probability op0_false_prob = false_prob.apply_scale (1, 2); profile_probability op1_false_prob = false_prob.apply_scale (1, 2) / op0_false_prob.invert (); Or profile_probability op1_false_prob = op0_false_prob / op0_false_prob.invert (); I would use the splitting function here so once someone decides to change the weights the code will not produce a corrupt profile. Thanks, Honza Jakub
[PATCH 2/2] Introduce prefetch-dynamic-strides option.
The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides. Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance. For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor. Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures. I've regression-tested and bootstrapped it on aarch64-linux. No problems found. Ok? 2018-01-22 Luis Machado Introduce option to control whether the software prefetch pass should issue prefetch hints for non-constant strides. gcc/ * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) <prefetch_dynamic_strides>: New const unsigned int field. * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include prefetch_dynamic_strides. (exynosm1_prefetch_tune): Likewise. (thunderxt88_prefetch_tune): Likewise. (thunderx_prefetch_tune): Likewise. (thunderx2t99_prefetch_tune): Likewise. (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. (aarch64_override_options_internal): Update to set PARAM_PREFETCH_DYNAMIC_STRIDES. * doc/invoke.texi (prefetch-dynamic-strides): Document new option. * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for prefetch-dynamic-strides setting. 
--- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64.c| 11 +++ gcc/doc/invoke.texi | 10 ++ gcc/params.def | 9 + gcc/params.h| 2 ++ gcc/tree-ssa-loop-prefetch.c| 10 ++ 6 files changed, 45 insertions(+) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8736bd9..22bd9ae 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* Whether software prefetch hints should be issued for non-constant + strides. */ + const unsigned int prefetch_dynamic_strides; /* The minimum constant stride beyond which we should use prefetch hints for. */ const int minimum_stride; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 0ed9f14..713b230 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = -1, /* l1_cache_size */ -1, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = -1, /* l1_cache_size */ 64, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 1024,/* l2_cache_size */ + 0, /* prefetch_dynamic_strides */ 2048,/* minimum_stride */ 3/* default_opt_level */ }; @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ 16*1024, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ 3/* default_opt_level */ }; @@ -587,6 +591,7 @@ static const cpu_prefetch_tune 
thunderx_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 256, /* l2_cache_size
[PATCH 0/2] Add a couple new options to control loop prefetch pass
The following couple patches add options to better control the loop prefetch pass. With the current settings the pass tends to be very aggressive, issuing a lot of prefetch hints. Most of these don't translate to better performance. Some of these issued hints may even cause more cache evictions. Luis Machado (2): Introduce prefetch-minimum stride option Introduce prefetch-dynamic-strides option. gcc/config/aarch64/aarch64-protos.h | 6 ++ gcc/config/aarch64/aarch64.c| 24 +++- gcc/doc/invoke.texi | 25 + gcc/params.def | 18 ++ gcc/params.h| 4 gcc/tree-ssa-loop-prefetch.c| 26 ++ 6 files changed, 102 insertions(+), 1 deletion(-) -- 2.7.4
[PATCH 1/2] Introduce prefetch-minimum stride option
This patch adds a new option to control the minimum stride for a memory reference above which the loop prefetch pass may issue software prefetch hints. There are two motivations: * Make the pass less aggressive, only issuing prefetch hints for bigger strides that are more likely to benefit from prefetching. I've noticed a case in cpu2017 where we were issuing thousands of hints, for example. * For processors that have a hardware prefetcher, like Falkor, it allows the loop prefetch pass to defer prefetching of smaller (less than the threshold) strides to the hardware prefetcher instead. This prevents conflicts between the software prefetcher and the hardware prefetcher. I've noticed a considerable reduction in the number of prefetch hints and slightly positive performance numbers. This aligns GCC and LLVM in terms of prefetch behavior for Falkor. The default settings should guarantee no changes for existing targets. Those are free to tweak the settings as necessary. No regressions in the testsuite and bootstrapped ok on aarch64-linux. Ok? 2018-01-22 Luis Machado Introduce option to limit software prefetching to known constant strides above a specific threshold with the goal of preventing conflicts with a hardware prefetcher. gcc/ * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) <minimum_stride>: New const int field. * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include minimum_stride field. (exynosm1_prefetch_tune): Likewise. (thunderxt88_prefetch_tune): Likewise. (thunderx_prefetch_tune): Likewise. (thunderx2t99_prefetch_tune): Likewise. (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048. (aarch64_override_options_internal): Update to set PARAM_PREFETCH_MINIMUM_STRIDE. * doc/invoke.texi (prefetch-minimum-stride): Document new option. * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New. * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define. 
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if stride is constant and is below the minimum stride threshold. --- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64.c| 13 - gcc/doc/invoke.texi | 15 +++ gcc/params.def | 9 + gcc/params.h| 2 ++ gcc/tree-ssa-loop-prefetch.c| 16 6 files changed, 57 insertions(+), 1 deletion(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index ef1b0bc..8736bd9 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* The minimum constant stride beyond which we should use prefetch + hints for. */ + const int minimum_stride; const int default_opt_level; }; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 174310c..0ed9f14 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = -1, /* l1_cache_size */ -1, /* l1_cache_line_size */ -1, /* l2_cache_size */ + -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = -1, /* l1_cache_size */ 64, /* l1_cache_line_size */ -1, /* l2_cache_size */ + -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 1024,/* l2_cache_size */ - -1 /* default_opt_level */ + 2048,/* minimum_stride */ + 3/* default_opt_level */ }; static const cpu_prefetch_tune thunderxt88_prefetch_tune = @@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ 16*1024, /* l2_cache_size */ + -1, /* minimum_stride */ 3/* default_opt_level */ }; @@ -583,6 +587,7 @@ static const cpu_prefetch_tune 
thunderx_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ -1, /* l2_cache_size */ + -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_
Re: [PATCH v3] Ability to remap file names in __FILE__, etc (PR other/70268)
Boris Kolpackov: > Ximin Luo writes: > >> -I to an absolute path is not that common for system / distro-built >> stuff. > > Ok, thanks for clarifying. > > >> In the cases that it occurs, indeed it could and should be fixed >> by the package buildsystem, e.g. by stripping a prefix when they >> add -I flags to CFLAGS. But that's a separate issue from what >> we're talking about here. > > I believe it is the same issue: any package that blindly embeds > information about how it was built into the result of the build > does not care about reproducible builds. > I don't think that's a realistic position to take, and it seems like a fairly arbitrary assertion made merely to "brush this issue under the carpet". I think this because of the "implicit contract" of how CFLAGS is generally used: CFLAGS (and other flags) is a way to pass information from multiple layers of programs down to GCC. It is necessary to "blindly embed" it if you want to save it *at all*, otherwise you are forced to violate the separation-of-concerns between layers in your buildsystem stack. - Program A sets CFLAGS += --flags_A, then calls program B - Program B sets CFLAGS += --flags_B, then calls program C - Program C sets CFLAGS += --flags_C, then calls Make - Make reads CFLAGS then calls $(CC) $(CFLAGS) [etc] In general, you are not supposed to edit what was already in CFLAGS, you are supposed to append to it. That is why it's good practice to add "--no" options for various command-line flags in gcc (and other tools). So each program X has its own flags_X that it is "responsible for". If all programs in the stack make sure their own flags_X contain reproducible values, then the whole build will be reproducible even when recording CFLAGS. However, this breaks in the case of -f*-prefix-map. If Program A sets this flag in flags_A, then this interferes with Program C, even though flags_C was fully reproducible. 
Note that programs B and C don't even know that they are being used with GCC, and so really have no business filtering -f*-prefix-map out of the CFLAGS that program A already set. My proposal would allow all the flags_{A,B,C} to be reproducible, and avoid different programs having to embed layer-violating logic into themselves. > If others do agree that this should be "fixed" in GCC, then I > would suggest that you add a separate option for the environment > variable case (e.g., -ffile-prefix-map-env) and sidestep the > whole "how to portably distinguish a path from an environment > variable" issue. > That would also be fine for me, sure. X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Re: [PATCH 4/5] Remove predictors that are unreliable.
On 01/19/2018 12:57 PM, Martin Liška wrote: > Yes, there's a huge difference between CPU 2006 and 2017. The former has 63% > w/ dominant edges, > and the latter only 11%. It's caused by these 2 benchmarks with a high > coverage: Hi. I'm sending details about the 2 edges that influence the statistics significantly: > 500.perlbench_r: regexec.c.065i.profile: > negative return heuristics of edge 1368->1370: 2.0% exec 2477714850 hit > 2429863555 (98.1%) 2477714850: 3484:S_regtry(pTHX_ regmatch_info *reginfo, char **startposp) -: 3485:{ 2477714850: 3486:CHECKPOINT lastcp; 2477714850: 3487:REGEXP *const rx = reginfo->prog; 2477714850: 3488:regexp *const prog = ReANY(rx); 2477714850: 3489:SSize_t result; [snip] 2477714850: 8046:assert(!result || locinput - reginfo->strbeg >= 0); 2477714850: 8047:return result ? locinput - reginfo->strbeg : -1; -: 8048:} As seen, it returns -1 if a regex is not found, which in the case of perlbench is a very likely branch. > > and 523.xalancbmk_r: > build/build_peak_gcc7-m64./NameDatatypeValidator.cpp.065i.profile: > negative return heuristics of edge 3->4: 2.0% exec 1221735072 hit 1221522453 > (100.0%) 1221735072: 74:int NameDatatypeValidator::compare(const XMLCh* const lValue -: 75: , const XMLCh* const rValue -: 76: , MemoryManager* const) -: 77:{ 1221735072: 78:return ( XMLString::equals(lValue, rValue)? 0 : -1); -: 79:} -: 80: IP profile dump file: xercesc_2_7::NameDatatypeValidator::compare (struct NameDatatypeValidator * const this, const XMLCh * const lValue, const XMLCh * const rValue, struct MemoryManager * const D.17157) { bool _1; int iftmp.0_2; <bb 2> [count: 1221735072]: # DEBUG BEGIN_STMT _1 = xercesc_2_7::XMLString::equals (lValue_4(D), rValue_5(D)); if (_1 != 0) goto <bb 4>; [0.02%] else goto <bb 3>; [99.98%] <bb 3> [count: 1221522453]: <bb 4> [count: 1221735072]: # iftmp.0_2 = PHI <0(2), -1(3)> return iftmp.0_2; } Likewise, XML values are more commonly non-equal. Ok, so may I also mark negative return with PROB_EVEN to track it? 
Thanks, Martin > > Ideas what to do with the predictor for GCC 8 release? > Martin
Re: C++ PATCH to fix bogus error with constexpr and empty class (PR c++/81933)
On Thu, Jan 18, 2018 at 5:39 PM, Marek Polacek wrote: > The problem in this PR is that we get > error: constexpr call flows off the end of the function > for a valid testcase, but not in C++17 mode. That's because in C++17 we > execute: > > 4321 if (cxx_dialect >= cxx17 && !BINFO_VIRTUAL_P (binfo)) > 4322 { > 4323 tree decl = build_base_field_1 (t, basetype, next_field); > 4324 DECL_FIELD_OFFSET (decl) = BINFO_OFFSET (binfo); > 4325 DECL_FIELD_BIT_OFFSET (decl) = bitsize_zero_node; > 4326 SET_DECL_OFFSET_ALIGN (decl, BITS_PER_UNIT); > 4327 } > > which puts some additional fields into some classes. This makes a difference > for split_nonconstant_init: > > 751 if (split_nonconstant_init_1 (dest, init)) > 752 init = NULL_TREE; > > where split_nonconstant_init_1 is true so that means that the DECL_INITIAL > will > be null. Why is split_nonconstant_init_1 true? Because it ends up calling > complete_ctor_at_level_p with > > 6114 return count_type_elements (type, true) == num_elts; > > but the count_type_elements result differs between c++14/c++17 precisely > because of the fields added in build_base_field. But we should only add those > fields when initializing aggregates with bases, which is not what's happening > in this testcase. > > With DECL_INITIAL being null, we error here (result is NULL): > > 1702 result = *ctx->values->get (slot ? slot : res); > 1703 if (result == NULL_TREE && !*non_constant_p) > 1704 { > 1705 if (!ctx->quiet) > 1706 error ("%<constexpr%> call flows off the end " > 1707"of the function"); > 1708 *non_constant_p = true; > 1709 } > > I think the problem is that we're dealing with an empty class here, for which > we should just generate an empty ctor, as in a few other spots. This seems like a workaround rather than a fix; we should fix split_nonconstant_init to work properly rather than override it. Specifically, it seems odd to return true if num_elts is 0; we shouldn't override DECL_INITIAL if we didn't actually split anything out. Jason
RE: [patch][x86] -march=icelake
Yes, you are right, any() is not required. Here is the patch. Thanks, Julia > -Original Message- > From: Jakub Jelinek [mailto:ja...@redhat.com] > Sent: Monday, January 22, 2018 12:36 PM > To: Koval, Julia > Cc: Richard Biener ; Uros Bizjak > ; GCC Patches ; Kirill Yukhin > > Subject: Re: [patch][x86] -march=icelake > > On Mon, Jan 22, 2018 at 11:30:10AM +, Koval, Julia wrote: > > Hi, I tried omp_clause_mask and it looks ok. But it lacks a check if there > > is any bit set or none. With the addition of it (as proposed, or in some other > > way) it should work. What do you think about this approach (patch attached)? > > Well, I certainly didn't mean to use omp_clause_mask for something > completely unrelated to OpenMP, the reason I've mentioned it is that it is a > class that deals with a similar problem. > > So, if you want to use the same class, it would need to be moved to some > generic header, renamed and then c-common.h would typedef that_class > omp_clause_mask. > > I'm surprised you need any, doesn't (mask & (...)) != 0 already handle > that? > > Jakub 0001-test.patch
[PATCH][AArch64] PR79262: Adjust vector cost
PR79262 has been fixed for almost all AArch64 cpus, however the example is still vectorized in a few cases, resulting in lower performance. Increase the cost of vector-to-scalar moves so it is more similar to the other vector costs. As a result -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6 performance improves. OK for commit? ChangeLog: 2018-01-22 Wilco Dijkstra PR target/79262 * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost = 1, /* vec_int_stmt_cost */ 1, /* vec_fp_stmt_cost */ 2, /* vec_permute_cost */ - 1, /* vec_to_scalar_cost */ + 2, /* vec_to_scalar_cost */ 1, /* scalar_to_vec_cost */ 1, /* vec_align_load_cost */ 1, /* vec_unalign_load_cost */
[PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi This patch backports r256526 and its followup fix r256941 from trunk; it changes emit_store_flag_force () to fix the ICE. The original discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html Is this ok for gcc-7-branch? Testing: ran regression testing with a bootstrapped arm-none-linux-gnueabihf. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2018-01-22 Sudakshina Das Backport from mainline: 2018-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test. diff --git a/gcc/expmed.c b/gcc/expmed.c index e9f634a..30001ac 100644 --- a/gcc/expmed.c +++ b/gcc/expmed.c @@ -5886,6 +5886,18 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1, if (tem != 0) return tem; + /* If one operand is constant, make it the second one. Only do this + if the other operand is not constant as well. */ + + if (swap_commutative_operands_p (op0, op1)) +{ + std::swap (op0, op1); + code = swap_condition (code); +} + + if (mode == VOIDmode) +mode = GET_MODE (op0); + if (!target) target = gen_reg_rtx (word_mode); diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c new file mode 100644 index 000..d144b70 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -0,0 +1,11 @@ +/* { dg-require-effective-target arm_arch_v5t_ok { target arm*-*-* } } */ +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ +/* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ + +static long long AL[24]; + +int +check_ok (void) +{ + return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll)); +}
[PATCH] Fix followup for PR83963
GRAPHITE has little testing coverage so only SPEC uncovered a latent issue in scop_detection::harmful_loop_in_region. Gets us some more testcases. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2018-01-22 Richard Biener PR tree-optimization/83963 * graphite-scop-detection.c (scop_detection::harmful_loop_in_region): Properly terminate dominator walk when crossing the exit edge not when visiting its source block. * gfortran.dg/graphite/pr83963.f: New testcase. * gcc.dg/graphite/pr83963-2.c: Likewise. Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 256947) +++ gcc/graphite-scop-detection.c (working copy) @@ -677,10 +677,10 @@ scop_detection::harmful_loop_in_region ( if (!stmt_simple_for_scop_p (scop, gsi_stmt (gsi), bb)) return true; - if (bb != exit_bb) - for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb); -dom; -dom = next_dom_son (CDI_DOMINATORS, dom)) + for (basic_block dom = first_dom_son (CDI_DOMINATORS, bb); + dom; + dom = next_dom_son (CDI_DOMINATORS, dom)) + if (dom != scop.exit->dest) worklist.safe_push (dom); } Index: gcc/testsuite/gfortran.dg/graphite/pr83963.f === --- gcc/testsuite/gfortran.dg/graphite/pr83963.f(nonexistent) +++ gcc/testsuite/gfortran.dg/graphite/pr83963.f(working copy) @@ -0,0 +1,18 @@ +! { dg-do compile } +! 
{ dg-options "-O -floop-nest-optimize" } + + SUBROUTINE DAVCI(NORB,NCOR,NCI,NA,NB, + *CI,MAXP,MAXW1, + * IHMCON,ISTRB,ISTRP,ISTAR,II) + DIMENSION EC(MAXP,MAXP),IWRK1(2*MAXW1) + EC(II,II) = 1.0D+00 + DO 1396 II=1,MAXP +DO 1398 JJ=1,II-1 + EC(II,JJ) = 0.0D+00 + 1398 CONTINUE + 1396CONTINUE + IF (NA.EQ.NB) THEN + CALL RINAB0(SI1,SI2,NORB,NCOR,NCI,NA,NB,CI(1,IP),IACON1,IBCON1, + * IWRK1,IHMCON,ISTRB,ISTRP,ISTAR) + ENDIF + END Index: gcc/testsuite/gcc.dg/graphite/pr83963-2.c === --- gcc/testsuite/gcc.dg/graphite/pr83963-2.c (nonexistent) +++ gcc/testsuite/gcc.dg/graphite/pr83963-2.c (working copy) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O -floop-nest-optimize" } */ + +int Chv_countBigEntries (int npivot, int pivotsizes[], int countflag, +double droptol, int nD) +{ + double absval ; + double *entries ; + int count; + int ii, jj, kinc, kk, kstart, stride ; + for ( ii = 0 ; ii < nD ; ii++ ) +{ + kk = kstart ; + kinc = stride ; + for ( jj = 0 ; jj < ii ; jj++ ) + { + absval = __builtin_fabs(entries[kk]) ; + if ( absval >= droptol ) + count++ ; + kk += kinc ; + kinc -= 2 ; + } + kstart-- ; +} + return count; +}
Re: [PATCH][AArch64] PR79262: Adjust vector cost
On Mon, Jan 22, 2018 at 4:01 PM, Wilco Dijkstra wrote: > PR79262 has been fixed for almost all AArch64 cpus, however the example is > still > vectorized in a few cases, resulting in lower performance. Increase the cost > of > vector-to-scalar moves so it is more similar to the other vector costs. As a > result > -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6 > performance improves. > > OK for commit? It would be better to dissect this cost into vec_to_scalar and vec_extract where vec_to_scalar really means getting at the scalar value of a vector of uniform values which most targets can do without any instruction (just use a subreg). I suppose we could also make vec_to_scalar equal to vector extraction and remove the uses for the other case (reduction vector result to scalar reg). Richard. > ChangeLog: > 2018-01-22 Wilco Dijkstra > > PR target/79262 > * config/aarch64/aarch64.c (generic_vector_cost): Adjust > vec_to_scalar_cost. > -- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost = >1, /* vec_int_stmt_cost */ >1, /* vec_fp_stmt_cost */ >2, /* vec_permute_cost */ > - 1, /* vec_to_scalar_cost */ > + 2, /* vec_to_scalar_cost */ >1, /* scalar_to_vec_cost */ >1, /* vec_align_load_cost */ >1, /* vec_unalign_load_cost */
Re: [PATCH PR82096][gcc-7] Backport: Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
On Mon, Jan 22, 2018 at 4:10 PM, Sudakshina Das wrote: > Hi > > This is a patch to backport r256526 and r256941 (Fix case fix) of trunk to > fix emit_store_flag_force () function to fix the ICE. The original > discussion is at https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00219.html > and https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01058.html > > Is this ok for gcc-7-branch? > Testing : Ran regression testing with bootstrapped arm-none-linux-gnueabihf. The branch is currently frozen so please wait until after the GCC 7.3 release. Thanks, Richard. > Thanks > Sudi > > ChangeLog entries: > > *** gcc/ChangeLog *** > > 2018-01-22 Sudakshina Das > > Backport from mainline: > 2018-01-10 Sudakshina Das > > PR target/82096 > * expmed.c (emit_store_flag_force): Swap if const op0 > and change VOIDmode to mode of op0. > > *** gcc/testsuite/ChangeLog *** > > 2018-01-22 Sudakshina Das > > Backport from mainline: > 2018-01-10 Sudakshina Das > > PR target/82096 > * gcc.c-torture/compile/pr82096.c: New test.
Re: C++ PATCH to fix bogus error with constexpr and empty class (PR c++/81933)
On Mon, Jan 22, 2018 at 09:40:50AM -0500, Jason Merrill wrote: > This seems like a workaround rather than a fix; we should fix > split_nonconstant_init to work properly rather than override it. > Specifically, it seems odd to return true if num_elts is 0; we > shouldn't override DECL_INITIAL if we didn't actually split anything > out. Ah, ok, here's a patch for that, then. Thanks. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2018-01-22 Marek Polacek PR c++/81933 * typeck2.c (split_nonconstant_init_1): Return false if we didn't split out anything. * g++.dg/cpp1y/constexpr-empty4.C: New test. diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c index 8d933257f5f..8cb0e8811d3 100644 --- gcc/cp/typeck2.c +++ gcc/cp/typeck2.c @@ -725,6 +725,11 @@ split_nonconstant_init_1 (tree dest, tree init) /* The rest of the initializer is now a constant. */ TREE_CONSTANT (init) = 1; + + /* We didn't split out anything. */ + if (num_split_elts == 0) +return false; + return complete_p && complete_ctor_at_level_p (TREE_TYPE (init), num_split_elts, inner_type); } diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C index e69de29bb2d..059220f8268 100644 --- gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C +++ gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C @@ -0,0 +1,51 @@ +// PR c++/81933 +// { dg-do compile { target c++14 } } + +namespace std { +template struct __decay_and_strip { typedef _Tp __type; }; +template struct enable_if { typedef int type; }; +template struct _Head_base { + constexpr _Head_base(_Head) {} +}; +template struct _Tuple_impl; +template +struct _Tuple_impl<_Idx, _Head, _Tail...> : _Tuple_impl<1, _Tail...>, // { dg-warning "direct base" } +_Head_base<_Head> { + typedef _Tuple_impl<1, _Tail...> _Inherited; + typedef _Head_base<_Head> _Base; + constexpr _Tuple_impl(_Head __head, _Tail... 
__tail) + : _Inherited(__tail...), _Base(__head) {} + _Tuple_impl(const _Tuple_impl &) = default; + _Tuple_impl(_Tuple_impl &&); +}; +template +struct _Tuple_impl<_Idx, _Head> : _Head_base<_Head> { + typedef _Head_base<_Head> _Base; + constexpr _Tuple_impl(_Head __head) : _Base(__head) {} +}; +template struct _TC { + static constexpr bool _NotSameTuple() { return true; } +}; +template class tuple : _Tuple_impl<0, _Elements...> { + typedef _Tuple_impl<0, _Elements...> _Inherited; + +public: + template ::_NotSameTuple()>::type = false> + constexpr tuple(_UElements... __elements) : _Inherited(__elements...) {} + tuple(const tuple &) = default; +}; +template +constexpr tuple::__type...> +make_tuple(_Elements... __args) { + typedef tuple::__type...> __result_type; + return __result_type(__args...); +} +} +struct any_udt {}; +template constexpr auto flatten(Tuples... tuples) { + auto all = std::make_tuple(tuples...); + auto flat(all); + return flat; +} +constexpr auto fail = flatten(any_udt{}, any_udt{}); Marek
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
On Mon, Jan 22, 2018 at 5:08 AM, Paolo Carlini wrote: > On 20/01/2018 02:59, Paolo Carlini wrote: >> >> Hi again, >> >> On 19/01/2018 23:55, Paolo Carlini wrote: >>> >>> ...Therefore it seems to me that a way to more cleanly solve the bug >>> would be moving something like || !DECL_NONTRIVIALLY_INITIALIZED_P to >>> the above check in check_for_uninitialized_const_var, and therefore remove >>> completely the uninitialized case from potential_constant_expression_1, ... >> >> Of course this doesn't work. check_for_uninitialized_const would really >> need to know that the VAR_DECL appears in a statement expression which is >> initializing a constexpr variable, nothing to do with >> DECL_NONTRIVIALLY_INITIALIZED_P. Ah, that makes sense. And the static/thread_local checks are similarly duplicates of checks in start_decl. It might be nice to factor those three checks out into another function called in both places, but perhaps that's not necessary given how small they are. > Thus the below, carefully tested over the weekend, would be a change completely > removing the problematic error_at call, plus some testcases checking that we > would still do the right thing in a few cases (bug submitter added > constexpr-83921-2.C). The updated stmtexpr20.C shows that we would reject > anyway a statement expression using an uninitialized inner, simply because > the whole initializer would still be flagged as non const. Do we still diagnose it without the use of the variable? If not, how about adjusting check_for_uninitialized_const_var to support calling it from potential_constant_expression_1? I suppose we'd need to add a couple of parameters, one complain parm and another to indicate that the variable is in a constexpr context, and then return something. Jason
Re: C++ PATCH to fix bogus error with constexpr and empty class (PR c++/81933)
OK, thanks. On Mon, Jan 22, 2018 at 10:44 AM, Marek Polacek wrote: > On Mon, Jan 22, 2018 at 09:40:50AM -0500, Jason Merrill wrote: >> This seems like a workaround rather than a fix; we should fix >> split_nonconstant_init to work properly rather than override it. >> Specifically, it seems odd to return true if num_elts is 0; we >> shouldn't override DECL_INITIAL if we didn't actually split anything >> out. > > Ah, ok, here's a patch for that, then. Thanks. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2018-01-22 Marek Polacek > > PR c++/81933 > * typeck2.c (split_nonconstant_init_1): Return false if we didn't > split out anything. > > * g++.dg/cpp1y/constexpr-empty4.C: New test. > > diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c > index 8d933257f5f..8cb0e8811d3 100644 > --- gcc/cp/typeck2.c > +++ gcc/cp/typeck2.c > @@ -725,6 +725,11 @@ split_nonconstant_init_1 (tree dest, tree init) > >/* The rest of the initializer is now a constant. */ >TREE_CONSTANT (init) = 1; > + > + /* We didn't split out anything. 
*/ > + if (num_split_elts == 0) > +return false; > + >return complete_p && complete_ctor_at_level_p (TREE_TYPE (init), > num_split_elts, inner_type); > } > diff --git gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > index e69de29bb2d..059220f8268 100644 > --- gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > +++ gcc/testsuite/g++.dg/cpp1y/constexpr-empty4.C > @@ -0,0 +1,51 @@ > +// PR c++/81933 > +// { dg-do compile { target c++14 } } > + > +namespace std { > +template struct __decay_and_strip { typedef _Tp __type; }; > +template struct enable_if { typedef int type; }; > +template struct _Head_base { > + constexpr _Head_base(_Head) {} > +}; > +template struct _Tuple_impl; > +template > +struct _Tuple_impl<_Idx, _Head, _Tail...> : _Tuple_impl<1, _Tail...>, // { > dg-warning "direct base" } > +_Head_base<_Head> { > + typedef _Tuple_impl<1, _Tail...> _Inherited; > + typedef _Head_base<_Head> _Base; > + constexpr _Tuple_impl(_Head __head, _Tail... __tail) > + : _Inherited(__tail...), _Base(__head) {} > + _Tuple_impl(const _Tuple_impl &) = default; > + _Tuple_impl(_Tuple_impl &&); > +}; > +template > +struct _Tuple_impl<_Idx, _Head> : _Head_base<_Head> { > + typedef _Head_base<_Head> _Base; > + constexpr _Tuple_impl(_Head __head) : _Base(__head) {} > +}; > +template struct _TC { > + static constexpr bool _NotSameTuple() { return true; } > +}; > +template class tuple : _Tuple_impl<0, _Elements...> { > + typedef _Tuple_impl<0, _Elements...> _Inherited; > + > +public: > + template +enable_if<_TC<1>::_NotSameTuple()>::type = false> > + constexpr tuple(_UElements... __elements) : _Inherited(__elements...) {} > + tuple(const tuple &) = default; > +}; > +template > +constexpr tuple::__type...> > +make_tuple(_Elements... __args) { > + typedef tuple::__type...> > __result_type; > + return __result_type(__args...); > +} > +} > +struct any_udt {}; > +template constexpr auto flatten(Tuples... 
tuples) { > + auto all = std::make_tuple(tuples...); > + auto flat(all); > + return flat; > +} > +constexpr auto fail = flatten(any_udt{}, any_udt{}); > > Marek
Re: [C++] Deprecate old for-scope handling
On Sun, Jan 21, 2018 at 1:40 AM, Eric Gallager wrote: > On 1/19/18, Nathan Sidwell wrote: >> Jason, >> what do you think about deprecating the ARM-era for-scope handling that >> allows: >>void f () >>{ >> for (int i = 0;;); >> i = 2; >>} >> >> we noisily accept that in c++98 mode with -fpermissive. It wasn't even >> well formed then. Implementing this has some unique requirements in the >> name-lookup machinery, which I ran into again today. >> >> Option A: rip out now because it's a c++98 ARM-compatibility crutch >> Option B: deprecate in gcc-8 and remove in gcc-9. > > I support Option B because it's good to let people know ahead of time > about impending removals of things. I'd just remove it in GCC 9. I think the existing permerror already qualifies as deprecation, but I wouldn't object to adjusting the diagnostic to mention this plan. I wouldn't want to tear it out at this stage of GCC 8. Jason
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 02:43:50PM +0100, Jan Hubicka wrote: > > But this looks incorrect, because that computes only the above c2 in > > prob, not > > second_prob. It needs to be > > prob = cprob.invert () * prob / first_prob.invert (); > > or > > prob *= cprob.invert () / first_prob.invert (); > > or that > > prob = (prob - first_prob) / first_prob.invert (); > > I had in the patch. The last one looked to me like cheapest to compute. > > Aha, I see. Ok that makes sense to me now :) > > > > > void profile_probability::split (profile_probability cprob, > > > profile_probability *first_prob, profile_probability *second_prob) > > > > This doesn't take prob as an argument, or should it be a method and take > > prob as *this? Couldn't it simply return first_prob and adjust itself, > > so > > profile_probability split (profile_probability cprob) > > { > > profile_probability ret = *this * cprob; > > *this = (*this - prob) / ret.invert (); > > // or *this = *this * cprob.invert () / ret.invert (); > > return ret; > > } > > ? > > Yep, that is fine. So like this if it passes bootstrap/regtest on {x86_64,i686}-linux? Besides adding the new helper method and using it in the spots I've changed plus the two you've mentioned, I had to do another fix - looking at how TRUTH_ANDIF_EXPR is handled in dojump_1 and seeing Invalid sum messages in predict-8.c's *.expand jump, I've realized that it also works as is only for the discussed, i.e. if (x) goto t; // prob transformed into: if (a) goto t; // first_prob if (b) goto t; // prob but not for the case where and_them is true, where we have: if (c) goto dummy; // first_prob.invert () if (d) goto t; // prob dummy: which needs to be handled similarly to the TRUTH_ANDIF_EXPR case. At least when trying in gdb on predict-8.c various values of cprob (1%, 15%, 50%, 62%, 98%) none of them generate the Invalid sum messages. 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method. 
* dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,19 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. */ + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: +*this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. 
*/ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob; + profile_probability cprob + = profile_probability::guessed_always (); if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -
Re: [PACH][RFC] Pass -m -jN to gcc_build from gcc_release
On 01/22/2018 01:57 AM, Jakub Jelinek wrote: > On Mon, Jan 22, 2018 at 09:37:05AM +0100, Richard Biener wrote: >> >> Currently gcc_release builds GCC (for generating in-tree generated >> files) serially - that's prohibitively expensive and takes ages (>4h for >> me). I'm using (when I remember to edit gcc_release ...) -j6 without >> a problem for some years, thus the following proposal. >> >> Any recommendation for the default N? 4 might work just fine as well >> and no release manager should do without at least 4 cores... > > Perhaps > num_cpus=1 > if type -p getconf 2>/dev/null; then > num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null` > case "$num_cpus" in > '' | 0* | *[!0-9]*) num_cpus=1;; > esac > fi > > and "-j$num_cpus" ? Seems reasonable to me -- I believe we do something similar on our various internal scripts ({redhat,fedora}-rpm-config). jeff
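Jakub's CPU-count detection can be exercised stand-alone; this sketch also redirects the stdout of the `type` probe (which otherwise prints the path of getconf) and falls back to 1 when getconf is missing or prints something that is not a positive integer:

```shell
# Detect the number of online CPUs for a -jN argument, defaulting to 1.
num_cpus=1
if type -p getconf >/dev/null 2>&1; then
  num_cpus=`getconf _NPROCESSORS_ONLN 2>/dev/null`
  case "$num_cpus" in
    '' | 0* | *[!0-9]*) num_cpus=1;;
  esac
fi
printf '%s\n' "-j$num_cpus"
```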
Re: [PATCH] avoid assuming known string length is constant (PR 83896)
On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote: > gcc/ChangeLog: > > PR tree-optimization/83896 > * tree-ssa-strlen.c (get_string_len): Rename... > (get_string_cst_length): ...to this. Return HOST_WIDE_INT. > Avoid assuming length is constant. > (handle_char_store): Use HOST_WIDE_INT for string length. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/83896 > * gcc.dg/strlenopt-43.c: New test. ... > >if (TREE_CODE (rhs) == MEM_REF >&& integer_zerop (TREE_OPERAND (rhs, 1))) For the future, there is no reason it couldn't handle also non-zero offsets if the to be returned length is bigger or equal than that offset, where the offset can be then subtracted. But let's defer that for GCC9. > @@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) > if (idx > 0) > { > strinfo *si = get_strinfo (idx); > - if (si && si->full_string_p) > + if (si && si->full_string_p > + && tree_fits_shwi_p (si->nonzero_chars)) If a && or || using condition doesn't fit onto a single line, it should be split into one condition per line, so please change the above into: if (si && si->full_string_p && tree_fits_shwi_p (si->nonzero_chars)) > @@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi) >unsigned HOST_WIDE_INT offset = 0; > >/* Set to the length of the string being assigned if known. */ > - int rhslen; > + HOST_WIDE_INT rhslen; Please remove the empty line above the above comment, the function starts with definitions of multiple variables, rhslen isn't even initialized and there is nothing special on it compared to others, so they should be in one block. Ok for trunk with those changes. 
> --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) > +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) > @@ -0,0 +1,13 @@ > +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen > + with non-constant length > + { dg-do compile } > + { dg-options "-O2 -Wall -fdump-tree-optimized" } */ > + > +extern char a[5]; > +extern char b[]; > + > +void f (void) > +{ > + if (__builtin_strlen (b) != 4) > +__builtin_memcpy (a, b, sizeof a); > +} Most of the strlenopt*.c tests use the strlenopt.h header and then use the non-__builtin_ function names, if you want to change that too, please do so, but not a requirement. Jakub
Re: [PATCH, rs6000] Add 128-bit support for vec_xl(), vec_xl_be(), vec_xst(), vec_xst_be() builtins.
Segher: I put back the xxpermdi,2 stuff. Per our private discussion about the parallel [(const_int 0)], I found that I could get GCC to compile without the parallel. GCC worked with a -O0 on the test case but I got an ICE when using -O1. So, I had to put the parallel back in. The patch now works for -O0, -O1, -O2 and -O3. I have completed the regression testing again for powerpc64le-unknown-linux-gnu (Power 8 LE) powerpc64le-unknown-linux-gnu (Power 8 BE) powerpc64le-unknown-linux-gnu (Power 9 LE) and all looks good. So, just for the record, here is the final patch that will get committed. Carl Love gcc/ChangeLog: 2018-01-22 Carl Love * config/rs6000/rs6000-builtin.def (ST_ELEMREV_V1TI, LD_ELEMREV_V1TI, LVX_V1TI): Add macro expansion. * config/rs6000/rs6000-c.c (altivec_builtin_types): Add argument definitions for VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_VEC_ST, VSX_BUILTIN_VEC_XL, LD_ELEMREV_V1TI builtins. * config/rs6000/rs6000-p8swap.c (insn_is_swappable_p): Change check to determine if the instruction is a byte reversing entry. Fix typo in comment. * config/rs6000/rs6000.c (altivec_expand_builtin): Add case entry for VSX_BUILTIN_ST_ELEMREV_V1TI and VSX_BUILTIN_LD_ELEMREV_V1TI. Add def_builtin calls for new builtins. * config/rs6000/vsx.md (vsx_st_elemrev_v1ti, vsx_ld_elemrev_v1ti): Add define_insn expansion. gcc/testsuite/ChangeLog: 2018-01-22 Carl Love * gcc.target/powerpc/powerpc.exp: Add torture tests for builtins-4-runnable.c, builtins-6-runnable.c, builtins-5-p9-runnable.c, builtins-6-p9-runnable.c. * gcc.target/powerpc/builtins-6-runnable.c: New test file. * gcc.target/powerpc/builtins-4-runnable.c: Add additional tests for signed/unsigned 128-bit and long long int loads.
--- gcc/config/rs6000/rs6000-builtin.def |3 + gcc/config/rs6000/rs6000-c.c | 39 + gcc/config/rs6000/rs6000-p8swap.c |5 +- gcc/config/rs6000/rs6000.c | 20 + gcc/config/rs6000/vsx.md | 42 +- .../gcc.target/powerpc/builtins-4-runnable.c | 494 +- .../gcc.target/powerpc/builtins-6-runnable.c | 1001 gcc/testsuite/gcc.target/powerpc/powerpc.exp | 12 + 8 files changed, 1581 insertions(+), 35 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index b17036c5a..757fd6d50 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -1234,6 +1234,7 @@ BU_ALTIVEC_X (LVXL_V8HI, "lvxl_v8hi",MEM) BU_ALTIVEC_X (LVXL_V16QI, "lvxl_v16qi", MEM) BU_ALTIVEC_X (LVX, "lvx", MEM) BU_ALTIVEC_X (LVX_V2DF,"lvx_v2df", MEM) +BU_ALTIVEC_X (LVX_V1TI,"lvx_v1ti", MEM) BU_ALTIVEC_X (LVX_V2DI,"lvx_v2di", MEM) BU_ALTIVEC_X (LVX_V4SF,"lvx_v4sf", MEM) BU_ALTIVEC_X (LVX_V4SI,"lvx_v4si", MEM) @@ -1783,12 +1784,14 @@ BU_VSX_X (STXVW4X_V4SF, "stxvw4x_v4sf", MEM) BU_VSX_X (STXVW4X_V4SI, "stxvw4x_v4si", MEM) BU_VSX_X (STXVW4X_V8HI, "stxvw4x_v8hi", MEM) BU_VSX_X (STXVW4X_V16QI, "stxvw4x_v16qi", MEM) +BU_VSX_X (LD_ELEMREV_V1TI,"ld_elemrev_v1ti", MEM) BU_VSX_X (LD_ELEMREV_V2DF,"ld_elemrev_v2df", MEM) BU_VSX_X (LD_ELEMREV_V2DI,"ld_elemrev_v2di", MEM) BU_VSX_X (LD_ELEMREV_V4SF,"ld_elemrev_v4sf", MEM) BU_VSX_X (LD_ELEMREV_V4SI,"ld_elemrev_v4si", MEM) BU_VSX_X (LD_ELEMREV_V8HI,"ld_elemrev_v8hi", MEM) BU_VSX_X (LD_ELEMREV_V16QI, "ld_elemrev_v16qi", MEM) +BU_VSX_X (ST_ELEMREV_V1TI,"st_elemrev_v1ti", MEM) BU_VSX_X (ST_ELEMREV_V2DF,"st_elemrev_v2df", MEM) BU_VSX_X (ST_ELEMREV_V2DI,"st_elemrev_v2di", MEM) BU_VSX_X (ST_ELEMREV_V4SF,"st_elemrev_v4sf", MEM) diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c index 123e46aa1..a0f790d39 100644 --- a/gcc/config/rs6000/rs6000-c.c +++ b/gcc/config/rs6000/rs6000-c.c @@ -3149,16 +3149,27 @@ const struct 
altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 }, { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 }, + { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI, +RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 }, + { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI, +RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_V1TI, 0 }, + { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI, +RS6000_BTI_unsigned_V1TI, RS6000_BTI_
Re: [PATCH][ARM] Use utxb rN, rM, ror #8 to implement zero_extract on armv6.
Hi Roger, On 15/01/18 14:58, Roger Sayle wrote: I was hoping I could ask an ARM backend maintainer to look over the following patch. I was examining the code generated for the following C snippet on a raspberry pi, static inline int popcount_lut8(unsigned *buf, int n) { int cnt=0; unsigned int i; do { i = *buf; cnt += lut[i&255]; cnt += lut[i>>8&255]; cnt += lut[i>>16&255]; cnt += lut[i>>24]; buf++; } while(--n); return cnt; } and was surprised to see the following instruction sequence generated by the compiler: mov r5, r2, lsr #8 uxtb r5, r5 This sequence can be performed by a single ARM instruction: uxtb r5, r2, ror #8 I agree. The attached patch allows GCC's combine pass to take advantage of the ARM's uxtb with rotate functionality to implement the above zero_extract, and likewise to use the sxtb with rotate to implement sign_extract. ARM's uxtb and sxtb can only be used with rotates of 0, 8, 16 and 24, and of these only the 8 and 16 are useful [ror #0 is a nop, and extends with ror #24 can be implemented using regular shifts], so the approach here is to add the six missing but useful instructions as 6 different define_insn in arm.md, rather than try to be clever with new predicates. Alas, later ARM hardware has advanced bit field instructions, and earlier ARM cores didn't support extend-with-rotate, so this appears to only benefit armv6 era CPUs. Right, the deal with these instructions is that for ARM state they appear from ARMv6 onwards and for Thumb they appear from ARMv6T2. However, in ARMv6T2 we also get the more advanced UBFX instructions and we might as well use those (and we do). So indeed the UXTB+ROR instructions are only useful for ARMv6 ARM-state, which is what your patch correctly specifies. The following patch has been minimally tested by building cc1 of a cross-compiler and confirming the desired instructions appear in the assembly output for the test case.
Alas, my minimal raspberry pi hardware is unlikely to be able to bootstrap gcc or run the testsuite, so I'm hoping a ARM expert can check (and confirm) whether this change is safe and suitable. [Thanks in advance and apologies for any inconvenience]. I've bootstrapped and tested your patch on arm-none-linux-gnueabihf so the patch looks good testing-wise. I only have a few minor comments inline. With those addressed the patch is ok. That being said, GCC is now in stage 4: regression fixes only. You'll have to wait until GCC 8 is released and stage 1 development is reopened before committing this patch. Thanks, Kyrill 2018-01-14 Roger Sayle * config/arm/arm.md (*arm_zeroextractsi2_8_8, *arm_signextractsi2_8_8, *arm_zeroextractsi2_8_16, *arm_signextractsi2_8_16, *arm_zeroextractsi2_16_8, *arm_signextractsi2_16_8): New. 2018-01-14 Roger Sayle * gcc.target/arm/extend-ror.c: New test. Index: gcc/gcc/config/arm/arm.md === --- gcc/gcc/config/arm/arm.md (revision 256667) +++ gcc/gcc/config/arm/arm.md (working copy) @@ -11684,6 +11684,72 @@ "" ) + Implement zero_extract using uxtb/uxth instruction with + the ror #N qualifier when applicable. Please use only two ";;" as comments. /* { dg-do compile } */ /* { dg-options "-O -march=armv6" } */ /* { dg-prune-output "switch .* conflicts with" } */ These directives can be tricky to get right for all the multilib variations that folks test. I would recommend you use something like: /* { dg-do compile } */ /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6" } } */ /* { dg-require-effective-target arm_arm_ok } */ /* { dg-add-options arm_arch_v6 } */ /* { dg-additional-options "-O -marm" } */ That way it properly adds -march=armv6 -marm and skips the test gracefully if the user tries to force an incompatible set of options.
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On 2018-01-22 16:57, Jakub Jelinek wrote: On Mon, Jan 22, 2018 at 02:43:50PM +0100, Jan Hubicka wrote: > > But this looks incorrect, because that computes only the above c2 in > > prob, not > > second_prob. It needs to be > > prob = cprob.invert () * prob / first_prob.invert (); > > or > > prob *= cprob.invert () / first_prob.invert (); > > or that > > prob = (prob - first_prob) / first_prob.invert (); > > I had in the patch. The last one looked to me like cheapest to compute. Aha, I see. Ok that makes sense to me now :) > > > void profile_probability::split (profile_probability cprob, > > profile_probability *first_prob, profile_probability *second_prob) > > This doesn't take prob as an argument, or should it be a method and take > prob as *this? Couldn't it simply return first_prob and adjust itself, > so > profile_probability split (profile_probability cprob) > { > profile_probability ret = *this * cprob; > *this = (*this - prob) / ret.invert (); > // or *this = *this * cprob.invert () / ret.invert (); > return ret; > } > ? Yep, that is fine. So like this if it passes bootstrap/regtest on {x86_64,i686}-linux? Besides adding the new helper method and using it in the spots I've changed plus the two you've mentioned, I had to do another fix - looking at how TRUTH_ANDIF_EXPR is handled in dojump_1 and seeing Invalid sum messages in predict-8.c's *.expand jump, I've realized that it also works as is only for the discussed, i.e. if (x) goto t; // prob transformed into: if (a) goto t; // first_prob if (b) goto t; // prob but not for the case where and_them is true, where we have: if (c) goto dummy; // first_prob.invert () if (d) goto t; // prob dummy: which needs to be handled similarly to the TRUTH_ANDIF_EXPR case. At least when trying in gdb on predict-8.c various values of cprob (1%, 15%, 50%, 62%, 98%) none of them generate the Invalid sum messages. 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method.
* dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,19 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. */ To make documentation clear, i would include the pseudocode with conditionals and individual probabilities. It is bit non-obvious transformation and it would be nice to reduce level of apparent magicness in profiling code :) OK, Honza + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: +*this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. 
*/ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob;
Disable some patterns for fold-left reductions (PR 83965)
In this PR we recognised a PLUS_EXPR as a fold-left reduction, then applied pattern matching to convert it to a WIDEN_SUM_EXPR. We need to keep the original code in this case since we implement the reduction using scalar rather than vector operations. Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2018-01-22 Richard Sandiford gcc/ PR tree-optimization/83965 * tree-vect-patterns.c (vect_reassociating_reduction_p): New function. (vect_recog_dot_prod_pattern, vect_recog_sad_pattern): Use it instead of checking only for a reduction. (vect_recog_widen_sum_pattern): Likewise. gcc/testsuite/ PR tree-optimization/83965 * gcc.dg/vect/pr83965.c: New test. Index: gcc/tree-vect-patterns.c === --- gcc/tree-vect-patterns.c2018-01-13 18:01:51.240735098 + +++ gcc/tree-vect-patterns.c2018-01-22 17:51:06.751462689 + @@ -217,6 +217,16 @@ vect_recog_temp_ssa_var (tree type, gimp return make_temp_ssa_name (type, stmt, "patt"); } +/* Return true if STMT_VINFO describes a reduction for which reassociation + is allowed. */ + +static bool +vect_reassociating_reduction_p (stmt_vec_info stmt_vinfo) +{ + return (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def + && STMT_VINFO_REDUC_TYPE (stmt_vinfo) != FOLD_LEFT_REDUCTION); +} + /* Function vect_recog_dot_prod_pattern Try to find the following pattern: @@ -336,7 +346,7 @@ vect_recog_dot_prod_pattern (vec *s { gimple *def_stmt; - if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def + if (!vect_reassociating_reduction_p (stmt_vinfo) && ! STMT_VINFO_GROUP_FIRST_ELEMENT (stmt_vinfo)) return NULL; plus_oprnd0 = gimple_assign_rhs1 (last_stmt); @@ -1181,7 +1191,7 @@ vect_recog_widen_sum_pattern (vec
Fix vect_float markup for a couple of tests (PR 83888)
vect_float is true for arm*-*-* targets, but the support is only available when -funsafe-math-optimizations is on. This caused failures in two tests that disable fast-math. The easiest fix seemed to be to add a new target selector for "vect_float without special options". Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2018-01-22 Richard Sandiford gcc/ PR testsuite/83888 * doc/sourcebuild.texi (vect_float): Say that the selector only describes the situation when -funsafe-math-optimizations is on. (vect_float_strict): Document. gcc/testsuite/ PR testsuite/83888 * lib/target-supports.exp (check_effective_target_vect_float): Say that the result only holds when -funsafe-math-optimizations is on. (check_effective_target_vect_float_strict): New procedure. * gcc.dg/vect/no-fast-math-vect16.c: Use vect_float_strict instead of vect_float. * gcc.dg/vect/vect-reduc-6.c: Likewise. Index: gcc/doc/sourcebuild.texi === --- gcc/doc/sourcebuild.texi2018-01-22 17:51:03.860579049 + +++ gcc/doc/sourcebuild.texi2018-01-22 17:54:02.172848564 + @@ -1403,7 +1403,13 @@ The target's preferred vector alignment alignment. @item vect_float -Target supports hardware vectors of @code{float}. +Target supports hardware vectors of @code{float} when +@option{-funsafe-math-optimizations} is in effect. + +@item vect_float_strict +Target supports hardware vectors of @code{float} when +@option{-funsafe-math-optimizations} is not in effect. +This implies @code{vect_float}. @item vect_int Target supports hardware vectors of @code{int}. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp 2018-01-22 17:51:03.817580787 + +++ gcc/testsuite/lib/target-supports.exp 2018-01-22 17:54:02.173848531 + @@ -5492,7 +5492,8 @@ proc check_effective_target_vect_long { return $answer } -# Return 1 if the target supports hardware vectors of float, 0 otherwise. 
+# Return 1 if the target supports hardware vectors of float when +# -funsafe-math-optimizations is enabled, 0 otherwise. # # This won't change for different subtargets so cache the result. @@ -5525,6 +5526,14 @@ proc check_effective_target_vect_float { return $et_vect_float_saved($et_index) } +# Return 1 if the target supports hardware vectors of float without +# -funsafe-math-optimizations being enabled, 0 otherwise. + +proc check_effective_target_vect_float_strict { } { +return [expr { [check_effective_target_vect_float] + && ![istarget arm*-*-*] }] +} + # Return 1 if the target supports hardware vectors of double, 0 otherwise. # # This won't change for different subtargets so cache the result. Index: gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c === --- gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c 2018-01-13 18:01:15.293116922 + +++ gcc/testsuite/gcc.dg/vect/no-fast-math-vect16.c 2018-01-22 17:54:02.172848564 + @@ -1,4 +1,4 @@ -/* { dg-require-effective-target vect_float } */ +/* { dg-require-effective-target vect_float_strict } */ #include #include "tree-vect.h" Index: gcc/testsuite/gcc.dg/vect/vect-reduc-6.c === --- gcc/testsuite/gcc.dg/vect/vect-reduc-6.c2018-01-13 18:01:15.294116882 + +++ gcc/testsuite/gcc.dg/vect/vect-reduc-6.c2018-01-22 17:54:02.172848564 + @@ -1,4 +1,4 @@ -/* { dg-require-effective-target vect_float } */ +/* { dg-require-effective-target vect_float_strict } */ /* { dg-additional-options "-fno-fast-math" } */ #include
Re: [PATCH, rev2], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble
Hi Mike, Thanks for doing this! On Sun, Jan 21, 2018 at 07:03:58PM -0500, Michael Meissner wrote: > I've reworked the patch for PR83864. This bug is due to the compiler issuing > an internal error for code of the form on a little endian system: > > int sb (_Float128 *p, unsigned long n) { return __builtin_signbit (p[n]); > } > > The problem is that the memory optimization wants to load the high double word > from memory to isolate the signbit. In little endian, the high word is at > offset 8 instead of 0. > > The bug will also show up if you use the long double type, and you use the > -mabi=ieeelongdouble option. > > Previously the code did one UNSPEC (signbit2_dm) that was done before > register allocation, and combined the value being in vector register, memory, > and GPR register along with the shift to isolate the signbit. Then after the > split after register allocation, it created a second UNSPEC > (signbit2_dm2) that was just the direct move, or it did the memory and > GPR code path separately. > > With these patches, the generator code issues an UNSPEC (signbit2_dm) > just to get the high double word, and the shift right is then emitted. The > UNSPEC only handles the value being in either vector or GPR register. There > is > a second UNSPEC that is created by the combiner if the value is in memory. On > little endian systems, the first split pass (before register allocation) will > allocate a pseudo register to hold the initial ADD of the base and index > registers for indexed loads, and then forms a LD reg,8(tmp) to load the high > word. Just in case, some code after register allocation reforms the UNSPEC, > it > uses a base register for the load, and it can use the base register as needed > for the temporary. But does it have to do an unspec at all? Can't it just immediately take the relevant 64-bit half (during expand), no unspec in sight, and that is that? 
> I have run this code on both little endian and big endian power8 systems, > doing > bootstrap and regression test. There were no regressions. Can I install this > bug on the trunk? > > I don't believe the bug shows up in gcc 6/7 branches, but I will build these > and test to see if the code is needed to be backported. If it is needed on the branches, okay for there too (once 7 reopens). > [gcc] > 2018-01-21 Michael Meissner > > PR target/83862 > * config/rs6000/rs6000-protos.h (rs6000_split_signbit): Delete, > no longer used. > * config/rs6000/rs6000.c (rs6000_split_signbit): Likewise. > * config/rs6000/rs6000.md (signbit2): Change code for IEEE > 128-bit to produce an UNSPEC move to get the double word with the > signbit and then a shift directly to do signbit. > (signbit2_dm): Replace old IEEE 128-bit signbit > implementation with a new version that just does either a direct > move or a regular move. Move memory interface to separate insns. > Move insns so they are next to the expander. > (signbit2_dm_mem_be): New combiner insns to combine load > with signbit move. Split big and little endian case. > (signbit2_dm_mem_le): Likewise. > (signbit2_dm_ext): Delete, no longer used. > (signbit2_dm2): Likewise. > > [gcc/testsuite] > 2018-01-22 Michael Meissner > > PR target/83862 > * gcc.target/powerpc/pr83862.c: New test. The patch is okay for trunk, but I still think this can be a lot simpler. > +;; We can't use SUBREG:DI of the IEEE 128-bit value before register > +;; allocation, because KF/TFmode aren't tieable with DImode. Also, having > the > +;; 64-bit scalar part in the high end of the register instead of the low end > +;; also complicates things. So instead, we use an UNSPEC to do the move of > the > +;; high double word to isolate the signbit. I'll think about this. Either way, thank you for your patience! And the patch :-) Segher
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
On Mon, Jan 22, 2018 at 06:44:17PM +0100, Jan Hubicka wrote: > > + /* Split *THIS (ORIG) probability into 2 probabilities, such that > > + the returned one (FIRST) is *THIS * CPROB and *THIS is > > + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND > > + == ORIG. */ > > To make documentation clear, i would include the pseudocode with > conditionals and individual > probabilities. > It is bit non-obvious transformation and it would be nice to reduce level of > apparent > magicness in profiling code :) So like this? 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method. * dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,30 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. This is useful e.g. when splitting a conditional + branch like: + if (cond) + goto lab; // ORIG probability + into + if (cond1) + goto lab; // FIRST = ORIG * CPROB probability + if (cond2) + goto lab; // SECOND probability + such that the overall probability of jumping to lab remains + the same. CPROB gives the relative probability between the + branches. 
*/ + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: + *this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. */ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob; + profile_probability cprob + = profile_probability::guessed_always (); if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(1, 100); + cprob = cprob.apply_scale (1, 100); else if (first_code == ORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(99, 100); + cprob = cprob.apply_scale (99, 
100); + else + cprob = profile_probability::even (); + /* We want to split: +if (x) goto t; // prob; +into +if (a) goto t; // first_prob; +if (b) goto t; // prob; +such that the overall probability of jumping to t +remains the same and first_prob is prob * cprob. */ if (and_them) { rtx_code_label *dest_label; + prob = prob.invert (); + profile_probability first_prob = prob.split (cprob).invert (); + prob = prob.invert (); /*
[Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
This patch has been triggered by Thomas's recent message to the list. Not only did I start work late relative to stage 3 but debugging took somewhat longer than anticipated. Therefore, to get this committed asap, we will have to beg the indulgence of the release managers and prompt review and/or testing by fortran maintainers. (Dominique has already undertaken to test -m32.)

The patch delivers rank up to 15 for F2008, the descriptor information needed to enact the F2018 C descriptor macros and an attribute field to store such information as pointer/allocatable, contiguous etc. Only the first has been enabled so far, but it was necessary to submit the array descriptor changes now to avoid any further ABI breakage in 9.0.0.

I took the design choice to replace the dtype with a structure:

typedef struct dtype_type
{
  size_t elem_len;
  int version;
  int rank;
  int type;
  int attribute;
}
dtype_type;

This choice was intended to reduce the changes to a minimum, since in most references to the dtype, one dtype is assigned to another. The F2018 interop defines the 'type' and 'attribute' fields to be signed char types. I used this initially but found that using int was the best way to silence the warnings about padding, since it also allows for more attribute information to be carried.

Some parts of the patch (e.g. in get_scalar_to_descriptor_type) look as if latent bugs were uncovered by the change to the descriptor. If so, the time spent debugging was well worthwhile.

It should be noted that some of the intrinsics, which use switch/case for the type/kind selection, limit the effective element size that they handle to the maximum value of size_t, less 7 bits. A bit of straightforward work there would fix this limitation and would allow the GFC_DTYPE shifts and masks to be eliminated.

Bootstraps and regtests on FC23/x86_64 - OK for trunk?
Paul

2018-01-22  Paul Thomas

        PR fortran/37577
        * array.c (gfc_match_array_ref): If standard earlier than F2008
        it is an error if the reference dimension is greater than 7.
        * libgfortran.h : Increase GFC_MAX_DIMENSIONS to 15. Change the
        dtype masks and shifts accordingly.
        * trans-array.c (gfc_conv_descriptor_dtype): Use the dtype type
        node to check the field.
        (gfc_conv_descriptor_dtype): Access the rank field of dtype.
        (duplicate_allocatable_coarray): Access the rank field of the
        dtype descriptor rather than the dtype itself.
        * trans-expr.c (get_scalar_to_descriptor_type): Store the type
        of 'scalar' on entry and use its TREE_TYPE if it is ARRAY_TYPE
        (i.e. a character).
        (gfc_conv_procedure_call): Pass TREE_OPERAND (tmp,0) to
        get_scalar_to_descriptor_type if the actual expression is a
        constant.
        (gfc_trans_structure_assign): Assign the rank directly to the
        dtype rank field.
        (gfc_conv_intrinsic_sizeof): Obtain the element size from the
        'elem_len' field of the dtype.
        * trans-io.c (gfc_build_io_library_fndecls): Replace
        gfc_int4_type_node with dtype_type_node where necessary.
        (transfer_namelist_element): Use gfc_get_dtype_rank_type for
        scalars.
        * trans-types.c : Provide 'get_dtype_type_node' to access the
        dtype_type_node and, if necessary, build it. The maximum size
        of an array element is now determined by the maximum value of
        size_t. Update the description of the array descriptor,
        including the type def for the dtype_type.
        (gfc_get_dtype_rank_type): Build a constructor for the dtype.
        Distinguish RECORD_TYPEs that are BT_DERIVED or BT_CLASS.
        (gfc_get_array_descriptor_base): Change the type of the dtype
        field to dtype_type_node.
        (gfc_get_array_descr_info): Get the offset to the rank field of
        the dtype.
        * trans-types.h : Add a prototype for 'get_dtype_type_node ()'.
        * trans.h : Define the indices of the dtype fields.

2018-01-22  Paul Thomas

        PR fortran/37577
        * gfortran.dg/coarray_18.f90: Allow dimension 15 for F2008.
        * gfortran.dg/coarray_lib_this_image_2.f90: Change 'array1' to
        'array01' in the tree dump comparison.
        * gfortran.dg/coarray_lib_token_4.f90: Likewise.
        * gfortran.dg/inline_sum_1.f90: Similar - allow two digits.
        * gfortran.dg/rank_1.f90: Allow dimension 15 for F2008.

2018-01-22  Paul Thomas

        PR fortran/37577
        * caf/single.c (_gfortran_caf_failed_images): Access the 'type'
        and 'elem_len' fields of the dtype instead of the shifts.
        (_gfortran_caf_stopped_images): Likewise.
        * intrinsics/associated.c (associated): Compare the 'type' and
        'elem_len' fields instead of the dtype.
        * caf/date_and_time.c : Access the dtype fields rather than
        using shifts and masks.
        * io/transfer.c (transfer_array): Comment on item count.
        (set_nml_var,st_set_nml_var): Change dtype type and use fields.
        (st_set_nml_dtio_var): Likewise.
        * libgfortran.h : Change definition of GFC_ARRAY_DESCRIPTOR and
        add a typedef for the dtype_type. Change the GFC_DTYPE_* macros to
Re: [PATCH, rev2], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble
On Mon, Jan 22, 2018 at 12:32:13PM -0600, Segher Boessenkool wrote: > Hi Mike, > > Thanks for doing this! > > On Sun, Jan 21, 2018 at 07:03:58PM -0500, Michael Meissner wrote: > > With these patches, the generator code issues an UNSPEC (signbit2_dm) > > just to get the high double word, and the shift right is then emitted. The > > UNSPEC only handles the value being in either vector or GPR register. > > There is > > a second UNSPEC that is created by the combiner if the value is in memory. > > On > > little endian systems, the first split pass (before register allocation) > > will > > allocate a pseudo register to hold the initial ADD of the base and index > > registers for indexed loads, and then forms a LD reg,8(tmp) to load the high > > word. Just in case, some code after register allocation reforms the > > UNSPEC, it > > uses a base register for the load, and it can use the base register as > > needed > > for the temporary. > > But does it have to do an unspec at all? Can't it just immediately take > the relevant 64-bit half (during expand), no unspec in sight, and that > is that? Yes it needs the UNSPEC. As I said, you can't use SUBREG due to a combination of MODES_TIEABLE_P and the way things are represented in vector registers (with the scalar part being in the upper part of the register). I tried it, and could not get it to work. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[PATCH] Finish removing class move_computations_dom_walker
r232820 (aka 2c7b2f8860794cc9b9cf5eeea9d7dc109c0de3be) removed the implementation of class move_computations_dom_walker, but kept the decl.

This patch removes the stray decl.

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu. OK for trunk?

gcc/ChangeLog:
        PR tree-optimization/69452
        * tree-ssa-loop-im.c (class move_computations_dom_walker): Remove
        decl.
---
 gcc/tree-ssa-loop-im.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 7864fbd..4655766 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -1081,17 +1081,6 @@ invariantness_dom_walker::before_dom_children (basic_block bb)
   return NULL;
 }
 
-class move_computations_dom_walker : public dom_walker
-{
-public:
-  move_computations_dom_walker (cdi_direction direction)
-    : dom_walker (direction), todo_ (0) {}
-
-  virtual edge before_dom_children (basic_block);
-
-  unsigned int todo_;
-};
-
 /* Hoist the statements in basic block BB out of the loops prescribed by
    data stored in LIM_DATA structures associated with each statement.  Callback
    for walk_dominator_tree.  */
-- 
1.8.5.3
[PATCH] Fix memory leaks in sbitmap.c selftests
"make selftest-valgrind" shows a few leaks in sbitmap.c's selftests; this patch fixes them.

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu. OK for trunk?

gcc/ChangeLog:
        * sbitmap.c (selftest::test_set_range): Fix memory leaks.
        (selftest::test_bit_in_range): Likewise.
---
 gcc/sbitmap.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index cf46cb2..967868a 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -897,6 +897,7 @@ test_set_range ()
   bitmap_set_range (s, 15, 1);
   ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 1, 14));
   ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 15, 15));
+  sbitmap_free (s);
 
   s = sbitmap_alloc (1024);
   bitmap_clear (s);
@@ -914,6 +915,7 @@ test_set_range ()
   ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 512 + 64, 1023));
   ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512, 512));
   ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512 + 63, 512 + 63));
+  sbitmap_free (s);
 }
 
 /* Verify bitmap_bit_in_range_p functions for sbitmap.  */
@@ -935,6 +937,8 @@ test_bit_in_range ()
   ASSERT_TRUE (bitmap_bit_in_range_p (s, 100, 100));
   ASSERT_TRUE (bitmap_bit_p (s, 100));
 
+  sbitmap_free (s);
+
   s = sbitmap_alloc (64);
   bitmap_clear (s);
   bitmap_set_bit (s, 63);
@@ -942,6 +946,7 @@ test_bit_in_range ()
   ASSERT_TRUE (bitmap_bit_in_range_p (s, 1, 63));
   ASSERT_TRUE (bitmap_bit_in_range_p (s, 63, 63));
   ASSERT_TRUE (bitmap_bit_p (s, 63));
+  sbitmap_free (s);
 
   s = sbitmap_alloc (1024);
   bitmap_clear (s);
@@ -985,6 +990,7 @@ test_bit_in_range ()
   ASSERT_FALSE (bitmap_bit_in_range_p (s, 17, 31));
   ASSERT_FALSE (bitmap_bit_in_range_p (s, 49, 63));
   ASSERT_FALSE (bitmap_bit_in_range_p (s, 65, 1023));
+  sbitmap_free (s);
 }
 
 /* Run all of the selftests within this file.  */
-- 
1.8.5.3
[PATCH] Add more test coverage to selftest::test_location_wrappers
This patch adds a few extra assertions to selftest::test_location_wrappers.

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu. OK for trunk?

gcc/ChangeLog:
        * tree.c (selftest::test_location_wrappers): Add more test
        coverage.
---
 gcc/tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/tree.c b/gcc/tree.c
index b3e93b8..c5baf08 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -14490,6 +14490,8 @@ test_location_wrappers ()
 {
   location_t loc = BUILTINS_LOCATION;
 
+  ASSERT_EQ (NULL_TREE, maybe_wrap_with_location (NULL_TREE, loc));
+
   /* Wrapping a constant.  */
   tree int_cst = build_int_cst (integer_type_node, 42);
   ASSERT_FALSE (CAN_HAVE_LOCATION_P (int_cst));
@@ -14500,6 +14502,14 @@ test_location_wrappers ()
   ASSERT_EQ (loc, EXPR_LOCATION (wrapped_int_cst));
   ASSERT_EQ (int_cst, tree_strip_any_location_wrapper (wrapped_int_cst));
 
+  /* We shouldn't add wrapper nodes for UNKNOWN_LOCATION.  */
+  ASSERT_EQ (int_cst, maybe_wrap_with_location (int_cst, UNKNOWN_LOCATION));
+
+  /* We shouldn't add wrapper nodes for nodes that CAN_HAVE_LOCATION_P.  */
+  tree cast = build1 (NOP_EXPR, char_type_node, int_cst);
+  ASSERT_TRUE (CAN_HAVE_LOCATION_P (cast));
+  ASSERT_EQ (cast, maybe_wrap_with_location (cast, loc));
+
   /* Wrapping a STRING_CST.  */
   tree string_cst = build_string (4, "foo");
   ASSERT_FALSE (CAN_HAVE_LOCATION_P (string_cst));
-- 
1.8.5.3
Re: [PATCH] Fix branch probability computation in do_compare_rtx_and_jump (PR tree-optimization/83081)
Dne 2018-01-22 19:36, Jakub Jelinek napsal: On Mon, Jan 22, 2018 at 06:44:17PM +0100, Jan Hubicka wrote: > + /* Split *THIS (ORIG) probability into 2 probabilities, such that > + the returned one (FIRST) is *THIS * CPROB and *THIS is > + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND > + == ORIG. */ To make documentation clear, i would include the pseudocode with conditionals and individual probabilities. It is bit non-obvious transformation and it would be nice to reduce level of apparent magicness in profiling code :) So like this? 2018-01-22 Jakub Jelinek PR tree-optimization/83081 * profile-count.h (profile_probability::split): New method. * dojump.c (do_jump_1) : Use profile_probability::split. (do_compare_rtx_and_jump): Fix adjustment of probabilities when splitting a single conditional jump into 2. * gcc.dg/predict-8.c: Adjust expected probability. Looks great. Thanks a lot! Honza --- gcc/profile-count.h.jj 2018-01-19 19:41:52.910549618 +0100 +++ gcc/profile-count.h 2018-01-22 16:20:27.073096515 +0100 @@ -410,6 +410,30 @@ public: return *this; } + /* Split *THIS (ORIG) probability into 2 probabilities, such that + the returned one (FIRST) is *THIS * CPROB and *THIS is + adjusted (SECOND) so that FIRST + FIRST.invert () * SECOND + == ORIG. This is useful e.g. when splitting a conditional + branch like: + if (cond) + goto lab; // ORIG probability + into + if (cond1) + goto lab; // FIRST = ORIG * CPROB probability + if (cond2) + goto lab; // SECOND probability + such that the overall probability of jumping to lab remains + the same. CPROB gives the relative probability between the + branches. 
*/ + profile_probability split (const profile_probability &cprob) +{ + profile_probability ret = *this * cprob; + /* The following is equivalent to: + *this = cprob.invert () * *this / ret.invert (); */ + *this = (*this - ret) / ret.invert (); + return ret; +} + gcov_type apply (gcov_type val) const { if (*this == profile_probability::uninitialized ()) --- gcc/dojump.c.jj 2018-01-19 19:41:49.978548984 +0100 +++ gcc/dojump.c2018-01-22 16:31:43.867185428 +0100 @@ -347,13 +347,11 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -profile_probability false_prob = prob.invert (); -profile_probability op0_false_prob = false_prob.apply_scale (1, 2); - profile_probability op1_false_prob = false_prob.apply_scale (1, 2) - / op0_false_prob.invert (); + op1_prob = prob.invert (); + op0_prob = op1_prob.split (profile_probability::even ()); /* Get the probability that each jump below is true. */ -op0_prob = op0_false_prob.invert (); -op1_prob = op1_false_prob.invert (); + op0_prob = op0_prob.invert (); + op1_prob = op1_prob.invert (); } if (if_false_label == NULL) { @@ -380,8 +378,8 @@ do_jump_1 (enum tree_code code, tree op0 profile_probability op1_prob = profile_probability::uninitialized (); if (prob.initialized_p ()) { -op0_prob = prob.apply_scale (1, 2); -op1_prob = prob.apply_scale (1, 2) / op0_prob.invert (); + op1_prob = prob; + op0_prob = op1_prob.split (profile_probability::even ()); } if (if_true_label == NULL) { @@ -1120,16 +1118,27 @@ do_compare_rtx_and_jump (rtx op0, rtx op else { - profile_probability first_prob = prob; + profile_probability cprob + = profile_probability::guessed_always (); if (first_code == UNORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(1, 100); + cprob = cprob.apply_scale (1, 100); else if (first_code == ORDERED) - first_prob = profile_probability::guessed_always ().apply_scale -(99, 100); + cprob = cprob.apply_scale (99, 
100); + else + cprob = profile_probability::even (); + /* We want to split: +if (x) goto t; // prob; +into +if (a) goto t; // first_prob; +if (b) goto t; // prob; +such that the overall probability of jumping to t +remains the same and first_prob is prob * cprob. */ if (and_them) { rtx_code_label *dest_label; + prob = prob.invert (); + profile_probability first_prob = prob.split (cprob).invert ();
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas wrote:
> This patch has been triggered by Thomas's recent message to the list.
> Not only did I start work late relative to stage 3 but debugging took
> somewhat longer than anticipated. Therefore, to get this committed
> asap, we will have to beg the indulgence of the release managers and
> prompt review and/or testing by fortran maintainers. (Dominique has
> already undertaken to test -m32.)

I think that if we can "guarantee" that we're happy with the current ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case for committing it now. But if anybody is planning on doing some ABI-breaking work in the foreseeable future then maybe we should wait until GCC 9 stage1 opens. Anybody with such plans?

> The patch delivers rank up to 15 for F2008, the descriptor information
> needed to enact the F2018 C descriptor macros and an attribute field
> to store such information as pointer/allocatable, contiguous etc.
> Only the first has been enabled so far but it was necessary to submit
> the array descriptor changes now to avoid any further ABI breakage in
> 9.0.0.
>
> I took the design choice to replace the dtype with a structure:
> typedef struct dtype_type
> {
> size_t elem_len;
> int version;
> int rank;
> int type;
> int attribute;
> }
> dtype_type;
>
> This choice was intended to reduce the changes to a minimum, since in
> most references to the dtype, one dtype is assigned to another.

The only potential objection I can come up with is that if we at some point want to make the F2018 C interoperability descriptor the native descriptor type, then we have to mash those fields into the parent descriptor anyway. But I don't really think this is a very strong objection, as I don't think it's a huge difference in the amount of work whichever path is taken. If this is the easiest path for now, so be it.

> The F2018 interop defines the 'type' and 'attribute' fields to be signed
> char types. I used this initially but found that using int was the best
> way to silence the warnings about padding since it also allows for
> more attribute information to be carried.

To be pedantic, F2018 doesn't require that they be signed char, it's just what I wrote on https://gcc.gnu.org/wiki/ArrayDescriptorUpdate as *one* potential implementation of ISO_Fortran_binding.h. What the (draft) F2018 standard says is:

  CFI_attribute_t is a typedef name for a standard integer type capable of representing the values of the attribute codes.

  CFI_type_t is a typedef name for a standard integer type capable of representing the values for the supported type specifiers.

  CFI_rank_t is a typedef name for a standard integer type capable of representing the largest supported rank.

So using int is standard conforming. (elem_len and version are explicitly specified to be size_t and int, respectively)

> It should be noted that some of the intrinsics, which use switch/case
> for the type/kind selection, limit the effective element size that
> they handle to the maximum value of size_t, less 7 bits. A bit of
> straightforward work there would fix this limitation and would allow
> the GFC_DTYPE shifts and masks to be eliminated.

Hmm, is this a hidden ABI break, then? Are you saying that even after this patch we encode the type/kind into the elem_len field? If at some point we stop doing that and use the type field, it's an ABI break even though the descriptor layout doesn't change?

(I haven't had time to look into the patch itself, yet. If we decide to do this for GCC 8 I'll try to find some time.)

--
Janne Blomqvist
[PATCH] v2 -Warray-bounds: Fix false positive in some "switch" stmts (PR tree-optimization/83510)
On Fri, 2018-01-19 at 09:45 +0100, Richard Biener wrote: > On Fri, Jan 19, 2018 at 12:36 AM, David Malcolm > wrote: > > PR tree-optimization/83510 reports that r255649 (for > > PR tree-optimization/83312) introduced a false positive for > > -Warray-bounds for array accesses within certain switch statements: > > those for which value-ranges allow more than one case to be > > reachable, > > but for which one or more of the VR-unreachable cases contain > > out-of-range array accesses. > > > > In the reproducer, after the switch in f is inlined into g, we have > > 3 cases > > for the switch (case 9, case 10-19, and default), within a loop > > that > > ranges from 0..9. > > > > With both the old and new code, > > vr_values::simplify_switch_using_ranges clears > > the EDGE_EXECUTABLE flag on the edge to the "case 10-19" > > block. This > > happens during the dom walk within the substitute_and_fold_engine. > > > > With the old code, the clearing of that EDGE_EXECUTABLE flag led to > > the > > /* Skip blocks that were found to be unreachable. */ > > code in the old implementation of vrp_prop::check_all_array_refs > > skipping > > the "case 10-19" block. > > > > With the new code, we have a second dom walk, and that dom_walker's > > ctor > > sets all edges to be EDGE_EXECUTABLE, losing that information. > > > > Then, dom_walker::before_dom_children (here, the subclass' > > check_array_bounds_dom_walker::before_dom_children) can return one > > edge, if > > there's a unique successor edge, and dom_walker::walk filters the > > dom walk > > to just that edge. > > > > Here we have two VR-valid edges (case 9 and default), and an VR- > > invalid > > successor edge (case 10-19). There's no *unique* valid successor > > edge, > > and hence taken_edge is NULL, and the filtering in dom_walker::walk > > doesn't fire. > > > > Hence we've lost the filtering of the "case 10-19" BB, hence the > > false > > positive. 
> >
> > The issue is that we have two dom walks: first within vr_values'
> > substitute_and_fold_dom_walker (which has skip_unreachable_blocks == false),
> > then another within vrp_prop::check_all_array_refs (with
> > skip_unreachable_blocks == true).
> >
> > Each has different "knowledge" about ruling out edges due to value-ranges,
> > but we aren't combining that information. The former "knows" about
> > out-edges at a particular control construct (e.g. at a switch), the latter
> > "knows" about dominance, but only about unique successors (hence the
> > problem when two out of three switch cases are valid).
> >
> > This patch combines the information by preserving the EDGE_EXECUTABLE
> > flags from the first dom walk, and using it in the second dom walk,
> > potentially rejecting additional edges.
> >
> > Doing so fixes the false positive.
> >
> > I attempted an alternative fix, merging the two dom walks into one, but
> > that led to crashes in identify_jump_threads, so I went with this, as
> > a less invasive fix.
> >
> > Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu.
> > OK for trunk?
>
> Ok, but I think you need to update the domwalk construction in
> graphite-scop-detection.c as well - did you test w/o graphite?

Thanks. I did test with graphite enabled; the use of default args meant I didn't have to touch that code.

That said, I've become unhappy with the two bool params (both defaulting to false) for dom_walker's ctor, especially given that they're really a tristate.

So here's an alternative version of the patch, which, rather than adding a new bool, instead introduces a 3-valued enum to explicitly capture the valid possibilities. Also, having it as an enum rather than booleans makes the meaning clearer, and makes the places that override the default more obvious. (This version of the patch *did* require touching that graphite file).

> grep might be your friend...
(and indeed, with an enum, it's even more grep-friendly)

Successfully bootstrapped and regression-tested on x86_64-pc-linux-gnu (with graphite). OK for trunk? (or did you prefer the earlier patch?)

> Thanks,
> Richard.

[...snip...]

gcc/ChangeLog:
        PR tree-optimization/83510
        * domwalk.c (set_all_edges_as_executable): New function.
        (dom_walker::dom_walker): Convert bool param
        "skip_unreachable_blocks" to enum reachability. Move setup of
        edge flags to set_all_edges_as_executable and only do it when
        reachability is REACHABLE_BLOCKS.
        * domwalk.h (enum dom_walker::reachability): New enum.
        (dom_walker::dom_walker): Convert bool param
        "skip_unreachable_blocks" to enum reachability.
        (set_all_edges_as_executable): New decl.
        * graphite-scop-detection.c (gather_bbs::gather_bbs): Convert
        from false for "skip_unreachable_blocks" to ALL_BLOCKS for
        "reachability".
        * tree-ssa-dom.c (dom_opt_dom_walker::dom_opt_dom_walker):
        Likewise, but converting true to REACHABLE_BLOCKS.
        * tree-ssa-
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On 22.01.2018 at 20:59, Janne Blomqvist wrote:
> On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas wrote:
>> This patch has been triggered by Thomas's recent message to the list.
>> Not only did I start work late relative to stage 3 but debugging took
>> somewhat longer than anticipated. Therefore, to get this committed
>> asap, we will have to beg the indulgence of the release managers and
>> prompt review and/or testing by fortran maintainers. (Dominique has
>> already undertaken to test -m32.)
>
> I think that if we can "guarantee" that we're happy with the current
> ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case
> for committing it now. But if anybody is planning on doing some
> ABI-breaking work in the foreseeable future then maybe we should wait
> until GCC 9 stage1 opens. Anybody with such plans?

For asynchronous I/O, we could add a pointer to void (unused at present) for later use. That's all from my side.

>> It should be noted that some of the intrinsics, which use switch/case
>> for the type/kind selection, limit the effective element size that
>> they handle to the maximum value of size_t, less 7 bits. A bit of
>> straightforward work there would fix this limitation and would allow
>> the GFC_DTYPE shifts and masks to be eliminated.
>
> Hmm, is this a hidden ABI break, then?

No. This concerns code like

  type_size = GFC_DTYPE_TYPE_SIZE(array);
  switch(type_size)
    {
    case GFC_DTYPE_LOGICAL_1:
    case GFC_DTYPE_INTEGER_1:
    case GFC_DTYPE_DERIVED_1:
      pack_i1 ((gfc_array_i1 *) ret, (gfc_array_i1 *) array,
               (gfc_array_l1 *) mask, (gfc_array_i1 *) vector);
      return;

for example in pack_generic.c. I think that, if we commit Paul's patch now, we can then fix these cases before gcc 8 is released. This is rather straightforward.

Regards

        Thomas
Re: [PATCH] Finish removing class move_computations_dom_walker
On January 22, 2018 8:50:44 PM GMT+01:00, David Malcolm wrote: >r232820 (aka 2c7b2f8860794cc9b9cf5eeea9d7dc109c0de3be) removed the >implementation of class move_computations_dom_walker, but kept the >decl. > >This patch removes the stray decl. > >Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. >OK for trunk? OK. Richard. >gcc/ChangeLog: > PR tree-optimization/69452 > * tree-ssa-loop-im.c (class move_computations_dom_walker): Remove >decl. >--- > gcc/tree-ssa-loop-im.c | 11 --- > 1 file changed, 11 deletions(-) > >diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c >index 7864fbd..4655766 100644 >--- a/gcc/tree-ssa-loop-im.c >+++ b/gcc/tree-ssa-loop-im.c >@@ -1081,17 +1081,6 @@ invariantness_dom_walker::before_dom_children >(basic_block bb) > return NULL; > } > >-class move_computations_dom_walker : public dom_walker >-{ >-public: >- move_computations_dom_walker (cdi_direction direction) >-: dom_walker (direction), todo_ (0) {} >- >- virtual edge before_dom_children (basic_block); >- >- unsigned int todo_; >-}; >- >/* Hoist the statements in basic block BB out of the loops prescribed >by >data stored in LIM_DATA structures associated with each statement. >Callback >for walk_dominator_tree. */
Re: [wwwdocs] Add GCC 7.3 section
On Mon, 22 Jan 2018, Sebastian Huber wrote: >> +GCC 7.3 >> + >> +This is the > href="https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED&resolution=FIXED&target_milestone=7.3">list >> +of problem reports (PRs) from GCC's bug tracking system that are >> +known to be fixed in the 7.3 release. This list might not be >> +complete (that is, it is possible that some PRs that have been fixed >> +are not listed here). Yes, thanks. (I would have been fine for you to declare this obvious and commit. ;-) >> +Operating Systems >> + >> +RTEMS >> + >> + Support for EPIPHANY has been added. >> + Now, this doesn't look like adding a GCC 7.3 section? ;-) Fine, too, though perhaps explain what EPIPHANY is? As in "Support for the EPIPHANY whatever-it-is has been added." Thanks, Gerald
Re: [C++ PATCH] Fix ICE in potential_constant_expression_1 (PR c++/83918)
On Thu, Jan 18, 2018 at 6:13 PM, Jakub Jelinek wrote: > Before location wrappers were introduced, the potential_constant_expression_1 > assumption that a VIEW_CONVERT_EXPR must have non-NULL type was right, > if it is a type dependent reinterpret cast, it would be > REINTERPRET_CAST_EXPR instead. > > Location wrappers around decls can have NULL type, if the decl they wrap > also has NULL type. Hmm, why would a decl have NULL type? Are we wrapping a CONST_DECL in VIEW_CONVERT_EXPR? They should get NON_LVALUE_EXPR, since they aren't lvalues (except the objc++ ones that have TREE_STATIC set). > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Alternatively it could if (TREE_TYPE (t) && ... > > On a related note, I wonder if the > return (RECUR (from, TREE_CODE (t) != VIEW_CONVERT_EXPR)); > a few lines below is correct, VCE isn't always an lvalue, only if it is used > in non-rvalue contexts. So, shouldn't it be instead: > return (RECUR (from, (TREE_CODE (t) == VIEW_CONVERT_EXPR > ? want_rval : rval))); A VCE should always be an lvalue. If it's used in an rvalue context, then it's an lvalue used in an rvalue context and so the larger expression is an rvalue, but the VCE is still an lvalue. Jason
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
Hi, On 22/01/2018 16:52, Jason Merrill wrote: >> Thus the below, carefully tested over the weekend, would be a change completely removing the problematic error_at call, plus some testcases checking that we would still do the right thing in a few cases (the bug submitter added constexpr-83921-2.C). The updated stmtexpr20.C shows that we would reject anyway a statement expression using an uninitialized inner, simply because the whole initializer would still be flagged as non const. > Do we still diagnose it without the use of the variable? Ah, I missed that case. The updated stmtexpr20.C and the additional stmtexpr21.C reflect that. > If not, how about adjusting check_for_uninitialized_const_var to support calling it from potential_constant_expression_1? I suppose we'd need to add a couple of parameters, one complain parm and another to indicate that the variable is in a constexpr context, and then return something. Ok. The below passes the C++ testsuite and I'm finishing testing it. Therefore, as you already hinted to, we can now say that what was *really* missing from potential_constant_expression_1 was the use of default_init_uninitialized_part, which does all the non-trivial work besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. check_for_uninitialized_const_var also provides the informs, which were completely missing. Thanks! Paolo.
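For reference, the regression being fixed reduces to something like the following (an illustrative sketch, not the submitter's exact testcase): a local of empty class type inside a constexpr function leaves nothing uninitialized, so it must be accepted.

```cpp
#include <cassert>

struct Empty {};

// An empty aggregate has no data members, so a declaration without an
// initializer leaves nothing that could later be read uninitialized --
// the case default_init_uninitialized_part reports by returning null.
constexpr int f ()
{
  Empty e;   // PR c++/83921: GCC wrongly rejected this
  return 0;
}

static_assert (f () == 0, "empty aggregate is usable in constexpr");
```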
Index: cp/constexpr.c === --- cp/constexpr.c (revision 256961) +++ cp/constexpr.c (working copy) @@ -5707,13 +5707,9 @@ potential_constant_expression_1 (tree t, bool want "% in % context", tmp); return false; } - else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp)) - { - if (flags & tf_error) - error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized " - "variable %qD in % context", tmp); - return false; - } + else if (!check_for_uninitialized_const_var + (tmp, /*constexpr_context_p=*/true, flags)) + return false; } return RECUR (tmp, want_rval); Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 256961) +++ cp/cp-tree.h(working copy) @@ -6221,6 +6221,7 @@ extern tree finish_case_label (location_t, tree, extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t); extern bool check_array_designated_initializer (constructor_elt *, unsigned HOST_WIDE_INT); +extern bool check_for_uninitialized_const_var (tree, bool, tsubst_flags_t); /* in decl2.c */ extern void record_mangling(tree, bool); Index: cp/decl.c === --- cp/decl.c (revision 256961) +++ cp/decl.c (working copy) @@ -72,7 +72,6 @@ static int check_static_variable_definition (tree, static void record_unknown_type (tree, const char *); static tree builtin_function_1 (tree, tree, bool); static int member_function_or_else (tree, tree, enum overload_flags); -static void check_for_uninitialized_const_var (tree); static tree local_variable_p_walkfn (tree *, int *, void *); static const char *tag_name (enum tag_types); static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool); @@ -5543,10 +5542,14 @@ maybe_commonize_var (tree decl) } } -/* Issue an error message if DECL is an uninitialized const variable. */ +/* Issue an error message if DECL is an uninitialized const variable. + CONSTEXPR_CONTEXT_P is true when the function is called in a constexpr + context from potential_constant_expression. Returns true if all is well, + false otherwise. 
*/ -static void -check_for_uninitialized_const_var (tree decl) +bool +check_for_uninitialized_const_var (tree decl, bool constexpr_context_p, + tsubst_flags_t complain) { tree type = strip_array_types (TREE_TYPE (decl)); @@ -,26 +5558,36 @@ maybe_commonize_var (tree decl) 7.1.6 */ if (VAR_P (decl) && TREE_CODE (type) != REFERENCE_TYPE - && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) + && (CP_TYPE_CONST_P (type) + || (!constexpr_context_p && var_in_constexpr_fn (decl)) + || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))) && !DECL_INITIAL (decl)) { tree field = default_init_uninitialized_part (type); if (!field) - return; + return true; if (CP_TYPE_CONST_P (type)) - permerror (DECL_SOURCE_LOCATION (decl), - "uninitialized const %qD", decl); - else { - if (!is_instantiation_of_constexpr (current_function_decl)) + if (complain & tf_error) + permerror (DECL_SOURCE_LOCATION (decl), +
Re: [C++ Patch] PR 83921 ("[7/8 Regression] GCC rejects constexpr initialization of empty aggregate")
Hi again, On 22/01/2018 22:50, Paolo Carlini wrote: > Ok. The below passes the C++ testsuite and I'm finishing testing it. Therefore, as you already hinted to, we can now say that what was *really* missing from potential_constant_expression_1 was the use of default_init_uninitialized_part, which does all the non-trivial work besides the later !DECL_NONTRIVIALLY_INITIALIZED_P check. check_for_uninitialized_const_var also provides the informs, which were completely missing. Grrr. Testing the library immediately revealed a failure in 18_support/byte/ops.cc: with constexpr_context_p == true, i.e. when called from potential_constant_expression_1, the CP_TYPE_CONST_P case triggers. I guess we really want to keep the existing constexpr_context_p == false cases separate. I'm therefore restarting testing with the below. Paolo. /// Index: cp/constexpr.c === --- cp/constexpr.c (revision 256961) +++ cp/constexpr.c (working copy) @@ -5707,13 +5707,9 @@ potential_constant_expression_1 (tree t, bool want "% in % context", tmp); return false; } - else if (!DECL_NONTRIVIALLY_INITIALIZED_P (tmp)) - { - if (flags & tf_error) - error_at (DECL_SOURCE_LOCATION (tmp), "uninitialized " - "variable %qD in % context", tmp); - return false; - } + else if (!check_for_uninitialized_const_var + (tmp, /*constexpr_context_p=*/true, flags)) + return false; } return RECUR (tmp, want_rval); Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 256961) +++ cp/cp-tree.h(working copy) @@ -6221,6 +6221,7 @@ extern tree finish_case_label (location_t, tree, extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t); extern bool check_array_designated_initializer (constructor_elt *, unsigned HOST_WIDE_INT); +extern bool check_for_uninitialized_const_var (tree, bool, tsubst_flags_t); /* in decl2.c */ extern void record_mangling(tree, bool); Index: cp/decl.c === --- cp/decl.c (revision 256961) +++ cp/decl.c (working copy) @@ -72,7 +72,6 @@ static int check_static_variable_definition (tree, static void
record_unknown_type (tree, const char *); static tree builtin_function_1 (tree, tree, bool); static int member_function_or_else (tree, tree, enum overload_flags); -static void check_for_uninitialized_const_var (tree); static tree local_variable_p_walkfn (tree *, int *, void *); static const char *tag_name (enum tag_types); static tree lookup_and_check_tag (enum tag_types, tree, tag_scope, bool); @@ -5543,10 +5542,14 @@ maybe_commonize_var (tree decl) } } -/* Issue an error message if DECL is an uninitialized const variable. */ +/* Issue an error message if DECL is an uninitialized const variable. + CONSTEXPR_CONTEXT_P is true when the function is called in a constexpr + context from potential_constant_expression. Returns true if all is well, + false otherwise. */ -static void -check_for_uninitialized_const_var (tree decl) +bool +check_for_uninitialized_const_var (tree decl, bool constexpr_context_p, + tsubst_flags_t complain) { tree type = strip_array_types (TREE_TYPE (decl)); @@ -,26 +5558,39 @@ maybe_commonize_var (tree decl) 7.1.6 */ if (VAR_P (decl) && TREE_CODE (type) != REFERENCE_TYPE - && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl)) + && ((!constexpr_context_p + && (CP_TYPE_CONST_P (type) || var_in_constexpr_fn (decl))) + || (constexpr_context_p && !DECL_NONTRIVIALLY_INITIALIZED_P (decl))) && !DECL_INITIAL (decl)) { tree field = default_init_uninitialized_part (type); if (!field) - return; + return true; - if (CP_TYPE_CONST_P (type)) - permerror (DECL_SOURCE_LOCATION (decl), - "uninitialized const %qD", decl); - else + if (!constexpr_context_p) { - if (!is_instantiation_of_constexpr (current_function_decl)) - error_at (DECL_SOURCE_LOCATION (decl), - "uninitialized variable %qD in % function", - decl); - cp_function_chain->invalid_constexpr = true; + if (CP_TYPE_CONST_P (type)) + { + if (complain & tf_error) + permerror (DECL_SOURCE_LOCATION (decl), + "uninitialized const %qD", decl); + } + else + { + if (!is_instantiation_of_constexpr 
(current_function_decl) + && (complain & tf_error)) + error_at (DECL_SOUR
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On Mon, Jan 22, 2018 at 09:59:41PM +0200, Janne Blomqvist wrote: > On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas > wrote: > > This patch has been triggered by Thomas's recent message to the list. > > Not only did I start work late relative to stage 3 but debugging took > > somewhat longer than anticipated. Therefore, to get this committed > > asap, we will have to beg the indulgence of the release managers and > > prompt review and/or testing by fortran maintainers. (Dominique has > > already undertaken to test -m32.) > > I think that if we can "guarantee" that we're happy with the current > ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case > for committing it now. But if anybody if planning on doing some > ABI-breaking work in the foreseeable future then maybe we should wait > until GCC 9 stage1 opens. Anybody with such plans? > I don't have any plans to break the ABI in 9.0. OTOH, if the patch isn't included in 8.0, then we know that the ABI will be broken for 9.0. I think we should shoot for stabilizing the ABI by including this in 8.0. -- Steve
C++ PATCH for c++/83720, ICE with lambda and LTO
In this testcase, determine_visibility was relying on the template argument handling to constrain the lambda in h to internal linkage. In the new lambda model, the local lambda has no template info, so that was breaking. So we shouldn't rely on the decl itself to trigger checking the containing function template arguments. I initially just took the internal linkage from the containing function and cleared template_decl, but that seems like it could break -fvisibility-inlines-hidden. Tested x86_64-pc-linux-gnu, applying to trunk. 2018-01-22 Jason Merrill PR c++/83720 * decl2.c (determine_visibility): Fix template_decl handling for local lambda. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index a2b2e2892b8..ef7e6de41c3 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -2416,7 +2416,7 @@ determine_visibility (tree decl) but have no TEMPLATE_INFO. Their containing template function does, and the local class could be constrained by that. */ - if (template_decl) + if (DECL_LANG_SPECIFIC (fn) && DECL_USE_TEMPLATE (fn)) template_decl = fn; } else if (VAR_P (decl) && DECL_TINFO_P (decl) diff --git a/gcc/testsuite/g++.dg/lto/pr83720_0.C b/gcc/testsuite/g++.dg/lto/pr83720_0.C new file mode 100644 index 000..4e63c9be7cd --- /dev/null +++ b/gcc/testsuite/g++.dg/lto/pr83720_0.C @@ -0,0 +1,55 @@ +// PR c++/83720 +// { dg-lto-do assemble } + +#pragma GCC diagnostic ignored "-Wreturn-type" + +namespace b { +class h { +public: + template h(ae af::*...) { +[] {}; + } +}; +class ai {}; +template class c { +public: + template void aj(const char *, ag f) { h(f, int()); } +}; +} +template class al; +template class i { +protected: + static e g(const int) { } +}; +template class j; +template +class j : i { + typedef i ap; + +public: + static an aq(const int &ar, ao... 
as) { ap::g(ar)(as...); } +}; +template class al { + template using ax = a; + +public: + template , typename = ax> + al(e); + using ay = an (*)(const int &, ao...); + ay az; +}; +template +template +al::al(e) { + az = j::aq; +} +class k { +public: + k(al); +} d([](b::ai) { + struct be { +virtual void f(); + }; + struct bf; + b::c().aj("", &be::f); +});
[committed] Fix ICE with -ftree-parallelize-loops=2 --param parloops-schedule=dynamic (PR tree-optimization/83957)
Hi! The ompexpssa pass will do TODO_update_ssa_only_virtuals, so we don't need to handle vops in detail. In this case the inner_phi is NULL for the vop and we crash on it. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2018-01-22 Jakub Jelinek PR tree-optimization/83957 * omp-expand.c (expand_omp_for_generic): Ignore virtual PHIs. Remove semicolon after for body surrounded by braces. * gcc.dg/autopar/pr83957.c: New test. --- gcc/omp-expand.c.jj 2018-01-15 22:46:52.058226631 +0100 +++ gcc/omp-expand.c2018-01-22 13:20:50.614656137 +0100 @@ -3156,6 +3156,9 @@ expand_omp_for_generic (struct omp_regio gphi *nphi; gphi *exit_phi = psi.phi (); + if (virtual_operand_p (gimple_phi_result (exit_phi))) + continue; + edge l2_to_l3 = find_edge (l2_bb, l3_bb); tree exit_res = PHI_ARG_DEF_FROM_EDGE (exit_phi, l2_to_l3); @@ -3178,7 +3181,7 @@ expand_omp_for_generic (struct omp_regio add_phi_arg (nphi, exit_res, l2_to_l0, UNKNOWN_LOCATION); add_phi_arg (inner_phi, new_res, l0_to_l1, UNKNOWN_LOCATION); - }; + } } set_immediate_dominator (CDI_DOMINATORS, l2_bb, --- gcc/testsuite/gcc.dg/autopar/pr83957.c.jj 2018-01-22 13:32:32.841783616 +0100 +++ gcc/testsuite/gcc.dg/autopar/pr83957.c 2018-01-22 13:22:49.525114497 +0100 @@ -0,0 +1,11 @@ +/* PR tree-optimization/83957 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-dce --param parloops-schedule=dynamic" } */ + +void +foo (int *x, int y) +{ + if (y < 0) +for (; y < 1; ++y) + x = &y; +} Jakub
[C++ PATCH] Fix a structured binding ICE (PR c++/83958)
Hi! I've recently added the complete_type call in case the structured binding is a reference to a type that needs to be instantiated. This testcase shows a problem where it can't be instantiated and we ICE because we don't expect an incomplete type. If decl isn't a reference, cp_finish_decl for it should handle everything. I've considered using complete_type_or_else, but we aren't checking that decl's type is complete, but rather that the type it refers to is complete, and furthermore not sure if it is very nice to print in the diagnostics. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-01-22 Jakub Jelinek PR c++/83958 * decl.c (cp_finish_decomp): Diagnose if reference structure binding refers to incomplete type. * g++.dg/cpp1z/decomp35.C: New test. --- gcc/cp/decl.c.jj2018-01-19 23:34:36.634278658 +0100 +++ gcc/cp/decl.c 2018-01-22 12:16:39.316524652 +0100 @@ -7433,6 +7433,12 @@ cp_finish_decomp (tree decl, tree first, type = complete_type (TREE_TYPE (type)); if (type == error_mark_node) goto error_out; + if (!COMPLETE_TYPE_P (type)) + { + error_at (loc, "structured binding refers to incomplete type %qT", + type); + goto error_out; + } } tree eltype = NULL_TREE; --- gcc/testsuite/g++.dg/cpp1z/decomp35.C.jj2018-01-22 12:18:19.421399364 +0100 +++ gcc/testsuite/g++.dg/cpp1z/decomp35.C 2018-01-22 12:22:28.243046089 +0100 @@ -0,0 +1,35 @@ +// PR c++/83958 +// { dg-do compile { target c++11 } } +// { dg-options "" } + +template struct A; +class B; +template > class C; +template struct D; +template +struct E { + using X = W; + X operator* (); + T operator++ (); + template + bool operator!= (E); +}; +template +struct F { + class G; + using H = D; + using I = E; + class G : public I {}; + G begin (); + G end (); +}; +template struct C : F { + using J = F; + using J::begin; + using J::end; +}; +using K = class L; +struct M { + void foo () { for (auto & [ a ] : m) {} }// { dg-error "incomplete type" } + C m; // { dg-warning "only available with" "" 
{ target c++14_down } .-1 } +}; Jakub
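For readers unfamiliar with the construct, a minimal well-formed reference structured binding looks like this (an illustrative sketch; the patch above diagnoses the variant where the type being decomposed cannot be completed):

```cpp
#include <cassert>

struct S { int a; int b; };

// auto& [x, y] binds x and y as names for s.a and s.b; this requires S
// to be a complete type, which is what cp_finish_decomp now checks.
int demo ()
{
  S s { 1, 2 };
  auto & [x, y] = s;
  x = 42;            // writes through the binding into s.a
  return s.a;
}
```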
Re: [C++ PATCH] Fix ICE in potential_constant_expression_1 (PR c++/83918)
On Mon, Jan 22, 2018 at 04:49:34PM -0500, Jason Merrill wrote: > On Thu, Jan 18, 2018 at 6:13 PM, Jakub Jelinek wrote: > > Before location wrappers were introduced, the > > potential_constant_expression_1 > > assumption that a VIEW_CONVERT_EXPR must have non-NULL type was right, > > if it is a type dependent reinterpret cast, it would be > > REINTERPRET_CAST_EXPR instead. > > > > Location wrappers around decls can have NULL type, if the decl they wrap > > also has NULL type. > > Hmm, why would a decl have NULL type? Are we wrapping a CONST_DECL in > VIEW_CONVERT_EXPR? They should get NON_LVALUE_EXPR, since they aren't > lvalues (except the objc++ ones that have TREE_STATIC set). So, do you want following instead, if it passes bootstrap/regtest? 2018-01-22 Jakub Jelinek PR c++/83918 * tree.c (maybe_wrap_with_location): Use NON_LVALUE_EXPR rather than VIEW_CONVERT_EXPR to wrap CONST_DECLs. * g++.dg/cpp1z/pr83918.C: New test. --- gcc/tree.c.jj 2018-01-16 16:16:37.124694238 +0100 +++ gcc/tree.c 2018-01-22 23:53:58.171837029 +0100 @@ -14085,8 +14085,10 @@ maybe_wrap_with_location (tree expr, loc if (EXCEPTIONAL_CLASS_P (expr)) return expr; - tree_code code = (CONSTANT_CLASS_P (expr) && TREE_CODE (expr) != STRING_CST - ? NON_LVALUE_EXPR : VIEW_CONVERT_EXPR); + tree_code code += (((CONSTANT_CLASS_P (expr) && TREE_CODE (expr) != STRING_CST) + || (TREE_CODE (expr) == CONST_DECL && !TREE_STATIC (expr))) + ? NON_LVALUE_EXPR : VIEW_CONVERT_EXPR); tree wrapper = build1_loc (loc, code, TREE_TYPE (expr), expr); /* Mark this node as being a wrapper. */ EXPR_LOCATION_WRAPPER_P (wrapper) = 1; --- gcc/testsuite/g++.dg/cpp1z/pr83918.C.jj 2018-01-22 23:54:40.824824960 +0100 +++ gcc/testsuite/g++.dg/cpp1z/pr83918.C2018-01-22 23:54:40.824824960 +0100 @@ -0,0 +1,32 @@ +// PR c++/83918 +// { dg-do compile } +// { dg-options "-std=c++17" } + +constexpr unsigned +foo (unsigned x, unsigned y) +{ + return x > y ? 
x : y; +} + +template struct A; +template struct B; +template +struct A , B > +{ + enum : unsigned + { +u = foo (sizeof (S), sizeof (U)), +v = A , B >::w, +w = foo (u, v) + }; +}; + +template <> +struct A , B <>> +{ + enum : unsigned { w = 0 }; +}; + +constexpr static const auto v { A , + B <9,8,7,6,5,4,3,2,1>>::w }; +static_assert (v == sizeof (int)); Jakub
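The distinction the patch encodes exists at the language level too: an enumerator (a CONST_DECL in GCC's IR) is a prvalue, so its location wrapper must not present it as an lvalue. A standalone illustration (ordinary C++, not GCC-internal code):

```cpp
#include <cassert>
#include <type_traits>

enum E { X = 3 };

// An enumerator has no storage and no address; using it yields a prvalue,
// so decltype((X)) is plain E, not E&.  Taking &X would be ill-formed.
static_assert (!std::is_lvalue_reference<decltype((X))>::value,
               "enumerators are prvalues");
static_assert (X == 3, "but they are usable as constant expressions");
```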
Re: [PATCH, rev2], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble
On Mon, Jan 22, 2018 at 12:32:13PM -0600, Segher Boessenkool wrote: > Hi Mike, > > If it is needed on the branches, okay for there too (once 7 reopens). It is needed on GCC 7 if you compile the test for -mcpu=power9 and use an explicit __float128 instead of long double (-mabi=ieeelongdouble is not really supported in GCC 7). I'll check GCC 6 as well. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [C++ PATCH] Fix ICE in potential_constant_expression_1 (PR c++/83918)
On Mon, Jan 22, 2018 at 6:14 PM, Jakub Jelinek wrote: > On Mon, Jan 22, 2018 at 04:49:34PM -0500, Jason Merrill wrote: >> On Thu, Jan 18, 2018 at 6:13 PM, Jakub Jelinek wrote: >> > Before location wrappers were introduced, the >> > potential_constant_expression_1 >> > assumption that a VIEW_CONVERT_EXPR must have non-NULL type was right, >> > if it is a type dependent reinterpret cast, it would be >> > REINTERPRET_CAST_EXPR instead. >> > >> > Location wrappers around decls can have NULL type, if the decl they wrap >> > also has NULL type. >> >> Hmm, why would a decl have NULL type? Are we wrapping a CONST_DECL in >> VIEW_CONVERT_EXPR? They should get NON_LVALUE_EXPR, since they aren't >> lvalues (except the objc++ ones that have TREE_STATIC set). > > So, do you want following instead, if it passes bootstrap/regtest? Looks good. Jason
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
On 01/22/2018 12:24 PM, Thomas Koenig wrote: > Am 22.01.2018 um 20:59 schrieb Janne Blomqvist: >> On Mon, Jan 22, 2018 at 9:12 PM, Paul Richard Thomas >> wrote: >>> This patch has been triggered by Thomas's recent message to the list. >>> Not only did I start work late relative to stage 3 but debugging took >>> somewhat longer than anticipated. Therefore, to get this committed >>> asap, we will have to beg the indulgence of the release managers and >>> prompt review and/or testing by fortran maintainers. (Dominique has >>> already undertaken to test -m32.) >> >> I think that if we can "guarantee" that we're happy with the current >> ABI for GCC 9 (and hopefully 10, 11, ...?) we have a quite strong case >> for committing it now. But if anybody if planning on doing some >> ABI-breaking work in the foreseeable future then maybe we should wait >> until GCC 9 stage1 opens. Anybody with such plans? > > For asynchronous I/O, we could add a pointer to void (unused at > present) for later use. That's all from my side. > Do you mean adding something like this: diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h index 4c643b7e17b..c86e0b45e1d 100644 --- a/libgfortran/libgfortran.h +++ b/libgfortran/libgfortran.h @@ -600,6 +600,7 @@ typedef struct st_parameter_common GFC_INTEGER_4 line; CHARACTER2 (iomsg); GFC_INTEGER_4 *iostat; + void *reserved; } st_parameter_common; and the corresponding front end side. > snip -- Regarding Paul's patch, assuming the -m32 tests can be confirmed I think the patch should go in now. Jerry
Re: [PATCH, fortran] Support Fortran 2018 teams
Is Fortran 2018 teams patch ok for trunk? Damian On January 19, 2018 at 2:47:39 PM, Alessandro Fanfarillo (elfa...@ucar.edu) wrote: I can confirm that the little change suggested by Steve passes the regtests (on x86_64-pc-linux-gnu) and the regular tests using OpenCoarrays. On Fri, Jan 19, 2018 at 10:33 AM, Steve Kargl wrote: > On Fri, Jan 19, 2018 at 09:18:14AM -0800, Damian Rouson wrote: >> Thanks for catching that, Steve, and for responding, Alessandro. >> >> Anything else? >> > > I've only just started to look at the patch. Unfortunately, > I know zero about teams, so need to read the patch and F2018 > standard simultaneously. > > -- > Steve -- Alessandro Fanfarillo, Ph.D. Postdoctoral Researcher National Center for Atmospheric Research Mesa Lab, Boulder, CO, USA 303-497-1229
libgo patch committed: Add buildid support for AIX
This patch by Tony Reix adds buildid support to the go tool for AIX. This supports caching of build results. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 256877) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -87525458bcd5ab4beb5b95e7d58e3dfdbc1bd478 +3488a401e50835de5de5c4f153772ac2798d0e71 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/cmd/go/internal/work/buildid.go === --- libgo/go/cmd/go/internal/work/buildid.go(revision 256794) +++ libgo/go/cmd/go/internal/work/buildid.go(working copy) @@ -330,6 +330,43 @@ func (b *Builder) gccgoBuildIDELFFile(a return sfile, nil } +// gccgoBuildIDXCOFFFile creates an assembler file that records the +// action's build ID in a CSECT (AIX linker deletes CSECTs that are +// not referenced in the output file). +func (b *Builder) gccgoBuildIDXCOFFFile(a *Action) (string, error) { + sfile := a.Objdir + "_buildid.s" + + var buf bytes.Buffer + fmt.Fprintf(&buf, "\t.csect .go.buildid[XO]\n") + fmt.Fprintf(&buf, "\t.byte ") + for i := 0; i < len(a.buildID); i++ { + if i > 0 { + if i%8 == 0 { + fmt.Fprintf(&buf, "\n\t.byte ") + } else { + fmt.Fprintf(&buf, ",") + } + } + fmt.Fprintf(&buf, "%#02x", a.buildID[i]) + } + fmt.Fprintf(&buf, "\n") + + if cfg.BuildN || cfg.BuildX { + for _, line := range bytes.Split(buf.Bytes(), []byte("\n")) { + b.Showcmd("", "echo '%s' >> %s", line, sfile) + } + if cfg.BuildN { + return sfile, nil + } + } + + if err := ioutil.WriteFile(sfile, buf.Bytes(), 0666); err != nil { + return "", err + } + + return sfile, nil +} + // buildID returns the build ID found in the given file. // If no build ID is found, buildID returns the content hash of the file. 
func (b *Builder) buildID(file string) string { Index: libgo/go/cmd/go/internal/work/exec.go === --- libgo/go/cmd/go/internal/work/exec.go (revision 256873) +++ libgo/go/cmd/go/internal/work/exec.go (working copy) @@ -637,6 +637,16 @@ func (b *Builder) build(a *Action) (err return err } objects = append(objects, ofiles...) + case "aix": + asmfile, err := b.gccgoBuildIDXCOFFFile(a) + if err != nil { + return err + } + ofiles, err := BuildToolchain.asm(b, a, []string{asmfile}) + if err != nil { + return err + } + objects = append(objects, ofiles...) } } Index: libgo/go/cmd/internal/buildid/buildid.go === --- libgo/go/cmd/internal/buildid/buildid.go(revision 256593) +++ libgo/go/cmd/internal/buildid/buildid.go(working copy) @@ -7,6 +7,7 @@ package buildid import ( "bytes" "debug/elf" + "debug/xcoff" "fmt" "io" "os" @@ -40,6 +41,9 @@ func ReadFile(name string) (id string, e return "", err } if string(buf) != "!\n" { + if string(buf) == "\n" { + return readGccgoBigArchive(name, f) + } return readBinary(name, f) } @@ -156,6 +160,85 @@ func readGccgoArchive(name string, f *os } } } + +// readGccgoBigArchive tries to parse the archive as an AIX big +// archive file, and fetch the build ID from the _buildid.o entry. +// The _buildid.o entry is written by (*Builder).gccgoBuildIDXCOFFFile +// in cmd/go/internal/work/exec.go. +func readGccgoBigArchive(name string, f *os.File) (string, error) { + bad := func() (string, error) { + return "", &os.PathError{Op: "parse", Path: name, Err: errBuildIDMalformed} + } + + // Read fixed-length header. + if _, err := f.Seek(0, io.SeekStart); err != nil { + return "", err + } + var flhdr [128]byte + if _, err := io.ReadFull(f, flhdr[:]); err != nil { + return "", err + } + // Read first member offset. + offStr := strings.TrimSpace(string(flhdr[68:88])) + off, err := strconv.ParseInt(offStr, 10, 64) + if e
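The assembler-emission loop in gccgoBuildIDXCOFFFile is simple enough to sketch standalone: a `.csect` header, then `.byte` directives with up to eight hex values per line. A sketch in C++ for illustration (`buildid_directives` is a hypothetical helper, not part of the go tool; the exact hex formatting of Go's `%#02x` may differ slightly from C's):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Mirrors the formatting loop: start a fresh ".byte" line every eighth
// value, otherwise separate values with commas.
std::string buildid_directives (const std::vector<unsigned char> &id)
{
  std::string out = "\t.csect .go.buildid[XO]\n\t.byte ";
  char tmp[8];
  for (size_t i = 0; i < id.size (); i++)
    {
      if (i > 0)
        out += (i % 8 == 0) ? "\n\t.byte " : ",";
      std::snprintf (tmp, sizeof tmp, "%#02x", (unsigned) id[i]);
      out += tmp;
    }
  out += "\n";
  return out;
}
```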
Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure
On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak wrote: > > The default "go build" compile options over-optimize the auxiliary > executable, built in TestCrashDumpsAllThreads testcase > (libgo/go/runtime/crash_unix_test.go). This over-optimization results > in removal of the trivial summing loop and in the inlining of the > main.loop function into main.$thunk0. > > The testcase expects backtrace that shows several instances of > main.loop in the backtrace dump. When main.loop gets inlined into > thunk, its name is not dumped anymore. > > The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining. > > Patch was bootstrapped and regression tested on x86_64-linux-gnu and > alphev68-linux-gnu, where for the later target the patch fixed the > mentioned failure. That sounds like a bug somewhere. Even when one function gets inlined into another, its name should still be dumped. This is implemented by report_inlined_functions in libbacktrace/dwarf.c. While something like your patch may be needed, I'd like to more clearly understand why libbacktrace isn't working. Ian
Re: [PATCH, fortran] Support Fortran 2018 teams
I'm heading out of town for a meeting at the end of week, so gfortran patches/reviews are on hold at the moment. If someone else wants to step up to review the patch, I won't object. -- steve On Mon, Jan 22, 2018 at 08:29:41PM -0800, Damian Rouson wrote: > Is Fortran 2018 teams patch ok for trunk? > > Damian > > On January 19, 2018 at 2:47:39 PM, Alessandro Fanfarillo (elfa...@ucar.edu) > wrote: > > I can confirm that the little change suggested by Steve passes the > regtests (on x86_64-pc-linux-gnu) and the regular tests using > OpenCoarrays. > > On Fri, Jan 19, 2018 at 10:33 AM, Steve Kargl > wrote: > > On Fri, Jan 19, 2018 at 09:18:14AM -0800, Damian Rouson wrote: > >> Thanks for catching that, Steve, and for responding, Alessandro. > >> > >> Anything else? > >> > > > > I've only just started to look at the patch. Unfortunately, > > I know zero about teams, so need to read the patch and F2018 > > standard simultaneously. > > > > -- > > Steve > > > > -- > > Alessandro Fanfarillo, Ph.D. > Postdoctoral Researcher > National Center for Atmospheric Research > Mesa Lab, Boulder, CO, USA > 303-497-1229 -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15
Dear Paul, The test suite passed without new regressions with both -m32 and -m64. Thanks for the work, Dominique
Re: [PATCH, gotools]: Fix TestCrashDumpsAllThreads testcase failure
On Tue, Jan 23, 2018 at 5:49 AM, Ian Lance Taylor wrote: > On Sun, Jan 21, 2018 at 3:13 PM, Uros Bizjak wrote: >> >> The default "go build" compile options over-optimize the auxiliary >> executable, built in TestCrashDumpsAllThreads testcase >> (libgo/go/runtime/crash_unix_test.go). This over-optimization results >> in removal of the trivial summing loop and in the inlining of the >> main.loop function into main.$thunk0. >> >> The testcase expects backtrace that shows several instances of >> main.loop in the backtrace dump. When main.loop gets inlined into >> thunk, its name is not dumped anymore. >> >> The solution is to pass "-O0" to gccgo, which inhibits unwanted inlining. >> >> Patch was bootstrapped and regression tested on x86_64-linux-gnu and >> alphev68-linux-gnu, where for the later target the patch fixed the >> mentioned failure. > > That sounds like a bug somewhere. Even when one function gets inlined > into another, its name should still be dumped. This is implemented by > report_inlined_functions in libbacktrace/dwarf.c. While something > like your patch may be needed, I'd like to more clearly understand why > libbacktrace isn't working. If I manually compile the testcase from crash_unix_test.go wth gccgo -O2 (the asm is attached), the main.loop function is not even present in the dump. Even gdb session shows: (gdb) b main.loop Breakpoint 1 at 0x1200021f8: file main.go, line 11. (gdb) r Starting program: /space/homedirs/uros/a.out [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/libthread_db.so.1". 
[New Thread 0x200028831e0 (LWP 23072)] [New Thread 0x2000308b1e0 (LWP 23073)] [New Thread 0x20003c931e0 (LWP 23074)] [New Thread 0x20008fff1e0 (LWP 23075)] [New Thread 0x2000a7ff1e0 (LWP 23076)] [Switching to Thread 0x20003c931e0 (LWP 23074)] Thread 4 "a.out" hit Breakpoint 1, main.$thunk0 () at main.go:25 25 go loop(i, chans[i]) (gdb) bt #0 main.$thunk0 () at main.go:25 #1 0x02000100bf54 in runtime.kickoff () at /space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/proc.go:1161 #2 0x020001a4f934 in ?? () from /lib/libc.so.6.1 While it is possible to put a breakpoint at main.loop, the gdb backtrace omits the function name and displays only the thunk. When -O0 is used, the gdb backtrace shows: (gdb) bt #0 main.loop () at main.go:11 #1 0x00012000237c in main.$thunk0 () at main.go:25 #2 0x02000100bf54 in runtime.kickoff () at /space/homedirs/uros/gcc-svn/trunk/libgo/go/runtime/proc.go:1161 #3 0x020001a4f934 in ?? () from /lib/libc.so.6.1 I don't think this is a bug in libbacktrace, the function simply isn't there. Uros.