RE: [PATCH v2] aarch64: Add cpu cost tables for A64FX
Hi Richard,

> >> > ChangeLog:
> >> > 2021-01-08  Qian jianhua
> >> >
> >> > gcc/
> >> > * config/aarch64/aarch64-cost-tables.h (a64fx_extra_costs): New.
> >> > * config/aarch64/aarch64.c (a64fx_addrcost_table): New.
> >> > (a64fx_regmove_cost, a64fx_vector_cost): New.
> >> > (a64fx_tunings): Use the new added cost tables.
> >>
> >> OK for trunk, thanks.  The v1 patch is OK for branches that support
> >> -mcpu=a64fx.
> >>
> >> Would you like commit access, so that you can commit it yourself?
> >> If so, please fill out the form mentioned at the beginning of
> >> https://gcc.gnu.org/gitwrite.html listing me as sponsor.
> >>
> > It's my pleasure. I've applied it.
>
> Great!
> > Thank you so much.
> >
> > I don't quite know the process of gcc source committing.
> > If I have the commit access, how will the process be different?
>
> The patch submission process is pretty much the same: patches need to be
> sent to the list and most patches need to be approved by a reviewer or
> maintainer.  The main differences are:
>
> - If a patch is “obviously correct”, you can apply it without going
>   through the approval process.  (Please still send the patch to the
>   list though.)
>
> - Once a patch has been approved, you can commit the patch yourself,
>   rather than rely on someone else to do it for you.  The main benefits
>   of this are:
>
>   - You can commit from the tree that you actually tested.
>
>   - You can deal with any merge conflicts caused by other people's
>     patches without having to go through another review cycle.  (Most
>     merge conflict resolutions are “obvious” and so don't need approval.)
>
> - A typical workflow is to test a patch on trunk, post it for review,
>   and ask for approval to apply the patch to both trunk and whichever
>   branches are appropriate.  If the patch is approved, you can later
>   test the patch on the approved branches (at your own pace) and
>   apply it if the tests pass.
>
> In terms of the mechanics of committing, just “git push” should work.
> The server hooks will check for things like a well-formed changelog.

Thanks for the detailed information. I understood.

> https://gcc.gnu.org/gitwrite.html has more info about the process in general.
> Quoting from that page, the next step is:
>
>   Check out a tree using the instructions below and add yourself to the
>   MAINTAINERS file.  Note: Your first and last names must be exactly the
>   same between your account on gcc.gnu.org and the MAINTAINERS file.
>   Place your name in the correct section following the conventions
>   specified in the file (e.g. "Write After Approval" is "last name
>   alphabetical order").

I've already added myself to the MAINTAINERS file.

>   Then produce a diff to that file and circulate it to the gcc-patches
>   list, whilst also checking in your change to test write access
>   (approval from the mailing list is not needed in this one case).  For
>   all other changes, please be sure to follow the write access policies
>   below.
>
> > And which branch, which range (aarch64?) could I commit patches to?
>
> This patch should go to master.  The v1 patch should go to
> releases/gcc-10 and releases/gcc-9.
>
> You might need to remove some lines from the cost tables when backporting to
> gcc-10 and gcc-9 (I haven't checked).  If so, that kind of change counts as
> “obviously correct” and so doesn't need approval.

I will commit this patch to master tomorrow.  Then make some changes, and
push to gcc-10 and gcc-9.

> Hope this helps.  Please let me know if you have any questions.

Thanks again.

Regards,
Qian

> Thanks,
> Richard
Re: [PATCH] match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]
On Mon, 11 Jan 2021, Jeff Law wrote:

On 1/9/21 5:43 PM, Maciej W. Rozycki wrote:

On Mon, 21 Dec 2020, Jakub Jelinek wrote:

This patch adds the ~(X - Y) -> ~X + Y simplification requested in the PR
(plus also ~(X + C) -> ~X + (-C) for constants C that can be safely negated).

This regresses VAX code produced by the cmpelim-eq-notsi.c test case (and its
similar counterparts) with the `vax-netbsdelf' target.

The point of the match.pd changes is to canonicalize GIMPLE on some form when
there are several from GIMPLE POV equivalent or better forms of writing the
same thing.  The advantage of having one canonical way is that ICF, SCCVN
etc. optimizations can then understand the different forms are equivalent.

Fair enough, though in cases like this I think it is unclear which of the two
forms is going to be ultimately better, especially as it may depend on the
exact form of the operands used, e.g. values of any immediates, so I think a
way to make the reverse transformation (whether to undo one made here or
genuinely) needs to be available at a later compilation stage.  One size
doesn't fit all.  With this in mind...

So in this case the number of operations is the same before/after and
parallelism is the same before/after, register lifetimes, etc.  I doubt
either form is particularly better suited for CSE or gives better VRP data,
etc.  The fact that we can't always do ~(X + C) -> ~X + -C probably argues
against that form ever so slightly.

If another form is then better for a particular machine, it should be done
either during expansion (being able to produce both RTLs and computing their
costs), or during combine with either combine splitters or
define_insn_and_split in the backend, or, if it can't be done in RTL, during
the isel pass.

Hmm, maybe it has been discussed before, so please forgive me if I write
something silly, but it seems to me like this should be done in a generic way
like match.pd so that all the targets do not have to track the changes made
there and then perhaps repeat the same or similar code each.  So I think it
would make sense to make a change like this include that reverse
transformation as well, so that ultimately both forms are tried with RTL, as
there is no clear advantage to either here.

The idea we've kicked around in the past was to use the same syntax as
match.pd, but have it be target dependent to reform expressions in ways that
are beneficial to the target and have it run at the end of the gimple/ssa
pipeline.  Nobody's implemented this though.

Yes.  And while a gimple-to-gimple transform is indeed quite simple to make,
eventually a match.pd-like gimple-to-RTL would be more useful in the end.
Note RTL can eventually be emulated close enough via the use of internal
functions mapping to optabs.  But still complex combined instructions will
need expander help unless we expose all named instruction RTL patterns as
target specific internal functions to use from that .pd file.

Richard.

jeff
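To make the transformation concrete, here is a minimal C sketch of the two
forms being compared in this thread (the function names are made up for
illustration; this is not code from the patch):

/* For two's-complement values, ~v == -v - 1, so
     ~(x - y) == -(x - y) - 1 == (-x - 1) + y == ~x + y
   and, for a constant C that can be safely negated,
     ~(x + C) == ~x + (-C).  */
unsigned int
not_of_difference (unsigned int x, unsigned int y)
{
  return ~(x - y);	/* what the user writes */
}

unsigned int
canonical_form (unsigned int x, unsigned int y)
{
  return ~x + y;	/* the canonical GIMPLE form after the match.pd change */
}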
[PATCH] reassoc: Optimize in reassoc x < 0 && y < 0 to (x | y) < 0 etc. [PR95731]
Hi! We already had x != 0 && y != 0 to (x | y) != 0 and x != -1 && y != -1 to (x & y) != -1 and x < 32U && y < 32U to (x | y) < 32U, this patch adds signed x < 0 && y < 0 to (x | y) < 0. In that case, the low/high seem to be always the same and just in_p indices whether it is >= 0 or < 0, also, all types in the same bucket (same precision) should be type compatible, but we can have some >= 0 and some < 0 comparison mixed, so the patch handles that by using the right BIT_IOR_EXPR or BIT_AND_EXPR and doing one set of < 0 or >= 0 first, then BIT_NOT_EXPR and then the other one. I had to move optimize_range_tests_var_bound before this optimization because that one deals with signed a >= 0 && a < b, and limited it to the last reassoc pass as reassoc itself can't virtually undo this optimization yet (and not sure if vrp would be able to). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-12 Jakub Jelinek PR tree-optimization/95731 * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Also optimize x < 0 && y < 0 && z < 0 into (x | y | z) < 0 for signed x, y, z. (optimize_range_tests): Call optimize_range_tests_cmp_bitwise only after optimize_range_tests_var_bound. * gcc.dg/tree-ssa/pr95731.c: New test. * gcc.c-torture/execute/pr95731.c: New test. --- gcc/tree-ssa-reassoc.c.jj 2021-01-11 10:35:02.204501369 +0100 +++ gcc/tree-ssa-reassoc.c 2021-01-11 15:20:17.578155827 +0100 @@ -3320,7 +3320,8 @@ optimize_range_tests_to_bit_test (enum t /* Optimize x != 0 && y != 0 && z != 0 into (x | y | z) != 0 and similarly x != -1 && y != -1 && y != -1 into (x & y & z) != -1. Also, handle x < C && y < C && z < C where C is power of two as - (x | y | z) < C. */ + (x | y | z) < C. And also handle signed x < 0 && y < 0 && z < 0 + as (x | y | z) < 0. */ static bool optimize_range_tests_cmp_bitwise (enum tree_code opcode, int first, int length, @@ -3340,13 +3341,13 @@ optimize_range_tests_cmp_bitwise (enum t if (ranges[i].exp == NULL_TREE || TREE_CODE (ranges[i].exp) != SSA_NAME - || !ranges[i].in_p || TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) <= 1 || TREE_CODE (TREE_TYPE (ranges[i].exp)) == BOOLEAN_TYPE) continue; if (ranges[i].low != NULL_TREE && ranges[i].high != NULL_TREE + && ranges[i].in_p && tree_int_cst_equal (ranges[i].low, ranges[i].high)) { idx = !integer_zerop (ranges[i].low); @@ -3354,7 +3355,8 @@ optimize_range_tests_cmp_bitwise (enum t continue; } else if (ranges[i].high != NULL_TREE - && TREE_CODE (ranges[i].high) == INTEGER_CST) + && TREE_CODE (ranges[i].high) == INTEGER_CST + && ranges[i].in_p) { wide_int w = wi::to_wide (ranges[i].high); int prec = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)); @@ -3370,10 +3372,20 @@ optimize_range_tests_cmp_bitwise (enum t && integer_zerop (ranges[i].low continue; } + else if (ranges[i].high == NULL_TREE + && ranges[i].low != NULL_TREE + /* Perform this optimization only in the last + reassoc pass, as it interferes with the reassociation + itself or could also with VRP etc. which might not + be able to virtually undo the optimization. 
*/ + && !reassoc_insert_powi_p + && !TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) + && integer_zerop (ranges[i].low)) + idx = 3; else continue; - b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 3 + idx; + b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 4 + idx; if (buckets.length () <= b) buckets.safe_grow_cleared (b + 1, true); if (chains.length () <= (unsigned) i) @@ -3386,7 +3398,7 @@ optimize_range_tests_cmp_bitwise (enum t if (i && chains[i - 1]) { int j, k = i; - if ((b % 3) == 2) + if ((b % 4) == 2) { /* When ranges[X - 1].high + 1 is a power of two, we need to process the same bucket up to @@ -3439,6 +3451,19 @@ optimize_range_tests_cmp_bitwise (enum t { tree type = TREE_TYPE (ranges[j - 1].exp); strict_overflow_p |= ranges[j - 1].strict_overflow_p; + if ((b % 4) == 3) + { + /* For the signed < 0 cases, the types should be + really compatible (all signed with the same precision, + instead put ranges that have different in_p from + k first. */ + if (!useless_type_conversion_p (type1, type)) + continue; + if (ranges[j - 1].in_p != ranges[k - 1].in_p) + candidates.safe_push (&ranges[j - 1]); + type2 = type1; + continue; +
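As an aside for readers of the patch above, the intuition can be shown with a
short C sketch (illustrative only, not the pr95731.c tests added by the
patch): the sign bit of a bitwise OR is the OR of the sign bits and the sign
bit of a bitwise AND is the AND of the sign bits, so a chain of sign tests
collapses into one bitwise chain plus a single comparison, with in_p deciding
whether BIT_IOR_EXPR or BIT_AND_EXPR is the right operation.

/* Illustrative identities for signed x, y, z:
     x >= 0 && y >= 0 && z >= 0   <->  (x | y | z) >= 0
     x < 0  && y < 0  && z < 0    <->  (x & y & z) < 0
     x < 0  || y < 0  || z < 0    <->  (x | y | z) < 0  */
int
all_nonnegative (int x, int y, int z)
{
  /* Foldable into a single bitwise chain and one sign test.  */
  return x >= 0 && y >= 0 && z >= 0;
}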
[PATCH] configure, make: Fix up --enable-link-serialization
Hi! As reported by Matthias, --enable-link-serialization=1 can currently start two concurrent links first (e.g. gnat1 and cc1). The problem is that make var = value values seem to work differently between dependencies and actual rules (where it was tested). As the language make fragments can be in different order, we can have: # Part of Makefile added by configure ada.prev = ... magic that will become $(c.serial) under --enable-link-serialization=1 # ada/gcc-interface/Make-lang.in gnat1$(exe): . $(ada.prev) ... # c/Make-lang.in c.serial = cc1$(exe) and while if I add echo $(ada.prev) in the gnat1 rule's command, it prints cc1, the dependencies are actually evaluated during reading of the goal or when. The configure creates (and puts into Makefile) some serialization order of the languages and in that order c always comes first, and the rest is actually sorted the way the all_lang_makefrags are already sorted, so just by forcing c/Make-lang.in first we achieve that X.serial variable is always defined before some other Y.prev will use it in its goal dependencies. Bootstrapped/regtested on x86_64-linux and i686-linux, without --enable-link-serialization altogether, with =1 and with =3, all results look good. Ok for trunk? 2021-01-12 Jakub Jelinek * configure.ac: Ensure c/Make-lang.in comes first in @all_lang_makefrags@. * configure: Regenerated. --- gcc/configure.ac.jj 2021-01-05 13:57:59.911905006 +0100 +++ gcc/configure.ac2021-01-11 14:49:22.878218318 +0100 @@ -6975,7 +6975,12 @@ changequote([,])dnl $ok || continue all_lang_configurefrags="$all_lang_configurefrags \$(srcdir)/$gcc_subdir/config-lang.in" - all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + if test "x$language" = xc && test -n "$all_lang_makefrags"; then + # Put c/Make-lang.in fragment first to match serialization languages order. + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in $all_lang_makefrags" + else + all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + fi if test -f $srcdir/$gcc_subdir/lang.opt; then lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" --- gcc/configure.jj2021-01-06 22:17:07.275967932 +0100 +++ gcc/configure 2021-01-11 14:49:29.708140555 +0100 @@ -31174,7 +31174,12 @@ do $ok || continue all_lang_configurefrags="$all_lang_configurefrags \$(srcdir)/$gcc_subdir/config-lang.in" - all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + if test "x$language" = xc && test -n "$all_lang_makefrags"; then + # Put c/Make-lang.in fragment first to match serialization languages order. + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in $all_lang_makefrags" + else + all_lang_makefrags="$all_lang_makefrags \$(srcdir)/$gcc_subdir/Make-lang.in" + fi if test -f $srcdir/$gcc_subdir/lang.opt; then lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" Jakub
[PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
Hi! The following patch adds patterns (in the end I went with define_insn rather than combiner define_split + define_insn_and_split I initially hoped or define_insn_and_split) to represent (so far 128-bit only) permutations like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second operand is CONST0_RTX CONST_VECTOR as pmovzx. define_split didn't work (due to the combiner not trying combine_split_insn when i1 is NULL) but in the end was quite large, and the reason for not trying to split this afterwards is the different vector mode of the output, and lowpart_subreg on the result is undesirable, so we'd need to split it into two instructions and hope some later pass optimizes the move into just rewriting the uses using lowpart_subreg. While I initially wrote also avx2 and avx512{bw,f} patterns, turns out that without further changes they aren't useful, such permutations are expanded into multiple permutation instructions and while combiner in theory could be able to undo that, what stops it is that the const_vector of 0 has multiple uses. So either we'd need to allow pre-reload const0_operand next to registers (and sometimes memory) in some permutations, or perhaps change the vec_perm_const target hook calling code to not force zero vectors into registers and change each vec_perm_const hook to force it into register itself if it can't expand such a permutation smartly. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-12 Jakub Jelinek PR target/95905 * config/i386/predicates.md (pmovzx_parallel): New predicate. * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New define_insn patterns. * gcc.target/i386/pr95905-1.c: New test. * gcc.target/i386/pr95905-2.c: New test. --- gcc/config/i386/predicates.md.jj2021-01-04 10:25:45.248161185 +0100 +++ gcc/config/i386/predicates.md 2021-01-11 19:40:04.793433826 +0100 @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" return true; }) +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, +;; where one of the two operands of the vec_concat is const0_operand. +(define_predicate "pmovzx_parallel" + (and (match_code "parallel") + (match_code "const_int" "a")) +{ + int nelt = XVECLEN (op, 0); + int elt, i; + + if (nelt < 2) +return false; + + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. */ + elt = INTVAL (XVECEXP (op, 0, 0)); + if (elt == 0) +{ + for (i = 1; i < nelt; ++i) + if ((i & 1) != 0) + { + if (INTVAL (XVECEXP (op, 0, i)) < nelt) + return false; + } + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) + return false; +} + else +return false; + + return true; +}) + ;; Return true if OP is a parallel for a vbroadcast permute. 
(define_predicate "avx_vbroadcast_operand" (and (match_code "parallel") --- gcc/config/i386/sse.md.jj 2021-01-07 15:29:52.604974544 +0100 +++ gcc/config/i386/sse.md 2021-01-11 21:19:06.247346965 +0100 @@ -17683,6 +17683,27 @@ (define_insn_and_split "*sse4_1_v8 (any_extend:V8HI (match_dup 1)))] "operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") +(define_insn "*sse4_1_zero_extendv8qiv8hi2_3" + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V16QI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" +{ + if (MEM_P (operands[1])) +return "%vpmovzxbw\t{%1, %0|%0, %1}"; + else +return "%vpmovzxbw\t{%1, %0|%0, %q1}"; +} + [(set_attr "isa" "noavx,noavx,avx") + (set_attr "type" "ssemov") + (set_attr "prefix_extra" "1") + (set_attr "prefix" "orig,orig,maybe_evex") + (set_attr "mode" "TI")]) + (define_expand "v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI @@ -17929,6 +17950,27 @@ (define_expand "v4hiv4si2" } }) +(define_insn "*sse4_1_zero_extendv4hiv4si2_3" + [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V8HI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" +{ + if (MEM_P (operands[1])) +return "%vpmovzxwd\t{%1, %0|%0, %1}"; + else +return "%vpmovzxwd\t{%1, %0|%0, %q1}"; +} + [(set_attr "isa" "noavx,noavx,avx") + (set_attr "type" "ssemov") + (set_attr "prefix_extra" "1") + (set_attr "prefix" "orig,orig,maybe_evex") + (set_attr "mode" "TI
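For reference, the kind of source code these patterns are aimed at looks
roughly like the following (an assumed illustration, not a copy of the
pr95905-*.c tests): interleaving the low bytes of a vector with zeros is
exactly a zero extension of those bytes, so with -O2 -msse4.1 this should
become a single pmovzxbw instead of pxor + punpcklbw.

#include <emmintrin.h>

/* _mm_unpacklo_epi8 (x, 0) interleaves the low 8 bytes of x with zero
   bytes, i.e. it zero-extends those bytes into 16-bit lanes, which is the
   { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } permutation with a zero
   second operand that the new patterns recognize.  */
__m128i
widen_lo_u8 (__m128i x)
{
  return _mm_unpacklo_epi8 (x, _mm_setzero_si128 ());
}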
[PATCH] widening_mul: Fix up ICE caused by my signed multiplication overflow pattern recognition changes [PR98629]
Hi! As the testcase shows, my latest changes caused ICE on that testcase. The problem is that arith_overflow_check_p now can change the use_stmt argument (has a reference), so that if it succeeds (returns non-zero), it points it to the GIMPLE_COND or EQ/NE or COND_EXPR assignment from the TRUNC_DIV_EXPR assignment. The problem was that it would change use_stmt also if it returned 0 in some cases, such as multiple imm uses of the division, and in one of the callers if arith_overflow_check_p returns 0 it looks at use_stmt again and performs other checks, which of course assumes that use_stmt is the one passed to arith_overflow_check_p and not e.g. NULL instead or some other unrelated stmt. The following patch fixes that by only changing use_stmt when we are about to return non-zero (for the MULT_EXPR case, which is the only one with the need to use different use_stmt). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-12 Jakub Jelinek PR tree-optimization/98629 * tree-ssa-math-opts.c (arith_overflow_check_p): Don't update use_stmt unless returning non-zero. * gcc.c-torture/compile/pr98629.c: New test. --- gcc/tree-ssa-math-opts.c.jj 2021-01-11 10:35:02.196501461 +0100 +++ gcc/tree-ssa-math-opts.c2021-01-11 23:56:24.154380944 +0100 @@ -3667,6 +3667,7 @@ arith_overflow_check_p (gimple *stmt, gi tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = gimple_assign_rhs2 (stmt); tree multop = NULL_TREE, divlhs = NULL_TREE; + gimple *cur_use_stmt = use_stmt; if (code == MULT_EXPR) { @@ -3697,26 +3698,26 @@ arith_overflow_check_p (gimple *stmt, gi if (!divlhs) return 0; use_operand_p use; - if (!single_imm_use (divlhs, &use, &use_stmt)) + if (!single_imm_use (divlhs, &use, &cur_use_stmt)) return 0; } - if (gimple_code (use_stmt) == GIMPLE_COND) + if (gimple_code (cur_use_stmt) == GIMPLE_COND) { - ccode = gimple_cond_code (use_stmt); - crhs1 = gimple_cond_lhs (use_stmt); - crhs2 = gimple_cond_rhs (use_stmt); + ccode = gimple_cond_code (cur_use_stmt); + crhs1 = gimple_cond_lhs (cur_use_stmt); + crhs2 = gimple_cond_rhs (cur_use_stmt); } - else if (is_gimple_assign (use_stmt)) + else if (is_gimple_assign (cur_use_stmt)) { - if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS) + if (gimple_assign_rhs_class (cur_use_stmt) == GIMPLE_BINARY_RHS) { - ccode = gimple_assign_rhs_code (use_stmt); - crhs1 = gimple_assign_rhs1 (use_stmt); - crhs2 = gimple_assign_rhs2 (use_stmt); + ccode = gimple_assign_rhs_code (cur_use_stmt); + crhs1 = gimple_assign_rhs1 (cur_use_stmt); + crhs2 = gimple_assign_rhs2 (cur_use_stmt); } - else if (gimple_assign_rhs_code (use_stmt) == COND_EXPR) + else if (gimple_assign_rhs_code (cur_use_stmt) == COND_EXPR) { - tree cond = gimple_assign_rhs1 (use_stmt); + tree cond = gimple_assign_rhs1 (cur_use_stmt); if (COMPARISON_CLASS_P (cond)) { ccode = TREE_CODE (cond); @@ -3792,11 +3793,17 @@ arith_overflow_check_p (gimple *stmt, gi { if ((crhs1 == divlhs && arith_cast_equal_p (crhs2, multop)) || (crhs2 == divlhs && arith_cast_equal_p (crhs1, multop))) - return ccode == NE_EXPR ? 1 : -1; + { + use_stmt = cur_use_stmt; + return ccode == NE_EXPR ? 1 : -1; + } } else if ((crhs1 == divlhs && operand_equal_p (crhs2, multop, 0)) || (crhs2 == divlhs && crhs1 == multop)) - return ccode == NE_EXPR ? 1 : -1; + { + use_stmt = cur_use_stmt; + return ccode == NE_EXPR ? 
1 : -1; + } } break; default: --- gcc/testsuite/gcc.c-torture/compile/pr98629.c.jj2021-01-12 00:00:21.132949613 +0100 +++ gcc/testsuite/gcc.c-torture/compile/pr98629.c 2021-01-11 23:59:58.178206894 +0100 @@ -0,0 +1,11 @@ +/* PR tree-optimization/98629 */ + +unsigned int a; +int b, c; + +void +foo (void) +{ + unsigned int *e = &a; + (a /= a |= b) - (0 <= (*e += *e)) * (c *= *e); +} Jakub
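For context, the signed multiplication overflow recognition mentioned above
matches the classic hand-written check in which the product is divided back
by one operand and the quotient is compared with the other operand; a hedged
sketch of that idiom (not the pr98629.c test above, and assuming the usual
shape of such a check):

/* The kind of idiom the pattern recognition turns into an internal
   multiplication-overflow check.  The ICE fixed here showed up when the
   quotient from the TRUNC_DIV_EXPR had extra uses, i.e. when
   arith_overflow_check_p had to give up after already touching use_stmt.  */
int
mul_overflows (int x, int y)
{
  int r = x * y;
  return x != 0 && r / x != y;
}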
Re: [PATCH] reassoc: Optimize in reassoc x < 0 && y < 0 to (x | y) < 0 etc. [PR95731]
On Tue, 12 Jan 2021, Jakub Jelinek wrote: > Hi! > > We already had x != 0 && y != 0 to (x | y) != 0 and > x != -1 && y != -1 to (x & y) != -1 and > x < 32U && y < 32U to (x | y) < 32U, this patch adds signed > x < 0 && y < 0 to (x | y) < 0. In that case, the low/high seem to be > always the same and just in_p indices whether it is >= 0 or < 0, > also, all types in the same bucket (same precision) should be type > compatible, but we can have some >= 0 and some < 0 comparison mixed, > so the patch handles that by using the right BIT_IOR_EXPR or BIT_AND_EXPR > and doing one set of < 0 or >= 0 first, then BIT_NOT_EXPR and then the other > one. I had to move optimize_range_tests_var_bound before this optimization > because that one deals with signed a >= 0 && a < b, and limited it to the > last reassoc pass as reassoc itself can't virtually undo this optimization > yet (and not sure if vrp would be able to). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. > 2021-01-12 Jakub Jelinek > > PR tree-optimization/95731 > * tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Also optimize > x < 0 && y < 0 && z < 0 into (x | y | z) < 0 for signed x, y, z. > (optimize_range_tests): Call optimize_range_tests_cmp_bitwise > only after optimize_range_tests_var_bound. > > * gcc.dg/tree-ssa/pr95731.c: New test. > * gcc.c-torture/execute/pr95731.c: New test. > > --- gcc/tree-ssa-reassoc.c.jj 2021-01-11 10:35:02.204501369 +0100 > +++ gcc/tree-ssa-reassoc.c2021-01-11 15:20:17.578155827 +0100 > @@ -3320,7 +3320,8 @@ optimize_range_tests_to_bit_test (enum t > /* Optimize x != 0 && y != 0 && z != 0 into (x | y | z) != 0 > and similarly x != -1 && y != -1 && y != -1 into (x & y & z) != -1. > Also, handle x < C && y < C && z < C where C is power of two as > - (x | y | z) < C. */ > + (x | y | z) < C. And also handle signed x < 0 && y < 0 && z < 0 > + as (x | y | z) < 0. */ > > static bool > optimize_range_tests_cmp_bitwise (enum tree_code opcode, int first, int > length, > @@ -3340,13 +3341,13 @@ optimize_range_tests_cmp_bitwise (enum t > >if (ranges[i].exp == NULL_TREE > || TREE_CODE (ranges[i].exp) != SSA_NAME > - || !ranges[i].in_p > || TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) <= 1 > || TREE_CODE (TREE_TYPE (ranges[i].exp)) == BOOLEAN_TYPE) > continue; > >if (ranges[i].low != NULL_TREE > && ranges[i].high != NULL_TREE > + && ranges[i].in_p > && tree_int_cst_equal (ranges[i].low, ranges[i].high)) > { > idx = !integer_zerop (ranges[i].low); > @@ -3354,7 +3355,8 @@ optimize_range_tests_cmp_bitwise (enum t > continue; > } >else if (ranges[i].high != NULL_TREE > -&& TREE_CODE (ranges[i].high) == INTEGER_CST) > +&& TREE_CODE (ranges[i].high) == INTEGER_CST > +&& ranges[i].in_p) > { > wide_int w = wi::to_wide (ranges[i].high); > int prec = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)); > @@ -3370,10 +3372,20 @@ optimize_range_tests_cmp_bitwise (enum t > && integer_zerop (ranges[i].low > continue; > } > + else if (ranges[i].high == NULL_TREE > +&& ranges[i].low != NULL_TREE > +/* Perform this optimization only in the last > + reassoc pass, as it interferes with the reassociation > + itself or could also with VRP etc. which might not > + be able to virtually undo the optimization. 
*/ > +&& !reassoc_insert_powi_p > +&& !TYPE_UNSIGNED (TREE_TYPE (ranges[i].exp)) > +&& integer_zerop (ranges[i].low)) > + idx = 3; >else > continue; > > - b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 3 + idx; > + b = TYPE_PRECISION (TREE_TYPE (ranges[i].exp)) * 4 + idx; >if (buckets.length () <= b) > buckets.safe_grow_cleared (b + 1, true); >if (chains.length () <= (unsigned) i) > @@ -3386,7 +3398,7 @@ optimize_range_tests_cmp_bitwise (enum t > if (i && chains[i - 1]) >{ > int j, k = i; > - if ((b % 3) == 2) > + if ((b % 4) == 2) > { > /* When ranges[X - 1].high + 1 is a power of two, > we need to process the same bucket up to > @@ -3439,6 +3451,19 @@ optimize_range_tests_cmp_bitwise (enum t > { > tree type = TREE_TYPE (ranges[j - 1].exp); > strict_overflow_p |= ranges[j - 1].strict_overflow_p; > + if ((b % 4) == 3) > + { > + /* For the signed < 0 cases, the types should be > +really compatible (all signed with the same precision, > +instead put ranges that have different in_p from > +k first. */ > + if (!useless_type_conversion_p (type1, type)) > + continue; > + if (ranges[j
Re: [PATCH] configure, make: Fix up --enable-link-serialization
On Tue, 12 Jan 2021, Jakub Jelinek wrote: > Hi! > > As reported by Matthias, --enable-link-serialization=1 can currently start > two concurrent links first (e.g. gnat1 and cc1). > The problem is that make var = value values seem to work differently between > dependencies and actual rules (where it was tested). > As the language make fragments can be in different order, we can have: > # Part of Makefile added by configure > ada.prev = ... magic that will become $(c.serial) under > --enable-link-serialization=1 > # ada/gcc-interface/Make-lang.in > gnat1$(exe): . $(ada.prev) > ... > # c/Make-lang.in > c.serial = cc1$(exe) > and while if I add echo $(ada.prev) in the gnat1 rule's command, it prints > cc1, the dependencies are actually evaluated during reading of the goal or > when. > The configure creates (and puts into Makefile) some serialization order of > the languages and in that order c always comes first, and the rest is > actually sorted the way the all_lang_makefrags are already sorted, > so just by forcing c/Make-lang.in first we achieve that X.serial variable > is always defined before some other Y.prev will use it in its goal > dependencies. > > Bootstrapped/regtested on x86_64-linux and i686-linux, without > --enable-link-serialization altogether, with =1 and with =3, all results > look good. Ok for trunk? OK. Richard. > 2021-01-12 Jakub Jelinek > > * configure.ac: Ensure c/Make-lang.in comes first in > @all_lang_makefrags@. > * configure: Regenerated. > > --- gcc/configure.ac.jj 2021-01-05 13:57:59.911905006 +0100 > +++ gcc/configure.ac 2021-01-11 14:49:22.878218318 +0100 > @@ -6975,7 +6975,12 @@ changequote([,])dnl > $ok || continue > > all_lang_configurefrags="$all_lang_configurefrags > \$(srcdir)/$gcc_subdir/config-lang.in" > - all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + if test "x$language" = xc && test -n "$all_lang_makefrags"; then > + # Put c/Make-lang.in fragment first to match serialization > languages order. > + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in > $all_lang_makefrags" > + else > + all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + fi > if test -f $srcdir/$gcc_subdir/lang.opt; then > lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" > all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" > --- gcc/configure.jj 2021-01-06 22:17:07.275967932 +0100 > +++ gcc/configure 2021-01-11 14:49:29.708140555 +0100 > @@ -31174,7 +31174,12 @@ do > $ok || continue > > all_lang_configurefrags="$all_lang_configurefrags > \$(srcdir)/$gcc_subdir/config-lang.in" > - all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + if test "x$language" = xc && test -n "$all_lang_makefrags"; then > + # Put c/Make-lang.in fragment first to match serialization > languages order. > + all_lang_makefrags="\$(srcdir)/$gcc_subdir/Make-lang.in > $all_lang_makefrags" > + else > + all_lang_makefrags="$all_lang_makefrags > \$(srcdir)/$gcc_subdir/Make-lang.in" > + fi > if test -f $srcdir/$gcc_subdir/lang.opt; then > lang_opt_files="$lang_opt_files $srcdir/$gcc_subdir/lang.opt" > all_opt_files="$all_opt_files $srcdir/$gcc_subdir/lang.opt" > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [PATCH] widening_mul: Fix up ICE caused by my signed multiplication overflow pattern recognition changes [PR98629]
On Tue, 12 Jan 2021, Jakub Jelinek wrote: > Hi! > > As the testcase shows, my latest changes caused ICE on that testcase. > The problem is that arith_overflow_check_p now can change the use_stmt > argument (has a reference), so that if it succeeds (returns non-zero), > it points it to the GIMPLE_COND or EQ/NE or COND_EXPR assignment from the > TRUNC_DIV_EXPR assignment. > The problem was that it would change use_stmt also if it returned 0 in some > cases, such as multiple imm uses of the division, and in one of the callers > if arith_overflow_check_p returns 0 it looks at use_stmt again and performs > other checks, which of course assumes that use_stmt is the one passed > to arith_overflow_check_p and not e.g. NULL instead or some other unrelated > stmt. > > The following patch fixes that by only changing use_stmt when we are about > to return non-zero (for the MULT_EXPR case, which is the only one with the > need to use different use_stmt). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. > 2021-01-12 Jakub Jelinek > > PR tree-optimization/98629 > * tree-ssa-math-opts.c (arith_overflow_check_p): Don't update use_stmt > unless returning non-zero. > > * gcc.c-torture/compile/pr98629.c: New test. > > --- gcc/tree-ssa-math-opts.c.jj 2021-01-11 10:35:02.196501461 +0100 > +++ gcc/tree-ssa-math-opts.c 2021-01-11 23:56:24.154380944 +0100 > @@ -3667,6 +3667,7 @@ arith_overflow_check_p (gimple *stmt, gi >tree rhs1 = gimple_assign_rhs1 (stmt); >tree rhs2 = gimple_assign_rhs2 (stmt); >tree multop = NULL_TREE, divlhs = NULL_TREE; > + gimple *cur_use_stmt = use_stmt; > >if (code == MULT_EXPR) > { > @@ -3697,26 +3698,26 @@ arith_overflow_check_p (gimple *stmt, gi >if (!divlhs) > return 0; >use_operand_p use; > - if (!single_imm_use (divlhs, &use, &use_stmt)) > + if (!single_imm_use (divlhs, &use, &cur_use_stmt)) > return 0; > } > - if (gimple_code (use_stmt) == GIMPLE_COND) > + if (gimple_code (cur_use_stmt) == GIMPLE_COND) > { > - ccode = gimple_cond_code (use_stmt); > - crhs1 = gimple_cond_lhs (use_stmt); > - crhs2 = gimple_cond_rhs (use_stmt); > + ccode = gimple_cond_code (cur_use_stmt); > + crhs1 = gimple_cond_lhs (cur_use_stmt); > + crhs2 = gimple_cond_rhs (cur_use_stmt); > } > - else if (is_gimple_assign (use_stmt)) > + else if (is_gimple_assign (cur_use_stmt)) > { > - if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS) > + if (gimple_assign_rhs_class (cur_use_stmt) == GIMPLE_BINARY_RHS) > { > - ccode = gimple_assign_rhs_code (use_stmt); > - crhs1 = gimple_assign_rhs1 (use_stmt); > - crhs2 = gimple_assign_rhs2 (use_stmt); > + ccode = gimple_assign_rhs_code (cur_use_stmt); > + crhs1 = gimple_assign_rhs1 (cur_use_stmt); > + crhs2 = gimple_assign_rhs2 (cur_use_stmt); > } > - else if (gimple_assign_rhs_code (use_stmt) == COND_EXPR) > + else if (gimple_assign_rhs_code (cur_use_stmt) == COND_EXPR) > { > - tree cond = gimple_assign_rhs1 (use_stmt); > + tree cond = gimple_assign_rhs1 (cur_use_stmt); > if (COMPARISON_CLASS_P (cond)) > { > ccode = TREE_CODE (cond); > @@ -3792,11 +3793,17 @@ arith_overflow_check_p (gimple *stmt, gi > { > if ((crhs1 == divlhs && arith_cast_equal_p (crhs2, multop)) > || (crhs2 == divlhs && arith_cast_equal_p (crhs1, multop))) > - return ccode == NE_EXPR ? 1 : -1; > + { > + use_stmt = cur_use_stmt; > + return ccode == NE_EXPR ? 1 : -1; > + } > } > else if ((crhs1 == divlhs && operand_equal_p (crhs2, multop, 0)) > || (crhs2 == divlhs && crhs1 == multop)) > - return ccode == NE_EXPR ? 
1 : -1; > + { > + use_stmt = cur_use_stmt; > + return ccode == NE_EXPR ? 1 : -1; > + } > } >break; > default: > --- gcc/testsuite/gcc.c-torture/compile/pr98629.c.jj 2021-01-12 > 00:00:21.132949613 +0100 > +++ gcc/testsuite/gcc.c-torture/compile/pr98629.c 2021-01-11 > 23:59:58.178206894 +0100 > @@ -0,0 +1,11 @@ > +/* PR tree-optimization/98629 */ > + > +unsigned int a; > +int b, c; > + > +void > +foo (void) > +{ > + unsigned int *e = &a; > + (a /= a |= b) - (0 <= (*e += *e)) * (c *= *e); > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 10:33 AM Jakub Jelinek wrote: > > Hi! > > The following patch adds patterns (in the end I went with define_insn rather > than combiner define_split + define_insn_and_split I initially hoped or > define_insn_and_split) to represent (so far 128-bit only) permutations > like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second > operand is CONST0_RTX CONST_VECTOR as pmovzx. > define_split didn't work (due to the combiner not trying combine_split_insn > when i1 is NULL) but in the end was quite large, and the reason for not > trying to split this afterwards is the different vector mode of the output, > and lowpart_subreg on the result is undesirable, > so we'd need to split it into two instructions and hope some later pass > optimizes the move into just rewriting the uses using lowpart_subreg. You can use post-reload define_insn_and_split here. This way, gen_lowpart on all arguments, including output, can be used. So, instead of generating an insn template, the patterns you introduced should split to "real" sse4_1 zero-extend insns. This approach is preferred to avoid having several pseudo-insns in .md files that do the same thing with slightly different patterns. There are many examples of post-reload splitters that use gen_lowpart in i386.md. OTOH, perhaps some of the new testcases can be handled in x86 target_fold_builtin? In the long term, maybe target_fold_shuffle can be introduced to map __builtin_shufle to various target builtins, so the builtin can be processed further in target_fold_builtin. As pointed out below, vector insn patterns can be quite complex, and push RTL combiners to their limits, so perhaps they can be more efficiently handled by tree passes. Uros. > While I initially wrote also avx2 and avx512{bw,f} patterns, turns out that > without further changes they aren't useful, such permutations are expanded > into multiple permutation instructions and while combiner in theory could > be able to undo that, what stops it is that the const_vector of 0 has > multiple uses. So either we'd need to allow pre-reload const0_operand > next to registers (and sometimes memory) in some permutations, or perhaps > change the vec_perm_const target hook calling code to not force zero vectors > into registers and change each vec_perm_const hook to force it into register > itself if it can't expand such a permutation smartly. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-01-12 Jakub Jelinek > > PR target/95905 > * config/i386/predicates.md (pmovzx_parallel): New predicate. > * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, > *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New > define_insn patterns. > > * gcc.target/i386/pr95905-1.c: New test. > * gcc.target/i386/pr95905-2.c: New test. > > --- gcc/config/i386/predicates.md.jj2021-01-04 10:25:45.248161185 +0100 > +++ gcc/config/i386/predicates.md 2021-01-11 19:40:04.793433826 +0100 > @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" >return true; > }) > > +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, > +;; where one of the two operands of the vec_concat is const0_operand. > +(define_predicate "pmovzx_parallel" > + (and (match_code "parallel") > + (match_code "const_int" "a")) > +{ > + int nelt = XVECLEN (op, 0); > + int elt, i; > + > + if (nelt < 2) > +return false; > + > + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. > + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. 
*/ > + elt = INTVAL (XVECEXP (op, 0, 0)); > + if (elt == 0) > +{ > + for (i = 1; i < nelt; ++i) > + if ((i & 1) != 0) > + { > + if (INTVAL (XVECEXP (op, 0, i)) < nelt) > + return false; > + } > + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) > + return false; > +} > + else > +return false; > + > + return true; > +}) > + > ;; Return true if OP is a parallel for a vbroadcast permute. > (define_predicate "avx_vbroadcast_operand" >(and (match_code "parallel") > --- gcc/config/i386/sse.md.jj 2021-01-07 15:29:52.604974544 +0100 > +++ gcc/config/i386/sse.md 2021-01-11 21:19:06.247346965 +0100 > @@ -17683,6 +17683,27 @@ (define_insn_and_split "*sse4_1_v8 > (any_extend:V8HI (match_dup 1)))] >"operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") > > +(define_insn "*sse4_1_zero_extendv8qiv8hi2_3" > + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") > + (match_operand:V16QI 2 "const0_operand" "C,C,C")) > + (match_parallel 3 "pmovzx_parallel" > + [(match_operand 4 "const_int_operand" "n,n,n")])))] > + "TARGET_SSE4_1" > +{ > + if (MEM_P (operands[1])) > +return "%vpmovzxbw\t{%1, %0|
Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 11:42:44AM +0100, Uros Bizjak wrote:
> > The following patch adds patterns (in the end I went with define_insn rather
> > than combiner define_split + define_insn_and_split I initially hoped or
> > define_insn_and_split) to represent (so far 128-bit only) permutations
> > like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second
> > operand is CONST0_RTX CONST_VECTOR as pmovzx.
> > define_split didn't work (due to the combiner not trying combine_split_insn
> > when i1 is NULL) but in the end was quite large, and the reason for not
> > trying to split this afterwards is the different vector mode of the output,
> > and lowpart_subreg on the result is undesirable,
> > so we'd need to split it into two instructions and hope some later pass
> > optimizes the move into just rewriting the uses using lowpart_subreg.
>
> You can use post-reload define_insn_and_split here. This way,
> gen_lowpart on all arguments, including output, can be used. So,
> instead of generating an insn template, the patterns you introduced
> should split to "real" sse4_1 zero-extend insns. This approach is
> preferred to avoid having several pseudo-insns in .md files that do
> the same thing with slightly different patterns. There are many
> examples of post-reload splitters that use gen_lowpart in i386.md.

Ok, will change it that way.

> OTOH, perhaps some of the new testcases can be handled in x86
> target_fold_builtin? In the long term, maybe target_fold_shuffle can
> be introduced to map __builtin_shufle to various target builtins, so
> the builtin can be processed further in target_fold_builtin. As
> pointed out below, vector insn patterns can be quite complex, and push
> RTL combiners to their limits, so perhaps they can be more efficiently
> handled by tree passes.

My primary motivation was to generate good code from __builtin_shuffle here
and trying to find the best permutation and map it back from insns to
builtins would be a nightmare.
I'll see how many targets I need to modify to try the no middle-end
force_reg for CONST0_RTX case...

	Jakub
[PATCH] alias: Fix offset checks involving section anchors [PR92294]
This is a repost of: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539763.html which was initially posted during stage 4. (And yeah, I only just missed stage 4 again.) IMO it would be better to fix the bug directly (as the patch tries to do) instead of wait for a more thorough redesign of this area. See the end of: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540002.html for some stats. Honza: Richard said he'd like your opinion on the patch. memrefs_conflict_p has a slightly odd structure. It first checks whether two addresses based on SYMBOL_REFs refer to the same object, with a tristate result: int cmp = compare_base_symbol_refs (x,y); If the addresses do refer to the same object, we can use offset-based checks: /* If both decls are the same, decide by offsets. */ if (cmp == 1) return offset_overlap_p (c, xsize, ysize); But then, apart from the special case of forced address alignment, we use an offset-based check even if we don't know whether the addresses refer to the same object: /* Assume a potential overlap for symbolic addresses that went through alignment adjustments (i.e., that have negative sizes), because we can't know how far they are from each other. */ if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0)) return -1; /* If decls are different or we know by offsets that there is no overlap, we win. */ if (!cmp || !offset_overlap_p (c, xsize, ysize)) return 0; This somewhat contradicts: /* In general we assume that memory locations pointed to by different labels may overlap in undefined ways. */ at the end of compare_base_symbol_refs. In other words, we're taking -1 to mean that either (a) the symbols are equal (via aliasing) or (b) the references access non-overlapping objects. But even assuming that's true for normal symbols, it doesn't cope correctly with section anchors. If a symbol X at ANCHOR+OFFSET is preemptible, either (a) X = ANCHOR+OFFSET (rather than the X = ANCHOR assumed above) or (b) X and ANCHOR reference non-overlapping objects. And an offset-based comparison makes no sense for an anchor symbol vs. a bare symbol with no decl. If the bare symbol is allowed to alias other symbols then it can surely alias any symbol in the anchor's block, so there are multiple anchor offsets that might induce an alias. This patch therefore replaces the current tristate: - known equal - known independent (two accesses can't alias) - equal or independent with: - known distance apart - known independent (two accesses can't alias) - known distance apart or independent - don't know For safety, the patch puts all bare symbols in the "don't know" category. If that turns out to be too conservative, we at least need that behaviour for combinations involving a bare symbol and a section anchor. However, bare symbols should be relatively rare these days. Retested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu. OK to install? Richard gcc/ PR rtl-optimization/92294 * alias.c (compare_base_symbol_refs): Take an extra parameter and add the distance between two symbols to it. Enshrine in comments that -1 means "either 0 or 1, but we can't tell which at compile time". Return -2 for symbols whose relationship is unknown. (memrefs_conflict_p): Update call accordingly. (rtx_equal_for_memref_p): Likewise. Punt for a return value of -2, without even checking the offset. Take the distance between symbols into account. 
--- gcc/alias.c | 53 ++--- 1 file changed, 38 insertions(+), 15 deletions(-) diff --git a/gcc/alias.c b/gcc/alias.c index 8d3575e4e27..e22863a929a 100644 --- a/gcc/alias.c +++ b/gcc/alias.c @@ -159,7 +159,8 @@ static tree decl_for_component_ref (tree); static int write_dependence_p (const_rtx, const_rtx, machine_mode, rtx, bool, bool, bool); -static int compare_base_symbol_refs (const_rtx, const_rtx); +static int compare_base_symbol_refs (const_rtx, const_rtx, +HOST_WIDE_INT * = NULL); static void memory_modified_1 (rtx, const_rtx, void *); @@ -1837,7 +1838,11 @@ rtx_equal_for_memref_p (const_rtx x, const_rtx y) return label_ref_label (x) == label_ref_label (y); case SYMBOL_REF: - return compare_base_symbol_refs (x, y) == 1; + { + HOST_WIDE_INT distance = 0; + return (compare_base_symbol_refs (x, y, &distance) == 1 + && distance == 0); + } case ENTRY_VALUE: /* This is magic, don't go through canonicalization et al. */ @@ -2172,10 +2177,24 @@ compare_base_decls (tree base1, tree base2) return ret; } -/* Same as compare_base_decls but for SYMBOL_REF. */ +/* Compare SYMBOL_REFs X_BASE and Y_BASE. + + - Return
Re: [PATCH, OpenMP 5.0] Basic support of release/delete clauses on target/target-data directives
On 2021/1/11 6:28 PM, Jakub Jelinek wrote:
> On Wed, Dec 16, 2020 at 11:06:10PM +0800, Chung-Lin Tang wrote:
> > we have some other sollve_vv tests for OpenMP 5.0, which tests the
> > occurrence of 'delete' map kind on the target directive.
>
> That is strange.  In OpenMP 5.0, I certainly see the target data:
>   [163:16] A map-type in a map clause must be to, from, tofrom or alloc.
> restriction and similarly for target:
>   [174:27] A map-type in a map clause must be to, from, tofrom or alloc.
> restriction.

Thanks! I was originally trying to find such a restriction too, but missed
those lines above.

Appears that this patch isn't needed then.

Chung-Lin
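For reference, the map types the quoted restriction does allow on target data
and target look like this (a purely illustrative snippet; 'release' and
'delete' are simply not in the to/from/tofrom/alloc list, which is why the
proposed support is unnecessary):

void
scale (int *a, int n)
{
  /* Only to, from, tofrom and alloc are valid map types here.  */
  #pragma omp target data map(tofrom: a[0:n])
  {
    #pragma omp target map(to: n) map(tofrom: a[0:n])
    for (int i = 0; i < n; i++)
      a[i] *= 2;
  }
}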
[PATCH][pushed] gcov: add more debugging facility
The patch is about an extensive debugging facility that I see beneficial for GCOV issue. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I'm going to push it. Thanks, Martin gcc/ChangeLog: * gcov.c (source_info::debug): New. (print_usage): Add --debug (-D) option. (process_args): Likewise. (generate_results): Call src->debug after accumulate_line_counts. (read_graph_file): Properly assign id for EXIT_BLOCK. * profile.c (branch_prob): Dump function body before it is instrumented. --- gcc/gcov.c| 43 ++- gcc/profile.c | 5 + 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/gcc/gcov.c b/gcc/gcov.c index 01ca52b215d..93128721ef6 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -371,6 +371,9 @@ public: /* Register a new function. */ void add_function (function_info *fn); + /* Debug the source file. */ + void debug (); + /* Index of the source_info in sources vector. */ unsigned index; @@ -428,6 +431,31 @@ source_info::get_functions_at_location (unsigned line_num) const return slot; } +void source_info::debug () +{ + fprintf (stderr, "source_info: %s\n", name); + for (vector::iterator it = functions.begin (); + it != functions.end (); it++) +{ + function_info *fn = *it; + fprintf (stderr, " function_info: %s\n", fn->get_name ()); + for (vector::iterator bit = fn->blocks.begin (); + bit != fn->blocks.end (); bit++) + { + fprintf (stderr, "block_info id=%d, count=%ld\n", + bit->id, bit->count); + } +} + + for (unsigned lineno = 1; lineno < lines.size (); ++lineno) +{ + line_info &line = lines[lineno]; + fprintf (stderr, " line_info=%d, count=%ld\n", lineno, line.count); +} + + fprintf (stderr, "\n"); +} + class name_map { public: @@ -579,6 +607,10 @@ static int flag_human_readable_numbers = 0; static int flag_function_summary = 0; +/* Print debugging dumps. */ + +static int flag_debug = 0; + /* Object directory file prefix. This is the directory/file where the graph and data files are looked for, if nonzero. */ @@ -896,6 +928,7 @@ print_usage (int error_p) fnotice (file, " -c, --branch-counts Output counts of branches taken\n\ rather than percentages\n"); fnotice (file, " -d, --display-progress Display progress information\n"); + fnotice (file, " -D, --debug Display debugging dumps\n"); fnotice (file, " -f, --function-summariesOutput summaries for each function\n"); fnotice (file, " -h, --help Print this help, then exit\n"); fnotice (file, " -j, --json-format Output JSON intermediate format\n\ @@ -963,6 +996,7 @@ static const struct option options[] = { "hash-filenames",no_argument, NULL, 'x' }, { "use-colors",no_argument, NULL, 'k' }, { "use-hotness-colors", no_argument, NULL, 'q' }, + { "debug", no_argument, NULL, 'D' }, { 0, 0, 0, 0 } }; @@ -973,7 +1007,7 @@ process_args (int argc, char **argv) { int opt; - const char *opts = "abcdfhHijklmno:pqrs:tuvwx"; + const char *opts = "abcdDfhHijklmno:pqrs:tuvwx"; while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1) { switch (opt) @@ -1044,6 +1078,9 @@ process_args (int argc, char **argv) case 't': flag_use_stdout = 1; break; + case 'D': + flag_debug = 1; + break; case 'v': print_version (); /* print_version will exit. */ @@ -1466,6 +1503,8 @@ generate_results (const char *file_name) } accumulate_line_counts (src); + if (flag_debug) + src->debug (); if (!flag_use_stdout) file_summary (&src->coverage); @@ -1804,6 +1843,8 @@ read_graph_file (void) arc = XCNEW (arc_info); arc->dst = &fn->blocks[dest]; + /* Set id in order to find EXIT_BLOCK. 
*/ + arc->dst->id = dest; arc->src = src_blk; arc->count = 0; diff --git a/gcc/profile.c b/gcc/profile.c index d629687484b..1f1d60c8180 100644 --- a/gcc/profile.c +++ b/gcc/profile.c @@ -1294,6 +1294,11 @@ branch_prob (bool thunk) if (dump_file) fprintf (dump_file, "%d instrumentation edges\n", num_instrumented); + /* Dump function body before it's instrumented. + It helps to debug gcov tool. */ + if (dump_file && (dump_flags & TDF_DETAILS)) +dump_function_to_file (cfun->decl, dump_file, dump_flags); + /* Compute two different checksums. Note that we want to compute the checksum in only once place, since it depends on the shape of the control flow which can change during -- 2.29.2
Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 11:47:48AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > OTOH, perhaps some of the new testcases can be handled in x86
> > target_fold_builtin? In the long term, maybe target_fold_shuffle can
> > be introduced to map __builtin_shufle to various target builtins, so
> > the builtin can be processed further in target_fold_builtin. As
> > pointed out below, vector insn patterns can be quite complex, and push
> > RTL combiners to their limits, so perhaps they can be more efficiently
> > handled by tree passes.
>
> My primary motivation was to generate good code from __builtin_shuffle here
> and trying to find the best permutation and map it back from insns to
> builtins would be a nightmare.
> I'll see how many targets I need to modify to try the no middle-end
> force_reg for CONST0_RTX case...

For the folding, I think best would be to change _mm_unpacklo_epi8 and all
the similar intrinsics for hardcoded specific permutations from using a
builtin to just using __builtin_shuffle (together with verification that we
emit as good or better code from it for each case of course), and keep
__builtin_shuffle -> VEC_PERM_EXPR as the canonical form (with which the
GIMPLE code can do any optimizations it wants).

	Jakub
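A hedged sketch of that idea with simplified vector types (not the actual
emmintrin.h change): the fixed interleave done by _mm_unpacklo_epi8 is just a
constant-mask __builtin_shuffle, which the middle end keeps as a
VEC_PERM_EXPR and can then combine with a known-zero second operand into the
pmovzx form discussed above.

typedef char v16qi __attribute__ ((vector_size (16)));

static inline v16qi
unpacklo_qi (v16qi a, v16qi b)
{
  /* The { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } permutation from the
     patch description.  */
  return __builtin_shuffle (a, b,
			    (v16qi) { 0, 16, 1, 17, 2, 18, 3, 19,
				      4, 20, 5, 21, 6, 22, 7, 23 });
}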
[PATCH][pushed] options: properly compare string arguments
Similarly to 7f967bd2a7ba156ede3fbb147e66dea5fb7137a6, we need to compare string with strcmp. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. I'm going to push it as it's the same as what I did in 7f967bd2a7ba156ede3fbb147e66dea5fb7137a6. Martin gcc/ChangeLog: PR c++/97284 * optc-save-gen.awk: Compare also n_target_save vars with strcmp. --- gcc/optc-save-gen.awk | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk index 85debfe0b64..b1f85928275 100644 --- a/gcc/optc-save-gen.awk +++ b/gcc/optc-save-gen.awk @@ -1036,8 +1036,10 @@ for (i = 0; i < n_target_save; i++) { type = var; sub("^.*[ *]", "", name) sub(" *" name "$", "", type) - if (target_save_decl[i] ~ "^const char \\*+[_" alnum "]+$") + if (target_save_decl[i] ~ "^const char \\*+[_" alnum "]+$") { var_target_str[n_target_str++] = name; + string_options_names[name]++ + } else { if (target_save_decl[i] ~ " .*\\[.+\\]+$") { size = name; @@ -1451,7 +1453,7 @@ for (i = 0; i < n_opts; i++) { continue; checked_options[name]++ - if (name in string_options_names) { + if (name in string_options_names || ("x_" name) in string_options_names) { print " if (ptr1->x_" name " != ptr2->x_" name ""; print " && (!ptr1->x_" name" || !ptr2->x_" name print " || strcmp (ptr1->x_" name", ptr2->x_" name ")))"; -- 2.29.2
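To illustrate why strcmp is needed here (an illustrative helper, not code
from the generated file): two equal option strings can live at different
addresses, e.g. one saved in a target_option node and one freshly parsed, so
a pure pointer comparison would report a spurious difference.

#include <string.h>

/* Mirrors the generated ptr1->x_... != ptr2->x_... && (!ptr1->x_... ||
   !ptr2->x_... || strcmp (...)) test above: only claim a difference when
   the contents differ, not merely the addresses.  */
static int
option_strings_differ (const char *a, const char *b)
{
  if (a == b)
    return 0;			/* same pointer, or both NULL */
  if (a == NULL || b == NULL)
    return 1;			/* only one of them is set */
  return strcmp (a, b) != 0;
}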
[PATCH] i386, v2: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 11:42:44AM +0100, Uros Bizjak via Gcc-patches wrote: > You can use post-reload define_insn_and_split here. This way, > gen_lowpart on all arguments, including output, can be used. So, > instead of generating an insn template, the patterns you introduced > should split to "real" sse4_1 zero-extend insns. This approach is > preferred to avoid having several pseudo-insns in .md files that do > the same thing with slightly different patterns. There are many > examples of post-reload splitters that use gen_lowpart in i386.md. So like this? If I tweak the vec_perm_const, the other define_insn_and_split will be easier, as they won't need the vec_select and variants in what they split into, just lowpart_subreg on the operand unconditionally and zero_extend it. 2021-01-12 Jakub Jelinek PR target/95905 * config/i386/predicates.md (pmovzx_parallel): New predicate. * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New define_insn_and_split patterns. * gcc.target/i386/pr95905-1.c: New test. * gcc.target/i386/pr95905-2.c: New test. --- gcc/config/i386/predicates.md.jj2021-01-12 11:01:28.458643868 +0100 +++ gcc/config/i386/predicates.md 2021-01-12 13:58:39.816222121 +0100 @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" return true; }) +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, +;; where one of the two operands of the vec_concat is const0_operand. +(define_predicate "pmovzx_parallel" + (and (match_code "parallel") + (match_code "const_int" "a")) +{ + int nelt = XVECLEN (op, 0); + int elt, i; + + if (nelt < 2) +return false; + + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. */ + elt = INTVAL (XVECEXP (op, 0, 0)); + if (elt == 0) +{ + for (i = 1; i < nelt; ++i) + if ((i & 1) != 0) + { + if (INTVAL (XVECEXP (op, 0, i)) < nelt) + return false; + } + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) + return false; +} + else +return false; + + return true; +}) + ;; Return true if OP is a parallel for a vbroadcast permute. 
(define_predicate "avx_vbroadcast_operand" (and (match_code "parallel") --- gcc/config/i386/sse.md.jj 2021-01-12 11:01:28.494643460 +0100 +++ gcc/config/i386/sse.md 2021-01-12 14:30:32.688546846 +0100 @@ -17683,6 +17683,36 @@ (define_insn_and_split "*sse4_1_v8 (any_extend:V8HI (match_dup 1)))] "operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") +(define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3" + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V16QI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" + "#" + "&& reload_completed" + [(set (match_dup 0) + (zero_extend:V8HI + (vec_select:V8QI + (match_dup 1) + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)]] +{ + operands[0] = lowpart_subreg (V8HImode, operands[0], V16QImode); + if (MEM_P (operands[1])) +{ + operands[1] = lowpart_subreg (V8QImode, operands[1], V16QImode); + operands[1] = gen_rtx_ZERO_EXTEND (V8HImode, operands[1]); + emit_insn (gen_rtx_SET (operands[0], operands[1])); + DONE; +} +}) + (define_expand "v8qiv8hi2" [(set (match_operand:V8HI 0 "register_operand") (any_extend:V8HI @@ -17929,6 +17959,34 @@ (define_expand "v4hiv4si2" } }) +(define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3" + [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "vector_operand" "Yrm,*xm,vm") + (match_operand:V8HI 2 "const0_operand" "C,C,C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n,n,n")])))] + "TARGET_SSE4_1" + "#" + "&& reload_completed" + [(set (match_dup 0) + (zero_extend:V4SI + (vec_select:V4HI + (match_dup 1) + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)]] +{ + operands[0] = lowpart_subreg (V4SImode, operands[0], V8HImode); + if (MEM_P (operands[1])) +{ + operands[1] = lowpart_subreg (V4HImode, operands[1], V8HImode); + operands[1] = gen_rtx_ZERO_EXTEND (V4SImode, operands[1]); + emit_insn (gen_rtx_SET (operands[0], operands[1])); + DONE; +} +}) + (define_insn "avx512f_v
Re: [PATCH] i386, v2: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]
On Tue, Jan 12, 2021 at 2:40 PM Jakub Jelinek wrote: > > On Tue, Jan 12, 2021 at 11:42:44AM +0100, Uros Bizjak via Gcc-patches wrote: > > You can use post-reload define_insn_and_split here. This way, > > gen_lowpart on all arguments, including output, can be used. So, > > instead of generating an insn template, the patterns you introduced > > should split to "real" sse4_1 zero-extend insns. This approach is > > preferred to avoid having several pseudo-insns in .md files that do > > the same thing with slightly different patterns. There are many > > examples of post-reload splitters that use gen_lowpart in i386.md. > > So like this? > > If I tweak the vec_perm_const, the other define_insn_and_split will be > easier, as they won't need the vec_select and variants in what they split > into, just lowpart_subreg on the operand unconditionally and zero_extend it. > > 2021-01-12 Jakub Jelinek > > PR target/95905 > * config/i386/predicates.md (pmovzx_parallel): New predicate. > * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3, > *sse4_1_zero_extendv4hiv4si2_3, *sse4_1_zero_extendv2siv2di2_3): New > define_insn_and_split patterns. > > * gcc.target/i386/pr95905-1.c: New test. > * gcc.target/i386/pr95905-2.c: New test. LGTM. Thanks, Uros. > > --- gcc/config/i386/predicates.md.jj2021-01-12 11:01:28.458643868 +0100 > +++ gcc/config/i386/predicates.md 2021-01-12 13:58:39.816222121 +0100 > @@ -1600,6 +1600,38 @@ (define_predicate "addsub_vs_parallel" >return true; > }) > > +;; Return true if OP is a parallel for an pmovz{bw,wd,dq} vec_select, > +;; where one of the two operands of the vec_concat is const0_operand. > +(define_predicate "pmovzx_parallel" > + (and (match_code "parallel") > + (match_code "const_int" "a")) > +{ > + int nelt = XVECLEN (op, 0); > + int elt, i; > + > + if (nelt < 2) > +return false; > + > + /* Check that the permutation is suitable for pmovz{bw,wd,dq}. > + For example { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 }. */ > + elt = INTVAL (XVECEXP (op, 0, 0)); > + if (elt == 0) > +{ > + for (i = 1; i < nelt; ++i) > + if ((i & 1) != 0) > + { > + if (INTVAL (XVECEXP (op, 0, i)) < nelt) > + return false; > + } > + else if (INTVAL (XVECEXP (op, 0, i)) != i / 2) > + return false; > +} > + else > +return false; > + > + return true; > +}) > + > ;; Return true if OP is a parallel for a vbroadcast permute. 
> (define_predicate "avx_vbroadcast_operand" >(and (match_code "parallel") > --- gcc/config/i386/sse.md.jj 2021-01-12 11:01:28.494643460 +0100 > +++ gcc/config/i386/sse.md 2021-01-12 14:30:32.688546846 +0100 > @@ -17683,6 +17683,36 @@ (define_insn_and_split "*sse4_1_v8 > (any_extend:V8HI (match_dup 1)))] >"operands[1] = adjust_address_nv (operands[1], V8QImode, 0);") > > +(define_insn_and_split "*sse4_1_zero_extendv8qiv8hi2_3" > + [(set (match_operand:V16QI 0 "register_operand" "=Yr,*x,v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "vector_operand" "Yrm,*xm,vm") > + (match_operand:V16QI 2 "const0_operand" "C,C,C")) > + (match_parallel 3 "pmovzx_parallel" > + [(match_operand 4 "const_int_operand" "n,n,n")])))] > + "TARGET_SSE4_1" > + "#" > + "&& reload_completed" > + [(set (match_dup 0) > + (zero_extend:V8HI > + (vec_select:V8QI > + (match_dup 1) > + (parallel [(const_int 0) (const_int 1) > + (const_int 2) (const_int 3) > + (const_int 4) (const_int 5) > + (const_int 6) (const_int 7)]] > +{ > + operands[0] = lowpart_subreg (V8HImode, operands[0], V16QImode); > + if (MEM_P (operands[1])) > +{ > + operands[1] = lowpart_subreg (V8QImode, operands[1], V16QImode); > + operands[1] = gen_rtx_ZERO_EXTEND (V8HImode, operands[1]); > + emit_insn (gen_rtx_SET (operands[0], operands[1])); > + DONE; > +} > +}) > + > (define_expand "v8qiv8hi2" >[(set (match_operand:V8HI 0 "register_operand") > (any_extend:V8HI > @@ -17929,6 +17959,34 @@ (define_expand "v4hiv4si2" > } > }) > > +(define_insn_and_split "*sse4_1_zero_extendv4hiv4si2_3" > + [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "vector_operand" "Yrm,*xm,vm") > + (match_operand:V8HI 2 "const0_operand" "C,C,C")) > + (match_parallel 3 "pmovzx_parallel" > + [(match_operand 4 "const_int_operand" "n,n,n")])))] > + "TARGET_SSE4_1" > + "#" > + "&& reload_completed" > + [(set (match_dup 0) > + (zero_extend:V4SI > + (vec_select:V4HI > + (match_dup 1) > + (parallel [(const_int 0) (const_int 1) > + (const_int 2) (const_int 3)]] > +{ > + operands[0] = lowpar
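For concreteness, PR 95905 is about recognizing an unpack-with-zero as a zero extension. A minimal sketch of the kind of source affected (an illustrative function, not one of the pr95905 testcases): built with -O2 -msse4.1, it should now compile to a single pmovzxbw rather than a pxor plus punpcklbw pair.

  #include <emmintrin.h>

  __m128i
  widen_lo_u8 (__m128i x)
  {
    /* Interleaving the low eight bytes of x with zeros is the same as
       zero-extending each of those bytes to 16 bits.  */
    return _mm_unpacklo_epi8 (x, _mm_setzero_si128 ());
  }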
[PATCH] tree-optimization/98550 - fix BB vect unrolling check
This fixes the check that disqualifies BB vectorization because of required unrolling to match up with the later exact_div we do. To not disable the ability to split groups that do not match up exactly with a choosen vector type this also introduces a soft-fail mechanism to vect_build_slp_tree_1 which delays failing to after the matches[] array is populated from other checks and only then determines the split point according to the vector type. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-01-12 Richard Biener PR tree-optimization/98550 * tree-vect-slp.c (vect_record_max_nunits): Check whether the group size is a multiple of the vector element count. (vect_build_slp_tree_1): When we need to fail because the vector type choosen causes unrolling do so lazily without affecting matches only at the end to guide group splitting. * g++.dg/opt/pr98550.C: New testcase. --- gcc/testsuite/g++.dg/opt/pr98550.C | 96 ++ gcc/tree-vect-slp.c| 40 ++--- 2 files changed, 128 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/g++.dg/opt/pr98550.C diff --git a/gcc/testsuite/g++.dg/opt/pr98550.C b/gcc/testsuite/g++.dg/opt/pr98550.C new file mode 100644 index 000..49102e6c1a1 --- /dev/null +++ b/gcc/testsuite/g++.dg/opt/pr98550.C @@ -0,0 +1,96 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target c++11 } */ +/* { dg-additional-options "-O3" } */ +/* { dg-additional-options "-march=z13" { target s390x-*-* } } */ + +template struct k { static constexpr int c = a; }; +template struct o; +template struct o { + typedef decltype(0) h; +}; +template struct p : o::c, k::c, g...> {}; +class q; +class r { +public: + void ap(q); +}; +template void ax(aw ay) { ay(); } +template void ba(az bb) { + using bc = p; + using bd = typename bc::h; + ax(bb); +} +template class s; +class t { +public: + s<8> br(); + template void operator()() { ba(br()); } +}; +class q { +public: + template q(az) { H(); } + struct H { +t cc; +H() { cc(); } + }; +}; +template struct I {}; +template void cm(j cn, I) { + cm(cn, I()); + cn(cl); +} +template void cm(j, I<0>) {} +template struct u { + long cp[co]; + void cq(const u &); + void cs(int); + void operator<(u); +}; +template void u::cq(const u &l) { + cm([&](int i) { cp[i] &= l.cp[i]; }, I()); +} +template void u::cs(int m) { + cm([&](int i) { cp[i] >>= m; }, I()); +} +template class K; +template class v { + int cv; + friend K; + +public: + void cx(int, unsigned char *, unsigned long long); +}; +template class K { +public: + static void cx(v &); +}; +template +void v::cx(int, unsigned char *, unsigned long long) { + K::cx(*this); +} +template void K::cx(v &cz) { + u a, b, d; + int e, n = cz.cv; + for (; e;) +if (cz.cv) + a.cs(cz.cv); + a.cq(d); + a < b; +} +template class s { + v *dh; + +public: + void operator()(); +}; +template void s::operator()() { + int f; + unsigned char g; + long h; + dh->cx(f, &g, h); +} +void d() { + r i; + t j; + i.ap(j); +} diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c index 877d44b2257..65b7a27e1e8 100644 --- a/gcc/tree-vect-slp.c +++ b/gcc/tree-vect-slp.c @@ -873,11 +873,8 @@ vect_record_max_nunits (vec_info *vinfo, stmt_vec_info stmt_info, /* If populating the vector type requires unrolling then fail before adjusting *max_nunits for basic-block vectorization. 
*/ - poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); - unsigned HOST_WIDE_INT const_nunits; if (is_a (vinfo) - && (!nunits.is_constant (&const_nunits) - || const_nunits > group_size)) + && !multiple_p (group_size, TYPE_VECTOR_SUBPARTS (vectype))) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -928,6 +925,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, stmt_vec_info first_load = NULL, prev_first_load = NULL; bool first_stmt_load_p = false, load_p = false; bool first_stmt_phi_p = false, phi_p = false; + bool maybe_soft_fail = false; + tree soft_fail_nunits_vectype = NULL_TREE; /* For every stmt in NODE find its def stmt/s. */ stmt_vec_info stmt_info; @@ -977,10 +976,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, tree nunits_vectype; if (!vect_get_vector_types_for_stmt (vinfo, stmt_info, &vectype, - &nunits_vectype, group_size) - || (nunits_vectype - && !vect_record_max_nunits (vinfo, stmt_info, group_size, - nunits_vectype, max_nunits))) + &nunits_vectype, group_size)) { if (is_a (vinfo) && i != 0) continue; @@ -988,6 +984,17 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, matches[0] = false; return fals
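Restated, the new gate only lets BB vectorization proceed when the group size splits evenly into vector-sized chunks, and the soft-fail path records where to split instead of rejecting the whole group up front. A simplified scalar sketch of the check (the real code uses poly_uint64 and multiple_p, so this is an approximation only):

  /* Approximation of the adjusted vect_record_max_nunits gate.  */
  static bool
  group_size_ok_for_bb_vect (unsigned group_size, unsigned nunits)
  {
    /* A group of 8 with a 4-lane vector type is fine (8 % 4 == 0);
       a group of 6 is not, and must instead be split on a lane
       boundary by the caller.  */
    return group_size % nunits == 0;
  }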
Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.
Iain Sandoe writes: > Hi, > > The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front > of the immediate rotation quantity. > > Although, it seems, GAS is able to infer the # (or is leninent about > its absence) assemblers based on the LLVM back end expect it and error out. > > tested on aarch64-linux-gnu (gcc115) and aarch64-darwin20 (experimental) Sorry for the slow reply, didn't see this till now. The patch is OK in principle, and personally I prefer “#”. But how far does this spread? Are only ROR modifiers on logical patterns affected? Or is the use of a paranthesised expression instead of a literal the thing that makes the difference? GCC is generally quite lax at including “#”, so I'd expect more fallout than this. Thanks, Richard > > OK for master? > thanks > Iain > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (_rol3): Add a '#' > mark in front of the immediate quantity. > (_rolsi3_uxtw): Likewise. > --- > gcc/config/aarch64/aarch64.md | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 45d9c6ac45a..e0de82c938a 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -4416,7 +4416,7 @@ > (match_operand:QI 2 "aarch64_shift_imm_" "n")) >(match_operand:GPI 3 "register_operand" "r")))] > "" > - "\\t%0, %3, %1, ror ( - %2)" > + "\\t%0, %3, %1, ror #( - %2)" > [(set_attr "type" "logic_shift_imm")] > ) > > @@ -4441,7 +4441,7 @@ > (match_operand:QI 2 "aarch64_shift_imm_si" "n")) >(match_operand:SI 3 "register_operand" "r"] > "" > - "\\t%w0, %w3, %w1, ror (32 - %2)" > + "\\t%w0, %w3, %w1, ror #(32 - %2)" > [(set_attr "type" "logic_shift_imm")] > )
Re: make FOR_EACH_IMM_USE_STMT safe for early exits
On 1/9/21 3:33 PM, Alexandre Oliva wrote: On Jan 7, 2021, Richard Biener wrote: On Wed, Jan 6, 2021 at 12:34 PM Alexandre Oliva wrote: On Jan 4, 2021, Richard Biener wrote: Thus, please remove uses of BREAK_FROM_IMM_USE_STMT together with this patch. And RETURN_FROM_IMM_USE_STMT, I suppose? Sure. Done. make FOR_EACH_IMM_USE_STMT safe for early exits Use a dtor to automatically remove ITER from IMM_USE list in FOR_EACH_IMM_USE_STMT. Regstrapped on x86_64-linux-gnu. Ok to install? Nice! I like it. One less wart for my epitaph :-). Thanks Andrew
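For readers new to this corner of GIMPLE: the iterator behind FOR_EACH_IMM_USE_STMT links itself into the immediate-use list, and an early break or return used to leave it linked unless the BREAK_FROM_IMM_USE_STMT / RETURN_FROM_IMM_USE_STMT macros were used. Giving the iterator a destructor makes that cleanup automatic. A generic C++ sketch of the idea (GCC's real imm_use_iterator carries more state than this):

  struct use_node
  {
    use_node *prev = nullptr;
    use_node *next = nullptr;
  };

  struct imm_use_iter_sketch
  {
    use_node node;   /* linked into the statement's use list while iterating */

    ~imm_use_iter_sketch ()
    {
      /* Runs on every exit path, including break/return/goto, so the
         list never retains a dangling iterator node.  */
      if (node.prev)
        node.prev->next = node.next;
      if (node.next)
        node.next->prev = node.prev;
    }
  };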
[PATCH] if-to-switch: fix also virtual phis
Hello. As seen in the PR, we need to fix also virtual PHIs, otherwise TODO_cfg will skip edges for a missing PHI argument. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/ChangeLog: PR tree-optimization/98455 * gimple-if-to-switch.cc (condition_info::record_phi_mapping): Record also virtual PHIs. gcc/testsuite/ChangeLog: PR tree-optimization/98455 * gcc.dg/tree-ssa/pr98455.c: New test. --- gcc/gimple-if-to-switch.cc | 7 ++- gcc/testsuite/gcc.dg/tree-ssa/pr98455.c | 19 +++ 2 files changed, 21 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr98455.c diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc index 560753d0311..96213d86c28 100644 --- a/gcc/gimple-if-to-switch.cc +++ b/gcc/gimple-if-to-switch.cc @@ -91,11 +91,8 @@ condition_info::record_phi_mapping (edge e, mapping_vec *vec) gsi_next (&gsi)) { gphi *phi = gsi.phi (); - if (!virtual_operand_p (gimple_phi_result (phi))) - { - tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); - vec->safe_push (std::make_pair (phi, arg)); - } + tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); + vec->safe_push (std::make_pair (phi, arg)); } } diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c new file mode 100644 index 000..24e249f6fcb --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c @@ -0,0 +1,19 @@ +/* PR tree-optimization/98455 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -fno-tree-dce --param case-values-threshold=1" } */ + +void +n4 (int io, int vb) +{ + double uc[2] = { 1.0, 2.0, }; + + if (io == 0) +uc[0] = 0.0; + + for (;;) +if (io == 0) + if (vb == 0) +uc[0] = uc[1]; + else if (vb == 1) +uc[1] = 0.0; +} -- 2.29.2
Re: [PATCH] libphobos: Allow building libphobos using Solaris/x86 assembler
Hi Iain, > Having a look, actually I can just use the presence of TEST_OUTPUT to be > a gate for whether to prune all output or not. that's quite nice indeed. > Maybe this can be improved later to extract the contents of TEST_OUTPUT, > but for now, it has caught a few hidden bugs in the tests that I have > handled in the patch below (no changelog entry yet). I've tried the patch myself last night on both i386-pc-solaris2.11 (both with as and gas) and sparc-sun-solaris2.11 (gas only until PR d/98584 is fixed) and found a few issues: > +proc gdc-extra-test-options { fdout test } { > +switch $test { [...] > + "runnable/test42.d" { > + # Tests that overflow line limits of older assemblers. > + puts $fdout "// { dg-xfail-if \"Lines exceed 10240 characters\" { > *-*-solaris2.* && { ! gas } } }" This doesn't work: I get +XPASS: gdc.test/runnable/test42.d (test for excess errors) +UNRESOLVED: gdc.test/runnable/test42.d compilation failed to produce executable +XPASS: gdc.test/runnable/test42.d -shared-libphobos (test for excess errors) +UNRESOLVED: gdc.test/runnable/test42.d -shared-libphobos compilation failed to produce executable which is no wonder: due to the presence of TEST_OUTPUT in the test, all output gets pruned, including the assembler message. This leads to success myInt int myBool bool i s C6test42__T4T219TiZ1C C6test427test219FZ8__mixin11C Input string too long, limit 10240 compiler exited with status 1 XPASS: gdc.test/runnable/test42.d (test for excess errors) Unfortunately the exit status of compiler and assembler is ignored by the testsuite framework when deciding whether the compilation has succeeded. Besides, the dg-xfail-if above should be restricted to *86*-*-solaris2.* && ! gas (to allow for both i?86-*-solaris2.* and x86_64-*-solaris2.*): I've previously run a Solaris/SPARC build with as and libphobos enabled and only the 64-bit gdc.test/runnable/test42.d execution tests FAIL. While the Solaris sparc and x86 assemblers have a common origin, the code bases have diverged over time and the sparc assembler doesn't have this particular low limit. However, testing on Solaris/SPARC with gas and libphobos showed another issue: +FAIL: gdc.test/runnable/traits_getPointerBitmap.d (test for excess errors) UNRESOLVED: gdc.test/runnable/traits_getPointerBitmap.d compilation failed to produce executable +FAIL: gdc.test/runnable/traits_getPointerBitmap.d -shared-libphobos (test for excess errors) UNRESOLVED: gdc.test/runnable/traits_getPointerBitmap.d -shared-libphobos compilation failed to produce executable Excess errors: runnable/traits_getPointerBitmap.d:220:9: error: vector type __vector(float[4]) is not supported on this platform Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.
Hi Richard, Richard Sandiford wrote: Iain Sandoe writes: The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front of the immediate rotation quantity. Although, it seems, GAS is able to infer the # (or is leninent about its absence) assemblers based on the LLVM back end expect it and error out. tested on aarch64-linux-gnu (gcc115) and aarch64-darwin20 (experimental) Sorry for the slow reply, didn't see this till now. Hmm .. I did CC you directly ( but having some troubles with this ISP which I am trying to resolve - emails going missing or have to be re-sent … :/ ) The patch is OK in principle, and personally I prefer “#”. But how far does this spread? Are only ROR modifiers on logical patterns affected? Or is the use of a paranthesised expression instead of a literal the thing that makes the difference? perhaps, Unfortunately, I have only a cause and effect mechanism to check this (i.e. problems that prevent bootstrap or ones I’ve triaged in the testsuite fails). For X86 and PPC I have a more direct way of invoking the LLVM backend (not had time to extend that to aarch64 yet). .. but via XCode (which is what 99.99% of macOS folks will use as their ‘binutils’) it’s done by “as” ==> "clang -cc1as” (effectively an assembler with preprocessing). This issue certainly fires there - and I’d have no reason to think that the preprocessor would make any material difference, so it’s likely that the LLVM backend is not set up to deduce that the value is a constant and infer the ‘#’. GCC is generally quite lax at including “#”, so I'd expect more fallout than this. This is one of three issues (that are not directly related to mach-o format or relocation syntax) that I’ve encountered in my experimental port that happen to trigger in bootstrap. (the others are a format specifier type naming clash and one related to Darwin having signed chars by default - so this is the only asm syntax one remaining - Alex fixed the other fails earlier in the year). Of course, it’s possible that there are other cases in the testsuite fallout that i’ve not yet examined - there’s quite a lot still to triage :(. Once upon a time there were some tests (at least for PPC and X86) that essentially were a single file containing all the insns - so that could be thrown at an assembler and would be expected to complete without error - but I don’t know if there’s an equivalent for aarch64. cheers Iain * I suppose someone could grep ‘#’ in the armv8_arm and c.f. cases against aarch64.md (I dare not volunteer, at present, since I’m already way overcomitted with non-$dayjob). Thanks, Richard OK for master? thanks Iain gcc/ChangeLog: * config/aarch64/aarch64.md (_rol3): Add a '#' mark in front of the immediate quantity. (_rolsi3_uxtw): Likewise. --- gcc/config/aarch64/aarch64.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 45d9c6ac45a..e0de82c938a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4416,7 +4416,7 @@ (match_operand:QI 2 "aarch64_shift_imm_" "n")) (match_operand:GPI 3 "register_operand" "r")))] "" - "\\t%0, %3, %1, ror ( - %2)" + "\\t%0, %3, %1, ror #( - %2)" [(set_attr "type" "logic_shift_imm")] ) @@ -4441,7 +4441,7 @@ (match_operand:QI 2 "aarch64_shift_imm_si" "n")) (match_operand:SI 3 "register_operand" "r"] "" - "\\t%w0, %w3, %w1, ror (32 - %2)" + "\\t%w0, %w3, %w1, ror #(32 - %2)" [(set_attr "type" "logic_shift_imm")] )
Re: [PATCH] if-to-switch: fix also virtual phis
) On Tue, Jan 12, 2021 at 3:50 PM Martin Liška wrote: > > Hello. > > As seen in the PR, we need to fix also virtual PHIs, otherwise > TODO_cfg will skip edges for a missing PHI argument. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? OK - doesn't this mean you can remove the mark_virtual_operands_for_renaming (fun); call and thus TODO_update_ssa? Btw, the pass seems to unconditionally schedule TODO_cleanup_cfg - it would be nice to only do that (return TODO_cleanup_cfg from pass_if_to_switch::execute) if it did any transform. Thanks, Richard. > Thanks, > Martin > > gcc/ChangeLog: > > PR tree-optimization/98455 > * gimple-if-to-switch.cc (condition_info::record_phi_mapping): > Record also virtual PHIs. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/98455 > * gcc.dg/tree-ssa/pr98455.c: New test. > --- > gcc/gimple-if-to-switch.cc | 7 ++- > gcc/testsuite/gcc.dg/tree-ssa/pr98455.c | 19 +++ > 2 files changed, 21 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > > diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc > index 560753d0311..96213d86c28 100644 > --- a/gcc/gimple-if-to-switch.cc > +++ b/gcc/gimple-if-to-switch.cc > @@ -91,11 +91,8 @@ condition_info::record_phi_mapping (edge e, mapping_vec > *vec) > gsi_next (&gsi)) > { > gphi *phi = gsi.phi (); > - if (!virtual_operand_p (gimple_phi_result (phi))) > - { > - tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); > - vec->safe_push (std::make_pair (phi, arg)); > - } > + tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e); > + vec->safe_push (std::make_pair (phi, arg)); > } > } > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > new file mode 100644 > index 000..24e249f6fcb > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c > @@ -0,0 +1,19 @@ > +/* PR tree-optimization/98455 */ > +/* { dg-do compile } */ > +/* { dg-options "-O1 -fno-tree-dce --param case-values-threshold=1" } */ > + > +void > +n4 (int io, int vb) > +{ > + double uc[2] = { 1.0, 2.0, }; > + > + if (io == 0) > +uc[0] = 0.0; > + > + for (;;) > +if (io == 0) > + if (vb == 0) > +uc[0] = uc[1]; > + else if (vb == 1) > +uc[1] = 0.0; > +} > -- > 2.29.2 >
Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.
Iain Sandoe writes:
> Hi Richard,
>
> Richard Sandiford wrote:
>
>> Iain Sandoe writes:
>
>>> The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front
>>> of the immediate rotation quantity.
>>>
>>> Although, it seems, GAS is able to infer the # (or is leninent about
>>> its absence) assemblers based on the LLVM back end expect it and error
>>> out.
>>>
>>> tested on aarch64-linux-gnu (gcc115) and aarch64-darwin20 (experimental)
>>
>> Sorry for the slow reply, didn't see this till now.
>
> Hmm .. I did CC you directly ( but having some troubles with this ISP which I am
> trying to resolve - emails going missing or have to be re-sent … :/ )

Yeah, I got the message, I just didn't see it, sorry.

>> The patch is OK in principle, and personally I prefer “#”.  But how far
>> does this spread?  Are only ROR modifiers on logical patterns affected?
>> Or is the use of a paranthesised expression instead of a literal the thing
>> that makes the difference?
>
> perhaps,

Trying it out locally, that does seem to be the difference:

    and x1, x2, x3, ror 1      // OK
    and x1, x2, x3, ror (1)    // Rejected
    and x1, x2, x3, ror #(1)   // OK

Same for the other modifiers I tried.  That doesn't look intentional,
but whether it's intentional obviously isn't important in this context.

So yeah, the patch is OK.

Thanks,
Richard
Re: [PATCH] libstdc++: implement locale support for AIX
Hi everyone, I've reworked the patch to merged dragonfly and AIX models into the new one named "ieee_1003.1-2008". It seems okay on the AIX part but if someone can test on Dragonfly and Freebsd I would be glad. Configure needs to be regenerated, first. For now, I've used #ifdef inside the code for the few differences. There are less than I thought. So, it seems okay to me. However, I haven't changed all the (locale_t) casts made on Dragonfly when passing a locale object to a syscall. On AIX, this is transparent, __c_locale being locale_t. I haven't updated tests failing on AIX yet. And I don't know for the Changelog/Patch, how renamed files must be handled ? I've retrieved my git commit with "git format-patch --no-renames" and thus the Changelog is made of "Removed" and "New File". Is it okay or is there anything special to add ? I didn't find the answer quickly. Thanks, ClémentFrom d36a04c7d4eb390817d711dc59ad0e0759bccdce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= Date: Tue, 29 Dec 2020 11:08:33 +0100 Subject: [PATCH] libstdc++: implement locale support for POSIX 2008 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The implementation is based on dragonfly one. It also adds support for AIX with a few tweaks. As of now, a few locale functions are missing on AIX. For strftime_l, localeconv_l, mbstowcs_l and wcsftime_l, uselocale must be set prior to use the version without _l. For strtof_l, strtod_l, strtold_l, a wrapper simply calls the default version. libstdc++-v3/ChangeLog: 2021-01-12 Clément Chigot * acinclude.m4: Add ieee_1003.1-2008 locale model. * configure: Regenerate. * config/os/aix/ctype_configure_char.cc: Enable locale support. * testsuite/lib/libstdc++.exp (check_v3_target_namedlocale): Handle AIX locale names. * testsuite/util/testsuite_hooks.h: Likewise. * config/locale/dragonfly/c_locale.cc: Removed. * config/locale/dragonfly/c_locale.h: Removed. * config/locale/dragonfly/codecvt_members.cc: Removed. * config/locale/dragonfly/collate_members.cc: Removed. * config/locale/dragonfly/ctype_members.cc: Removed. * config/locale/dragonfly/monetary_members.cc: Removed. * config/locale/dragonfly/numeric_members.cc: Removed. * config/locale/dragonfly/time_members.cc: Removed. * config/locale/dragonfly/time_members.h: Removed. * config/locale/ieee_1003.1-2008/c_locale.cc: New file. * config/locale/ieee_1003.1-2008/c_locale.h: New file. * config/locale/ieee_1003.1-2008/codecvt_members.cc: New file. * config/locale/ieee_1003.1-2008/collate_members.cc: New file. * config/locale/ieee_1003.1-2008/ctype_members.cc: New file. * config/locale/ieee_1003.1-2008/monetary_members.cc: New file. * config/locale/ieee_1003.1-2008/numeric_members.cc: New file. * config/locale/ieee_1003.1-2008/time_members.cc: New file. * config/locale/ieee_1003.1-2008/time_members.h: New file. 
--- libstdc++-v3/acinclude.m4 | 41 ++--- .../c_locale.cc | 3 + .../c_locale.h| 31 ++ .../codecvt_members.cc| 0 .../collate_members.cc| 0 .../ctype_members.cc | 58 +++ .../monetary_members.cc | 39 - .../numeric_members.cc| 27 + .../time_members.cc | 32 +- .../time_members.h| 0 .../config/os/aix/ctype_configure_char.cc | 44 +++--- libstdc++-v3/testsuite/lib/libstdc++.exp | 5 ++ libstdc++-v3/testsuite/util/testsuite_hooks.h | 3 + 13 files changed, 252 insertions(+), 31 deletions(-) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/c_locale.cc (99%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/c_locale.h (80%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/codecvt_members.cc (100%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/collate_members.cc (100%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/ctype_members.cc (82%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/monetary_members.cc (97%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/numeric_members.cc (93%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/time_members.cc (94%) rename libstdc++-v3/config/locale/{dragonfly => ieee_1003.1-2008}/time_members.h (100%) diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index e4175ea3e64..64a91b72359 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -2421,7 +2421,7 @@ dnl AC_DEFUN([GLIBCXX_ENABLE_CLOCALE], [ GLIBCXX_ENA
Re: [PATCH] libstdc++: implement locale support for AIX
On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: Hi everyone, I've reworked the patch to merged dragonfly and AIX models into the new one named "ieee_1003.1-2008". It seems okay on the AIX part but if someone can test on Dragonfly and Freebsd I would be glad. Configure needs to be regenerated, first. Presumably it could also be tested on GNU/Linux and Solaris, since they implement the POSIX 2008 APIs needed.
Re: [PATCH] libstdc++: implement locale support for AIX
> And I don't know for the Changelog/Patch, how renamed
> files must be handled ? I've retrieved my git commit with
> "git format-patch --no-renames" and thus the Changelog
> is made of "Removed" and "New File". Is it okay or is there
> anything special to add ? I didn't find the answer quickly.

Actually, in the one I've sent, I forgot the "--no-renames", so it uses git's rename mechanism...

Thanks,
Clément
Re: [PATCH] libstdc++: implement locale support for AIX
> >Hi everyone,
> >
> >I've reworked the patch to merged dragonfly and AIX
> >models into the new one named "ieee_1003.1-2008".
> >It seems okay on the AIX part but if someone can test
> >on Dragonfly and Freebsd I would be glad. Configure
> >needs to be regenerated, first.
>
> Presumably it could also be tested on GNU/Linux and Solaris, since
> they implement the POSIX 2008 APIs needed.

Indeed, I'll try it tomorrow for Linux. I don't have any Solaris VMs in my lab. Note that I forgot to add some "defined(__FreeBSD__)" guards in the current patch, so it will not work there yet. I'll send a better one tomorrow after checking on Linux.
Re: [PATCH] libstdc++: implement locale support for AIX
On Tue, Jan 12, 2021 at 10:25 AM Jonathan Wakely wrote: > > On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: > >Hi everyone, > > > >I've reworked the patch to merged dragonfly and AIX > >models into the new one named "ieee_1003.1-2008". > >It seems okay on the AIX part but if someone can test > >on Dragonfly and Freebsd I would be glad. Configure > >needs to be regenerated, first. > > Presumably it could also be tested on GNU/Linux and Solaris, since > they implement the POSIX 2008 APIs needed. GNU/Linux currently uses the "gnu" locale configuration. Are you suggesting that GNU/Linux use the new POSIX 2008 locale configuration by default or to use GNU/Linux as another sniff test? Thanks, David
Re: [PATCH] libstdc++: implement locale support for AIX
Hi Jonathan,

>> On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote:
>>>Hi everyone,
>>>
>>>I've reworked the patch to merged dragonfly and AIX
>>>models into the new one named "ieee_1003.1-2008".
>>>It seems okay on the AIX part but if someone can test
>>>on Dragonfly and Freebsd I would be glad. Configure
>>>needs to be regenerated, first.
>>
>> Presumably it could also be tested on GNU/Linux and Solaris, since
>> they implement the POSIX 2008 APIs needed.

I'll give the patch a whirl on Solaris. However, we will need to
distinguish between 11.3 (which is XPG6 only) and 11.4 (which supports XPG7).

Rainer

--
-
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] libstdc++: implement locale support for AIX
Hi Clement, > I've reworked the patch to merged dragonfly and AIX > models into the new one named "ieee_1003.1-2008". > It seems okay on the AIX part but if someone can test > on Dragonfly and Freebsd I would be glad. Configure > needs to be regenerated, first. > > For now, I've used #ifdef inside the code for the few > differences. There are less than I thought. So, it seems > okay to me. TBH, I find this liberal sprinkling of target-specific #ifdefs over the code horrible: it's completely out of style with GCC conventions. This is exactly what autoconf is for: test for the existance of headers and functions like strtof_l and act accordingly. If in the future other OSes will make use of the code, most differences will already be handled and it's way easier to understand defined(HAVE_STRTOF_L) than defined(_AIX) || defined(HP_UX). From your code one simply cannot tell why some code is used on AIX while something else on (say) DragonflyBSD. It gets even worse when (as does happen) standard support differs between versions: expressing this with platform ifdefs is nothing short of a nightmare. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
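Put in code terms, the suggestion is to let configure probe for each per-locale function and key the sources off the resulting HAVE_* macro rather than off the platform. A hypothetical sketch (the macro name and helper function are assumptions, not existing libstdc++ configury):

  #include <locale.h>   /* locale_t (POSIX.1-2008) */
  #include <stdlib.h>   /* strtof, and strtof_l where it exists */

  float
  parse_float_c_locale (const char *s, char **end, locale_t cloc)
  {
  #ifdef HAVE_STRTOF_L            /* would come from an AC_CHECK_FUNCS probe */
    return strtof_l (s, end, cloc);
  #else
    /* Fallback in the spirit of the AIX wrapper described in the thread:
       assume uselocale (cloc) is already in effect.  */
    (void) cloc;
    return strtof (s, end);
  #endif
  }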
Re: [PATCH][pushed] gcov: add more debugging facility
On Tue, Jan 12, 2021 at 3:55 AM Martin Liška wrote: > > The patch is about an extensive debugging facility that I see > beneficial for GCOV issue. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > I'm going to push it. > > Thanks, > Martin > > gcc/ChangeLog: > > * gcov.c (source_info::debug): New. > (print_usage): Add --debug (-D) option. > (process_args): Likewise. > (generate_results): Call src->debug after > accumulate_line_counts. > (read_graph_file): Properly assign id for EXIT_BLOCK. > * profile.c (branch_prob): Dump function body before it is > instrumented. > --- > gcc/gcov.c| 43 ++- > gcc/profile.c | 5 + > 2 files changed, 47 insertions(+), 1 deletion(-) > > diff --git a/gcc/gcov.c b/gcc/gcov.c > index 01ca52b215d..93128721ef6 100644 > --- a/gcc/gcov.c > +++ b/gcc/gcov.c > @@ -371,6 +371,9 @@ public: > /* Register a new function. */ > void add_function (function_info *fn); > > + /* Debug the source file. */ > + void debug (); > + > /* Index of the source_info in sources vector. */ > unsigned index; > > @@ -428,6 +431,31 @@ source_info::get_functions_at_location (unsigned > line_num) const > return slot; > } > > +void source_info::debug () > +{ > + fprintf (stderr, "source_info: %s\n", name); > + for (vector::iterator it = functions.begin (); > + it != functions.end (); it++) > +{ > + function_info *fn = *it; > + fprintf (stderr, " function_info: %s\n", fn->get_name ()); > + for (vector::iterator bit = fn->blocks.begin (); > + bit != fn->blocks.end (); bit++) > + { > + fprintf (stderr, "block_info id=%d, count=%ld\n", > + bit->id, bit->count); > + } > +} > + > + for (unsigned lineno = 1; lineno < lines.size (); ++lineno) > +{ > + line_info &line = lines[lineno]; > + fprintf (stderr, " line_info=%d, count=%ld\n", lineno, line.count); > +} > + > + fprintf (stderr, "\n"); > +} > + This breaks build on 32-bit hosts: https://gcc.gnu.org/pipermail/gcc-regression/2021-January/074130.html -- H.J.
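For context, the failure is the usual 64-bit count versus %ld mismatch: gcov counters are 64-bit, while long is 32 bits on those hosts, so the format checks during bootstrap reject the new fprintf calls. The portable spelling goes through the PRId64 macro, as in this small stand-alone example:

  #include <cinttypes>
  #include <cstdio>

  int
  main ()
  {
    std::int64_t count = 1234567890123LL;
    /* "%ld" would be diagnosed (and would truncate) where long is
       32 bits; PRId64 expands to the right conversion everywhere.  */
    std::printf ("count=%" PRId64 "\n", count);
    return 0;
  }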
[committed] LRA: patch to fix PR97969
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97969 The patch was successfully bootstrapped on x86-64. [PR97969] LRA: Transform pattern `plus (plus (hard reg, const), pseudo)` after elimination LRA can loop infinitely on targets without `reg + imm` insns. Register elimination on such targets can increase register pressure resulting in permanent stack size increase and changing elimination offset. To avoid such situation, a simple transformation can be done to avoid register pressure increase after generating reload insns containing eliminated hard regs. gcc/ChangeLog: PR target/97969 * lra-eliminations.c (eliminate_regs_in_insn): Add transformation of pattern 'plus (plus (hard reg, const), pseudo)'. gcc/testsuite/ChangeLog: PR target/97969 * gcc.target/arm/pr97969.c: New. diff --git a/gcc/lra-eliminations.c b/gcc/lra-eliminations.c index b28f3c410d3..ebcadd1006c 100644 --- a/gcc/lra-eliminations.c +++ b/gcc/lra-eliminations.c @@ -885,7 +885,7 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p, poly_int64 update_sp_offset) { int icode = recog_memoized (insn); - rtx old_set = single_set (insn); + rtx set, old_set = single_set (insn); bool validate_p; int i; rtx substed_operand[MAX_RECOG_OPERANDS]; @@ -1038,6 +1038,32 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p, for (i = 0; i < static_id->n_dups; i++) *id->dup_loc[i] = substed_operand[(int) static_id->dup_num[i]]; + /* Transform plus (plus (hard reg, const), pseudo) to plus (plus (pseudo, + const), hard reg) in order to keep insn containing eliminated register + after all reloads calculating its offset. This permits to keep register + pressure under control and helps to avoid LRA cycling in patalogical + cases. */ + if (! replace_p && (set = single_set (insn)) != NULL + && GET_CODE (SET_SRC (set)) == PLUS + && GET_CODE (XEXP (SET_SRC (set), 0)) == PLUS) +{ + rtx reg1, reg2, op1, op2; + + reg1 = op1 = XEXP (XEXP (SET_SRC (set), 0), 0); + reg2 = op2 = XEXP (SET_SRC (set), 1); + if (GET_CODE (reg1) == SUBREG) + reg1 = SUBREG_REG (reg1); + if (GET_CODE (reg2) == SUBREG) + reg2 = SUBREG_REG (reg2); + if (REG_P (reg1) && REG_P (reg2) + && REGNO (reg1) < FIRST_PSEUDO_REGISTER + && REGNO (reg2) >= FIRST_PSEUDO_REGISTER) + { + XEXP (XEXP (SET_SRC (set), 0), 0) = op2; + XEXP (SET_SRC (set), 1) = op1; + } +} + /* If we had a move insn but now we don't, re-recognize it. This will cause spurious re-recognition if the old move had a PARALLEL since the new one still will, but we can't call diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c b/gcc/testsuite/gcc.target/arm/pr97969.c new file mode 100644 index 000..714a1d18870 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr97969.c @@ -0,0 +1,54 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w -Os" } */ + +typedef a[23]; +enum { b }; +typedef struct { + int c; + char *e; + char f +} d; +typedef enum { g = 1 } h; +typedef struct { + h i; + int j +} k; +typedef struct { + a l; + int a; + int m; + int n; + int o; + short p; + int q; + k r; + char e; + char *s; + d t; + d *u; + short v; + int w +} aa; +c(char x, int y, char z, int ab) { + aa ac; + ac.r.i = 0; + d ad; + ac.t = ad; + ac.u = 0; + ae(&ac.v, 0, 0); + ac.w = 0; + af(&ac, x + y, z, z + ab); + if (ag(0)) +return 0; + if (x) +ac.s = z + ab; + else +ac.s = x + y; + ac.o |= g; + if (!setjmp()) { +ah(ac); +ai(b); +ac.e = z + ab; +aj(ac); + } +}
[PATCH] Fix typo in function-abi.h
Hi all, just pushed the following as obvious. Regards Andrea >From 1aff68d54c33ae10fbbb52adb50527baf7b2f627 Mon Sep 17 00:00:00 2001 From: Andrea Corallo Date: Tue, 12 Jan 2021 17:52:52 +0100 Subject: [PATCH] Fix typo in function-abi.h gcc/Changelog 2021-01-12 Andrea Corallo * function-abi.h: Fix typo. --- gcc/function-abi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/function-abi.h b/gcc/function-abi.h index 0f34c8cfc65..ac58f676e33 100644 --- a/gcc/function-abi.h +++ b/gcc/function-abi.h @@ -1,4 +1,4 @@ -/* Information about fuunction binary interfaces. +/* Information about function binary interfaces. Copyright (C) 2019-2021 Free Software Foundation, Inc. This file is part of GCC -- 2.17.1
Re: [PATCH][pushed] gcov: add more debugging facility
On 1/12/21 5:27 PM, H.J. Lu wrote: This breaks build on 32-bit hosts: Sorry for that. Should be fixed with: commit 248feb2fa2c0dfc8950566010b30351879c9d057 Author: Martin Liska Date: Tue Jan 12 18:16:05 2021 +0100 gcov: fix printf format for 32-bit hosts gcc/ChangeLog: * gcov.c (source_info::debug): Fix printf format for 32-bit hosts. diff --git a/gcc/gcov.c b/gcc/gcov.c index 93128721ef6..5c651a9bdce 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -442,7 +442,7 @@ void source_info::debug () for (vector::iterator bit = fn->blocks.begin (); bit != fn->blocks.end (); bit++) { - fprintf (stderr, "block_info id=%d, count=%ld\n", + fprintf (stderr, "block_info id=%d, count=%" PRId64 " \n", bit->id, bit->count); } } @@ -450,7 +450,7 @@ void source_info::debug () for (unsigned lineno = 1; lineno < lines.size (); ++lineno) { line_info &line = lines[lineno]; - fprintf (stderr, " line_info=%d, count=%ld\n", lineno, line.count); + fprintf (stderr, " line_info=%d, count=%" PRId64 "\n", lineno, line.count); } fprintf (stderr, "\n"); I'm going to install the patch. Martin
Re: [PATCH] libstdc++: implement locale support for AIX
On 12/01/21 10:44 -0500, David Edelsohn wrote: On Tue, Jan 12, 2021 at 10:25 AM Jonathan Wakely wrote: On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: >Hi everyone, > >I've reworked the patch to merged dragonfly and AIX >models into the new one named "ieee_1003.1-2008". >It seems okay on the AIX part but if someone can test >on Dragonfly and Freebsd I would be glad. Configure >needs to be regenerated, first. Presumably it could also be tested on GNU/Linux and Solaris, since they implement the POSIX 2008 APIs needed. GNU/Linux currently uses the "gnu" locale configuration. Are you suggesting that GNU/Linux use the new POSIX 2008 locale configuration by default or to use GNU/Linux as another sniff test? I'm suggesting to use it for testing that the new config is portable to other systems that provide the POSIX 2008 APIs. We wouldn't want to use it by default, because POSIX doesn't provide strtod_l etc that glibc provides. But maybe the "gnu" model should use newlocale and uselocale instead of __newlocale etc. so that we depend on fewer GNU extensions that are only defined with _GNU_SOURCE.
Re: [PATCH] libstdc++: implement locale support for AIX
Hi Jonathan, >> On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: >>>Hi everyone, >>> >>>I've reworked the patch to merged dragonfly and AIX >>>models into the new one named "ieee_1003.1-2008". >>>It seems okay on the AIX part but if someone can test >>>on Dragonfly and Freebsd I would be glad. Configure >>>needs to be regenerated, first. >> >> Presumably it could also be tested on GNU/Linux and Solaris, since >> they implement the POSIX 2008 APIs needed. > > I'll give the patch a whirl on Solaris. However, we will need to > distinguish between 11.3 (which is XPG6 only) and 11.4 (which support > XPG7). as almost expected, a build on Solaris 11.4 failed miserably due to the use of the various BSD extensions (localeconv_l, mbstowcs_l, strtod_l, strtof_l, strtold_l, wcsftime_l), whose use or fallback implementations are currently guarded by _AIX. Should those be used in a directory supposed to conform to POSIX.1-2008 at all? OTOH, almost duplicating the code into a separate bsd (or whatever, it's certainly not only DragonflyBSD) directory for this sole reason would be a shame. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] libstdc++: implement locale support for AIX
On Tue, Jan 12, 2021 at 12:41 PM Rainer Orth wrote: > > Hi Jonathan, > > >> On 12/01/21 15:14 +, CHIGOT, CLEMENT wrote: > >>>Hi everyone, > >>> > >>>I've reworked the patch to merged dragonfly and AIX > >>>models into the new one named "ieee_1003.1-2008". > >>>It seems okay on the AIX part but if someone can test > >>>on Dragonfly and Freebsd I would be glad. Configure > >>>needs to be regenerated, first. > >> > >> Presumably it could also be tested on GNU/Linux and Solaris, since > >> they implement the POSIX 2008 APIs needed. > > > > I'll give the patch a whirl on Solaris. However, we will need to > > distinguish between 11.3 (which is XPG6 only) and 11.4 (which support > > XPG7). > > as almost expected, a build on Solaris 11.4 failed miserably due to the > use of the various BSD extensions (localeconv_l, mbstowcs_l, strtod_l, > strtof_l, strtold_l, wcsftime_l), whose use or fallback implementations > are currently guarded by _AIX. Should those be used in a directory > supposed to conform to POSIX.1-2008 at all? OTOH, almost duplicating > the code into a separate bsd (or whatever, it's certainly not only > DragonflyBSD) directory for this sole reason would be a shame. Hi, Rainer Thanks for testing. I agree that #ifdef's are not the correct approach, but, if you enable the fallbacks for Solaris, does everything then work? Are those fallbacks portable and we solely need a better mechanism to enable them on platforms that require them? Thanks, David
[PATCH] MAINTAINERS: Fix spacing
We indent with tabs, not spaces. This fixes it. Committed. 2021-01-12 Segher Boessenkool * MAINTAINERS: Fix spacing. --- MAINTAINERS | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index e88808f..6ec346c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -415,7 +415,7 @@ Jiufu Guo Xuepeng Guo Wei Guozhi Mostafa Hagog -Andrew Haley +Andrew Haley Frederik Harwath Stuart Hastings Michael Haubenwallner @@ -451,7 +451,7 @@ Daniel Jacobowitz Andreas Jaeger Harsha Jagasia Fariborz Jahanian -Qian Jianhua +Qian Jianhua Janis Johnson Teresa Johnson Kean Johnston @@ -465,7 +465,7 @@ Andi Kleen Jeff Knaggs Michael Koch Nicolas Koenig -Boris Kolpackov +Boris Kolpackov Dave Korn Julia Koval Hongtao Liu @@ -480,7 +480,7 @@ Razya Ladelsky Thierry Lafage Aaron W. LaFramboise Rask Ingemann Lambertsen -Jerome Lambourg +Jerome Lambourg Asher Langton Chris Lattner Terry Laurenzo -- 1.8.3.1
Re: [COMMITED] MAINTAINERS: Add myself for write after approval
Hi! On Tue, Jan 12, 2021 at 03:44:22PM +0800, Qian Jianhua wrote: > ChangeLog: > > 2021-01-12 Qian Jianhua > > * MAINTAINERS (Write After Approval): Add myself Welcome! > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -451,6 +451,7 @@ Daniel Jacobowitz > Andreas Jaeger > Harsha Jagasia > Fariborz Jahanian > +Qian Jianhua Indents should be with tabs here. I fixed it (and some others). Segher
[PATCH, Fortran] Bug fix in ISO_Fortran_binding - unsigned char arrays
Hi everyone, In a previous email thread (subject: Possible bug re: ISO_Fortran_binding.h) it was determined that there was a bug in ISO_Fortran_binding.c. When attempting to use the ISO Fortran binding from C to create an array of type uint8_t or signed char, a crash would result unless the element size was manually specified as 1 when calling CFI_establish. The F2018 standard indicates that signed char / uint8_t should be an integer type and therefore explicitly specifying the element size should not be required. The attached patch fixes the issue (test case included). OK for master? I don't have write access so I will need someone else to commit this for me if possible. Thanks, Harris Snyder Fixes a bug in ISO_Fortran_binding.c whereby signed char or uint8_t arrays would cause crashes unless an element size is specified. libgfortran/ChangeLog: * runtime/ISO_Fortran_binding.c (CFI_establish): fixed signed char arrays. gcc/testsuite/ChangeLog: * gfortran.dg/iso_fortran_binding_uint8_array.f90: New test. * gfortran.dg/iso_fortran_binding_uint8_array_driver.c: New test. diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array.f90 b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array.f90 new file mode 100644 index 000..bf85bf52949 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array.f90 @@ -0,0 +1,11 @@ +! { dg-do run } +! { dg-additional-sources iso_fortran_binding_uint8_array_driver.c } + +module m + use iso_c_binding +contains + subroutine fsub( x ) bind(C, name="fsub") + integer(c_int8_t), intent(inout) :: x(:) + x = x+1 + end subroutine +end module diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array_driver.c b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array_driver.c new file mode 100644 index 000..84ed127d525 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_uint8_array_driver.c @@ -0,0 +1,25 @@ +#include +#include +#include +#include "ISO_Fortran_binding.h" + +extern void fsub(CFI_cdesc_t *); + +int main(void) +{ + int8_t x[] = {1,2,3,4}; + int N = sizeof(x)/sizeof(x[0]); + + CFI_CDESC_T(1) dat; + CFI_index_t ext[1]; + ext[0] = (CFI_index_t)N; + int rc = CFI_establish((CFI_cdesc_t *)&dat, &x, CFI_attribute_other, + CFI_type_int8_t, 0, (CFI_rank_t)1, ext); + printf("CFI_establish call returned: %d\n", rc); + + fsub((CFI_cdesc_t *)&dat ); + + for (int i=0; ibase_addr = base_addr; if (type == CFI_type_char || type == CFI_type_ucs4_char || - type == CFI_type_signed_char || type == CFI_type_struct || - type == CFI_type_other) + type == CFI_type_struct || type == CFI_type_other) dv->elem_len = elem_len; else {
[PATCH] sh: Remove match_scratch operand test
This patch fixes a regression on sh4 introduced by the rtl-ssa stuff. The port had a pattern: (define_insn "movsf_ie" [(set (match_operand:SF 0 "general_movdst_operand" "=f,r,f,f,fy, f,m, r, r,m,f,y,y,rf,r,y,<,y,y") (match_operand:SF 1 "general_movsrc_operand" " f,r,G,H,FQ,mf,f,FQ,mr,r,y,f,>,fr,y,r,y,>,y")) (use (reg:SI FPSCR_MODES_REG)) (clobber (match_scratch:SI 2 "=X,X,X,X,&z, X,X, X, X,X,X,X,X, y,X,X,X,X,X"))] "TARGET_SH2E && (arith_reg_operand (operands[0], SFmode) || fpul_operand (operands[0], SFmode) || arith_reg_operand (operands[1], SFmode) || fpul_operand (operands[1], SFmode) || arith_reg_operand (operands[2], SImode))" But recog can generate this pattern from something that matches: [(set (match_operand:SF 0 "general_movdst_operand") (match_operand:SF 1 "general_movsrc_operand") (use (reg:SI FPSCR_MODES_REG))] with recog adding the (clobber (match_scratch:SI)) automatically. recog tests the C condition before adding the clobber, so there might not be an operands[2] to test. Similarly, gen_movsf_ie takes only two arguments, with operand 2 being filled in automatically. The only way to create this pattern with a REG operands[2] before RA would be to generate it directly from RTL. AFAICT the only things that do this are the secondary reload patterns, which are generated during RA and come with pre-vetted operands. arith_reg_operand rejects 6 specific registers: return (regno != T_REG && regno != PR_REG && regno != FPUL_REG && regno != FPSCR_REG && regno != MACH_REG && regno != MACL_REG); The fpul_operand tests allow FPUL_REG, leaving 5 invalid registers. However, in all alternatives of movsf_ie, either operand 0 or operand 1 is a register that belongs r, f or y, none of which include any of the 5 rejected registers. This means that any post-RA pattern would satisfy the operands[0] or operands[1] condition without the operands[2] test being necessary. Tested on sh4-elf, which it fixes quite a few ICEs. OK to install? Richard gcc/ * config/sh/sh.md (movsf_ie): Remove operands[2] test. --- gcc/config/sh/sh.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md index fe1231612cc..e3af9ae21c1 100644 --- a/gcc/config/sh/sh.md +++ b/gcc/config/sh/sh.md @@ -6067,8 +6067,7 @@ (define_insn "movsf_ie" && (arith_reg_operand (operands[0], SFmode) || fpul_operand (operands[0], SFmode) || arith_reg_operand (operands[1], SFmode) - || fpul_operand (operands[1], SFmode) - || arith_reg_operand (operands[2], SImode))" + || fpul_operand (operands[1], SFmode))" "@ fmov%1,%0 mov %1,%0
Re: [PATCH] libstdc++: implement locale support for AIX
Hi David, >> >> Presumably it could also be tested on GNU/Linux and Solaris, since >> >> they implement the POSIX 2008 APIs needed. >> > >> > I'll give the patch a whirl on Solaris. However, we will need to >> > distinguish between 11.3 (which is XPG6 only) and 11.4 (which support >> > XPG7). >> >> as almost expected, a build on Solaris 11.4 failed miserably due to the >> use of the various BSD extensions (localeconv_l, mbstowcs_l, strtod_l, >> strtof_l, strtold_l, wcsftime_l), whose use or fallback implementations >> are currently guarded by _AIX. Should those be used in a directory >> supposed to conform to POSIX.1-2008 at all? OTOH, almost duplicating >> the code into a separate bsd (or whatever, it's certainly not only >> DragonflyBSD) directory for this sole reason would be a shame. > > Hi, Rainer > > Thanks for testing. > > I agree that #ifdef's are not the correct approach, but, if you enable > the fallbacks for Solaris, does everything then work? Are those > fallbacks portable and we solely need a better mechanism to enable > them on platforms that require them? it mostly compiles, with two caveats: * c_locale.h needs to include for declarations of strtod and friends. * Solaris only declares the int_p_cs_precedes etc. members of struct lconv for C99+, but not for C++11+, as it should. I'll file a bug for that, but for now one can work around the issue by defining _LCONV_C99 before including in monetary_members.cc. With those changes, I can at least build libstdc++ with --enable-clocale=ieee_1003.1-2008. Bootstrap still running, though. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
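A minimal sketch of that second workaround, on the assumption that the include in question is <locale.h> (where struct lconv is declared):

  /* At the top of monetary_members.cc, before the first include of
     <locale.h>, so Solaris exposes the C99 int_p_cs_precedes family
     to C++ as well.  */
  #define _LCONV_C99 1
  #include <locale.h>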
[committed] avoid two more ICEs in print_mem_ref (PR c/98597)
A recent improvement to MEM_REF formatting introduced a few invalid assumptions that didn't get exposed during testing and ended up causing ICEs in downstream packages and some distress to their maintainers. I have committed the attached patch to remove two of these assumptions and unblock the package builds. Since the MEM_REF formatting is clearly not being sufficiently exercised by the test suite I've also instrumented GCC to format every MEM_REF the uninit pass comes across, independent of whether or nor it is initialized, and will build a few packages with it to uncover any outstanding bugs. Martin commit 5a9cfad2de92f2d65585774acb524b3fa17621b5 Author: Martin Sebor Date: Tue Jan 12 12:58:27 2021 -0700 Avoid a couple more ICEs in print_mem_ref (PR c/98597). Resolves: PR c/98597 - ICE in -Wuninitialized printing a MEM_REF PR c/98592 - ICE in gimple_canonical_types_compatible_p while formatting gcc/c-family/ChangeLog: PR c/98597 PR c/98592 * c-pretty-print.c (print_mem_ref): Avoid assuming MEM_REF operand has pointer type. Remove redundant code. Avoid calling gimple_canonical_types_compatible_p. gcc/testsuite/ChangeLog: PR c/98597 PR c/98592 * g++.dg/warn/Wuninitialized-13.C: New test. gcc.dg/uninit-39.c: New test. # diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c index 87301a2091c..4d17c270b16 100644 --- a/gcc/c-family/c-pretty-print.c +++ b/gcc/c-family/c-pretty-print.c @@ -1847,10 +1847,11 @@ print_mem_ref (c_pretty_printer *pp, tree e) tree access_type = TREE_TYPE (e); if (TREE_CODE (access_type) == ARRAY_TYPE) access_type = TREE_TYPE (access_type); - tree arg_type = TREE_TYPE (TREE_TYPE (arg)); + tree arg_type = TREE_TYPE (arg); + if (POINTER_TYPE_P (arg_type)) +arg_type = TREE_TYPE (arg_type); if (TREE_CODE (arg_type) == ARRAY_TYPE) arg_type = TREE_TYPE (arg_type); - if (tree access_size = TYPE_SIZE_UNIT (access_type)) if (TREE_CODE (access_size) == INTEGER_CST) { @@ -1866,16 +1867,13 @@ print_mem_ref (c_pretty_printer *pp, tree e) /* True to include a cast to the accessed type. */ const bool access_cast = VOID_TYPE_P (arg_type) -|| !gimple_canonical_types_compatible_p (access_type, arg_type); +|| TYPE_MAIN_VARIANT (access_type) != TYPE_MAIN_VARIANT (arg_type); if (byte_off != 0) { /* When printing the byte offset for a pointer to a type of a different size than char, include a cast to char* first, before printing the cast to a pointer to the accessed type. */ - tree arg_type = TREE_TYPE (TREE_TYPE (arg)); - if (TREE_CODE (arg_type) == ARRAY_TYPE) - arg_type = TREE_TYPE (arg_type); offset_int arg_size = 0; if (tree size = TYPE_SIZE (arg_type)) arg_size = wi::to_offset (size); diff --git a/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C b/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C new file mode 100644 index 000..49ee878806a --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-13.C @@ -0,0 +1,28 @@ +/* PR c/98597 - ICE in -Wuninitialized printing a MEM_REF + { dg-do compile } + { dg-options "-O2 -Wall" } */ + +struct shared_count { + shared_count () { } + shared_count (shared_count &r) +: pi (r.pi) { } // { dg-warning "\\\[-Wuninitialized" } + int pi; +}; + +// There's another (redundant) -Wuninitialized on the line below. +struct shared_ptr { + int ptr; + shared_count refcount; +}; + +struct Bar { + Bar (int, shared_ptr); +}; + +void g () { + shared_ptr foo; + Bar (0, foo); +} + +// Prune out duplicates. 
+// { dg-prune-output "-Wuninitialized" } diff --git a/gcc/testsuite/gcc.dg/uninit-39.c b/gcc/testsuite/gcc.dg/uninit-39.c new file mode 100644 index 000..0f918542773 --- /dev/null +++ b/gcc/testsuite/gcc.dg/uninit-39.c @@ -0,0 +1,47 @@ +/* PR c/98592 - ICE in gimple_canonical_types_compatible_p while formatting + a MEM_REF + { dg-do compile } + { dg-options "-O2 -Wall" } */ + +void f (int); + +void vlaNx3_to_pia1 (int n) +{ + int a[n][3]; + + /* The VLA isn't formatted correctly due to PR 98587. Just verify + there is no ICE and a warning is issued. */ + f (((*(int(*)[4])&a[1][2]))[3]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaNxN_to_pia1 (int n) +{ + int a[n][n]; + + /* Same as above. */ + f (((*(int(*)[4])&a[1][2]))[3]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaNxN_to_pvla4xN (int n) +{ + int a[n][n]; + + /* Same as above. */ + f (((*(int(*)[4][n])&a[1][2]))[3][4]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaN_to_pia2 (int n) +{ + int a[n]; + + /* Same as above. */ + f (((*(int(*)[3][4])&a[1]))[2][3]); // { dg-warning "\\\[-Wuninitialized" } +} + +void vlaN_to_pvlaNx4 (int n) +{ + int a[n]; + + /* Same as above. */ + f (((*(int(*)[n][4])&a[1]))[1][3]); //
Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
Hi, Just check in to see whether you have any comments and suggestions on this: FYI, I have been continue with Approach D implementation since last week: D. Adding calls to .DEFFERED_INIT during gimplification, expand the .DEFFERED_INIT during expand to real initialization. Adjusting uninitialized pass with the new refs with “.DEFFERED_INIT”. For the remaining work of Approach D: ** complete the implementation of -ftrivial-auto-var-init=pattern; ** complete the implementation of uninitialized warnings maintenance work for D. I have completed the uninitialized warnings maintenance work for D. And finished partial of the -ftrivial-auto-var-init=pattern implementation. The following are remaining work of Approach D: ** -ftrivial-auto-var-init=pattern for VLA; **add a new attribute for variable: __attribute((uninitialized) the marked variable is uninitialized intentionaly for performance purpose. ** adding complete testing cases; Please let me know if you have any objection on my current decision on implementing approach D. Thanks a lot for your help. Qing > On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches > wrote: > > Hi, > > This is an update for our previous discussion. > > 1. I implemented the following two different implementations in the latest > upstream gcc: > > A. Adding real initialization during gimplification, not maintain the > uninitialized warnings. > > D. Adding calls to .DEFFERED_INIT during gimplification, expand the > .DEFFERED_INIT during expand to > real initialization. Adjusting uninitialized pass with the new refs with > “.DEFFERED_INIT”. > > Note, in this initial implementation, > ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of > -ftrivial-auto-var-init=pattern > is not done yet. Therefore, the performance data is only about > -ftrivial-auto-var-init=zero. > > ** I added an temporary option -fauto-var-init-approach=A|B|C|D to > choose implementation A or D for > runtime performance study. > ** I didn’t finish the uninitialized warnings maintenance work for D. > (That might take more time than I expected). > > 2. I collected runtime data for CPU2017 on a x86 machine with this new gcc > for the following 3 cases: > > no: default. (-g -O2 -march=native ) > A: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A > D: default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D > > And then compute the slowdown data for both A and D as following: > > benchmarksA / no D /no > > 500.perlbench_r 1.25% 1.25% > 502.gcc_r 0.68% 1.80% > 505.mcf_r 0.68% 0.14% > 520.omnetpp_r 4.83% 4.68% > 523.xalancbmk_r 0.18% 1.96% > 525.x264_r1.55% 2.07% > 531.deepsjeng_11.57% 11.85% > 541.leela_r 0.64% 0.80% > 557.xz_-0.41% -0.41% > > 507.cactuBSSN_r 0.44% 0.44% > 508.namd_r0.34% 0.34% > 510.parest_r 0.17% 0.25% > 511.povray_r 56.57% 57.27% > 519.lbm_r 0.00% 0.00% > 521.wrf_r -0.28% -0.37% > 526.blender_r 16.96% 17.71% > 527.cam4_r0.70% 0.53% > 538.imagick_r 2.40% 2.40% > 544.nab_r 0.00% -0.65% > > avg 5.17% 5.37% > > From the above data, we can see that in general, the runtime performance > slowdown for > implementation A and D are similar for individual benchmarks. > > There are several benchmarks that have significant slowdown with the new > added initialization for both > A and D, for example, 511.povray_r, 526.blender_, and 531.deepsjeng_r, I will > try to study a little bit > more on what kind of new initializations introduced such slowdown. 
>
> From the current study so far, I think that approach D should be good enough
> for our final implementation. So, I will try to finish approach D with the
> following remaining work:
>
> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> ** complete the implementation of uninitialized warnings maintenance
>    work for D.
>
> Let me know if you have any comments and suggestions on my current and
> future work.
>
> Thanks a lot for your help.
>
> Qing
>
>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches
>> wrote:
>>
>> The following are the approaches I will implement and compare:
>>
>> Our final goal is to keep the uninitialized warning and minimize the
>> run-time performance cost.
>>
>> A. Adding real initialization during gimplification, not maintain the
>>    uninitialized warnings.
>> B. Adding real initialization during gimplification, marking them with
>>    “artificial_init”. Adjusting uninitialized pass, maintaining the
>>    annotation, making sure the real init not deleted from the fake init.
>> C. Marking the DECL for an uninitialized auto variable as
>>    “no_explicit
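As a concrete reference for the discussion above, here is a minimal sketch of the user-visible semantics being measured (my own illustration, not code from the patch; the helper name consume is made up, and the attribute spelling follows the proposal above rather than an existing GCC extension):

  /* Sketch only.  With -ftrivial-auto-var-init=zero, trivial automatic
     variables behave as if zero-initialized at their declaration; the
     proposed __attribute__((uninitialized)) would opt a variable out.  */
  extern int consume (int *buf, int n);

  int
  example (void)
  {
    int buf[16];                                  /* zero-filled under =zero */
    int scratch __attribute__ ((uninitialized));  /* proposed opt-out        */

    scratch = consume (buf, 16);   /* the read of buf is now well defined */
    return scratch;
  }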
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
On Sat, Jan 09, 2021 at 07:44:31PM +0100, Matthias Klose wrote: > These warnings, including the suggested fixes are seen on power*-linux builds. > > warning: misspelled term 'builtin function' in format; use 'bult-in function' instead [-Wformat-diag] This one is wrong. Almost all the functions are called __builtin_xxx so changing the warning message like that is just silly. Instead, just rephrase it so you do not need to repeat it. The warning message itself is buggy: it pretends it knows what it is talking about. No warning ever does, that is the difference between warnings and errors. The hint (after the semi-colon) is extra wrong. > warning: spurious trailing punctuation sequence '].' in format [-Wformat-diag] This is also wrong (it says error ("Second argument of %qs must be in the range [0, 3].", name); which is obviously correct: "]" is not punctuation, and that makes the error message (again, not phrased as a warning) incorrect as well. > warning: misspelled term 'floating point' in format; use 'floating-point' > instead [-Wformat-diag] It is a noun here, which is spelled without dash. The warning is mistaken, and its advice (not phrased as advice) is bad advice. [-- Attachment #2: format-diag.diff --] [-- Type: text/x-patch, Encoding: quoted-printable, Size: 2.2K --] Please use plain text attachments, so that people can a) read it on the mail archives, and b) apply the patches easily, etc. Segher
libgo patch committed: Ensure openat uses variadic wrapper
This libgo patch by Paul Murphy ensures that openat always uses the variadic libc wrapper. On powerpc64le, not using the wrapper caused a failure in TestUnshareUidGidMapping due to stack corruption which resulted in a bogus execve syscall. This fixes GCC PR 98610. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 3cab7034b52dbc40c78c8ccb1738795de80e927b diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index 094b8fad483..cd95c3d0755 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -a2578eb3983514641f0baf44d27d6474d3a96758 +255657dc8d61ab26121ca68f124412ef37599166 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/libgo/go/syscall/exec_linux.go b/libgo/go/syscall/exec_linux.go index 38975810432..0da6c964966 100644 --- a/libgo/go/syscall/exec_linux.go +++ b/libgo/go/syscall/exec_linux.go @@ -20,7 +20,7 @@ import ( //mount(source *byte, target *byte, fstype *byte, flags _C_long, data *byte) _C_int //sysnb rawOpenat(dirfd int, pathname *byte, flags int, perm uint32) (fd int, err Errno) -//openat(dirfd _C_int, pathname *byte, flags _C_int, perm Mode_t) _C_int +//__go_openat(dirfd _C_int, pathname *byte, flags _C_int, perm Mode_t) _C_int // SysProcIDMap holds Container ID to Host ID mappings used for User Namespaces in Linux. // See user_namespaces(7).
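For context, openat is variadic in libc, and calling a variadic function through a non-variadic declaration can violate the calling convention on some ABIs, which is the kind of mismatch behind the stack corruption described above. The fixed-arity wrapper that the //__go_openat line binds to lives in libgo's C runtime support; roughly like this sketch (details of the real wrapper may differ):

  #include <fcntl.h>

  /* Forward a fixed-arity call to the real variadic libc entry point so
     that Go code never calls the variadic symbol directly.  */
  int
  __go_openat (int dirfd, const char *pathname, int flags, mode_t mode)
  {
    return openat (dirfd, pathname, flags, mode);
  }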
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
Hi! On Sun, Jan 10, 2021 at 02:18:24PM -0700, Martin Sebor wrote: > Symbols/identifiers should be formatted using the appropriate > directives or quoted in %< %>. We do not have a way to mark up the mathematical symbols [], but we do require those to express ranges (which is a bad idea /an sich/, but that aside). > The purpose of the -Wformat-diag warnings is to improve the consistency > of user-visible messages Which is not a good idea in and of itself. It *is* good to identify bad practices, of course, but to make the GCC feedback more dull and formalised only makes it harder to find where an error came from (and seeing that as a positive thing requires a special kind of mind already ;-) ). > and make them easier to translate. That can be helpful. > There was > a discussion some time back about whether internal errors should fall > into this category. IMNSHO it is counter-productive to translate internal messages at all. Can we automatically mark those as non-translated? > I'm not sure if it reached a conclusion one way > or the other but in similar situations elsewhere in GCC we have > suppressed the warning via #pragma GCC diagnostic. This isn't needed then either. Win-win-win! > If it takes too > much effort to clean them up it might make sense to do the same here > (the downside is that it doesn't help translators). Otherwise, > the messages are not really phrased in a way that's comprehensible > either to users or to tranlators (acronyms like elt or rtx aren't > universally understood). Neither translators or even users are an audience for those messages at all: the only thing they should see is "internal error", and perhaps how to report it. Reporting a translated version of the message is a bad idea. Segher
[Patch, RFC] PR fortran/93340 - [8/9/10/11 Regression] fix missed substring simplifications
Dear all, when playing around with the issues exposed by PR93340, particularly visible in the tree dump, I tried to find ways to simplify substrings in those cases where they are eligible as designator, which is required e.g. in DATA statements. Given my limited understanding, I finally arrived at a potential solution which does that simplification near the end of match_string_constant in primary.c. I couldn't find a better place, but I am open to better suggestions. The simplification below does an even better job at detecting invalid substring starting or ending indices than HEAD, and regtests cleanly on x86_64-pc-linux-gnu. Feedback appreciated. Is this potentially ok for master, or should this be done differently? Thanks, Harald PR fortran/93340 - fix missed substring simplifications Substrings were not reduced early enough for use in initializations, such as DATA statements. Add an early simplification for substrings with constant starting and ending points. gcc/fortran/ChangeLog: * gfortran.h (gfc_resolve_substring): Add prototype. * primary.c (match_string_constant): Simplify substrings with constant starting and ending points. * resolve.c: Rename resolve_substring to gfc_resolve_substring. (gfc_resolve_ref): Use renamed function gfc_resolve_substring. gcc/testsuite/ChangeLog: * substr_10.f90: New test. * substr_9.f90: New test. diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 6585e4f3ecd..4dd72b620c9 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -3467,6 +3467,7 @@ bool find_forall_index (gfc_expr *, gfc_symbol *, int); bool gfc_resolve_index (gfc_expr *, int); bool gfc_resolve_dim_arg (gfc_expr *); bool gfc_is_formal_arg (void); +bool gfc_resolve_substring (gfc_ref *, bool *); void gfc_resolve_substring_charlen (gfc_expr *); match gfc_iso_c_sub_interface(gfc_code *, gfc_symbol *); gfc_expr *gfc_expr_to_initialize (gfc_expr *); diff --git a/gcc/fortran/primary.c b/gcc/fortran/primary.c index db9ecf9a4f6..7cb378e3090 100644 --- a/gcc/fortran/primary.c +++ b/gcc/fortran/primary.c @@ -1190,6 +1190,61 @@ got_delim: if (match_substring (NULL, 0, &e->ref, false) != MATCH_NO) e->expr_type = EXPR_SUBSTRING; + /* Substrings with constant starting and ending points are eligible as + designators (F2018, section 9.1). Simplify substrings to make them usable + e.g. in data statements. */ + if (e->expr_type == EXPR_SUBSTRING + && e->ref && e->ref->type == REF_SUBSTRING + && e->ref->u.ss.start->expr_type == EXPR_CONSTANT + && (e->ref->u.ss.end == NULL + || e->ref->u.ss.end->expr_type == EXPR_CONSTANT)) +{ + gfc_expr *res; + ptrdiff_t istart, iend; + size_t length; + bool equal_length = false; + + /* Basic checks on substring starting and ending indices. 
*/ + if (!gfc_resolve_substring (e->ref, &equal_length)) + return MATCH_ERROR; + + length = e->value.character.length; + istart = gfc_mpz_get_hwi (e->ref->u.ss.start->value.integer); + if (e->ref->u.ss.end == NULL) + iend = length; + else + iend = gfc_mpz_get_hwi (e->ref->u.ss.end->value.integer); + + if (istart <= iend) + { + if (istart < 1) + { + gfc_error ("Substring start index (%ld) at %L below 1", + (long) istart, &e->ref->u.ss.start->where); + return MATCH_ERROR; + } + if (iend > (ssize_t) length) + { + gfc_error ("Substring end index (%ld) at %L exceeds string " + "length", (long) iend, &e->ref->u.ss.end->where); + return MATCH_ERROR; + } + length = iend - istart + 1; + } + else + length = 0; + + res = gfc_get_constant_expr (BT_CHARACTER, e->ts.kind, &e->where); + res->value.character.string = gfc_get_wide_string (length + 1); + res->value.character.length = length; + if (length > 0) + memcpy (res->value.character.string, + &e->value.character.string[istart - 1], + length * sizeof (gfc_char_t)); + res->value.character.string[length] = '\0'; + e = res; +} + *result = e; return MATCH_YES; diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index f243bd185b0..3929ddff849 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -5068,8 +5068,8 @@ resolve_array_ref (gfc_array_ref *ar) } -static bool -resolve_substring (gfc_ref *ref, bool *equal_length) +bool +gfc_resolve_substring (gfc_ref *ref, bool *equal_length) { int k = gfc_validate_kind (BT_INTEGER, gfc_charlen_int_kind, false); @@ -5277,7 +5277,7 @@ gfc_resolve_ref (gfc_expr *expr) case REF_SUBSTRING: equal_length = false; - if (!resolve_substring (*prev, &equal_length)) + if (!gfc_resolve_substring (*prev, &equal_length)) return false; if (expr->expr_type != EXPR_SUBSTRING && equal_length) diff --git a/gcc/testsuite/substr_10.f90 b/gcc/testsuite/substr_10.f90 new file mode 100644 index 000..918ca8af162 --- /dev/null +++ b/gcc/testsuite/substr_1
[PATCH] c++: Always check access during late-parsing of members [PR58993]
This patch removes a vestigial use of dk_no_check from cp_parser_late_parsing_for_member, which ideally should have been removed as part of the PR41437 patch that improved access checking inside templates. This allows us to correctly reject f1 and f2 below in the testcase access34.C below (whereas before we'd only reject f3). Additional testing revealed an access issue when late-parsing a hidden friend within a class template. In the testcase friend68.C below, we're tripping over the checking assert from friend_accessible_p(f, S::j, S, S) during lookup of j in x.j (for which type_dependent_object_expression_p returns false, which is why we're doing the lookup at parse time). The reason for the assert failure is that DECL_FRIENDLIST(S) contains f but DECL_BEFRIENDING_CLASSES(f) is empty, and so friend_accessible_p (which looks at DECL_BEFRIENDING_CLASSES) wants to return false, but is_friend (which looks at DECL_FRIENDLIST) returns true. For sake of symmetry one would probably hope that DECL_BEFRIENDING_CLASSES(f) contains S, but add_friend avoids updating DECL_BEFRIENDING_CLASSES when the class type (S in this case) is dependent, for some reason. This patch works around this issue by making friend_accessible_p consider the DECL_FRIEND_CONTEXT of SCOPE. Thus we sidestep the DECL_BEFRIENDING_CLASSES / DECL_FRIENDLIST asymmetry issue while correctly validating the x.j access at parse time. A earlier version of this patch called friend_accessible_p instead of protected_accessible_p in the DECL_FRIEND_CONTEXT hunk below, but this had the side effect of making us accept the ill-formed testcase friend69.C below (ill-formed because a hidden friend is not actually a class member, so g doesn't have access to B's members even though A is a friend of B). Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look like the right approach? gcc/cp/ChangeLog: PR c++/41437 PR c++/58993 * search.c (friend_accessible_p): If scope is a hidden friend defined inside a dependent class, consider access from the defining class. * parser.c (cp_parser_late_parsing_for_member): Don't push a dk_no_check access state. gcc/testsuite/ChangeLog: PR c++/41437 PR c++/58993 * g++.dg/opt/pr87974.C: Adjust. * g++.dg/template/access34.C: New test. * g++.dg/template/friend68a.C: New test. * g++.dg/template/friend68b.C: New test. * g++.dg/template/friend69.C: New test. --- gcc/cp/parser.c | 7 -- gcc/cp/search.c | 8 +++ gcc/testsuite/g++.dg/opt/pr87974.C | 1 + gcc/testsuite/g++.dg/template/access34.C | 29 gcc/testsuite/g++.dg/template/friend68.C | 13 +++ gcc/testsuite/g++.dg/template/friend69.C | 18 +++ 6 files changed, 69 insertions(+), 7 deletions(-) create mode 100644 gcc/testsuite/g++.dg/template/access34.C create mode 100644 gcc/testsuite/g++.dg/template/friend68.C create mode 100644 gcc/testsuite/g++.dg/template/friend69.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index b95151bf90d..c26e16beb70 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -30825,10 +30825,6 @@ cp_parser_late_parsing_for_member (cp_parser* parser, tree member_function) start_preparsed_function (member_function, NULL_TREE, SF_PRE_PARSED | SF_INCLASS_INLINE); - /* Don't do access checking if it is a templated function. */ - if (processing_template_decl) - push_deferring_access_checks (dk_no_check); - /* #pragma omp declare reduction needs special parsing. 
*/ if (DECL_OMP_DECLARE_REDUCTION_P (member_function)) { @@ -30842,9 +30838,6 @@ cp_parser_late_parsing_for_member (cp_parser* parser, tree member_function) cp_parser_function_definition_after_declarator (parser, /*inline_p=*/true); - if (processing_template_decl) - pop_deferring_access_checks (); - /* Leave the scope of the containing function. */ if (function_scope) pop_function_context (); diff --git a/gcc/cp/search.c b/gcc/cp/search.c index 1a9dba451c7..dd3773da4f7 100644 --- a/gcc/cp/search.c +++ b/gcc/cp/search.c @@ -698,6 +698,14 @@ friend_accessible_p (tree scope, tree decl, tree type, tree otype) if (DECL_CLASS_SCOPE_P (scope) && friend_accessible_p (DECL_CONTEXT (scope), decl, type, otype)) return 1; + /* Perhaps SCOPE is a friend function defined inside a class from which +DECL is accessible. Checking this is necessary only when the class +is dependent, for otherwise add_friend will already have added the +class to SCOPE's DECL_BEFRIENDING_CLASSES. */ + if (tree fctx = DECL_FRIEND_CONTEXT (scope)) + if (dependent_type_p (fctx) + && protected_accessible_p (decl, fctx, type, otype)) + return
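For reference, a reduced sketch of the friend68.C scenario described above (my reconstruction; the committed testcase may differ in detail): a hidden friend defined inside a class template does member access on a non-dependent object expression, so the access check runs at parse time.

  template<typename T>
  struct S
  {
  private:
    int j;
  public:
    S () : j (0) {}
    /* Hidden friend in a dependent class; x.j is looked up and
       access-checked while parsing the template.  */
    friend int f (S x) { return x.j; }
  };

  int use = f (S<int> ());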
Re: [PATCH] sh: Remove match_scratch operand test
On 1/12/21 12:35 PM, Richard Sandiford wrote: > This patch fixes a regression on sh4 introduced by the rtl-ssa stuff. > The port had a pattern: > > (define_insn "movsf_ie" > [(set (match_operand:SF 0 "general_movdst_operand" > "=f,r,f,f,fy, f,m, r, r,m,f,y,y,rf,r,y,<,y,y") > (match_operand:SF 1 "general_movsrc_operand" > " f,r,G,H,FQ,mf,f,FQ,mr,r,y,f,>,fr,y,r,y,>,y")) >(use (reg:SI FPSCR_MODES_REG)) >(clobber (match_scratch:SI 2 "=X,X,X,X,&z, X,X, X, X,X,X,X,X, > y,X,X,X,X,X"))] > "TARGET_SH2E >&& (arith_reg_operand (operands[0], SFmode) >|| fpul_operand (operands[0], SFmode) >|| arith_reg_operand (operands[1], SFmode) >|| fpul_operand (operands[1], SFmode) >|| arith_reg_operand (operands[2], SImode))" > > But recog can generate this pattern from something that matches: > > [(set (match_operand:SF 0 "general_movdst_operand") > (match_operand:SF 1 "general_movsrc_operand") >(use (reg:SI FPSCR_MODES_REG))] > > with recog adding the (clobber (match_scratch:SI)) automatically. > recog tests the C condition before adding the clobber, so there might > not be an operands[2] to test. > > Similarly, gen_movsf_ie takes only two arguments, with operand 2 > being filled in automatically. The only way to create this pattern > with a REG operands[2] before RA would be to generate it directly > from RTL. AFAICT the only things that do this are the secondary > reload patterns, which are generated during RA and come with > pre-vetted operands. > > arith_reg_operand rejects 6 specific registers: > > return (regno != T_REG && regno != PR_REG > && regno != FPUL_REG && regno != FPSCR_REG > && regno != MACH_REG && regno != MACL_REG); > > The fpul_operand tests allow FPUL_REG, leaving 5 invalid registers. > However, in all alternatives of movsf_ie, either operand 0 or > operand 1 is a register that belongs r, f or y, none of which > include any of the 5 rejected registers. This means that any > post-RA pattern would satisfy the operands[0] or operands[1] > condition without the operands[2] test being necessary. > > Tested on sh4-elf, which it fixes quite a few ICEs. OK to install? OK. Hopefully sh4/sh4eb will bootstrap again after you install that change. Thanks! jeff > > Richard > > > gcc/ > * config/sh/sh.md (movsf_ie): Remove operands[2] test. > --- > gcc/config/sh/sh.md | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md > index fe1231612cc..e3af9ae21c1 100644 > --- a/gcc/config/sh/sh.md > +++ b/gcc/config/sh/sh.md > @@ -6067,8 +6067,7 @@ (define_insn "movsf_ie" > && (arith_reg_operand (operands[0], SFmode) > || fpul_operand (operands[0], SFmode) > || arith_reg_operand (operands[1], SFmode) > - || fpul_operand (operands[1], SFmode) > - || arith_reg_operand (operands[2], SImode))" > + || fpul_operand (operands[1], SFmode))" >"@ > fmov%1,%0 > mov %1,%0 >
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
On Tue, 12 Jan 2021, Segher Boessenkool wrote: > On Sat, Jan 09, 2021 at 07:44:31PM +0100, Matthias Klose wrote: > > These warnings, including the suggested fixes are seen on power*-linux > > builds. > > > > warning: misspelled term 'builtin function' in format; use 'bult-in > > function' > > instead [-Wformat-diag] > > This one is wrong. Almost all the functions are called __builtin_xxx > so changing the warning message like that is just silly. Instead, just > rephrase it so you do not need to repeat it, As documented in codingconventions.html we use "built-in" as an adjective in English text (as opposed to literal sequences of characters from C source code). Text in diagnostics that is meant to be source code rather than English words should be enclosed in %<%>. A literal %<__builtin%> should be unchanged in translations, whereas "built-in" used as an adjective should be translated. > > warning: spurious trailing punctuation sequence '].' in format > > [-Wformat-diag] > > This is also wrong (it says > error ("Second argument of %qs must be in the range [0, 3].", name); > which is obviously correct: "]" is not punctuation, and that makes the > error message (again, not phrased as a warning) incorrect as well. The leading capital letter is incorrect, as is the trailing '.'; both of those are contrary to the GNU Coding Standards for diagnostics. Once those are fixed, having a trailing ']' is fine. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] amdgcn: Fix subdf3 pattern
This patch fixes a typo in the subdf3 pattern that meant it had a non-standard name and thus the compiler would emit a libcall rather than the proper hardware instruction for DFmode subtraction. Tested with standalone AMD GCN target. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn-valu.md (subdf): Rename to... (subdf3): This. --- gcc/config/gcn/gcn-valu.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 59d9849cb4e..24c17212c2a 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -2185,7 +2185,7 @@ [(set_attr "type" "vop3a") (set_attr "length" "8,8")]) -(define_insn "subdf" +(define_insn "subdf3" [(set (match_operand:DF 0 "register_operand" "= v, v") (minus:DF (match_operand:DF 1 "gcn_alu_operand" "vSvB, v") -- 2.29.2
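The user-visible effect is easy to demonstrate: with the standard subdf3 name the expander picks the hardware subtract for plain double arithmetic instead of a libcall. A trivial example (my own, not part of the patch):

  /* Expands to the GCN hardware DFmode subtract rather than a libcall
     once the pattern is named subdf3.  */
  double
  sub (double a, double b)
  {
    return a - b;
  }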
[PATCH] amdgcn: Improve FP division accuracy
GCN has a reciprocal-approximation instruction but no hardware divide. This patch adjusts the open-coded reciprocal approximation/Newton-Raphson refinement steps to use fused multiply-add instructions as is necessary to obtain a properly-rounded result, and adds further refinement steps to correctly round the full division result. The patterns in question are still guarded by a flag_reciprocal_math condition, and do not yet support denormals. Tested with standalone AMD GCN target, with the new test failing without the path and passing with. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn-valu.md (recip2, recip2): Use unspec for reciprocal-approximation instructions. (div3): Use fused multiply-accumulate operations for reciprocal refinement and division result. * config/gcn/gcn.md (UNSPEC_RCP): New unspec constant. gcc/testsuite/ * gcc.target/gcn/fpdiv.c: New test. --- gcc/config/gcn/gcn-valu.md | 60 +++- gcc/config/gcn/gcn.md| 3 +- gcc/testsuite/gcc.target/gcn/fpdiv.c | 38 ++ 3 files changed, 81 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/gcc.target/gcn/fpdiv.c diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 24c17212c2a..beefcf754d7 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -2351,9 +2351,9 @@ (define_insn "recip2" [(set (match_operand:V_FP 0 "register_operand" "= v") - (div:V_FP - (vec_duplicate:V_FP (float: (const_int 1))) - (match_operand:V_FP 1 "gcn_alu_operand" "vSvB")))] + (unspec:V_FP + [(match_operand:V_FP 1 "gcn_alu_operand" "vSvB")] + UNSPEC_RCP))] "" "v_rcp%i0\t%0, %1" [(set_attr "type" "vop1") @@ -2361,9 +2361,9 @@ (define_insn "recip2" [(set (match_operand:FP 0 "register_operand" "= v") - (div:FP - (float:FP (const_int 1)) - (match_operand:FP 1 "gcn_alu_operand" "vSvB")))] + (unspec:FP + [(match_operand:FP 1 "gcn_alu_operand" "vSvB")] + UNSPEC_RCP))] "" "v_rcp%i0\t%0, %1" [(set_attr "type" "vop1") @@ -2382,28 +2382,39 @@ (match_operand:V_FP 2 "gcn_valu_src0_operand")] "flag_reciprocal_math" { -rtx two = gcn_vec_constant (mode, - const_double_from_real_value (dconst2, mode)); +rtx one = gcn_vec_constant (mode, + const_double_from_real_value (dconst1, mode)); rtx initrcp = gen_reg_rtx (mode); rtx fma = gen_reg_rtx (mode); rtx rcp; +rtx num = operands[1], denom = operands[2]; -bool is_rcp = (GET_CODE (operands[1]) == CONST_VECTOR +bool is_rcp = (GET_CODE (num) == CONST_VECTOR && real_identical (CONST_DOUBLE_REAL_VALUE - (CONST_VECTOR_ELT (operands[1], 0)), &dconstm1)); + (CONST_VECTOR_ELT (num, 0)), &dconstm1)); if (is_rcp) rcp = operands[0]; else rcp = gen_reg_rtx (mode); -emit_insn (gen_recip2 (initrcp, operands[2])); -emit_insn (gen_fma4_negop2 (fma, initrcp, operands[2], two)); -emit_insn (gen_mul3 (rcp, initrcp, fma)); +emit_insn (gen_recip2 (initrcp, denom)); +emit_insn (gen_fma4_negop2 (fma, initrcp, denom, one)); +emit_insn (gen_fma4 (rcp, fma, initrcp, initrcp)); if (!is_rcp) - emit_insn (gen_mul3 (operands[0], operands[1], rcp)); + { + rtx div_est = gen_reg_rtx (mode); + rtx fma2 = gen_reg_rtx (mode); + rtx fma3 = gen_reg_rtx (mode); + rtx fma4 = gen_reg_rtx (mode); + emit_insn (gen_mul3 (div_est, num, rcp)); + emit_insn (gen_fma4_negop2 (fma2, div_est, denom, num)); + emit_insn (gen_fma4 (fma3, fma2, rcp, div_est)); + emit_insn (gen_fma4_negop2 (fma4, fma3, denom, num)); + emit_insn (gen_fma4 (operands[0], fma4, rcp, fma3)); + } DONE; }) @@ -2414,10 +2425,11 @@ (match_operand:FP 2 "gcn_valu_src0_operand")] "flag_reciprocal_math" { -rtx two = 
const_double_from_real_value (dconst2, mode); +rtx one = const_double_from_real_value (dconst1, mode); rtx initrcp = gen_reg_rtx (mode); rtx fma = gen_reg_rtx (mode); rtx rcp; +rtx num = operands[1], denom = operands[2]; bool is_rcp = (GET_CODE (operands[1]) == CONST_DOUBLE && real_identical (CONST_DOUBLE_REAL_VALUE (operands[1]), @@ -2428,12 +2440,22 @@ else rcp = gen_reg_rtx (mode); -emit_insn (gen_recip2 (initrcp, operands[2])); -emit_insn (gen_fma4_negop2 (fma, initrcp, operands[2], two)); -emit_insn (gen_mul3 (rcp, initrcp, fma)); +emit_insn (gen_recip2 (initrcp, denom)); +emit_insn (gen_fma4_negop2 (fma, initrcp, denom, one)); +emit_insn (gen_fma4 (rcp, fma, initrcp, initrcp)); if (!is_rcp) - emit_insn (gen_mul3 (operands[0], operands[1], rcp)); + { + rtx div_est =
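As a reader's aid, the emitted sequence corresponds to the following scalar sketch (my illustration in C using fma, not the RTL the expander generates; refined_div and rcp0 are made-up names, with rcp0 standing for the hardware reciprocal approximation): one Newton-Raphson step refines the reciprocal, then two residual/correction steps refine the quotient against the exact numerator.

  #include <math.h>

  static double
  refined_div (double num, double denom, double rcp0)
  {
    double e   = fma (-denom, rcp0, 1.0);  /* e = 1 - denom * rcp0      */
    double rcp = fma (e, rcp0, rcp0);      /* refined reciprocal        */

    double q0  = num * rcp;                /* initial quotient estimate */
    double r0  = fma (-denom, q0, num);    /* residual: num - denom*q0  */
    double q1  = fma (r0, rcp, q0);        /* first correction          */
    double r1  = fma (-denom, q1, num);    /* new residual              */
    return fma (r1, rcp, q1);              /* final correction          */
  }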
[PATCH] amdgcn: Fix exec register live-on-entry to BB in md-reorg
This patch fixes a corner case in the AMD GCN md-reorg pass when the EXEC register is live on entry to a BB, and could be clobbered by code inserted by the pass before a use in (e.g.) a different BB. I don't have a standalone test case demonstrating this failure mode, but I did observe it with an offload compiler (on another branch) with a particular benchmark. Tested with standalone AMD GCN target. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn.c (gcn_md_reorg): Fix case where EXEC reg is live on entry to a BB. --- gcc/config/gcn/gcn.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index 0fb9bd26727..630ce4eebc7 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -4501,6 +4501,8 @@ gcn_md_reorg (void) df_insn_rescan_all (); } + df_live_add_problem (); + df_live_set_all_dirty (); df_analyze (); /* This pass ensures that the EXEC register is set correctly, according @@ -4522,6 +4524,17 @@ gcn_md_reorg (void) int64_t curr_exec = 0; /* 0 here means 'the value is that of EXEC after last_exec_def is executed'. */ + bitmap live_in = DF_LR_IN (bb); + bool exec_live_on_entry = false; + if (bitmap_bit_p (live_in, EXEC_LO_REG) + || bitmap_bit_p (live_in, EXEC_HI_REG)) + { + if (dump_file) + fprintf (dump_file, "EXEC reg is live on entry to block %d\n", +(int) bb->index); + exec_live_on_entry = true; + } + FOR_BB_INSNS_SAFE (bb, insn, curr) { if (!NONDEBUG_INSN_P (insn)) @@ -4660,6 +4673,8 @@ gcn_md_reorg (void) exec_lo_def_p == exec_hi_def_p ? "full" : "partial", INSN_UID (insn)); } + + exec_live_on_entry = false; } COPY_REG_SET (&live, DF_LR_OUT (bb)); @@ -4669,7 +4684,7 @@ gcn_md_reorg (void) at the end of the block. */ if ((REGNO_REG_SET_P (&live, EXEC_LO_REG) || REGNO_REG_SET_P (&live, EXEC_HI_REG)) - && (!curr_exec_known || !curr_exec_explicit)) + && (!curr_exec_known || !curr_exec_explicit || exec_live_on_entry)) { rtx_insn *end_insn = BB_END (bb); -- 2.29.2
[PATCH] amdgcn: Remove dead code for fixed v0 register
This patch removes code to fix the v0 register in gcn_conditional_register_usage that was missed out of the previous patch removing the need for that: https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534284.html Tested with standalone AMD GCN target. I will commit shortly. Julian 2021-01-13 Julian Brown gcc/ * config/gcn/gcn.c (gcn_conditional_register_usage): Remove dead code to fix v0 register. --- gcc/config/gcn/gcn.c | 4 1 file changed, 4 deletions(-) diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c index 630ce4eebc7..b08f4b32c9c 100644 --- a/gcc/config/gcn/gcn.c +++ b/gcc/config/gcn/gcn.c @@ -2137,10 +2137,6 @@ gcn_conditional_register_usage (void) fixed_regs[cfun->machine->args.reg[WORK_ITEM_ID_Y_ARG]] = 1; if (cfun->machine->args.reg[WORK_ITEM_ID_Z_ARG] >= 0) fixed_regs[cfun->machine->args.reg[WORK_ITEM_ID_Z_ARG]] = 1; - - if (TARGET_GCN5_PLUS) -/* v0 is always zero, for global nul-offsets. */ -fixed_regs[VGPR_REGNO (0)] = 1; } /* Determine if a load or store is valid, according to the register classes -- 2.29.2
Re: [patch] fix -Wformat-diag warnings in rs6000-call.c
Hi! On Wed, Jan 13, 2021 at 12:49:42AM +, Joseph Myers wrote: > On Tue, 12 Jan 2021, Segher Boessenkool wrote: > > On Sat, Jan 09, 2021 at 07:44:31PM +0100, Matthias Klose wrote: > > > These warnings, including the suggested fixes are seen on power*-linux > > > builds. > > > > > > warning: misspelled term 'builtin function' in format; use 'bult-in > > > function' > > > instead [-Wformat-diag] > > > > This one is wrong. Almost all the functions are called __builtin_xxx > > so changing the warning message like that is just silly. Instead, just > > rephrase it so you do not need to repeat it, > > As documented in codingconventions.html we use "built-in" as an adjective > in English text (as opposed to literal sequences of characters from C > source code). Text in diagnostics that is meant to be source code rather > than English words should be enclosed in %<%>. A literal %<__builtin%> > should be unchanged in translations, whereas "built-in" used as an > adjective should be translated. My point is that "built-in function `__builtin_blablabla'" looks silly. It is much better to not mention "built-in" again (it isn't useful anyway). > > > warning: spurious trailing punctuation sequence '].' in format > > > [-Wformat-diag] > > > > This is also wrong (it says > > error ("Second argument of %qs must be in the range [0, 3].", name); > > which is obviously correct: "]" is not punctuation, and that makes the > > error message (again, not phrased as a warning) incorrect as well. > > The leading capital letter is incorrect, as is the trailing '.'; both of > those are contrary to the GNU Coding Standards for diagnostics. Yes, and yes. But the warning message is *wrong*, that is my point. It says "]." is a punctuation sequence, and it is not (the "[" isn't punctuation at all, it is a mathematical symbol). I am not saying the diagnostics could not be improved, they certainly could. But the warning messages are just wrong on many levels as well, and the suggested "fixes" make things worse in many cases (and the messages do not say "maybe" or "we suggest" or similar). Thanks, Segher
[PATCH] libstdc++: c++2b, implement WG21 P1679R3
Add contains member function to basic_string_view and basic_string. The new method is enabled for -std=gnu++20, gnu++2b and c++2b. This allows users to access the method as a GNU extension to C++20. The conditional test may be reduced to "__cplusplus > 202011L" once GCC has a c++2b switch. libstdc++-v3/ Add contains member function to basic_string_view. Likewise to basic_string_view, both with and without _GLIBCXX_USE_CXX11_ABI. Enabled with -std=gnu++20, gnu++2b and c++2b. * include/bits/basic_string.h (basic_string::contains): New. * libstdc++-v3/include/std/string_view (basic_string_view::contains): New. * testsuite/21_strings/basic_string/operations/contains/char/1.cc: New test. * testsuite/21_strings/basic_string/operations/contains/wchar_t/1.cc: New test. * testsuite/21_strings/basic_string/operations/starts_with/char/1.cc: Remove trailing whitespace * testsuite/21_strings/basic_string/operations/starts_with/wchar_t/1.cc: Remove trailing whitespace * testsuite/21_strings/basic_string_view/operations/contains/char/1.cc: New test. * testsuite/21_strings/basic_string_view/operations/contains/wchar_t/1.cc: New test. --- libstdc++-v3/include/bits/basic_string.h | 30 libstdc++-v3/include/std/string_view | 16 ++ libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/char/1.cc | 65 ++ libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/wchar_t/1.cc | 65 ++ libstdc++-v3/testsuite/21_strings/basic_string/operations/starts_with/char/1.cc |2 libstdc++-v3/testsuite/21_strings/basic_string/operations/starts_with/wchar_t/1.cc |2 libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char/1.cc | 51 +++ libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/wchar_t/1.cc | 51 +++ 8 files changed, 280 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index e272d332934..a569ecd8c08 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -3073,6 +3073,21 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 { return __sv_type(this->data(), this->size()).ends_with(__x); } #endif // C++20 +#if __cplusplus > 202011L || \ + (__cplusplus == 202002L && !defined __STRICT_ANSI__) + bool + contains(basic_string_view<_CharT, _Traits> __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(_CharT __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(const _CharT* __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } +#endif // C++23 + // Allow basic_stringbuf::__xfer_bufptrs to call _M_length: template friend class basic_stringbuf; }; @@ -5998,6 +6013,21 @@ _GLIBCXX_END_NAMESPACE_CXX11 { return __sv_type(this->data(), this->size()).ends_with(__x); } #endif // C++20 +#if __cplusplus > 202011L || \ + (__cplusplus == 202002L && !defined __STRICT_ANSI__) + bool + contains(basic_string_view<_CharT, _Traits> __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(_CharT __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } + + bool + contains(const _CharT* __x) const noexcept + { return __sv_type(this->data(), this->size()).contains(__x); } +#endif // C++23 + # ifdef _GLIBCXX_TM_TS_INTERNAL friend void ::_txnal_cow_string_C1_for_exceptions(void* that, const char* s, diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view index e33e1bc4b79..2f47ef6ed12 100644 --- a/libstdc++-v3/include/std/string_view +++ b/libstdc++-v3/include/std/string_view @@ -352,6 +352,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { return this->ends_with(basic_string_view(__x)); } #endif // C++20 +#if __cplusplus > 202011L || \ + (__cplusplus == 202002L && !defined __STRICT_ANSI__) +#define __cpp_lib_string_contains 202011L + constexpr bool + contains(basic_string_view __x) const noexcept + { return this->find(__x) != npos; } + + constexpr bool + contains(_CharT __x) const noexcept + { return this->find(__x) != npos; } + + constexpr bool + contains(const _CharT* __x) const noexcept + { return this->find(__x) != npos; } +#endif // C++23 + // [string.view.find], searching constexpr size_type diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/char/1.cc b/libstdc++-v3/testsuite/21_strings/basic_string/operations/contains/char/1.cc new file mode 100644 index 000..5d81dcee0ad --- /dev/null +++ b/libstdc++-v3/testsuite/21_strings/basic_string/operations/contains
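A short usage sketch of the new member (my own example rather than one of the testcases above), compiled with -std=gnu++20, where it is available as a GNU extension as described, or with -std=c++2b once that switch exists:

  #include <cassert>
  #include <string>
  #include <string_view>

  int
  main ()
  {
    std::string s = "hello world";
    std::string_view sv = s;

    assert (s.contains (std::string_view ("world")));  // string_view overload
    assert (s.contains ('h'));                          // single-character overload
    assert (!sv.contains ("goodbye"));                  // const char* overload
    return 0;
  }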
[PATCH] c++: ICE when late parsing noexcept/NSDMI [PR98333]
Since certain members of a class are a complete-class context [class.mem.general]p7, we delay their parsing untile the whole class has been parsed. For instance, NSDMIs and noexcept-specifiers. The order in which we perform this delayed parsing matters; we were first parsing NSDMIs and only they did we parse noexcept-specifiers. That turns out to be wrong: since NSDMIs may use noexcept-specifiers, we must process noexcept-specifiers first. Otherwise we'll ICE in code that doesn't expect to see DEFERRED_PARSE. This doesn't just shift the problem, noexcept-specifiers can use members with a NSDMI just fine, and I've also tested a similar test with this member function: bool f() { return __has_nothrow_constructor (S); } and that compiled fine too. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? gcc/cp/ChangeLog: PR c++/98333 * parser.c (cp_parser_class_specifier_1): Perform late-parsing of NSDMIs before late-parsing of noexcept-specifiers. gcc/testsuite/ChangeLog: PR c++/98333 * g++.dg/cpp0x/noexcept62.C: New test. --- gcc/cp/parser.c | 44 - gcc/testsuite/g++.dg/cpp0x/noexcept62.C | 10 ++ 2 files changed, 31 insertions(+), 23 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept62.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index c713852fe93..92ea4a23d17 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -25008,31 +25008,10 @@ cp_parser_class_specifier_1 (cp_parser* parser) maybe_end_member_template_processing (); } vec_safe_truncate (unparsed_funs_with_default_args, 0); - /* Now parse any NSDMIs. */ - save_ccp = current_class_ptr; - save_ccr = current_class_ref; - FOR_EACH_VEC_SAFE_ELT (unparsed_nsdmis, ix, decl) - { - if (class_type != DECL_CONTEXT (decl)) - { - if (pushed_scope) - pop_scope (pushed_scope); - class_type = DECL_CONTEXT (decl); - pushed_scope = push_scope (class_type); - } - inject_this_parameter (class_type, TYPE_UNQUALIFIED); - cp_parser_late_parsing_nsdmi (parser, decl); - } - vec_safe_truncate (unparsed_nsdmis, 0); - current_class_ptr = save_ccp; - current_class_ref = save_ccr; - if (pushed_scope) - pop_scope (pushed_scope); /* If there are noexcept-specifiers that have not yet been processed, -take care of them now. */ - class_type = NULL_TREE; - pushed_scope = NULL_TREE; +take care of them now. Do this before processing NSDMIs as they +may depend on noexcept-specifiers already having been processed. */ FOR_EACH_VEC_SAFE_ELT (unparsed_noexcepts, ix, decl) { tree ctx = DECL_CONTEXT (decl); @@ -25084,6 +25063,25 @@ cp_parser_class_specifier_1 (cp_parser* parser) maybe_end_member_template_processing (); } vec_safe_truncate (unparsed_noexcepts, 0); + + /* Now parse any NSDMIs. 
*/ + save_ccp = current_class_ptr; + save_ccr = current_class_ref; + FOR_EACH_VEC_SAFE_ELT (unparsed_nsdmis, ix, decl) + { + if (class_type != DECL_CONTEXT (decl)) + { + if (pushed_scope) + pop_scope (pushed_scope); + class_type = DECL_CONTEXT (decl); + pushed_scope = push_scope (class_type); + } + inject_this_parameter (class_type, TYPE_UNQUALIFIED); + cp_parser_late_parsing_nsdmi (parser, decl); + } + vec_safe_truncate (unparsed_nsdmis, 0); + current_class_ptr = save_ccp; + current_class_ref = save_ccr; if (pushed_scope) pop_scope (pushed_scope); diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept62.C b/gcc/testsuite/g++.dg/cpp0x/noexcept62.C new file mode 100644 index 000..53606c79142 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept62.C @@ -0,0 +1,10 @@ +// PR c++/98333 +// { dg-do compile { target c++11 } } + +struct T { + template + struct S { +S () noexcept (N) {} + }; + int a = __has_nothrow_constructor (S); +}; base-commit: cfaaa6a1ca744c1a93fa08a3e7ab2a821383cac1 -- 2.29.2
[PATCH] c++: ICE with delayed noexcept and attribute used [PR97966]
Another ICE with delayed noexcept parsing, but a bit gnarlier. A function definition marked with __attribute__((used)) ought to be emitted even when it is not referenced in a TU. For a member function template marked with __attribute__((used)) this means that it will be instantiated: in instantiate_class_template_1 we have 11971 /* Instantiate members marked with attribute used. */ 11972 if (r != error_mark_node && DECL_PRESERVE_P (r)) 11973 mark_used (r); It is not so surprising that this doesn't work well with delayed noexcept parsing: when we're processing the function template we delay the parsing, so the member "foo" is found, but then when we're instantiating it, "foo" hasn't yet been seen, which creates a discrepancy and a crash ensues. "foo" hasn't yet been seen because instantiate_class_template_1 just loops over the class members and instantiates right away. It stands to reason to disable delayed noexcept parsing when the function is marked with used (clang++ also rejects). Unfortunately we can't just use lookup_attribute when parsing and check if "used" is there and if so, clear CP_PARSER_FLAGS_DELAY_NOEXCEPT, because we accept attributes following the noexcept-specifier, like this: void g() noexcept(...) __attribute__((used)); Oh well. This patch should handle various forms of attributes in one place. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10? gcc/cp/ChangeLog: PR c++/97966 * parser.c (cp_parser_save_default_args): Don't delay parsing of the noexcept-specifier of a function marked with attribute used. gcc/testsuite/ChangeLog: PR c++/97966 * g++.dg/cpp0x/noexcept63.C: New test. --- gcc/cp/parser.c | 20 +- gcc/testsuite/g++.dg/cpp0x/noexcept63.C | 49 + 2 files changed, 68 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept63.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index c713852fe93..07ab966e099 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -30880,7 +30880,25 @@ cp_parser_save_default_args (cp_parser* parser, tree decl) /* Remember if there is a noexcept-specifier to post process. */ tree spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl)); if (UNPARSED_NOEXCEPT_SPEC_P (spec)) -vec_safe_push (unparsed_noexcepts, decl); +{ + /* Don't actually delay parsing of the noexcept-specifier of +a member function marked with attribute used. 
*/ + if (__builtin_expect (DECL_PRESERVE_P (decl), 0)) + { + auto cleanup = make_temp_override + (parser->local_variables_forbidden_p); + if (DECL_THIS_STATIC (decl)) + parser->local_variables_forbidden_p |= THIS_FORBIDDEN; + inject_this_parameter (current_class_type, TYPE_UNQUALIFIED); + spec = cp_parser_late_noexcept_specifier (parser, + TREE_PURPOSE (spec)); + if (spec == error_mark_node) + spec = NULL_TREE; + fixup_deferred_exception_variants (TREE_TYPE (decl), spec); + } + else + vec_safe_push (unparsed_noexcepts, decl); +} } /* DEFAULT_ARG contains the saved tokens for the initializer of DECL, diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept63.C b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C new file mode 100644 index 000..8efeac93a30 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept63.C @@ -0,0 +1,49 @@ +// PR c++/97966 +// { dg-do compile { target c++11 } } + +template +struct S1 { + __attribute__((used)) S1() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S2 { + __attribute__((used)) void bar() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S3 { + void __attribute__((used)) bar() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S4 { + [[gnu::used]] void bar() noexcept(noexcept(this->foo())); // { dg-error "has no member" } + void foo(); +}; + +template +struct S5 { + void bar() noexcept(noexcept(this->foo())) __attribute__((used)); // { dg-error "has no member" } + void foo(); +}; + +template +struct S6 { + static void bar() noexcept(noexcept(this->foo())) __attribute__((used)); // { dg-error ".this. may not be used in this context" } + void foo(); +}; + +void +g () +{ + S1<1> s1; + S2<1> s2; + S3<1> s3; + S4<1> s4; + S5<1> s5; + S6<1> s6; +} base-commit: cfaaa6a1ca744c1a93fa08a3e7ab2a821383cac1 -- 2.29.2
Re: Bug fix in ISO_Fortran_binding - unsigned char arrays
Hi everyone, Sorry, my previous email erroneously referred to unsigned chars / uint8_t, when I in fact meant signed chars / int8_t. The actual patch works, but the test case files have ‘unit’ in the file names. I’ll provide a corrected patch tomorrow to fix the file names. Harris
Re: [Ada,FYI] revamp ada.numerics.aux
Hello Alexandre, On 19/10/2020 13:52, Alexandre Oliva wrote: On Oct 19, 2020, Andreas Schwab wrote: -nostdinc a-nallfl.ads -o a-nallfl.o a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on return value a-nallfl.ads:48:13: warning: intrinsic binding type mismatch on argument 1 a-nallfl.ads:48:13: warning: profile of "Sin" doesn't match the builtin it binds Thanks for the report. Ada's Standard.Long_Long_Float is mapped to C double rather than long double on this target. Here's a workaround, for aarch64-* and ppc*-linux-gnu, where I've observed the mismatch so far. I'm not sure whether we're going to use something like this, or a fix for the mismatch; I'm yet to figure out how to implement the latter. aarch64-* and ppc*-linux-gnu long long float/long double mismatch I have a similar issue on riscv-rtems*: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98595 Can this be fixed also with a patch to gcc/ada/Makefile.rtl? -- embedded brains GmbH Herr Sebastian HUBER Dornierstr. 4 82178 Puchheim Germany email: sebastian.hu...@embedded-brains.de phone: +49-89-18 94 741 - 16 fax: +49-89-18 94 741 - 08 Registergericht: Amtsgericht München Registernummer: HRB 157899 Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler Unsere Datenschutzerklärung finden Sie hier: https://embedded-brains.de/datenschutzerklaerung/
RE: [COMMITTED] MAINTAINERS: Add myself for write after approval
> On Wed, Jan 13, 2021 at 02:45:19, Segher Boessenkool wrote: > > Hi! > > On Tue, Jan 12, 2021 at 03:44:22PM +0800, Qian Jianhua wrote: > > ChangeLog: > > > > 2021-01-12 Qian Jianhua > > > > * MAINTAINERS (Write After Approval): Add myself > > Welcome! > > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -451,6 +451,7 @@ Daniel Jacobowitz > > > Andreas Jaeger > > Harsha Jagasia > > > Fariborz Jahanian > > +Qian Jianhua > > > Indents should be with tabs here. I fixed it (and some others). > Thank you for that. Regards, Qian > > Segher >
[PATCH] i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]
Hi! The following patch implements what I've talked about, i.e. to no longer force operands of vec_perm_const into registers in the generic code, but let each of the (currently 8) targets force it into registers individually, giving the targets better control on if it does that and when and allowing them to do something special with some particular operands. And then defines the define_insn_and_split for the 256-bit and 512-bit permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could add define_insn_and_split patterns also for the bd, bq and wq). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-13 Jakub Jelinek PR target/95905 * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into registers before calling targetm.vectorize.vec_perm_const, only after that. * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle two argument permutation when one operand is zero vector and only after that force operands into registers. * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1, *avx512bw_zero_extendv32qiv32hi2_1, *avx512f_zero_extendv16hiv16si2_1, *avx2_zero_extendv8hiv8si2_1, *avx512f_zero_extendv8siv8di2_1, *avx2_zero_extendv4siv4di2_1): New define_insn_and_split patterns. * config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands into registers. * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise. * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise. * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise. * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): Likewise. * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise. * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise. Use std::swap. * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of scan-assembler. Add tests with zero vector as first __builtin_shuffle operand. * gcc.target/i386/pr95905-3.c: New test. * gcc.target/i386/pr95905-4.c: New test. --- gcc/optabs.c.jj 2021-01-04 10:25:38.632236100 +0100 +++ gcc/optabs.c2021-01-12 14:46:44.719557815 +0100 @@ -6070,11 +6070,8 @@ expand_vec_perm_const (machine_mode mode if (targetm.vectorize.vec_perm_const != NULL) { - v0 = force_reg (mode, v0); if (single_arg_p) v1 = v0; - else - v1 = force_reg (mode, v1); if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) return target; @@ -6095,6 +6092,11 @@ expand_vec_perm_const (machine_mode mode return gen_lowpart (mode, target_qi); } + v0 = force_reg (mode, v0); + if (single_arg_p) +v1 = v0; + v1 = force_reg (mode, v1); + /* Otherwise expand as a fully variable permuation. */ /* The optabs are only defined for selectors with the same width --- gcc/config/i386/i386-expand.c.jj2021-01-12 11:01:51.189386077 +0100 +++ gcc/config/i386/i386-expand.c 2021-01-12 15:43:55.673095807 +0100 @@ -19929,6 +19929,33 @@ ix86_vectorize_vec_perm_const (machine_m two_args = canonicalize_perm (&d); + /* If one of the operands is a zero vector, try to match pmovzx. */ + if (two_args && (d.op0 == CONST0_RTX (vmode) || d.op1 == CONST0_RTX (vmode))) +{ + struct expand_vec_perm_d dzero = d; + if (d.op0 == CONST0_RTX (vmode)) + { + d.op1 = dzero.op1 = force_reg (vmode, d.op1); + std::swap (dzero.op0, dzero.op1); + for (i = 0; i < nelt; ++i) + dzero.perm[i] ^= nelt; + } + else + d.op0 = dzero.op0 = force_reg (vmode, d.op0); + + if (expand_vselect_vconcat (dzero.target, dzero.op0, dzero.op1, + dzero.perm, nelt, dzero.testing_p)) + return true; +} + + /* Force operands into registers. 
*/ + rtx nop0 = force_reg (vmode, d.op0); + if (d.op0 == d.op1) +d.op1 = nop0; + d.op0 = nop0; + if (d.op0 != d.op1) +d.op1 = force_reg (vmode, d.op1); + if (ix86_expand_vec_perm_const_1 (&d)) return true; --- gcc/config/i386/sse.md.jj 2021-01-12 14:30:32.688546846 +0100 +++ gcc/config/i386/sse.md 2021-01-12 15:40:29.018402527 +0100 @@ -17611,6 +17611,23 @@ (define_insn "avx2_v16qiv16hi2v16qiv16hi2" [(set (match_operand:V16HI 0 "register_operand") (any_extend:V16HI @@ -17628,6 +17645,23 @@ (define_insn "avx512bw_v32qiv32hi2 (set_attr "prefix" "evex") (set_attr "mode" "XI")]) +(define_insn_and_split "*avx512bw_zero_extendv32qiv32hi2_1" + [(set (match_operand:V64QI 0 "register_operand" "=v") + (vec_select:V64QI + (vec_concat:V128QI + (match_operand:V64QI 1 "nonimmediate_operand" "vm") + (match_operand:V64QI 2 "const0_operand" "C")) + (match_parallel 3 "pmovzx_parallel" + [(match_operand 4 "const_int_operand" "n")])))] +
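At the source level, the permutations the new define_insn_and_split patterns catch are shuffles of a vector with a zero vector; roughly like this (my own variant in the spirit of the pr95905 tests, the exact testsuite contents differ), which interleaves the low 16 bytes of x with zero bytes and can now be recognized as vpmovzxbw:

  typedef unsigned char V32 __attribute__ ((vector_size (32)));

  /* Result byte 2*i is x[i], byte 2*i+1 is zero: a zero-extension of the
     low 16 bytes to 16-bit lanes.  */
  V32
  zext_low_bytes (V32 x)
  {
    return __builtin_shuffle (x, (V32) { 0 },
                              (V32) { 0, 32, 1, 33, 2, 34, 3, 35,
                                      4, 36, 5, 37, 6, 38, 7, 39,
                                      8, 40, 9, 41, 10, 42, 11, 43,
                                      12, 44, 13, 45, 14, 46, 15, 47 });
  }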
[PATCH] i386: Add define_insn_and_split patterns for btrl [PR96938]
Hi! In the following testcase we only optimize f2 and f7 to btrl, although we should optimize that way all of the functions. The problem is the type demotion/narrowing (which is performed solely during the generic folding and not later), without it we see the AND performed in SImode and match it as btrl, but with it while the shifts are still performed in SImode, the AND is already done in QImode or HImode low part of the shift. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-13 Jakub Jelinek PR target/96938 * config/i386/i386.md (*btr_1, *btr_2): New define_insn_and_split patterns. (splitter after *btr_2): New splitter. * gcc.target/i386/pr96938.c: New test. --- gcc/config/i386/i386.md.jj 2021-01-07 17:18:39.653487482 +0100 +++ gcc/config/i386/i386.md 2021-01-12 19:01:37.286603961 +0100 @@ -12419,6 +12419,70 @@ (define_insn_and_split "*btr_mask_ (match_dup 3))) (clobber (reg:CC FLAGS_REG))])]) +(define_insn_and_split "*btr_1" + [(set (match_operand:SWI12 0 "register_operand") + (and:SWI12 + (subreg:SWI12 + (rotate:SI (const_int -2) + (match_operand:QI 2 "register_operand")) 0) + (match_operand:SWI12 1 "nonimmediate_operand"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_USE_BT && ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel + [(set (match_dup 0) + (and:SI (rotate:SI (const_int -2) (match_dup 2)) + (match_dup 1))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[0] = lowpart_subreg (SImode, operands[0], mode); + if (MEM_P (operands[1])) +operands[1] = force_reg (mode, operands[1]); + operands[1] = lowpart_subreg (SImode, operands[1], mode); +}) + +(define_insn_and_split "*btr_2" + [(set (zero_extract:HI + (match_operand:SWI12 0 "nonimmediate_operand") + (const_int 1) + (zero_extend:SI (match_operand:QI 1 "register_operand"))) + (const_int 0)) + (clobber (reg:CC FLAGS_REG))] + "TARGET_USE_BT && ix86_pre_reload_split ()" + "#" + "&& MEM_P (operands[0])" + [(set (match_dup 2) (match_dup 0)) + (parallel + [(set (match_dup 3) (and:SI (rotate:SI (const_int -2) (match_dup 1)) +(match_dup 4))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 0) (match_dup 5))] +{ + operands[2] = gen_reg_rtx (mode); + operands[5] = gen_reg_rtx (mode); + operands[3] = lowpart_subreg (SImode, operands[5], mode); + operands[4] = lowpart_subreg (SImode, operands[2], mode); +}) + +(define_split + [(set (zero_extract:HI + (match_operand:SWI12 0 "register_operand") + (const_int 1) + (zero_extend:SI (match_operand:QI 1 "register_operand"))) + (const_int 0)) + (clobber (reg:CC FLAGS_REG))] + "TARGET_USE_BT && ix86_pre_reload_split ()" + [(parallel + [(set (match_dup 0) + (and:SI (rotate:SI (const_int -2) (match_dup 1)) + (match_dup 2))) + (clobber (reg:CC FLAGS_REG))])] +{ + operands[2] = lowpart_subreg (SImode, operands[0], mode); + operands[0] = lowpart_subreg (SImode, operands[0], mode); +}) + ;; These instructions are never faster than the corresponding ;; and/ior/xor operations when using immediate operand, so with ;; 32-bit there's no point. 
But in 64-bit, we can't hold the --- gcc/testsuite/gcc.target/i386/pr96938.c.jj 2021-01-12 19:12:48.285023954 +0100 +++ gcc/testsuite/gcc.target/i386/pr96938.c 2021-01-12 19:12:33.209194271 +0100 @@ -0,0 +1,66 @@ +/* PR target/96938 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -masm=att" } */ +/* { dg-final { scan-assembler-times "\tbtrl\t" 10 } } */ + +void +f1 (unsigned char *f, int o, unsigned char v) +{ + *f = (*f & ~(1 << o)) | (v << o); +} + +void +f2 (unsigned char *f, int o, unsigned char v) +{ + int t = *f & ~(1 << o); + *f = t | (v << o); +} + +void +f3 (unsigned char *f, int o, unsigned char v) +{ + *f &= ~(1 << o); +} + +void +f4 (unsigned char *f, int o, unsigned char v) +{ + *f = (*f & ~(1 << (o & 31))) | v; +} + +void +f5 (unsigned char *f, int o, unsigned char v) +{ + *f = (*f & ~(1 << (o & 31))) | (v << (o & 31)); +} + +void +f6 (unsigned short *f, int o, unsigned short v) +{ + *f = (*f & ~(1 << o)) | (v << o); +} + +void +f7 (unsigned short *f, int o, unsigned short v) +{ + int t = *f & ~(1 << o); + *f = t | (v << o); +} + +void +f8 (unsigned short *f, int o, unsigned short v) +{ + *f &= ~(1 << o); +} + +void +f9 (unsigned short *f, int o, unsigned short v) +{ + *f = (*f & ~(1 << (o & 31))) | v; +} + +void +f10 (unsigned short *f, int o, unsigned short v) +{ + *f = (*f & ~(1 << (o & 31))) | (v << (o & 31)); +} Jakub
[PATCH] match.pd: Fold (~X | C) ^ D into (X | C) ^ (~D ^ C) if C and D are constants [PR96691]
Hi! These simplifications are only simplifications if the (~D ^ C) or (D ^ C) expressions fold into constants, but in that case they decrease number of operations by 1. For the first transformation I guess I should add reverse simplification when C and D are arbitrary (constants would fold before; to be tried incrementally). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-01-13 Jakub Jelinek PR tree-optimization/96691 * match.pd ((~X | C) ^ D -> (X | C) ^ (~D ^ C), (~X & C) ^ D -> (X & C) ^ (D ^ C)): New simplifications for constant C and D. * gcc.dg/tree-ssa/pr96691.c: New test. --- gcc/match.pd.jj 2021-01-12 13:17:35.0 +0100 +++ gcc/match.pd2021-01-12 21:39:51.310921835 +0100 @@ -947,6 +947,20 @@ (define_operator_list COND_TERNARY (bit_ior:c (bit_xor:c@3 @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0)) (bit_ior @3 @2)) +/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if C and D are constants. */ +(simplify + (bit_xor (bit_ior:s (bit_not @0) @1) @2) + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) + && TREE_CODE (@1) == TREE_CODE (@2)) + (bit_xor (bit_ior @0 @1) (bit_xor (bit_not @2) @1 + +/* (~X & C) ^ D -> (X & C) ^ (D ^ C) if C and D are constants. */ +(simplify + (bit_xor (bit_and:s (bit_not @0) @1) @2) + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) + && TREE_CODE (@1) == TREE_CODE (@2)) + (bit_xor (bit_and @0 @1) (bit_xor @2 @1 + /* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0. */ #if GIMPLE (simplify --- gcc/testsuite/gcc.dg/tree-ssa/pr96691.c.jj 2021-01-12 21:48:35.763025413 +0100 +++ gcc/testsuite/gcc.dg/tree-ssa/pr96691.c 2021-01-12 21:48:19.579207370 +0100 @@ -0,0 +1,21 @@ +/* PR tree-optimization/96691 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-times " \\\| 123;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \\\& 123;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \\\^ -315;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times " \\\^ 314;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-not " \\\^ 321;" "optimized" } } */ +/* { dg-final { scan-tree-dump-not " = ~" "optimized" } } */ + +int +foo (int x) +{ + return (~x | 123) ^ 321; +} + +int +bar (int x) +{ + return (~x & 123) ^ 321; +} Jakub
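Both identities are easy to verify by brute force; for the constants in the new testcase, ~321 ^ 123 folds to -315 and 321 ^ 123 to 314, which is exactly what the scan-tree-dump patterns look for. A quick check (illustration only, not part of the patch):

  #include <assert.h>

  int
  main (void)
  {
    for (int x = -1024; x <= 1024; x++)
      {
        /* (~X | C) ^ D == (X | C) ^ (~D ^ C)  */
        assert (((~x | 123) ^ 321) == ((x | 123) ^ (~321 ^ 123)));
        /* (~X & C) ^ D == (X & C) ^ (D ^ C)  */
        assert (((~x & 123) ^ 321) == ((x & 123) ^ (321 ^ 123)));
      }
    return 0;
  }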
Re: [PATCH] i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]
On Wed, 13 Jan 2021, Jakub Jelinek wrote: > Hi! > > The following patch implements what I've talked about, i.e. to no longer > force operands of vec_perm_const into registers in the generic code, but let > each of the (currently 8) targets force it into registers individually, > giving the targets better control on if it does that and when and allowing > them to do something special with some particular operands. > And then defines the define_insn_and_split for the 256-bit and 512-bit > permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could > add define_insn_and_split patterns also for the bd, bq and wq). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? I wonder if it's worth handling the v0 == v1 case this way given the weird + if (op1 && op0 != op1) +op1 = force_reg (vmode, op1); code (presumably to handle RTX sharing here)? Otherwise it looks sensible - for x86, only constant op1/op0 are interesting, correct? Wouldn't that simplify things (to only handle constants this way)? Thanks, Richard. > 2021-01-13 Jakub Jelinek > > PR target/95905 > * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into > registers before calling targetm.vectorize.vec_perm_const, only after > that. > * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle > two argument permutation when one operand is zero vector and only > after that force operands into registers. > * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1, > *avx512bw_zero_extendv32qiv32hi2_1, *avx512f_zero_extendv16hiv16si2_1, > *avx2_zero_extendv8hiv8si2_1, *avx512f_zero_extendv8siv8di2_1, > *avx2_zero_extendv4siv4di2_1): New define_insn_and_split patterns. > * config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands > into registers. > * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise. > * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise. > * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise. > * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): Likewise. > * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise. > * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise. Use > std::swap. > > * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of > scan-assembler. Add tests with zero vector as first __builtin_shuffle > operand. > * gcc.target/i386/pr95905-3.c: New test. > * gcc.target/i386/pr95905-4.c: New test. > > --- gcc/optabs.c.jj 2021-01-04 10:25:38.632236100 +0100 > +++ gcc/optabs.c 2021-01-12 14:46:44.719557815 +0100 > @@ -6070,11 +6070,8 @@ expand_vec_perm_const (machine_mode mode > >if (targetm.vectorize.vec_perm_const != NULL) > { > - v0 = force_reg (mode, v0); >if (single_arg_p) > v1 = v0; > - else > - v1 = force_reg (mode, v1); > >if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) > return target; > @@ -6095,6 +6092,11 @@ expand_vec_perm_const (machine_mode mode > return gen_lowpart (mode, target_qi); > } > > + v0 = force_reg (mode, v0); > + if (single_arg_p) > +v1 = v0; > + v1 = force_reg (mode, v1); > + >/* Otherwise expand as a fully variable permuation. */ > >/* The optabs are only defined for selectors with the same width > --- gcc/config/i386/i386-expand.c.jj 2021-01-12 11:01:51.189386077 +0100 > +++ gcc/config/i386/i386-expand.c 2021-01-12 15:43:55.673095807 +0100 > @@ -19929,6 +19929,33 @@ ix86_vectorize_vec_perm_const (machine_m > >two_args = canonicalize_perm (&d); > > + /* If one of the operands is a zero vector, try to match pmovzx. 
*/ > + if (two_args && (d.op0 == CONST0_RTX (vmode) || d.op1 == CONST0_RTX > (vmode))) > +{ > + struct expand_vec_perm_d dzero = d; > + if (d.op0 == CONST0_RTX (vmode)) > + { > + d.op1 = dzero.op1 = force_reg (vmode, d.op1); > + std::swap (dzero.op0, dzero.op1); > + for (i = 0; i < nelt; ++i) > + dzero.perm[i] ^= nelt; > + } > + else > + d.op0 = dzero.op0 = force_reg (vmode, d.op0); > + > + if (expand_vselect_vconcat (dzero.target, dzero.op0, dzero.op1, > + dzero.perm, nelt, dzero.testing_p)) > + return true; > +} > + > + /* Force operands into registers. */ > + rtx nop0 = force_reg (vmode, d.op0); > + if (d.op0 == d.op1) > +d.op1 = nop0; > + d.op0 = nop0; > + if (d.op0 != d.op1) > +d.op1 = force_reg (vmode, d.op1); > + >if (ix86_expand_vec_perm_const_1 (&d)) > return true; > > --- gcc/config/i386/sse.md.jj 2021-01-12 14:30:32.688546846 +0100 > +++ gcc/config/i386/sse.md2021-01-12 15:40:29.018402527 +0100 > @@ -17611,6 +17611,23 @@ (define_insn "avx2_v16qiv16hi2 (set_attr "prefix" "maybe_evex") > (set_attr "mode" "OI")]) > > +(define
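For reference, the shuffles these new define_insn_and_split patterns aim at are ones where one __builtin_shuffle operand is a zero vector, i.e. a zero extension written as an interleave with zero. The sketch below is only illustrative (it is not one of the pr95905 testcases from the ChangeLog, and the function name is made up); with -mavx2 one would expect a permutation like this to be a candidate for a single vpmovzxbw:

typedef unsigned char V __attribute__ ((vector_size (32)));
typedef unsigned short W __attribute__ ((vector_size (32)));

W
zext_lo (V x)
{
  /* Interleave the low 16 bytes of x with bytes taken from a zero vector
     (mask indices 32..47 select the second operand).  On little-endian x86
     the result, reinterpreted as 16-bit lanes, is x zero-extended.  */
  V zero = (V) {};
  return (W) __builtin_shuffle (x, zero,
                                (V) { 0, 32, 1, 33, 2, 34, 3, 35,
                                      4, 36, 5, 37, 6, 38, 7, 39,
                                      8, 40, 9, 41, 10, 42, 11, 43,
                                      12, 44, 13, 45, 14, 46, 15, 47 });
}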
Re: [PATCH] i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]
On Wed, Jan 13, 2021 at 08:26:49AM +0100, Richard Biener wrote: > On Wed, 13 Jan 2021, Jakub Jelinek wrote: > > > Hi! > > > > The following patch implements what I've talked about, i.e. to no longer > > force operands of vec_perm_const into registers in the generic code, but let > > each of the (currently 8) targets force it into registers individually, > > giving the targets better control on if it does that and when and allowing > > them to do something special with some particular operands. > > And then defines the define_insn_and_split for the 256-bit and 512-bit > > permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could > > add define_insn_and_split patterns also for the bd, bq and wq). > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > I wonder if it's worth handling the v0 == v1 case this way given the > weird > > + if (op1 && op0 != op1) > +op1 = force_reg (vmode, op1); > > code (presumably to handle RTX sharing here)? The v0 == v1 case is not about RTX sharing, but about telling the targets in a quick way that it is a one argument permutation (two argument __builtin_shuffle, or 3 argument where the expander can determine they are the same etc.). Sometimes that can also be determined from the permutation, but not always. If we just forced both arguments into registers separately, this info would be lost. > Otherwise it looks sensible - for x86, only constant op1/op0 are > interesting, correct? Wouldn't that simplify things (to only handle > constants this way)? Yes, and only CONST0_RTX ATM. I originally had the generic code only treat CONST0_RTX that way and otherwise force_reg, but it didn't really simplify anything at all, the backends still need to force_reg if they require a REG, even when it is just CONST0_RTX that could make it through. Jakub
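To make the v0 == v1 point concrete, here is a hypothetical example (not taken from the thread; the function name is made up). Both shuffle operands are the same value, and only the fact that the expander passes v1 == v0 to the target hook tells the target that this is really a one-argument permutation, since the mask itself refers to both "halves":

typedef int V __attribute__ ((vector_size (16)));

V
rotate1 (V x)
{
  /* Index 4 selects element 0 of the second operand, which is x again,
     so despite the two-operand form this is a single-input rotate.  */
  return __builtin_shuffle (x, x, (V) { 1, 2, 3, 4 });
}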
Re: [PATCH] match.pd: Fold (~X | C) ^ D into (X | C) ^ (~D ^ C) if C and D are constants [PR96691]
On Wed, 13 Jan 2021, Jakub Jelinek wrote: > Hi! > > These simplifications are only simplifications if the (~D ^ C) or (D ^ C) > expressions fold into constants, but in that case they decrease number of > operations by 1. > > For the first transformation I guess I should add reverse simplification > when C and D are arbitrary (constants would fold before; > to be tried incrementally). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2021-01-13 Jakub Jelinek > > PR tree-optimization/96691 > * match.pd ((~X | C) ^ D -> (X | C) ^ (~D ^ C), > (~X & C) ^ D -> (X & C) ^ (D ^ C)): New simplifications for constant > C and D. > > * gcc.dg/tree-ssa/pr96691.c: New test. > > --- gcc/match.pd.jj 2021-01-12 13:17:35.0 +0100 > +++ gcc/match.pd 2021-01-12 21:39:51.310921835 +0100 > @@ -947,6 +947,20 @@ (define_operator_list COND_TERNARY > (bit_ior:c (bit_xor:c@3 @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0)) > (bit_ior @3 @2)) > > +/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if C and D are constants. */ > +(simplify > + (bit_xor (bit_ior:s (bit_not @0) @1) @2) > + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) > + && TREE_CODE (@1) == TREE_CODE (@2)) > + (bit_xor (bit_ior @0 @1) (bit_xor (bit_not @2) @1 I guess you could use (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1)) and relax the constant test. Not sure in which circumstances (bit_xor (bit_not @2) @1) would simplify down to a singleton (within VN it might CSE for example). Maybe global1 = (~D) ^ C; global2 = (~X | C) ^ D; which VN should then simplify to global1 = (~D) ^ C; global2 = (X | C) ^ global1; of course such case may apply to many more patterns so not sure if we should start to use ! more aggressively now (it's relatively new). And yes, the reverse transform would be a simplification in general. Richard. > + > +/* (~X & C) ^ D -> (X & C) ^ (D ^ C) if C and D are constants. */ > +(simplify > + (bit_xor (bit_and:s (bit_not @0) @1) @2) > + (if ((TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST) > + && TREE_CODE (@1) == TREE_CODE (@2)) > + (bit_xor (bit_and @0 @1) (bit_xor @2 @1 > + > /* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0. */ > #if GIMPLE > (simplify > --- gcc/testsuite/gcc.dg/tree-ssa/pr96691.c.jj2021-01-12 > 21:48:35.763025413 +0100 > +++ gcc/testsuite/gcc.dg/tree-ssa/pr96691.c 2021-01-12 21:48:19.579207370 > +0100 > @@ -0,0 +1,21 @@ > +/* PR tree-optimization/96691 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-times " \\\| 123;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " \\\& 123;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " \\\^ -315;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times " \\\^ 314;" 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not " \\\^ 321;" "optimized" } } */ > +/* { dg-final { scan-tree-dump-not " = ~" "optimized" } } */ > + > +int > +foo (int x) > +{ > + return (~x | 123) ^ 321; > +} > + > +int > +bar (int x) > +{ > + return (~x & 123) ^ 321; > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
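The two folds rest on the bit-wise identities (~X | C) ^ D == (X | C) ^ (~D ^ C) and (~X & C) ^ D == (X & C) ^ (D ^ C). As a quick standalone sanity check (not part of the patch or its testsuite, just an 8-bit brute-force verification for illustration):

#include <stdio.h>

int
main (void)
{
  for (unsigned x = 0; x < 256; x++)
    for (unsigned c = 0; c < 256; c++)
      for (unsigned d = 0; d < 256; d++)
        {
          /* Compare both sides of each identity in the low 8 bits.  */
          unsigned l1 = ((~x | c) ^ d) & 0xff, r1 = ((x | c) ^ (~d ^ c)) & 0xff;
          unsigned l2 = ((~x & c) ^ d) & 0xff, r2 = ((x & c) ^ (d ^ c)) & 0xff;
          if (l1 != r1 || l2 != r2)
            {
              printf ("mismatch: x=%u c=%u d=%u\n", x, c, d);
              return 1;
            }
        }
  printf ("both identities hold for all 8-bit X, C and D\n");
  return 0;
}

Compiling and running this with any C compiler should report that both identities hold, which is why the transforms are pure wins whenever (~D ^ C) or (D ^ C) folds to a constant.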
Re: The performance data for two different implementations of the new security feature -ftrivial-auto-var-init
On Tue, 12 Jan 2021, Qing Zhao wrote:
> Hi,
>
> Just checking in to see whether you have any comments and suggestions on this:
>
> FYI, I have been continuing with the Approach D implementation since last week:
>
> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>    .DEFERRED_INIT during expand to real initialization, and adjusting the
>    uninitialized pass to handle the new refs with “.DEFERRED_INIT”.
>
> For the remaining work of Approach D:
>
>    ** complete the implementation of -ftrivial-auto-var-init=pattern;
>    ** complete the implementation of the uninitialized warnings maintenance
>       work for D.
>
> I have completed the uninitialized warnings maintenance work for D,
> and finished part of the -ftrivial-auto-var-init=pattern implementation.
>
> The remaining work for Approach D is:
>
>    ** -ftrivial-auto-var-init=pattern for VLAs;
>    ** add a new attribute for variables: __attribute__((uninitialized));
>       the marked variable is left uninitialized intentionally for
>       performance reasons;
>    ** adding complete test cases.
>
> Please let me know if you have any objection to my current decision on
> implementing Approach D.

Did you do any analysis on how stack usage and code size are changed with
approach D? How does compile-time behave (we could gobble up lots of
.DEFERRED_INIT calls I guess)?

Richard.

> Thanks a lot for your help.
>
> Qing
>
> > On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches wrote:
> >
> > Hi,
> >
> > This is an update for our previous discussion.
> >
> > 1. I implemented the following two different implementations in the latest
> >    upstream gcc:
> >
> >    A. Adding real initialization during gimplification, not maintaining
> >       the uninitialized warnings.
> >
> >    D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
> >       .DEFERRED_INIT during expand to real initialization, and adjusting
> >       the uninitialized pass to handle the new refs with “.DEFERRED_INIT”.
> >
> > Note, in this initial implementation:
> >    ** I ONLY implement -ftrivial-auto-var-init=zero; the implementation of
> >       -ftrivial-auto-var-init=pattern is not done yet. Therefore, the
> >       performance data is only about -ftrivial-auto-var-init=zero.
> >    ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
> >       choose implementation A or D for the runtime performance study.
> >    ** I didn’t finish the uninitialized warnings maintenance work for D.
> >       (That might take more time than I expected.)
> >
> > 2. I collected runtime data for CPU2017 on an x86 machine with this new
> >    gcc for the following 3 cases:
> >
> >    no: default.  (-g -O2 -march=native)
> >    A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
> >    D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
> >
> > And then compute the slowdown data for both A and D as follows:
> >
> > benchmarks            A / no     D / no
> >
> > 500.perlbench_r        1.25%      1.25%
> > 502.gcc_r              0.68%      1.80%
> > 505.mcf_r              0.68%      0.14%
> > 520.omnetpp_r          4.83%      4.68%
> > 523.xalancbmk_r        0.18%      1.96%
> > 525.x264_r             1.55%      2.07%
> > 531.deepsjeng_r       11.57%     11.85%
> > 541.leela_r            0.64%      0.80%
> > 557.xz_r              -0.41%     -0.41%
> >
> > 507.cactuBSSN_r        0.44%      0.44%
> > 508.namd_r             0.34%      0.34%
> > 510.parest_r           0.17%      0.25%
> > 511.povray_r          56.57%     57.27%
> > 519.lbm_r              0.00%      0.00%
> > 521.wrf_r             -0.28%     -0.37%
> > 526.blender_r         16.96%     17.71%
> > 527.cam4_r             0.70%      0.53%
> > 538.imagick_r          2.40%      2.40%
> > 544.nab_r              0.00%     -0.65%
> >
> > avg                    5.17%      5.37%
> >
> > From the above data, we can see that in general the runtime performance
> > slowdown for implementations A and D is similar for individual benchmarks.
> >
> > There are several benchmarks that have a significant slowdown with the
> > newly added initialization for both A and D, for example 511.povray_r,
> > 526.blender_r, and 531.deepsjeng_r. I will try to study a little bit more
> > what kind of new initializations introduced such slowdown.
> >
> > From the current study so far, I think that Approach D should be good
> > enough for our final implementation. So, I will try to finish Approach D
> > with the following remaining work:
> >
> >    ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >    ** complete the implementation of the uninitialized warnings
> >       maintenance work for D.
> >
> > Let me know if you have any comments and suggestions on my current and
> > future work.
> >
> > Thanks a lot for your help.
> >
> > Qing
> >
> >> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches wrote:
> >>
> >> The following are the approaches I will implement and compare:
> >>
> >> Our final goal is to keep the uninit