[Bug tree-optimization/89772] memchr for a character not in constant nul-padded string not folded
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89772 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #2 from JunMa --- Agreed. I think c_getstr() should also return the trailing nuls of an array when folding the memchr/memcmp/bcmp builtins.
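To make the case concrete, here is a minimal C sketch of the kind of call this report is about. The array `padded` and both helper names are hypothetical and not taken from the bug report. It assumes a constant array whose initializer is shorter than the array, so the tail is nul-padded. Whether GCC folds these calls at compile time depends on the compiler version; the sketch only shows the results a correct fold would have to produce.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example data: 3 characters followed by 5 trailing nuls. */
static const char padded[8] = "123";

/* Searches the whole array, padding included, for a character that does
   not occur in it. The result is always a null pointer. */
const void *find_absent(void)
{
    return memchr(padded, '4', sizeof padded);
}

/* Searching for '\0' finds the first padding byte, at offset 3. */
ptrdiff_t first_nul_offset(void)
{
    const char *p = memchr(padded, '\0', sizeof padded);
    return p - padded;
}
```

Both results are known from the initializer alone, which is why the trailing nuls matter for folding.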
[Bug ipa/89341] [7/8/9 Regression] ICE in get, at cgraph.h:1332
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89341 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #10 from JunMa --- I saw the same issue with the alias attribute. GCC should report an error when a weakref or alias attribute is attached to a definition. I'll send a patch and test cases later.
[Bug tree-optimization/89809] movzwl is not utilized when uint16_t is loaded with bit-shifts (while memcpy does)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89809 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #2 from JunMa ---

g++ pr89809.cpp -O3 -fdump-tree-store-merging:

    foo (const unsigned char * p)
    {
      unsigned char _1;
      signed short _2;
      unsigned char _3;
      int _4;
      int _5;
      signed short _6;
      signed short _7;
      uint16_t _10;

      [local count: 1073741824]:
      _1 = *p_9(D);
      _2 = (signed short) _1;
      _3 = MEM[(const unsigned char *)p_9(D) + 1B];
      _4 = (int) _3;
      _5 = _4 << 8;
      _6 = (signed short) _5;
      _7 = _2 | _6;
      _10 = (uint16_t) _7;
      return _10;
    }

Looks like gcc generates too many type conversions; this prevents the optimization.
[Bug tree-optimization/89809] movzwl is not utilized when uint16_t is loaded with bit-shifts (while memcpy does)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89809 --- Comment #3 from JunMa ---

The statement generated by the front end has an issue. In the 004t.original dump file:

    return = (uint16_t) ((signed short) *p | (signed short) ((int) *(p + 1) << 8));

However, the return statement should be:

    return = (uint16_t) (((int)(uint16_t) *p) | ((int)(uint16_t) *(p + 1) << 8));

With that form, gcc will optimize it.
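For readers without the attachment, the following C sketch shows the two ways of loading a 16-bit value that the bug title contrasts. The function names and the `sample` buffer are hypothetical; the actual source in pr89809.cpp is not reproduced in this thread, so treat this as an illustration of the pattern rather than the reporter's code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sample input: the bytes of 0x1234 in little-endian order. */
const unsigned char sample[2] = { 0x34, 0x12 };

/* Byte-wise little-endian load built from shifts and ORs. The result is
   the same on any host, regardless of its byte order. */
uint16_t load16_shift(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Load through memcpy. This matches load16_shift only on a little-endian
   host such as x86. */
uint16_t load16_memcpy(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On x86 the two functions return the same value, so ideally both would compile to a single movzwl load.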
[Bug ipa/89341] [7/8/9 Regression] ICE in get, at cgraph.h:1332
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89341 --- Comment #12 from JunMa ---

(In reply to Jan Hubicka from comment #11)
> Removing the alias check seems correct to me. The same body alias patch was
> long and needed special casing those aliases on quite few places. I am not
> at all sure why I added this one, but it definitly silences the diagnostics
> completely that is wrong.

We cannot simply remove the alias check here, since the definition and alias fields of the target node are set to true in cgraph_node::create_alias. Consider:

    static void __attribute__((weakref("bar"))) foo1(void);
    static void __attribute__((weakref("foo1"))) foo2(void);
    void bar();

If the alias check is removed, gcc gives a warning at foo2.

I have sent the patch to the mailing list, see https://gcc.gnu.org/ml/gcc-patches/2019-03/msg01249.html. Please have a look.
[Bug middle-end/89911] [9 Regression] ICE in get_attr_nonstring_decl, at calls.c:1502
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89911 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #1 from JunMa ---

diff --git a/gcc/calls.c b/gcc/calls.c
index 63c1bc5..d940ec8 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1556,6 +1556,8 @@ maybe_warn_nonstring_arg (tree fndecl, tree exp)
     return;

   unsigned nargs = call_expr_nargs (exp);
+  if (nargs == 0)
+    return;

   /* The bound argument to a bounded string function like strncpy.  */
   tree bound = NULL_TREE;

This patch fixes it.
[Bug middle-end/89934] [9 Regression] ICE on a call with fewer arguments to strncpy declared without prototype
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89934 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #5 from JunMa --- This is similar to the issue in PR89911.
[Bug middle-end/89977] missing -Wstringop-overflow with an out-of-bounds int128_t range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89977 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #1 from JunMa --- In function f, the conversion in the statement _1 = (long unsigned int) n_3 is an extension, while in function g the conversion in the same statement is a truncation. For integer truncation, gcc computes the range of the target only if the size of the source range is less than what the precision of the target type can represent. I think this can be relaxed when the target type of the truncation is unsigned.
[Bug middle-end/89977] missing -Wstringop-overflow with an out-of-bounds int128_t range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89977 --- Comment #2 from JunMa --- After a bit more thought, the behavior of gcc trunk is right. When n_3 is truncated from int128 to long unsigned int, the range of the result equals the full range of long unsigned int. For example: if n_3 = 0x1, then _1 is 0, which is less than 7. So this is not a bug.
[Bug middle-end/89977] missing -Wstringop-overflow with an out-of-bounds int128_t range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89977 --- Comment #3 from JunMa ---

(In reply to JunMa from comment #2)
> After a bit more thinking, the behavior of gcc trunk is right. the range of
> n_3 in truncation from int128 to long unsigned int equal to the range of
> long unsigned int. for example: if n_3 = 0x1, then _1 is 0 which is
> less than 7.
>
> so this is not a bug.

Sorry, when n_3 = 0x1000 , _1 is 0.
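The truncation argument can be checked with a small C sketch. It relies on the `__int128` extension available in GCC and Clang on 64-bit targets, and it uses `uint64_t` as a stand-in for `long unsigned int` on an LP64 system. The function names and the value 2^64 are chosen here for illustration only; they are not taken from the bug report.

```c
#include <stdint.h>

/* Narrowing conversion from a 128-bit signed value to a 64-bit unsigned
   value. Only the low 64 bits survive (reduction modulo 2^64). */
uint64_t truncate_to_u64(__int128 n)
{
    return (uint64_t)n;
}

/* An illustrative value that lies far outside the 64-bit range: 2^64. */
__int128 two_to_the_64(void)
{
    return (__int128)1 << 64;
}
```

A source value as large as 2^64 truncates to 0, so a large lower bound on the 128-bit value says nothing about the truncated result, which can be anywhere in the unsigned 64-bit range.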
[Bug middle-end/89977] missing -Wstringop-overflow with an out-of-bounds int128_t range
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89977 --- Comment #5 from JunMa ---

(In reply to Martin Sebor from comment #4)
> You're right that the conversion from int128_t to unsigned long can result
> in truncation, so the range of the result is that of unsigned long. Yet I
> suspect that relying on it is more likely unintentional and a bug. The
> question in my mind is whether narrowing int128_t conversions should be
> diagnosed just in these contexts (i.e., -Wstringop-overflow) or in others as
> well.

On the gcc side we have no way of knowing whether these truncations are intentional or not; maybe we need a new option such as Wstringop-truncation to do this.
[Bug middle-end/89922] Loop on fixed size array is not unrolled and poorly optimized at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89922 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #5 from JunMa --- The testcase at https://godbolt.org/z/iKi0pb is well optimized by gcc 6.5 at -O3, but not by gcc 7 and later. I have checked the gimple code dumped by the optimized pass, and it is the same in both. The difference is introduced by the rtl_cse1 pass.
[Bug tree-optimization/90106] builtin sqrt() ignoring libm's sqrt call result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #7 from JunMa --- Yes, the transformation in CDCE prevents the tail call optimization. Let's check the return statement in the CDCE pass.
[Bug c/89774] Add flag to force single precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89774 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #10 from JunMa ---

(In reply to Segher Boessenkool from comment #9)
> We currently only do it for trivial cases, as the example in comment 6 shows
> as well. This is done during expand, which is the wrong place for it.
>
> PR90070 is asking for better optimisation of this: do the operation in single
> precision, and use single-precision constants, if this does not change the
> result (or there is some -ffast-math option).
>
> PR22326 is also closely related. I don't think we can close any of these PRs
> as a dup of another, they are all asking for slightly different things :-)

clang can do this optimization in its instcombine pass. See this case:

    float f4( float x ) {double t = x + 2.0; return t; }
    float f5( float x ) {return x + 2.0; }

Compiled with -O2 -march=native, GCC gives:

    f4:
        vcvtss2sd %xmm0, %xmm0, %xmm0
        vaddsd    .LC1(%rip), %xmm0, %xmm0
        vcvtsd2ss %xmm0, %xmm0, %xmm0
        ret
    f5:
        vaddss    .LC3(%rip), %xmm0, %xmm0
        ret

while clang emits a vaddss instruction in both cases.
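The narrowing is safe for this case because both forms round to the same float. Below is a minimal C sketch of that equivalence, assuming IEEE 754 binary32 and binary64 arithmetic; the function names are hypothetical. For a single addition, double carries enough extra precision that rounding the double sum to float gives the same result as a direct float addition, provided the constant is exactly representable as a float, as 2.0 is.

```c
/* Adds 2.0 in double precision, then rounds the result to float.
   This mirrors f4 from the comment above. */
float add2_via_double(float x)
{
    double t = x + 2.0;
    return (float)t;
}

/* Adds 2.0 directly in single precision. */
float add2_single(float x)
{
    return x + 2.0f;
}

/* Returns 1 when both forms agree for the given input. */
int same_result(float x)
{
    return add2_via_double(x) == add2_single(x);
}
```

This only covers a single operation with a float-representable constant; longer expression chains or constants such as 0.1 need separate reasoning.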
[Bug tree-optimization/90106] builtin sqrt() ignoring libm's sqrt call result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106 --- Comment #8 from JunMa ---

(In reply to Alexander Monakov from comment #6)
> Reopening and confirming, GCC's code looks less efficient than possible for
> no good reason.
>
> CDCE does
>
> y = sqrt (x);
> ==>
> y = IFN_SQRT (x);
> if (__builtin_isless (x, 0))
> sqrt (x);
>
> but it could do
>
> y = IFN_SQRT (x);
> if (__builtin_isless (x, 0))
> y = sqrt (x);
>
> (note two assignments to y)

What is the difference between this and LLVM's approach?

> or to mimic LLVM's approach:
>
> if (__builtin_isless (x, 0))
> y = sqrt (x);
> else
> y = IFN_SQRT (x);

I have finished a patch that does the same as LLVM in the cdce pass, and tested it with the case below:

    #include <math.h>
    int main ()
    {
      float x = 1.0;
      float y;
      for (int i=0; i<1; i++)
        {
          y += sqrtf (x+i);
        }
      return y;
    }

Here is what I got for x86-64 at -O2.

Original asm of the IFN_SQRT part:

    .L4:
        pxor      %xmm0, %xmm0
        cvtsi2ssl %ebx, %xmm0
        addss     %xmm3, %xmm0
        ucomiss   %xmm0, %xmm4
        movaps    %xmm0, %xmm2
        sqrtss    %xmm2, %xmm2
        ja        .L7

and perf stat:

    1,423,652,277  cycles                   #   2.180 GHz                      (83.31%)
    1,121,862,980  stalled-cycles-frontend  #  78.80% frontend cycles idle     (83.31%)
      634,957,413  stalled-cycles-backend   #  44.60% backend cycles idle      (66.62%)
    1,102,109,423  instructions             #   0.77  insn per cycle
                                            #   1.02  stalled cycles per insn  (83.31%)
      200,400,940  branches                 # 306.873 M/sec                    (83.44%)
            7,734  branch-misses            #   0.00% of all branches          (83.44%)

Transformed asm:

    .L4:
        pxor      %xmm0, %xmm0
        cvtsi2ssl %ebx, %xmm0
        addss     %xmm3, %xmm0
        ucomiss   %xmm0, %xmm2
        ja        .L8
        sqrtss    %xmm0, %xmm0

and perf stat:

    1,418,560,722  cycles                   #   2.180 GHz                      (83.25%)
    1,116,732,674  stalled-cycles-frontend  #  78.72% frontend cycles idle     (83.25%)
      674,837,417  stalled-cycles-backend   #  47.57% backend cycles idle      (66.81%)
    1,003,067,037  instructions             #   0.71  insn per cycle
                                            #   1.11  stalled cycles per insn  (83.41%)
      200,619,151  branches                 # 308.272 M/sec                    (83.40%)
            5,637  branch-misses            #   0.00% of all branches          (83.28%)

The transformed case has fewer instructions and gets slightly better performance, which looks good to me. However, one thing I noticed is that the original case has fewer 'stalled-cycles-backend', since its code has better ILP. I'm not sure which approach is better.

Environment:
    gcc version: gcc trunk@270488
    OS: centos7.2
    HW: Intel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz
[Bug tree-optimization/90106] builtin sqrt() ignoring libm's sqrt call result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106 --- Comment #9 from JunMa ---

(In reply to JunMa from comment #7)
> yes, the transformation in CDEC prevent the tail call optimization. let's
> check the return stmt in CDEC pass.

Sorry for the confusing comment. As discussed above, the cdce pass looks for calls to built-in functions that set errno and whose result is used. It tries to transform these calls into conditionally executed calls guarded by a simple range check on the arguments; the check detects most cases in which errno does not need to be set. The transform looks like:

    y = sqrt (x);
    ==>
    y = IFN_SQRT (x);
    if (__builtin_isless (x, 0))
      sqrt (x);

However, when the call is in tail position, this transformation breaks tail call optimization, since the conditional call does not have a return value. This is what this PR tries to explain and fix.

Alexander gives two suggestions.

First:

    y = IFN_SQRT (x);
    if (__builtin_isless (x, 0))
      y = sqrt (x);

Second (LLVM's approach):

    if (__builtin_isless (x, 0))
      y = sqrt (x);
    else
      y = IFN_SQRT (x);

So what I wanted to do here was to look for tail calls and transform them as in the first suggestion. I did some hacks locally, but then I found that gcc generated even worse code in the 'y = IFN_SQRT' part:

    f:
        pxor    %xmm1, %xmm1
        movaps  %xmm0, %xmm2
        ucomiss %xmm0, %xmm1
        sqrtss  %xmm2, %xmm2
        ja      .L4
        movaps  %xmm2, %xmm0
        ret
    .L4:
        jmp     sqrtf

Then I used LLVM's approach regardless of whether the call is in tail position, and it gives:

    f:
        pxor    %xmm1, %xmm1
        ucomiss %xmm0, %xmm1
        ja      .L4
        sqrtss  %xmm0, %xmm0
        ret
    .L4:
        jmp     sqrtf

Also, in comment 6, I did some tests of LLVM's approach.

Sorry again for the confusing comment.
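The control-flow difference between the two suggestions can be sketched in plain C. IFN_SQRT is a GCC internal function and cannot be called from user code, so this sketch passes hypothetical stand-ins as function pointers: `inline_sqrt` plays the role of the hardware square root and `libm_sqrt` the errno-setting library call. The counting helpers exist only to make the number of calls observable; none of these names come from the GCC sources.

```c
#include <math.h>

/* Hypothetical stand-in type for both square-root variants. */
typedef float (*sqrt_fn)(float);

/* Call counters used by the fake implementations below. */
int inline_calls = 0;
int libm_calls = 0;

/* Fake "inline" square root: records the call and returns its input. */
float counting_inline(float x) { inline_calls++; return x; }

/* Fake "libm" square root: records the call and returns its input. */
float counting_libm(float x) { libm_calls++; return x; }

/* Shape of the first suggestion: always compute inline, then overwrite
   the result with the library call when the argument is negative. */
float sqrt_two_assignments(float x, sqrt_fn inline_sqrt, sqrt_fn libm_sqrt)
{
    float y = inline_sqrt(x);
    if (isless(x, 0.0f))
        y = libm_sqrt(x);
    return y;
}

/* Shape of the second (LLVM-like) suggestion: exactly one of the two
   variants runs, and the library call sits in tail position. */
float sqrt_if_else(float x, sqrt_fn inline_sqrt, sqrt_fn libm_sqrt)
{
    if (isless(x, 0.0f))
        return libm_sqrt(x);
    return inline_sqrt(x);
}
```

In the if/else shape the inline computation is skipped on the error path, which is consistent with the shorter assembly shown above.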
[Bug tree-optimization/90387] [9 Regression] __builtin_constant_p and -Warray-bounds warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90387 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #2 from JunMa --- VRP tries to fold builtin_constant_p when its argument is a function parameter. In this case builtin_constant_p should be removed as a dead statement regardless of "#if 1" or "#if 0", since p_len is a function parameter. With "#if 1", the vrp pass inserts an ASSERT_EXPR to infer the value range of p_len. This changes the argument of builtin_constant_p from a function parameter to the result of the ASSERT_EXPR, which breaks the rule.
[Bug tree-optimization/90387] [9 Regression] __builtin_constant_p and -Warray-bounds warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90387 --- Comment #4 from JunMa --- LGTM
[Bug tree-optimization/90437] Overflow detection too late for VRP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90437 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #2 from JunMa ---

(In reply to Richard Biener from comment #1)
> VRP obviously only sees a + b in [0, 20] and [0, 20] < [0, 10] as unknown.

We do have the pattern x+y < y in match.pd, but it only works with TYPE_OVERFLOW_UNDEFINED. I'm not sure whether we can use range info in match.pd.
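As an illustration of the pattern being discussed, here is a minimal C sketch of the `x + y < y` idiom on unsigned operands, where it acts as a wraparound check. The function names are hypothetical and the PR's own testcase is not shown in this thread. The second function checks exhaustively that the test can never be true when both operands lie in [0, 10], which is the fact a range-aware fold would need to use.

```c
/* Unsigned wraparound check: true exactly when a + b overflows. */
int sum_wraps(unsigned a, unsigned b)
{
    return a + b < b;
}

/* Counts how often the check fires when both operands are in [0, 10].
   Their sum is at most 20, so it never wraps. */
int count_wraps_in_small_range(void)
{
    int count = 0;
    for (unsigned a = 0; a <= 10; a++)
        for (unsigned b = 0; b <= 10; b++)
            count += sum_wraps(a, b);
    return count;
}
```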
[Bug middle-end/90514] New: Issue about enum type in gcc tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90514

            Bug ID: 90514
           Summary: Issue about enum type in gcc tree
           Product: gcc
           Version: tree-ssa
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: JunMa at linux dot alibaba.com
  Target Milestone: ---

For case pr23046.c:

    enum eumtype { ENUM1, ENUM2 };
    void g(const enum eumtype kind );
    void f(long i);
    void g(const enum eumtype kind)
    {
      if ((kind != ENUM1) && (kind != ENUM2))
        f(kind);
    }

command: gcc -O2 test.c

When I dumped kind, I found:

    unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x70a93738 precision:32 min max values value chain value >> context > visited var def_stmt GIMPLE_NOP version:3>

It looks weird to me, since I think that the min/max/precision of TREE_TYPE(kind) are swapped with those of TREE_TYPE(TREE_TYPE(kind)). This causes the vrp pass to get wrong range info. Also, the code that checks the enum type is out of date.
[Bug middle-end/90514] Issue about enum type in gcc tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90514 --- Comment #2 from JunMa ---

(In reply to Andrew Pinski from comment #1)
> Are you saying the precision should be 1? If so then no, that would be
> invalid as in C, enum have the full range of the underlying type and is well
> defined to have values of 3 or higher in the enum variable.

Thanks for the explanation. I got confused by the comments in the vrp pass. The condition

    if ((kind != ENUM1) && (kind != ENUM2))

is not always false, and cannot be folded to if (0). Also, the code that deals with pr23046 is out of date and should be removed.
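The point about C enums can be demonstrated with a short sketch. The enum is the one from the report; the helper `is_neither` is a hypothetical name introduced here. In C, an enum object may hold any value of its underlying integer type, so a value such as 3 is valid even though no enumerator has that value.

```c
enum eumtype { ENUM1, ENUM2 };

/* Returns 1 when kind matches neither enumerator. In C this is reachable,
   because kind may hold values other than ENUM1 and ENUM2. */
int is_neither(enum eumtype kind)
{
    return (kind != ENUM1) && (kind != ENUM2);
}
```

Calling `is_neither((enum eumtype)3)` returns 1, which is why the condition cannot be folded to a constant false for C input.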
[Bug testsuite/90517] [10 regression] test case gcc.dg/cdce1.c fails (unresolved) starting with r271281
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90517 JunMa changed: What|Removed |Added CC||JunMa at linux dot alibaba.com --- Comment #2 from JunMa ---

(In reply to Jakub Jelinek from comment #1)
> See http://gcc.gnu.org/ml/gcc-patches/2019-05/msg01024.html
> Waiting for review on that.

LGTM, and thanks for fixing the testcases.
[Bug middle-end/90514] Issue about enum type in gcc tree
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90514 --- Comment #4 from JunMa ---

(In reply to Andrew Pinski from comment #3)
> (In reply to JunMa from comment #2)
> > I had got confused by the comments in vrp pass. the condition
> > if ((kind != ENUM1) && (kind != ENUM2))
> > is not always false, and cannot be folded to if (0).
> > Also the code deals with pr23046 is out of data, and should be removed.
>
> That was C++ code rather than C code and C++ which has different rules than
> C (at least with -fstrict-enums). The code is not out of date for gimple
> types. Since the types in Gimple can have more well defined behaviors which
> are not exposed via C or C++ front-ends. NOTE there could be an Ada code
> which hits the same failure (I don't know Ada that well but I do know the
> Ada front-end supports more features of the GCC middle-end than either of
> the C or C++ front-ends).

Since gimple folding has changed so much, we can easily support folding it in match.pd rather than checking for it here.
[Bug tree-optimization/90106] builtin sqrt() ignoring libm's sqrt call result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106 --- Comment #13 from JunMa ---

(In reply to Christophe Lyon from comment #12)
> This new test fails on arm:
> FAIL: gcc.dg/cdce3.c scan-tree-dump cdce "cdce3.c:9: [^\n\r]* function call
> is shrink-wrapped into error conditions."

I don't have an arm environment. Would you please attach the dump file of the cdce pass?
[Bug tree-optimization/90106] builtin sqrt() ignoring libm's sqrt call result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106 --- Comment #15 from JunMa ---

(In reply to Christophe Lyon from comment #14)
> Sure, here is the contents of cdce3.c.105t.cdce:
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=4197, cgraph_uid=1,
> symbol_order=0)
>
> foo (float x)
> {
> float _4;
>
> [local count: 1073741824]:
> _4 = sqrtf (x_2(D));
> return _4;
>
> }

The contents are the same as cdce3.c.104t.stdarg. Would you please dump it with -fdump-tree-cdce-details? Then we can see (on x86_64):

    Found conditional dead call: _4 = sqrtf (x_2(D));
    cdce3.c:9: note: function call is shrink-wrapped into error conditions.

It seems that gcc thinks arm doesn't support a direct internal function (maybe the vsqrt instruction?) for sqrtf.
[Bug tree-optimization/90106] builtin sqrt() ignoring libm's sqrt call result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106 --- Comment #17 from JunMa ---

(In reply to Christophe Lyon from comment #16)
> That's what I did... (use -fdump-tree-cdce-details).
>
> The assembler code is:
> .arm
> .fpu softvfp
> .type foo, %function
> foo:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> b sqrtf
>
> which is a tail call ('b' is a jump instruction on arm, not a call).

Hmm... I think the testcase works with -mfpu=vfp -mfloat-abi=hard.
[Bug tree-optimization/90106] builtin sqrt() ignoring libm's sqrt call result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106 --- Comment #19 from JunMa ---

We can skip the target by adding

    /* { dg-skip-if "need hardfp abi" { *-*-* } { "-mfloat-abi=soft" } { "" } } */

to the testcase.