[Bug lto/94157] [10 Regression] error: lto-wrapper failed with -Wa,--noexecstack -Wa,--noexecstack since r10-6807-gf1a681a174cdfb82
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94157 --- Comment #4 from prathamesh3492 at gcc dot gnu.org --- (In reply to Martin Liška from comment #3) > I've got a patch candidate, will send it to GCC patches mailing list. Sorry for the breakage, and thanks for taking a look! Regards, Prathamesh
[Bug target/86753] [9/10 Regression] gcc.target/aarch64/sve/vcond_[45].c fail after recent combine patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86753 --- Comment #10 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Fri Oct 18 05:13:26 2019 New Revision: 277141 URL: https://gcc.gnu.org/viewcvs?rev=277141&root=gcc&view=rev Log: 2019-10-18 Prathamesh Kulkarni Richard Sandiford PR target/86753 * tree-vectorizer.h (scalar_cond_masked_key): New struct, and define hashmap traits for it. (loop_vec_info::scalar_cond_masked_set): New member. (vect_record_loop_mask): Adjust prototype. * tree-vectorizer.c (scalar_cond_masked_key::get_cond_ops_from_tree): Implement method. * tree-vect-loop.c (vectorizable_reduction): Pass NULL as last arg to vect_record_loop_mask. (vectorizable_live_operation): Likewise. (vect_record_loop_mask): New param scalar_mask. Add entry cond, loop_mask to scalar_cond_masked_set if scalar_mask is non NULL. * tree-vect-stmts.c (check_load_store_masking): New param scalar_mask. Pass it as last arg to vect_record_loop_mask. (vectorizable_call): Pass scalar_mask as last arg to vect_record_loop_mask. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Check if another part of vectorized code applies loop_mask to condition or to it's inverse, and if yes, apply loop_mask to result of vector comparison. testsuite/ * gcc.target/aarch64/sve/cond_cnot_2.c: Remove XFAIL from { scan-assembler-not {\tsel\t}. * gcc.target/aarch64/sve/cond_convert_1.c: Adjust to make only one load conditional. * gcc.target/aarch64/sve/cond_convert_4.c: Likewise. * gcc.target/aarch64/sve/cond_unary_2.c: Likewise. * gcc.target/aarch64/sve/vcond_4.c: Remove XFAIL's. * gcc.target/aarch64/sve/vcond_5.c: Likewise. 
Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_2.c trunk/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c trunk/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c trunk/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c trunk/gcc/testsuite/gcc.target/aarch64/sve/vcond_4.c trunk/gcc/testsuite/gcc.target/aarch64/sve/vcond_5.c trunk/gcc/tree-vect-loop.c trunk/gcc/tree-vect-stmts.c trunk/gcc/tree-vectorizer.c trunk/gcc/tree-vectorizer.h
[Bug tree-optimization/92155] strlen(a) not folded after memset(a, 0, sizeof a)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92155 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org ---
Hi Martin,
Just wondering if it's necessary for the 3rd arg to be sizeof?
IIUC memset (a, 0, n) for valid n should result in strlen (a) equal to 0?
Btw, it seems the comparison is folded to 0 in the following case:

    extern char a4[4];

    void g ()
    {
      __builtin_memset (a4, 0, sizeof a4);
      if (__builtin_strlen (a4) != 0)
        __builtin_abort ();
    }

The .optimized dump shows only the call to memset.

Thanks,
Prathamesh
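A minimal sketch of the property the question above relies on, written as ordinary C rather than as a GCC test-case. The helper name len_after_zeroing is made up for illustration, and the sketch assumes n is at least 1 and no larger than the buffer. It says nothing about which cases the optimizer actually folds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the bug report: zero the first n bytes
   of buf and return the resulting string length.  For any n >= 1 the
   first byte becomes '\0', so the result is 0 whatever n is.  */
size_t
len_after_zeroing (char *buf, size_t n)
{
  memset (buf, 0, n);
  return strlen (buf);
}
```

The run-time result does not depend on whether the third argument is sizeof the array or a smaller nonzero count; only the first byte matters for strlen.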
[Bug tree-optimization/91532] [SVE] Redundant predicated store in gcc.target/aarch64/fmla_2.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91532 --- Comment #4 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Mon Oct 21 07:31:45 2019 New Revision: 277237 URL: https://gcc.gnu.org/viewcvs?rev=277237&root=gcc&view=rev Log: 2019-10-21 Prathamesh Kulkarni PR tree-optimization/91532 * gcc.target/aarch64/sve/fmla_2.c: Add dg-scan check for two st1d insns. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
[Bug tree-optimization/92163] [10 Regression] ICE: Segmentation fault (in bitmap_set_bit)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92163 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Created attachment 47079 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47079&action=edit Untested fix Does this patch look OK ? Thanks, Prathamesh
[Bug tree-optimization/92163] [10 Regression] ICE: Segmentation fault (in bitmap_set_bit)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92163 --- Comment #6 from prathamesh3492 at gcc dot gnu.org --- Posted updated patch upstream: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01702.html Thanks, Prathamesh
[Bug middle-end/91272] [SVE] Use fully-masked loops for CLASTB reductions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91272 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Mon Oct 28 14:50:58 2019 New Revision: 277524 URL: https://gcc.gnu.org/viewcvs?rev=277524&root=gcc&view=rev Log: 2019-10-28 Prathamesh Kulkarni PR middle-end/91272 * tree-vect-stmts.c (vectorizable_condition): Support EXTRACT_LAST_REDUCTION with fully-masked loops. testsuite/ * gcc.target/aarch64/sve/clastb_1.c: Add dg-scan. * gcc.target/aarch64/sve/clastb_2.c: Likewise. * gcc.target/aarch64/sve/clastb_3.c: Likewise. * gcc.target/aarch64/sve/clastb_4.c: Likewise. * gcc.target/aarch64/sve/clastb_5.c: Likewise. * gcc.target/aarch64/sve/clastb_6.c: Likewise. * gcc.target/aarch64/sve/clastb_7.c: Likewise. * gcc.target/aarch64/sve/clastb_8.c: Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_1.c trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_2.c trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_3.c trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_4.c trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_5.c trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_6.c trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_7.c trunk/gcc/testsuite/gcc.target/aarch64/sve/clastb_8.c trunk/gcc/tree-vect-stmts.c
[Bug tree-optimization/92163] [10 Regression] ICE: Segmentation fault (in bitmap_set_bit)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92163 --- Comment #7 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Mon Oct 28 15:01:24 2019 New Revision: 277525 URL: https://gcc.gnu.org/viewcvs?rev=277525&root=gcc&view=rev Log: 2019-10-28 Prathamesh Kulkarni PR tree-optimization/92163 * tree-ssa-dse.c (delete_dead_or_redundant_assignment): New param need_eh_cleanup with default value NULL. Gate on need_eh_cleanup before calling bitmap_set_bit. (dse_optimize_redundant_stores): Pass global need_eh_cleanup to delete_dead_or_redundant_assignment. (dse_dom_walker::dse_optimize_stmt): Likewise. * tree-ssa-dse.h (delete_dead_or_redundant_assignment): Adjust prototype. testsuite/ * gcc.dg/tree-ssa/pr92163.c: New test. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-dse.c trunk/gcc/tree-ssa-dse.h
[Bug rtl-optimization/92342] [10 Regression] a small missed transformation into x?b:0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Hi, I reverted Segher's commit in my local tree, but am still seeing the same code-gen for g(). Thanks, Prathamesh
[Bug rtl-optimization/92342] [10 Regression] a small missed transformation into x?b:0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342 --- Comment #2 from prathamesh3492 at gcc dot gnu.org ---
(In reply to prathamesh3492 from comment #1)
> Hi,
> I reverted Segher's commit in my local tree, but am still seeing the same
> code-gen for g().

Oops, I was modifying the wrong branch :-/
I can confirm that reverting the commit fixes this issue. Sorry for the noise.

Regards,
Prathamesh

>
> Thanks,
> Prathamesh
[Bug tree-optimization/92328] [10 Regression] ICE in eliminate_stmt, at tree-ssa-sccvn.c:5497
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92328 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #2 from prathamesh3492 at gcc dot gnu.org ---
Reverting the following hunk in vn_reference_lookup_3 from r276882 seems to resolve the ICE:

    #if 0
      if (known_eq (ref->size, size2))
        return vn_reference_lookup_or_insert_for_pieces
                 (vuse, get_alias_set (lhs), vr->type, vr->operands,
                  SSA_VAL (def_rhs));
    #endif
      if (! INTEGRAL_TYPE_P (TREE_TYPE (def_rhs))
          || type_has_mode_precision_p (TREE_TYPE (def_rhs)))
        {
          gimple_match_op op (gimple_match_cond::UNCOND,

Although I am not sure if that's the issue. In eliminate_stmt, lhs is unsigned and sprime is int, so the types are not trivially convertible; lhs is not a pointer to a function or method type either, and thus it goes into the else branch and hits gcc_unreachable ():

    if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (sprime)))
      {
        /* We preserve conversions to but not from function or method
           types.  This asymmetry makes it necessary to re-instantiate
           conversions here.  */
        if (POINTER_TYPE_P (TREE_TYPE (lhs))
            && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (lhs))))
          sprime = fold_convert (TREE_TYPE (lhs), sprime);
        else
          gcc_unreachable ();

Thanks,
Prathamesh
[Bug tree-optimization/92608] [9/10 Regression] ICE: Segmentation fault (in find_loop_guard)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92608 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Posted patch upstream: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02061.html Thanks, Prathamesh
[Bug tree-optimization/92608] [9/10 Regression] ICE: Segmentation fault (in find_loop_guard)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92608 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Thu Nov 21 20:20:36 2019 New Revision: 278598 URL: https://gcc.gnu.org/viewcvs?rev=278598&root=gcc&view=rev Log: Use safe_dyn_cast instead of dyn_cast in find_loop_guard to fix PR92608. 2019-11-22 Prathamesh Kulkarni PR tree-optimization/92608 * tree-ssa-loop-unswitch.c (find_loop_guard): Use safe_dyn_cast instead of dyn_cast. testsuite/ * gcc.dg/torture/pr92608.c: New test. Added: trunk/gcc/testsuite/gcc.dg/torture/pr92608.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-unswitch.c
[Bug tree-optimization/92649] dead store elimination
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- This is likely dup of PR89332. Thanks, Prathamesh
[Bug tree-optimization/92704] [8/9/10 Regression] ICE: Segmentation fault (in process_bb)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92704 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org ---
This seems to happen because we end up with the following phi defining .MEM_113 in the ifcvt dump:

    [local count: 3667364]:
    # q7_91 = PHI
    # zr_lsm.55_92 = PHI
    # .MEM_113 = PHI <(5), .MEM_106(50)>

The .MEM_113 phi seems to have a NULL (!) arg, which then causes the segfault in the following hunk in tree-ssa-sccvn.c:process_bb ():

    gphi *phi = gsi.phi ();
    use_operand_p use_p = PHI_ARG_DEF_PTR_FROM_EDGE (phi, e);
    tree arg = USE_FROM_PTR (use_p);
    if (TREE_CODE (arg) != SSA_NAME
        || virtual_operand_p (arg))
      continue;

Passing -fno-tree-loop-if-convert in addition to the other options doesn't cause the segfault.
I assume phi args cannot be NULL?

Thanks,
Prathamesh
[Bug tree-optimization/89007] [SVE] Implement generic vector average expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89007 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Mon Dec 9 09:59:42 2019 New Revision: 279112 URL: https://gcc.gnu.org/viewcvs?rev=279112&root=gcc&view=rev Log: 2019-12-09 Prathamesh Kulkarni PR tree-optimization/89007 * tree-vect-patterns.c (vect_recog_average_pattern): If there is no target support available, generate code to distribute rshift over plus and add a carry. testsuite/ * gcc.target/aarch64/sve/pr89007-1.c: New test. * gcc.target/aarch64/sve/pr89007-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c trunk/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-patterns.c
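The ChangeLog entry above only says that the fallback distributes the right shift over the addition and adds a carry. A scalar sketch of that identity follows, assuming unsigned 8-bit elements. The function names are made up for illustration, and the exact statements that vect_recog_average_pattern emits are not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Floor average, (a + b) >> 1, without a wider intermediate type:
   shift each operand first, then add back the carry that is lost
   when both low bits are set.  */
uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) ((a >> 1) + (b >> 1) + (a & b & 1));
}

/* Rounding (ceil) average, (a + b + 1) >> 1: here the carry is
   needed when either low bit is set.  */
uint8_t
avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) ((a >> 1) + (b >> 1) + ((a | b) & 1));
}
```

Because each operand is shifted before the addition, the sum always fits in the element type, which is why no widening operation is required.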
[Bug tree-optimization/92867] Use ERF_RETURNS_ARG in more places
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92867 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- If it's OK, I will try to implement this. Thanks, Prathamesh
[Bug tree-optimization/93054] ICE in gimple_set_lhs, at gimple.c:1820
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93054 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- I wonder if we should emit an error in the front-end if a noreturn function has non-void return type ? For above test-case, the function cb() is marked with noreturn attribute but has return-type int. Thanks, Prathamesh
[Bug tree-optimization/93397] [10 Regression] ICE in vect_create_epilog_for_reduction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93397 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org ---
Seems to ICE at all optimization levels, because slp_node is NULL in the following hunk in vect_create_epilog_for_reduction, since we're calling it from the loop vectorizer:

    /* In SLP reduction chain we reduce vector results into one vector if
       necessary, hence we set here REDUC_GROUP_SIZE to 1.  SCALAR_DEST is the
       LHS of the last stmt in the reduction chain, since we are looking for
       the loop exit phi node.  */
    if (REDUC_GROUP_FIRST_ELEMENT (stmt_info))
      {
        stmt_vec_info t = SLP_TREE_SCALAR_STMTS (slp_node)[group_size - 1];

Gating the condition on slp_node seems to work, but I'm not sure if that's the right fix?

Thanks,
Prathamesh
[Bug ipa/88788] [9 Regression] Infinite loop in malloc_candidate_p_1 since r264838
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88788 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Sorry for the breakage, I will take a look. Regards, Prathamesh
[Bug ipa/88788] [9 Regression] Infinite loop in malloc_candidate_p_1 since r264838
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88788 --- Comment #5 from prathamesh3492 at gcc dot gnu.org --- (In reply to Martin Liška from comment #4) > Created attachment 45403 [details] > reduced test-case Thanks!
[Bug ipa/88788] [9 Regression] Infinite loop in malloc_candidate_p_1 since r264838
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88788 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |prathamesh3492 at gcc dot gnu.org --- Comment #7 from prathamesh3492 at gcc dot gnu.org ---
Created attachment 45412 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45412&action=edit
Untested fix

Hi,
The issue seems to be recursively calling malloc_candidate_p_1 with duplicate arguments. For example, with the above test-case, it shows the following trace: https://pastebin.com/tF5Qg06X
We can see it is calling malloc_candidate_p_1 with resultobj_164=PHI<...> thrice, because resultobj_164 appears 3 times as a phi-arg in:

    resultobj_165 = PHI <_12(12), resultobj_164(13), resultobj_164(14), resultobj_164(15)>

I think it's more of a compile-time hog than infinite recursion. To avoid that, I simply skipped walking over duplicate args in the phi in the attached patch:

    +  bool skip_dup_arg = false;
    +  for (unsigned j = i; j > 0; j--)
    +    if (operand_equal_p (gimple_phi_arg_def (phi, j - 1), arg, 0))
    +      {
    +        skip_dup_arg = true;
    +        break;
    +      }
    +  if (skip_dup_arg)
    +    continue;
    +

which appears to compile both the tests again. I assume a phi stmt usually won't have more than 4 or 5 args, so the loop shouldn't be too slow in practice? I will be grateful for any other suggestions.
For the larger test-case it shows 164.08 wall seconds for compilation.

Thanks,
Prathamesh
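The duplicate-skipping loop described in this comment can be tried outside the compiler. In the sketch below plain ints stand in for SSA names and == stands in for operand_equal_p; count_unique_args is a made-up name. It only illustrates the backward scan and is not the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count how many entries of args[0..n) would actually be walked when
   an entry equal to any earlier entry is skipped.  */
size_t
count_unique_args (const int *args, size_t n)
{
  size_t walked = 0;
  for (size_t i = 0; i < n; i++)
    {
      bool skip_dup_arg = false;
      /* Scan backwards over the arguments already seen.  */
      for (size_t j = i; j > 0; j--)
        if (args[j - 1] == args[i])
          {
            skip_dup_arg = true;
            break;
          }
      if (skip_dup_arg)
        continue;
      walked++;
    }
  return walked;
}
```

For a phi shaped like the one quoted above (one distinct argument followed by the same argument three times), only two arguments are walked. The scan is quadratic in the number of phi arguments, which is the cost the comment asks about.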
[Bug ipa/88788] [9 Regression] Infinite loop in malloc_candidate_p_1 since r264838
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88788 --- Comment #10 from prathamesh3492 at gcc dot gnu.org --- Oops, I didn't realize there could be loop within phi (phi result being an arg too). I will try to come up with a better approach for handling nested PHI's. In the meantime, for stage 4, should I revert the recursive calling hunk ? Thanks, Prathamesh
[Bug ipa/88788] [9 Regression] Infinite loop in malloc_candidate_p_1 since r264838
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88788 --- Comment #12 from prathamesh3492 at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #11)
> Look e.g. at -O2:
> void bar (int);
>
> void
> foo (int x)
> {
>   int i = 0;
>   if (x == 8)
>     {
>       x = 16;
>       goto lab;
>     }
>   for (; i < 100; i++)
>     {
>     lab:
>       bar (x);
>     }
> }
>
> but pretty much any time you have a loop where some var doesn't really
> change, but there is some other edge to the loop header with a different
> value for that var.

Ah indeed. Thanks for the explanation!
[Bug ipa/88788] [9 Regression] Infinite loop in malloc_candidate_p_1 since r264838
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88788 --- Comment #14 from prathamesh3492 at gcc dot gnu.org ---
Created attachment 45425 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45425&action=edit
Patch

Hi,
In the attached patch, I cache the results of malloc_candidate_p_1 and avoid traversing "back edges". Does it look OK?
One issue was with the creation of the hash_table:

    hash_table *mc_cache = new hash_table (100);

Using num_ssa_names instead of 100 resulted in an allocation failure (and ICE) for spinning-smaller.ii. Is using a smaller number like 100 OK correctness-wise?
I think Richard's patch in comment 13 is a better approach, since returning false should indeed propagate quickly. Testing that patch.

Thanks,
Prathamesh
[Bug ipa/88788] [9 Regression] Infinite loop in malloc_candidate_p_1 since r264838
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88788 --- Comment #16 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Tue Jan 15 09:37:22 2019 New Revision: 267933 URL: https://gcc.gnu.org/viewcvs?rev=267933&root=gcc&view=rev Log: 2019-01-15 Richard Biener Prathamesh Kulkarni PR ipa/88788 * ipa-pure-const.c (malloc_candidate_p_1): Add parameter visited and return true if SSA_NAME is already marked in visited bitmap. (malloc_candidate_p): Pass visited to malloc_candidate_p_1. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-pure-const.c
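A toy sketch of the visited-set idea recorded in the ChangeLog above: a recursive check that returns true when it meets a name it has already marked, so that a phi whose result is also one of its own (possibly indirect) arguments cannot cause unbounded recursion. Everything here (struct node, candidate_p, the array-based graph) is hypothetical scaffolding and is not taken from ipa-pure-const.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8
#define MAX_ARGS 4

/* Toy stand-in for an SSA definition: either a leaf (e.g. a call whose
   result is or is not malloc-like) or a phi whose arguments are given
   as indices of other nodes.  */
struct node
{
  bool is_phi;
  bool leaf_ok;
  size_t nargs;
  size_t args[MAX_ARGS];
};

bool
candidate_p_1 (const struct node *g, size_t n, bool *visited)
{
  if (visited[n])
    return true;	/* Already marked: do not walk it again.  */
  visited[n] = true;
  if (!g[n].is_phi)
    return g[n].leaf_ok;
  for (size_t i = 0; i < g[n].nargs; i++)
    if (!candidate_p_1 (g, g[n].args[i], visited))
      return false;	/* A failing argument propagates immediately.  */
  return true;
}

bool
candidate_p (const struct node *g, size_t root)
{
  bool visited[MAX_NODES] = { false };
  return candidate_p_1 (g, root, visited);
}
```

Returning true for an already-visited node is safe in this sketch because any failing leaf reachable from it is still reported by the call that visited it first. Each node is walked at most once, so the cost is linear in the size of the graph.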
[Bug ipa/85734] --suggest-attribute=malloc misdiagnoses static functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85734 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- I will take a look. Regards, Prathamesh
[Bug ipa/85734] --suggest-attribute=malloc misdiagnoses static functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85734 --- Comment #4 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Tue May 15 04:44:33 2018 New Revision: 260249 URL: https://gcc.gnu.org/viewcvs?rev=260249&root=gcc&view=rev Log: 2018-05-15 Prathamesh Kulkarni PR ipa/85734 * ipa-pure-const.c (warn_function_malloc): Pass value of known_finite param as true in call to suggest_attribute. testsuite/ * gcc.dg/ipa/pr85734.c: New test. Added: trunk/gcc/testsuite/gcc.dg/ipa/pr85734.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-pure-const.c trunk/gcc/testsuite/ChangeLog
[Bug ipa/85734] --suggest-attribute=malloc misdiagnoses static functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85734 --- Comment #5 from prathamesh3492 at gcc dot gnu.org --- Fixed on trunk.
[Bug c/85562] -Wsuggest-attribute=malloc misleads about "returning normally"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85562 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Fix for PR85734 also fixes this bug.
[Bug ipa/85787] New: malloc_candidate_p fails to detect malloc attribute on nested phis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85787
Bug ID: 85787
Summary: malloc_candidate_p fails to detect malloc attribute on nested phis
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

For the following test-case, g should have been detected as a malloc-like function by malloc_candidate_p ().

    void *g (int cond1, int cond2, int cond3)
    {
      void *ret;
      void *a;
      void *b;

      if (cond1)
        a = __builtin_malloc (10);
      else
        a = __builtin_malloc (20);

      if (cond2)
        b = __builtin_malloc (30);
      else
        b = __builtin_malloc (40);

      if (cond3)
        ret = a;
      else
        ret = b;

      return ret;
    }
[Bug ipa/85787] malloc_candidate_p fails to detect malloc attribute on nested phis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85787 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Working on a patch.
[Bug tree-optimization/83648] missing -Wsuggest-attribute=malloc on a trivial malloc-like function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83648 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Tue May 15 06:07:48 2018 New Revision: 260250 URL: https://gcc.gnu.org/viewcvs?rev=260250&root=gcc&view=rev Log: 2018-05-15 Prathamesh Kulkarni PR tree-optimization/83648 * ipa-pure-const.c (malloc_candidate_p): Allow function with NULL return value as malloc candidate. testsuite/ * gcc.dg/tree-ssa/pr83648.c: New test. * gcc.dg/tree-ssa/pr83648-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr83648-2.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr83648.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-pure-const.c trunk/gcc/testsuite/ChangeLog
[Bug ipa/85817] [9 Regression] ICE in expand_call at gcc/calls.c:4291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85817 --- Comment #1 from prathamesh3492 at gcc dot gnu.org ---
Created attachment 44142 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44142&action=edit
Untested fix

Oops, sorry about that. I put the condition

    if (integer_zerop (retval))
      continue;

at the wrong place -;)
It can also be reproduced with:

    _Bool f() { return 0; }

In that case the pure-const dump shows the function marked as malloc -:(
Could you check if the attached patch fixes the GIMP issue?

Thanks,
Prathamesh
[Bug tree-optimization/85820] [9 Regression] internal compiler error: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85820 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- This is most likely dup of PR85817. Could you check if the fix in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85817#c1 works ? Thanks, Prathamesh
[Bug middle-end/85817] [9 Regression] ICE in expand_call at gcc/calls.c:4291
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85817 --- Comment #4 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Fri May 18 12:31:04 2018 New Revision: 260358 URL: https://gcc.gnu.org/viewcvs?rev=260358&root=gcc&view=rev Log: 2018-05-18 Prathamesh Kulkarni PR middle-end/85817 * ipa-pure-const.c (malloc_candidate_p): Remove the check integer_zerop for retval and return false if all args to phi are zero. testsuite/ * gcc.dg/tree-ssa/pr83648.c: Change scan-tree-dump to scan-tree-dump-not for h. Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-pure-const.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/tree-ssa/pr83648.c
[Bug middle-end/86332] New: Incorrect warning with Wstringop-overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86332
Bug ID: 86332
Summary: Incorrect warning with Wstringop-overflow
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
Target Milestone: ---

Hi,
For the following test-case:

    void foo(void)
    {
      void escape(unsigned char *);
      unsigned char tmp[12];
      unsigned char *p = tmp + 7;
      __builtin_memset (p, 0, 6);
      escape (p);
    }

I get the warning:

    test.c: In function ‘foo’:
    test.c:7:3: warning: ‘__builtin_memset’ writing 6 bytes into a region of size 5 overflows the destination [-Wstringop-overflow=]
       __builtin_memset (p, 0, 6);
       ^~

Seems like an "off by one" mistake. It doesn't warn if the size of tmp is increased by 1.
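For the reduced test-case exactly as written above, the sizes in the diagnostic can be checked by hand: a 12-byte array entered at offset 7 has 5 bytes left, and 6 bytes are written. The sketch below spells out that arithmetic with two made-up helpers. It is not how GCC computes object sizes internally.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left in an object of obj_size bytes when starting at offset.
   Returns 0 if the offset is already at or past the end.  */
size_t
bytes_remaining (size_t obj_size, size_t offset)
{
  return offset < obj_size ? obj_size - offset : 0;
}

/* Nonzero if writing n bytes at offset would run past the object.  */
int
write_overflows (size_t obj_size, size_t offset, size_t n)
{
  return n > bytes_remaining (obj_size, offset);
}
```

With obj_size 12, offset 7 and n 6 the write overflows by one byte, and with obj_size 13 it fits exactly. That matches both the "region of size 5" in the message and the observation that the warning disappears when tmp grows by 1.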
[Bug middle-end/86332] Incorrect warning with Wstringop-overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86332 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Created attachment 44325 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44325&action=edit Untested fix
[Bug middle-end/86332] Incorrect warning with Wstringop-overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86332 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Oops, looks like I messed up reducing the test-case from the original program which triggered this bug -;( Sorry for the noise.
[Bug middle-end/86332] Incorrect warning with Wstringop-overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86332 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Invalid.
[Bug middle-end/91166] [SVE] Unfolded ZIPs of constants
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91166 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Wed Jul 24 07:20:24 2019 New Revision: 273758 URL: https://gcc.gnu.org/viewcvs?rev=273758&root=gcc&view=rev Log: 2019-07-24 Prathamesh Kulkarni PR middle-end/91166 * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern. (define_predicates): Add entry for uniform_vector_p. (vec_same_elem_p): New match pattern. testsuite/ * gcc.target/aarch64/sve/pr91166.c: New test. Added: trunk/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c Modified: trunk/gcc/ChangeLog trunk/gcc/match.pd trunk/gcc/testsuite/ChangeLog
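The first new match.pd pattern listed above folds a permute of a vector with itself back to that vector. A scalar model of why that is valid when every element of v is the same is sketched below. The names uniform_p and vec_perm, and the fixed NELTS of 4, are assumptions for illustration; the sketch models a two-input permute in which selector values below NELTS pick from the first input and the rest pick from the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NELTS 4

/* True if every element of v has the same value.  */
bool
uniform_p (const int *v)
{
  for (size_t i = 1; i < NELTS; i++)
    if (v[i] != v[0])
      return false;
  return true;
}

/* Two-input permute: selector values 0..NELTS-1 pick from a,
   NELTS..2*NELTS-1 pick from b.  Selectors are reduced modulo
   2*NELTS.  */
void
vec_perm (const int *a, const int *b, const unsigned *sel, int *out)
{
  for (size_t i = 0; i < NELTS; i++)
    {
      unsigned s = sel[i] % (2 * NELTS);
      out[i] = s < NELTS ? a[s] : b[s - NELTS];
    }
}
```

With a uniform v, any selector (for example the zip-style { 0, 4, 1, 5 }) produces a result equal to v element by element, so the permute carries no information and can be dropped.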
[Bug target/91452] New: tls_preserve_1.c fails with -O3 -fpic -march=armv8.2-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91452 Bug ID: 91452 Summary: tls_preserve_1.c fails with -O3 -fpic -march=armv8.2-a+sve Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, It seems tls_preserve_1.c is failing with -O3 -fpic -march=armv8.2-a+sve because it generates: stp q0, q1, [sp, 16] str q2, [sp, 48] which doesn't happen with -march=armv8.2-a. Thanks, Prathamesh
[Bug target/90724] ICE with __sync_bool_compare_and_swap with -march=armv8.2-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90724 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Wed Aug 21 18:34:43 2019 New Revision: 274805 URL: https://gcc.gnu.org/viewcvs?rev=274805&root=gcc&view=rev Log: 2019-08-21 Prathamesh Kulkarni PR target/90724 * config/aarch64/aarch64.c (aarch64_gen_compare_reg_maybe_ze): Force y in reg if it fails aarch64_plus_operand predicate. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.c
[Bug target/88839] [SVE] Poor implementation of blend-like permutes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88839 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Wed Aug 21 20:41:41 2019 New Revision: 274810 URL: https://gcc.gnu.org/viewcvs?rev=274810&root=gcc&view=rev Log: 2019-08-22 Prathamesh Kulkarni Richard Sandiford PR target/88839 * config/aarch64/aarch64.c (aarch64_evpc_sel): New function. (aarch64_expand_vec_perm_const_1): Call aarch64_evpc_sel. testsuite/ * gcc.target/aarch64/sve/sel_1.c: New test. * gcc.target/aarch64/sve/sel_2.c: Likewise. * gcc.target/aarch64/sve/sel_3.c: Likewise. * gcc.target/aarch64/sve/sel_4.c: Likewise. * gcc.target/aarch64/sve/sel_5.c: Likewise. * gcc.target/aarch64/sve/sel_6.c: Likewise. Added: trunk/gcc/testsuite/gcc.target/aarch64/sve/sel_1.c trunk/gcc/testsuite/gcc.target/aarch64/sve/sel_2.c trunk/gcc/testsuite/gcc.target/aarch64/sve/sel_3.c trunk/gcc/testsuite/gcc.target/aarch64/sve/sel_4.c trunk/gcc/testsuite/gcc.target/aarch64/sve/sel_5.c trunk/gcc/testsuite/gcc.target/aarch64/sve/sel_6.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.c trunk/gcc/testsuite/ChangeLog
[Bug target/90724] ICE with __sync_bool_compare_and_swap with -march=armv8.2-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90724 --- Comment #3 from prathamesh3492 at gcc dot gnu.org ---
(In reply to Eric Gallager from comment #2)
> (In reply to prathamesh3492 from comment #1)
> > Author: prathamesh3492
> > Date: Wed Aug 21 18:34:43 2019
> > New Revision: 274805
> >
> > URL: https://gcc.gnu.org/viewcvs?rev=274805&root=gcc&view=rev
> > Log:
> > 2019-08-21 Prathamesh Kulkarni
> >
> > PR target/90724
> > * config/aarch64/aarch64.c (aarch64_gen_compare_reg_maybe_ze): Force y
> > in reg if it fails aarch64_plus_operand predicate.
> >
> > Modified:
> > trunk/gcc/ChangeLog
> > trunk/gcc/config/aarch64/aarch64.c
>
> Did this fix it?

On trunk, yes. It needs to be backported to gcc-9-branch.

Thanks,
Prathamesh
[Bug libfortran/91593] New: Implicit enum conversions in libgfortran/io/transfer.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91593
Bug ID: 91593
Summary: Implicit enum conversions in libgfortran/io/transfer.c
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libfortran
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
Target Milestone: ---

Hi,
I added a patch for Wenum-conversion (PR78736) that exposes some implicit enum conversions in libgfortran/io/transfer.c:

    ./../../gcc/libgfortran/io/transfer.c: In function ‘current_mode’:
    ../../../gcc/libgfortran/io/transfer.c:206:5: warning: implicit conversion from ‘enum ’ to ‘file_mode’ {aka ‘enum ’} [-Wenum-conversion]
      206 | m = FORM_UNSPECIFIED;
          | ^
    ../../../gcc/libgfortran/io/transfer.c: In function ‘formatted_transfer_scalar_read’:
    ../../../gcc/libgfortran/io/transfer.c:1730:25: warning: implicit conversion from ‘enum ’ to ‘unit_sign’ {aka ‘enum ’} [-Wenum-conversion]
     1730 | dtp->u.p.sign_status = SIGN_S;
          | ^
    ../../../gcc/libgfortran/io/transfer.c:1735:25: warning: implicit conversion from ‘enum ’ to ‘unit_sign’ {aka ‘enum ’} [-Wenum-conversion]
     1735 | dtp->u.p.sign_status = SIGN_SS;
          | ^
    ../../../gcc/libgfortran/io/transfer.c:1740:25: warning: implicit conversion from ‘enum ’ to ‘unit_sign’ {aka ‘enum ’} [-Wenum-conversion]
     1740 | dtp->u.p.sign_status = SIGN_SP;
          | ^
    ./../../gcc/libgfortran/io/transfer.c: In function ‘formatted_transfer_scalar_write’:
    ../../../gcc/libgfortran/io/transfer.c:2189:25: warning: implicit conversion from ‘enum ’ to ‘unit_sign’ {aka ‘enum ’} [-Wenum-conversion]
     2189 | dtp->u.p.sign_status = SIGN_S;
          | ^
    ./../../gcc/libgfortran/io/transfer.c:2194:25: warning: implicit conversion from ‘enum ’ to ‘unit_sign’ {aka ‘enum ’} [-Wenum-conversion]
     2194 | dtp->u.p.sign_status = SIGN_SS;
          | ^
    ../../../gcc/libgfortran/io/transfer.c:2199:25: warning: implicit conversion from ‘enum ’ to ‘unit_sign’ {aka ‘enum ’} [-Wenum-conversion]
     2199 | dtp->u.p.sign_status = SIGN_SP;
          | ^

AFAIU, the warnings are correct in this case, since the enums are different and thus there's an implicit conversion from one enum type to another?

Thanks,
Prathamesh
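A small stand-alone sketch of the situation the warnings above describe: a constant of one enumeration type assigned to an object of a different enumeration type. The enum and struct names below are hypothetical and only loosely modelled on the shape of the diagnostics; they are not the libgfortran definitions.

```c
#include <assert.h>

/* Two unrelated enumerations whose constants happen to line up
   numerically.  Both are made up for this example.  */
enum sign_token { TOK_S, TOK_SS, TOK_SP };
enum unit_sign { UNIT_SIGN_PROCDEFINED, UNIT_SIGN_SUPPRESS, UNIT_SIGN_PLUS };

struct state
{
  enum unit_sign sign_status;
};

void
set_sign_implicit (struct state *st)
{
  /* Valid C, but the value changes enum type without a cast; this is
     the kind of assignment -Wenum-conversion is meant to point out.  */
  st->sign_status = TOK_SS;
}

void
set_sign_explicit (struct state *st)
{
  /* Same value, with the conversion spelled out in the source.  */
  st->sign_status = (enum unit_sign) TOK_SS;
}
```

Both functions store the same value. The difference is only whether the change of enum type is visible in the source, and I would expect the explicit form not to trigger the warning. Whether a cast or a change of constant is the right fix in transfer.c is not settled by this report.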
[Bug libfortran/91593] Implicit enum conversions in libgfortran/io/transfer.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91593 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Patch for PR78736 that triggers the warnings: https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01938.html Thanks, Prathamesh
[Bug tree-optimization/83661] sincos does not handle sin(2x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83661 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Oh, I thought sincos simultaneously calculated values of sin and cos ? If that's not the case, then I wonder how is sincos transform itself beneficial ? Thanks, Prathamesh
[Bug c/78736] enum warnings in GCC (request for -Wenum-conversion to be added)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78736 --- Comment #14 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Wed Sep 4 16:25:21 2019 New Revision: 275376 URL: https://gcc.gnu.org/viewcvs?rev=275376&root=gcc&view=rev Log: Add warning Wenum-conversion for C and ObjC. The patch enables warning with Wextra due to PR91593 and warnings with allmodconfig kernel build. Once these issues are resolved, we could consider promoting it to Wall. 2019-09-04 Prathamesh Kulkarni PR c/78736 * doc/invoke.texi: Document -Wenum-conversion. c-family * c.opt (Wenum-conversion): New option. c/ * c-typeck.c (convert_for_assignment): Handle Wenum-conversion. testsuite/ * gcc.dg/Wenum-conversion.c: New test-case. Added: trunk/gcc/testsuite/gcc.dg/Wenum-conversion.c Modified: trunk/gcc/ChangeLog trunk/gcc/c-family/ChangeLog trunk/gcc/c-family/c.opt trunk/gcc/c/ChangeLog trunk/gcc/c/c-typeck.c trunk/gcc/doc/invoke.texi trunk/gcc/testsuite/ChangeLog
[Bug target/91982] gcc.target/aarch64/sve/clastb_*.c tests failing with segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91982 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Probably started with r276299 ? We segfault in vect_transform_stmt in call to dominated_by_p: if (!slp_node && STMT_VINFO_REDUC_DEF (orig_stmt_info) && STMT_VINFO_REDUC_TYPE (orig_stmt_info) != FOLD_LEFT_REDUCTION && is_a <gphi *> (STMT_VINFO_REDUC_DEF (orig_stmt_info)->stmt)) { gphi *phi = as_a <gphi *> (STMT_VINFO_REDUC_DEF (orig_stmt_info)->stmt); if (dominated_by_p (CDI_DOMINATORS, gimple_bb (orig_stmt_info->stmt), gimple_bb (phi))) { because gimple_bb (orig_stmt_info->stmt) is NULL. Thanks, Prathamesh
[Bug tree-optimization/91532] [SVE] Redundant predicated store in gcc.target/aarch64/fmla_2.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91532 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Mon Oct 7 23:44:49 2019 New Revision: 276681 URL: https://gcc.gnu.org/viewcvs?rev=276681&root=gcc&view=rev Log: 2019-10-07 Prathamesh Kulkarni Richard Biener PR tree-optimization/91532 * tree-if-conv.c: Include tree-ssa-dse.h. (ifcvt_local_dce): Change param from bb to loop, and call dse_classify_store. (tree_if_conversion): Pass loop instead of loop->header as arg to ifcvt_local_dce. * tree-ssa-dse.c: Include tree-ssa-dse.h. (delete_dead_or_redundant_assignment): Remove static qualifier from declaration, and add prototype in tree-ssa-dse.h. (dse_store_status): Move to tree-ssa-dse.h. (dse_classify_store): Remove static qualifier and add new tree param stop_at_vuse, and add prototype in tree-ssa-dse.h. * tree-ssa-dse.h: New header. Added: trunk/gcc/tree-ssa-dse.h Modified: trunk/gcc/ChangeLog trunk/gcc/tree-if-conv.c trunk/gcc/tree-ssa-dse.c
[Bug tree-optimization/92033] New: ICE during dom with -march=armv8.2-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92033 Bug ID: 92033 Summary: ICE during dom with -march=armv8.2-a+sve Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, With PR89007 test-case: #define N 1024 unsigned char dst[N]; unsigned char in1[N]; unsigned char in2[N]; void foo () { for( int x = 0; x < N; x++ ) dst[x] = (in1[x] + in2[x] + 1) >> 1; } Compiling with -O3 -march=armv8.2-a+sve results in following ICE: pr89007.c: In function ‘foo’: pr89007.c:7:1: internal compiler error: tree check: expected integer_cst, have poly_int_cst in to_wide, at tree.h:5795 7 | foo () | ^~~ 0x722ffd tree_check_failed(tree_node const*, char const*, int, char const*, ...) ../../gcc/gcc/tree.c:9924 0x721307 tree_check(tree_node const*, char const*, int, char const*, tree_code) ../../gcc/gcc/tree.h:3523 0x721307 wi::to_wide(tree_node const*) ../../gcc/gcc/tree.h:5795 0x721307 value_range_base::lower_bound(unsigned int) const ../../gcc/gcc/tree-vrp.c:6136 0x104e03f value_range_base::lower_bound(unsigned int) const ../../gcc/gcc/tree-vrp.c:6123 0x1604c36 range_operator::fold_range(tree_node*, value_range_base const&, value_range_base const&) const ../../gcc/gcc/range-op.cc:156 0x104fb21 range_fold_binary_expr(value_range_base*, tree_code, tree_node*, value_range_base const*, value_range_base const*) ../../gcc/gcc/tree-vrp.c:1915 0x10cad37 vr_values::extract_range_from_binary_expr(value_range*, tree_code, tree_node*, tree_node*, tree_node*) ../../gcc/gcc/vr-values.c:808 0x10cd7a8 vr_values::extract_range_from_assignment(value_range*, gassign*) ../../gcc/gcc/vr-values.c:1466 0x1538ef5 evrp_range_analyzer::record_ranges_from_stmt(gimple*, bool) ../../gcc/gcc/gimple-ssa-evrp-analyze.c:307 0xec9e9b dom_opt_dom_walker::before_dom_children(basic_block_def*) ../../gcc/gcc/tree-ssa-dom.c:1503 0x150fae7 
dom_walker::walk(basic_block_def*) ../../gcc/gcc/domwalk.c:309 0xec7c94 execute ../../gcc/gcc/tree-ssa-dom.c:724 Possibly caused by r276504. Thanks, Prathamesh
[Bug tree-optimization/92033] ICE during dom with -march=armv8.2-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92033 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- This seems to happen pretty much for any arithmetic ops inside loop with SVE. For instance, with cases: for (int i = 0; i < N; i++) dst[i] = ~in1[i]; for (int i = 0; i < N; i++) dst[i] = in1[i] + in2[i]; The following workaround "fixes" the issue by punting on POLY_INT_CST in range_operator::fold_range, but not sure if that's the correct approach. diff --git a/gcc/range-op.cc b/gcc/range-op.cc index fc31485384b..93eb59436dc 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -148,6 +148,13 @@ range_operator::fold_range (tree type, if (empty_range_check (r, lh, rh)) return r; + if (POLY_INT_CST_P (lh.min ()) || POLY_INT_CST_P (lh.max ()) + || POLY_INT_CST_P (rh.min ()) || POLY_INT_CST_P (rh.max ())) +{ + r.set_varying (lh.type ()); + return r; +} + for (unsigned x = 0; x < lh.num_pairs (); ++x) for (unsigned y = 0; y < rh.num_pairs (); ++y) { Thanks, Prathamesh
[Bug tree-optimization/92085] [10 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in useless_type_conversion_p, at gimple-expr.c:86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92085 --- Comment #4 from prathamesh3492 at gcc dot gnu.org --- Patch posted upstream: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01031.html Thanks, Prathamesh
[Bug tree-optimization/92085] [10 Regression] ICE: tree check: expected class 'type', have 'exceptional' (error_mark) in useless_type_conversion_p, at gimple-expr.c:86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92085 --- Comment #5 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Tue Oct 15 07:19:41 2019 New Revision: 276984 URL: https://gcc.gnu.org/viewcvs?rev=276984&root=gcc&view=rev Log: 2019-10-15 Prathamesh Kulkarni PR tree-optimization/92085 * tree-if-conv.c (ifcvt_local_dce): Call gsi_next in else clause, instead of calling it unconditionally after delete_dead_or_redundant_assignment and fix indentation. testsuite/ * gcc.dg/tree-ssa/pr92085-1.c: New test. * gcc.dg/tree-ssa/pr92085-2.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr92085-1.c trunk/gcc/testsuite/gcc.dg/tree-ssa/pr92085-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-if-conv.c
[Bug target/90723] pr88598-2.c segfaults with -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90723 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- (In reply to Eric Gallager from comment #2) > (In reply to prathamesh3492 from comment #1) > > Author: prathamesh3492 > > Date: Sat Jul 13 08:28:33 2019 > > New Revision: 273466 > > > > URL: https://gcc.gnu.org/viewcvs?rev=273466&root=gcc&view=rev > > Log: > > 2019-07-15 Prathamesh Kulkarni > > > > PR target/90723 > > * recog.h (temporary_volatile_ok): New class. > > * config/aarch64/aarch64.c (aarch64_emit_sve_pred_move): Set > > volatile_ok temporarily to true using temporary_volatile_ok. > > * expr.c (emit_block_move_via_cpymem): Likewise. > > * optabs.c (maybe_legitimize_operand): Likewise. > > > > Modified: > > trunk/gcc/ChangeLog > > trunk/gcc/config/aarch64/aarch64.c > > trunk/gcc/expr.c > > trunk/gcc/optabs.c > > trunk/gcc/recog.h > > Did this fix it? Yes. Thanks, Prathamesh
[Bug tree-optimization/83501] [8 Regression] strlen(a) not folded after strcpy(a, "...")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83501
prathamesh3492 at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |prathamesh3492 at gcc dot gnu.org
--- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Created attachment 42927 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42927&action=edit Untested fix
[Bug ipa/83506] [8 Regression] ICE: Segmentation fault in force_nonfallthru_and_redirect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83506 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Sorry for the breakage, I will take a look. Regards, Prathamesh
[Bug ipa/83506] [8 Regression] ICE: Segmentation fault in force_nonfallthru_and_redirect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83506 --- Comment #5 from prathamesh3492 at gcc dot gnu.org --- (In reply to Jakub Jelinek from comment #4) > Though, I guess the real bug is that ipa_free_fn_summary (); is no longer > called for -fno-ipa-pure-const. While the ipa_inline pass had unconditional > gate and so it was freed always if !flag_wpa, ipa-pure-const has a > non-trivial gate and thus it frees only sometimes. Calling > ipa_free_fn_summary () in ipa-inline.c if if (!flag_wpa && > !flag_ipa_pure_const && !in_lto_p) is not nice, as it duplicates the > ipa-pure-const.c gate. So, we can do something like: > --- gcc/ipa.c.jj 2017-09-01 09:26:37.0 +0200 > +++ gcc/ipa.c 2017-12-20 11:22:57.915226765 +0100 > @@ -1270,6 +1270,11 @@ ipa_single_use (void) >varpool_node *var; >hash_map single_user_map; > > + /* In WPA we use inline summaries for partitioning process. Otherwise, > + free it if earlier IPA passes have not done so yet. */ > + if (!flag_wpa) > +ipa_free_fn_summary (); > + >FOR_EACH_DEFINED_VARIABLE (var) > if (!var->all_refs_explicit_p ()) >var->aux = BOTTOM; > But I think I have a cleaner patch than that. Hi Jakub, Thanks for the fix! In r254140, I removed the call to ipa_free_fn_summary() gated on !flag_wpa from inline pass since ipa-pure-const required it for propagating malloc attribute, which unfortunately caused the above bug. Regards, Prathamesh
[Bug tree-optimization/83648] missing -Wsuggest-attribute=malloc on a trivial malloc-like function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83648
prathamesh3492 at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |prathamesh3492 at gcc dot gnu.org
--- Comment #1 from prathamesh3492 at gcc dot gnu.org --- ipa-pure-const dump shows:
MALLOC LATTICE after propagation:
__builtin_malloc: malloc
g: malloc_bottom
f: malloc
So it's not able to detect that g could be annotated with malloc. I will take a look. Thanks, Prathamesh
[Bug tree-optimization/83648] missing -Wsuggest-attribute=malloc on a trivial malloc-like function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83648 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Created attachment 43004 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43004&action=edit Untested fix
[Bug tree-optimization/82665] missing value range optimization for memchr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82665
prathamesh3492 at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |prathamesh3492 at gcc dot gnu.org
--- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Patch posted upstream - https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00020.html
[Bug tree-optimization/83661] New: sincos does not handle sin(2x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83661 Bug ID: 83661 Summary: sincos does not handle sin(2x) Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, For the following test-case: double f(double x) { return __builtin_sin(2*x) + __builtin_sin(x); } optimized dump with -O2 -funsafe-math-optimizations -ffast-math shows: ;; Function f (f, funcdef_no=0, decl_uid=1952, cgraph_uid=0, symbol_order=0) [local count: 1073741825]: _1 = __builtin_sin (x_4(D)); _2 = x_4(D) * 2.0e+0; _3 = __builtin_sin (_2); _5 = _1 + _3; return _5; Would it be a good idea to enhance the sincos pass to recognize the identity sin(2*x) = 2*sin(x)*cos(x) and thus eliminate one call to __builtin_sin ? Writing 2*sin(x)*cos(x) explicitly in the source yields the following optimized dump: [local count: 1073741825]: sincostmp_8 = __builtin_cexpi (x_5(D)); _1 = IMAGPART_EXPR <sincostmp_8>; _2 = REALPART_EXPR <sincostmp_8>; _3 = _1 * _2; _4 = _3 * 2.0e+0; _6 = _2 + _4; return _6; I agree in general that adding math identities like sin(x)**2 + cos(x)**2 = 1 isn't a good idea since the user would almost always write the "optimized" version in practice. However for the above case, would the transform make sense ? Thanks, Prathamesh
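For reference, the algebra behind the proposed transform can be written out as a short derivation. This is only a sketch of the identity applied to the test-case's expression; whether the rewritten form is profitable or accurate enough is a separate question that the report leaves open.

```latex
\begin{align*}
\sin(2x) + \sin(x) &= 2\sin(x)\cos(x) + \sin(x) \\
                   &= \sin(x)\,\bigl(2\cos(x) + 1\bigr)
\end{align*}
```

After the rewrite, both remaining trigonometric terms take the same argument x, which is the situation in which a single __builtin_cexpi call can supply sin(x) and cos(x) together.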
[Bug tree-optimization/83661] sincos does not handle sin(2x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83661
prathamesh3492 at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-unknown-linux-gnu
               Host|                            |x86_64-unknown-linux-gnu
              Build|                            |x86_64-unknown-linux-gnu
           Severity|normal                      |enhancement
[Bug tree-optimization/83501] [8 Regression] strlen(a) not folded after strcpy(a, "...")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83501 --- Comment #5 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Wed Jan 3 16:07:32 2018 New Revision: 256180 URL: https://gcc.gnu.org/viewcvs?rev=256180&root=gcc&view=rev Log: 2018-01-03 Prathamesh Kulkarni PR tree-optimization/83501 * tree-ssa-strlen.c (get_string_cst): New. (handle_char_store): Call get_string_cst. testsuite/ * gcc.dg/tree-ssa/pr83501.c: New test. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr83501.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-strlen.c
[Bug tree-optimization/83501] [8 Regression] strlen(a) not folded after strcpy(a, "...")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83501
prathamesh3492 at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #6 from prathamesh3492 at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/83750] New: CSE erf/erfc pair
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83750 Bug ID: 83750 Summary: CSE erf/erfc pair Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- For the following test case: double f(double x) { double g(double, double); double t1 = __builtin_erf (x); double t2 = __builtin_erfc (x); return g(t1, t2); } optimized dump shows: [local count: 1073741825]: t1_2 = __builtin_erf (x_1(D)); t2_5 = __builtin_erfc (x_1(D)); _7 = g (t1_2, t2_5); [tail call] return _7; I was wondering if it'd be a good idea to add a simple dom pass to tree-ssa-math-opts.c that would eliminate the call to erfc(x) if erf(x) is present, at least with -funsafe-math-optimizations ? erfc(x) == 1.0 - erf(x) Thanks, Prathamesh
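The identity the proposal relies on, written out as a sketch, together with the reason it would plausibly need to sit behind -funsafe-math-optimizations:

```latex
\operatorname{erf}(x) + \operatorname{erfc}(x) = 1
\quad\Longrightarrow\quad
\operatorname{erfc}(x) = 1 - \operatorname{erf}(x)
```

The identity is exact in real arithmetic but not in floating point. For moderately large x, erf(x) rounds to exactly 1.0 in double precision, so 1.0 - erf(x) evaluates to 0, while a direct erfc(x) call still returns a small nonzero value (roughly on the order of 1e-17 around x = 6). Callers that use erfc specifically to avoid that cancellation would lose the benefit under the transform.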
[Bug tree-optimization/83751] New: CSE erf/erfc pair
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83751 Bug ID: 83751 Summary: CSE erf/erfc pair Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- For the following test case: double f(double x) { double g(double, double); double t1 = __builtin_erf (x); double t2 = __builtin_erfc (x); return g(t1, t2); } optimized dump shows: [local count: 1073741825]: t1_2 = __builtin_erf (x_1(D)); t2_5 = __builtin_erfc (x_1(D)); _7 = g (t1_2, t2_5); [tail call] return _7; I was wondering if it'd be a good idea to add a simple dom pass to tree-ssa-math-opts.c that would eliminate the call to erfc(x) if erf(x) is present, at least with -funsafe-math-optimizations ? erfc(x) == 1.0 - erf(x) Thanks, Prathamesh
[Bug tree-optimization/83751] CSE erf/erfc pair
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83751 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Oops, looks like this got posted twice :( Sorry about that.
[Bug target/83775] New: Segfault in arm_declare_function_name()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83775 Bug ID: 83775 Summary: Segfault in arm_declare_function_name() Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, For the following test-case: #define STR "1234567" const char str[] = STR; char dst[10]; void copy_from_global_str (void) { __builtin_strcpy (dst, str); if (__builtin_strlen (dst) != sizeof str - 1) __builtin_abort (); } With arm-linux-gnueabihf-gcc -O2 I get the following ICE: strlenopt-39.c: In function 'copy_from_global_str': strlenopt-39.c:13:1: internal compiler error: Segmentation fault } ^ 0xbc1f1f crash_signal ../../gcc/gcc/toplev.c:325 0xf4a5bb std::char_traits<char>::length(char const*) /usr/include/c++/6/bits/char_traits.h:267 0xf4a5bb std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(char const*) /usr/include/c++/6/bits/basic_string.h:1268 0xf4a5bb std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(char const*) /usr/include/c++/6/bits/basic_string.h:605 0xf4a5bb arm_declare_function_name(_IO_FILE*, char const*, tree_node*) ../../gcc/gcc/config/arm/arm.c:30958 0xf4ad2d arm_asm_declare_function_name(_IO_FILE*, char const*, tree_node*) ../../gcc/gcc/config/arm/arm.c:19899 0xefd8fc assemble_start_function(tree_node*, char const*) ../../gcc/gcc/varasm.c:1880 0x87929f rest_of_handle_final ../../gcc/gcc/final.c:4549 0x87929f execute ../../gcc/gcc/final.c:4625 This happens because of following in arm_declare_function_name(): /* Only update the assembler .arch string if it is distinct from the last such string we printed. */ std::string arch_to_print = targ_options->x_arm_arch_string; In this case, targ_options->x_arm_arch_string is NULL and hence the above error. Does the following (untested) fix look OK ?
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 196aa6de1ac..868251a154c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -30954,7 +30954,10 @@ arm_declare_function_name (FILE *stream, const char *name, tree decl) /* Only update the assembler .arch string if it is distinct from the last such string we printed. */ - std::string arch_to_print = targ_options->x_arm_arch_string; + std::string arch_to_print; + if (targ_options->x_arm_arch_string) +arch_to_print = targ_options->x_arm_arch_string; + if (arch_to_print != arm_last_printed_arch_string) { std::string arch_name Thanks, Prathamesh
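The guard in the patch above treats a missing arch string as empty before comparing it with the last printed one. As an illustration only, the same idea in plain C might look like the sketch below. `or_empty` and `arch_string_changed` are hypothetical names, not GCC functions; the actual code is C++ and uses std::string.

```c
#include <string.h>

/* Hypothetical helper: map a NULL string to the empty string so
   that callers never hand NULL to the string functions. */
static const char *or_empty(const char *s)
{
    return s ? s : "";
}

/* Returns 1 when the requested arch string differs from the last
   one printed, 0 otherwise. A NULL argument counts as "". */
int arch_string_changed(const char *arch, const char *last_printed)
{
    return strcmp(or_empty(arch), or_empty(last_printed)) != 0;
}
```

With this shape, a NULL arch string compares equal to an empty "last printed" string, which matches the behaviour of a default-constructed std::string in the patch.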
[Bug tree-optimization/81703] memcpy folding defeats strlen optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81703 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Thu Jan 11 04:37:48 2018 New Revision: 256475 URL: https://gcc.gnu.org/viewcvs?rev=256475&root=gcc&view=rev Log: 2018-01-11 Martin Sebor Prathamesh Kulkarni PR tree-optimization/83501 PR tree-optimization/81703 * tree-ssa-strlen.c (get_string_cst): Rename... (get_string_len): ...to this. Handle global constants. (handle_char_store): Adjust. testsuite/ * gcc.dg/strlenopt-39.c: New test-case. * gcc.dg/pr81703.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/pr81703.c trunk/gcc/testsuite/gcc.dg/strlenopt-39.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-strlen.c
[Bug tree-optimization/83501] [8 Regression] strlen(a) not folded after strcpy(a, "...")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83501 --- Comment #8 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Thu Jan 11 04:37:48 2018 New Revision: 256475 URL: https://gcc.gnu.org/viewcvs?rev=256475&root=gcc&view=rev Log: 2018-01-11 Martin Sebor Prathamesh Kulkarni PR tree-optimization/83501 PR tree-optimization/81703 * tree-ssa-strlen.c (get_string_cst): Rename... (get_string_len): ...to this. Handle global constants. (handle_char_store): Adjust. testsuite/ * gcc.dg/strlenopt-39.c: New test-case. * gcc.dg/pr81703.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/pr81703.c trunk/gcc/testsuite/gcc.dg/strlenopt-39.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-strlen.c
[Bug tree-optimization/81703] memcpy folding defeats strlen optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81703
prathamesh3492 at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #3 from prathamesh3492 at gcc dot gnu.org --- Fixed.
[Bug target/83775] Segfault in arm_declare_function_name()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83775 --- Comment #3 from prathamesh3492 at gcc dot gnu.org --- (In reply to prathamesh3492 from comment #0) > Hi, > For the following test-case: > > #define STR "1234567" > > const char str[] = STR; > > char dst[10]; > > void copy_from_global_str (void) > { > __builtin_strcpy (dst, str); > > if (__builtin_strlen (dst) != sizeof str - 1) > __builtin_abort (); > } > > With arm-linux-gnueabihf-gcc -O2 I get the following ICE: Oops, this should be cc1, I didn't invoke the driver, but cc1 directly. > strlenopt-39.c: In function 'copy_from_global_str': > strlenopt-39.c:13:1: internal compiler error: Segmentation fault > } > ^ > 0xbc1f1f crash_signal > ../../gcc/gcc/toplev.c:325 > 0xf4a5bb std::char_traits<char>::length(char const*) > /usr/include/c++/6/bits/char_traits.h:267 > 0xf4a5bb std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> >::assign(char const*) > /usr/include/c++/6/bits/basic_string.h:1268 > 0xf4a5bb std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> >::operator=(char const*) > /usr/include/c++/6/bits/basic_string.h:605 > 0xf4a5bb arm_declare_function_name(_IO_FILE*, char const*, tree_node*) > ../../gcc/gcc/config/arm/arm.c:30958 > 0xf4ad2d arm_asm_declare_function_name(_IO_FILE*, char const*, tree_node*) > ../../gcc/gcc/config/arm/arm.c:19899 > 0xefd8fc assemble_start_function(tree_node*, char const*) > ../../gcc/gcc/varasm.c:1880 > 0x87929f rest_of_handle_final > ../../gcc/gcc/final.c:4549 > 0x87929f execute > ../../gcc/gcc/final.c:4625 > > This happens because of following in arm_declare_function_name(): > /* Only update the assembler .arch string if it is distinct from the last > such string we printed. */ > std::string arch_to_print = targ_options->x_arm_arch_string; > > In this case, targ_options->x_arm_arch_string is NULL and hence the above > error. > Does the following (untested) fix look OK ?
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 196aa6de1ac..868251a154c 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -30954,7 +30954,10 @@ arm_declare_function_name (FILE *stream, const char > *name, tree decl) > >/* Only update the assembler .arch string if it is distinct from the last > such string we printed. */ > - std::string arch_to_print = targ_options->x_arm_arch_string; > + std::string arch_to_print; > + if (targ_options->x_arm_arch_string) > +arch_to_print = targ_options->x_arm_arch_string; > + >if (arch_to_print != arm_last_printed_arch_string) > { >std::string arch_name > > > Thanks, > Prathamesh
[Bug target/83514] ABRT in arm_declare_function_name passing a null pointer to std::string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83514 --- Comment #5 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Thu Jan 11 12:13:42 2018 New Revision: 256529 URL: https://gcc.gnu.org/viewcvs?rev=256529&root=gcc&view=rev Log: 2018-01-11 Prathamesh Kulkarni PR target/83514 * config/arm/arm.c (arm_declare_function_name): Set arch_to_print if targ_options->x_arm_arch_string is non NULL. Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c
[Bug target/83514] ABRT in arm_declare_function_name passing a null pointer to std::string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83514 --- Comment #6 from prathamesh3492 at gcc dot gnu.org --- Committed patch to conditionally set arch_to_print after Kyrill's approval. Thanks, Prathamesh
[Bug tree-optimization/83501] [8 Regression] strlen(a) not folded after strcpy(a, "...")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83501 --- Comment #9 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Sun Jan 14 08:58:58 2018 New Revision: 256657 URL: https://gcc.gnu.org/viewcvs?rev=256657&root=gcc&view=rev Log: 2018-01-14 Prathamesh Kulkarni PR tree-optimization/83501 * gcc.dg/strlenopt-39.c: Restrict to i?86 and x86_64-*-* targets. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/strlenopt-39.c
[Bug c/83959] New: Missing buffer overflow warning on printf %s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83959 Bug ID: 83959 Summary: Missing buffer overflow warning on printf %s Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- int main(void) { char a[3] = "xyz"; __builtin_printf ("%s", a); return 0; } No warning generated with -Wall -Wextra -Wstringop-overflow=2. Should -Wstringop-overflow be catching this case ? I wonder if the compiler should warn (with Wextra maybe?) for char a[3] = "xyz"; i.e. when sizeof(array) == strlen(initializer) ? Although the above initializer doesn't cause overflow by itself, I suppose almost all string functions expect char arrays to end with '\0' and would end up looking past the end of the array, thus causing overflow. Thanks, Prathamesh
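To make the hazard concrete, here is a small sketch in C. It checks whether a buffer actually contains a terminator and formats it with an explicit precision, so that the `%s` conversion is bounded by the array size. `has_terminator` and `format_bounded` are hypothetical helper names introduced for illustration; they are not part of the report or of GCC.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a '\0', else 0. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Formats at most n bytes of buf into out. With a precision given,
   %s does not require buf to be null-terminated, so nothing past
   buf[n - 1] is read. Returns snprintf's result. */
int format_bounded(char *out, size_t outsz, const char *buf, size_t n)
{
    return snprintf(out, outsz, "%.*s", (int) n, buf);
}
```

For `char a[3] = "xyz";` the array holds exactly the three characters and no terminator, so `has_terminator (a, sizeof a)` is 0, whereas a plain `"%s"` on `a` would read past the end of the array.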
[Bug tree-optimization/89332] New: Missed detection of dead stores to array in a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89332 Bug ID: 89332 Summary: Missed detection of dead stores to array in a loop Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, For the following test-case: #define ARR_MAX 6 __attribute__((const)) int f(int); int foo() { int arr[ARR_MAX]; for (int i = 0; i < ARR_MAX; i++) arr[i] = f(i); return arr[0]; } With -O3, gcc generates a call to f(i) and a store to arr[i] on every iteration, while clang detects that the stores to arr are dead (except for arr[0]), removes the loop and emits a tail call to f(0). aarch64-linux-gnu-gcc -O3: foo: .LFB0: .cfi_startproc stp x29, x30, [sp, -64]! .cfi_def_cfa_offset 64 .cfi_offset 29, -64 .cfi_offset 30, -56 mov x29, sp stp x19, x20, [sp, 16] .cfi_offset 19, -48 .cfi_offset 20, -40 add x20, sp, 40 mov w19, 0 .p2align 3,,7 .L2: mov w0, w19 bl f str w0, [x20], 4 add w19, w19, 1 cmp w19, 6 bne .L2 ldr w0, [sp, 40] ldp x19, x20, [sp, 16] ldp x29, x30, [sp], 64 .cfi_restore 30 .cfi_restore 29 .cfi_restore 19 .cfi_restore 20 .cfi_def_cfa_offset 0 ret clang -O3 --target=aarch64-linux-gnu: foo: // @foo // %bb.0: mov w0, wzr b f It seems clang takes advantage of loop unrolling for the above case, while gcc doesn't. After increasing ARR_MAX from 6 to 512, clang generates the same or similar code as gcc. I doubt, though, whether such code is written in practice or can result from abstraction lowering ? It was just a contrived test-case I made up. Thanks, Prathamesh
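The source-level equivalent of what clang's output does can be sketched as follows. The body given to `f` here is a hypothetical stand-in (the report only declares it), chosen so that it depends on nothing but its argument, as `__attribute__((const))` requires; `foo_reduced` is likewise a name introduced for illustration.

```c
#define ARR_MAX 6

/* Hypothetical definition of f: the result depends only on the
   argument and there are no side effects. */
__attribute__((const)) int f(int i)
{
    return i * i + 7;
}

/* The loop from the report: every store except arr[0] is dead. */
int foo(void)
{
    int arr[ARR_MAX];
    for (int i = 0; i < ARR_MAX; i++)
        arr[i] = f(i);
    return arr[0];
}

/* What remains once the dead stores, and the calls feeding them,
   are removed: a single call to f(0). */
int foo_reduced(void)
{
    return f(0);
}
```

The reduction is only valid because `f` has no side effects; without the const attribute the calls for i = 1 .. ARR_MAX - 1 would have to be kept even if their stores were dropped.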
[Bug target/88839] [SVE] Poor implementation of blend-like permutes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88839 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Fix committed to sve-acle-branch: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2cd1f397ed5a155e74719977823b28777caa8312 Thanks, Prathamesh
[Bug target/90644] New: Call to __builtin_memcmp not folded for identical vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90644 Bug ID: 90644 Summary: Call to __builtin_memcmp not folded for identical vectors Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, For the following test-case: #include <stdint.h> typedef int32_t vnx4si __attribute__((vector_size (32))); void foo(int a, int b) { vnx4si v = (vnx4si) { a, b, 1, 2 }; vnx4si expected = (vnx4si) { a, b, 1, 2 }; if (__builtin_memcmp (&v, &expected, sizeof (vnx4si)) != 0) __builtin_abort (); } -O2 -ftree-vectorize -march=armv8.2-a+sve folds the call to __builtin_memcmp correctly, since both vectors are identical. But with -msve-vector-bits=256, it fails to fold the call to __builtin_memcmp(). The issue can also be reproduced with AdvSIMD: it fails to fold the call to __builtin_memcmp with vector_size == 16 but folds it with vector_size == 32. Thanks, Prathamesh
[Bug target/88837] [SVE] Poor vector construction code in VL-specific mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88837 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Mon Jun 3 09:35:37 2019 New Revision: 271857 URL: https://gcc.gnu.org/viewcvs?rev=271857&root=gcc&view=rev Log: 2019-06-03 Prathamesh Kulkarni PR target/88837 * vector-builder.h (vector_builder::count_dups): New method. * config/aarch64/aarch64-protos.h (aarch64_expand_sve_vector_init): Declare prototype. * config/aarch64/aarch64/sve.md (aarch64_sve_rev64): Use @. (vec_init): New pattern. * config/aarch64/aarch64.c (emit_insr): New function. (aarch64_sve_expand_vector_init_handle_trailing_constants): Likewise. (aarch64_sve_expand_vector_init_insert_elems): Likewise. (aarch64_sve_expand_vector_init_handle_trailing_same_elem): Likewise. (aarch64_sve_expand_vector_init): Define two overloaded functions. testsuite/ * gcc.target/aarch64/sve/init_1.c: New test. * gcc.target/aarch64/sve/init_1_run.c: Likewise. * gcc.target/aarch64/sve/init_2.c: Likewise. * gcc.target/aarch64/sve/init_2_run.c: Likewise. * gcc.target/aarch64/sve/init_3.c: Likewise. * gcc.target/aarch64/sve/init_3_run.c: Likewise. * gcc.target/aarch64/sve/init_4.c: Likewise. * gcc.target/aarch64/sve/init_4_run.c: Likewise. * gcc.target/aarch64/sve/init_5.c: Likewise. * gcc.target/aarch64/sve/init_5_run.c: Likewise. * gcc.target/aarch64/sve/init_6.c: Likewise. * gcc.target/aarch64/sve/init_6_run.c: Likewise. * gcc.target/aarch64/sve/init_7.c: Likewise. * gcc.target/aarch64/sve/init_7_run.c: Likewise. * gcc.target/aarch64/sve/init_8.c: Likewise. * gcc.target/aarch64/sve/init_8_run.c: Likewise. * gcc.target/aarch64/sve/init_9.c: Likewise. * gcc.target/aarch64/sve/init_9_run.c: Likewise. * gcc.target/aarch64/sve/init_10.c: Likewise. * gcc.target/aarch64/sve/init_10_run.c: Likewise. * gcc.target/aarch64/sve/init_11.c: Likewise. * gcc.target/aarch64/sve/init_11_run.c: Likewise. * gcc.target/aarch64/sve/init_12.c: Likewise. * gcc.target/aarch64/sve/init_12_run.c: Likewise. 
Added: trunk/gcc/testsuite/gcc.target/aarch64/sve/init_1.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_10.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_11.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_11_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_12.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_12_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_1_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_2.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_2_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_3.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_3_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_4.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_4_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_5.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_5_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_6.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_6_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_7.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_7_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_8.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_8_run.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_9.c trunk/gcc/testsuite/gcc.target/aarch64/sve/init_9_run.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64-protos.h trunk/gcc/config/aarch64/aarch64-sve.md trunk/gcc/config/aarch64/aarch64.c trunk/gcc/testsuite/ChangeLog trunk/gcc/vector-builder.h
[Bug target/90722] New: ICE with __builtin_convertvector with -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90722 Bug ID: 90722 Summary: ICE with __builtin_convertvector with -msve-vector-bits=256 Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- The following test-case: typedef int v4si __attribute__((vector_size (4 * sizeof (int)))); typedef double v4df __attribute__((vector_size (4 * sizeof (double)))); void f4 (v4df *x, v4si *y) { *y = __builtin_convertvector (*x, v4si); } results in an ICE with -O2 -march=armv8.2-a+sve -msve-vector-bits=256: 0xcddacd simplify_const_unary_operation(rtx_code, machine_mode, rtx_def*, machine_mode) ../../gcc/gcc/simplify-rtx.c:1763 0xcd9c2a simplify_unary_operation(rtx_code, machine_mode, rtx_def*, machine_mode) ../../gcc/gcc/simplify-rtx.c:873 0x13bca5a combine_simplify_rtx ../../gcc/gcc/combine.c:5787 0x13bf1a6 subst ../../gcc/gcc/combine.c:5727 0x13bf2bb subst ../../gcc/gcc/combine.c:5590 0x13bf102 subst ../../gcc/gcc/combine.c:5661 0x13c0568 try_combine ../../gcc/gcc/combine.c:3420 0x13c66c6 combine_instructions ../../gcc/gcc/combine.c:1306 0x13c66c6 rest_of_handle_combine ../../gcc/gcc/combine.c:15068 0x13c66c6 execute ../../gcc/gcc/combine.c:15113 because it hits the following assert in simplify_const_unary_operation: gcc_assert (known_eq (GET_MODE_NUNITS (mode), n_elts)); GET_MODE_NUNITS (mode) == 8 and n_elts == 4 for the test-case. Thanks, Prathamesh
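For reference, a minimal scalar sketch of what f4 is expected to compute once the ICE is fixed, assuming __builtin_convertvector converts each element as a C cast would. The helper name convert_v4 is hypothetical and is not part of the report or of GCC:

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference for the element-wise double -> int conversion.
   convert_v4 is a hypothetical name used only for illustration.  */
void convert_v4 (const double *x, int *y)
{
  assert (x != NULL && y != NULL);
  for (size_t i = 0; i < 4; i++)
    y[i] = (int) x[i];   /* same as a C cast: truncates toward zero */
}
```

Comparing the output of a vectorized f4 against such a scalar loop is one way a run test could check the result.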
[Bug target/90723] New: pr88598-2.c segfaults with -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90723 Bug ID: 90723 Summary: pr88598-2.c segfaults with -msve-vector-bits=256 Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- cc1 segfaults with the following test-case, with -O2 -march=armv8.2-a+sve -msve-vector-bits=256: typedef double v4df __attribute__ ((vector_size (32))); void foo(v4df); int main () { volatile v4df x1; x1 = (v4df) { 0, 1, 1, 2 }; foo (x1); return 0; } gdb backtrace (clipped to last 14): Program received signal SIGSEGV, Segmentation fault. 0x00bfdad1 in expand_binop_directly (icode=CODE_FOR_adddi3, mode=mode@entry=E_DImode, binoptab=binoptab@entry=add_optab, op0=op0@entry=0x77a233a8, op1=op1@entry=0x77a2b290, target=target@entry=0x75095468, unsignedp=1, methods=OPTAB_LIB_WIDEN, last=0x75094bc0) at ../../gcc/gcc/optabs.c:1038 1038{ (gdb) bt #0 0x00bfdad1 in expand_binop_directly (icode=CODE_FOR_adddi3, mode=mode@entry=E_DImode, binoptab=binoptab@entry=add_optab, op0=op0@entry=0x77a233a8, op1=op1@entry=0x77a2b290, target=target@entry=0x75095468, unsignedp=1, methods=OPTAB_LIB_WIDEN, last=0x75094bc0) at ../../gcc/gcc/optabs.c:1038 #1 0x00bfc0dd in expand_binop (mode=E_DImode, binoptab=, op0=0x77a233a8, op1=0x77a2b290, target=0x75095468, unsignedp=1, methods=OPTAB_LIB_WIDEN) at ../../gcc/gcc/optabs.c:1209 #2 0x009cc7e4 in force_operand (value=0x77859f90, target=0x75095468) at ../../gcc/gcc/expr.c:7527 #3 0x009a80a3 in copy_to_mode_reg (mode=E_DImode, x=x@entry=0x77859f90) at ../../gcc/gcc/explow.c:627 #4 0x00bf2dce in maybe_legitimize_operand_same_code (icode=icode@entry=CODE_FOR_aarch64_pred_movvnx2df, opno=opno@entry=2, op=) at ../../gcc/gcc/optabs.c:7146 #5 0x00bf56ee in maybe_legitimize_operand (op=0x7bfff400, opno=2, icode=CODE_FOR_aarch64_pred_movvnx2df) at ../../gcc/gcc/optabs.c:7196 #6 maybe_legitimize_operands 
(icode=CODE_FOR_aarch64_pred_movvnx2df, opno=0, nops=, ops=0x7bfff3c0) at ../../gcc/gcc/optabs.c:7323 #7 0x00bf5c0a in maybe_gen_insn (icode=CODE_FOR_aarch64_pred_movvnx2df, nops=, ops=0x7bfff3c0) at ../../gcc/gcc/optabs.c:7342 #8 0x00bf8c39 in maybe_expand_insn (ops=ops@entry=0x7bfff3b0, nops=nops@entry=3, icode=) at ../../gcc/gcc/optabs.c:7416 #9 expand_insn (icode=, nops=nops@entry=3, ops=ops@entry=0x7bfff3c0) at ../../gcc/gcc/optabs.c:7416 #10 0x010378a4 in aarch64_emit_sve_pred_move (dest=, pred=, src=) at ./insn-opinit.h:735 #11 0x012cb710 in gen_movvnx2df (operand0=0x75095408, operand1=0x77859f78) at ../../gcc/gcc/config/aarch64/aarch64-sve.md:77 #12 0x009c7505 in insn_gen_fn::operator() (this=, a1=0x77859f78, a0=0x75095408) at ../../gcc/gcc/recog.h:301 #13 emit_move_insn_1 (x=0x75095408, y=0x77859f78) at ../../gcc/gcc/expr.c:3701 #14 0x009c7950 in emit_move_insn (x=x@entry=0x75095408, y=y@entry=0x77859f78) at ../../gcc/gcc/expr.c:3797 Thanks, Prathamesh
[Bug target/90724] New: ICE with __sync_bool_compare_and_swap with -march=armv8.2-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90724 Bug ID: 90724 Summary: ICE with __sync_bool_compare_and_swap with -march=armv8.2-a+sve Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- The following test (pr82096.c) and a few others that contain a call to __sync_bool_compare_and_swap fail with the following ICE with -march=armv8.2-a+sve: static long long AL[24]; int check_ok (void) { return (__sync_bool_compare_and_swap (AL+1, 0x200000003ll, 0x1234567890ll)); } pr82096.c: In function 'check_ok': pr82096.c:11:1: error: unrecognizable insn: 11 | } | ^ (insn 11 10 12 2 (set (reg:CC 66 cc) (compare:CC (reg:DI 95) (const_int 8589934595 [0x200000003]))) "pr82096.c":10:11 -1 (nil)) during RTL pass: vregs pr82096.c:11:1: internal compiler error: in extract_insn, at recog.c:2310 0x64bb6e _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) ../../gcc/gcc/rtl-error.c:108 0x64bb8a _fatal_insn_not_found(rtx_def const*, char const*, int, char const*) ../../gcc/gcc/rtl-error.c:116 0x64a58b extract_insn(rtx_insn*) ../../gcc/gcc/recog.c:2310 0xa28a45 instantiate_virtual_regs_in_insn ../../gcc/gcc/function.c:1605 0xa28a45 instantiate_virtual_regs ../../gcc/gcc/function.c:1975 0xa28a45 execute ../../gcc/gcc/function.c:2024 Thanks, Prathamesh
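As a reminder of what the test exercises, here is a small sketch of the expected run-time behaviour of the builtin on a target where the code compiles. It uses the 64-bit comparison value 0x200000003 (8589934595, the constant shown in the RTL dump); the demo function is an illustrative addition and not part of pr82096.c:

```c
#include <assert.h>

static long long AL[24];

/* Same call shape as in the report.  */
int check_ok (void)
{
  return __sync_bool_compare_and_swap (AL + 1, 0x200000003ll, 0x1234567890ll);
}

/* Illustrative driver (an assumption, not taken from the testsuite).  */
void demo (void)
{
  AL[1] = 0;
  assert (check_ok () == 0);          /* value differs: no swap */
  assert (AL[1] == 0);

  AL[1] = 0x200000003ll;
  assert (check_ok () == 1);          /* value matches: swap happens */
  assert (AL[1] == 0x1234567890ll);
}
```

The comparison value does not fit in 32 bits, which is why the compare in the dump is done in DImode against a 64-bit constant.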
[Bug target/88833] [SVE] Redundant moves for WHILELO-based loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88833 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Thu Jul 4 06:48:42 2019 New Revision: 273040 URL: https://gcc.gnu.org/viewcvs?rev=273040&root=gcc&view=rev Log: 2019-07-04 Prathamesh Kulkarni PR target/88833 * fwprop.c (reg_single_def_p): New function. (propagate_rtx_1): Add unconditional else inside RTX_EXTRA case. (forward_propagate_into): New parameter reg_prop_only with default value false. Propagate def's src into loop only if SET_SRC and SET_DEST of def_set have single definitions. Likewise if reg_prop_only is set to true. (fwprop): New param fwprop_addr_p. Integrate fwprop_addr into fwprop. (fwprop_addr): Remove. (pass_rtl_fwprop_addr::execute): Call fwprop with arg set to true. (pass_rtl_fwprop::execute): Call fwprop with arg set to false. * simplify-rtx.c (simplify_subreg): Add case for vector comparison. * config/i386/sse.md (UNSPEC_BLENDV): Adjust pattern. testsuite/ * gfortran.dg/pr88833.f90: New test. Added: trunk/gcc/testsuite/gfortran.dg/pr88833.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/fwprop.c trunk/gcc/simplify-rtx.c trunk/gcc/testsuite/ChangeLog
[Bug target/90723] pr88598-2.c segfaults with -msve-vector-bits=256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90723 --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Sat Jul 13 08:28:33 2019 New Revision: 273466 URL: https://gcc.gnu.org/viewcvs?rev=273466&root=gcc&view=rev Log: 2019-07-15 Prathamesh Kulkarni PR target/90723 * recog.h (temporary_volatile_ok): New class. * config/aarch64/aarch64.c (aarch64_emit_sve_pred_move): Set volatile_ok temporarily to true using temporary_volatile_ok. * expr.c (emit_block_move_via_cpymem): Likewise. * optabs.c (maybe_legitimize_operand): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.c trunk/gcc/expr.c trunk/gcc/optabs.c trunk/gcc/recog.h
[Bug tree-optimization/86570] New: Conditional statement doesn't trigger sincos transform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86570 Bug ID: 86570 Summary: Conditional statement doesn't trigger sincos transform Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- For the following test-case: double f2(double x, double a, double b) { if (a == b) return __builtin_sin (a * x) + __builtin_cos (b * x); return 0; } Optimized dump with -O2 -ffast-math -funsafe-math-optimizations yields: f2 (double x, double a, double b) { double _1; double _2; double _3; double _4; double _5; double _9; [local count: 1073741825]: if (a_6(D) == b_7(D)) goto ; [34.00%] else goto ; [66.00%] [local count: 365072220]: _1 = a_6(D) * x_8(D); _2 = __builtin_sin (_1); _3 = b_7(D) * x_8(D); _4 = __builtin_cos (_3); _9 = _2 + _4; [local count: 1073741825]: # _5 = PHI <_9(3), 0.0(2)> return _5; } I assume the sincos transform would have been valid in the above case ? Similarly missed for the divmod transform. Thanks, Prathamesh
[Bug tree-optimization/86570] Conditional statement doesn't trigger sincos transform (with -ffast-math)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86570 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- AFAIU, the underlying issue doesn't seem particular to float. For example, there's a similar missed optimization with divmod transform: unsigned f(unsigned x, unsigned y, unsigned a, unsigned b) { if (a == b) { unsigned t1 = (a * x) / y; unsigned t2 = (b * x) % y; return t1 + t2; } return 0; } With -O2, optimized dump shows: f (unsigned int x, unsigned int y, unsigned int a, unsigned int b) { unsigned int t2; unsigned int t1; unsigned int _1; unsigned int _2; unsigned int _3; unsigned int _10; [local count: 1073741825]: if (a_4(D) == b_5(D)) goto ; [20.97%] else goto ; [79.03%] [local count: 225163661]: _1 = a_4(D) * x_6(D); t1_8 = _1 / y_7(D); _2 = b_5(D) * x_6(D); t2_9 = _2 % y_7(D); _10 = t1_8 + t2_9; [local count: 1073741825]: # _3 = PHI <_10(3), 0(2)> return _3; } I assume the divmod transform would be applicable in this case ? Thanks, Prathamesh
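To make the missed optimization concrete, the sketch below places the function from the comment next to a hand-written version that exploits a == b inside the guarded block. This is only an illustration of the equivalence the optimizer could use, not the code GCC generates; f_combined is a hypothetical name:

```c
#include <assert.h>

/* The function from the comment, unchanged in behaviour.  */
unsigned f (unsigned x, unsigned y, unsigned a, unsigned b)
{
  if (a == b)
    {
      unsigned t1 = (a * x) / y;
      unsigned t2 = (b * x) % y;
      return t1 + t2;
    }
  return 0;
}

/* Hand-written equivalent: since a == b, a * x and b * x are the same
   value, so one multiplication and one division suffice.  */
unsigned f_combined (unsigned x, unsigned y, unsigned a, unsigned b)
{
  if (a == b)
    {
      assert (y != 0);
      unsigned n = a * x;
      unsigned q = n / y;
      unsigned r = n - q * y;   /* remainder derived from the quotient */
      return q + r;
    }
  return 0;
}
```

Both versions agree for every input with y != 0, which is the property a combined divmod expansion would rely on.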
[Bug tree-optimization/80155] [7/8/9 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155 --- Comment #38 from prathamesh3492 at gcc dot gnu.org --- Hi, The issue can be reproduced exactly, with pr77445-2.c. I am testing with making is_digit() noinline. * Reordering SINK before PRE SPEC2006 data for building SPEC2006 with sink before pre: Number of statements sunk: +2677 (~ +14%) Number of total PRE insertions: -3971 (~ -1%) On the private embedded benchmark suite, there is no significant difference overall. I am not sure how helpful this is. Is there a way to get info about the number of registers spilled from the lra dump or assembly ? I would like to see the effect on spills of reordering passes. Reordering sink before pre seems to regress no-scevccp-outer-22.c and ssa-dom-thread-7.c, and several SVE tests on aarch64: http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/262002-sink-pre/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt Also there seems to be some interplay between hoisting and forwprop. Disabling forwprop3 and forwprop4 seems to eliminate the spill too. However as Bin pointed out on the list, forwprop is also helping to reduce register pressure for this case by mem_ref folding (forward_propagate_addr_expr). * Jump threading cost models It seems the jump-threading pass increases the size for this case from 38 to 79 blocks. I wonder whether that adds up to a "resource hog", eventually leading to the extra spill ? Disabling the jump threading pass eliminates the spill. I looked a bit into fine tuning jump threading cost models for cortex-m7. Strangely, setting max-jump-thread-duplication-stmts to 20 and fsm-scale-path-stmts to 3 not only removes the spill but also results in 9 more hoistings! I am investigating why this resulted in improved performance. 
However it regresses ssa-dom-thread-7.c: http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/262539-jump-thread-cost-models/aarch64-none-elf/diff-gcc-rh60-aarch64-none-elf-default-default-default.txt * Stop-gap measure for hoisting ? As a stop-gap measure, would it make sense to "localize" hoisting within a "large" loop (based on loop->num_nodes?) by refusing to hoist expressions computed outside the loop ? My assumption is that hoisting will increase the live range of an expression which was previously computed in a block outside the loop but is brought inside the loop due to hoisting, since we'd now need to consider the path along the loop as well when estimating its live range ? I suppose a cheap way to test that would be to check if the block's post-dominator also lies within the same loop, since it would ensure all paths from the block to EXIT would lie inside the loop ? I created a patch for this (http://people.linaro.org/~prathamesh.kulkarni/pdom.diff), which works to remove the spill but regressed pr77445-2.c (which is how I stumbled on that test). However, the underlying issue doesn't seem particularly relevant to hoisting, so I am not sure if this "heuristic" makes much sense. * Live range shrinking pass There was some discussion about an inter-block live-range shrinking GIMPLE pass on the list (https://gcc.gnu.org/ml/gcc/2018-05/msg00260.html), which will run just before expand. I would be grateful for suggestions on how to get started with it. I realize this'd be pretty hard, but would like to give it a try. Thanks, Prathamesh
[Bug tree-optimization/80155] [7/8/9 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155 --- Comment #40 from prathamesh3492 at gcc dot gnu.org --- ping https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155#c38 Thanks, Prathamesh
[Bug middle-end/87209] New: Wuninitialized or Wmaybe-uninitialized doesn't warn when malloc's return value is used without being initialized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87209 Bug ID: 87209 Summary: Wuninitialized or Wmaybe-uninitialized doesn't warn when malloc's return value is used without being initialized Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- No warnings are emitted for the following test-case: int f(void) { int *p = __builtin_malloc (sizeof (*p)); return *p; } I assume this should have been diagnosed with -Wuninitialized or -Wmaybe-uninitialized ? Thanks, Prathamesh
[Bug tree-optimization/84712] New: Missed evaluating to constant at tree level
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84712 Bug ID: 84712 Summary: Missed evaluating to constant at tree level Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, It seems GCC does not evaluate the following function to a constant at the tree level: int sum(void) { int a[] = {1, 2, 3, -1}; int x = 0; for (int i = 0; i < 4; i++) if (a[i] < 0) break; else x += a[i]; return x; } optimized dump shows: sum () { int x; int a[4]; int _25; int _33; int _41; [local count: 261993005]: MEM[(int *)&a] = { 1, 2, 3, -1 }; _25 = a[1]; if (_25 < 0) goto ; [7.91%] else goto ; [92.09%] [local count: 246744733]: x_30 = _25 + 1; _33 = a[2]; if (_33 < 0) goto ; [7.91%] else goto ; [92.09%] [local count: 232383926]: x_38 = x_30 + _33; _41 = a[3]; if (_41 < 0) goto ; [7.91%] else goto ; [92.09%] [local count: 47244641]: # x_17 = PHI goto ; [100.00%] [local count: 218858940]: x_10 = x_38 + _41; [local count: 261993005]: # x_2 = PHI a ={v} {CLOBBER}; return x_2; } However at RTL, cprop seems to do the constant folding and set return value register to 6. Thanks, Prathamesh
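The value the report expects at the tree level can be checked by hand: the loop adds 1, 2 and 3 and then breaks on -1. A minimal sketch follows, with sum_folded and check_sum as hypothetical names for the folded form and a self-check:

```c
#include <assert.h>

/* The function from the report.  */
int sum (void)
{
  int a[] = {1, 2, 3, -1};
  int x = 0;
  for (int i = 0; i < 4; i++)
    if (a[i] < 0)
      break;
    else
      x += a[i];
  return x;
}

/* What the function reduces to once the loop is evaluated at compile
   time: 1 + 2 + 3, stopping at the negative element.  */
int sum_folded (void)
{
  return 6;
}

void check_sum (void)
{
  assert (sum () == sum_folded ());
}
```

This matches the value 6 that cprop is reported to place in the return register at RTL.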
[Bug target/84759] Calculation of quotient and remainder with constant denominator uses __umoddi3+__udivdi3 instead of __udivmoddi4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84759 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #1 from prathamesh3492 at gcc dot gnu.org --- In the former case, the divmod transform takes place and we emit a call to __udivmoddi4. However it doesn't trigger for divmodConst, because we avoid handling constants in the transform since expand_divmod has specialized expansions for a few constants, which would otherwise have been missed. I suppose this could be somewhat improved. Thanks, Prathamesh
[Bug tree-optimization/85787] malloc_candidate_p fails to detect malloc attribute on nested phis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85787 --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Author: prathamesh3492 Date: Thu Oct 4 11:06:24 2018 New Revision: 264838 URL: https://gcc.gnu.org/viewcvs?rev=264838&root=gcc&view=rev Log: 2018-10-04 Prathamesh Kulkarni PR tree-optimization/85787 * ipa-pure-const.c (malloc_candidate_p_1): Move most of malloc_candidate_p into this function and add support for detecting multiple phis. (DUMP_AND_RETURN): Move from malloc_candidate_p into top-level macro. testsuite/ * gcc.dg/ipa/propmalloc-4.c: New test. Added: trunk/gcc/testsuite/gcc.dg/ipa/propmalloc-4.c Modified: trunk/gcc/ChangeLog trunk/gcc/ipa-pure-const.c trunk/gcc/testsuite/ChangeLog
[Bug tree-optimization/80155] [7/8/9 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155 --- Comment #42 from prathamesh3492 at gcc dot gnu.org --- Hi, This is another simpler approach I tried to apply a "cost-model" on hoisting before approaching a more general solution: http://people.linaro.org/~prathamesh.kulkarni/hoist-change-order.diff In this prototype patch, I changed the order of hoisting such that instead of hoisting an expression into the first candidate block, it hoists the expression one dominator at a time. For the pr77445-2.c test-case, str_225 + 1 gets hoisted into block 10 because it's the first candidate block found from the top-down dom-tree walk, which leaves little room for controlling hoisting. The patch forces expressions to be inserted one immediate dominator at a time instead of into the first candidate block. With this change, the following series of hoistings take place for str_225 + 1: Inserting expression in block 15 for code hoisting: {pointer_plus_expr,str_225,1} (0079) Inserting expression in block 14 for code hoisting: {pointer_plus_expr,str_225,1} (0079) Inserting expression in block 11 for code hoisting: {pointer_plus_expr,str_225,1} (0079) Inserting expression in block 10 for code hoisting: {pointer_plus_expr,str_225,1} (0079) Inserting expression in block 53 for code hoisting: {pointer_plus_expr,str_225,1} (0079) str_225 + 1 originally appears in blocks 16 and 17. It is then first hoisted into their predecessor block 15, then into block 14 and so on. The advantage I see with this order of hoisting is that we can control hoisting after each insertion into the immediate dominator. So for instance if, according to our cost model, we reach the "hoisting threshold" after say block 14, we can then prevent further hoistings of str_225 + 1. Whereas with the current approach it gets hoisted right up to block 10 initially. Alternatively we could try to "sink" the expression down to dominated blocks. I didn't explore this option yet. 
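The stepwise order described above can be sketched as a walk up the dominator tree that stops at a threshold. This is a toy model, not code from the patch: the idom array, the function names and the fixed insertion limit standing in for a cost model are all assumptions, and the chain 15 -> 14 -> 11 -> 10 -> 53 is taken from the dump on the assumption that each block is the immediate dominator of the previous one.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64

/* Walk up the dominator tree from START, recording one insertion per
   block.  Stop after LIMIT insertions (a stand-in for a cost model)
   or after the root, whose idom entry is -1.  Returns the number of
   insertions written to INSERTED.  */
size_t hoist_stepwise (const int *idom, int start, size_t limit,
                       int *inserted)
{
  size_t n = 0;
  for (int bb = start; bb >= 0 && n < limit; bb = idom[bb])
    {
      assert (bb < MAX_BLOCKS);
      inserted[n++] = bb;
    }
  return n;
}

/* Toy dominator chain matching the insertion order in the dump.  */
void init_example_idom (int *idom)
{
  for (int i = 0; i < MAX_BLOCKS; i++)
    idom[i] = -1;
  idom[15] = 14;
  idom[14] = 11;
  idom[11] = 10;
  idom[10] = 53;
}
```

With a limit of 2 the walk records blocks 15 and 14 and stops, which corresponds to the "threshold reached after block 14" case; with a large limit it reproduces all five insertions.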
* Cost model for hoisting The cost model would be entirely target specific, defined by a target hook, and shouldn't affect other architectures that don't wish to use it. I suppose a very simple cost model for hoisting could take the following two factors into account: a) Number of hoistings of a particular expression measured in terms of dominator depth - This is recorded by expr_hoist_map, a map whose key is the value number of the pre_expr and whose value is the count. b) Number of insertions in a basic block - This is recorded by a map whose key is the block index and whose value is the count. I didn't attempt to define the cost model in the patch. I was wondering what other potential factors we could consider ? * Issues with changing hoisting order I am not entirely sure whether changing the hoisting order can result in correctness issues or missed optimizations ? For some confidence, I validated the patch with bootstrap+test on x86_64, which worked. There are two problems I see: (1) Interference with statistics of hoisting, which is easy to fix. (2) Does not honor the "expression should be available in at least one successor" constraint, which leads to more aggressive hoisting for architectures that will not use the cost model. In the example above, str_225 + 1 got hoisted one block further, up to block 53, while with the current order it's restricted to block 10. I suppose we could fix this by recording which expressions were originally available at the end of the block ? * Hoistings crossing loop boundary - One "peculiarity" I see with the FMS function in pr77445-2.c is that all the hoistings cross loop boundaries at one point, while other tests have significantly fewer. 
I did a quick test with SPEC2006 to collect some data: (number-of-hoistings vs number-of-functions) {2: 89, 1: 166, 3: 37, 4: 14, 5: 8, 6: 10, 7: 11, 8: 2, 13: 3, 10: 1, 11: 5, 9: 4, 17: 1, 15: 2, 27: 1, 12: 2, 18: 1, 21: 1, 26: 1} It seems most of the functions have fewer than 5 cross-loop hoistings, with 166 functions having one such hoisting and 89 functions having two. I was wondering if a hoisting into a block from its successor should have an "extra penalty" if it crosses a loop boundary ? Or does hoisting inside a loop have no effect on register pressure ? Thanks, Prathamesh
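The "most functions have fewer than 5" observation can be checked directly from the histogram in the comment. The sketch below only re-adds the posted numbers; the struct and function names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

struct bucket { unsigned hoistings; unsigned functions; };

/* Histogram copied from the comment, in the order it was posted.  */
static const struct bucket spec2006_hist[] = {
  {2, 89}, {1, 166}, {3, 37}, {4, 14}, {5, 8}, {6, 10}, {7, 11},
  {8, 2}, {13, 3}, {10, 1}, {11, 5}, {9, 4}, {17, 1}, {15, 2},
  {27, 1}, {12, 2}, {18, 1}, {21, 1}, {26, 1}
};

/* Number of functions whose hoisting count is strictly below LIMIT.  */
unsigned functions_below (unsigned limit)
{
  unsigned total = 0;
  size_t n = sizeof spec2006_hist / sizeof spec2006_hist[0];
  for (size_t i = 0; i < n; i++)
    if (spec2006_hist[i].hoistings < limit)
      total += spec2006_hist[i].functions;
  assert (total <= 359);   /* 359 is the sum of all posted buckets */
  return total;
}
```

From these numbers, 306 of the 359 listed functions (roughly 85%) have fewer than 5 such hoistings.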
[Bug tree-optimization/80155] [7/8/9 regression] Performance regression with code hoisting enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155 --- Comment #43 from prathamesh3492 at gcc dot gnu.org --- Sorry for duplications / formatting errors in previous comment. Is there a way to edit posted comments ? Thanks, Prathamesh
[Bug target/87920] New: Lots of regression tests fail with bootstrap build of arm-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87920 Bug ID: 87920 Summary: Lots of regression tests fail with bootstrap build of arm-linux-gnueabihf Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: prathamesh3492 at gcc dot gnu.org Target Milestone: --- Hi, It seems lots of tests are failing with bootstrap build of arm-linux-gnueabihf with following ICE: during GIMPLE pass: ldist /home/prathamesh.kulkarni/gnu-toolchain/gcc/tcwg-319-4/gcc/gcc/testsuite/c-c++-common/torture/pr53505.c: In function 'main': /home/prathamesh.kulkarni/gnu-toolchain/gcc/tcwg-319-4/gcc/gcc/testsuite/c-c++-common/torture/pr53505.c:29:1: internal compiler error: Segmentation fault 0x5e83cb crash_signal ../../gcc/gcc/toplev.c:325 0x660e57 inchash::hash::add(void const*, unsigned int) ../../gcc/gcc/inchash.h:100 0x660e57 inchash::hash::add_ptr(void const*) ../../gcc/gcc/inchash.h:94 0x660e57 ddr_hasher::hash(data_dependence_relation const*) ../../gcc/gcc/tree-loop-distribution.c:143 0x660e57 hash_table::find_slot(data_dependence_relation* const&, insert_option) ../../gcc/gcc/hash-table.h:414 0x660e57 get_data_dependence ../../gcc/gcc/tree-loop-distribution.c:1184 0x66157b pg_add_dependence_edges ../../gcc/gcc/tree-loop-distribution.c:1890 0x66157b build_partition_graph ../../gcc/gcc/tree-loop-distribution.c:2107 0x66180f merge_dep_scc_partitions ../../gcc/gcc/tree-loop-distribution.c:2171 0x662e69 distribute_loop ../../gcc/gcc/tree-loop-distribution.c:2892 0x66416d execute ../../gcc/gcc/tree-loop-distribution.c:3133 Several tests fail with above ICE, like pr53505.c, 20131115-1.c, 20181024-1.c etc. Thanks, Prathamesh
[Bug target/87920] Lots of regression tests fail with bootstrap build of arm-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87920 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from prathamesh3492 at gcc dot gnu.org --- Likely yes, thanks for the pointer! I will mark this as dup. Thanks, Prathamesh *** This bug has been marked as a duplicate of bug 87899 ***
[Bug middle-end/87899] [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899 prathamesh3492 at gcc dot gnu.org changed: What|Removed |Added CC||prathamesh3492 at gcc dot gnu.org --- Comment #4 from prathamesh3492 at gcc dot gnu.org --- *** Bug 87920 has been marked as a duplicate of this bug. ***