> On Sat, 2 Nov 2024, Sam James wrote:
> 
> > Some references to feed discussion which I had in my logs from
> > discussing it with someone the other week after that interaction we
> > had with alanc:
> > * https://github.com/llvm/llvm-project/issues/28790 (not very
> >   insightful, other than to show it has a history of confusing people, but
> >   one might argue that for other (definitely-valid) optimisations)
> > * https://github.com/llvm/llvm-project/issues/49826
> > * https://github.com/llvm/llvm-project/issues/51476
> 
> Thank you. Indeed
> 
> I believe in this situation the onus is on us to prove that such transforms
> are correct.
> 
> Exhibit A:
> 
> POSIX semantics for malloc involve errno.
> So if I can check errno to see if malloc failed, I guess even our current
> behaviour of optimizing away paired malloc+free calls provided that the
> return value is unused is problematic under POSIX the same way as the
> proposed patch.
> 
> Exhibit B:
> 
> malloc is supposed to give unique pointers (without any intervening free's).
> However, the proposed patch should be powerful enough to transform
> 
> void f()
> {
>   void *p = __builtin_malloc(1);
>   if (p) {
>     asm("");
>     f();
>   }
>   __builtin_free(p);
> }
> 
> to an infinite loop, but an infinite amount of unique pointers is impossible.
> 
> In a similar vein, five mallocs, each allocating a quarter of address space,
> also seems impossible.
> 
> I am reminded that we take a pretty strict stance against enabling
> -fno-math-errno -fno-trapping-math, even though they gate a lot of useful
> vectorization. So it's a bit weird we're going so gung-ho with this malloc
> elimination.

The attached patch adds code to track the size of the allocated block and
disables the transformation when the block is not known (by ranger) to be
smaller than half of the address space.  We can add the runtime check
discussed earlier on top of that.  I have bootstrap & regtest running (it
will still need testsuite compensation).

However, I am not convinced this is a good direction.  This patch will still
break the example where one allocates 5x 1/4th of the address space, leaves
the blocks unused, and verifies that at least one of the allocations fails
(a sketch of such a testcase follows the ChangeLog).

Reading the discussions and thinking of practical use cases, it still seems
to me that portable C code cannot do anything useful by probing for malloc
failures this way.  One cannot rely on the fact that once an allocation
succeeded it will succeed again later, since the OS may give the memory to
something else in the meantime.

For things like the glibc testsuite there is -fno-builtin-malloc, and I
think the testsuite should use it (also for the other functions we recognize
as builtins), since such tests are very likely not expecting the compiler to
change one function call into a different one, such as
printf ("Hello %i", 1) into puts ("Hello 1").

I can add -fno-malloc-dce, but I still think it would make sense to do this
transformation by default even for C code.  If we decide that malloc DCE is
not safe by default, perhaps we want something like
__builtin_optimizable_malloc (to pair with __builtin_operator_new) to be
used in cases where we do not care about malloc failing after allocating,
but never using, a large part of the address space.  Non-C languages (such
as Fortran) may then use it by default.

In the longer term I think we may want to do other optimizations of heap
allocations, such as merging multiple malloc'ed blocks, moving small
allocations to the stack, or increasing alignment (turning malloc into
aligned allocation when it helps vectorization, just as we do with static
variables).

Honza

gcc/ChangeLog:

	* tree-ssa-dce.cc (is_removable_allocation_p): Break out from ...;
	also determine the size of the allocated block.
	(mark_stmt_if_obviously_necessary): ... here.
	(checks_return_value_of_removable_allocation_p): New function.
	(mark_all_reaching_defs_necessary_1): Do not propagate through
	null pointer checks of allocations.
	(propagate_necessity): Likewise.
	(eliminate_unnecessary_stmts): Fold away null pointer checks
	of removed allocations.
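To make the problematic pattern concrete, here is a sketch of the kind of
probing testcase this patch still breaks (my illustration, not a testcase
included in the patch):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  void *p[5];
  int failed = 0;

  /* Each block is a quarter of the address space, so each one
     individually passes a "smaller than half of the address space"
     check, yet all five cannot succeed at once.  */
  for (int i = 0; i < 5; i++)
    {
      p[i] = malloc (SIZE_MAX / 4);
      if (p[i] == NULL)
        failed = 1;
    }

  /* Portable code would expect failed == 1 here; once the unused
     allocations are removed and their null checks folded to true,
     it observes failed == 0.  */
  printf ("%d\n", failed);

  for (int i = 0; i < 5; i++)
    free (p[i]);
  return 0;
}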
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index c44eff35b84..82a5b665ce9 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -240,6 +240,114 @@ mark_operand_necessary (tree op)
       worklist.safe_push (stmt);
 }
 
+/* Return true if STMT is a call to an allocation function that can be
+   optimized out when the memory block is never used and only freed.
+
+   If true is returned and size1/size2 are non-NULL, they are initialized
+   to expressions computing the allocated block size, which is later
+   compared with half of the address space.  */
+
+static bool
+is_removable_allocation_p (gcall *stmt, tree *size1 = NULL, tree *size2 = NULL)
+{
+  tree callee = gimple_call_fndecl (stmt);
+  if (callee != NULL_TREE
+      && fndecl_built_in_p (callee, BUILT_IN_NORMAL))
+    switch (DECL_FUNCTION_CODE (callee))
+      {
+      case BUILT_IN_MALLOC:
+      case BUILT_IN_ALIGNED_ALLOC:
+      CASE_BUILT_IN_ALLOCA:
+      case BUILT_IN_GOMP_ALLOC:
+        if (size1)
+          {
+            *size1 = gimple_call_arg (stmt, 0);
+            *size2 = NULL;
+          }
+        return true;
+      case BUILT_IN_CALLOC:
+        if (size1)
+          {
+            *size1 = gimple_call_arg (stmt, 0);
+            *size2 = gimple_call_arg (stmt, 1);
+          }
+        return true;
+      case BUILT_IN_STRDUP:
+      case BUILT_IN_STRNDUP:
+        if (size1)
+          *size1 = *size2 = NULL_TREE;
+        return true;
+
+      default:;
+      }
+
+  if (callee != NULL_TREE
+      && flag_allocation_dce
+      && gimple_call_from_new_or_delete (stmt)
+      && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee))
+    {
+      /* C++ standard allows removal of paired new/delete operators.  */
+      if (size1)
+        *size1 = *size2 = NULL_TREE;
+      return true;
+    }
+  return false;
+}
+
+/* Return true if STMT is a conditional
+      if (ptr != NULL)
+   where ptr was returned by a removable allocation function
+   and this check can be safely assumed to be true.  */
+
+static bool
+checks_return_value_of_removable_allocation_p (gimple *stmt)
+{
+  gcall *def_stmt;
+  tree size1, size2;
+  if (gimple_code (stmt) == GIMPLE_COND
+      && (gimple_cond_code (stmt) == EQ_EXPR
+          || gimple_cond_code (stmt) == NE_EXPR)
+      && integer_zerop (gimple_cond_rhs (stmt))
+      && TREE_CODE (gimple_cond_lhs (stmt)) == SSA_NAME
+      && (def_stmt = dyn_cast <gcall *>
+                       (SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt))))
+      && is_removable_allocation_p (def_stmt, &size1, &size2))
+    {
+      if (size1 && integer_onep (size1))
+        {
+          size1 = size2;
+          size2 = NULL_TREE;
+        }
+      if (size2 && integer_onep (size2))
+        size2 = NULL_TREE;
+      if (!size1 && !size2)
+        return true;
+      int_range_max vr;
+      if (!get_range_query (cfun)->range_of_expr (vr, size1, stmt)
+          || vr.undefined_p ()
+          || vr.varying_p ())
+        return false;
+      if (!size2)
+        return wi::leu_p (vr.upper_bound (),
+                          wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node)));
+      else
+        {
+          int_range_max vr2;
+          if (!get_range_query (cfun)->range_of_expr (vr2, size2, stmt)
+              || vr2.undefined_p ()
+              || vr2.varying_p ())
+            return false;
+          wi::overflow_type overflow;
+          wide_int size = wi::umul (vr.upper_bound (),
+                                    vr2.upper_bound (), &overflow);
+          if (overflow)
+            return false;
+          return wi::leu_p (size,
+                            wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node)));
+        }
+    }
+  return false;
+}
+
 /* Mark STMT as necessary if it obviously is.  Add it to the worklist if
    it can make other statements necessary.
 
@@ -271,38 +379,23 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
 
     case GIMPLE_CALL:
       {
+	gcall *call = as_a <gcall *> (stmt);
+
 	/* Never elide a noreturn call we pruned control-flow for.  */
-	if ((gimple_call_flags (stmt) & ECF_NORETURN)
-	    && gimple_call_ctrl_altering_p (stmt))
+	if ((gimple_call_flags (call) & ECF_NORETURN)
+	    && gimple_call_ctrl_altering_p (call))
 	  {
-	    mark_stmt_necessary (stmt, true);
+	    mark_stmt_necessary (call, true);
 	    return;
 	  }
 
-	tree callee = gimple_call_fndecl (stmt);
-	if (callee != NULL_TREE
-	    && fndecl_built_in_p (callee, BUILT_IN_NORMAL))
-	  switch (DECL_FUNCTION_CODE (callee))
-	    {
-	    case BUILT_IN_MALLOC:
-	    case BUILT_IN_ALIGNED_ALLOC:
-	    case BUILT_IN_CALLOC:
-	    CASE_BUILT_IN_ALLOCA:
-	    case BUILT_IN_STRDUP:
-	    case BUILT_IN_STRNDUP:
-	    case BUILT_IN_GOMP_ALLOC:
-	      return;
-
-	    default:;
-	    }
-	if (callee != NULL_TREE
-	    && flag_allocation_dce
-	    && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee))
+	if (is_removable_allocation_p (call))
 	  return;
+
 	/* For __cxa_atexit calls, don't mark as necessary right away.  */
-	if (is_removable_cxa_atexit_call (stmt))
+	if (is_removable_cxa_atexit_call (call))
 	  return;
 
 	/* IFN_GOACC_LOOP calls are necessary in that they are used to
@@ -311,9 +404,9 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
 	   survive from aggressive loop removal for it has loop exit and
 	   is assumed to be finite.  Therefore, we need to explicitly mark
 	   these calls.  (An example is libgomp.oacc-c-c++-common/pr84955.c) */
-	if (gimple_call_internal_p (stmt, IFN_GOACC_LOOP))
+	if (gimple_call_internal_p (call, IFN_GOACC_LOOP))
 	  {
-	    mark_stmt_necessary (stmt, true);
+	    mark_stmt_necessary (call, true);
 	    return;
 	  }
 	break;
@@ -667,6 +760,8 @@ mark_all_reaching_defs_necessary_1 (ao_ref *ref ATTRIBUTE_UNUSED,
 	  case BUILT_IN_ALIGNED_ALLOC:
 	  case BUILT_IN_CALLOC:
 	  CASE_BUILT_IN_ALLOCA:
+	  case BUILT_IN_STRDUP:
+	  case BUILT_IN_STRNDUP:
 	  case BUILT_IN_FREE:
 	  case BUILT_IN_GOMP_ALLOC:
 	  case BUILT_IN_GOMP_FREE:
@@ -891,19 +986,11 @@ propagate_necessity (bool aggressive)
 	    {
 	      tree ptr = gimple_call_arg (stmt, 0);
 	      gcall *def_stmt;
-	      tree def_callee;
 	      /* If the pointer we free is defined by an allocation
 		 function do not add the call to the worklist.  */
 	      if (TREE_CODE (ptr) == SSA_NAME
 		  && (def_stmt = dyn_cast <gcall *> (SSA_NAME_DEF_STMT (ptr)))
-		  && (def_callee = gimple_call_fndecl (def_stmt))
-		  && ((DECL_BUILT_IN_CLASS (def_callee) == BUILT_IN_NORMAL
-		       && (DECL_FUNCTION_CODE (def_callee) == BUILT_IN_ALIGNED_ALLOC
-			   || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_MALLOC
-			   || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_CALLOC
-			   || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_GOMP_ALLOC))
-		      || (DECL_IS_REPLACEABLE_OPERATOR_NEW_P (def_callee)
-			  && gimple_call_from_new_or_delete (def_stmt))))
+		  && is_removable_allocation_p (def_stmt))
 		{
 		  if (is_delete_operator
 		      && !valid_new_delete_pair_p (def_stmt, stmt))
@@ -925,6 +1012,9 @@ propagate_necessity (bool aggressive)
 		}
 	    }
 
+	  if (checks_return_value_of_removable_allocation_p (stmt))
+	    continue;
+
 	  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
 	    mark_operand_necessary (use);
 
@@ -1379,7 +1469,6 @@ eliminate_unnecessary_stmts (bool aggressive)
   basic_block bb;
   gimple_stmt_iterator gsi, psi;
   gimple *stmt;
-  tree call;
   auto_vec<edge> to_remove_edges;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -1448,6 +1537,23 @@ eliminate_unnecessary_stmts (bool aggressive)
 		  gimple_set_plf (stmt, STMT_NECESSARY, false);
 		}
 	    }
+	  /* A conditional checking that the return value of an allocation
+	     is non-NULL can be turned into a constant if the allocation
+	     itself is unnecessary.  */
+	  if (gimple_plf (stmt, STMT_NECESSARY)
+	      && gimple_code (stmt) == GIMPLE_COND
+	      && TREE_CODE (gimple_cond_lhs (stmt)) == SSA_NAME)
+	    {
+	      gimple *def_stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
+	      if (!gimple_nop_p (def_stmt)
+		  && !gimple_plf (def_stmt, STMT_NECESSARY))
+		{
+		  gcc_checking_assert
+		    (checks_return_value_of_removable_allocation_p (stmt));
+		  gimple_cond_set_lhs (as_a <gcond *> (stmt), integer_one_node);
+		  update_stmt (stmt);
+		}
+	    }
 
 	  /* If GSI is not necessary then remove it.  */
 	  if (!gimple_plf (stmt, STMT_NECESSARY))
@@ -1482,11 +1588,11 @@ eliminate_unnecessary_stmts (bool aggressive)
 	      remove_dead_stmt (&gsi, bb, to_remove_edges);
 	      continue;
 	    }
-	  else if (is_gimple_call (stmt))
+	  else if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
 	    {
-	      tree name = gimple_call_lhs (stmt);
+	      tree name = gimple_call_lhs (call_stmt);
 
-	      notice_special_calls (as_a <gcall *> (stmt));
+	      notice_special_calls (call_stmt);
 
 	      /* When LHS of var = call (); is dead,
		 simplify it into call (); saving one operand.  */
@@ -1496,36 +1602,30 @@ eliminate_unnecessary_stmts (bool aggressive)
 		  /* Avoid doing so for allocation calls which we
 		     did not mark as necessary, it will confuse the
 		     special logic we apply to malloc/free pair removal.  */
-		  && (!(call = gimple_call_fndecl (stmt))
-		      || ((DECL_BUILT_IN_CLASS (call) != BUILT_IN_NORMAL
-			   || (DECL_FUNCTION_CODE (call) != BUILT_IN_ALIGNED_ALLOC
-			       && DECL_FUNCTION_CODE (call) != BUILT_IN_MALLOC
-			       && DECL_FUNCTION_CODE (call) != BUILT_IN_CALLOC
-			       && !ALLOCA_FUNCTION_CODE_P
-				     (DECL_FUNCTION_CODE (call))))
-			  && !DECL_IS_REPLACEABLE_OPERATOR_NEW_P (call)))
+		  && !is_removable_allocation_p (call_stmt))
 		{
 		  something_changed = true;
 		  if (dump_file && (dump_flags & TDF_DETAILS))
 		    {
 		      fprintf (dump_file, "Deleting LHS of call: ");
-		      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+		      print_gimple_stmt (dump_file, call_stmt, 0, TDF_SLIM);
 		      fprintf (dump_file, "\n");
 		    }
-		  gimple_call_set_lhs (stmt, NULL_TREE);
-		  maybe_clean_or_replace_eh_stmt (stmt, stmt);
-		  update_stmt (stmt);
+		  gimple_call_set_lhs (call_stmt, NULL_TREE);
+		  maybe_clean_or_replace_eh_stmt (call_stmt, call_stmt);
+		  update_stmt (call_stmt);
 		  release_ssa_name (name);
 
 		  /* GOMP_SIMD_LANE (unless three argument) or ASAN_POISON
		     without lhs is not needed.  */
-		  if (gimple_call_internal_p (stmt))
-		    switch (gimple_call_internal_fn (stmt))
+		  if (gimple_call_internal_p (call_stmt))
+		    switch (gimple_call_internal_fn (call_stmt))
 		      {
 		      case IFN_GOMP_SIMD_LANE:
-			if (gimple_call_num_args (stmt) >= 3
-			    && !integer_nonzerop (gimple_call_arg (stmt, 2)))
+			if (gimple_call_num_args (call_stmt) >= 3
+			    && !integer_nonzerop
+				  (gimple_call_arg (call_stmt, 2)))
 			  break;
 			/* FALLTHRU */
 		      case IFN_ASAN_POISON:
@@ -1535,8 +1635,8 @@ eliminate_unnecessary_stmts (bool aggressive)
 			break;
 		      }
 		}
-	      else if (gimple_call_internal_p (stmt))
-		switch (gimple_call_internal_fn (stmt))
+	      else if (gimple_call_internal_p (call_stmt))
+		switch (gimple_call_internal_fn (call_stmt))
 		  {
 		  case IFN_ADD_OVERFLOW:
 		    maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
 		    break;
@@ -1548,11 +1648,11 @@ eliminate_unnecessary_stmts (bool aggressive)
 		    maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
 		    break;
 		  case IFN_UADDC:
-		    if (integer_zerop (gimple_call_arg (stmt, 2)))
+		    if (integer_zerop (gimple_call_arg (call_stmt, 2)))
 		      maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
 		    break;
 		  case IFN_USUBC:
-		    if (integer_zerop (gimple_call_arg (stmt, 2)))
+		    if (integer_zerop (gimple_call_arg (call_stmt, 2)))
 		      maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
 		    break;
 		  default:
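To illustrate the intended end result of the new null-check folding, here is
a toy example (again my sketch of the intent, not a testcase from the patch;
g () is a hypothetical call that keeps the branch alive):

#include <stdlib.h>

extern void g (void);

void
f (void)
{
  void *p = malloc (16);  /* recognized by is_removable_allocation_p */
  if (p != NULL)          /* matched by
                             checks_return_value_of_removable_allocation_p;
                             folded to true once the malloc is dead */
    g ();
  free (p);               /* malloc/free pair removed as before */
}

/* With the patch, the whole body of f reduces to the call g ();.  */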