On Thu, Apr 28, 2016 at 8:36 PM, Bernd Schmidt <bschm...@redhat.com> wrote: > On 01/18/2016 10:22 AM, Richard Biener wrote: >> >> See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171 - the >> inline expansion >> for small sizes and equality compares should be done on GIMPLE. Today the >> strlen pass might be an appropriate place to do this given its >> superior knowledge >> about string lengths. >> >> The idea of turning eq feeding memcmp into a special memcmp_eq is good but >> you have to avoid doing that too early - otherwise you'd lose on >> >> res = memcmp (p, q, sz); >> if (memcmp (p, q, sz) == 0) >> ... >> >> that is, you have to make sure CSE got the chance to common the two calls. >> This is why I think this kind of transform needs to happen in specific >> places >> (like during strlen opt) rather than in generic folding. > > > Ok, here's an update. I kept pieces of your patch from that PR, but also > translating memcmps larger than a single operation into memcmp_eq as in my > previous patch. > > Then, I added by_pieces infrastructure for memcmp expansion. To avoid any > more code duplication in this area, I abstracted the existing code and > converted it to C++ classes since that seemed to fit pretty well. > > There are a few possible ways I could go with this, which is why I'm posting > it more as a RFD at this point. > - should store_by_pieces be eliminated in favour of doing just > move_by_pieces with constfns? > - the C++ification could be extended, making move_by_pieces_d and > compare_by_pieces_d classes inheriting from a common base. This > would get rid of the callbacks, replacing them with virtuals, > and also make some of the current struct members private. > - could move all of the by_pieces stuff out into a separate file? > > Later, I think we'll also want to extend this to allow vector mode > operations, but I think that's a separate patch. > > So, opinions what I should be doing with this patch? FWIW it bootstraps and > tests OK on x86_64-linux.
+struct pieces_addr +{ ... + void *m_cfndata; +public: + pieces_addr (rtx, bool, by_pieces_constfn, void *); unless you strick private: somewhere the public: is redundant Index: gcc/tree-ssa-forwprop.c =================================================================== --- gcc/tree-ssa-forwprop.c (revision 235474) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -1435,6 +1435,76 @@ simplify_builtin_call (gimple_stmt_itera } } break; + + case BUILT_IN_MEMCMP: + { I think this doesn't belong in forwprop. If we want to stick it into a pass rather than folding it should be in tree-ssa-strlen.c. + if (tree_fits_uhwi_p (len) + && (leni = tree_to_uhwi (len)) <= GET_MODE_SIZE (word_mode) + && exact_log2 (leni) != -1 + && (align1 = get_pointer_alignment (arg1)) >= leni * BITS_PER_UNIT + && (align2 = get_pointer_alignment (arg2)) >= leni * BITS_PER_UNIT) + { + location_t loc = gimple_location (stmt2); + tree type, off; + type = build_nonstandard_integer_type (leni * BITS_PER_UNIT, 1); + gcc_assert (GET_MODE_SIZE (TYPE_MODE (type)) == leni); + off = build_int_cst + (build_pointer_type_for_mode (char_type_node, ptr_mode, true), 0); + arg1 = build2_loc (loc, MEM_REF, type, arg1, off); + arg2 = build2_loc (loc, MEM_REF, type, arg2, off); + res = fold_convert_loc (loc, TREE_TYPE (res), + fold_build2_loc (loc, NE_EXPR, + boolean_type_node, + arg1, arg2)); + gimplify_and_update_call_from_tree (gsi_p, res); + return true; + } for this part see gimple_fold_builtin_memory_op handling of /* If we can perform the copy efficiently with first doing all loads and then all stores inline it that way. Currently efficiently means that we can load all the memory into a single integer register which is what MOVE_MAX gives us. */ we might want to share a helper yielding the type of the load/store or NULL if not possible. Note that we can handle size-1 memcmp even for ordered compares. Jakub, where do you think this fits best? Note that gimple-fold.c may not use immediate uses but would have to match this from the comparison (I still have to find a way to handle this in match.pd where the result expression contains virtual operands in the not toplevel stmt). Richard. > > Bernd