On Mon, 20 Jan 2014, Jakub Jelinek wrote: > On Mon, Jan 20, 2014 at 01:35:05PM +0100, Richard Biener wrote: > > > tree-ssa-strlen.c apparently both doesn't this case (unknown first strlen, > > > known second strlen or movstr pattern, will only transform that if the > > > length of the resulting string is needed afterwards), and isn't run > > > for -Os or -O1 anyway. > > > > Well, I'm not sure under which circumstances this should be an > > unconditional win anyway (I expect the strcat library fn to be > > optimized well enough, and only if you can avoid the strlen on the > > dest in the end it will be profitable) > > No, while the strcat library fn can be very optimized, it still has no info > about how long the second parameter is. strcat is implementation is > typically an optimized strchr (dst, 0) followed by an optimized strcpy, > where the two can perhaps avoid some alignment adjustments or something. > But the strcpy still has to for each word or whatever chunk it reads test > for terminating zeros, while if you do an (optimized) strlen followed by > memcpy where you already know the length, that is a win.
Well, strcat itself can do that ... but yes, as I said, if you can CSE that strlen call then it's a win. But you can't know this without more context. > > > But I guess if we optimize it again, your testcase would crash again, > > > right? > > > > Right. We can apply the optimization at RTL expansion time though, > > or handle the folding completely in gimple-fold with not needing to > > dispatch to the gimplifier. > > After playing with the testcase in a debugger, my strong preference at > least for the 4.8 branch would be just a global flag (or context flag) to > prevent > the nested folding. I think the only problematic thing is what starts with > the avoid_folding_inline_builtin check in gimple_fold_builtin, and we should > just prevent that from happening when called from within > gimplify_and_update_call_from_tree (or just during that call when called > from gimple_fold_call?). Yes, that is the piece that is the problem (iff otherwise recursive folding would never fold anything). > Normally, if folding of a builtin folds it into a call to some other > builtin, that other builtin is folded right away, so the common case is > optimized immediately, the only problem is if gimple_fold_builtin tries > harder to optimize using maximum lengths or exact length (as in this case). > And, for this it wouldn't even help if we handled STRCAT/STRCAT_CHK > specially too and passed the src length to the folder routine, because > gimple_fold_builtin first folds normally and only if that fails, attempts > harder. But we still can re-introduce the folding I removed from builtins.c below the avoid_folding_inline_builtin section as in that case builtins.c didn't do the folding and the gimple folding has strictly more information. No? I don't really like a special flag ... if, then I'd rather have a flag to tell the gimplfier not to fold stmts which we can set on the gimpifier context we push in gimplify_and_update_call_from_tree. Btw, I already applied the patch (I can of course revert it if we arrive at a better solution). I don't think I removed an important optimization (in fact usually it will be a pessimization if the strlen cannot be re-used - sth tree-ssa-strlen should be able to figure out). Richard.