On Mon, Jan 20, 2014 at 01:35:05PM +0100, Richard Biener wrote: > > tree-ssa-strlen.c apparently both doesn't this case (unknown first strlen, > > known second strlen or movstr pattern, will only transform that if the > > length of the resulting string is needed afterwards), and isn't run > > for -Os or -O1 anyway. > > Well, I'm not sure under which circumstances this should be an > unconditional win anyway (I expect the strcat library fn to be > optimized well enough, and only if you can avoid the strlen on the > dest in the end it will be profitable)
No, while the strcat library fn can be very optimized, it still has no info about how long the second parameter is. strcat is implementation is typically an optimized strchr (dst, 0) followed by an optimized strcpy, where the two can perhaps avoid some alignment adjustments or something. But the strcpy still has to for each word or whatever chunk it reads test for terminating zeros, while if you do an (optimized) strlen followed by memcpy where you already know the length, that is a win. > > But I guess if we optimize it again, your testcase would crash again, right? > > Right. We can apply the optimization at RTL expansion time though, > or handle the folding completely in gimple-fold with not needing to > dispatch to the gimplifier. After playing with the testcase in a debugger, my strong preference at least for the 4.8 branch would be just a global flag (or context flag) to prevent the nested folding. I think the only problematic thing is what starts with the avoid_folding_inline_builtin check in gimple_fold_builtin, and we should just prevent that from happening when called from within gimplify_and_update_call_from_tree (or just during that call when called from gimple_fold_call?). Normally, if folding of a builtin folds it into a call to some other builtin, that other builtin is folded right away, so the common case is optimized immediately, the only problem is if gimple_fold_builtin tries harder to optimize using maximum lengths or exact length (as in this case). And, for this it wouldn't even help if we handled STRCAT/STRCAT_CHK specially too and passed the src length to the folder routine, because gimple_fold_builtin first folds normally and only if that fails, attempts harder. Jakub