Hi! Note the intent of the pass is to handle the most common cases, it is fine if some cases that aren't common aren't handled, it is all about the extra complexity vs. how much it helps on real-world code.
On Sun, May 07, 2017 at 10:10:48AM +0100, Richard Sandiford wrote: > I've got most of the way through a version that uses min_length instead. > But one thing that the terminated flag allows that a constant min_length > doesn't is: > > size_t > f1 (char *a1) > { > size_t x = strlen (a1); > char *a3 = a1 + x; > a3[0] = '1'; // a1 length x + 1, unterminated (min length x + 1) > a3[1] = '2'; // a1 length x + 2, unterminated (min length x + 2) > a3[2] = '3'; // a1 length x + 3, unterminated (min length x + 3) > a3[3] = 0; // a1 length x + 3, terminated > return strlen (a1); > } > > For the a3[3] = 0, we know a3's min_length is 3 and so it's obvious > that we can convert its min_length to a length. But even if we allow > a1's min_length to be nonconstant, it seems a bit risky to assume that > we can convert its min_length to a length as well. It would only work > if the min_lengths in related strinfos are kept in sync, whereas it > ought to be safe to say that the minimum length of something is 0. And we have code for that. If verify_related_strinfos returns non-NULL, we can adjust all the related strinfos that need adjusting. See e.g. zero_length_string on how it uses that. It is just that we should decide what is the acceptable complexity of the length/min_length expressions (whether INTEGER_CST or SSA_NAME is enough, then the above would not work, but is that really that important), or if we e.g. allow SSA_NAME + INTEGER_CST in addition to that, or sum of 2 SSA_NAMEs, etc.). I don't see how terminated vs. unterminated (which is misnamed anyway, it means that it isn't known to be terminated, it might be terminated or not) would help with that. > So I think that gives four possiblities: > > (1) Keep the terminated flag, but extend the original patch to handle > strings built up a character at a time. This would handle f1 above. Only if you allow complex expressions like SSA_NAME + INTEGER_CST in length. > (2) Replace the terminated flag with a constant minimum length, don't > handle f1 above. Sure, if you only allow constants, it will be limited to constants. > (3) Replace the terminated flag with an arbitrary minimum length and > ensure that it's always valid to copy the minimum length to the > length when we do so for the final strinfo in a chain. Even length doesn't allow arbitrary expressions, the more complex it is, the more expensive will it be to compute it when you e.g. replace strlen with that. I'd introduce min_length, start with INTEGER_CST, once it is handled everywhere in the pass properly, see if there is enough code in the wild that would justify allowing more than that. min_length is a simple guarantee that there are no zero bytes among the first min_length bytes, length is the same plus that there is a zero byte right after that, so it is easy to argue about what happens if you store non-zero somewhere into it, or store zero, etc. Jakub