On 08/25/18 01:54, Jeff Law wrote: > On 08/24/2018 11:26 AM, Bernd Edlinger wrote: >> On 08/24/18 18:51, Jeff Law wrote: >>>> Well, this is broken for wide character strings. >>>> but I hope we can get rid of STRING_CST which are >>>> not explicitly null terminated. >> >> I am afraid that is not going to happen. >> Maybe we can get STRING_CST that are never longer >> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr >> need to take care that the string is zero-terminated. >> >> string_constant, should not promise the string is zero terminated. >> But instead it can promise that: >> 1) the STRING_CST is valid up to TREE_STRING_LENGTH >> 2) mem_size is >= TREE_STRING_LENGTH >> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO. >> >> It will not guarantee anything about zero termination any more. > Interesting because those conditions would be sufficient to deal with a > regression I stumbled over after fixing Martin's patch to not assume > that all STRING_CSTs are NUL terminated. > > But I need to think about this a bit more. Essentially the question > we'd need to ask is whether or not these are sufficient in general or > just in specific cases. > > I tend to think they're not sufficient in general. If a string returned > by string_constant that didn't have a terminating NUL, but which did > pass the tests above were ultimately passed to the runtime's str* > routines, then the call may run off the end of the string. We'd like to > be able to warn for that. > > So ISTM those rules are only valid in contexts where we know the result > isn't going to be passed to str* and friends within the C library. > > I do think they're sufficient to avoid problems with the > tree-ssa-forwprop code we've looked at. So what may make the most sense > is to have that routine indicate it's willing to accept unterminated > strings, then check the conditions above before optimizing the code. >
There are not too many callers of string_constant. Not all need zero termination. But I think if the are interested in zero-termination they should simply call c_strlen or c_getstr. >> >> In the end, the best approach might be to either merge my patch >> with Martins, or step-wise, first fixing wrong code, and then >> implementing warnings without fixing wrong code. > Unsure at this time. I've been working with both. I suspect that if we > went with yours that we'd then turn around and layer Martin's on top of > it because of the desire to signal to callers that we have an > unterminated string and have the callers take appropriate action. Which > begs the question of whether or not we just go with Martin's -- ie, is > there really any value in using both. I haven't seen indications there > is value in that approach, but I'm still poking at things. > Well, ya call it "layer one patch over the other" I call it "incremental improvements". Bernd.