On 08/03/2018 03:38 PM, Bernd Edlinger wrote: > On 08/03/18 23:15, Jeff Law wrote: >> On 07/30/2018 02:21 PM, Bernd Edlinger wrote: >>> On 07/30/18 21:52, Martin Sebor wrote: >>>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote: >>>>> On 07/30/18 01:05, Martin Sebor wrote: >>>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote: >>>>>>> Hi! >>>>>>> >>>>>>> This fixes two wrong code bugs where string_constant >>>>>>> returns over length string constants. Initializers >>>>>>> like that are rejected in C++, but valid in C. >>>>>> >>>>>> If by valid you are referring to declarations like the one in >>>>>> the added test: >>>>>> >>>>>> const char a[2][3] = { "1234", "xyz" }; >>>>>> >>>>>> then (as I explained), the excess elements in "1234" make >>>>>> the char[3] initialization and thus the test case undefined. >>>>>> I have resolved bug 86711 as invalid on those grounds. >>>>>> >>>>>> Bug 86711 has a valid test case that needs to be fixed, along >>>>>> with bug 86688 that I raised for the same underlying problem: >>>>>> considering the excess nul as part of the string. As has been >>>>>> discussed in a separate bug, rather than working around >>>>>> the excessively long strings in the middle-end, it would be >>>>>> preferable to avoid creating them to begin with. >>>>>> >>>>>> I'm already working on a fix for bug 86688, in part because >>>>>> I introduced the code change and also because I'm making other >>>>>> changes in this area -- bug 86552. Both of these in response >>>>>> to your comments. >>>>>> >>>>> >>>>> Sorry, I must admit, I have completely lost track on how many things >>>>> you are trying to work in parallel. >>>>> >>>>> Nevertheless I started to review you pr86552 patch here: >>>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html >>>>> >>>>> But so far you did not respond to me. >>>>> >>>>> Well actually I doubt your patch does apply to trunk, >>>>> maybe you start to re-base that one, and post it again >>>>> instead? >>>> >>>> I read your comments and have been busy working on enhancing >>>> the patch (among other things). There are a large number of >>>> additional contexts where constant strings are expected and >>>> where a missing nul needs to be detected. Some include >>>> additional instances of strlen calls that my initial patch >>>> didn't handle, many more others that involve other string >>>> functions. I have posted an updated patch that applies >>>> cleanly and that handles the first set. >>>> >>>> There is also a class of problems involving constant character >>>> arrays initialized by a braced list, as in char [] = { x, y, z }; >>>> Those are currently not recognized as strings even if they are >>>> nul-terminated, but they are far more likely to be a source of >>>> these problems than string literals, certainly in C++ where >>>> string initializers must fit in the array. I am testing a patch >>>> to convert those into STRING_CST so they can be handled as well. >>>> >>>> Since initializing arrays with more elements than fit is >>>> undefined in C and since the behavior is undefined at compile >>>> time it seems to me that rejecting such initializers with >>>> a hard error (as opposed to a warning) would be appropriate >>>> and obviate having to deal with them in the middle-end. >>>> >>> >>> We do not want to change what is currently accepted by the >>> front end. period. >> ?!? I don't follow this. When we can detect an error, we should issue >> a suitable diagnostic. As is mentioned later in the thread, some >> diagnostics are considered "pedantic" and are conditional. But I don't >> see how you get to "We do not want to change what is currently accepted >> by the front end. period." That seems like far too strong of a statement. >> > > What I mean here is: we should fix a middle-end consistency issue with > the string constants. When that is done, we can decide to make > change a pedantic error into a hard error, but IMHO it should be an > independent > patch of it's own. Note Jakub has indicated why changing this to a hard error would be a mistake and what the semantics ought to be in this scenario. so ignore what I said about turning this into a hard error. I think we do truncation per the defined semantics and present the middle end with "canonicalized" literals.
> > By the way, I found that Fortran has non-zero terminated strings with > index range starting from 1. > Then there are also overlength strings in Fortran, that have excess precision. > Should we forbid that Fortran feature as well? I have no clue. > > Then Ada and Go do not have zero-terminated strings, but I do not know > if these can hit the strlen pass. I wouldn't think so, but maybe in code that's meant to interoperate with C. Or if they implemented a C looking strlen and GCC recognized it and turned it into a strlen call. I don't think we do anything like that now, but you never know when someone might add it. jeff