Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)

Jeff Law Tue, 14 Aug 2018 22:47:16 -0700

On 08/03/2018 03:38 PM, Bernd Edlinger wrote:
> On 08/03/18 23:15, Jeff Law wrote:
>> On 07/30/2018 02:21 PM, Bernd Edlinger wrote:
>>> On 07/30/18 21:52, Martin Sebor wrote:
>>>> On 07/30/2018 09:24 AM, Bernd Edlinger wrote:
>>>>> On 07/30/18 01:05, Martin Sebor wrote:
>>>>>> On 07/29/2018 04:56 AM, Bernd Edlinger wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> This fixes two wrong code bugs where string_constant
>>>>>>> returns over length string constants.  Initializers
>>>>>>> like that are rejected in C++, but valid in C.
>>>>>>
>>>>>> If by valid you are referring to declarations like the one in
>>>>>> the added test:
>>>>>>
>>>>>>       const char a[2][3] = { "1234", "xyz" };
>>>>>>
>>>>>> then (as I explained), the excess elements in "1234" make
>>>>>> the char[3] initialization and thus the test case undefined.
>>>>>> I have resolved bug 86711 as invalid on those grounds.
>>>>>>
>>>>>> Bug 86711 has a valid test case that needs to be fixed, along
>>>>>> with bug 86688 that I raised for the same underlying problem:
>>>>>> considering the excess nul as part of the string.  As has been
>>>>>> discussed in a separate bug, rather than working around
>>>>>> the excessively long strings in the middle-end, it would be
>>>>>> preferable to avoid creating them to begin with.
>>>>>>
>>>>>> I'm already working on a fix for bug 86688, in part because
>>>>>> I introduced the code change and also because I'm making other
>>>>>> changes in this area -- bug 86552.  Both of these in response
>>>>>> to your comments.
>>>>>>
>>>>>
>>>>> Sorry, I must admit, I have completely lost track on how many things
>>>>> you are trying to work in parallel.
>>>>>
>>>>> Nevertheless I started to review you pr86552 patch here:
>>>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01593.html
>>>>>
>>>>> But so far you did not respond to me.
>>>>>
>>>>> Well actually I doubt your patch does apply to trunk,
>>>>> maybe you start to re-base that one, and post it again
>>>>> instead?
>>>>
>>>> I read your comments and have been busy working on enhancing
>>>> the patch (among other things).  There are a large number of
>>>> additional contexts where constant strings are expected and
>>>> where a missing nul needs to be detected.  Some include
>>>> additional instances of strlen calls that my initial patch
>>>> didn't handle, many more others that involve other string
>>>> functions.  I have posted an updated patch that applies
>>>> cleanly and that handles the first set.
>>>>
>>>> There is also a class of problems involving constant character
>>>> arrays initialized by a braced list, as in char [] = { x, y, z };
>>>> Those are currently not recognized as strings even if they are
>>>> nul-terminated, but they are far more likely to be a source of
>>>> these problems than string literals, certainly in C++ where
>>>> string initializers must fit in the array.  I am testing a patch
>>>> to convert those into STRING_CST so they can be handled as well.
>>>>
>>>> Since initializing arrays with more elements than fit is
>>>> undefined in C and since the behavior is undefined at compile
>>>> time it seems to me that rejecting such initializers with
>>>> a hard error (as opposed to a warning) would be appropriate
>>>> and obviate having to deal with them in the middle-end.
>>>>
>>>
>>> We do not want to change what is currently accepted by the
>>> front end. period.
>> ?!?  I don't follow this.  When we can detect an error, we should issue
>> a suitable diagnostic.  As is mentioned later in the thread, some
>> diagnostics are considered "pedantic" and are conditional.  But I don't
>> see how you get to "We do not want to change what is currently accepted
>> by the front end. period."  That seems like far too strong of a statement.
>>
> 
> What I mean here is: we should fix a middle-end consistency issue with
> the string constants.  When that is done, we can decide to make
> change a pedantic error into a hard error, but IMHO it should be an 
> independent
> patch of it's own.
Note Jakub has indicated why changing this to a hard error would be a
mistake and what the semantics ought to be in this scenario.  so ignore
what I said about turning this into a hard error.  I think we do
truncation per the defined semantics and present the middle end with
"canonicalized" literals.


> 
> By the way, I found that Fortran has non-zero terminated strings with
> index range starting from 1.
> Then there are also overlength strings in Fortran, that have excess precision.
> Should we forbid that Fortran feature as well?
I have no clue.

> 
> Then Ada and Go do not have zero-terminated strings, but I do not know
> if these can hit the strlen pass.
I wouldn't think so, but maybe in code that's meant to interoperate with
C.  Or if they implemented a C looking strlen and GCC recognized it and
turned it into a strlen call.  I don't think we do anything like that
now, but you never know when someone might add it.

jeff

Re: [PATCH] Fix wrong code with truncated string literals (PR 86711/86714)

Reply via email to