Re: [PATCH] set range for strlen(array) to avoid spurious -Wstringop-overflow (PR 83373 , PR 78450)

Richard Biener Fri, 15 Dec 2017 08:17:28 -0800

On December 15, 2017 4:58:14 PM GMT+01:00, Martin Sebor <mse...@gmail.com> wrote:
>On 12/15/2017 01:48 AM, Richard Biener wrote:
>> On Thu, Dec 14, 2017 at 5:01 PM, Martin Sebor <mse...@gmail.com> wrote:
>>> On 12/14/2017 03:43 AM, Richard Biener wrote:
>>>>
>>>> On Wed, Dec 13, 2017 at 4:47 AM, Martin Sebor <mse...@gmail.com> wrote:
>>>>>
>>>>> On 12/12/2017 05:35 PM, Jeff Law wrote:
>>>>>>
>>>>>>
>>>>>> On 12/12/2017 01:15 PM, Martin Sebor wrote:
>>>>>>>
>>>>>>>
>>>>>>> Bug 83373 - False positive reported by -Wstringop-overflow, is
>>>>>>> another example of a warning triggered by a missed optimization
>>>>>>> opportunity, this time in the strlen pass.  The optimization
>>>>>>> is discussed in pr78450 - strlen(s) return value can be assumed
>>>>>>> to be less than the size of s.  The gist of it is that the result
>>>>>>> of strlen(array) can be assumed to be less than the size of
>>>>>>> the array (except in the corner case of last struct members).
>>>>>>>
>>>>>>> To avoid the false positive the attached patch adds this
>>>>>>> optimization to the strlen pass.  Although the patch passes
>>>>>>> bootstrap and regression tests for all front-ends I'm not sure
>>>>>>> the way it determines the upper bound of the range is 100%
>>>>>>> correct for languages with arrays with a non-zero lower bound.
>>>>>>> Maybe it's just not as tight as it could be.
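The assumption the patch relies on can be sketched in C. This is a minimal illustration, not the PR's test case: `struct rec` and `copy_name` are hypothetical. Because `name` is an interior member rather than the last one, any well-defined `strlen (r->name)` has to find the terminating NUL inside the array, so the result lies in the range [0, sizeof r->name - 1]. The `assert` spells out that range.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record; NAME is deliberately not the last member, so
   the trailing-array exception mentioned above does not apply. */
struct rec {
  char name[10];
  int id;
};

/* Copy R->name, including its terminating NUL, into DST, which must
   have room for sizeof r->name bytes.  For every well-defined call
   strlen returns at most sizeof r->name - 1, so n + 1 never exceeds
   that size; once that range is known, a bounds warning on the memcpy
   would be a false positive. */
size_t copy_name (char *dst, const struct rec *r)
{
  size_t n = strlen (r->name);
  assert (n < sizeof r->name);   /* the range the patch assumes */
  memcpy (dst, r->name, n + 1);
  return n;
}
```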
>>>>>>
>>>>>>
>>>>>> What about something hideous like
>>>>>>
>>>>>> struct fu {
>>>>>>   char x1[10];
>>>>>>   char x2[10];
>>>>>>   int avoid_trailing_array;
>>>>>> };
>>>>>>
>>>>>> Where objects stored in x1 are not null-terminated.  Are we in the realm
>>>>>> of undefined behavior at that point (I hope so)?
>>>>>
>>>>>
>>>>>
>>>>> Yes, this is undefined.  Pointer arithmetic (either direct or
>>>>> via standard library functions) is only defined for pointers
>>>>> to the same object or subobject.  So even something like
>>>>>
>>>>>  memcpy (pfu->x1, pfu->x1 + 10, 10);
>>>>>
>>>>> is undefined.
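Under the reading Martin gives here, `pfu->x1 + 10` may be formed but not dereferenced, even though it addresses the same bytes as `pfu->x2`. A short C sketch, reusing Jeff's `struct fu` with hypothetical function names, of two ways to copy those bytes that stay within that reading: the first names the `x2` subobject directly, the second starts from the whole object viewed as an array of `char`, which is the approach described further down in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct fu {
  char x1[10];
  char x2[10];
  int avoid_trailing_array;
};

/* Name the x2 subobject directly instead of going through
   pfu->x1 + 10, which may be computed but not dereferenced. */
void copy_x2_to_x1 (struct fu *pfu)
{
  assert (pfu != NULL);
  memcpy (pfu->x1, pfu->x2, sizeof pfu->x1);
}

/* Start from a pointer to the whole struct, treat it as an array of
   char, and step to x2 by its offset. */
void copy_x2_to_x1_bytes (struct fu *pfu)
{
  assert (pfu != NULL);
  const char *base = (const char *) pfu;
  memcpy (pfu->x1, base + offsetof (struct fu, x2), sizeof pfu->x1);
}
```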
>>>>
>>>>
>>>> There's nothing undefined here - computing the pointer pointing
>>>> to one-after-the-last element of an array is valid (you are just
>>>> not allowed to dereference it).
>>>
>>>
>>> Right, and memcpy dereferences it, so it's undefined.
>>
>> That's an interpretation of the standard that I don't share.
>
>It's not an interpretation.  It's a basic rule of the languages
>that the standards are explicit about.  In C11 you will find
>this specified in detail in 6.5.6, paragraphs 7 and 8 (of
>particular relevance to your question below is p7: "a pointer
>to an object that is not an element of an array behaves the same
>as a pointer to the first element of an array of length one.")
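C11 6.5.6p7 is what makes one-past-the-end pointers to single objects usable: `&x + 1` may be computed and compared, but not dereferenced. A small C sketch, with hypothetical helpers `sum_range` and `sum_single`, that treats a lone `int` as an array of length one; the loop compares against the end pointer without ever reading through it.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the half-open range [first, last).  LAST may point one past the
   end of an array; it is compared against but never dereferenced. */
int sum_range (const int *first, const int *last)
{
  int s = 0;
  for (const int *p = first; p != last; ++p)
    s += *p;
  return s;
}

/* Per C11 6.5.6p7, a pointer to a non-array object behaves like a
   pointer to the first element of an array of length one, so
   obj + 1 is a valid one-past-the-end pointer. */
int sum_single (const int *obj)
{
  assert (obj != NULL);
  return sum_range (obj, obj + 1);
}
```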


I know. 

>> Also, if I have struct f { int i; int j; };  and an int * that points
>> to the j member you say I have no standard conforming way
>> to get at a pointer to the i member from this, right?
>
>Correct.  See above.
>
>> Because
>> the pointer points to an 'int' object.  But it also points within
>> a struct f object!  So at least maybe (int *)((char *)p - offsetof (struct f, j))
>> should be valid?
>
>No, not really.  It works in practice but it's not well-defined.
>It doesn't matter how you get at the result.  What matters is
>what you start with.  As Jeff said, to derive a pointer to
>distinct subobjects of a larger object you need to start with
>a pointer to the larger object and treat it as an array of
>chars.
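The two derivations under discussion, sketched in C with hypothetical function names. `member_j` starts from a pointer to the whole `struct f` and steps through it as an array of `char`, the route described above as conforming. `i_from_j` is Richard's expression, recovering a pointer to `i` from a pointer to `j`; per Martin it works in practice but is not well-defined, since its argument starts out as a pointer to an `int`, not to a `struct f`.

```c
#include <assert.h>
#include <stddef.h>

struct f { int i; int j; };

/* Start from the larger object, view it as an array of char, and
   step to the j member by its offset. */
int *member_j (struct f *sp)
{
  assert (sp != NULL);
  return (int *) ((char *) sp + offsetof (struct f, j));
}

/* Richard's expression: walk back from a pointer to j to the i member
   of the enclosing struct f.  Its conformance is what is disputed in
   this thread; Martin notes it works in practice. */
int *i_from_j (int *p)
{
  assert (p != NULL);
  return (int *) ((char *) p - offsetof (struct f, j));
}
```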

Those are obviously not constraints people use C and C++ with, so I see no way to enforce this within GIMPLE.

>> This means that pfu->x1 + 10 is a valid pointer
>> into *pfu no matter what you say and you can dereference it.
>
>No.
>
>As another, hopefully more convincing, example consider a
>multidimensional array A[2][2].  The byte offset of A[i][j]
>is i * sizeof A[0] + j * sizeof A[0][0].  With that, the offset
>of A[1][0] is sizeof A[0], and so would be the offset of A[0][2].
>But that doesn't make A[0][2] a valid reference to an element of
>A (because A[0] has only two elements, A[0][0] and A[0][1]),
>or &A[0][0] + 2 a dereferenceable pointer.  It's a pointer that
>points just past the last element of the array A[0].  That
>there's another array right after A[0] (namely A[1]) is
>immaterial, same as in the struct f example above.
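The offsets in the A[2][2] example can be checked mechanically. A short C sketch, assuming `int A[2][2]` and using hypothetical helpers `elem_offset` and `is_element`, that shows `(0, 2)` and `(1, 0)` land on the same byte even though only the second names an element. Bounding each index by its own dimension, rather than by the total size, is the property Richard notes dependence analysis relies on.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 2 };

/* Byte offset of A[i][j] from the start of int A[ROWS][COLS].
   One-past-the-end positions are allowed, as for pointers. */
size_t elem_offset (size_t i, size_t j)
{
  assert (i <= ROWS && j <= COLS);
  return i * sizeof (int[COLS]) + j * sizeof (int);
}

/* (I, J) names an element only if each index is within its own
   dimension; a matching byte offset is not enough. */
int is_element (size_t i, size_t j)
{
  return i < ROWS && j < COLS;
}
```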

I know. Dependence analysis relies on this. We've had bugs in the past with gcc 
itself introducing such bogus references. 

Richard. 

>
>Martin
