Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Jeff Law Tue, 21 Aug 2018 08:28:51 -0700

On 08/20/2018 11:17 PM, Bernd Edlinger wrote:
> On 08/21/18 01:23, Martin Sebor wrote:
>> On 08/20/2018 04:17 PM, Jeff Law wrote:
>>> On 08/18/2018 11:38 AM, Martin Sebor wrote:
>>>> On 08/17/2018 09:32 PM, Jeff Law wrote:
>>>>> On 08/17/2018 02:17 PM, Martin Sebor wrote:
>>>>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote:
>>>>>>> On 08/17/18 20:23, Martin Sebor wrote:
>>>>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote:
>>>>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote:
>>>>>>>>>
>>>>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote:
>>>>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote:
>>>>>>>>>>>
>>>>>>>>>>>> restores previous behavior.  The sprintf bits want to count
>>>>>>>>>>>> element
>>>>>>>>>>>> sized chunks, which for wchars is 4 bytes (that count will then be
>>>>>>>>>>>
>>>>>>>>>>>>    /* Compute the range the argument's length can be in.  */
>>>>>>>>>>>> -  fmtresult slen = get_string_length (arg);
>>>>>>>>>>>> +  int count_by = dir.specifier == 'S' || dir.modifier ==
>>>>>>>>>>>> FMT_LEN_l ? 4 : 1;
>>>>>>>>>>>
>>>>>>>>>>> I don't see how a hardcoded 4 is correct here.  Surely you need to
>>>>>>>>>>> example
>>>>>>>>>>> wchar_type_node to determine its actual size for this target.
>>>>>>>>>> We did kick this around a little.  IIRC Martin didn't think that it
>>>>>>>>>> was
>>>>>>>>>> worth handling the 2 byte wchar case.
>>>>>>>>
>>>>>>>> Sorry, I think we may have miscommunicated -- I didn't think it
>>>>>>>> was useful to pass a size of the character type to the function.
>>>>>>>> I agree that passing in a hardcoded constant doesn't seem right
>>>>>>>> (even if GCC's wchar_t were always 4 bytes wide).
>>>>>>>>
>>>>>>>> I'm still not sure I see the benefit of passing in the expected
>>>>>>>> element size given that the patch causes c_strlen() fail when
>>>>>>>> the actual element size doesn't match ELTSIZE.  If the caller
>>>>>>>> asks for the number of bytes in a const wchar_t array it should
>>>>>>>> get back the number bytes.  (I could see it fail if the caller
>>>>>>>> asked for the number of words in a char array whose size was
>>>>>>>> not evenly divisible by wordsize.)
>>>>>>>>
>>>>>>>
>>>>>>> I think in this case c_strlen should use the type which the %S format
>>>>>>> uses at runtime, otherwise it will not have anything to do with
>>>>>>> the reality.
>>>>>>
>>>>>> %S is not what I'm talking about.
>>>>>>
>>>>>> Failing in the case I described (a caller asking for the size
>>>>>> in bytes of a constant object whose type is larger) prevents
>>>>>> callers that don't care about the type from detecting the actual
>>>>>> size of a constant.
>>>>>>
>>>>>> Specifically for sprintf, it means that the buffer overflow
>>>>>> below is no longer diagnosed:
>>>>>>
>>>>>>   struct S { char a[2]; void (*pf)(void); };
>>>>>>
>>>>>>   void f (struct S *p)
>>>>>>   {
>>>>>>     const char *q = (char*)L"\x41424344\x45464748";
>>>>>>
>>>>>>     sprintf (p->a, "%s", q);
>>>>>>   }
>>>>> I don't think this is in the testsuite, is it?  I verified that there
>>>>> was no regressions when I installed Bernd's patch and when I installed
>>>>> yours.
>>>>
>>>> No, there are very few tests that exercise these kinds of mixed
>>>> argument types.  Code like that is most likely the result of
>>>> a mistake, but it's not the kind of a bug I had even thought
>>>> about until some of the codegen issues with mixed argument types
>>>> were brought up (PR 86711/86714).
>>> Phew.  I was worried I'd somehow missed the failure or tested the wrong
>>> patch or who knows what.
>>>
>>> Can you add that test, xfailed for now to the testsuite?
>>
>> Done in r263676.
>>
>>>> I would just like the ability to get the length/size somehow
>>>> so that the questionable code that started us down this path
>>>> can be detected.  This is not just the sprintf(d, "%s", L"1")
>>>> kind of a mismatch but also the missing nul detection in cases
>>>> like:
>>> [ ... ]
>>> I know.  And ideally we'll be able to handle everything you want to.
>>> But we also have to realize that sometimes we may have to punt.
>>>
>>> Also remember that incremental progress is (usually) good.
>>>
>>>
>>>>
>>>>   const wchar_t a[1] = L"\xffffffff";
>>>>   strcpy(d, (char*)a);
>>> This touches on both the representational issue (excess elements in the
>>> initializer) and how to handle a non-terminated string.  Both are issues
>>> we're debating right now.
>>>
>>>>
>>>> I don't think this is necessarily an important use case to
>>>> optimize for but it is one that GCC optimizes already nonetheless,
>>>> and always has.  For example, this has always been folded into 4
>>>> by GCC and still is even with the patch:
>>>>
>>>>   const __WCHAR_TYPE__ wc[] = L"\x12345678";
>>>>
>>>>   int f (void)
>>>>   {
>>>>     const char *p = (char*)wc;
>>>>     return strlen (p);            // folded to 4
>>>>   }
>>>>
>>>> It's optimized because fold_const_call() relies on c_getstr()
>>>> which returns the underlying byte representation of the wide
>>>> string, and despite c_strlen() now trying to prevent it.
>>> And I think you're hitting on some of issues already raised in the thread.
>>>
>>> In this specific case though ISTM 4 is the right answer.  We casted to a
>>> char *, so that's what we should be counting.  Or am I missing
>>> something?  Also note that's the value we'd get from the strlen C
>>> library call IIUC.
>>
>> It is the right answer.  My point is that this is optimized
>> but the change to c_strlen() prevents the same optimization
>> in other similar cases.  For example, GCC 6 folds this into
>> memcpy:
>>
>>    __builtin_strcpy (d, (char*)L"\x12345678");
>>
>> GCC 7 and 8 do too but get the byte count wrong (my bad).
>>
>> Current trunk doesn't optimize it.  If restoring the original
>> behavior is the intent (and not just fixing the counting bug)
>> then c_strlen() should be fixed to fold this again.
>>
>> (I'm not a fan of the strcpy to memcpy transformation because
>> it makes other things difficult but that's beside the point.)
>>
> 
> Yes, this is a good example why it is not good to fold
> stuff that is invalid:
> 
>   __builtin_strcpy(d, (char*)L"\x12345678");
>   =>
>   __builtin_memcpy(d, (char*)L"\x12345678", 5);
> 
> because it folds away the invalidness, makes invalid things
> valid, and prevents later passes to spot the bug.
> 
> As a matter of common sense, we should not fold invalid stuff
> to valid stuff, even if that is historic behavior.
Generally agreed.  I'd rather see an invalid call of this nature simply
turned into a trap rather than folded into something that is neither
safe nor correct.


Jeff

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Reply via email to