Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Bernd Edlinger Tue, 21 Aug 2018 21:21:42 -0700

On 08/21/18 10:59, Richard Biener wrote:
> On Tue, 21 Aug 2018, Bernd Edlinger wrote:
> 
>> gcc -S -O2 -Wall -Wformat-overflow -ftrack-macro-expansion=0 -fshort-wchar 
>> builtin-sprintf-warn-20.c
>> builtin-sprintf-warn-20.c: In function 'test':
>> builtin-sprintf-warn-20.c:19:39: warning: hex escape sequence out of range
>> 19 |     ? (char*)L"\x4142\x4344" : (char*)L"\x41424344\x45464748";
>>      |                                       ^~~~~~~~~~~~~~~~~~~~~~~
>>
>> Hmm, this test might create some noise on short-wchar targets.
>>
>> I would prefer a warning here, about the wrong type of the parameter.
>> The buffer overflow is only a secondary thing.
>>
>> For constant objects like those, the GIMPLE type is still guaranteed to be 
>> reliable,
>> right?
> 
> TREE_TYPE of tcc_declaration and tcc_constant trees should more-or-less
> (minus qualifications not affecting semantics) be those set by
> frontends.
>


and in this case:

const union
{ struct {
     wchar_t x[4];
   };
   struct {
     char z[8];
   };
} u = {{L"123"}};

int test()
{
   return __builtin_strlen(u.z);
}


string_constant works out the initializer for u.x
which has a different type than u.z


> Richard.
> 
>>
>> Bernd.
>>
>>>>> I would just like the ability to get the length/size somehow
>>>>> so that the questionable code that started us down this path
>>>>> can be detected.  This is not just the sprintf(d, "%s", L"1")
>>>>> kind of a mismatch but also the missing nul detection in cases
>>>>> like:
>>>> [ ... ]
>>>> I know.  And ideally we'll be able to handle everything you want to.
>>>> But we also have to realize that sometimes we may have to punt.
>>>>
>>>> Also remember that incremental progress is (usually) good.
>>>>
>>>>
>>>>>
>>>>>    const wchar_t a[1] = L"\xffffffff";
>>>>>    strcpy(d, (char*)a);
>>>> This touches on both the representational issue (excess elements in the
>>>> initializer) and how to handle a non-terminated string.  Both are issues
>>>> we're debating right now.
>>>>
>>>>>
>>>>> I don't think this is necessarily an important use case to
>>>>> optimize for but it is one that GCC optimizes already nonetheless,
>>>>> and always has.  For example, this has always been folded into 4
>>>>> by GCC and still is even with the patch:
>>>>>
>>>>>    const __WCHAR_TYPE__ wc[] = L"\x12345678";
>>>>>
>>>>>    int f (void)
>>>>>    {
>>>>>      const char *p = (char*)wc;
>>>>>      return strlen (p);            // folded to 4
>>>>>    }
>>>>>
>>>>> It's optimized because fold_const_call() relies on c_getstr()
>>>>> which returns the underlying byte representation of the wide
>>>>> string, and despite c_strlen() now trying to prevent it.
>>>> And I think you're hitting on some of issues already raised in the thread.
>>>>
>>>> In this specific case though ISTM 4 is the right answer.  We casted to a
>>>> char *, so that's what we should be counting.  Or am I missing
>>>> something?  Also note that's the value we'd get from the strlen C
>>>> library call IIUC.
>>>
>>> It is the right answer.  My point is that this is optimized
>>> but the change to c_strlen() prevents the same optimization
>>> in other similar cases.  For example, GCC 6 folds this into
>>> memcpy:
>>>
>>>     __builtin_strcpy (d, (char*)L"\x12345678");
>>>
>>> GCC 7 and 8 do too but get the byte count wrong (my bad).
>>>
>>> Current trunk doesn't optimize it.  If restoring the original
>>> behavior is the intent (and not just fixing the counting bug)
>>> then c_strlen() should be fixed to fold this again.
>>>
>>> (I'm not a fan of the strcpy to memcpy transformation because
>>> it makes other things difficult but that's beside the point.)
>>>
>>>>> The missing nul variant of the same test case isn't folded (it
>>>>> ends up calling the library strlen) but the bug cannot be
>>>>> detected using the modified c_strlen():
>>>>>
>>>>>    const __WCHAR_TYPE__ wc[1] = L"\x12345678";   // no nul
>>>>>
>>>>>    int f (void)
>>>>>    {
>>>>>      const char *p = (char*)wc;
>>>>>      return strlen (p);            // not folded
>>>>>    }
>>>>>
>>>>> To detect it, I'd have to use some other function, like
>>>>> c_getstr().  I could certainly do that but I don't think
>>>>> I should need to.
>>>> And that's on the list of things we're going to try and address next.  I
>>>> don't think it needs to be conflated with change which indicated to
>>>> c_strlen how to count.
>>>
>>> The two are one and the same: if c_strlen() doesn't count bytes
>>> in arrays of wide characters (wchar_t, char16_t, or char32_t)
>>> then the original GCC behavior is not restored and the missing
>>> nul detection won't work for these "mixed type" cases.
>>>
>>>> I do hope you're getting all these testcases recorded.  I'd even support
>>>> installing them into the suite now, properly xfailed.  It's easier to
>>>> see how any patch under consideration affects the wider set of tests.
>>>
>>> I haven't been recording any of them --  I have no idea where
>>> this is going.  As I've said, these patches prevent some of
>>> the work I have already submitted.  That's why I have been
>>> objecting to them, including adding the ELTSIZE argument to
>>> c_strlen().
>>>
>>> Martin
>>
>>
>

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Reply via email to