On 08/21/18 10:59, Richard Biener wrote:
> On Tue, 21 Aug 2018, Bernd Edlinger wrote:
>
>> gcc -S -O2 -Wall -Wformat-overflow -ftrack-macro-expansion=0 -fshort-wchar
>> builtin-sprintf-warn-20.c
>> builtin-sprintf-warn-20.c: In function 'test':
>> builtin-sprintf-warn-20.c:19:39: warning: hex escape sequence out of range
>> 19 | ? (char*)L"\x4142\x4344" : (char*)L"\x41424344\x45464748";
>> | ^~~~~~~~~~~~~~~~~~~~~~~
>>
>> Hmm, this test might create some noise on short-wchar targets.
>>
>> I would prefer a warning here, about the wrong type of the parameter.
>> The buffer overflow is only a secondary thing.
>>
>> For constant objects like those, the GIMPLE type is still guaranteed to be
>> reliable,
>> right?
>
> TREE_TYPE of tcc_declaration and tcc_constant trees should more-or-less
> (minus qualifications not affecting semantics) be those set by
> frontends.
>
and in this case:
const union
{ struct {
wchar_t x[4];
};
struct {
char z[8];
};
} u = {{L"123"}};
int test()
{
return __builtin_strlen(u.z);
}
string_constant works out the initializer for u.x
which has a different type than u.z
> Richard.
>
>>
>> Bernd.
>>
>>>>> I would just like the ability to get the length/size somehow
>>>>> so that the questionable code that started us down this path
>>>>> can be detected. This is not just the sprintf(d, "%s", L"1")
>>>>> kind of a mismatch but also the missing nul detection in cases
>>>>> like:
>>>> [ ... ]
>>>> I know. And ideally we'll be able to handle everything you want to.
>>>> But we also have to realize that sometimes we may have to punt.
>>>>
>>>> Also remember that incremental progress is (usually) good.
>>>>
>>>>
>>>>>
>>>>> const wchar_t a[1] = L"\xffffffff";
>>>>> strcpy(d, (char*)a);
>>>> This touches on both the representational issue (excess elements in the
>>>> initializer) and how to handle a non-terminated string. Both are issues
>>>> we're debating right now.
>>>>
>>>>>
>>>>> I don't think this is necessarily an important use case to
>>>>> optimize for but it is one that GCC optimizes already nonetheless,
>>>>> and always has. For example, this has always been folded into 4
>>>>> by GCC and still is even with the patch:
>>>>>
>>>>> const __WCHAR_TYPE__ wc[] = L"\x12345678";
>>>>>
>>>>> int f (void)
>>>>> {
>>>>> const char *p = (char*)wc;
>>>>> return strlen (p); // folded to 4
>>>>> }
>>>>>
>>>>> It's optimized because fold_const_call() relies on c_getstr()
>>>>> which returns the underlying byte representation of the wide
>>>>> string, and despite c_strlen() now trying to prevent it.
>>>> And I think you're hitting on some of issues already raised in the thread.
>>>>
>>>> In this specific case though ISTM 4 is the right answer. We casted to a
>>>> char *, so that's what we should be counting. Or am I missing
>>>> something? Also note that's the value we'd get from the strlen C
>>>> library call IIUC.
>>>
>>> It is the right answer. My point is that this is optimized
>>> but the change to c_strlen() prevents the same optimization
>>> in other similar cases. For example, GCC 6 folds this into
>>> memcpy:
>>>
>>> __builtin_strcpy (d, (char*)L"\x12345678");
>>>
>>> GCC 7 and 8 do too but get the byte count wrong (my bad).
>>>
>>> Current trunk doesn't optimize it. If restoring the original
>>> behavior is the intent (and not just fixing the counting bug)
>>> then c_strlen() should be fixed to fold this again.
>>>
>>> (I'm not a fan of the strcpy to memcpy transformation because
>>> it makes other things difficult but that's beside the point.)
>>>
>>>>> The missing nul variant of the same test case isn't folded (it
>>>>> ends up calling the library strlen) but the bug cannot be
>>>>> detected using the modified c_strlen():
>>>>>
>>>>> const __WCHAR_TYPE__ wc[1] = L"\x12345678"; // no nul
>>>>>
>>>>> int f (void)
>>>>> {
>>>>> const char *p = (char*)wc;
>>>>> return strlen (p); // not folded
>>>>> }
>>>>>
>>>>> To detect it, I'd have to use some other function, like
>>>>> c_getstr(). I could certainly do that but I don't think
>>>>> I should need to.
>>>> And that's on the list of things we're going to try and address next. I
>>>> don't think it needs to be conflated with change which indicated to
>>>> c_strlen how to count.
>>>
>>> The two are one and the same: if c_strlen() doesn't count bytes
>>> in arrays of wide characters (wchar_t, char16_t, or char32_t)
>>> then the original GCC behavior is not restored and the missing
>>> nul detection won't work for these "mixed type" cases.
>>>
>>>> I do hope you're getting all these testcases recorded. I'd even support
>>>> installing them into the suite now, properly xfailed. It's easier to
>>>> see how any patch under consideration affects the wider set of tests.
>>>
>>> I haven't been recording any of them -- I have no idea where
>>> this is going. As I've said, these patches prevent some of
>>> the work I have already submitted. That's why I have been
>>> objecting to them, including adding the ELTSIZE argument to
>>> c_strlen().
>>>
>>> Martin
>>
>>
>