On 08/21/18 10:59, Richard Biener wrote: > On Tue, 21 Aug 2018, Bernd Edlinger wrote: > >> gcc -S -O2 -Wall -Wformat-overflow -ftrack-macro-expansion=0 -fshort-wchar >> builtin-sprintf-warn-20.c >> builtin-sprintf-warn-20.c: In function 'test': >> builtin-sprintf-warn-20.c:19:39: warning: hex escape sequence out of range >> 19 | ? (char*)L"\x4142\x4344" : (char*)L"\x41424344\x45464748"; >> | ^~~~~~~~~~~~~~~~~~~~~~~ >> >> Hmm, this test might create some noise on short-wchar targets. >> >> I would prefer a warning here, about the wrong type of the parameter. >> The buffer overflow is only a secondary thing. >> >> For constant objects like those, the GIMPLE type is still guaranteed to be >> reliable, >> right? > > TREE_TYPE of tcc_declaration and tcc_constant trees should more-or-less > (minus qualifications not affecting semantics) be those set by > frontends. >
and in this case: const union { struct { wchar_t x[4]; }; struct { char z[8]; }; } u = {{L"123"}}; int test() { return __builtin_strlen(u.z); } string_constant works out the initializer for u.x which has a different type than u.z > Richard. > >> >> Bernd. >> >>>>> I would just like the ability to get the length/size somehow >>>>> so that the questionable code that started us down this path >>>>> can be detected. This is not just the sprintf(d, "%s", L"1") >>>>> kind of a mismatch but also the missing nul detection in cases >>>>> like: >>>> [ ... ] >>>> I know. And ideally we'll be able to handle everything you want to. >>>> But we also have to realize that sometimes we may have to punt. >>>> >>>> Also remember that incremental progress is (usually) good. >>>> >>>> >>>>> >>>>> const wchar_t a[1] = L"\xffffffff"; >>>>> strcpy(d, (char*)a); >>>> This touches on both the representational issue (excess elements in the >>>> initializer) and how to handle a non-terminated string. Both are issues >>>> we're debating right now. >>>> >>>>> >>>>> I don't think this is necessarily an important use case to >>>>> optimize for but it is one that GCC optimizes already nonetheless, >>>>> and always has. For example, this has always been folded into 4 >>>>> by GCC and still is even with the patch: >>>>> >>>>> const __WCHAR_TYPE__ wc[] = L"\x12345678"; >>>>> >>>>> int f (void) >>>>> { >>>>> const char *p = (char*)wc; >>>>> return strlen (p); // folded to 4 >>>>> } >>>>> >>>>> It's optimized because fold_const_call() relies on c_getstr() >>>>> which returns the underlying byte representation of the wide >>>>> string, and despite c_strlen() now trying to prevent it. >>>> And I think you're hitting on some of issues already raised in the thread. >>>> >>>> In this specific case though ISTM 4 is the right answer. We casted to a >>>> char *, so that's what we should be counting. Or am I missing >>>> something? Also note that's the value we'd get from the strlen C >>>> library call IIUC. >>> >>> It is the right answer. My point is that this is optimized >>> but the change to c_strlen() prevents the same optimization >>> in other similar cases. For example, GCC 6 folds this into >>> memcpy: >>> >>> __builtin_strcpy (d, (char*)L"\x12345678"); >>> >>> GCC 7 and 8 do too but get the byte count wrong (my bad). >>> >>> Current trunk doesn't optimize it. If restoring the original >>> behavior is the intent (and not just fixing the counting bug) >>> then c_strlen() should be fixed to fold this again. >>> >>> (I'm not a fan of the strcpy to memcpy transformation because >>> it makes other things difficult but that's beside the point.) >>> >>>>> The missing nul variant of the same test case isn't folded (it >>>>> ends up calling the library strlen) but the bug cannot be >>>>> detected using the modified c_strlen(): >>>>> >>>>> const __WCHAR_TYPE__ wc[1] = L"\x12345678"; // no nul >>>>> >>>>> int f (void) >>>>> { >>>>> const char *p = (char*)wc; >>>>> return strlen (p); // not folded >>>>> } >>>>> >>>>> To detect it, I'd have to use some other function, like >>>>> c_getstr(). I could certainly do that but I don't think >>>>> I should need to. >>>> And that's on the list of things we're going to try and address next. I >>>> don't think it needs to be conflated with change which indicated to >>>> c_strlen how to count. >>> >>> The two are one and the same: if c_strlen() doesn't count bytes >>> in arrays of wide characters (wchar_t, char16_t, or char32_t) >>> then the original GCC behavior is not restored and the missing >>> nul detection won't work for these "mixed type" cases. >>> >>>> I do hope you're getting all these testcases recorded. I'd even support >>>> installing them into the suite now, properly xfailed. It's easier to >>>> see how any patch under consideration affects the wider set of tests. >>> >>> I haven't been recording any of them -- I have no idea where >>> this is going. As I've said, these patches prevent some of >>> the work I have already submitted. That's why I have been >>> objecting to them, including adding the ELTSIZE argument to >>> c_strlen(). >>> >>> Martin >> >> >