On Tue, 21 Aug 2018, Bernd Edlinger wrote:
> gcc -S -O2 -Wall -Wformat-overflow -ftrack-macro-expansion=0 -fshort-wchar
> builtin-sprintf-warn-20.c
> builtin-sprintf-warn-20.c: In function 'test':
> builtin-sprintf-warn-20.c:19:39: warning: hex escape sequence out of range
> 19 | ? (char*)L"\x4142\x4344" : (char*)L"\x41424344\x45464748";
> | ^~~~~~~~~~~~~~~~~~~~~~~
>
> Hmm, this test might create some noise on short-wchar targets.
>
> I would prefer a warning here, about the wrong type of the parameter.
> The buffer overflow is only a secondary thing.
>
> For constant objects like those, the GIMPLE type is still guaranteed to be
> reliable,
> right?
TREE_TYPE of tcc_declaration and tcc_constant trees should more-or-less
(minus qualifications not affecting semantics) be those set by
frontends.
Richard.
>
> Bernd.
>
> >>> I would just like the ability to get the length/size somehow
> >>> so that the questionable code that started us down this path
> >>> can be detected. This is not just the sprintf(d, "%s", L"1")
> >>> kind of a mismatch but also the missing nul detection in cases
> >>> like:
> >> [ ... ]
> >> I know. And ideally we'll be able to handle everything you want to.
> >> But we also have to realize that sometimes we may have to punt.
> >>
> >> Also remember that incremental progress is (usually) good.
> >>
> >>
> >>>
> >>> const wchar_t a[1] = L"\xffffffff";
> >>> strcpy(d, (char*)a);
> >> This touches on both the representational issue (excess elements in the
> >> initializer) and how to handle a non-terminated string. Both are issues
> >> we're debating right now.
> >>
> >>>
> >>> I don't think this is necessarily an important use case to
> >>> optimize for but it is one that GCC optimizes already nonetheless,
> >>> and always has. For example, this has always been folded into 4
> >>> by GCC and still is even with the patch:
> >>>
> >>> const __WCHAR_TYPE__ wc[] = L"\x12345678";
> >>>
> >>> int f (void)
> >>> {
> >>> const char *p = (char*)wc;
> >>> return strlen (p); // folded to 4
> >>> }
> >>>
> >>> It's optimized because fold_const_call() relies on c_getstr()
> >>> which returns the underlying byte representation of the wide
> >>> string, and despite c_strlen() now trying to prevent it.
> >> And I think you're hitting on some of issues already raised in the thread.
> >>
> >> In this specific case though ISTM 4 is the right answer. We casted to a
> >> char *, so that's what we should be counting. Or am I missing
> >> something? Also note that's the value we'd get from the strlen C
> >> library call IIUC.
> >
> > It is the right answer. My point is that this is optimized
> > but the change to c_strlen() prevents the same optimization
> > in other similar cases. For example, GCC 6 folds this into
> > memcpy:
> >
> > __builtin_strcpy (d, (char*)L"\x12345678");
> >
> > GCC 7 and 8 do too but get the byte count wrong (my bad).
> >
> > Current trunk doesn't optimize it. If restoring the original
> > behavior is the intent (and not just fixing the counting bug)
> > then c_strlen() should be fixed to fold this again.
> >
> > (I'm not a fan of the strcpy to memcpy transformation because
> > it makes other things difficult but that's beside the point.)
> >
> >>> The missing nul variant of the same test case isn't folded (it
> >>> ends up calling the library strlen) but the bug cannot be
> >>> detected using the modified c_strlen():
> >>>
> >>> const __WCHAR_TYPE__ wc[1] = L"\x12345678"; // no nul
> >>>
> >>> int f (void)
> >>> {
> >>> const char *p = (char*)wc;
> >>> return strlen (p); // not folded
> >>> }
> >>>
> >>> To detect it, I'd have to use some other function, like
> >>> c_getstr(). I could certainly do that but I don't think
> >>> I should need to.
> >> And that's on the list of things we're going to try and address next. I
> >> don't think it needs to be conflated with change which indicated to
> >> c_strlen how to count.
> >
> > The two are one and the same: if c_strlen() doesn't count bytes
> > in arrays of wide characters (wchar_t, char16_t, or char32_t)
> > then the original GCC behavior is not restored and the missing
> > nul detection won't work for these "mixed type" cases.
> >
> >> I do hope you're getting all these testcases recorded. I'd even support
> >> installing them into the suite now, properly xfailed. It's easier to
> >> see how any patch under consideration affects the wider set of tests.
> >
> > I haven't been recording any of them -- I have no idea where
> > this is going. As I've said, these patches prevent some of
> > the work I have already submitted. That's why I have been
> > objecting to them, including adding the ELTSIZE argument to
> > c_strlen().
> >
> > Martin
>
>
--
Richard Biener <[email protected]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB
21284 (AG Nuernberg)