Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Richard Biener Tue, 21 Aug 2018 02:00:08 -0700

On Tue, 21 Aug 2018, Bernd Edlinger wrote:

> gcc -S -O2 -Wall -Wformat-overflow -ftrack-macro-expansion=0 -fshort-wchar 
> builtin-sprintf-warn-20.c
> builtin-sprintf-warn-20.c: In function 'test':
> builtin-sprintf-warn-20.c:19:39: warning: hex escape sequence out of range
> 19 |     ? (char*)L"\x4142\x4344" : (char*)L"\x41424344\x45464748";
>     |                                       ^~~~~~~~~~~~~~~~~~~~~~~
> 
> Hmm, this test might create some noise on short-wchar targets.
> 
> I would prefer a warning here, about the wrong type of the parameter.
> The buffer overflow is only a secondary thing.
> 
> For constant objects like those, the GIMPLE type is still guaranteed to be 
> reliable,
> right?


TREE_TYPE of tcc_declaration and tcc_constant trees should more-or-less
(minus qualifications not affecting semantics) be those set by
frontends.

Richard.

> 
> Bernd.
> 
> >>> I would just like the ability to get the length/size somehow
> >>> so that the questionable code that started us down this path
> >>> can be detected.  This is not just the sprintf(d, "%s", L"1")
> >>> kind of a mismatch but also the missing nul detection in cases
> >>> like:
> >> [ ... ]
> >> I know.  And ideally we'll be able to handle everything you want to.
> >> But we also have to realize that sometimes we may have to punt.
> >>
> >> Also remember that incremental progress is (usually) good.
> >>
> >>
> >>>
> >>>   const wchar_t a[1] = L"\xffffffff";
> >>>   strcpy(d, (char*)a);
> >> This touches on both the representational issue (excess elements in the
> >> initializer) and how to handle a non-terminated string.  Both are issues
> >> we're debating right now.
> >>
> >>>
> >>> I don't think this is necessarily an important use case to
> >>> optimize for but it is one that GCC optimizes already nonetheless,
> >>> and always has.  For example, this has always been folded into 4
> >>> by GCC and still is even with the patch:
> >>>
> >>>   const __WCHAR_TYPE__ wc[] = L"\x12345678";
> >>>
> >>>   int f (void)
> >>>   {
> >>>     const char *p = (char*)wc;
> >>>     return strlen (p);            // folded to 4
> >>>   }
> >>>
> >>> It's optimized because fold_const_call() relies on c_getstr()
> >>> which returns the underlying byte representation of the wide
> >>> string, and despite c_strlen() now trying to prevent it.
> >> And I think you're hitting on some of issues already raised in the thread.
> >>
> >> In this specific case though ISTM 4 is the right answer.  We casted to a
> >> char *, so that's what we should be counting.  Or am I missing
> >> something?  Also note that's the value we'd get from the strlen C
> >> library call IIUC.
> > 
> > It is the right answer.  My point is that this is optimized
> > but the change to c_strlen() prevents the same optimization
> > in other similar cases.  For example, GCC 6 folds this into
> > memcpy:
> > 
> >    __builtin_strcpy (d, (char*)L"\x12345678");
> > 
> > GCC 7 and 8 do too but get the byte count wrong (my bad).
> > 
> > Current trunk doesn't optimize it.  If restoring the original
> > behavior is the intent (and not just fixing the counting bug)
> > then c_strlen() should be fixed to fold this again.
> > 
> > (I'm not a fan of the strcpy to memcpy transformation because
> > it makes other things difficult but that's beside the point.)
> > 
> >>> The missing nul variant of the same test case isn't folded (it
> >>> ends up calling the library strlen) but the bug cannot be
> >>> detected using the modified c_strlen():
> >>>
> >>>   const __WCHAR_TYPE__ wc[1] = L"\x12345678";   // no nul
> >>>
> >>>   int f (void)
> >>>   {
> >>>     const char *p = (char*)wc;
> >>>     return strlen (p);            // not folded
> >>>   }
> >>>
> >>> To detect it, I'd have to use some other function, like
> >>> c_getstr().  I could certainly do that but I don't think
> >>> I should need to.
> >> And that's on the list of things we're going to try and address next.  I
> >> don't think it needs to be conflated with change which indicated to
> >> c_strlen how to count.
> > 
> > The two are one and the same: if c_strlen() doesn't count bytes
> > in arrays of wide characters (wchar_t, char16_t, or char32_t)
> > then the original GCC behavior is not restored and the missing
> > nul detection won't work for these "mixed type" cases.
> > 
> >> I do hope you're getting all these testcases recorded.  I'd even support
> >> installing them into the suite now, properly xfailed.  It's easier to
> >> see how any patch under consideration affects the wider set of tests.
> > 
> > I haven't been recording any of them --  I have no idea where
> > this is going.  As I've said, these patches prevent some of
> > the work I have already submitted.  That's why I have been
> > objecting to them, including adding the ELTSIZE argument to
> > c_strlen().
> > 
> > Martin
> 
> 

-- 
Richard Biener <[email protected]>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

Reply via email to