On 08/20/2018 11:17 PM, Bernd Edlinger wrote: > On 08/21/18 01:23, Martin Sebor wrote: >> On 08/20/2018 04:17 PM, Jeff Law wrote: >>> On 08/18/2018 11:38 AM, Martin Sebor wrote: >>>> On 08/17/2018 09:32 PM, Jeff Law wrote: >>>>> On 08/17/2018 02:17 PM, Martin Sebor wrote: >>>>>> On 08/17/2018 12:44 PM, Bernd Edlinger wrote: >>>>>>> On 08/17/18 20:23, Martin Sebor wrote: >>>>>>>> On 08/17/2018 06:14 AM, Joseph Myers wrote: >>>>>>>>> On Fri, 17 Aug 2018, Jeff Law wrote: >>>>>>>>> >>>>>>>>>> On 08/16/2018 05:01 PM, Joseph Myers wrote: >>>>>>>>>>> On Thu, 16 Aug 2018, Jeff Law wrote: >>>>>>>>>>> >>>>>>>>>>>> restores previous behavior. The sprintf bits want to count >>>>>>>>>>>> element >>>>>>>>>>>> sized chunks, which for wchars is 4 bytes (that count will then be >>>>>>>>>>> >>>>>>>>>>>> /* Compute the range the argument's length can be in. */ >>>>>>>>>>>> - fmtresult slen = get_string_length (arg); >>>>>>>>>>>> + int count_by = dir.specifier == 'S' || dir.modifier == >>>>>>>>>>>> FMT_LEN_l ? 4 : 1; >>>>>>>>>>> >>>>>>>>>>> I don't see how a hardcoded 4 is correct here. Surely you need to >>>>>>>>>>> example >>>>>>>>>>> wchar_type_node to determine its actual size for this target. >>>>>>>>>> We did kick this around a little. IIRC Martin didn't think that it >>>>>>>>>> was >>>>>>>>>> worth handling the 2 byte wchar case. >>>>>>>> >>>>>>>> Sorry, I think we may have miscommunicated -- I didn't think it >>>>>>>> was useful to pass a size of the character type to the function. >>>>>>>> I agree that passing in a hardcoded constant doesn't seem right >>>>>>>> (even if GCC's wchar_t were always 4 bytes wide). >>>>>>>> >>>>>>>> I'm still not sure I see the benefit of passing in the expected >>>>>>>> element size given that the patch causes c_strlen() fail when >>>>>>>> the actual element size doesn't match ELTSIZE. If the caller >>>>>>>> asks for the number of bytes in a const wchar_t array it should >>>>>>>> get back the number bytes. (I could see it fail if the caller >>>>>>>> asked for the number of words in a char array whose size was >>>>>>>> not evenly divisible by wordsize.) >>>>>>>> >>>>>>> >>>>>>> I think in this case c_strlen should use the type which the %S format >>>>>>> uses at runtime, otherwise it will not have anything to do with >>>>>>> the reality. >>>>>> >>>>>> %S is not what I'm talking about. >>>>>> >>>>>> Failing in the case I described (a caller asking for the size >>>>>> in bytes of a constant object whose type is larger) prevents >>>>>> callers that don't care about the type from detecting the actual >>>>>> size of a constant. >>>>>> >>>>>> Specifically for sprintf, it means that the buffer overflow >>>>>> below is no longer diagnosed: >>>>>> >>>>>> struct S { char a[2]; void (*pf)(void); }; >>>>>> >>>>>> void f (struct S *p) >>>>>> { >>>>>> const char *q = (char*)L"\x41424344\x45464748"; >>>>>> >>>>>> sprintf (p->a, "%s", q); >>>>>> } >>>>> I don't think this is in the testsuite, is it? I verified that there >>>>> was no regressions when I installed Bernd's patch and when I installed >>>>> yours. >>>> >>>> No, there are very few tests that exercise these kinds of mixed >>>> argument types. Code like that is most likely the result of >>>> a mistake, but it's not the kind of a bug I had even thought >>>> about until some of the codegen issues with mixed argument types >>>> were brought up (PR 86711/86714). >>> Phew. I was worried I'd somehow missed the failure or tested the wrong >>> patch or who knows what. >>> >>> Can you add that test, xfailed for now to the testsuite? >> >> Done in r263676. >> >>>> I would just like the ability to get the length/size somehow >>>> so that the questionable code that started us down this path >>>> can be detected. This is not just the sprintf(d, "%s", L"1") >>>> kind of a mismatch but also the missing nul detection in cases >>>> like: >>> [ ... ] >>> I know. And ideally we'll be able to handle everything you want to. >>> But we also have to realize that sometimes we may have to punt. >>> >>> Also remember that incremental progress is (usually) good. >>> >>> >>>> >>>> const wchar_t a[1] = L"\xffffffff"; >>>> strcpy(d, (char*)a); >>> This touches on both the representational issue (excess elements in the >>> initializer) and how to handle a non-terminated string. Both are issues >>> we're debating right now. >>> >>>> >>>> I don't think this is necessarily an important use case to >>>> optimize for but it is one that GCC optimizes already nonetheless, >>>> and always has. For example, this has always been folded into 4 >>>> by GCC and still is even with the patch: >>>> >>>> const __WCHAR_TYPE__ wc[] = L"\x12345678"; >>>> >>>> int f (void) >>>> { >>>> const char *p = (char*)wc; >>>> return strlen (p); // folded to 4 >>>> } >>>> >>>> It's optimized because fold_const_call() relies on c_getstr() >>>> which returns the underlying byte representation of the wide >>>> string, and despite c_strlen() now trying to prevent it. >>> And I think you're hitting on some of issues already raised in the thread. >>> >>> In this specific case though ISTM 4 is the right answer. We casted to a >>> char *, so that's what we should be counting. Or am I missing >>> something? Also note that's the value we'd get from the strlen C >>> library call IIUC. >> >> It is the right answer. My point is that this is optimized >> but the change to c_strlen() prevents the same optimization >> in other similar cases. For example, GCC 6 folds this into >> memcpy: >> >> __builtin_strcpy (d, (char*)L"\x12345678"); >> >> GCC 7 and 8 do too but get the byte count wrong (my bad). >> >> Current trunk doesn't optimize it. If restoring the original >> behavior is the intent (and not just fixing the counting bug) >> then c_strlen() should be fixed to fold this again. >> >> (I'm not a fan of the strcpy to memcpy transformation because >> it makes other things difficult but that's beside the point.) >> > > Yes, this is a good example why it is not good to fold > stuff that is invalid: > > __builtin_strcpy(d, (char*)L"\x12345678"); > => > __builtin_memcpy(d, (char*)L"\x12345678", 5); > > because it folds away the invalidness, makes invalid things > valid, and prevents later passes to spot the bug. > > As a matter of common sense, we should not fold invalid stuff > to valid stuff, even if that is historic behavior. Generally agreed. I'd rather see an invalid call of this nature simply turned into a trap rather than folded into something that is neither safe nor correct.
Jeff