On 08/06/2018 11:15 AM, Martin Sebor wrote:
>>> These examples do not aim to be valid C, they just point out limitations
>>> of the middle-end design, and a good deal of the problems are due
>>> to trying to do things that are not safe within the boundaries given
>>> by the middle-end design.
>> I really think this is important -- and as such I think we need to move
>> away from trying to describe scenarios in C because doing so keeps
>> bringing us back to the "C doesn't allow XYZ" kinds of arguments when
>> what we're really discussing are GIMPLE semantic issues.
>>
>> So examples should be GIMPLE. You might start with (possibly invalid) C
>> code to generate the GIMPLE, but the actual discussion needs to be
>> looking at GIMPLE. We might include the C code in case someone wants to
>> look at things in a debugger, but bringing the focus to GIMPLE is really
>> important here.
>
> I don't understand the goal of this exercise. Unless the GIMPLE
> code is the result of a valid test case (in some language GCC
> supports), what does it matter what it looks like? The basis of
> every single transformation done by a compiler is that the source
> code is correct. If it isn't then all bets are off. I'm no GIMPLE
> expert but even I can come up with any number of GIMPLE expressions
> that have undefined behavior. What would that prove?
>
> But let me try anyway. Here's a simplified (and gimplified) version
> of the test case that started this debate:
>
>   struct S { char a[4], b; };
>
>   f (struct S * p)
>   {
>     int D.1908;
>
>     _1 = &p->a;
>     _2 = __builtin_strlen (_1); // strlen (p->a);
>     D.1908 = (int) _2;
>     return D.1908;
>   }

You need to include the declaration of _1 so that we can see its type.
Assuming you're just calling strlen (p->a), its type will be:
  char[4] * _1;

Which provides you with useful type information. If I'm wrong on this
one, I'm sure Jakub & Richi will chime in.

>
> and one involving a pointer:
>
>   g (struct S * p)
>   {
>     int D.1910;
>     char * q;
>
>     q = &p->a;
>     _1 = __builtin_strlen (q);
>     D.1910 = (int) _1;
>     return D.1910;
>   }

In this case the pointer passed is a char *, so you know nothing. It
seems a bit silly since you can see that q = &p->a, but that's how it
works. It all comes down to how information is lost as we go through
the pipeline. It may not matter in this specific example, but it
matters in the general case.

>
> and another one involving a pointer and strcpy and
> _FORTIFY_SOURCE:
>
>   h (struct S * p)
>   {
>     int D.2208;
>     char * q;
>
>     q = &p->a;
>     _1 = strcpy (q, "1234");
>     _2 = (long int) _1;
>     D.2208 = (int) _2;
>     return D.2208;
>   }
>
> with strcpy defined as:
>
>   __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
>   strcpy (char * restrict __dest, const char * restrict __src)
>   {
>     char * D.2210;
>
>     _1 = __builtin_object_size (__dest, 1);
>     D.2210 = __builtin___strcpy_chk (__dest, __src, _1);
>     return D.2210;
>   }
>
> What does this show?
>
> AFAICS, all three functions are equivalent GIMPLE, yet I'm being
> told that the first one is different in some important detail from
> the second, and that even though it's the same as the third and
> even though it's good to have __strcpy_chk() abort in the third
> case it's bad for the strlen() call to return a value constrained
> to [0, 3]. Would defining strlen like so

Note that _b_o_s (__builtin_object_size) is supposed to honor the
somewhat wonky GIMPLE semantics in this space. If it doesn't, that'd be
a bug. Assuming it does honor those semantics, then the right things
will happen.

Note that the loss of information as we go through the pipeline may
mean that _b_o_s could return "don't know", which is -1. It may also
return the size of the largest of multiple objects that the pointer
could point to.
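To make those outcomes concrete, here is a small C sketch. It is not
taken from the test case under discussion: the helper names
(size_known, size_whole, size_unknown, size_one_of_two) and the
variables are made up for illustration, and the volatile load is just
one assumed way of hiding a pointer's origin from the compiler.

```c
#include <stddef.h>

struct S { char a[4], b; };

static struct S s;
static char big16[16], small4[4];
static volatile int pick;   /* illustrative: a value unknown at compile time */

/* The argument names the member directly, so mode 1 (closest
   enclosing subobject) can see the char[4] and should report 4.  */
static size_t size_known (void)
{
  return __builtin_object_size (s.a, 1);
}

/* Mode 0 measures to the end of the whole object instead, so the
   trailing member b is counted as well: 5 bytes from the start of a.  */
static size_t size_whole (void)
{
  return __builtin_object_size (s.a, 0);
}

/* The pointer is reloaded through a volatile, so its origin is lost
   and the answer is "don't know", i.e. (size_t) -1.  */
static size_t size_unknown (void)
{
  char *volatile hidden = s.a;
  char *q = hidden;
  return __builtin_object_size (q, 1);
}

/* q may point to either array.  In mode 0 the result is an upper
   bound: the larger of the two when the compiler tracks both
   candidates, or (size_t) -1 when it gives up (e.g. without
   optimization).  It should never be the smaller size.  */
static size_t size_one_of_two (void)
{
  char *q = pick ? big16 : small4;
  return __builtin_object_size (q, 0);
}
```

Calling the four helpers from main and printing the results with %zu
at different optimization levels is a quick way to watch the last one
change while the first three stay put.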
So you won't get an error from the fortification system in those cases
where the semantics of C and GIMPLE differ.

>
>   __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
>   strlen (const char * __src)
>   {
>     char * D.2210;
>
>     _1 = __builtin_object_size (__src, 1);
>     D.2210 = __builtin___strlen_chk (__src, _1);
>     return D.2210;
>   }
>
> and having __strlen_chk() abort if __strc were longer than _1
> be also bad? (If not -- I sincerely hope that's the answer --
> then I'll be happy to put together a patch for that. In fact,
> I think it would be useful to extend this to all string
> functions (i.e., have them all abort on reads past the end,
> just as they abort on writes).

So the key here is that _b_o_s should be honoring the semantics of
GIMPLE. It can/will return "don't know" or sizes potentially larger
than the object in some cases.

Jeff
>
> Martin
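
For reference, a rough C sketch of the kind of checked strlen being
proposed above. This is purely illustrative: sketch_strlen_chk and
SKETCH_STRLEN are hypothetical names, not existing glibc or GCC
interfaces, and a macro is used in place of the always_inline wrapper
only so that __builtin_object_size sees the caller's expression even
without optimization.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct S { char a[4], b; };

/* Hypothetical checked strlen.  objsize is whatever
   __builtin_object_size reported for s.  */
static inline size_t
sketch_strlen_chk (const char *s, size_t objsize)
{
  /* "Don't know": nothing to check against, behave like strlen.  */
  if (objsize == (size_t) -1)
    return strlen (s);

  /* Look for the terminator within the reported size only.  */
  const char *nul = memchr (s, '\0', objsize);
  if (nul == NULL)
    abort ();   /* the string would run past the end of the object */

  return (size_t) (nul - s);
}

/* Illustrative wrapper: mode 1 asks for the closest enclosing
   subobject, matching the definition quoted above.  */
#define SKETCH_STRLEN(s) \
  sketch_strlen_chk ((s), __builtin_object_size ((s), 1))
```

With struct S x = { "abc", 'x' }, SKETCH_STRLEN (x.a) sees a size of 4
and returns 3. If a held four non-NUL characters it would abort rather
than read on into b. Whenever the size comes back as "don't know", or
larger than the member, the check silently weakens or disappears, which
is the behavior described above.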