Re: 33 unknowns left
On Wed, 26 Aug 2015, Joseph Myers wrote:
> kristerw = kristerw Krister Walfridsson

Yes, this is my current address (the "cato@" address mentioned in some other mail is obsolete).

[I have been away from GCC development for a long time, but I plan to start contributing in a few weeks -- as soon as I have managed to catch up on what has happened during the last couple of years...]

   /Krister
Re: Krister Walfridsson appointed NetBSD OS port maintainer
On Thu, 30 Aug 2007, David Edelsohn wrote:
> I am pleased to announce that the GCC Steering Committee has appointed Krister Walfridsson as NetBSD OS port maintainer.
>
> Please join me in congratulating Krister on his new role. Krister, please update your listing in the MAINTAINERS file.

Thanks! Updated as follows.

   /Krister

2007-08-30  Krister Walfridsson  <[EMAIL PROTECTED]>

	* MAINTAINERS (OS Port Maintainers): Add myself as NetBSD maintainer.
	(Write After Approval): Remove myself.

Index: MAINTAINERS
===================================================================
--- MAINTAINERS	(revision 127935)
+++ MAINTAINERS	(working copy)
@@ -105,6 +105,7 @@ hpux			Dave Anglin		[EMAIL PROTECTED]
 hpux			Steve Ellcey		[EMAIL PROTECTED]
 irix, osf		Rainer Orth		[EMAIL PROTECTED]
 netbsd			Jason Thorpe		[EMAIL PROTECTED]
+netbsd			Krister Walfridsson	[EMAIL PROTECTED]
 sco5, unixware, sco udk	Kean Johnston		[EMAIL PROTECTED]
 sh-linux-gnu		Kaz Kojima		[EMAIL PROTECTED]
 RTEMS Ports		Joel Sherrill		[EMAIL PROTECTED]
@@ -413,7 +414,6 @@ Caroline Tice		[EMAIL PROTECTED]
 Michael Tiemann		[EMAIL PROTECTED]
 David Ung		[EMAIL PROTECTED]
 Jonathan Wakely		[EMAIL PROTECTED]
-Krister Walfridsson	[EMAIL PROTECTED]
 Feng Wang		[EMAIL PROTECTED]
 Stephen M. Webb		[EMAIL PROTECTED]
 John Wehle		[EMAIL PROTECTED]
Re: gnat1 huge time
On Wed, 28 Nov 2007, Joel Sherrill wrote:
> I am trying to get the SVN head built locally again and back at work on the GNAT/RTEMS work I was doing. Unfortunately, I have tripped across something that is quite bad. Compiling on Linux x86 targeting the PowerPC or SPARC leads to a huge compilation time on a single file.

FWIW, I see the same behavior for i386-unknown-netbsdelf3.1 too (when compiling natively, i.e. not using a cross compiler). The problem was introduced by revision 125624 (the dataflow merge).

   /Krister
Re: gnat1 huge time
On Fri, 30 Nov 2007, Joel Sherrill wrote:
> Krister Walfridsson wrote:
>> On Wed, 28 Nov 2007, Joel Sherrill wrote:
>>> I am trying to get the SVN head built locally again and back at work on the GNAT/RTEMS work I was doing. Unfortunately, I have tripped across something that is quite bad. Compiling on Linux x86 targeting the PowerPC or SPARC leads to a huge compilation time on a single file.
>> FWIW, I see the same behavior for i386-unknown-netbsdelf3.1 too (when compiling natively, i.e. not using a cross compiler).
> Is there a PR for this or do I need to file one?

I have not filed any PR for this, so please go ahead and file a PR.

   /Krister
GIMPLE undefined behavior
I'm implementing a tool for translation validation (similar to Alive2 for LLVM). The tool uses an SMT solver to verify for each GIMPLE pass that the output IR is a refinement of the input IR:

* That each compiled function returns an identical result before/after the pass (for input that does not invoke UB)
* That each function does not have additional UB after the pass
* That values in global memory are identical after the two versions of the function are run

I have reported three bugs (106523, 106613, 106744) where the tool has found differences in the result, but it is a bit unclear to me what is considered undefined behavior in GIMPLE, so I have not reported any such cases yet...

For example, ifcombine optimizes

  int foo (int x, int a, int b)
  {
    int c;
    int _1;
    int _2;
    int _3;
    int _4;

    <bb 2> :
    c_6 = 1 << a_5(D);
    _1 = c_6 & x_7(D);
    if (_1 != 0)
      goto <bb 3>;
    else
      goto <bb 5>;

    <bb 3> :
    _2 = x_7(D) >> b_8(D);
    _3 = _2 & 1;
    if (_3 != 0)
      goto <bb 4>;
    else
      goto <bb 5>;

    <bb 4> :

    <bb 5> :
    # _4 = PHI <2(4), 0(2), 0(3)>
    return _4;
  }

to

  int foo (int x, int a, int b)
  {
    int c;
    int _4;
    int _10;
    int _11;
    int _12;
    int _13;

    <bb 2> :
    _10 = 1 << b_8(D);
    _11 = 1 << a_5(D);
    _12 = _10 | _11;
    _13 = x_7(D) & _12;
    if (_12 == _13)
      goto <bb 3>;
    else
      goto <bb 4>;

    <bb 3> :

    <bb 4> :
    # _4 = PHI <2(3), 0(2)>
    return _4;
  }

Both return the same value for foo(8, 1, 34), but the optimized version shifts more than 31 bits when calculating _10. tree.def says for LSHIFT_EXPR that "the result is undefined" if the number of bits to shift by is larger than the type size, which I interpret as meaning it should just be considered to return an arbitrary value. I.e., this does not invoke undefined behavior, so the optimization is allowed. Is my understanding correct?

What about signed integer wrapping for PLUS_EXPR? This happens for

  int f (int c, int s)
  {
    int y2;
    int y1;
    int x2;
    int x1;
    int _7;

    <bb 2> :
    y1_2 = c_1(D) + 2;
    x1_4 = y1_2 * s_3(D);
    y2_5 = c_1(D) + 4;
    x2_6 = s_3(D) * y2_5;
    _7 = x1_4 + x2_6;
    return _7;
  }

where slsr optimizes this to

  int f (int c, int s)
  {
    int y1;
    int x2;
    int x1;
    int _7;
    int slsr_9;

    <bb 2> :
    y1_2 = c_1(D) + 2;
    x1_4 = y1_2 * s_3(D);
    slsr_9 = s_3(D) * 2;
    x2_6 = x1_4 + slsr_9;
    _7 = x1_4 + x2_6;
    return _7;
  }

Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code, even though the original did not overflow. My understanding is that signed overflow invokes undefined behavior in GIMPLE, so this is a bug in slsr. Is my understanding correct?

I would appreciate some comments on which non-memory-related operations I should treat as invoking undefined behavior (memory operations are more complicated, and I'll be back with more concrete questions later...).

   /Krister
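For reference, C sources along the following lines produce GIMPLE of the quoted shape (my reconstruction -- the mail does not include the C code, so treat the exact sources as assumptions):

  /* Reconstructed source for the ifcombine example: two separate bit
     tests that ifcombine merges into one masked compare.  */
  int foo(int x, int a, int b)
  {
    int c = 1 << a;
    if ((c & x) && ((x >> b) & 1))
      return 2;
    return 0;
  }

  /* Reconstructed source for the slsr example: slsr rewrites
     (c + 4) * s as (c + 2) * s + 2 * s, and the 2 * s term is the
     addition that can wrap.  */
  int f(int c, int s)
  {
    return (c + 2) * s + (c + 4) * s;
  }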
Re: GIMPLE undefined behavior
On Thu, 1 Sep 2022, Richard Biener wrote:
> It's generally poorly documented what is considered 'undefined behavior'. We desperately need a section in the internals manual for this.
>
> For the {L,R}SHIFT_EXPR case we assume the shift operand is in range of [0, precision - 1], so in theory value-range propagation could infer that b_8(D) < 32 after it "executed". But it seems that range-on-exit doesn't do that yet.
> [...]
> The problem with shifts is that there's not a "do it anyway, but without undefined behavior" operation to substitute.

I read this as meaning I should not report these as bugs for now. But I'll probably keep this as UB in my tool to get an idea of how often this happens...

>> Calling f(-3, 0x75181005) makes slsr_9 overflow in the optimized code, even though the original did not overflow. My understanding is that signed overflow invokes undefined behavior in GIMPLE, so this is a bug in slsr. Is my understanding correct?
>
> Yes, the above would be a bug - again value-range propagation might be leveraged to produce a wrong-code testcase.

OK. I'll open bugs for the signed overflow issues the tool finds.

>> I would appreciate some comments on which non-memory-related operations I should treat as invoking undefined behavior (memory operations are more complicated, and I'll be back with more concrete questions later...).
>
> The more "interesting" cases are uninitialized values (registers or memory).

Yes, this is the next thing I was planning to implement. :)

> In general what we should worry about most is introducing undefined behavior that, when a later pass can assume it doesn't happen, causes wrong code to be generated. Likewise when we have late instrumentation that would flag such undefined behavior as a user error.

Agreed. But that comes back to the issue of lacking documentation... :(

   /Krister
Translation validation
I have implemented a tool for translation validation (similar to Alive2 for LLVM). The tool takes the GIMPLE IR for two functions and checks that the second is a refinement of the first. That is,

* The returned value is the same for both functions for all input that does not invoke undefined behavior.
* The second does not have any undefined behavior that the first does not have.
* The global memory is the same after invoking both functions with input that does not invoke undefined behavior.

The implementation is rather limited, but I have already found a few bugs in GCC (PR 106513, 106523, 106744, 106883, and 106884) by running the tool on the c-c++-common, gcc.c-torture, gcc.dg, and gcc.misc-tests test suites.

The tool is available at https://github.com/kristerw/pysmtgcc and there is some additional information in the blog post https://kristerw.github.io/2022/09/13/translation-validation/

   /Krister
Re: Translation validation
On Wed, 14 Sep 2022, Richard Biener wrote:
> Note that the folding/peephole optimizations done early can be avoided when you separate opportunities in the source to multiple statements, like change
>
>   int a, b;
>   a = b + 1 - b;
>
> to
>
>   a = b + 1;
>   a = a - b;
>
> note that parts of the folding optimizations are shared between GENERIC and GIMPLE (those generated from match.pd), so those will be exercised. Fully exercising and verifying the fold-const.cc only optimizations which happen on GENERIC only is indeed going to be difficult.
>
> Another way to avoid some of the pre-SSA optimizations is to feed the plugin with input from the GIMPLE frontend. Look at gcc/testsuite/gcc.dg/gimplefe-*.c for examples, IL can be dumped roughly in a format suitable as GIMPLE frontend input by dumping with -fdump-tree--gimple (but note all type, global variable and function declarations are missing ...)

This does not help when trying to find wrong-code bugs by compiling random source code with plugin1.py. :(

But it helps when writing tests for plugin2.py, so I wrote a simple script that extracts the first operand from match.pd simplify expressions and uses that to generate test cases. For example,

  (simplify
   (mult (abs@1 @0) @1)
   (mult @0 @0))

is generated as

  int __GIMPLE () f (int a0)
  {
    int _0;
    int a1;
    int _2;
    int _3;

    _2 = a0;
    a1 = __ABS _2;
    _3 = a1;
    _0 = a1 * _3;
    return _0;
  }

This is not an optimal way to find bugs, as many of the simplify expressions have additional restrictions in the replacement part, so many of my generated functions fail to trigger optimizations... This did not find any wrong-code bugs, but it found two cases where match.pd is missing TYPE_OVERFLOW_SANITIZED checks: PR 106990.

   /Krister
CLOBBER(eol)
I'm working on an updated version of my translation validator [*], and I have some problems with CLOBBER(eol).

I currently treat CLOBBER(eol) as making the memory invalid (i.e. all subsequent accesses are undefined behavior), but the tool reports miscompilation for a few cases where the tree-nrv pass makes the IR clobber <retval> by changing code such as [**]

  union bu o;
  ...
  o = i;
  MEM[(union *)&o].b18 = _11;
  MEM[(union *)&o].b20 = _11;
  <retval> = o;
  o ={v} {CLOBBER(eol)};
  return <retval>;

to use <retval> instead of o

  union bu o [value-expr: <retval>];
  ...
  <retval> = i;
  MEM[(union *)&<retval>].b18 = _11;
  MEM[(union *)&<retval>].b20 = _11;
  <retval> ={v} {CLOBBER(eol)};
  return <retval>;

As a result, the tool thinks the returned value is unavailable.

Is my understanding of CLOBBER(eol) correct? (I.e., is this a bug in tree-nrv?) Or is <retval> special, so that I should just ignore CLOBBER(eol) in this case?

   /Krister

[*] I'm planning to release the updated version in a few weeks. Meanwhile, you can find the old version at https://github.com/kristerw/pysmtgcc

[**] This example originates from gcc.c-torture/execute/921204-1.c compiled for x86 using the flags "-O -m32".
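For orientation, the shape of source code that triggers NRV looks roughly like this (a made-up sketch: the union layout and values are assumptions, not taken from gcc.c-torture/execute/921204-1.c):

  /* tree-nrv lets the local `o` share storage with the hidden return
     slot <retval>, so the copy `<retval> = o` disappears.  */
  union bu {
    unsigned b18 : 18;
    unsigned b20 : 20;
    long pad[4];   /* force the union to be returned in memory */
  };

  union bu f(union bu i)
  {
    union bu o = i;
    o.b18 = 3;
    return o;      /* after NRV, o and <retval> are the same object */
  }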
Re: CLOBBER(eol)
On Tue, 27 Jun 2023, Richard Biener wrote:
> I think this is a bug in NRV, yes. <retval> is special, but the above would allow DSE of the three stores. Can you open a bugreport?

Done! https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110434
types in GIMPLE IR
I have some random questions concerning types in the GIMPLE IR that I would appreciate some clarification on.

Type safety
-----------
Some transformations treat 1-bit types as a synonym of _Bool and mix the types in expressions, such as:

  <signed-boolean:1> _2;
  _Bool _3;
  _Bool _4;
  ...
  _4 = _2 ^ _3;

and similarly mix _Bool and enum

  enum E:bool { E0, E1 };

in one operation. I had expected this to be invalid... What are the type safety rules in the GIMPLE IR?

Somewhat related, gcc.c-torture/compile/pr96796.c performs a VIEW_CONVERT_EXPR from

  struct S1 {
    long f3;
    char f4;
  } g_3_4;

to an int

  p_51_9 = VIEW_CONVERT_EXPR<int>(g_3_4);

That must be wrong?

Semantics of <signed-boolean:32>
--------------------------------
"Wide" Booleans, such as <signed-boolean:32>, seem to allow more values than 0 and 1. For example, I've seen some IR operations like:

  _66 = _16 ? _Literal (<signed-boolean:32>) -1 : 0;

But I guess there must be some semantic difference between <signed-boolean:32> and a 32-bit int, otherwise the wide Boolean type would not be needed... So what are the differences?

   /Krister
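The usual source of these wide Booleans is vector comparisons, where each lane yields all-ones for true. A small example using GCC's vector extension (my example, not from the mail above):

  /* Each lane of the result is -1 (true) or 0 (false); in the GIMPLE
     dump the comparison result uses a vector whose element type is a
     wide signed Boolean such as <signed-boolean:32>.  */
  typedef int v4si __attribute__((vector_size(16)));

  v4si cmp(v4si a, v4si b)
  {
    return a == b;
  }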
Re: types in GIMPLE IR
On Thu, 29 Jun 2023, Richard Biener wrote:
> The thing with signed bools is that the two relevant values are -1 (true) and 0 (false), those are used for vector bool components where we also need them to be of wider type (32bits in this case).

My main confusion comes from seeing IR doing arithmetic such as

  <signed-boolean:32> _127;
  <signed-boolean:32> _169;
  ...
  _169 = _127 + -1;

or

  <signed-boolean:32> _127;
  <signed-boolean:32> _169;
  ...
  _169 = -_127;

and it was unclear to me what kind of arithmetic is allowed. I have now verified that all cases seem to be just one operation of this form (where _127 has the value 0 or 1), so it cannot construct values such as 42. But the wide signed Boolean can have the three different values 1, 0, and -1, which I still think is at least one too many. :)

I'll update my tool to complain if the value is outside the range [-1, 1].

   /Krister
Re: types in GIMPLE IR
On Thu, 29 Jun 2023, Richard Biener wrote:
> IIRC we have some simplification rules that turn bit operations into arithmetics. Arithmetic is allowed if it keeps the values inside [-1, 0] for signed bools or [0, 1] for unsigned bools.
>
>> I have now verified that all cases seem to be just one operation of this form (where _127 has the value 0 or 1), so it cannot construct values such as 42. But the wide signed Boolean can have the three different values 1, 0, and -1, which I still think is at least one too many. :)
>
> Yeah, I'd be interested in a testcase that shows this behavior.

I created PR 110487 with one example.

>> I'll update my tool to complain if the value is outside the range [-1, 1].

It is likely that all issues I have seen so far are due to PR 110487, so I'll keep the current behavior that complains if the value is outside the range [-1, 0].

   /Krister
semantics of uninitialized values in GIMPLE
I have implemented support for uninitialized memory in my translation validator. But I am not sure how well this corresponds to the GIMPLE semantics, so I have some questions...

My implementation tracks uninitialized bits. Use of uninitialized bits is in general treated as UB (for example, `x + y` is UB if `x` or `y` has any uninitialized bits), but there are a few exceptions:

* load/store: It is always OK to load/store values having uninitialized bits.
* Phi nodes: Phi nodes propagate uninitialized bits.
* Selection: Instructions that select an element (COND_EXPR, VEC_PERM_EXPR, etc.) may select between values having uninitialized bits, and the resulting value may have uninitialized bits. But the condition/mask must not have uninitialized bits.
* Extraction: Instructions that extract bits (BIT_FIELD_REF etc.) may have uninitialized bits in both the input/output.
* Insertion: Instructions that construct/insert values (COMPLEX_EXPR, etc.) may have uninitialized bits in both the input/output.

All other uses of values having uninitialized bits are considered UB. Does this behavior make sense?

The above seems to work fine so far, with one exception that can be seen in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit field

  unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;

which is written to as

  x.c6 = 0;
  x.c7 = 0;
  x.c8 = 0;
  x.c9 = 7;

The store merging pass changes this to

  _71 = MEM [(struct C *)&x + 8B];
  _42 = _71 & 128;
  _45 = _42 | 56;

and the translation validator is now complaining that the pass has introduced UB that was not in the original IR (because the most significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).

I could solve this by allowing uninitialized bits in BIT_AND_EXPR and BIT_OR_EXPR, and propagating each bit according to (see the sketch after this mail)

* `0 & uninit` is an initialized `0`
* `1 & uninit` is uninitialized
* `0 | uninit` is uninitialized
* `1 | uninit` is an initialized `1`

Is that the correct GIMPLE semantics?

   /Krister
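A bit-level implementation of those AND/OR propagation rules could look like this (a minimal sketch under the stated assumptions -- the struct and helper names are made up, and this is not actual validator code):

  #include <stdint.h>

  /* A tracked 8-bit value: `uninit` has a 1 for every bit of `val`
     whose content is uninitialized.  */
  struct tracked {
    uint8_t val;
    uint8_t uninit;
  };

  static struct tracked tracked_and(struct tracked x, struct tracked y)
  {
    /* Bits that are defined 0 in either operand force the result bit
       to a defined 0; otherwise uninitializedness propagates.  */
    uint8_t x_def0 = ~x.val & ~x.uninit;
    uint8_t y_def0 = ~y.val & ~y.uninit;
    struct tracked r;
    r.val = x.val & y.val;
    r.uninit = (x.uninit | y.uninit) & ~(x_def0 | y_def0);
    return r;
  }

  static struct tracked tracked_or(struct tracked x, struct tracked y)
  {
    /* Dually, bits that are defined 1 in either operand force the
       result bit to a defined 1.  */
    uint8_t x_def1 = x.val & ~x.uninit;
    uint8_t y_def1 = y.val & ~y.uninit;
    struct tracked r;
    r.val = x.val | y.val;
    r.uninit = (x.uninit | y.uninit) & ~(x_def1 | y_def1);
    return r;
  }

For the store-merging example above, `_71 & 128` keeps the uninitialized MSB uninitialized (128 is a defined 1 there) but makes the low seven bits defined zeros, and `_42 | 56` then fills in defined ones.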
Re: semantics of uninitialized values in GIMPLE
On Fri, 7 Jul 2023, Richard Biener wrote:
>> I have implemented support for uninitialized memory in my translation validator. But I am not sure how well this corresponds to the GIMPLE semantics, so I have some questions...
>>
>> My implementation tracks uninitialized bits. Use of uninitialized bits is in general treated as UB (for example, `x + y` is UB if `x` or `y` has any uninitialized bits), but there are a few exceptions:
>>
>> * load/store: It is always OK to load/store values having uninitialized bits.
>> * Phi nodes: Phi nodes propagate uninitialized bits.
>
> definitely, I think plain copies would be OK as well

With "plain copies", do you mean treating it as if it is always defined? That would prevent optimizations such as transforming

  int t;
  if (1)
    t = *p;
  else
    t = 0;
  return t + 1;

to the equivalent of

  return *p + 1;

because the latter is UB if *p is undefined, but the original is OK if phi nodes always give us a fully defined value.

>> * Selection: Instructions that select an element (COND_EXPR, VEC_PERM_EXPR, etc.) may select between values having uninitialized bits, and the resulting value may have uninitialized bits. But the condition/mask must not have uninitialized bits.
>> * Extraction: Instructions that extract bits (BIT_FIELD_REF etc.) may have uninitialized bits in both the input/output.
>> * Insertion: Instructions that construct/insert values (COMPLEX_EXPR, etc.) may have uninitialized bits in both the input/output.
>
> Generally I think "moving" uninitialized bits in full or in part is OK. Operating on them, including comparing them (with themselves), is eventually asking for troubles.

Makes sense. But I think we must restrict the definition of "move". For example, one can see x * 2 as moving the uninitialized bits one step. And x + x and x << 1 are the same as x * 2, so then they too would be defined? I would prefer if all of them were UB if x contains uninitialized bits...

>> All other uses of values having uninitialized bits are considered UB. Does this behavior make sense?
>>
>> The above seems to work fine so far, with one exception that can be seen in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit field
>>
>>   unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;
>>
>> which is written to as
>>
>>   x.c6 = 0;
>>   x.c7 = 0;
>>   x.c8 = 0;
>>   x.c9 = 7;
>>
>> The store merging pass changes this to
>>
>>   _71 = MEM [(struct C *)&x + 8B];
>>   _42 = _71 & 128;
>>   _45 = _42 | 56;
>>
>> and the translation validator is now complaining that the pass has introduced UB that was not in the original IR (because the most significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).
>>
>> I could solve this by allowing uninitialized bits in BIT_AND_EXPR and BIT_OR_EXPR, and propagating each bit according to
>>
>> * `0 & uninit` is an initialized `0`
>> * `1 & uninit` is uninitialized
>> * `0 | uninit` is uninitialized
>> * `1 | uninit` is an initialized `1`
>>
>> Is that the correct GIMPLE semantics?
>
> I think the above "moves" the uninitialized MSB from memory to _45, so that's OK. Some "operations" like & 0 or & 1 give either defined values or take the uninitialized bit unmodified (thus "move"). I think both kinds are OK. Likewise + 0 or * 0/1 would be OK. What's not OK is operations that use a (fully?) uninitialized value twice, like x - x when x is uninitialized.

+ 0 and * 0/1 make sense to me.

One annoyance is that it will make my tool slower, as it must track undefined bits in more cases. But that is not a valid reason for restricting the IR semantics...
> I think we want that, as soon as the uninitialized value becomes "part" of a then partly initialized value, its value is "fixed". With things like _Complex or vector the situation is a bit of a grey area since we have ways to operate on elements. Note that when we for example use a BIT_FIELD_REF to extract the MSB from _42 above, the value would be again fully undefined.

Do you mean that all operations on values having "fixed" bits are valid? I do not think that will work. The frozen value may be any value, which means that the program is nondeterministic. For example, if x has one "fixed" uninitialized bit with the other bits 0, then code such as

  if (x)

could "randomly" be either true or false, so the program will not really have any defined semantics.

And if use of an extracted "fixed" uninitialized bit is UB, then the following may be UB (if x or y has a "fixed" uninitialized least significant bit)

  bool b = (x & 1) != (y & 1);

while the equivalent

  bool b = (x ^ y) & 1;

is defined. And things may be even more fun when arithmetic is "smearing" the fixed uninitialized values over the variable -- it will then be impossible to determine if the extracted bits are OK or not when extracted...

> I hope this makes some sense, it's a tricky area.
>
> Richard.
Re: semantics of uninitialized values in GIMPLE
On Tue, 11 Jul 2023, Richard Biener wrote:
>> With "plain copies", do you mean treating it as if it is always defined? That would prevent optimizations such as transforming
>>
>>   int t;
>>   if (1)
>>     t = *p;
>>   else
>>     t = 0;
>>   return t + 1;
>>
>> to the equivalent of
>>
>>   return *p + 1;
>>
>> because the latter is UB if *p is undefined, but the original is OK if phi nodes always give us a fully defined value.
>
> I meant the copy will simply transfer the state (uninitialized or initialized) from RHS to LHS but in itself does not invoke undefined behavior. And a "plain" copy would be
>
>   int t, u;
>   t = u;
>
> where u is uninitialized and t will be as well, but this copy in itself isn't problematic.

So I misunderstood what you meant. This is how I have implemented it! :)

> Maybe we should treat these as undefined as well - the problem with PHIs is that the implied copies are necessary because of SSA construction even when they do not appear in the original program. But since transforms like jump threading can cause those copies to become degenerate it's odd to only treat those as OK. So you have to think about
>
>   _1 = PHI <p_2(D)>
>
> vs.
>
>   _1 = p_2(D);

>>>> * Selection: Instructions that select an element (COND_EXPR, VEC_PERM_EXPR, etc.) may select between values having uninitialized bits, and the resulting value may have uninitialized bits. But the condition/mask must not have uninitialized bits.
>>>> * Extraction: Instructions that extract bits (BIT_FIELD_REF etc.) may have uninitialized bits in both the input/output.
>>>> * Insertion: Instructions that construct/insert values (COMPLEX_EXPR, etc.) may have uninitialized bits in both the input/output.
>>>
>>> Generally I think "moving" uninitialized bits in full or in part is OK. Operating on them, including comparing them (with themselves), is eventually asking for troubles.
>>
>> Makes sense. But I think we must restrict the definition of "move". For example, one can see x * 2 as moving the uninitialized bits one step. And x + x and x << 1 are the same as x * 2, so then they too would be defined? I would prefer if all of them were UB if x contains uninitialized bits...
>
> I guess you have to try ...

>>>> All other uses of values having uninitialized bits are considered UB. Does this behavior make sense?
>>>>
>>>> The above seems to work fine so far, with one exception that can be seen in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit field
>>>>
>>>>   unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;
>>>>
>>>> which is written to as
>>>>
>>>>   x.c6 = 0;
>>>>   x.c7 = 0;
>>>>   x.c8 = 0;
>>>>   x.c9 = 7;
>>>>
>>>> The store merging pass changes this to
>>>>
>>>>   _71 = MEM [(struct C *)&x + 8B];
>>>>   _42 = _71 & 128;
>>>>   _45 = _42 | 56;
>>>>
>>>> and the translation validator is now complaining that the pass has introduced UB that was not in the original IR (because the most significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).
>>>>
>>>> I could solve this by allowing uninitialized bits in BIT_AND_EXPR and BIT_OR_EXPR, and propagating each bit according to
>>>>
>>>> * `0 & uninit` is an initialized `0`
>>>> * `1 & uninit` is uninitialized
>>>> * `0 | uninit` is uninitialized
>>>> * `1 | uninit` is an initialized `1`
>>>>
>>>> Is that the correct GIMPLE semantics?
>>>
>>> I think the above "moves" the uninitialized MSB from memory to _45, so that's OK. Some "operations" like & 0 or & 1 give either defined values or take the uninitialized bit unmodified (thus "move"). I think both kinds are OK. Likewise + 0 or * 0/1 would be OK. What's not OK is operations that use a (fully?) uninitialized value twice, like x - x when x is uninitialized.
>>
>> + 0 and * 0/1 make sense to me.
>> One annoyance is that it will make my tool slower, as it must track undefined bits in more cases. But that is not a valid reason for restricting the IR semantics...
>
> * 0 produces a defined value. The difference with x * 1 is that we can elide the operation, so the result should better behave the same as x itself. So I stand corrected, and * 1 should not be "OK", that is, the result is still uninitialized. With '&' a (partly) undefined value might become (fully) defined.
>
> A good history of us getting more "correct" here is visible in tree-ssa-ccp.cc:likely_value which tells CCP when a lattice value is UNDEFINED (CCP doesn't track undefinedness on a bit level, so it has to treat partly undefined values as defined).

Thanks. I'll take a look at this.

>>> I think we want that, as soon as the uninitialized value becomes "part" of a then partly initialized value, its value is "fixed". With things like _Complex or vector the situation is a bit of a grey area since we have ways to operate on elements. Note that when we for example use a BIT_FIELD_REF to extract the MSB from _42 above, the value would be again fully undefined.
>>
>> Do you mean that all operations on values having "fixed" bits are valid? I do not think that will work. The frozen value may be any value, which means that the program is nondeterministic. For example, if x has one "fixed" uninitialized bit with the other bits 0, then code such as if (x) could "randomly" be either true or false, so the program will not really have any defined semantics.
Re: semantics of uninitialized values in GIMPLE
On Tue, 11 Jul 2023, Richard Biener wrote:
>> I'll update my implementation, and will come back with a more detailed proposal in a few weeks when I have tried some more things.
>
> Thanks! I've also taken the opportunity given by your work on the recent bugs to propose a talk at this year's GNU Cauldron about undefined behavior in GCC, and hope to at least start on documenting the state of art^WGCC in the internals manual for this. If you have any pointers to your work / research I'd be happy to point to it, learn from it (and maybe steal a word or two ;)).

Nice!

No, I have not published anything since the original release of 'pysmtgcc' last year -- I was planning to document it in detail, but found out that nothing worked, and I did not really understand what I was doing... And the Python prototype started to be annoying, so I threw it all away and wrote a new tool in C++.

The new implementation now handles most of GIMPLE (but it still does not handle loops or function calls). I also have support for checking that the generated assembly has the same semantics as the optimized GIMPLE (only partial support for RISC-V for now).

I plan to publish a write-up of the memory model soon in a series of blog posts -- I'll send you the link when it is available.

And FWIW, I'm also planning to submit a talk to GNU Cauldron. :) My proposal is about "Analyzing IR and Assembly Using SMT Solvers". The translation validation stuff is actually just a test case for my analysis engine -- I have more ideas for how it can be used...

   /Krister
Re: semantics of uninitialized values in GIMPLE
On Tue, 11 Jul 2023, Krister Walfridsson wrote:
> On Tue, 11 Jul 2023, Richard Biener wrote:
>>> I'll update my implementation, and will come back with a more detailed proposal in a few weeks when I have tried some more things.
>>
>> Thanks! I've also taken the opportunity given by your work on the recent bugs to propose a talk at this year's GNU Cauldron about undefined behavior in GCC, and hope to at least start on documenting the state of art^WGCC in the internals manual for this. If you have any pointers to your work / research I'd be happy to point to it, learn from it (and maybe steal a word or two ;)).
>
> Nice!
>
> No, I have not published anything since the original release of 'pysmtgcc' last year -- I was planning to document it in detail, but found out that nothing worked, and I did not really understand what I was doing... And the Python prototype started to be annoying, so I threw it all away and wrote a new tool in C++.
>
> The new implementation now handles most of GIMPLE (but it still does not handle loops or function calls). I also have support for checking that the generated assembly has the same semantics as the optimized GIMPLE (only partial support for RISC-V for now).
>
> I plan to publish a write-up of the memory model soon in a series of blog posts -- I'll send you the link when it is available.

I have now published a few blog posts about my work. You can find them at:

  https://github.com/kristerw/pysmtgcc

I'll publish the remaining blog posts next week.

   /Krister
CLZ when CLZ_DEFINED_VALUE_AT_ZERO is false
My translation validation tool reports some miscompilations related to the internal call CLZ(0) when CLZ_DEFINED_VALUE_AT_ZERO is false, but I am not sure I use the correct semantics...

I started by modeling CLZ(0) as undefined behavior, but that made the tool report a miscompilation for gcc.c-torture/execute/920501-6.c, where sccp changes a loop to

  _36 = t_10(D) != 0;
  _35 = .CLZ (t_10(D));
  _34 = 63 - _35;
  _33 = (unsigned int) _34;
  _32 = (long long unsigned int) _33;
  _31 = _32 + 1;
  b_38 = _36 ? _31 : 1;

This may call CLZ with 0, but the value is not used in that case, so it seems reasonable to allow this. I therefore changed my tool to treat CLZ(0) as returning an undefined value instead (by generating it as a symbolic value, which then makes the tool test that the IR is valid for all values). This still makes this optimization fail, because the calculation of _34 may overflow... :(

This could be a bug in the sccp pass (which could have done the calculation as unsigned), but fixing this would still make the tool report a miscompilation, as the ranges calculated during the dom3 pass claim that _35 has a range:

  _35 : [irange] int [0, 63]

So what is the correct semantics for CLZ when CLZ_DEFINED_VALUE_AT_ZERO is false?

   /Krister
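To make the overflow concrete: if .CLZ (0) may return an arbitrary value, a symbolic result such as INT_MIN makes the signed subtraction 63 - _35 overflow. A sketch of how the calculation could instead be done as unsigned (my illustration of the suggested sccp fix, not actual GCC output):

  unsigned long long count(unsigned long long t)
  {
    /* Mirrors the GIMPLE above; __builtin_clzll(0) is undefined,
       matching CLZ_DEFINED_VALUE_AT_ZERO == false.  */
    unsigned clz = __builtin_clzll(t);
    unsigned n = 63u - clz;   /* unsigned arithmetic: wraps, no UB */
    return t != 0 ? (unsigned long long) n + 1 : 1;
  }

As in the quoted IR, the possibly-garbage value feeds the arithmetic but is only selected when t != 0, so doing the intermediate step in unsigned arithmetic removes the introduced UB.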
Re: CLZ when CLZ_DEFINED_VALUE_AT_ZERO is false
On Fri, 1 Sep 2023, Richard Biener wrote:
> The value of .CLZ (0) is undefined then. I believe your analysis is correct in that both 63 - _35 might overflow and that dom3 (thus ranger) mis-computes the range for _35. I wonder why we don't elide _36 ? _31 : 1 with that info (possibly no range-op for .CLZ?), of course it would be wrong to do so. Can you open a bugreport please?

PR 111280.

   /Krister
smtgcc translation validation
I have now released the new implementation for translation validation that I talked about at the GNU Tools Cauldron last week:

  https://github.com/kristerw/smtgcc

   /Krister
Re: smtgcc translation validation
On Sun, 18 Aug 2024, Gerald Pfeifer wrote:
> On Sat, 30 Sep 2023, Krister Walfridsson via Gcc wrote:
>> I have now released the new implementation for translation validation that I talked about at the GNU Tools Cauldron last week:
>>
>>   https://github.com/kristerw/smtgcc
>
> Wouldn't it be appropriate/nice to promote this a bit more strongly from gcc.gnu.org?
>
> One example I can think of is https://gcc.gnu.org/extensions.html . Another might be https://gcc.gnu.org/bugs/ to help understand whether something is a user bug or GCC bug?
>
> Happy to accept patches, or if you want to suggest some text I can massage that into HTML and add it.

I suggest waiting a few weeks -- the tool still has a few issues with false positives, and I need to document the limitations. But my plan is to have it fixed before the GNU Tools Cauldron; the main message in my proposed talk is "all of you should start using smtgcc!" :)

I'll send you a patch when this is done!

   /Krister
Re: smtgcc end-of-year update
On Thu, 2 Jan 2025, Andi Kleen wrote:
> Krister Walfridsson via Gcc writes:
>> But running smtgcc on the test suite is not the best use case for the tool -- it only detects bugs where the test triggers an unrelated bug compared to what the test is checking, which should be uncommon. I therefore plan to start testing by compiling real-world code and doing some fuzzing. One problem is that this often finds the open bugs mentioned in the previous section, making it time-consuming to check if each new issue is a repeat. For example, running yarpgen gives me a few failure reports per day of PR106883.
>
> I would increase the priority of PR106883 if it blocks so much testing. But does disabling the slsr pass not help?

Yes, I can disable the slsr pass, or just ignore all the problems detected in it. But most of the bugs mentioned occur frequently enough that I have the same problem with them, and there wouldn't be much left of the compiler to check if I disable/ignore all of the affected passes...

I don't want to overstate the problem -- I'll manage. But it would be great if a few of these PRs were fixed! :)

   /Krister
smtgcc end-of-year update
I have continued working on my translation validator, smtgcc [1], over the last few months. Here's an update on what has happened since my talk at the GNU Tools Cauldron [2].

Improvements
------------
The main focus has been improving the smtgcc-tv-backend tool, which checks that the generated assembly is a refinement of the final GIMPLE IR. Highlights:

* Fixed the RISC-V 64-bit pointer problem, so both RV32 and RV64 now work.
* Fixed lots of corner cases where we failed to parse the assembly.
* Added support for the RISC-V B extensions.
* Added partial support for the RISC-V V extensions.
* Added support for the AArch64 architecture.

AArch64 support is still a bit limited: the ABI is not fully implemented, and checking is slow and often times out because the current implementation generates overly complicated SMT formulas for some common instructions.

Reported GCC bugs
-----------------
Most GCC bugs I have reported have been fixed, but there are currently three open bugs where GCC generates incorrect code: PR113703, PR117186, PR118174. I also claim that PR117688 still happens even though it is marked as resolved -- I will investigate it a bit more before reopening the PR.

I have also reported several bugs where optimizations perform invalid transformations that introduce new undefined behavior, although these do not, in practice, miscompile code with the current pass ordering. The following are still open: PR106883, PR106884, PR109626, PR110760, PR111257, PR111280, PR111494, PR113590, PR114056, PR117927.

GCC testing
-----------
I have been running smtgcc over the GCC test suite weekly, and it seems to detect a newly introduced bug roughly every month. I plan to continue doing this in the coming months as well.

But running smtgcc on the test suite is not the best use case for the tool -- it only detects bugs where the test triggers an unrelated bug compared to what the test is checking, which should be uncommon. I therefore plan to start testing by compiling real-world code and doing some fuzzing. One problem is that this often finds the open bugs mentioned in the previous section, making it time-consuming to check if each new issue is a repeat. For example, running yarpgen gives me a few failure reports per day of PR106883.

   /Krister

[1] https://github.com/kristerw/smtgcc/
[2] https://www.youtube.com/watch?v=vrkR7jKcFpw
Re: Pointer semantics in GIMPLE
On Fri, 21 Mar 2025, Richard Biener wrote:
>>>> Issues
>>>> ------
>>>> The semantics I've described above result in many reports of miscompilations that I haven't reported yet. As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate pointers that extend up to a vector length past the object (and similarly, up to one vector length before the object in a backwards loop). This is invalid if the object is too close to address 0 or 0x. And this out-of-bounds distance can be made arbitrarily large, as can be seen by compiling the following function for x86_64 with -O3:
>>>>
>>>>   void foo(int *p, uint64_t s, int64_t n)
>>>>   {
>>>>     for (int64_t i = 0; i < n; i++) {
>>>>       int *q = p + i * s;
>>>>       for (int j = 0; j < 4; j++)
>>>>         *q++ = j;
>>>>     }
>>>>   }
>>>>
>>>> Here, the vectorized code adds s * 4 to the pointer ivtmp_8 at the end of each iteration:
>>>>
>>>>   :
>>>>   bnd.8_38 = (unsigned long) n_10(D);
>>>>   _21 = s_11(D) * 4;
>>>>
>>>>   :
>>>>   # ivtmp_8 = PHI
>>>>   # ivtmp_26 = PHI
>>>>   MEM [(int *)ivtmp_8] = { 0, 1, 2, 3 };
>>>>   ivtmp_7 = ivtmp_8 + _21;
>>>>   ivtmp_20 = ivtmp_26 + 1;
>>>>   if (ivtmp_20 < bnd.8_38) goto ; else goto ;
>>>>
>>>>   :
>>>>   goto ;
>>>>
>>>> This means calling foo with a sufficiently large s can guarantee wrapping or evaluating to 0, even though the original IR before optimization did not wrap or evaluate to 0. For example, when calling foo as:
>>>>
>>>>   foo(p, -((uintptr_t)p / 4), 1);
>>>>
>>>> I guess this is essentially the same issue as PR113590, and could be fixed by moving all induction variable updates to the latch.
>>>
>>> Yes.
>>>
>>>> But if that happens, the motivating examples for needing to handle invalid pointer values would no longer occur. Would that mean smtgcc should adopt a more restrictive semantics for pointer values?
>>>
>>> In principle yes, but PR113590 blocks this I guess. Could smtgcc consider a SSA def to be happening only at the latest possible point, that is, "virtually" sink it to the latch? (I realize when you have two dependent defs that can be moved together only such "virtual" sinking could be somewhat complicated)
>>
>> I think it feels a bit strange (and brittle) to define the semantics by virtually sinking everything as much as possible. But isn't this sinking just another way of saying that the operation is UB only if the value is used? That is, we can solve this essentially the same way LLVM does with its deferred UB.
>
> Yeah, that's another way of thinking. Though technically GCC doesn't implement deferred UB - for the case above we're simply lucky nothing exploits the UB that is in principle there.
>
>> The idea is that POINTER_DIFF_EXPR, PLUS_EXPR, etc. are defined for all inputs, but the result is a poison value when it doesn't fit in the return type. Any use of a poison value is UB. This is trivial to implement in smtgcc and would solve PR113590.
>
> I _think_ it's a reasonable thing for smtgcc to do. For GCC itself it would complicate what is UB and what not quite a bit - what's the ultimate "use" UB triggers? I suppose POISON + 1 is simply POISON again. Can you store POISON to memory, or is that then UB at the point of the store? Can you pass it to a call? What if the call is inlined and just does return arg + 1? IIRC I read some paper about this deferred UB in LLVM and wasn't convinced they solve any real problem.

I agree deferred UB causes more problems than it solves... And I think it's important that smtgcc uses the same semantics as GCC, so I'll keep the current behavior.

   /Krister
Pointer semantics in GIMPLE
I'm working on ensuring that the GIMPLE semantics used by smtgcc are correct, and I have a lot of questions about the details. I'll be sending a series of emails with these questions. This first one is about pointers in general.

Each section starts with a description of the semantics I've implemented (or plan to implement), followed by concrete questions if relevant. Let me know if the described semantics are incorrect or incomplete.

Pointer values
--------------
C imposes restrictions on pointer values (alignment, must point to valid memory, etc.), but GIMPLE seems more permissive. For example, when vectorizing a loop:

  void foo(int *p, int n)
  {
    for (int i = 0; i < n; i++)
      p[i] = 1;
  }

the vectorized loop updates the pointer vectp_p.8 for the next iteration at the end of the loop, causing it to point up to a vector length out of range after the last iteration:

  :
  # vectp_p.8_34 = PHI
  # ivtmp_37 = PHI
  MEM [(int *)vectp_p.8_34] = { 1, 1, 1, 1 };
  vectp_p.8_35 = vectp_p.8_34 + 16;
  ivtmp_38 = ivtmp_37 + 1;
  if (ivtmp_38 < bnd.5_31) goto ; else goto ;

  :
  goto ;

Because of this, I allow all values -- essentially treating pointers as unsigned integers. Things like ensuring pointers point to valid memory or have correct alignment are only checked when dereferencing the pointer (I'll discuss pointer dereferencing semantics in the next email).

Pointer arithmetic -- POINTER_PLUS_EXPR
---------------------------------------
POINTER_PLUS_EXPR calculates p + x:

* It is UB if the calculation overflows when TYPE_OVERFLOW_WRAPS(ptr_type) is false.
* It is UB if p + x == 0 (unless p == 0 and x == 0); this rule does not apply when flag_delete_null_pointer_checks is false.

Since the pointer type is unsigned, it's not entirely clear what overflow means. For example, p - 1 is represented as p + 0x, which overflows when treated as an unsigned addition. I check for overflow in p + x as follows (a runnable version of this check appears after this mail):

  is_overflow = ((intptr_t)x < 0) ? (p + x) > p : (p + x) < p;

Pointer arithmetic -- MEM_REF and TARGET_MEM_REF
------------------------------------------------
A calculation p + x can also be done using MEM_REF as &MEM[p + x].

Question: smtgcc does not check for overflow in address calculations for MEM_REF and TARGET_MEM_REF. Should it treat overflow as UB in the same way as POINTER_PLUS_EXPR?

Pointer arithmetic -- COMPONENT_REF
-----------------------------------
COMPONENT_REF adds the offset of a structure element to the pointer to the structure.

Question: smtgcc does not check for overflow in the address calculation for COMPONENT_REF. Should it treat overflow as UB in the same way as POINTER_PLUS_EXPR?

Pointer arithmetic -- ARRAY_REF
-------------------------------
ARRAY_REF adds the offset of the indexed element to the pointer to the array.

* It is UB if the index is negative.
* If the TYPE_DOMAIN for the array type has a TYPE_MAX_VALUE and the array is not a flexible array, it is UB if the index exceeds "one after" this maximum value.
* If it is a flexible array or does not have a maximum value, it is considered an unsized array, so all non-negative indices are valid. But it is UB if the calculation of offset = index * element_size overflows.

Question: smtgcc does not check for overflow in the calculation of p + offset. Should it treat overflow as UB in the same way as POINTER_PLUS_EXPR?

Question: What is the correct way to check for flexible arrays? I am currently using array_ref_flexible_size_p and flag_strict_flex_arrays, but I noticed that middle-end passes do not seem to use flag_strict_flex_arrays. Is there a more canonical way to determine how to interpret flexible arrays?
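A self-contained version of the POINTER_PLUS_EXPR overflow check from above (my sketch -- smtgcc performs this on SMT bit-vectors, so the C below is only an illustration):

  #include <stdbool.h>
  #include <stdint.h>

  /* The offset x has an unsigned type (sizetype) but acts as a signed
     offset: a "negative" x must move p down, and a non-negative x
     must not move it down.  Unsigned wrap-around in p + x is defined
     in C, so the comparisons below are safe to evaluate.  */
  static bool pointer_plus_overflows(uintptr_t p, uintptr_t x)
  {
    return ((intptr_t)x < 0) ? (p + x) > p : (p + x) < p;
  }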
Pointer arithmetic -- POINTER_DIFF_EXPR
---------------------------------------
Subtracting a pointer q from a pointer p is done using POINTER_DIFF_EXPR.

* It is UB if the difference does not fit in a signed integer with the same precision as the pointers.

This implies that an object's size must be less than half the address space; otherwise, POINTER_DIFF_EXPR could not be used to compute sizes in C. But there may be additional restrictions. For example, compiling the function:

  void foo(int *p, int *q, int n)
  {
    for (int i = 0; i < n; i++)
      p[i] = q[i] + 1;
  }

causes the vectorized code to perform overlap checks like:

  _7 = q_11(D) + 4;
  _25 = p_12(D) - _7;
  _26 = (sizetype) _25;
  _27 = _26 > 8;
  _28 = _27;
  if (_28 != 0) goto ; else goto ;

which takes the difference between two pointers that may point to different objects. This suggests that all objects must fit within half the address space.

Question: What are the restrictions on valid address ranges and object sizes?

Issues
------
The semantics I've described above result in many reports of miscompilations that I haven't reported yet. As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate pointers that extend up to a vector length past the object (and similarly, up to one vector length before the object in a backwards loop). This is invalid if the object is too close to address 0 or 0x. And this out-of-bounds distance can be made arbitrarily large, as can be seen by compiling the following function for x86_64 with -O3:

  void foo(int *p, uint64_t s, int64_t n)
  {
    for (int64_t i = 0; i < n; i++) {
      int *q = p + i * s;
      for (int j = 0; j < 4; j++)
        *q++ = j;
    }
  }

Here, the vectorized code adds s * 4 to the pointer ivtmp_8 at the end of each iteration:

  :
  bnd.8_38 = (unsigned long) n_10(D);
  _21 = s_11(D) * 4;

  :
  # ivtmp_8 = PHI
  # ivtmp_26 = PHI
  MEM [(int *)ivtmp_8] = { 0, 1, 2, 3 };
  ivtmp_7 = ivtmp_8 + _21;
  ivtmp_20 = ivtmp_26 + 1;
  if (ivtmp_20 < bnd.8_38) goto ; else goto ;

  :
  goto ;

This means calling foo with a sufficiently large s can guarantee wrapping or evaluating to 0, even though the original IR before optimization did not wrap or evaluate to 0. For example, when calling foo as:

  foo(p, -((uintptr_t)p / 4), 1);

I guess this is essentially the same issue as PR113590, and could be fixed by moving all induction variable updates to the latch. But if that happens, the motivating examples for needing to handle invalid pointer values would no longer occur. Would that mean smtgcc should adopt a more restrictive semantics for pointer values?

   /Krister
Memory access in GIMPLE
I have more questions about GIMPLE memory semantics for smtgcc. As before, each section starts with a description of the semantics I've implemented (or plan to implement), followed by concrete questions if relevant. Let me know if the described semantics are incorrect or incomplete.

Accessing memory
----------------
Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the lhs and/or rhs is a memory reference expression (such as MEM_REF). When both lhs and rhs access memory, one of the following must hold -- otherwise the access is UB:

1. There is no overlap between lhs and rhs
2. lhs and rhs represent the same address

A memory access is also UB in the following cases:

* Any accessed byte is outside valid memory
* The pointer violates the alignment requirements
* The pointer provenance doesn't match the object
* The type is incorrect from a TBAA perspective
* It's a store to constant memory

smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this mail. Provenance has its own issues, which I'll come back to in a separate mail.

Checking that a memory access is within bounds
----------------------------------------------
A memory access may be represented by a chain of memory reference expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example, accessing a structure:

  struct s {
    int x, y;
  };

as:

  int foo (struct s * p)
  {
    int _3;

    <bb 2> :
    _3 = p_1(D)->x;
    return _3;
  }

involves a MEM_REF for the whole object and a COMPONENT_REF to select the field. Conceptually, we load the entire structure and then pick out the element -- so all bytes of the structure must be in valid memory.

We could also do the access as:

  int foo (struct s * p)
  {
    int * q;
    int _3;

    <bb 2> :
    q_2 = &p_1(D)->x;
    _3 = *q_2;
    return _3;
  }

This calculates the address of the element, and then reads it as an integer, so only the four bytes of x must be in valid memory. In other words, the compiler is not allowed to optimize:

  q_2 = &p_1(D)->x;
  _3 = *q_2;

to

  _3 = p_1(D)->x;

Question: Describing the first case as conceptually reading the whole structure makes sense for loads. But I assume the same requirement -- that the entire object must be in valid memory -- also applies to stores. Is that correct?

Allowed out-of-bounds read?
---------------------------
Compile the function below for x86_64 with "-O3 -march=x86-64-v2":

  int f(int *a)
  {
    for (int i = 0; i < 100; i++)
      if (a[i])
        return 1;
    return 0;
  }

The vectorizer transforms this into code that processes one scalar element at a time until the pointer is 16-byte aligned, then switches to vector loads. The original code is well-defined when called like this:

  int a[2] __attribute__((aligned(16))) = {1, 0};
  f(a);

But the vectorized version ends up reading 8 bytes out of bounds. This out-of-bounds read is harmless in practice -- it stays within the same memory page, so the extra bytes are accessible. But it's invalid under the smtgcc memory model.

Question: Is this considered a valid access in GIMPLE? If so, what are the rules for allowed out-of-bounds memory reads?

Alignment check
---------------
Question: smtgcc currently gets the alignment requirements by calling get_object_alignment on the tree expression returned from gimple_assign_lhs (for stores) or gimple_assign_rhs1 (for loads). Is that the correct way to get the required alignment?

   /Krister
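The "any accessed byte outside valid memory is UB" rule from the first section reduces to a range check per allocation. A minimal sketch (my illustration of the rule, not smtgcc code):

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Check that the `size` bytes at `ptr` all lie inside the single
     allocation [base, base + obj_size).  Written so the intermediate
     arithmetic cannot wrap.  */
  static bool access_in_bounds(uintptr_t ptr, size_t size,
                               uintptr_t base, size_t obj_size)
  {
    if (ptr < base || size > obj_size)
      return false;
    return ptr - base <= obj_size - size;
  }

Under this rule, the whole-structure form `_3 = p_1(D)->x` needs access_in_bounds(p, sizeof(struct s), ...), while the element form `_3 = *q_2` only needs access_in_bounds(q, sizeof(int), ...).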
Re: Pointer semantics in GIMPLE
On Thu, 20 Mar 2025, Richard Biener wrote:
>> Pointer arithmetic -- POINTER_DIFF_EXPR
>> ---------------------------------------
>> Subtracting a pointer q from a pointer p is done using POINTER_DIFF_EXPR.
>>
>> * It is UB if the difference does not fit in a signed integer with the same precision as the pointers.
>
> Yep.
>
>> This implies that an object's size must be less than half the address space; otherwise, POINTER_DIFF_EXPR could not be used to compute sizes in C. But there may be additional restrictions. For example, compiling the function:
>>
>>   void foo(int *p, int *q, int n)
>>   {
>>     for (int i = 0; i < n; i++)
>>       p[i] = q[i] + 1;
>>   }
>>
>> causes the vectorized code to perform overlap checks like:
>>
>>   _7 = q_11(D) + 4;
>>   _25 = p_12(D) - _7;
>>   _26 = (sizetype) _25;
>>   _27 = _26 > 8;
>>   _28 = _27;
>>   if (_28 != 0) goto ; else goto ;
>>
>> which takes the difference between two pointers that may point to different objects. This suggests that all objects must fit within half the address space.
>
> Interesting detail. But yes, the above is either incorrect code (I didn't double-check) or this condition must hold. For 64bit virtual address-space it likely holds in practice. For 32bit virtual address-space it might not.
>
>> Question: What are the restrictions on valid address ranges and object sizes?
>
> The restriction on object size is documented, as well as that objects may not cross/include 'NULL'. Can you open a bugreport for the vectorizer overlap test issue above? I think we want to track this and at least document the restriction (we could possibly add a target hook that can tell whether such a simplified check is OK).

PR119399

>> Issues
>> ------
>> The semantics I've described above result in many reports of miscompilations that I haven't reported yet. As mentioned earlier, the vectorizer can use POINTER_PLUS_EXPR to generate pointers that extend up to a vector length past the object (and similarly, up to one vector length before the object in a backwards loop). This is invalid if the object is too close to address 0 or 0x. And this out-of-bounds distance can be made arbitrarily large, as can be seen by compiling the following function for x86_64 with -O3:
>>
>>   void foo(int *p, uint64_t s, int64_t n)
>>   {
>>     for (int64_t i = 0; i < n; i++) {
>>       int *q = p + i * s;
>>       for (int j = 0; j < 4; j++)
>>         *q++ = j;
>>     }
>>   }
>>
>> Here, the vectorized code adds s * 4 to the pointer ivtmp_8 at the end of each iteration:
>>
>>   :
>>   bnd.8_38 = (unsigned long) n_10(D);
>>   _21 = s_11(D) * 4;
>>
>>   :
>>   # ivtmp_8 = PHI
>>   # ivtmp_26 = PHI
>>   MEM [(int *)ivtmp_8] = { 0, 1, 2, 3 };
>>   ivtmp_7 = ivtmp_8 + _21;
>>   ivtmp_20 = ivtmp_26 + 1;
>>   if (ivtmp_20 < bnd.8_38) goto ; else goto ;
>>
>>   :
>>   goto ;
>>
>> This means calling foo with a sufficiently large s can guarantee wrapping or evaluating to 0, even though the original IR before optimization did not wrap or evaluate to 0. For example, when calling foo as:
>>
>>   foo(p, -((uintptr_t)p / 4), 1);
>>
>> I guess this is essentially the same issue as PR113590, and could be fixed by moving all induction variable updates to the latch.
>
> Yes.
>
>> But if that happens, the motivating examples for needing to handle invalid pointer values would no longer occur. Would that mean smtgcc should adopt a more restrictive semantics for pointer values?
>
> In principle yes, but PR113590 blocks this I guess. Could smtgcc consider a SSA def to be happening only at the latest possible point, that is, "virtually" sink it to the latch? (I realize when you have two dependent defs that can be moved together only such "virtual" sinking could be somewhat complicated)

I think it feels a bit strange (and brittle) to define the semantics by virtually sinking everything as much as possible.
But isn't this sinking just another way of saying that the operation is UB only if the value is used? That is, we can solve this essentially the same way LLVM does with its deferred UB.

The idea is that POINTER_DIFF_EXPR, PLUS_EXPR, etc. are defined for all inputs, but the result is a poison value when it doesn't fit in the return type. Any use of a poison value is UB. This is trivial to implement in smtgcc and would solve PR113590.

   /Krister
Re: Memory access in GIMPLE
On Thu, 3 Apr 2025, Richard Biener wrote:
> On Thu, Apr 3, 2025 at 2:23 AM Krister Walfridsson via Gcc wrote:
>> I have more questions about GIMPLE memory semantics for smtgcc. As before, each section starts with a description of the semantics I've implemented (or plan to implement), followed by concrete questions if relevant. Let me know if the described semantics are incorrect or incomplete.
>>
>> Accessing memory
>> ----------------
>> Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the lhs and/or rhs is a memory reference expression (such as MEM_REF). When both lhs and rhs access memory, one of the following must hold -- otherwise the access is UB:
>>
>> 1. There is no overlap between lhs and rhs
>> 2. lhs and rhs represent the same address
>>
>> A memory access is also UB in the following cases:
>>
>> * Any accessed byte is outside valid memory
>> * The pointer violates the alignment requirements
>> * The pointer provenance doesn't match the object
>> * The type is incorrect from a TBAA perspective
>> * It's a store to constant memory
>
> correct. Note that GIMPLE_CALL and GIMPLE_ASM can also access memory.
>
>> smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this mail. Provenance has its own issues, which I'll come back to in a separate mail.
>>
>> Checking that a memory access is within bounds
>> ----------------------------------------------
>> A memory access may be represented by a chain of memory reference expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example, accessing a structure:
>>
>>   struct s {
>>     int x, y;
>>   };
>>
>> as:
>>
>>   int foo (struct s * p)
>>   {
>>     int _3;
>>
>>     <bb 2> :
>>     _3 = p_1(D)->x;
>>     return _3;
>>   }
>>
>> involves a MEM_REF for the whole object and a COMPONENT_REF to select the field. Conceptually, we load the entire structure and then pick out the element -- so all bytes of the structure must be in valid memory.
>>
>> We could also do the access as:
>>
>>   int foo (struct s * p)
>>   {
>>     int * q;
>>     int _3;
>>
>>     <bb 2> :
>>     q_2 = &p_1(D)->x;
>>     _3 = *q_2;
>>     return _3;
>>   }
>>
>> This calculates the address of the element, and then reads it as an integer, so only the four bytes of x must be in valid memory. In other words, the compiler is not allowed to optimize:
>>
>>   q_2 = &p_1(D)->x;
>>   _3 = *q_2;
>>
>> to
>>
>>   _3 = p_1(D)->x;
>
> Correct. The reason that p_1(D)->x is considered accessing the whole object is because of TBAA, so with -fno-strict-aliasing there is no UB when the whole object isn't accessible (but the subsetted piece is).

I'm still a bit confused...

Assume we're using -fno-strict-aliasing and p points to a 4-byte buffer in the example above. Then:

  _3 = p_1(D)->x;

is valid. So that means my paragraph starting with "In other words..." must be incorrect? That is, the compiler is allowed to optimize:

  q_2 = &p_1(D)->x;
  _3 = *q_2;

to

  _3 = p_1(D)->x;

since it's equally valid. But this optimization would be invalid for -fstrict-aliasing?

And a memory access like:

  _4 = p_1(D)->y;

would be invalid (for both -fno-strict-aliasing and -fstrict-aliasing)? That is, the subsetted piece must be accessible?

   /Krister
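A concrete C framing of that final question (my example -- an assumption about the intended scenario, not code from the thread):

  #include <stdlib.h>

  struct s { int x, y; };

  int g(void)
  {
    /* Only the first 4 bytes of the "struct" are valid memory.  */
    struct s *p = malloc(sizeof(int));
    p->x = 1;      /* touches only the 4 in-bounds bytes of x */
    return p->x;   /* is this load UB if it conceptually reads all
                      8 bytes of *p?  p->y would clearly be UB.  */
  }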