On Sun, 2022-06-12 at 20:27 +0200, Tim Lange wrote: > > > On Do, Jun 9 2022 at 13:40:06 -0400, David Malcolm > <dmalc...@redhat.com> wrote: > > On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote: > > > > > > > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm > > > <dmalc...@redhat.com> wrote: > > > > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote: > > > > > > > > > > Hi Dave, > > > > Hi Tim; various responses inline below... > > > > > > > > > > > > I did spent some time to think about the zero state machine. > > > I > > > first > > > > > thought about distinguishing between "assigned zero" and "EQ > > > 0 > > > > > condition on the path" for cases where e.g. unreachable() is > > > used > > > to > > > > > say that some variable will never be zero according to the > > > programmer. > > > > > In that case the dev might not want zero warnings from > > > conditions > > > > > outside the if body itself for dev build where unreachable > > > expands > > > to > > > > > some panic exit. But as the condition constraint is not > > > pruned > > > when the > > > > > state machine is distinguishing the states, I'm not sure how > > > to > > > know > > > > > whether the analysis already left the if body? > > > > > > > > The analyzer works on the gimple-ssa representation, which uses > > > basic > > > > blocks in a CFG, rather than an AST, so the only remants we > > > have > > > of > > > > scope is in "clobber" statements (a special kind of assignment > > > stmt), > > > > which the gimplify adds as variables go out of scope. > > > If the constraints only lived until the immediate dominator of `if > > > (cond)`, I could easily distinguish: > > > 1. if (x == 0) && still inside the if => zero > > > 2. if (x == 0) && outside if => maybe zero > > > but as this seems to be not the case, if I want to distinguish 1. > > > & > > > 2., > > > I'd have to find another way. > > > > > > > > For pruning, the analyzer's state_machine class has a > > > "can_purge_p" > > > > virtual function: > > > > > > > > /* Return true if it safe to discard the given state (to help > > > > when simplifying state objects). > > > > States that need leak detection should return false. */ > > > > virtual bool can_purge_p (state_t s) const = 0; > > > > > > > > which should return true for a "zeroness" state machine, in > > > that > > > we > > > > always consider pruning states for svalues that aren't needed > > > anymore > > > > along a path. > > > Is implemented and returns true. > > > > > > > > Is there some other kind of state explosion you're running > > > into? > > > It's > > > > hard to figure this out further without seeing code. > > > No, my code is by far not that mature to be tested. I just had in > > > my > > > head that I wanted to find out if I can distinguish the two cases. > > > > > > > > > > > > > Also, while trying out different things, it seems simple > > > assignments on > > > > > phi like here > > > > > int x; > > > > > if (argc == 1) { > > > > > x = 1; // x_5 > > > > > } else { > > > > > x = 0; // x_4 > > > > > } > > > > > printf("%i", 5 / x); // x_2 > > > > > automatically work such that x_2 already inherits the state > > > from > > > > > x_4/x_5 without me doing anything inside my sm's on_phi > > > function. > > > Same > > > > > for the simple y = 0; x = y; case. Where does this happen > > > inside > > > the > > > > > code? > > > > > > > > With the caveat that I'm seeing your code, what's probably > > > happening > > > is > > > > that we have either: > > > > > > > > BB (a): > > > > x_5 = 1; > > > > goto BB (c); > > > > > > > > BB (b): > > > > x_4 = 0; > > > > goto BB (c); > > > > > > > > BB (c): > > > > x_2 = PHI (x_5 from (a), x_4 from (b)); > > > I compiled it with -g, so this one is like the dumped gimple. > > > > > > > > or (at higher optimization levels): > > > > > > > > BB (a): > > > > goto BB (c); > > > > > > > > BB (b): > > > > goto BB (c); > > > > > > > > BB (c): > > > > x_2 = PHI (1 from (a), 0 from (b)); > > > > > > > > and I think that at the phi node we have > > > region_model::handle_phi, > > > > which is setting x_2 to either the constant 1 or the constant 0 > > > in > > > the > > > > store, and is calling the on_phi vfunc, leading to on_phi being > > > called > > > > for all state machines. > > > Thanks, that is the case. The set_value inside handle_phi seems to > > > this > > > for me. > > > > > > > > BTW, are you implementing an override for this vfunc: > > > > virtual state_machine::state_t get_default_state (const svalue > > > *) > > > > const; > > > > > > > > to capture the inherently known zeroness/nonzeroness of > > > constant_svalue > > > > instances? That would make those constants have that state. > > > Yeah, I saw that on your nullness check. I tried it on a small > > > example > > > with and without, but didn't noticed a difference in warnings > > > (except > > > for not having zero(x_4) inside the supergraph.dot). So if I > > > understood > > > this right, this is just to have one state less for that > > > variable/value[0]? > > > > The states are stored in sm_state_map using a mapping from svalue to > > state. > > > > Given e.g. a parameter, this will be "initial_svalue (parm)", but in > > the above case where x_4 either has value 1 or has value 0, it's not > > storing a state for x_4; it's storing a state for 0 or a state for 1. > > So without implementing the get_default_state vfunc, the > > sm_state_maps > > will gradually aquire explicit "this is zero" or "this is nonzero" > > for > > all of the constants that get used, leading to an explosion of states > > for all the different combinations of constants that have been > > encountered along an execution path, which is probably not going to > > be > > useful. If you do implement the get_default_state vfunc for > > constants, > > then the constants will implicitly have the zero vs nonzero states, > > and > > this information won't get stored in the sm_state_map instances. > > > > program_state::can_merge_with_p compares the sm_state_map instances > > between the two program_state instances - and so if the > > get_default_state vfunc is implemented for constants, the peer > > sm_state_maps for the "zero" state machine will both be empty, and > > thus > > equal, and thus it will merge state. > > > > So it might make sense to revamp this sm_state_map comparison so that > > we consider implicit states in sm_state_maps when merging the store, > > so > > that we see that "1" and "0" have different sm-states for the "zero" > > state machine, and thus we shouldn't merge states for these. See > > binding_cluster::can_merge_p, which calls svalue::can_merge_p. It > > might make sense to have svalue::can_merge_p "know" about > > sm-state-maps > > for this. > > > > svalue::can_merge_p calls model_merger::mergeable_svalue_p, so that > > we > > can reject merging values that explicitly have different state- > > machine > > states, but that code could be reworked so that it considers whether > > a > > pair of svalues can be merged, using implicit state-machine states > > (and > > thus have it reject the merger of a zero with a nonzero constant). > > > > Or, whilst prototyping, you could simply hack in a rejection of > > merging > > zero vs non-zero integer values into svalue::can_merge_p. > > > > > > > > If that is right, is it also favorable to "merge" the stop state > > > and > > > non_zero state inside the zero state machine because - for now - > > > there > > > is no warning planned on non-zero values? > > > > > > [0] less states are favorable because then the analyzer maybe has > > > less > > > different enodes to visit and thus less runtime(?) > > > > > > > > Thanks > > > > Dave > > > > > > > > > - Tim > > > > > > Also another question unrelated to the ones before. I do have a > > > weird > > > bug in my zero sm[1] but I'm unsure where my sm is flawed. Take > > > for > > > example, the probably simplest case: > > > int x = 0;h > > > printf("%d", 42 / x); > > > If I use inform to emit a notice for my state machine result, it > > > seems > > > to be correct and I do get following traversal order (by printing > > > the > > > gimple stmt inside on_stmt): > > > x_2 = 0; > > > _1 = 42 % x_2; > > > # .MEM_4 = VDEF <.MEM_3(D)> > > > printf ("%d", _1); > > > _5 = 0; > > > <L0>: > > > # VUSE <.MEM_4> > > > return _5; > > > But if i use my zero_diagnostics and sm_ctxt->warn to emit the > > > warning, > > > I get an unexpected traversal order of: > > > x_2 = 0; > > > _1 = 42 / x_2; > > > # .MEM_4 = VDEF <.MEM_3(D)> > > > printf ("%d", _1); > > > _5 = 0; > > > <L0>: > > > # VUSE <.MEM_4> > > > return _5; > > > x_2 = 0; <-- why do I get these stmts again but > > > they > > > are not as > > > duplicated in the e-supergraph? > > > _1 = 42 / x_2; > > > _5 = 0; > > > _1 = 42 / x_2; > > > _5 = 0; > > > > What exactly is your test source file? > It is just the two lines wrapped into main(): > #include <stdio.h> > int main(int argc, char **argv) { > int x = 0; > printf("%d", 42 / x); > return 0; > } > > > > Note that a function being called by another function will get > > explored > > in the exploded_graph twice: once being analyzed "standalone", and > > again being analyzed as called from the other function. > > > > When you say "I get these stmts again", do you mean the state machine > > code is seeing them again, or the region model code? > In the state_machine::on_stmt function. > > > > sm_ctxt->warn (new foo) > > > > will save an instance of a pending_diagnostic into the > > diagnotic_manager, associating it with a particular exploded_node > > (and > > some other info). > > > > After the exploded_graph has been built, the diagnostic_manager does > > some deduplication work, and then tries to find a feasible path to > > each > > pending_diagnostic's exploded node. > > > > This feasibility analysis reruns much of the region_model code that > > was > > run when building the exploded graph. So that's another way you can > > "see" stmts repeatedly. > Okay, that probably explains it. I had two distinct problems where I > thought they might be both related to seeing the statements again: > | 13 | int x = 0; > | | ^ > | | | > | | (1) set to zero here > | | (3) set to zero here > | 14 | printf("%d", 42 / x); > | | ~~~~~~~~~~~~~~~~~~~~ > | | | > | | (4) division by zero > | 15 | > | 16 | return 0; > | | ~ > | | | > | | (2) set to zero here > | | (5) set to zero here
Aha: when you say "seeing the statements again", you're referring to the diagnostic_path that's emitted. gcc/analyzer/diagnostic-manager.cc finds a suitable path through the exploded graph for the warning, and then walks the path, generating a list of checker_event instances. It then "optimizes" this list of checker_events, and the result is what then gets visualized in ASCII art. Something really weird's happening in the above. The way diagnostic-manager.cc generates events could definitely be improved - it tries to track the state of interest, and thus generate events that relate to it, but it only knows about state machine changes, not region_model changes, and it doesn't do a good job of the former. I've got some ideas for rewriting it which I'm hoping to get to later this month, once I finish my work on SARIF. Basically, my plan is to update the exploded_graph exploration so that I capture a list of state changes on each edge, capturing the sequence of how each source state became each destination state, in a reversible way, so that for any given diagnostic I hope we'll be able to walk backwards along a graph and see what the changes were, and where they came from. Given that, I think it's best if you put your state machine work on hold for now, and focus on the other tests you were interested in. FWIW, I just added some more ideas to the tracker bug here: https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=105887&hide_resolved=1 at least one of which looks fairly easy (105947). > * In line 13, there are two state changes while I'd expect only one. > => I re-looked at the other state machines and they transfer to the > stop state after the warning. If I do the same, my problem in line 13 > is fixed. > * The other problem in line 16 is that return 0 is not relevant toward > the division by zero. > => After looking at the diagnostic code, I think those should be > filtered out but aren't because x (in gimple x_2) and return 0 (in > gimple _5) share the (int)0 state. Inside > diagnostic_manager::prune_for_sm_diagnostic there is a check that the > state_change->m_sval is equal to the sval of the saved_diagnostic, > which is not enough to filter unrelated events in the zero state > machine case. I don't know that it makes much sense to track the states on constants; it makes more sense for things like the initial_value (x), if x were a parameter. It may be that the current implementation of state machines in the analyzer just isn't a good fit for this test. > I tried to filter the event by comparing the state of interest but that > doesn't work if the unrelated event is after the related event on the > path. > Similarly, I tried to add a new field m_var to state_change_event and > provide this as a comparison but in most functions that call the > constructor of state_change_event, the var is not available. > > Is there is any easy way to filter these unrelated events without > considerable changes [1]? > > By the way, I can trigger the same behavior on the malloc null_deref > with gcc version 12.1.1 shipped in Fedora 36. Assume this simple > program: > #include <stddef.h> > int main (void) > { > int *p = NULL; > *p = 42; > > int *q = NULL; > > return 0; > } > results in the following warning: > /home/tim/Projects/simple_c/main.c: In function ‘main’: > /home/tim/Projects/simple_c/main.c:12:6: warning: dereference of NULL > ‘p’ [CWE-476] [-Wanalyzer-null-dereference] > 12 | *p = 42; > | ~~~^~~~ > ‘main’: events 1-4 > | > | 11 | int *p = NULL; > | | ^ > | | | > | | (1) ‘p’ is NULL > | 12 | *p = 42; > | | ~~~~~~~ > | | | > | | (4) dereference of NULL ‘p’ > | 13 | > | 14 | int *q = NULL; > | | ~ > | | | > | | (2) ‘p’ is NULL > | | (3) ‘p’ is NULL > That definitely looks like a bug in the code I was talking about earlier. I've filed this as: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105958 Thanks Dave > > > > > > > I tracked the cause down the call stack to > > > m_saved_diagnostics.safe_push (sd); inside > > > diagnostic_manager::add_diagnostic. For whatever reason, the > > > pushing > > > of > > > diagnostics bricks the zero sm. It might be a embarrassing error > > > on > > > my > > > side, but I'm stuck on this and can't seem to find what I'm doing > > > wrong. > > > > FWIW, assuming you're using the gdb support scripts: > > > > > > https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html#support-scripts-for-gdb > > there's a handy: > > > > (gdb) break-on-saved-diagnostic > > > > command which adds a breakpoint on the analyzer's diagnostic_manager > > saving a pending_diagnostic. > > > > You might also what to try -fdump-analyzer-stderr, which will dump > > lots > > of information on what the analyzer is doing to stderr, which > > although > > verbose is usually very helpful in figuring out why the analyzer is > > doing something. > > > > Hope this is helpful > Definitely helped me understand more of the internals! > > - Tim > > Dave > > > > > > > > > > > > - Tim > > > > > > [1] > > > > > > > > > > > > https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181 > > > > > > > > > > > >