Hi Dave, Thanks for the feedback!
On Wed, Aug 16, 2023 at 5:29 PM David Malcolm <dmalc...@redhat.com> wrote: > > On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote: > > Hi everyone, > > [fixing typo in my email address] > > Hi Eric, thanks for the update, and the WIP patch. > > > > > After pushing the code that supports various known function classes last > > week, > > I've turned my attention back to the core reference count checking > > functionality. This functionality used to reside in region_model, which > > wasn't ideal. To address this, I've introduced a hook to register callbacks > > to pop_frame. Specifically, this allows the code that checks the reference > > count and emits diagnostics to be housed within the plugin, rather than the > > core analyzer. > > > > As of now, the parameters of pop_frame_callback are tailored specifically to > > our needs. If the use of callbacks at the end of pop_frame becomes more > > prevalent, we can revisit the setup to potentially make it more general. > > > > Moreover, the core reference count checking logic was previously somewhat > > bloated, contained in one extensive function. I've since refactored it, > > breaking it down into several helper functions to simplify and reduce > > complexity. There are still some aspects that need refinement, especially > > since the plugin has seen changes since I last worked on this logic. > > However, > > I believe that there aren't any significant problems. > > Suggestion: introduce some more decls into analyzer-decls.h and > known_functions for them into the plugin so that you can run/test/debug > the helper functions independently (similar to the existing ones in kf- > analyzer.cc). > > e.g. > extern void __analyzer_cpython_dump_real_refcounts (void); > extern void __analyzer_cpython_dump_ob_refcnt (void); > > > Thanks for the suggestion. This will be even more helpful now that we have split the logic into helper functions. I will look into these when I come back to the "is there a refcount problem" side of the equation. > > Currently, I've started working a custom stmt_finder similar to > > leak_stmt_finder > > to address the issue of m_stmt and m_stmt_finder being NULL at the time of > > region_model::pop_frame. This approach was discussed as a viable solution in > > a previous email, and I'll keep everyone posted on my progress. Afterwards, > > I > > will go back to address the refinements necessary mentioned above. > > You might want to experiment with splitting out > (a) "is there a refcount problem" from > (b) "emit a refcount problem". > > For example, you could hardcode (a) to true, so we always complain with > (b) on every heap-allocated object, just to debug the stmt_finder > workaround. > > > [...snip...] > > BTW, you don't need to bother to write ChangeLog entries if you're just > sending a work-in-progress for me. > > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > index 7cd72e8a886..918bb5a5587 100644 > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > [...] > > > +/* For PyListObjects: processes the ob_item field within the current > > region and > > + * increments the reference count if conditions are met. */ > > +void > > +process_ob_item_region (const region_model *model, region_model_manager > > *mgr, > > + region_model_context *ctxt, const region > > *curr_region, > > + const svalue *pylist_type_ptr, const region > > *base_reg, > > + int &actual_refcnt) > > You seem to be special-casing PyListObject here; why? That seems like > it's not going to be scalable tothe general case. > > Am I right in thinking the intent of this code is to count the actual > number of pointers in memory that point to a particular region? > > Doesn't the ob_item buffer show up in the store as another cluster? > Can't you just look at the bindings in the clusters and tally up the > pointers for each heap_allocated_region? (accumulating a result map > from region to int of the actual reference counts). Yes, you're correct, I hadn't thought of that ... This simplifies a lot of things. Thanks for the great suggestion! > > Or am I missing something? > > What does > model->debug (); > show in your examples? > > > > +{ > > + tree ob_item_field_tree = get_field_by_name (pylistobj_record, > > "ob_item"); > > + const region *ob_item_field_reg > > + = mgr->get_field_region (curr_region, ob_item_field_tree); > > + const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, > > ctxt); > > + > > + if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue > > ()) > > + { > > + const region *ob_item_reg = cast_ob_item_reg->get_pointee (); > > + const svalue *allocated_bytes = model->get_dynamic_extents > > (ob_item_reg); > > + const region *ob_item_sized = mgr->get_sized_region ( > > + ob_item_reg, pyobj_ptr_ptr, allocated_bytes); > > + const svalue *buffer_contents_sval > > + = model->get_store_value (ob_item_sized, ctxt); > > + > > + if (const auto &buffer_contents > > + = buffer_contents_sval->dyn_cast_compound_svalue ()) > > + { > > + for (const auto &buffer_content : buffer_contents->get_map ()) > > + { > > + const auto &content_value = buffer_content.second; > > + if (const auto &content_region > > + = content_value->dyn_cast_region_svalue ()) > > + if (content_region->get_pointee () == base_reg) > > + actual_refcnt++; > > + } > > + } > > + } > > +} > > + > > +/* Counts the actual references from all clusters in the model's store. */ > > +int > > +count_actual_references (const region_model *model, region_model_manager > > *mgr, > > + region_model_context *ctxt, const region *base_reg, > > + const svalue *pylist_type_ptr, tree ob_type_field) > > +{ > > + int actual_refcnt = 0; > > + for (const auto &other_cluster : *model->get_store ()) > > + { > > + for (const auto &binding : other_cluster.second->get_map ()) > > + { > > + const auto &sval = binding.second; > > + const auto &curr_region = sval->maybe_get_region (); > > + > > + if (!curr_region || curr_region->get_kind () != RK_HEAP_ALLOCATED) > > + continue; > > + > > + increment_count_if_base_matches (curr_region, base_reg, > > + actual_refcnt); > > + > > + const region *ob_type_region > > + = mgr->get_field_region (curr_region, ob_type_field); > > + const svalue *stored_sval > > + = model->get_store_value (ob_type_region, ctxt); > > + const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue (); > > + > > + if (!remove_cast) > > + continue; > > + > > + const svalue *type = remove_cast->get_arg (); > > + if (type == pylist_type_ptr) > > + process_ob_item_region (model, mgr, ctxt, curr_region, > > + pylist_type_ptr, base_reg, > > actual_refcnt); > > + } > > + } > > + return actual_refcnt; > > +} > > > > Hope the above is constructive. > Dave > Best, Eric