Re: Update on CPython Extension Module -fanalyzer plugin development

David Malcolm via Gcc Wed, 16 Aug 2023 14:29:52 -0700

On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> Hi everyone,


[fixing typo in my email address]

Hi Eric, thanks for the update, and the WIP patch.

> 
> After pushing the code that supports various known function classes last week,
> I've turned my attention back to the core reference count checking 
> functionality. This functionality used to reside in region_model, which 
> wasn't ideal. To address this, I've introduced a hook to register callbacks 
> to pop_frame. Specifically, this allows the code that checks the reference 
> count and emits diagnostics to be housed within the plugin, rather than the 
> core analyzer.
> 
> As of now, the parameters of pop_frame_callback are tailored specifically to 
> our needs. If the use of callbacks at the end of pop_frame becomes more 
> prevalent, we can revisit the setup to potentially make it more general.
> 
> Moreover, the core reference count checking logic was previously somewhat 
> bloated, contained in one extensive function. I've since refactored it, 
> breaking it down into several helper functions to simplify and reduce
> complexity. There are still some aspects that need refinement, especially 
> since the plugin has seen changes since I last worked on this logic. However, 
> I believe that there aren't any significant problems.

Suggestion: introduce some more decls into analyzer-decls.h and
known_functions for them into the plugin so that you can run/test/debug
the helper functions independently (similar to the existing ones in kf-
analyzer.cc).

e.g.
  extern void __analyzer_cpython_dump_real_refcounts (void);
  extern void __analyzer_cpython_dump_ob_refcnt (void);

> 
> Currently, I've started working a custom stmt_finder similar to 
> leak_stmt_finder 
> to address the issue of m_stmt and m_stmt_finder being NULL at the time of 
> region_model::pop_frame. This approach was discussed as a viable solution in 
> a previous email, and I'll keep everyone posted on my progress. Afterwards, I 
> will go back to address the refinements necessary mentioned above.

You might want to experiment with splitting out
(a) "is there a refcount problem" from
(b) "emit a refcount problem".

For example, you could hardcode (a) to true, so we always complain with
(b) on every heap-allocated object, just to debug the stmt_finder
workaround.


[...snip...]

BTW, you don't need to bother to write ChangeLog entries if you're just
sending a work-in-progress for me.

> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
> b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7cd72e8a886..918bb5a5587 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

[...]

> +/* For PyListObjects: processes the ob_item field within the current region 
> and
> + * increments the reference count if conditions are met. */
> +void
> +process_ob_item_region (const region_model *model, region_model_manager *mgr,
> +                       region_model_context *ctxt, const region *curr_region,
> +                       const svalue *pylist_type_ptr, const region *base_reg,
> +                       int &actual_refcnt)

You seem to be special-casing PyListObject here; why?  That seems like
it's not going to be scalable tothe general case.

Am I right in thinking the intent of this code is to count the actual
number of pointers in memory that point to a particular region?

Doesn't the ob_item buffer show up in the store as another cluster? 
Can't you just look at the bindings in the clusters and tally up the
pointers for each heap_allocated_region? (accumulating a result map
from region to int of the actual reference counts).

Or am I missing something?

What does
  model->debug ();
show in your examples?


> +{
> +  tree ob_item_field_tree = get_field_by_name (pylistobj_record, "ob_item");
> +  const region *ob_item_field_reg
> +      = mgr->get_field_region (curr_region, ob_item_field_tree);
> +  const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, 
> ctxt);
> +
> +  if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue ())
> +    {
> +      const region *ob_item_reg = cast_ob_item_reg->get_pointee ();
> +      const svalue *allocated_bytes = model->get_dynamic_extents 
> (ob_item_reg);
> +      const region *ob_item_sized = mgr->get_sized_region (
> +         ob_item_reg, pyobj_ptr_ptr, allocated_bytes);
> +      const svalue *buffer_contents_sval
> +         = model->get_store_value (ob_item_sized, ctxt);
> +
> +      if (const auto &buffer_contents
> +         = buffer_contents_sval->dyn_cast_compound_svalue ())
> +       {
> +         for (const auto &buffer_content : buffer_contents->get_map ())
> +           {
> +                   const auto &content_value = buffer_content.second;
> +                   if (const auto &content_region
> +                       = content_value->dyn_cast_region_svalue ())
> +                           if (content_region->get_pointee () == base_reg)
> +                                   actual_refcnt++;
> +           }
> +       }
> +    }
> +}
> +
> +/* Counts the actual references from all clusters in the model's store. */
> +int
> +count_actual_references (const region_model *model, region_model_manager 
> *mgr,
> +                        region_model_context *ctxt, const region *base_reg,
> +                        const svalue *pylist_type_ptr, tree ob_type_field)
> +{
> +  int actual_refcnt = 0;
> +  for (const auto &other_cluster : *model->get_store ())
> +    {
> +      for (const auto &binding : other_cluster.second->get_map ())
> +       {
> +         const auto &sval = binding.second;
> +         const auto &curr_region = sval->maybe_get_region ();
> +
> +         if (!curr_region || curr_region->get_kind () != RK_HEAP_ALLOCATED)
> +           continue;
> +
> +         increment_count_if_base_matches (curr_region, base_reg,
> +                                           actual_refcnt);
> +
> +         const region *ob_type_region
> +             = mgr->get_field_region (curr_region, ob_type_field);
> +         const svalue *stored_sval
> +             = model->get_store_value (ob_type_region, ctxt);
> +         const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue ();
> +
> +         if (!remove_cast)
> +           continue;
> +
> +         const svalue *type = remove_cast->get_arg ();
> +         if (type == pylist_type_ptr)
> +           process_ob_item_region (model, mgr, ctxt, curr_region,
> +                                   pylist_type_ptr, base_reg, actual_refcnt);
> +       }
> +    }
> +  return actual_refcnt;
> +}
> 

Hope the above is constructive.
Dave

Re: Update on CPython Extension Module -fanalyzer plugin development

Reply via email to