Re: Update on CPython Extension Module -fanalyzer plugin development

Eric Feng via Gcc Mon, 21 Aug 2023 07:07:12 -0700

Hi Dave,

Just wanted to give you and everyone else a short update on how
reference count checking is going — we can now observe the refcnt
diagnostic being emitted:


rc3.c:22:10: warning: REF COUNT PROBLEM
   22 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ fails
    |......
    |   22 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

I will fix up and refactor the logic for counting the actual ref count
before coming back and refining the diagnostic to give much more
detailed information.

Best,
Eric


On Wed, Aug 16, 2023 at 9:47 PM Eric Feng <[email protected]> wrote:
>
> Hi Dave,
>
> Thanks for the feedback!
>
>
> On Wed, Aug 16, 2023 at 5:29 PM David Malcolm <[email protected]> wrote:
> >
> > On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> > > Hi everyone,
> >
> > [fixing typo in my email address]
> >
> > Hi Eric, thanks for the update, and the WIP patch.
> >
> > >
> > > After pushing the code that supports various known function classes last 
> > > week,
> > > I've turned my attention back to the core reference count checking
> > > functionality. This functionality used to reside in region_model, which
> > > wasn't ideal. To address this, I've introduced a hook to register 
> > > callbacks
> > > to pop_frame. Specifically, this allows the code that checks the reference
> > > count and emits diagnostics to be housed within the plugin, rather than 
> > > the
> > > core analyzer.
> > >
> > > As of now, the parameters of pop_frame_callback are tailored specifically 
> > > to
> > > our needs. If the use of callbacks at the end of pop_frame becomes more
> > > prevalent, we can revisit the setup to potentially make it more general.
> > >
> > > Moreover, the core reference count checking logic was previously somewhat
> > > bloated, contained in one extensive function. I've since refactored it,
> > > breaking it down into several helper functions to simplify and reduce
> > > complexity. There are still some aspects that need refinement, especially
> > > since the plugin has seen changes since I last worked on this logic. 
> > > However,
> > > I believe that there aren't any significant problems.
> >
> > Suggestion: introduce some more decls into analyzer-decls.h and
> > known_functions for them into the plugin so that you can run/test/debug
> > the helper functions independently (similar to the existing ones in kf-
> > analyzer.cc).
> >
> > e.g.
> >   extern void __analyzer_cpython_dump_real_refcounts (void);
> >   extern void __analyzer_cpython_dump_ob_refcnt (void);
> >
> > >
> Thanks for the suggestion. This will be even more helpful now that we
> have split the logic into helper functions. I will look into these
> when I come back to the "is there a refcount problem" side of the
> equation.
> > > Currently, I've started working a custom stmt_finder similar to 
> > > leak_stmt_finder
> > > to address the issue of m_stmt and m_stmt_finder being NULL at the time of
> > > region_model::pop_frame. This approach was discussed as a viable solution 
> > > in
> > > a previous email, and I'll keep everyone posted on my progress. 
> > > Afterwards, I
> > > will go back to address the refinements necessary mentioned above.
> >
> > You might want to experiment with splitting out
> > (a) "is there a refcount problem" from
> > (b) "emit a refcount problem".
> >
> > For example, you could hardcode (a) to true, so we always complain with
> > (b) on every heap-allocated object, just to debug the stmt_finder
> > workaround.
> >
> >
> > [...snip...]
> >
> > BTW, you don't need to bother to write ChangeLog entries if you're just
> > sending a work-in-progress for me.
> >
> > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
> > > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > > index 7cd72e8a886..918bb5a5587 100644
> > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> >
> > [...]
> >
> > > +/* For PyListObjects: processes the ob_item field within the current 
> > > region and
> > > + * increments the reference count if conditions are met. */
> > > +void
> > > +process_ob_item_region (const region_model *model, region_model_manager 
> > > *mgr,
> > > +                       region_model_context *ctxt, const region 
> > > *curr_region,
> > > +                       const svalue *pylist_type_ptr, const region 
> > > *base_reg,
> > > +                       int &actual_refcnt)
> >
> > You seem to be special-casing PyListObject here; why?  That seems like
> > it's not going to be scalable tothe general case.
> >
> > Am I right in thinking the intent of this code is to count the actual
> > number of pointers in memory that point to a particular region?
> >
> > Doesn't the ob_item buffer show up in the store as another cluster?
> > Can't you just look at the bindings in the clusters and tally up the
> > pointers for each heap_allocated_region? (accumulating a result map
> > from region to int of the actual reference counts).
> Yes, you're correct, I hadn't thought of that ... This simplifies a
> lot of things. Thanks for the great suggestion!
> >
> > Or am I missing something?
> >
> > What does
> >   model->debug ();
> > show in your examples?
> >
> >
> > > +{
> > > +  tree ob_item_field_tree = get_field_by_name (pylistobj_record, 
> > > "ob_item");
> > > +  const region *ob_item_field_reg
> > > +      = mgr->get_field_region (curr_region, ob_item_field_tree);
> > > +  const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, 
> > > ctxt);
> > > +
> > > +  if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue 
> > > ())
> > > +    {
> > > +      const region *ob_item_reg = cast_ob_item_reg->get_pointee ();
> > > +      const svalue *allocated_bytes = model->get_dynamic_extents 
> > > (ob_item_reg);
> > > +      const region *ob_item_sized = mgr->get_sized_region (
> > > +         ob_item_reg, pyobj_ptr_ptr, allocated_bytes);
> > > +      const svalue *buffer_contents_sval
> > > +         = model->get_store_value (ob_item_sized, ctxt);
> > > +
> > > +      if (const auto &buffer_contents
> > > +         = buffer_contents_sval->dyn_cast_compound_svalue ())
> > > +       {
> > > +         for (const auto &buffer_content : buffer_contents->get_map ())
> > > +           {
> > > +                   const auto &content_value = buffer_content.second;
> > > +                   if (const auto &content_region
> > > +                       = content_value->dyn_cast_region_svalue ())
> > > +                           if (content_region->get_pointee () == 
> > > base_reg)
> > > +                                   actual_refcnt++;
> > > +           }
> > > +       }
> > > +    }
> > > +}
> > > +
> > > +/* Counts the actual references from all clusters in the model's store. 
> > > */
> > > +int
> > > +count_actual_references (const region_model *model, region_model_manager 
> > > *mgr,
> > > +                        region_model_context *ctxt, const region 
> > > *base_reg,
> > > +                        const svalue *pylist_type_ptr, tree 
> > > ob_type_field)
> > > +{
> > > +  int actual_refcnt = 0;
> > > +  for (const auto &other_cluster : *model->get_store ())
> > > +    {
> > > +      for (const auto &binding : other_cluster.second->get_map ())
> > > +       {
> > > +         const auto &sval = binding.second;
> > > +         const auto &curr_region = sval->maybe_get_region ();
> > > +
> > > +         if (!curr_region || curr_region->get_kind () != 
> > > RK_HEAP_ALLOCATED)
> > > +           continue;
> > > +
> > > +         increment_count_if_base_matches (curr_region, base_reg,
> > > +                                           actual_refcnt);
> > > +
> > > +         const region *ob_type_region
> > > +             = mgr->get_field_region (curr_region, ob_type_field);
> > > +         const svalue *stored_sval
> > > +             = model->get_store_value (ob_type_region, ctxt);
> > > +         const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue 
> > > ();
> > > +
> > > +         if (!remove_cast)
> > > +           continue;
> > > +
> > > +         const svalue *type = remove_cast->get_arg ();
> > > +         if (type == pylist_type_ptr)
> > > +           process_ob_item_region (model, mgr, ctxt, curr_region,
> > > +                                   pylist_type_ptr, base_reg, 
> > > actual_refcnt);
> > > +       }
> > > +    }
> > > +  return actual_refcnt;
> > > +}
> > >
> >
> > Hope the above is constructive.
> > Dave
> >
>
> Best,
> Eric

Re: Update on CPython Extension Module -fanalyzer plugin development

Reply via email to