date:20230823

analyzer: How to recognize builtins

2023-08-23 Thread Benjamin Priour via Gcc

Hi David,

A quick update on transitioning the analyzer tests from gcc.dg to
c-c++-common.
As discussed previously, C builtins are not C++ builtins, therefore I had
to add
recognition of the builtins by their name rather than with the predicate
fndecl_built_in_p.

Do you know why this happens?  I see both of those tests have their own
> prototypes for the functions in question, rather than using system
> headers; maybe those prototypes don't match some property that the C++
> FE is checking for?
>

The C FE relies on c-decl.cc:c_builtin_function, the macro
C_DECL_BUILTIN_PROTOTYPE,
as well as c-decl.cc:match_builtin_function_types to recognize a builtin
function decl
and provide a unified type between what the user wrote, and the known
builtin fndecl.

I'd like however to correct something I've said previously

My question is then : Should G++ behave as GCC and ignore the user's
> signatures of built-ins, instead using the attributes specified by GCC ?
> At the moment I went with a "yes". If the function is a builtin, I'm
> operating on its builtin_known_function::builtin_decl () (see above) rather
> than the callee_fndecl
> deduced from a gimple call, therefore overriding the user's signature.
>

Actually, GCC does not ignore the user's signatures of built-ins, but if
match_builtin_function_types finds out that the user's signature
is close enough to the one described in builtins.def, then it will complete
the user's add with attributes and the likes so that we get an unified type.

A case where the user's signature *isn't* close enough to the built-in's is
gcc.dg/analyzer/pr96841.c's malloc's signature.

__SIZE_TYPE__
malloc (__SIZE_TYPE__);

To simplify, whenever a Wbuiltin-declaration-mismatch is emitted, then the
user's decl isn't unified with the built-ins,
and the extra GCC's attributes defined in builtins.def are not added to the
user's decl.
In such cases, the function decl is not recognized as a builtin by
fndecl_builtin_p, hence the analyzer's (before my patch) known_function
is never found out.
What this mean is that pr96841.c's malloc uses don't go within
kf_malloc:impl_call_pre nor kf_malloc:impl_call_post with the current
analyzer's handling of built-ins - do note however that sm-malloc goes as
expected, as it recognizes malloc calls both by name
and builtin decl (sm-malloc.cc::known_allocator_p).

With my patch, since the recognition is by name and no longer dependent on
the fndecl_builtin_p predicate, pr96841's malloc *is*
recognized as a builtin, and its kf is processed.
This led to pr96841.c no longer passing, as the while loop makes the
analysis too complex. (-Wanalyzer-too-complex)

> > My question is then : Should G++ behave as GCC and ignore the user's
> > signatures of built-ins, instead using the attributes specified by
> > GCC ?
> > At the moment I went with a "yes". If the function is a builtin, I'm
> > operating on its builtin_known_function::builtin_decl () (see above)
> > rather
> > than the callee_fndecl
> > deduced from a gimple call, therefore overriding the user's
> > signature.
> >
> > Doing so led to tests sprintf-2.c and realloc-1.c to pass both in GCC
> > and
> > G++.
>
> I think I agree with "yes" here.

I would like to proceed with what we said here, so that the analyzer
proceeds
with the analysis of malloc or any other builtins even when their fndecl is
not recognized
as such. Then I would update pr96841.c to expect a -Wanalyzer-too-complex.
Besides, sm-malloc already makes the assumption that a call named malloc
should be treated
as a malloc, whether it is matching a builtin or not.

Thanks,
Benjamin.

Re: [PATCH v7 2/4] p1689r5: initial support

2023-08-23 Thread Jason Merrill via Gcc


On 7/2/23 12:32, Ben Boeckel wrote:

This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
   `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdeps-target=` specifies the `.o` that will be written for the TU
   that is scanned. This is required so that the build system can
   correlate the dependency output with the actual compilation that will
   occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: 
https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.


Thank you for your patience, just a few tweaks left and I think we can 
put this all in this week or next.



diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index aef703f8111..141cfd60eda 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -302,6 +302,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
  /* Style of header dependencies to generate.  */
  enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
  
+/* Structured format of module dependencies to generate.  */

+enum cpp_fdeps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };


These should be FDEPS_FMT_* or just FDEPS_*.


@@ -395,10 +423,16 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
int colmax)
   if (colmax && colmax < 34)
 colmax = 34;
 
+  /* Write out C++ modules information if no other `-fdeps-format=`

+ option is given. */
+  cpp_fdeps_format fdeps_format = CPP_OPTION (pfile, deps.fdeps_format);
+  bool write_make_modules_deps = fdeps_format == DEPS_FMT_NONE &&
+CPP_OPTION (pfile, deps.modules);


We typically format an expression like this as:


+  bool write_make_modules_deps = (fdeps_format == FDEPS_FMT_NONE
+ && CPP_OPTION (pfile, deps.modules));



@@ -473,6 +507,117 @@ deps_write (const cpp_reader *pfile, FILE *fp, unsigned 
int colmax)
   make_write (pfile, fp, colmax);
 }
 
+static void

+p1689r5_write_filepath (const char *name, FILE *fp)


This and the other p1689r5 functions need more comments, at least one at 
the top explaining their purpose.


Jason

Re: Update on CPython Extension Module -fanalyzer plugin development

2023-08-23 Thread Eric Feng via Gcc

On Mon, Aug 21, 2023 at 11:04 AM David Malcolm  wrote:
>
> On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> > Hi Dave,
> >
> > Just wanted to give you and everyone else a short update on how
> > reference count checking is going — we can now observe the refcnt
> > diagnostic being emitted:
> >
> > rc3.c:22:10: warning: REF COUNT PROBLEM
> >22 |   return list;
> >   |  ^~~~
> >   ‘create_py_object’: events 1-4
> > |
> > |4 |   PyObject* item = PyLong_FromLong(3);
> > |  |^~
> > |  ||
> > |  |(1) when ‘PyLong_FromLong’ succeeds
> > |5 |   PyObject* list = PyList_New(1);
> > |  |~
> > |  ||
> > |  |(2) when ‘PyList_New’ succeeds
> > |..
> > |   14 |   PyList_Append(list, item);
> > |  |   ~
> > |  |   |
> > |  |   (3) when ‘PyList_Append’ fails
> > |..
> > |   22 |   return list;
> > |  |  
> > |  |  |
> > |  |  (4) here
> > |
> >
> > I will fix up and refactor the logic for counting the actual ref
> > count
> > before coming back and refining the diagnostic to give much more
> > detailed information.
>
> Excellent!  Thanks for the update.
>
> Dave
>

Hi Dave,

I've since fixed up the logic to count the actual reference counts of
the PyObject* instances. Now, I'm contemplating the specific
diagnostics we'd want to issue and the appropriate conditions for
emitting them. With this in mind, I wanted to check in with you on the
appropriate approach:

To start, I'm adopting the same assumptions as cpychecker for
functions returning a PyObject*. That is, I'm operating under the
premise that by default such functions return a new reference upon
success rather than, for example, a borrowed reference (which we can
tackle later on). Given this, it's my understanding that the reference
count of the returned object should be 1 if the object is newly
created within the function body and incremented by 1 from what it was
previously if not newly created (e.g passed in as an argument).
Furthermore, the reference count for any PyObject* instances created
within the function should be 0, barring situations where we're
returning a collection, like a list, that includes references to these
objects.

Let me know what you think; thanks!

Best,
Eric

Re: Update on CPython Extension Module -fanalyzer plugin development

2023-08-23 Thread David Malcolm via Gcc

On Wed, 2023-08-23 at 17:15 -0400, Eric Feng wrote:
> On Mon, Aug 21, 2023 at 11:04 AM David Malcolm 
> wrote:
> > 
> > On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> > > Hi Dave,
> > > 
> > > Just wanted to give you and everyone else a short update on how
> > > reference count checking is going — we can now observe the refcnt
> > > diagnostic being emitted:
> > > 
> > > rc3.c:22:10: warning: REF COUNT PROBLEM
> > >    22 |   return list;
> > >   |  ^~~~
> > >   ‘create_py_object’: events 1-4
> > >     |
> > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > >     |  |    ^~
> > >     |  |    |
> > >     |  |    (1) when ‘PyLong_FromLong’
> > > succeeds
> > >     |    5 |   PyObject* list = PyList_New(1);
> > >     |  |    ~
> > >     |  |    |
> > >     |  |    (2) when ‘PyList_New’ succeeds
> > >     |..
> > >     |   14 |   PyList_Append(list, item);
> > >     |  |   ~
> > >     |  |   |
> > >     |  |   (3) when ‘PyList_Append’ fails
> > >     |..
> > >     |   22 |   return list;
> > >     |  |  
> > >     |  |  |
> > >     |  |  (4) here
> > >     |
> > > 
> > > I will fix up and refactor the logic for counting the actual ref
> > > count
> > > before coming back and refining the diagnostic to give much more
> > > detailed information.
> > 
> > Excellent!  Thanks for the update.
> > 
> > Dave
> > 
> 
> Hi Dave,
> 
> I've since fixed up the logic to count the actual reference counts of
> the PyObject* instances. 

Sounds promising.

> Now, I'm contemplating the specific
> diagnostics we'd want to issue and the appropriate conditions for
> emitting them. With this in mind, I wanted to check in with you on
> the
> appropriate approach:
> 
> To start, I'm adopting the same assumptions as cpychecker for
> functions returning a PyObject*. That is, I'm operating under the
> premise that by default such functions return a new reference upon
> success rather than, for example, a borrowed reference (which we can
> tackle later on). Given this, it's my understanding that the
> reference
> count of the returned object should be 1 if the object is newly
> created within the function body and incremented by 1 from what it
> was
> previously if not newly created (e.g passed in as an argument).
> Furthermore, the reference count for any PyObject* instances created
> within the function should be 0, barring situations where we're
> returning a collection, like a list, that includes references to
> these
> objects.
> 
> Let me know what you think; thanks!

This sounds like a good approach for v1 of the implementation.

It's probably best to focus on getting a simple version of the patch
into trunk, and leave any polish of it to followups.

In terms of deciding what the reference count of a returned PyObject *
ought to be, cpychecker had logic to try to detect callbacks used by
PyMethodDef tables, so that e.g. in:

static PyMethodDef widget_methods[] = {
{"display",
 (PyCFunction)widget_display,
 (METH_VARARGS | METH_KEYWORDS), /* ml_flags */
 NULL},

{NULL, NULL, 0, NULL} /* terminator */
};

...we'd know that the callback function "widget_display" follows the
standard rules for a PyCFunction (e.g. returns a new reference).

But that's for later; don't bother trying to implement that until we
have the basics working.

Is it worth posting a work-in-progress patch of what you have so far? 
(you don't need to bother with a ChangeLog for that)

Dave

Re: analyzer: How to recognize builtins

2023-08-23 Thread David Malcolm via Gcc

On Wed, 2023-08-23 at 15:55 +0200, Benjamin Priour wrote:
> Hi David,
> 
> A quick update on transitioning the analyzer tests from gcc.dg to
> c-c++-common.
> As discussed previously, C builtins are not C++ builtins, therefore I
> had
> to add
> recognition of the builtins by their name rather than with the
> predicate
> fndecl_built_in_p.
> 
> Do you know why this happens?  I see both of those tests have their
> own
> > prototypes for the functions in question, rather than using system
> > headers; maybe those prototypes don't match some property that the
> > C++
> > FE is checking for?
> > 
> 
> The C FE relies on c-decl.cc:c_builtin_function, the macro
> C_DECL_BUILTIN_PROTOTYPE,
> as well as c-decl.cc:match_builtin_function_types to recognize a
> builtin
> function decl
> and provide a unified type between what the user wrote, and the known
> builtin fndecl.
> 
> I'd like however to correct something I've said previously
> 
> My question is then : Should G++ behave as GCC and ignore the user's
> > signatures of built-ins, instead using the attributes specified by
> > GCC ?
> > At the moment I went with a "yes". If the function is a builtin,
> > I'm
> > operating on its builtin_known_function::builtin_decl () (see
> > above) rather
> > than the callee_fndecl
> > deduced from a gimple call, therefore overriding the user's
> > signature.
> > 
> 
> Actually, GCC does not ignore the user's signatures of built-ins, but
> if
> match_builtin_function_types finds out that the user's signature
> is close enough to the one described in builtins.def, then it will
> complete
> the user's add with attributes and the likes so that we get an
> unified type.
> 
> A case where the user's signature *isn't* close enough to the built-
> in's is
> gcc.dg/analyzer/pr96841.c's malloc's signature.
> 
> __SIZE_TYPE__
> malloc (__SIZE_TYPE__);
> 
> To simplify, whenever a Wbuiltin-declaration-mismatch is emitted,
> then the
> user's decl isn't unified with the built-ins,
> and the extra GCC's attributes defined in builtins.def are not added
> to the
> user's decl.
> In such cases, the function decl is not recognized as a builtin by
> fndecl_builtin_p, hence the analyzer's (before my patch)
> known_function
> is never found out.
> What this mean is that pr96841.c's malloc uses don't go within
> kf_malloc:impl_call_pre nor kf_malloc:impl_call_post with the current
> analyzer's handling of built-ins - do note however that sm-malloc
> goes as
> expected, as it recognizes malloc calls both by name
> and builtin decl (sm-malloc.cc::known_allocator_p).
> 
> With my patch, since the recognition is by name and no longer
> dependent on
> the fndecl_builtin_p predicate, pr96841's malloc *is*
> recognized as a builtin, and its kf is processed.
> This led to pr96841.c no longer passing, as the while loop makes the
> analysis too complex. (-Wanalyzer-too-complex) 

Are you able to post the part of that patch that affects recognition of
builtins?  (even if it's just a work-in-progress with no ChangeLog)

> 
> > > My question is then : Should G++ behave as GCC and ignore the
> > > user's
> > > signatures of built-ins, instead using the attributes specified
> > > by
> > > GCC ?
> > > At the moment I went with a "yes". If the function is a builtin,
> > > I'm
> > > operating on its builtin_known_function::builtin_decl () (see
> > > above)
> > > rather
> > > than the callee_fndecl
> > > deduced from a gimple call, therefore overriding the user's
> > > signature.
> > > 
> > > Doing so led to tests sprintf-2.c and realloc-1.c to pass both in
> > > GCC
> > > and
> > > G++.
> > 
> > I think I agree with "yes" here.
> 
> I would like to proceed with what we said here, so that the analyzer
> proceeds
> with the analysis of malloc or any other builtins even when their
> fndecl is
> not recognized
> as such. Then I would update pr96841.c to expect a -Wanalyzer-too-
> complex.
> Besides, sm-malloc already makes the assumption that a call named
> malloc
> should be treated
> as a malloc, whether it is matching a builtin or not.

Sounds good.

Dave

[RISC-V] [SIG-toolchain] Meeting will be canceled (Aug 24, 2023)

2023-08-23 Thread jiawei

Hi all,




Today's meeting will be canceled, since during the China RISC-V Summit.




The next RISC-V GNU Toolchain meeting is collecting topics:

https://docs.google.com/document/d/1JSs-BSlPJ3QYbAb-Add1TlbYx0nOT1ur3jcsITIJ01U/edit?usp=sharing

Please add what want to discuss in the next meeting.







Regards,

Jiawei

analyzer: How to recognize builtins

Re: [PATCH v7 2/4] p1689r5: initial support

Re: Update on CPython Extension Module -fanalyzer plugin development

Re: Update on CPython Extension Module -fanalyzer plugin development

Re: analyzer: How to recognize builtins

[RISC-V] [SIG-toolchain] Meeting will be canceled (Aug 24, 2023)

6 matches

Site Navigation

Mail list logo

Footer information