analyzer: How to recognize builtins
Hi David, A quick update on transitioning the analyzer tests from gcc.dg to c-c++-common. As discussed previously, C builtins are not C++ builtins, therefore I had to add recognition of the builtins by their name rather than with the predicate fndecl_built_in_p. Do you know why this happens? I see both of those tests have their own > prototypes for the functions in question, rather than using system > headers; maybe those prototypes don't match some property that the C++ > FE is checking for? > The C FE relies on c-decl.cc:c_builtin_function, the macro C_DECL_BUILTIN_PROTOTYPE, as well as c-decl.cc:match_builtin_function_types to recognize a builtin function decl and provide a unified type between what the user wrote, and the known builtin fndecl. I'd like however to correct something I've said previously My question is then : Should G++ behave as GCC and ignore the user's > signatures of built-ins, instead using the attributes specified by GCC ? > At the moment I went with a "yes". If the function is a builtin, I'm > operating on its builtin_known_function::builtin_decl () (see above) rather > than the callee_fndecl > deduced from a gimple call, therefore overriding the user's signature. > Actually, GCC does not ignore the user's signatures of built-ins, but if match_builtin_function_types finds out that the user's signature is close enough to the one described in builtins.def, then it will complete the user's add with attributes and the likes so that we get an unified type. A case where the user's signature *isn't* close enough to the built-in's is gcc.dg/analyzer/pr96841.c's malloc's signature. __SIZE_TYPE__ malloc (__SIZE_TYPE__); To simplify, whenever a Wbuiltin-declaration-mismatch is emitted, then the user's decl isn't unified with the built-ins, and the extra GCC's attributes defined in builtins.def are not added to the user's decl. In such cases, the function decl is not recognized as a builtin by fndecl_builtin_p, hence the analyzer's (before my patch) known_function is never found out. What this mean is that pr96841.c's malloc uses don't go within kf_malloc:impl_call_pre nor kf_malloc:impl_call_post with the current analyzer's handling of built-ins - do note however that sm-malloc goes as expected, as it recognizes malloc calls both by name and builtin decl (sm-malloc.cc::known_allocator_p). With my patch, since the recognition is by name and no longer dependent on the fndecl_builtin_p predicate, pr96841's malloc *is* recognized as a builtin, and its kf is processed. This led to pr96841.c no longer passing, as the while loop makes the analysis too complex. (-Wanalyzer-too-complex) > > My question is then : Should G++ behave as GCC and ignore the user's > > signatures of built-ins, instead using the attributes specified by > > GCC ? > > At the moment I went with a "yes". If the function is a builtin, I'm > > operating on its builtin_known_function::builtin_decl () (see above) > > rather > > than the callee_fndecl > > deduced from a gimple call, therefore overriding the user's > > signature. > > > > Doing so led to tests sprintf-2.c and realloc-1.c to pass both in GCC > > and > > G++. > > I think I agree with "yes" here. I would like to proceed with what we said here, so that the analyzer proceeds with the analysis of malloc or any other builtins even when their fndecl is not recognized as such. Then I would update pr96841.c to expect a -Wanalyzer-too-complex. Besides, sm-malloc already makes the assumption that a call named malloc should be treated as a malloc, whether it is matching a builtin or not. Thanks, Benjamin.
Re: [PATCH v7 2/4] p1689r5: initial support
On 7/2/23 12:32, Ben Boeckel wrote: This patch implements support for [P1689R5][] to communicate to a build system the C++20 module dependencies to build systems so that they may build `.gcm` files in the proper order. Support is communicated through the following three new flags: - `-fdeps-format=` specifies the format for the output. Currently named `p1689r5`. - `-fdeps-file=` specifies the path to the file to write the format to. - `-fdeps-target=` specifies the `.o` that will be written for the TU that is scanned. This is required so that the build system can correlate the dependency output with the actual compilation that will occur. CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0) using an experimental feature selection (to allow for future usage evolution without committing to how it works today). While it remains experimental, docs may be found in CMake's documentation for experimental features. Future work may include using this format for Fortran module dependencies as well, however this is still pending work. [P1689R5]: https://isocpp.org/files/papers/P1689R5.html [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst TODO: - header-unit information fields Header units (including the standard library headers) are 100% unsupported right now because the `-E` mechanism wants to import their BMIs. A new mode (i.e., something more workable than existing `-E` behavior) that mocks up header units as if they were imported purely from their path and content would be required. - non-utf8 paths The current standard says that paths that are not unambiguously represented using UTF-8 are not supported (because these cases are rare and the extra complication is not worth it at this time). Future versions of the format might have ways of encoding non-UTF-8 paths. For now, this patch just doesn't support non-UTF-8 paths (ignoring the "unambiguously representable in UTF-8" case). - figure out why junk gets placed at the end of the file Sometimes it seems like the file gets a lot of `NUL` bytes appended to it. It happens rarely and seems to be the result of some `ftruncate`-style call which results in extra padding in the contents. Noting it here as an observation at least. Thank you for your patience, just a few tweaks left and I think we can put this all in this week or next. diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h index aef703f8111..141cfd60eda 100644 --- a/libcpp/include/cpplib.h +++ b/libcpp/include/cpplib.h @@ -302,6 +302,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t; /* Style of header dependencies to generate. */ enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM }; +/* Structured format of module dependencies to generate. */ +enum cpp_fdeps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 }; These should be FDEPS_FMT_* or just FDEPS_*. @@ -395,10 +423,16 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax) if (colmax && colmax < 34) colmax = 34; + /* Write out C++ modules information if no other `-fdeps-format=` + option is given. */ + cpp_fdeps_format fdeps_format = CPP_OPTION (pfile, deps.fdeps_format); + bool write_make_modules_deps = fdeps_format == DEPS_FMT_NONE && +CPP_OPTION (pfile, deps.modules); We typically format an expression like this as: + bool write_make_modules_deps = (fdeps_format == FDEPS_FMT_NONE + && CPP_OPTION (pfile, deps.modules)); @@ -473,6 +507,117 @@ deps_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax) make_write (pfile, fp, colmax); } +static void +p1689r5_write_filepath (const char *name, FILE *fp) This and the other p1689r5 functions need more comments, at least one at the top explaining their purpose. Jason
Re: Update on CPython Extension Module -fanalyzer plugin development
On Mon, Aug 21, 2023 at 11:04 AM David Malcolm wrote: > > On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote: > > Hi Dave, > > > > Just wanted to give you and everyone else a short update on how > > reference count checking is going — we can now observe the refcnt > > diagnostic being emitted: > > > > rc3.c:22:10: warning: REF COUNT PROBLEM > >22 | return list; > > | ^~~~ > > ‘create_py_object’: events 1-4 > > | > > |4 | PyObject* item = PyLong_FromLong(3); > > | |^~ > > | || > > | |(1) when ‘PyLong_FromLong’ succeeds > > |5 | PyObject* list = PyList_New(1); > > | |~ > > | || > > | |(2) when ‘PyList_New’ succeeds > > |.. > > | 14 | PyList_Append(list, item); > > | | ~ > > | | | > > | | (3) when ‘PyList_Append’ fails > > |.. > > | 22 | return list; > > | | > > | | | > > | | (4) here > > | > > > > I will fix up and refactor the logic for counting the actual ref > > count > > before coming back and refining the diagnostic to give much more > > detailed information. > > Excellent! Thanks for the update. > > Dave > Hi Dave, I've since fixed up the logic to count the actual reference counts of the PyObject* instances. Now, I'm contemplating the specific diagnostics we'd want to issue and the appropriate conditions for emitting them. With this in mind, I wanted to check in with you on the appropriate approach: To start, I'm adopting the same assumptions as cpychecker for functions returning a PyObject*. That is, I'm operating under the premise that by default such functions return a new reference upon success rather than, for example, a borrowed reference (which we can tackle later on). Given this, it's my understanding that the reference count of the returned object should be 1 if the object is newly created within the function body and incremented by 1 from what it was previously if not newly created (e.g passed in as an argument). Furthermore, the reference count for any PyObject* instances created within the function should be 0, barring situations where we're returning a collection, like a list, that includes references to these objects. Let me know what you think; thanks! Best, Eric
Re: Update on CPython Extension Module -fanalyzer plugin development
On Wed, 2023-08-23 at 17:15 -0400, Eric Feng wrote: > On Mon, Aug 21, 2023 at 11:04 AM David Malcolm > wrote: > > > > On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote: > > > Hi Dave, > > > > > > Just wanted to give you and everyone else a short update on how > > > reference count checking is going — we can now observe the refcnt > > > diagnostic being emitted: > > > > > > rc3.c:22:10: warning: REF COUNT PROBLEM > > > 22 | return list; > > > | ^~~~ > > > ‘create_py_object’: events 1-4 > > > | > > > | 4 | PyObject* item = PyLong_FromLong(3); > > > | | ^~ > > > | | | > > > | | (1) when ‘PyLong_FromLong’ > > > succeeds > > > | 5 | PyObject* list = PyList_New(1); > > > | | ~ > > > | | | > > > | | (2) when ‘PyList_New’ succeeds > > > |.. > > > | 14 | PyList_Append(list, item); > > > | | ~ > > > | | | > > > | | (3) when ‘PyList_Append’ fails > > > |.. > > > | 22 | return list; > > > | | > > > | | | > > > | | (4) here > > > | > > > > > > I will fix up and refactor the logic for counting the actual ref > > > count > > > before coming back and refining the diagnostic to give much more > > > detailed information. > > > > Excellent! Thanks for the update. > > > > Dave > > > > Hi Dave, > > I've since fixed up the logic to count the actual reference counts of > the PyObject* instances. Sounds promising. > Now, I'm contemplating the specific > diagnostics we'd want to issue and the appropriate conditions for > emitting them. With this in mind, I wanted to check in with you on > the > appropriate approach: > > To start, I'm adopting the same assumptions as cpychecker for > functions returning a PyObject*. That is, I'm operating under the > premise that by default such functions return a new reference upon > success rather than, for example, a borrowed reference (which we can > tackle later on). Given this, it's my understanding that the > reference > count of the returned object should be 1 if the object is newly > created within the function body and incremented by 1 from what it > was > previously if not newly created (e.g passed in as an argument). > Furthermore, the reference count for any PyObject* instances created > within the function should be 0, barring situations where we're > returning a collection, like a list, that includes references to > these > objects. > > Let me know what you think; thanks! This sounds like a good approach for v1 of the implementation. It's probably best to focus on getting a simple version of the patch into trunk, and leave any polish of it to followups. In terms of deciding what the reference count of a returned PyObject * ought to be, cpychecker had logic to try to detect callbacks used by PyMethodDef tables, so that e.g. in: static PyMethodDef widget_methods[] = { {"display", (PyCFunction)widget_display, (METH_VARARGS | METH_KEYWORDS), /* ml_flags */ NULL}, {NULL, NULL, 0, NULL} /* terminator */ }; ...we'd know that the callback function "widget_display" follows the standard rules for a PyCFunction (e.g. returns a new reference). But that's for later; don't bother trying to implement that until we have the basics working. Is it worth posting a work-in-progress patch of what you have so far? (you don't need to bother with a ChangeLog for that) Dave
Re: analyzer: How to recognize builtins
On Wed, 2023-08-23 at 15:55 +0200, Benjamin Priour wrote: > Hi David, > > A quick update on transitioning the analyzer tests from gcc.dg to > c-c++-common. > As discussed previously, C builtins are not C++ builtins, therefore I > had > to add > recognition of the builtins by their name rather than with the > predicate > fndecl_built_in_p. > > Do you know why this happens? I see both of those tests have their > own > > prototypes for the functions in question, rather than using system > > headers; maybe those prototypes don't match some property that the > > C++ > > FE is checking for? > > > > The C FE relies on c-decl.cc:c_builtin_function, the macro > C_DECL_BUILTIN_PROTOTYPE, > as well as c-decl.cc:match_builtin_function_types to recognize a > builtin > function decl > and provide a unified type between what the user wrote, and the known > builtin fndecl. > > I'd like however to correct something I've said previously > > My question is then : Should G++ behave as GCC and ignore the user's > > signatures of built-ins, instead using the attributes specified by > > GCC ? > > At the moment I went with a "yes". If the function is a builtin, > > I'm > > operating on its builtin_known_function::builtin_decl () (see > > above) rather > > than the callee_fndecl > > deduced from a gimple call, therefore overriding the user's > > signature. > > > > Actually, GCC does not ignore the user's signatures of built-ins, but > if > match_builtin_function_types finds out that the user's signature > is close enough to the one described in builtins.def, then it will > complete > the user's add with attributes and the likes so that we get an > unified type. > > A case where the user's signature *isn't* close enough to the built- > in's is > gcc.dg/analyzer/pr96841.c's malloc's signature. > > __SIZE_TYPE__ > malloc (__SIZE_TYPE__); > > To simplify, whenever a Wbuiltin-declaration-mismatch is emitted, > then the > user's decl isn't unified with the built-ins, > and the extra GCC's attributes defined in builtins.def are not added > to the > user's decl. > In such cases, the function decl is not recognized as a builtin by > fndecl_builtin_p, hence the analyzer's (before my patch) > known_function > is never found out. > What this mean is that pr96841.c's malloc uses don't go within > kf_malloc:impl_call_pre nor kf_malloc:impl_call_post with the current > analyzer's handling of built-ins - do note however that sm-malloc > goes as > expected, as it recognizes malloc calls both by name > and builtin decl (sm-malloc.cc::known_allocator_p). > > With my patch, since the recognition is by name and no longer > dependent on > the fndecl_builtin_p predicate, pr96841's malloc *is* > recognized as a builtin, and its kf is processed. > This led to pr96841.c no longer passing, as the while loop makes the > analysis too complex. (-Wanalyzer-too-complex) Are you able to post the part of that patch that affects recognition of builtins? (even if it's just a work-in-progress with no ChangeLog) > > > > My question is then : Should G++ behave as GCC and ignore the > > > user's > > > signatures of built-ins, instead using the attributes specified > > > by > > > GCC ? > > > At the moment I went with a "yes". If the function is a builtin, > > > I'm > > > operating on its builtin_known_function::builtin_decl () (see > > > above) > > > rather > > > than the callee_fndecl > > > deduced from a gimple call, therefore overriding the user's > > > signature. > > > > > > Doing so led to tests sprintf-2.c and realloc-1.c to pass both in > > > GCC > > > and > > > G++. > > > > I think I agree with "yes" here. > > I would like to proceed with what we said here, so that the analyzer > proceeds > with the analysis of malloc or any other builtins even when their > fndecl is > not recognized > as such. Then I would update pr96841.c to expect a -Wanalyzer-too- > complex. > Besides, sm-malloc already makes the assumption that a call named > malloc > should be treated > as a malloc, whether it is matching a builtin or not. Sounds good. Dave
[RISC-V] [SIG-toolchain] Meeting will be canceled (Aug 24, 2023)
Hi all, Today's meeting will be canceled, since during the China RISC-V Summit. The next RISC-V GNU Toolchain meeting is collecting topics: https://docs.google.com/document/d/1JSs-BSlPJ3QYbAb-Add1TlbYx0nOT1ur3jcsITIJ01U/edit?usp=sharing Please add what want to discuss in the next meeting. Regards, Jiawei