Update on CPython Extension Module -fanalyzer plugin development

2023-08-25 Thread Eric Feng via Gcc
Hi Dave,

Please find an updated WIP patch on reference count checking below. Some
parts aren't properly formatted yet; I apologize for that.

Since the last WIP patch, the major updates include:
- Updated certain areas of the core analyzer to support custom stmt_finder.
- A significant revamp of the reference count checking logic.
- __analyzer_cpython_dump_refcounts: This dumps reference count related 
information.

Here's an updated look at the current WIP diagnostic we issue:
rc3.c:25:10: warning: Expected  to 
have reference count: ‘1’ but ob_refcnt field is: ‘2’
   25 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(3);
|  |^~
|  ||
|  |(1) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(1);
|  |~
|  ||
|  |(2) when ‘PyList_New’ succeeds
|..
|   16 |   PyList_Append(list, item);
|  |   ~
|  |   |
|  |   (3) when ‘PyList_Append’ succeeds, moving buffer
|..
|   25 |   return list;
|  |  
|  |  |
|  |  (4) here
|

The reference count checking logic for v1 should be almost complete.
Currently, it supports situations where the returned object is newly created.
It doesn't yet support the other case (i.e., incremented by 1 from
what it was previously, if not newly created). 

This week, I've focused primarily on the reference count checking logic. I plan
to shift my focus to refining the diagnostic next. As seen in the placeholder
diagnostic message above, I believe we should at least inform the user about the
variable name associated with the region that has an unexpected reference count
issue (in this case, 'item'). Initially, I suspect the issue might be that:

tree reg_tree = model->get_representative_tree (curr_region);

returns NULL since curr_region is heap allocated and thus the path_var returned 
would be:

path_var (NULL_TREE, 0);

Which means that:

refcnt_stmt_finder finder (*eg, reg_tree);

always receives a NULL_TREE, causing it to always default to the
not_found case. A workaround might be necessary, but I haven't delved too 
deeply into this yet,
so my suspicion could be off. Additionally, I think it would be helpful to show 
users what the
ob_refcnt looks like in each event as well. I'll keep you updated on both these
points and welcome any feedback.

Best,
Eric
---
 gcc/analyzer/engine.cc|   8 +-
 gcc/analyzer/exploded-graph.h |   4 +-
 gcc/analyzer/region-model.cc  |   3 +
 gcc/analyzer/region-model.h   |  38 +-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-
 5 files changed, 421 insertions(+), 8 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 61685f43fba..f9e239128a0 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
 }
 
 bool
-impl_region_model_context::warn (std::unique_ptr d)
+impl_region_model_context::warn (std::unique_ptr d,
+const stmt_finder *custom_finder)
 {
   LOG_FUNC (get_logger ());
-  if (m_stmt == NULL && m_stmt_finder == NULL)
+  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
+  if (m_stmt == NULL && curr_stmt_finder == NULL)
 {
   if (get_logger ())
get_logger ()->log ("rejecting diagnostic: no stmt");
@@ -129,7 +131,7 @@ impl_region_model_context::warn 
(std::unique_ptr d)
   bool terminate_path = d->terminate_path_p ();
   if (m_eg->get_diagnostic_manager ().add_diagnostic
  (m_enode_for_diag, m_enode_for_diag->get_supernode (),
-  m_stmt, m_stmt_finder, std::move (d)))
+  m_stmt, curr_stmt_finder, std::move (d)))
{
  if (m_path_ctxt
  && terminate_path
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 4a4ef9d12b4..633f8c263fc 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
 uncertainty_t *uncertainty,
 logger *logger = NULL);
 
-  bool warn (std::unique_ptr d) final override;
+  bool warn (std::unique_ptr d,
+const stmt_finder *custom_finder = NULL) final override;
   void add_note (std::unique_ptr pn) final override;
   void on_svalue_leak (const svalue *) override;
   void on_liveness_change (const svalue_set &live_svalues,
@@ -106,6 +107,7 @@ class impl_region_model_context : public 
region_model_context
 std::unique_ptr *out_sm_context) override;
 
   const gimple *get_stmt

x64 MSABI gcc 3.2.3

2023-08-25 Thread Paul Edwards via Gcc
I have a slightly modified gcc 3.2.3.

Source is available in gcc-stage* in custom.zip at http://pdos.org

Executables are available in customb.zip but everything that
is really needed is in pdos.zip

gccdos.txt has instructions to run windows.mak which
produces the gccx64.exe that I use, but has these warnings
on the way:

C:\devel\gcc\gcc>pdmake -f windows.mak

gccwin -DTARGET_64BIT_DEFAULT -DMSABI -fno-common -O2 -S -D__WIN32__
-D__NOBIVA__ -DI386 -DHAVE_CONFIG_H -DIN_GCC -DPUREISO -I
../../pdos/pdpclib -I ../../pdos/src -I . -I config/i386 -I config -I
../include -o combine.s combine.c
combine.c: In function `try_combine':
combine.c:1682: warning: left shift count >= width of type
combine.c:1684: warning: left shift count >= width of type
combine.c:1694: warning: left shift count >= width of type
combine.c:1694: warning: left shift count >= width of type
combine.c:1696: warning: left shift count >= width of type
pdas --oformat coff -o combine.o combine.s

If I do pdmake -f makefile.te6 in c:\devel\pdos\pdpclib I do get
a valid 64-bit EFI executable.

But a more complex program gives:

C:\devel\pdos\pdpclib>pdmake -f makefile.eb6
rm -f *.o
rm -f pdptest.efi
gccx64 -S -O2 -mno-red-zone -D__64BIT__ -D__EFI__ -D__EFIBIOS__ -U__WIN32__
-I. -fno-builtin -fno-common -ansi -o pdptest.s pdptest.c
pdptest.c: In function `main':
pdptest.c:189: insn does not satisfy its constraints:
compilation terminated.
[pdptest.o] Error 1: Operation not permitted

C:\devel\pdos\pdpclib>


When I switch from mingw64 to gccx64:

C:\devel\pdos\pdpclib>git diff .
diff --git a/pdpclib/makefile.eb6 b/pdpclib/makefile.eb6
index 19fe84e4..98b02da0 100644
--- a/pdpclib/makefile.eb6
+++ b/pdpclib/makefile.eb6
@@ -1,14 +1,15 @@
 # This builds EFI executables for x86_64
 # We put no-builtin to stop the optimizer from making memset call memset

-CC=x86_64-w64-mingw32-gcc
+#CC=x86_64-w64-mingw32-gcc
+CC=gccx64
 #AR=x86_64-w64-mingw32-ar
 AR=ar
 #LD=x86_64-w64-mingw32-ld
 LD=pdld --no-insert-timestamp
 #AS=x86_64-w64-mingw32-as
 AS=pdas --64 --oformat coff
-COPTS=-S -O2 -mno-red-zone -D__64BIT__ -D__EFI__ -D__EFIBIOS__ -U__WIN32__
-I. -fno-builtin -fno-common -ansi -Wno-builtin-declaration-mismatch
+COPTS=-S -O2 -mno-red-zone -D__64BIT__ -D__EFI__ -D__EFIBIOS__ -U__WIN32__
-I. -fno-builtin -fno-common -ansi

 TARGET=pdptest.efi
 OBJS=efistart.o stdio.o string.o stdlib.o start.o time.o errno.o \

C:\devel\pdos\pdpclib>


Anyone interested in helping out with this?

Note that this (3.2.3) is the only version of gcc that runs
on PDOS/386 and it took a lot of effort to even do that.

Note that 3.2.3 normally wouldn't do the MSABI, but my
slightly modified version does:

#ifdef MSABI
static int const x86_64_int_parameter_registers[6] = {2 /*RCX*/, 1 /*RDX*/,
8 /*R8*/, 9 /*R9*/,
/* the following 2 need to
be deleted */
FIRST_REX_INT_REG + 2 /*R10 */,
FIRST_REX_INT_REG + 3 /*R11 */};
#else
static int const x86_64_int_parameter_registers[6] = {5 /*RDI*/, 4 /*RSI*/,
1 /*RDX*/, 2 /*RCX*/,
FIRST_REX_INT_REG /*R8 */,
FIRST_REX_INT_REG + 1 /*R9 */};
#endif

My version (like standard 3.2.3) has 64-bit long, which is not
what Windows uses, but I am likely to stick with 64-bit long
regardless.

Thanks. Paul.


Re: Update on CPython Extension Module -fanalyzer plugin development

2023-08-25 Thread David Malcolm via Gcc
On Fri, 2023-08-25 at 08:50 -0400, Eric Feng wrote:
> Hi Dave,
> 
> Please find an updated WIP patch on reference count checking below. Some
> parts aren't properly formatted yet; I apologize for that.
> 
> Since the last WIP patch, the major updates include:
> - Updated certain areas of the core analyzer to support custom stmt_finder.
> - A significant revamp of the reference count checking logic.
> - __analyzer_cpython_dump_refcounts: This dumps reference count related 
> information.

Thanks for the patch.

Given the scope, this is close to being ready for trunk; various
comments inline below...


> 
> Here's an updated look at the current WIP diagnostic we issue:
> rc3.c:25:10: warning: Expected  to 
> have reference count: ‘1’ but ob_refcnt field is: ‘2’
>    25 |   return list;
>   |  ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |  |    ^~
>     |  |    |
>     |  |    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |  |    ~
>     |  |    |
>     |  |    (2) when ‘PyList_New’ succeeds
>     |..
>     |   16 |   PyList_Append(list, item);
>     |  |   ~
>     |  |   |
>     |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |..
>     |   25 |   return list;
>     |  |  
>     |  |  |
>     |  |  (4) here
>     |
> 

Looks good.

As mentioned in our chat, ideally there would be some kind of
information to the user as to what the reference actually are.

Perhaps it could be a series of notes like this:

note: there is 1 pointer to "*item"
note: (1) '*item' is pointed to by '((PyListObject *)list)->ob_item[0]'

Or, to be even more fancy, we could add events to the execution path
whenever the number of pointers or ob_refcnt changes, giving something
like:

(1) when ‘PyLong_FromLong’ succeeds
(2) '(*item).ob_refcnt' is now 1
(3) when ‘PyList_New’ succeeds
(4) '*item'  is now pointed to by '((PyListObject *)list)->ob_item[0]';
pointer count is 2, ob_refcnt is 1
(5) when 'create_py_object' returns, '(*item).ob_refcnt' is 1

...or somesuch (I'm brainstorming here, and I expect that the above
might not be implementable, due to local variables and SSA names
confusing things).

Plus there's the question of whether the return value "owns" a
reference. 

But all of that can be deferred to follow-up work.


> The reference count checking logic for v1 should be almost complete.
> Currently, it supports situations where the returned object is newly created.
> It doesn't yet support the other case (i.e., incremented by 1 from
> what it was previously, if not newly created). 
> 
> This week, I've focused primarily on the reference count checking logic. I 
> plan
> to shift my focus to refining the diagnostic next. As seen in the placeholder
> diagnostic message above, I believe we should at least inform the user about 
> the
> variable name associated with the region that has an unexpected reference 
> count
> issue (in this case, 'item'). Initially, I suspect the issue might be that:
> 
> tree reg_tree = model->get_representative_tree (curr_region);
> 
> returns NULL since curr_region is heap allocated and thus the path_var 
> returned would be:
> 
> path_var (NULL_TREE, 0);
> 
> Which means that:
> 
> refcnt_stmt_finder finder (*eg, reg_tree);
> 
> always receives a NULL_TREE, causing it to always default to the
> not_found case. A workaround might be necessary, but I haven't delved too 
> deeply into this yet,
> so my suspicion could be off.
> 

You could try looking for a representative tree for the pointer, rather
than the region.

I ran into similar issues with leak detection: we report a leak when
nothing is pointing at the region, but if nothing is pointing at the
region, then there's nothing to label/describe the region with.  A
workaround I've used is to look at the old model immediately before the
state change, rather than the new one.

So for example, use the region_model * for immediately before popping
the frame (if it's available; you might need to make a copy?), and do
something like:

  tree expr = old_model->get_representative_tree (curr_region);



>  Additionally, I think it would be helpful to show users what the
> ob_refcnt looks like in each event as well.
> 

You could try make a custom_event subclass for such ob_refcnt messages,
but adding them could be fiddly.  Perhaps
diagnostic_manager::add_events_for_eedge could support some kind of
plugin hook whenever we add a statement_event, which you could use to
add your extra events whenever the ob_refcnt of the object of interest
differs between the old and the new state? (though currently we only
emit a statement_event for the first stmt in an enode, and that
function is already quite messy, a

How can I run xg++ from its build location?

2023-08-25 Thread Michael Welsh Duggan via Gcc
I am attempting to debug an issue in gcc (PR 110827, if curious).  In
order to do this I have built a stage 1 compiler with debugging and
without optimization as discussed here:

https://gcc.gnu.org/wiki/DebuggingGCC#Building_a_Debuggable_Compiler

I would like run the compiler from its build location without installing
it, but I cannot determine how to have gcc look for its include files
and libraries from within its source and build trees.  My first problem
was that it looked for for cc1plus in the wrong location; my next
problems involved include paths.

Is it possible to do this without extensive command-line options, or
does this need to be installed?  If the latter, what target do I use to
install the unoptimized stage 1 compiler?

-- 
Michael Welsh Duggan
(m...@md5i.com)



Re: How can I run xg++ from its build location?

2023-08-25 Thread David Edelsohn via Gcc
On Fri, Aug 25, 2023 at 4:16 PM Michael Welsh Duggan via Gcc <
gcc@gcc.gnu.org> wrote:

> I am attempting to debug an issue in gcc (PR 110827, if curious).  In
> order to do this I have built a stage 1 compiler with debugging and
> without optimization as discussed here:
>
> https://gcc.gnu.org/wiki/DebuggingGCC#Building_a_Debuggable_Compiler
>
> I would like run the compiler from its build location without installing
> it, but I cannot determine how to have gcc look for its include files
> and libraries from within its source and build trees.  My first problem
> was that it looked for for cc1plus in the wrong location; my next
> problems involved include paths.
>
> Is it possible to do this without extensive command-line options, or
> does this need to be installed?  If the latter, what target do I use to
> install the unoptimized stage 1 compiler?
>

One can use

$ ./xgcc -B./

to run GCC in the build directory, or wherever the driver and compiler are
installed.

To debug the compiler, it is easier to produce a preprocessed file with the
header files included (-E command line option) and direct the compiler,
e.g, cc1 or cc1plus, to use the preprocessed file as input.

David


Re: How can I run xg++ from its build location?

2023-08-25 Thread Sam James via Gcc


Michael Welsh Duggan via Gcc  writes:

> I am attempting to debug an issue in gcc (PR 110827, if curious).  In
> order to do this I have built a stage 1 compiler with debugging and
> without optimization as discussed here:
>
> https://gcc.gnu.org/wiki/DebuggingGCC#Building_a_Debuggable_Compiler
>
> I would like run the compiler from its build location without installing
> it, but I cannot determine how to have gcc look for its include files
> and libraries from within its source and build trees.  My first problem
> was that it looked for for cc1plus in the wrong location; my next
> problems involved include paths.
>

You can try something like:
./xgcc -B$(pwd) -I$(pwd) ... # (may need to add some more include
directives in)

I know someone was interested in adding a proper wrapper for this
so it Just Worked.

But for now, I'd really just...

> Is it possible to do this without extensive command-line options, or
> does this need to be installed?  If the latter, what target do I use to
> install the unoptimized stage 1 compiler?

Is there a particular reason you can't install it to a temporary prefix,
like ./configure --prefix=/opt/foo or even /tmp/foo?


Re: How can I run xg++ from its build location?

2023-08-25 Thread David Malcolm via Gcc
On Fri, 2023-08-25 at 16:12 -0400, Michael Welsh Duggan via Gcc wrote:
> I am attempting to debug an issue in gcc (PR 110827, if curious).  In
> order to do this I have built a stage 1 compiler with debugging and
> without optimization as discussed here:
> 
> https://gcc.gnu.org/wiki/DebuggingGCC#Building_a_Debuggable_Compiler
> 
> I would like run the compiler from its build location without
> installing
> it, but I cannot determine how to have gcc look for its include files
> and libraries from within its source and build trees.  My first
> problem
> was that it looked for for cc1plus in the wrong location; my next
> problems involved include paths.
> 
> Is it possible to do this without extensive command-line options, or
> does this need to be installed?  If the latter, what target do I use
> to
> install the unoptimized stage 1 compiler?
> 

You could try:
  ./xgcc -B.   REST OF GCC OPTIONS   -wrapper gdb,--args

where "./xgcc -B." will run the locally-built "driver" binary (xgcc),
and make it look for things like "cc1" in ".".

You may find this guide on debugging gcc helpful:
  https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html

Dave



gcc-12-20230825 is now available

2023-08-25 Thread GCC Administrator via Gcc
Snapshot gcc-12-20230825 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20230825/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-12 revision 8443b6abd7bc7e2f3ed28a0a1d8ebdde9da31ff3

You'll find:

 gcc-12-20230825.tar.xz   Complete GCC

  SHA256=9e0c6207e540eb9904cde0bab9be183b9939207ce5d2d02454240acd846167c5
  SHA1=6eeea7e189857cf218fe09a2b1ab0ef7dc9d2887

Diffs from 12-20230818 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.