Re: GSoC: want to take part in `Extend the static analysis pass for CPython Extension`

2023-04-03 Thread Steven Sun via Gcc
I do not have specific ideas on (c). I prefer to work on (b) if possible.

The PEP 701 branch is under active development now. I review others' PRs
and open some PRs myself.

https://github.com/pablogsal/cpython/pull/54
https://github.com/pablogsal/cpython/pull/61
https://github.com/pablogsal/cpython/pull/63


I will submit a proposal on (b) as soon as possible. By the way, I can start
working well before the official coding period in the GSoC timeline begins.


From: David Malcolm 
Sent: Monday, April 3, 2023 7:41
To: Sun Steven ; gcc@gcc.gnu.org 
Subject: Re: GSoC: want to take part in `Extend the static analysis pass for CPython Extension`

On Sat, 2023-04-01 at 20:32 +, Sun Steven via Gcc wrote:
> Hello,

Hi!

I just replied to your other email in the "[GSoC] Interest and initial
proposal for project on reimplementing cpychecker as -fanalyzer plugin"
thread.

>
> I want to take part in this project.
>
> b. Write a plugin to add checking for usage of the CPython API (e.g.
> reference-counting); see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646
>
>
> I know the deadline is arriving, but this idea just came to me now.

Indeed; the deadline for submitting proposals to the official GSoC
website is April 4 - 18:00 UTC (i.e. this coming Tuesday); see:
https://developers.google.com/open-source/gsoc/timeline

Google are very strict about that deadline.

>
> Self-intro:
> I am a fan of C++ and have expertise in writing low-latency code. I
> previously worked at a high-frequency trading company, mainly writing
> C++ and Python on Linux.
>
> Familiarity with GCC:
> I have an overall idea of how the compiler works. I have debugged
> several GCC C++ frontend bugs (e.g. 108218, 99686, 99019, ...).

Thanks; I just took a look at those.


> But I have only studied the C++ frontend code in detail, not the
> middle end or the backend. I have the ability to work with large
> codebases.
>
> Familiarity with CPython:
> I use CPython a lot. Recently, I have been contributing to the CPython
> interpreter on PEP 701 (mainly on the parser, which I am familiar
> with).
>
>
> I have always wanted to contribute major changes to GCC, but I did
> not know whether such a project existed. I understand how the middle
> end works, but I have never really worked with GIMPLE. This project
> allows me to take a real look at how GCC's middle end works.

Given your knowledge of both C++ and of CPython internals, this project
sounds like a good way for you to get involved.

>
> I want to know if anyone is already working on this project. I would
> prefer a large-sized project (350 hours).

I see you've already posted to the thread Eric started.

>
> If b. was already taken, I also accept a. and c.

I had to check the wiki page to see which ones (a) and (c) were;

(a) is "Add format-string support to -fanalyzer."

(c) is "Add a checker for some API or project of interest to the
contributor (e.g. the Linux kernel, a POSIX API that we're not yet
checking, or something else), either as a plugin, or as part of the
analyzer core for e.g. POSIX."

Do you have specific ideas for (c)?

(a) would make a great project, in that it's reasonably self-contained.
Eric's proposal for (b) plans to eventually tackle it, but there's a
huge amount of potential work in (b) already.

> By the way, I don't really care about the GSoC. If we miss the
> deadline, we can still push forward this project without the support
> of GSoC, as long as I get coached.

I'm keen on helping new GCC contributors, with or without GSoC.  A good
next step is to build GCC from source, and try hacking in a new
warning.  See:
  https://gcc-newbies-guide.readthedocs.io/en/latest/

But remember that the GSoC deadline is April 4 - 18:00 UTC (i.e. this
coming Tuesday), so if you're going to apply, you need to act fast.

Good luck
Dave



GSoC: Porting cpychecker to a -fanalyzer plugin

2023-05-08 Thread Steven Sun via Gcc
Hi, Eric and Dave,

I did not make it to the GSoC program. I am not surprised.

In this email, I would like to share some thoughts on this project with Eric
and pose some questions to Dave.

In the past month, I have been active in the CPython community, and I have now
been nominated as a triage member:
https://github.com/python/core-workflow/issues/503

I took a look at how GCC extensions and the analyzer work. I now have a
basic idea of how this project should work.


Questions:

1. Where should this project (cpychecker) reside?

Since it's an extension, it could live outside of the GCC project. But it
currently relies on some internal headers of the analyzer. If it lives
outside, the best option would be to make those internal headers stable for
public use.

2. Where do people in GCC discuss development plans or new ideas?

In other large projects, I have seen people discuss such things in a forum.

I emailed one of the contributors. He replied that this mailing list is such a
place, as is the IRC channel. But this mailing list is less active than the
project itself, so I guess most discussion happens on the `gcc-patches`
mailing list.

Thoughts/Experiences/Advice: (to Eric)

1. Plugins

GCC has plugin mechanisms: https://gcc.gnu.org/wiki/plugins

If you provide a shared library, the compiler loads your library and calls your
function.

The compiler initializes your plugin, your plugin registers some callbacks,
and the compiler invokes those callbacks later.
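
The load, register, invoke sequence can be sketched with a toy model. This is
plain Python, not GCC's plugin API: `ToyCompiler`, `nop_plugin_init` and the
event name "analyzer_init" are all made up for illustration, and a real plugin
is a shared library written in C++.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyCompiler:
    """Hypothetical stand-in for the host compiler; not GCC's real plugin API."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register_callback(self, event: str, callback: Callable[[str], None]) -> None:
        # The plugin calls this while it is being initialized.
        self._callbacks[event].append(callback)

    def load_plugin(self, plugin_init: Callable[["ToyCompiler"], None]) -> None:
        # Loading a plugin boils down to calling its entry point once.
        plugin_init(self)

    def fire(self, event: str, payload: str) -> None:
        # Later, the compiler invokes every callback registered for the event.
        for callback in self._callbacks[event]:
            callback(payload)


seen: List[str] = []


def nop_plugin_init(compiler: ToyCompiler) -> None:
    # A "nop" plugin: its only callback records that it was invoked.
    compiler.register_callback("analyzer_init", seen.append)


compiler = ToyCompiler()
compiler.load_plugin(nop_plugin_init)
compiler.fire("analyzer_init", "engine ready")
```

After the last line, `seen` holds the single payload that the toy compiler
passed to the callback, which is all a "nop" plugin needs to prove it was
loaded and called.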

For the analyzer specifically, you can see this initialization happen in
`gcc/analyzer/engine.cc`.

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/analyzer/engine.cc;h=a5965c2b8ff048e47d9c1687d5298a11020a5bee;hb=HEAD#l6102

You can try writing a basic "nop" plugin first. You need to include the
headers that define the virtual function interfaces.

2. State Machine and Known Functions

The relevant interface is the class `plugin_analyzer_init_iface`:

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/analyzer/analyzer.h;h=a1619525afaf9322f1ef6d6ec387d6eea70f7c0f;hb=HEAD#l275

You can register two things: state machines and known functions.

State machines are defined in `sm.h`. They provide the core functionality;
see the `sm-*.cc` files. For instance, a pointer can be in one of several
states, such as malloced or freed. You can read the logic in `sm-malloc.cc`.
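
To make the idea of per-pointer states concrete, here is a toy model in
Python. It is only a sketch under strong assumptions: the names `PtrState`,
`ToyMallocChecker` and `check` are hypothetical, the input is a flat list of
events rather than the paths the analyzer explores, and the real logic in
`sm-malloc.cc` is far richer.

```python
from enum import Enum
from typing import Dict, List, Tuple


class PtrState(Enum):
    START = "start"
    MALLOCED = "malloced"
    FREED = "freed"


class ToyMallocChecker:
    """Toy per-pointer state machine (hypothetical, not GCC code)."""

    def __init__(self) -> None:
        self.states: Dict[str, PtrState] = {}
        self.warnings: List[str] = []

    def on_malloc(self, ptr: str) -> None:
        self.states[ptr] = PtrState.MALLOCED

    def on_free(self, ptr: str) -> None:
        # Freeing a pointer that is already in the FREED state is a bug.
        if self.states.get(ptr, PtrState.START) is PtrState.FREED:
            self.warnings.append(f"double free of {ptr}")
        self.states[ptr] = PtrState.FREED

    def on_use(self, ptr: str) -> None:
        if self.states.get(ptr) is PtrState.FREED:
            self.warnings.append(f"use after free of {ptr}")

    def on_function_exit(self) -> None:
        # Anything still MALLOCED when the function ends is reported as a leak.
        for ptr, state in self.states.items():
            if state is PtrState.MALLOCED:
                self.warnings.append(f"leak of {ptr}")


def check(events: List[Tuple[str, str]]) -> List[str]:
    """Feed (operation, pointer) events to the checker and return its warnings."""
    checker = ToyMallocChecker()
    for op, ptr in events:
        getattr(checker, f"on_{op}")(ptr)
    checker.on_function_exit()
    return checker.warnings
```

For example, `check([("malloc", "p"), ("free", "p"), ("free", "p")])` reports a
double free, while a lone `("malloc", "q")` is reported as a leak.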

Known functions are defined in `analyzer.h`. They let you run checks on
function calls. See `kf.cc` for reference implementations.
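
The same kind of toy model works for known functions: a table keyed by callee
name, where each entry describes what a call does to the tracked state. The
registry, the decorator and the `RefCounts` dictionary below are hypothetical
and only mirror the idea, not the `known_function` class in GCC. The two
modelled calls use real CPython API names, but only their success paths are
described.

```python
from typing import Callable, Dict, List, Optional

# Abstract state: symbolic object name -> modelled reference count.
RefCounts = Dict[str, int]
Handler = Callable[[RefCounts, List[str]], Optional[str]]

KNOWN_FUNCTIONS: Dict[str, Handler] = {}


def known_function(name: str) -> Callable[[Handler], Handler]:
    """Register a handler under the callee's name (hypothetical registry)."""
    def decorator(handler: Handler) -> Handler:
        KNOWN_FUNCTIONS[name] = handler
        return handler
    return decorator


@known_function("PyLong_FromLong")
def model_pylong_fromlong(refs: RefCounts, args: List[str]) -> Optional[str]:
    # Success path only: the call returns a new object holding one reference.
    result = f"obj{len(refs)}"
    refs[result] = 1
    return result


@known_function("Py_DECREF")
def model_py_decref(refs: RefCounts, args: List[str]) -> Optional[str]:
    # Drop one reference from the object named by the first argument.
    refs[args[0]] -= 1
    return None


def simulate_call(refs: RefCounts, callee: str, args: List[str]) -> Optional[str]:
    """Apply the model for `callee`, if there is one, to the abstract state."""
    handler = KNOWN_FUNCTIONS.get(callee)
    if handler is None:
        return None  # Unknown callee: this toy model learns nothing from it.
    return handler(refs, args)
```

Calling `simulate_call` for `PyLong_FromLong` and then for `Py_DECREF` on the
returned name leaves the modelled count at zero, which is the point at which a
checker would treat the object as released.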

When completed, this plugin would consist of several `state_machine`s and
`known_function`s.

3. Go through the code logic with GDB

I don't know to what extent you have interacted with GCC or if you have coded in
C++. I strongly recommend using gdb.

I found it very helpful to debug with gdb. You can step through the code and
set breakpoints anywhere, without adding debug output and recompiling. (Once
you have tried compiling GCC, you will understand what I mean.) You can also
see the full backtrace, which shows the caller of each function (even where
function pointers are used).

You can set breakpoints on all `ana::*` functions at once with a regular
expression, e.g. gdb's `rbreak ^ana::`. Then gcc will break at any function
related to the analyzer. You can then use `c` to continue.

4. Start with easy issues.

You can read David's guide here.
https://gcc-newbies-guide.readthedocs.io/en/latest/index.html

My personal experience is that if you don't know what to do, try working on
relevant issues. Merely finding out what caused a bug is already useful;
fixing it is a plus.

I did this with issues #109190 and #109027, and that is how I came to
understand how the analyzer works.

---


I will act more like a reviewer and adviser for this project. (To Eric:) I can
review your code and give you advice. I will help you more when you are stuck
with some implementation bugs.

CC me the relevant changes. I will review them when I am available.

Best,
Steven



Query status of GSoC project: CPyChecker

2023-06-27 Thread Steven Sun via Gcc
Hi Eric, I am Steven (now) from the CPython team.

How is the project going? Do you have any prototypes
or ideas that can be discussed? Which part will you start at?


I recently debugged dozens of Python bugs, some involving
C APIs. I can provide some test cases for you.


For the ref count part:

A major change (immortal objects) is introduced in Python 3.12.
Basically, immortal objects will have the ref count fixed at
a very large number (depending on `sizeof(void*)` ). But I
don't think it is necessary to implement this in the early
stages.
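
The effect is visible from Python itself. The sketch below takes many extra
references to `None` and reports how much the count returned by
`sys.getrefcount` moved. The helper name is made up, and the expectation
(no change on CPython 3.12 and later, where `None` is immortal, versus a
change equal to the number of extra references on earlier versions) assumes
CPython rather than another implementation.

```python
import sys


def refcount_delta_for_none(extra_refs: int = 1000) -> int:
    """Return how much None's reported refcount grows after taking extra references."""
    before = sys.getrefcount(None)
    holder = [None] * extra_refs  # each list slot holds one reference to None
    after = sys.getrefcount(None)
    del holder
    return after - before
```

A checker that models immortal objects would have to treat increments and
decrements on them as no-ops, which is one reason it can be postponed.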

Some stable API functions steal a reference conditionally (only on success),
so their behavior cannot be described by a single attribute.
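
A small model shows why one yes/no attribute is not enough. `StealRule` and
`caller_refs_after_call` are hypothetical names for illustration; the rule
simply records whether the callee takes over the caller's reference on each
outcome. `PyModule_AddObject` is a documented case of the "success only"
behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StealRule:
    """Hypothetical description of what a call does with the caller's reference."""
    steals_on_success: bool
    steals_on_failure: bool


def caller_refs_after_call(rule: StealRule, succeeded: bool, caller_refs: int = 1) -> int:
    """Return how many references the caller still owns after the call."""
    stolen = rule.steals_on_success if succeeded else rule.steals_on_failure
    return caller_refs - 1 if stolen else caller_refs


# Two different behaviors that a single "steals reference" flag cannot tell apart.
ALWAYS_STEALS = StealRule(steals_on_success=True, steals_on_failure=True)
STEALS_ON_SUCCESS_ONLY = StealRule(steals_on_success=True, steals_on_failure=False)
```

With `STEALS_ON_SUCCESS_ONLY`, the caller still owns its reference on the
failure path and must release it there; with `ALWAYS_STEALS`, releasing it on
failure would be a double decrement.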


For CPython versions:

The behavior of some stable CPython APIs has changed between minor
releases (e.g. 3.10 -> 3.11). For instance, some APIs accepted NULL
arguments before 3.8, but not from 3.8 onwards.

Since both GCC and CPython are hard for users to upgrade, we might want to
decide early on how to handle these behavioral differences.
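
One possible shape for this, sketched as a version-keyed rule table. All of
it is an assumption: `Py_HypotheticalApi` is not a real function, the table
layout is invented, and how the target CPython version would reach the
checker is left open.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# Hypothetical rule table: for each API name, a list of
# (first version the rule applies to, whether NULL arguments are accepted).
NULL_ARG_RULES: Dict[str, List[Tuple[Version, bool]]] = {
    "Py_HypotheticalApi": [((3, 0), True), ((3, 8), False)],
}


def accepts_null(api: str, target: Version) -> bool:
    """Pick the most recent rule whose starting version is <= target."""
    applicable = [
        allowed
        for since, allowed in sorted(NULL_ARG_RULES[api])
        if since <= target
    ]
    if not applicable:
        raise LookupError(f"no rule for {api} on {target}")
    return applicable[-1]
```

Keeping the rules as data means that supporting another CPython release is a
matter of adding rows, not of changing the checking logic.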

Versions older than 3 minor releases cannot be touched. (3.13 is now in
active development; 3.12 and 3.11 receive bug fixes; 3.10 and 3.9 receive
security fixes only.) So versions <= 3.10 can be treated as frozen.


Re: Query status of GSoC project: CPyChecker

2023-06-29 Thread Steven Sun via Gcc
Hi Eric,

> Thanks for reaching out. The project is still in very early stages. So
> far we have taught the analyzer the basic behavior for
> PyLong_FromLong, PyList_New, and Py_DECREF via known function
> subclassing. Additionally, Py_INCREF is supported out of the box.
> Reference count checking functionality remains the priority, but it is
> not yet fully implemented.

Great!

> Regarding CPython versions, the goal is to just get things working on
> one version first. I arbitrarily picked 3.9, but happy to consider
> another version as an initial goal if it’s more helpful to the CPython
> community.

I am not sure about this.

cpychecker is more beneficial to CPython extension developers than to
CPython developers, since it is almost impossible for cpychecker to learn
the latest internal function behavior without handwritten attributes or
seeing the whole function definitions.

So it depends on the extension maintainers. The pattern I observe is that
popular libraries upgrade gradually. 3.9 and 3.10 are definitely the
current mainstream.

That said, I think 3.9 is fine for now, but it will be outdated in 2 to 3
years.


Best,
Steven