Re: GSoC: want to take part in `Extend the static analysis pass for CPython Extension`
I do not have specific ideas for (c). I would prefer to work on (b) if possible.

The PEP 701 branch is under active development at the moment. I review others' PRs and have opened some of my own:

https://github.com/pablogsal/cpython/pull/54
https://github.com/pablogsal/cpython/pull/61
https://github.com/pablogsal/cpython/pull/63

I will submit a proposal for (b) as soon as possible. By the way, I can start working well before the coding period in the GSoC timeline begins.

From: David Malcolm
Sent: Monday, April 3, 2023 7:41
To: Sun Steven; gcc@gcc.gnu.org
Subject: Re: GSoC: want to take part in `Extend the static analysis pass for CPython Extension`

On Sat, 2023-04-01 at 20:32 +, Sun Steven via Gcc wrote:
> Hello,

Hi! I just replied to your other email in the "[GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin" thread.

> I want to take part in this project.
>
> b. Write a plugin to add checking for usage of the CPython API (e.g.
> reference-counting); see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646
>
> I know the deadline is arriving, but this idea just came to me now.

Indeed; the deadline for submitting proposals to the official GSoC website is April 4 - 18:00 UTC (i.e. this coming Tuesday); see:
https://developers.google.com/open-source/gsoc/timeline
Google are very strict about that deadline.

> Self-intro:
> I am a fan of C++, and have expertise in writing low-latency codes. I
> previously worked at a high-frequency trading company, mainly writing
> C++ and Python on Linux.
>
> Familiarity with GCC:
> I get an overall idea of how the compiler works. I have debugged
> several GCC c++ frontend bugs. (eg. 108218, 99686, 99019,...)

Thanks; I just took a look at those.

> But I only checked the c++ frontend codes in detail, not the middle
> or backend codes. I have the ability to work with large codebases.
>
> Familiarity with CPython:
> I use a lot of CPython. Recently, I am contributing to the CPython
> interpreter on PEP 701 (mainly on the parser, which I am familiar
> with)
>
> I have always been wanting to contribute major changes to GCC, but
> just don't know if that project exists. I understand how middle-end
> works, but never really interact with the GIMPLE. This project allows
> me to take a real look at how GCC's middle end works.

Given your knowledge of both C++ and of CPython internals, this project sounds like a good way for you to get involved.

> I want to know if anyone was already on this project. I would prefer
> a large-sized object (350hrs).

I see you've already posted to the thread Eric started.

> If b. was already taken, I also accept a. and c.

I had to check the wiki page to see which ones (a) and (c) were;
(a) is "Add format-string support to -fanalyzer."
(c) is "Add a checker for some API or project of interest to the contributor (e.g. the Linux kernel, a POSIX API that we're not yet checking, or something else), either as a plugin, or as part of the analyzer core for e.g. POSIX."

Do you have specific ideas for (c)?

(a) would make a great project, in that it's reasonably self-contained. Eric's proposal for (b) plans to eventually tackle it, but there's a huge amount of potential work in (b) already.

> By the way, I don't really care about the GSoC. If we miss the
> deadline, we can still push forward this project without the support
> of GSoC, as long as I get coached.

I'm keen on helping new GCC contributors, with or without GSoC.

A good next step is to build GCC from source, and try hacking in a new warning. See:
https://gcc-newbies-guide.readthedocs.io/en/latest/

But remember that the GSoC deadline is April 4 - 18:00 UTC (i.e. this coming Tuesday), so if you're going to apply, you need to act fast.

Good luck
Dave
GSoC: Porting cpychecker to a -fanalyzer plugin
Hi, Eric and Dave,

I did not make it into the GSoC program, which does not surprise me. In this email, I would like to share some thoughts on this project with Eric and pose some questions to Dave.

Over the past month, I have been active in the CPython community, and I have now been nominated as a triage member:
https://github.com/python/core-workflow/issues/503

I took a look at how GCC plugins and the analyzer work, and I have a basic idea of how this project should work.

Questions:

1. Where should this project (cpychecker) reside?
Since it's a plugin, it may live outside of the GCC project. But it currently also relies on some internal headers of the analyzer. If it lives outside, making the analyzer's internal headers stable for public use would be the best choice.

2. Where do people in GCC discuss development plans or new ideas?
In other large projects, I have seen people discuss such things in a forum. I emailed one of the contributors, who replied that this mailing list is such a place, as is the IRC channel. But this mailing list is less active than the project itself. I guess most discussions happen on the `gcc-patches` mailing list.

Thoughts/Experiences/Advice: (to Eric)

1. Plugins
GCC has a plugin mechanism: https://gcc.gnu.org/wiki/plugins
You provide a shared library; the compiler loads it and calls your function, which initializes your plugin. Your plugin registers some callbacks, and the compiler invokes them later.
Specific to the analyzer, you can see this initialization happen in `gcc/analyzer/engine.cc`:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/analyzer/engine.cc;h=a5965c2b8ff048e47d9c1687d5298a11020a5bee;hb=HEAD#l6102
You can try writing a basic "nop" plugin first. You need to include the headers that define the virtual function interfaces.

2. State Machine and Known Functions
As you can see from the interface, the class `plugin_analyzer_init_iface`:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/analyzer/analyzer.h;h=a1619525afaf9322f1ef6d6ec387d6eea70f7c0f;hb=HEAD#l275
you can register two things: state machines and known functions.
The state machine is defined in `sm.h`. State machines provide the core functionality; you can look at all the `sm-*.cc` files. For instance, a pointer can be in one of several states, such as malloced or freed. You can read the logic in `sm-malloc.cc`.
The known function is defined in `analyzer.h`. It lets you perform checks on function calls. You can look at `kf.cc` for reference implementations.
When completed, this plugin would consist of several `state_machine`s and `known_function`s.

3. Go through the code logic with GDB
I don't know to what extent you have interacted with GCC or whether you have coded in C++. I strongly recommend using gdb; I found it very helpful for debugging. You can step through the code and set breakpoints anywhere, without having to add debug lines and recompile. (Once you have tried compiling GCC, you will understand what I am saying.)
You can also see the full backtrace, which tells you the caller of each function (even where function pointers are used).
You can set breakpoints on all `ana::*` functions at once with a pattern (gdb's `rbreak` takes a regular expression, e.g. `rbreak ^ana::`). gdb will then stop at every function related to the analyzer, and you can use `c` to continue.

4. Start with easy issues.
You can read David's guide here:
https://gcc-newbies-guide.readthedocs.io/en/latest/index.html
In my experience, if you don't know what to do, try working on relevant issues. Merely finding out what caused a bug is worthwhile; fixing it is a plus. I did this for issues #109190 and #109027, and that is how I came to understand how the analyzer works.

---

I will act more like a reviewer and adviser for this project.

(To Eric:) I can review your code and give you advice. I can help more when you are stuck on implementation bugs. CC me on the relevant changes, and I will review them when I am available.

Best,
Steven
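To make the state-machine idea in item 2 concrete, here is a minimal toy model, written in Python for brevity. It is only a sketch of the concept behind a checker such as `sm-malloc.cc`: each pointer carries a state, and certain operations in certain states produce a warning. The names `PtrState` and `check`, the three states, and the event format are all hypothetical; they are not the analyzer's C++ API, and the real state machine tracks more states than this.

```python
from enum import Enum, auto


class PtrState(Enum):
    """Simplified per-pointer states (hypothetical; the real checker has more)."""
    START = auto()
    ALLOCATED = auto()
    FREED = auto()


def check(events):
    """Replay (operation, pointer_name) events along one path; return warnings."""
    state = {}     # pointer name -> PtrState
    warnings = []
    for op, ptr in events:
        current = state.get(ptr, PtrState.START)
        if op == "malloc":
            state[ptr] = PtrState.ALLOCATED
        elif op == "free":
            if current is PtrState.FREED:
                warnings.append(f"double-free of {ptr}")
            state[ptr] = PtrState.FREED
        elif op == "use":
            if current is PtrState.FREED:
                warnings.append(f"use-after-free of {ptr}")
        else:
            raise ValueError(f"unknown operation: {op}")
    # Anything still ALLOCATED at the end of the path was never released.
    for ptr, final in state.items():
        if final is PtrState.ALLOCATED:
            warnings.append(f"leak of {ptr}")
    return warnings


if __name__ == "__main__":
    trace = [("malloc", "p"), ("free", "p"), ("use", "p"),
             ("free", "p"), ("malloc", "q")]
    for warning in check(trace):
        print(warning)
```

A reference-count checker would presumably follow the same shape, with states and transitions driven by calls such as `Py_INCREF` and `Py_DECREF` instead of `malloc` and `free`.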
Query status of GSoC project: CPyChecker
Hi Eric,

I am Steven (now) from the CPython team. How is the project going? Do you have any prototypes or ideas that can be discussed? Which part will you start with?

I recently debugged dozens of Python bugs, some involving C APIs. I can provide some test cases for you.

For the ref count part:
A major change (immortal objects) is introduced in Python 3.12. Basically, immortal objects have their ref count fixed at a very large number (depending on `sizeof(void*)`). But I don't think it is necessary to implement this in the early stages.
Some stable API functions steal a reference conditionally (only on success), so their behavior cannot be described by a single attribute.

For CPython versions:
The behavior of some stable CPython APIs has varied across minor releases (e.g. 3.10 -> 3.11). For instance, some APIs accepted NULL as an argument for <3.8, but not for >=3.8.
Considering that both GCC and CPython are hard for users to upgrade, we might want to think about how to live with these behavioral differences from the start.
Versions more than 3 minor releases old cannot be changed. (3.13 is now in active development; 3.12 and 3.11 get bug fixes; 3.10 and 3.9 get security fixes only.) So versions <= 3.10 can be treated as frozen.
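The point about conditional stealing can be sketched with a small model, assuming a checker summarizes each call by what happens to the caller's reference on each outcome. Everything here (`ArgSummary`, `refs_owned_after_call`, the two-flag layout) is hypothetical and only illustrates why a single "steals" attribute is not enough. `PyModule_AddObject` is used as the example because it is documented to steal the reference only on success; whether it is the function the email had in mind is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgSummary:
    """Hypothetical summary of how a call treats one object argument."""
    name: str
    steals_on_success: bool
    steals_on_failure: bool


def refs_owned_after_call(summary, owned_before, succeeded):
    """Return how many references the caller still owns after the call."""
    if succeeded:
        stolen = summary.steals_on_success
    else:
        stolen = summary.steals_on_failure
    return owned_before - 1 if stolen else owned_before


# PyModule_AddObject is documented to steal the reference only on success.
CONDITIONAL = ArgSummary("PyModule_AddObject", True, False)
# Hypothetical functions, for contrast only.
UNCONDITIONAL = ArgSummary("always_steals (hypothetical)", True, True)
BORROWS = ArgSummary("never_steals (hypothetical)", False, False)


if __name__ == "__main__":
    for summary in (CONDITIONAL, UNCONDITIONAL, BORROWS):
        for succeeded in (True, False):
            left = refs_owned_after_call(summary, 1, succeeded)
            outcome = "success" if succeeded else "failure"
            action = "must Py_DECREF" if left else "must not Py_DECREF"
            print(f"{summary.name}: on {outcome} the caller {action}")
```

With a single boolean, the conditional case would be modeled wrongly on one of its two paths: either a missed leak on failure or a false double-decref report.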
Re: Query status of GSoC project: CPyChecker
Hi Eric,

> Thanks for reaching out. The project is still in very early stages. So
> far we have taught the analyzer the basic behavior for
> PyLong_FromLong, PyList_New, and Py_DECREF via known function
> subclassing. Additionally, Py_INCREF is supported out of the box.
> Reference count checking functionality remains the priority, but it is
> not yet fully implemented.

Great!

> Regarding CPython versions, the goal is to just get things working on
> one version first. I arbitrarily picked 3.9, but happy to consider
> another version as an initial goal if it’s more helpful to the CPython
> community.

I am not sure about this. cpychecker is more beneficial to CPython extension developers than to CPython developers, since it is almost impossible for cpychecker to learn the latest internal function definitions without handwritten attributes or seeing the whole function definitions.

So it depends on the extension maintainers. The pattern I observe is that popular libraries are upgrading gradually, and 3.9 and 3.10 are definitely the current mainstream.

That said, I think 3.9 is fine for now, but it will be outdated in 2 to 3 years.

Best,
Steven