On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> Hi, sir.
>
> I've been trying to understand the static analyzer's code. I spent most of
> my time learning the state machine's API. I learned how state machine's
> on_stmt is supposed to "recognize" specific functions and how on_transition
> takes a specific tree from one state to another, and how the captured
> states are used by pending_diagnostics to report the errors. Furthermore, I
> was able to create a dummy checker that mimicked the behaviour of sm-file's
> double_fclose and compile GCC with these changes. Is this the right way of
> learning?

This sounds great.

> As you've mentioned on the projects page that you would like to add more
> support for some POSIX APIs. Can you please write (or refer me to a) a
> simple C program that uses such an API (and also what the analyzer should
> have done) so that I can attempt to add such a checker to the analyzer.

A couple of project ideas:

(i) treat data coming from a network connection as tainted, by somehow
teaching the analyzer about networking APIs. Ideally: look at some
subset of historical CVEs involving network-facing attacks on user-space
daemons, and find a way to detect them in the analyzer (need to
find a way to mark the incoming data as tainted, so that the analyzer
"knows" about the trust boundary - that the incoming data needs to be
sanitized and treated with extra caution; see
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
for my attempts to do this for the Linux kernel).

Obviously this is potentially a huge project, so maybe just picking a
tiny subset and getting that working as a proof-of-concept would be a
good GSoC project. Maybe find an old CVE that someone has written a
good write-up for, and think about "how could GCC/-fanalyzer have
spotted it?"

(ii) add leak-detection for POSIX file descriptors: i.e. the integer
values returned by "open", "dup", etc. It would be good to have a
check that the user's code doesn't leak these values e.g. on
error-handling paths, by failing to close a file-descriptor (and not
storing it anywhere). I think that much of this could be done by
analogy with the sm-file.cc code.

> Also, I didn't realize the complexity of adding SARIF when I mentioned it.
> I'd rather work on adding more checkers.

Fair enough.

Hope the above is constructive.
Dave

> Regards.
>
> Mir Immad
>
> On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan...@gmail.com> wrote:
>
> > Hi Sir,
> >
> > I've been trying to understand the static analyzer's code.
> > I spent most of
> > my time learning the state machine's API. I learned how state machine's
> > on_stmt is supposed to "recognize" specific functions and takes a specific
> > tree from one state to another, and how the captured states are used by
> > pending_diagnostics to report the errors. Furthermore, I was able to create
> > a dummy checker that mimicked the behaviour of sm-file's double_fclose and
> > compile GCC with these changes. Is this the right way of learning?
> >
> > As you've mentioned on the projects page that you would like to add more
> > support for some POSIX APIs. Can you please write (or refer me to a) a
> > simple C program that uses such an API (and also what the analyzer should
> > have done) so that I can attempt to add such a checker to the analyzer.
> >
> > Also, I didn't realize the complexity of adding SARIF when I mentioned it.
> > I'd rather work on adding more checkers.
> >
> > Regards.
> > Mir Immad
> >
> > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <dmalc...@redhat.com> wrote:
> >
> > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > Hi David,
> > > > I've been tinkering with the static analyzer for the last few days. I find
> > > > the project of adding SARIF output to the analyzer interesting. I'm writing
> > > > this to let you know that I'm trying to learn the codebase.
> > > > Thank you.
> > >
> > > Excellent.
> > >
> > > BTW, I think adding SARIF output would involve working more with GCC's
> > > diagnostics subsystem than with the static analyzer, since (in theory)
> > > all of the static analyzer's output is passing through the diagnostics
> > > subsystem - though the static analyzer is probably the only GCC
> > > component generating diagnostic paths.
> > >
> > > I'm happy to mentor such a project as I maintain both subsystems and
> > > SARIF output would benefit both - but it would be rather tangential to
> > > the analyzer - so if you had specifically wanted to be working on the
> > > guts of the analyzer itself, you may want to pick a different
> > > subproject.
> > >
> > > The SARIF standard is rather long and complicated, and we would want to
> > > be compatible with clang's implementation.
> > >
> > > It would be very cool if gcc could also accept SARIF files as an
> > > *input* format, and emit them as diagnostics; that might help with
> > > debugging SARIF output. (I have an old patch for adding JSON parsing
> > > support to GCC that could be used as a starting point for this).
> > >
> > > Hope the above makes sense
> > > Dave
> > >
> > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <dmalc...@redhat.com> wrote:
> > > >
> > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > > > > Hi everyone,
> > > > >
> > > > > Hi, and welcome.
> > > > >
> > > > > > I intend to work on the static analyzer. Are these documents enough to
> > > > > > get started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > >
> > > > > Yes.
> > > > >
> > > > > There are also some high-level notes here:
> > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > >
> > > > > Also, given that the analyzer is part of GCC, the more general
> > > > > introductions to hacking on GCC will be useful.
> > > > >
> > > > > I recommend creating a trivial C source file with a bug in it (e.g. a
> > > > > 3-line function with a use-after-free), and stepping through the
> > > > > analyzer to get a sense of how it works.
> > > > >
> > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > Dave
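
As a rough illustration of project idea (ii) in the reply at the top of this thread, here is a minimal sketch of the kind of small C program the request describes: one function that leaks a file descriptor on an error-handling path, and a corrected variant. The function names and the suggested warning location are assumptions made for illustration; they do not come from the thread or from the GCC testsuite.

```c
/* Hypothetical test case for file-descriptor leak detection.
   The function names and the expected analyzer behaviour are
   illustrative assumptions, not existing GCC testsuite content.  */
#include <fcntl.h>
#include <unistd.h>

/* Return the first byte of the file at PATH, or -1 on failure.  */
int
first_byte_leaky (const char *path)
{
  unsigned char byte;
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;   /* Fine: nothing was opened.  */
  if (read (fd, &byte, 1) != 1)
    return -1;   /* BUG: fd is neither closed nor stored anywhere;
                    a leak checker would ideally warn here.  */
  close (fd);
  return byte;
}

/* Same function with the error-handling path fixed.  */
int
first_byte_fixed (const char *path)
{
  unsigned char byte;
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;
  ssize_t n = read (fd, &byte, 1);
  close (fd);    /* Closed on every path after a successful open.  */
  return n == 1 ? byte : -1;
}
```

The desired behaviour would presumably mirror what sm-file.cc does for FILE * leaks: a diagnostic at the second return in first_byte_leaky, with a path showing where the descriptor was opened, and no diagnostic for first_byte_fixed.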